Today the Globus research data management service announced the largest single file transfer in its history: a team led by Argonne National Laboratory scientists moved 2.9 petabytes of data as part of a research project involving three of the largest cosmological simulations to date.
"Storage is in general a very large problem in our community — the Universe is just very big, so our work can often generate a lot of data,” explained Katrin Heitmann, Argonne physicist and computational scientist and an Oak Ridge National Laboratory Leadership Computing Facility (OLCF) Early Science user. “Using Globus to easily move the data around between different storage solutions and institutions for analysis is essential.”
The data in question was stored on the Summit supercomputer at OLCF, currently the world’s fastest supercomputer according to the Top500 list published June 18, 2019. Globus was used to move the files from disk to tape, a key use case for researchers.
"Due to its uniqueness, the data is very precious and the analysis will take time,” said Dr. Heitmann. “The first step after the simulations were finished was to make a backup copy of the data to HPSS, so we can move the data back and forth between disk and tape and thus carry out the analysis in steps. We use Globus for this work due to its speed, reliability, and ease of use.”
"With exascale imminent, AI on the rise, HPC systems proliferating, and research teams more distributed than ever, fast, secure, reliable data movement and management are now more important than ever,” said Ian Foster, Globus co-founder and director of Argonne’s Data Science and Learning Division. “We tend to take these functions for granted, and yet modern collaborative research would not be possible without them.”
"Globus has underpinned groundbreaking research for decades," added Foster. "We could not be prouder of our role in helping scientists do their world-changing work, and we’re happy to see projects like this one continue to push the boundaries of what Globus can achieve. Congratulations to Dr. Heitmann and team!”
When it comes to data transfer performance, “the most important part is reliability,” said Dr. Heitmann. “It is basically impossible for me as a user to check the very large amounts of data upon arrival after a transfer has finished. The analysis of the data often uses a subset of the data, so it would take quite a while until bad data would be discovered and at that point we might not have the data anymore at the source. So the reliability aspects of Globus are key.”
"Of course, speed is also important. If the transfers were very slow, given the amount of data we transfer, we would have had a problem. So it’s good to be able to rely on Globus for fast data movement as well. We are also grateful to Oak Ridge for access to Summit and for their excellent setup of data transfer nodes enabling the use of Globus for HPSS transfers. This work would not have been possible otherwise.”
For details about this project’s use of Globus, read the Q&A blog with Dr. Heitmann.
See original story on insideHPC.com: https://insidehpc.com/2019/07/argonne-team-breaks-record-with-2-9-petabytes-globus-data-transfer/