Berkeley Lab and ESnet Paper Cites Globus for Accelerating Climate Data Movement
In a recent paper entitled “An Assessment of Data Transfer Performance for Large‐Scale Climate Data Analysis and Recommendations for the Data Infrastructure for CMIP6,” experts from Lawrence Berkeley National Laboratory (Berkeley Lab) and ESnet (the Energy Sciences Network, http://www.es.net/) document the data transfer workflow, transfer performance, and other aspects of moving approximately 56 terabytes of climate model output data for further analysis.
The data, required for tracking and characterizing extratropical storms, needed to be moved from the distributed Coupled Model Intercomparison Project Phase 5 (CMIP5) archive to the National Energy Research Scientific Computing Center (NERSC) at Berkeley Lab.
The authors found that there is significant room for improvement in the data transfer capabilities currently in place for CMIP5, both in terms of workflow mechanics and in data transfer performance. In particular, the paper notes that performance improvements of at least an order of magnitude are within technical reach using current best practices.
To illustrate this, the authors used Globus to transfer the same raw data set between NERSC and the Argonne Leadership Computing Facility (ALCF) at Argonne National Laboratory. The performance achieved was far superior (orders of magnitude higher, in fact), and the transfer completed successfully with no manual intervention.
These results indicate the need for Globus, a high-performance managed transfer solution, to be more widely available in support of the larger research goals of projects like CMIP6, since data nodes will be serving much larger data sets than in previous projects.
- “Using Globus, the entire 56TB data set was transferred from NERSC to ALCF in about 48 hours with a minimal commitment of human time.”
- “Climate scientists interested in extreme weather need tools like Globus so they can quickly and easily move the massive data sets generated by climate models.”
- “Without the high performance data movement provided by Globus, researchers are hindered in their ability to collaborate and to make progress apace.”
- “Performance improvements of at least an order of magnitude can be gained by incorporating current best practices such as Globus.”
- “In the move from NERSC to ALCF, each of the two directory trees (one for the historical experiment and one for the rcp85 experiment) were transferred using a single Globus transfer request each.”
- “This performance difference is stark, and is an indication of what could be achieved through infrastructure investments at the major ESGF data centers.”
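For a sense of scale, the “56TB in about 48 hours” figure quoted above can be converted into an average transfer rate. The short Python sketch below assumes decimal units (1 TB = 10^12 bytes) and exactly 48 hours of wall-clock time. It is a rough estimate derived from those two numbers, not a figure reported in the paper, and it says nothing about how the rate varied during the transfer.

```python
# Back-of-the-envelope average throughput for the NERSC -> ALCF transfer.
# Assumptions (not stated in the paper): decimal units (1 TB = 10**12 bytes)
# and exactly 48 hours; the source only says "about 48 hours".

DATA_TB = 56          # data set size cited for the transfer
DURATION_HOURS = 48   # approximate wall-clock time cited for the transfer


def average_throughput(size_tb: float, hours: float) -> tuple[float, float]:
    """Return the average rate as (megabytes per second, gigabits per second)."""
    size_bytes = size_tb * 10**12
    seconds = hours * 3600
    bytes_per_s = size_bytes / seconds
    return bytes_per_s / 10**6, bytes_per_s * 8 / 10**9


mb_s, gbit_s = average_throughput(DATA_TB, DURATION_HOURS)
print(f"~{mb_s:.0f} MB/s, ~{gbit_s:.2f} Gbit/s")
```

Under these assumptions the average works out to roughly 324 MB/s, or about 2.6 Gbit/s, over the two days.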
To access the paper, visit https://arxiv.org/abs/1709.09575.
Globus was happy to be a part of this study, and we look forward to continuing to work with the Earth System Grid Federation (ESGF) to make Globus transfer and sharing more prevalent among ESGF sites.