Automate your data management tasks to accelerate discovery
July 11, 2023 | Susan Tussy
Today’s researchers must automate some of their data management tasks as they use the newest instruments, many of which generate terabytes of data daily. Globus Flows lets them do just that. With it, researchers can define and automate data flows securely and reliably, from simple transfers to complex, scalable, multi-step flows with a human in the loop. At our recent GlobusWorld conference, several Globus users described how they have implemented the service and how it benefits them.
Globus at the Advanced Photon Source (APS) at Argonne National Laboratory (ANL), a Department of Energy (DOE) User Facility
Globus Flows enables technique- and instrument-specific data processing workflows
Laurent Chapon, Associate Laboratory Director for Photon Sciences at Argonne National Laboratory, gave one of the keynote addresses at the conference, where he described the APS upgrade and how it is creating a “data deluge.”
The APS at ANL is one of the DOE facilities that house the most powerful light sources in the world. Its beamlines are in high demand, with researchers waiting as long as six months for time on these machines. The facility is currently being upgraded, which will increase the brightness by a factor of 500. As a result of the upgrade, the amount of data captured is expected to grow from 67 PB/year to hundreds of PB by 2025. New, complex experiments are being conducted, and new techniques are being employed to deal with the volume, velocity, variety, and veracity of the data being collected. With Globus, the APS is able to provide researchers with real-time analysis of their data and to build and automate custom data flows.
“Globus is the glue connecting the APS to advanced computing resources. The APS is leveraging Globus as a computational fabric to enable advanced computing and data management.”
–Laurent Chapon, Argonne National Laboratory
Globus at the University of Calgary
Automating data sharing of medical images
David Deepwell at the University of Calgary was involved in a multi-year project in which clinics and hospitals across Canada pooled expertise from multiple sites to study a disease. The project involved data validation, data inventory, archiving data for backup, and data distribution. Globus Flows was deployed for data orchestration to enable secure, reliable data transfer and sharing, with automatic deletion of the data once it has been successfully transferred.
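The transfer-then-delete pattern described above can be sketched as a two-state flow definition. This is a minimal illustration, not the Calgary team's actual flow: the state names, the input field names (`$.source`, `$.destination`), the action URLs, and the parameter names are all assumptions that should be checked against the current Globus Flows documentation. The small `state_order` helper is hypothetical and only confirms that deletion can be reached solely after the transfer state.

```python
# Illustrative sketch of a "transfer, then delete the source copy" flow.
# ASSUMPTION: the ActionUrl values and parameter names below are placeholders
# to verify against the Globus documentation; they are not taken from the talk.
FLOW_DEFINITION = {
    "Comment": "Transfer data, then delete the source copy",
    "StartAt": "TransferData",
    "States": {
        "TransferData": {
            "Type": "Action",
            "ActionUrl": "https://actions.globus.org/transfer/transfer",
            "Parameters": {
                "source_endpoint_id.$": "$.source.id",
                "destination_endpoint_id.$": "$.destination.id",
                "transfer_items": [
                    {
                        "source_path.$": "$.source.path",
                        "destination_path.$": "$.destination.path",
                        "recursive": True,
                    }
                ],
            },
            "ResultPath": "$.TransferResult",
            "Next": "DeleteSource",
        },
        "DeleteSource": {
            "Type": "Action",
            "ActionUrl": "https://actions.globus.org/transfer/delete",
            "Parameters": {
                "endpoint_id.$": "$.source.id",
                "items.$": "$.source.paths_to_delete",
                "recursive": True,
            },
            "ResultPath": "$.DeleteResult",
            "End": True,
        },
    },
}


def state_order(definition):
    """Walk the definition from StartAt and return state names in run order."""
    order = []
    name = definition["StartAt"]
    while name is not None:
        if name in order:
            raise ValueError(f"cycle detected at state {name!r}")
        order.append(name)
        state = definition["States"][name]
        # A state either ends the flow or names the state that follows it.
        name = None if state.get("End") else state.get("Next")
    return order


if __name__ == "__main__":
    print(state_order(FLOW_DEFINITION))
```

In this sketch the delete state is reachable only through the transfer state's `Next` field, which mirrors the requirement that data is removed only after the transfer step has finished.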
Globus at the Rosalind Franklin Institute
The Globus platform is leveraged throughout the data lifecycle
Silvia Ramos, a senior research software engineer at the Rosalind Franklin Institute, gave a talk on how Globus is used throughout the data lifecycle. The Rosalind Franklin Institute is a research institute that collaborates with industry to develop new techniques and instruments, which are delivered to user facilities. The institute works with biologists and studies images of organs, tissues, and cells to observe key events in disease and the effect of interventions. Globus is a key component of its research data lifecycle. With more than 26 instruments running Globus Connect Personal, the institute uses Globus Flows to determine which files have been transferred and, once they have been transferred, to extract metadata from them.
Globus at Rockefeller University
Globus Flows simplifies and automates the transfer and sharing of instrument data
Jason Banfelder, the director of HPC systems and applications, gave a talk on how Rockefeller University is ramping up with Globus Flows. Rockefeller University is a biomedical research institute with 85 labs. At Rockefeller, Globus is used to automate the transfer and sharing of the mountain of data that its scientific instruments generate. For example, the instruments in its Cryo-EM facility generate 100 TB of raw data each month. With Globus Flows, researchers can automate reliable data transfer from ingestion to their HPC cluster and share the data across multiple labs. The bio-imaging facility, which houses light microscopes, also generates large amounts of data that must be shared with other institutions. Researchers use Globus to manage this multi-institutional sharing easily.
“With Globus guest collections, sneaker net is no longer needed.”
–Jason Banfelder, Rockefeller University