Data without Borders: the role of Globus in international genome research

May 14, 2024 | AARNet News

The Biomolecular Resource Facility Laboratory at ANU. Credit: Tim Levy from the ACRF.

The Biomolecular Resource Facility (BRF) is located in Canberra within the John Curtin School of Medical Research at the Australian National University (ANU). As a core laboratory giving research projects access to state-of-the-art techniques and equipment for molecular, genetic and protein-based studies, BRF provides consultancy and services for researchers at ANU and the broader Australian scientific community. The facility also works with national and international collaborators, and it needed a fast and reliable way to transfer the large volumes of data generated by its scientific instruments.

Genome sequencing

Recently, Dr Carolina Correa-Ospina, Technologies Specialist at BRF, collaborated with scientists at Plant & Food Research at the University of Otago, New Zealand on a project studying a variety of species of agricultural importance. The project aimed to improve agricultural practices, including growing, fishing and harvesting processes. Dr Correa-Ospina’s team sequenced the genomes of the species they were studying using the Oxford Nanopore PromethION platform, a tool for long-read DNA and RNA sequencing that can produce up to 1.6 TB of data per run. The challenge they then faced was how to transfer this vast amount of data overseas efficiently and safely, without any risk of data loss.

Large data transfer with Globus

As the operator of Australia’s national research and education network dedicated to moving research data, AARNet was well placed to solve this data movement challenge. After working closely with Carolina’s team to understand the requirements, AARNet recommended the Globus data management tool as a solution. Globus allows researchers to transfer research data efficiently and securely between systems anywhere in the world, making it easy to collaborate and share large-scale datasets across organisational boundaries.

As Globus’ partner for universities and research institutes in Australia, AARNet also helped implement the Globus solution. BRF didn’t have the capacity to store all the data on-site, so AARNet worked with the National Computational Infrastructure (NCI) at ANU to establish storage and a Globus endpoint, the network location that data is transferred to and from. NCI is Australia’s leading organisation for high-performance computing (HPC) and data storage services. NCI collaborates with the Australian Government and the research sector to provide resources and services that individual institutions may otherwise be unable to access.

As the Globus service requires another endpoint on the receiving end, AARNet was able to leverage the existing Globus endpoint and storage at the University of Otago to facilitate the transfer from BRF.

Data received within hours

Carolina explained that data is transferred from BRF instruments to NCI’s Gadi supercomputer, where it is processed and stored for sharing. “When data collection is ready, we can simply log in to the user interface and share the data with our collaborator in minutes,” said Carolina. “It has tremendously sped up the data delivery, as there is no extra copying required on our end and we don’t need to spend extra time and resources to compress the data into smaller files prior to transfer.”

For previous projects, the technical specialists at BRF had created their own in-house methods for transferring large amounts of data, but these were inefficient and difficult to maintain. For local projects, they would manually copy data onto hard drives and then hand-deliver the drives to their collaborators, which was a slow and laborious process.

For a recent project, Carolina and her team were able to transfer over 3 TB of genomic data to Plant & Food Research in New Zealand in only three hours using Globus.
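
To put that figure in network terms, the short calculation below estimates the average transfer rate it implies. It is a rough sketch only: it assumes exactly 3 TB in decimal units (3 × 10^12 bytes) and exactly three hours, whereas the article says "over 3 TB", so the real average was probably somewhat higher. The function name is illustrative and not part of Globus.

```python
def average_throughput_gbps(total_bytes: float, seconds: float) -> float:
    """Return the average transfer rate in gigabits per second (decimal units)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # bytes -> bits, divide by elapsed time, then scale to gigabits
    return total_bytes * 8 / seconds / 1e9


# Assumed inputs: 3 TB taken as 3e12 bytes, moved in three hours.
rate = average_throughput_gbps(3e12, 3 * 3600)
print(f"{rate:.2f} Gbit/s")  # prints "2.22 Gbit/s"
```

Under these assumptions the transfer averaged roughly 2.2 gigabits per second between Australia and New Zealand.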

“Globus is extremely easy to use and incredibly fast,” Carolina commented. “Our collaborators very much like that they can receive their data within hours instead of days or weeks.”