The University of Pittsburgh, together with the Pittsburgh Supercomputing Center, is one of five funded components contributing to the infrastructure for the NIH-funded Human BioMolecular Atlas Program (HuBMAP), which relies heavily on Globus for software infrastructure to build a frictionless research platform.
HuBMAP is developing a framework to map the human body at single-cell resolution. The goal is to enable the development of an atlas of the human body with high-resolution 3D maps of human tissues and organs. This will lead to an improved understanding of the relationships between the organization of cells, tissues and biological functions, and will serve as a reference map for better understanding human disease.
The HuBMAP consortium consists of 63 contributing institutions from around the world. Data is accessed through the HuBMAP portal. The portal currently includes data from seven organs, and researchers can search by data type, organ, specimen type, status, creator, and affiliation. Most of the data is open access and is intended for research into human biology. The portal currently contains 10 terabytes of data and is expected to grow to 10 to 15 petabytes over the course of the eight-year project.
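The search fields listed above can be pictured as a simple faceted filter. The following is a minimal sketch only, not the portal's implementation: the `DatasetRecord` class, the `search` function, and the placeholder records are all hypothetical names and values introduced for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical record carrying the six search fields named in the text."""
    data_type: str
    organ: str
    specimen_type: str
    status: str
    creator: str
    affiliation: str


# The set of field names that may be used as search facets.
FACETS = {f.name for f in fields(DatasetRecord)}


def search(records, **facets):
    """Return the records whose fields equal every supplied facet value."""
    unknown = set(facets) - FACETS
    if unknown:
        raise ValueError(f"unknown facet(s): {sorted(unknown)}")
    return [
        record
        for record in records
        if all(getattr(record, name) == value for name, value in facets.items())
    ]


# Placeholder records; the values are invented purely for illustration.
EXAMPLE_RECORDS = [
    DatasetRecord("type-a", "kidney", "block", "published", "creator-1", "site-1"),
    DatasetRecord("type-b", "kidney", "section", "new", "creator-2", "site-2"),
    DatasetRecord("type-a", "heart", "block", "published", "creator-3", "site-1"),
]

# Combining facets narrows the result set.
kidney_published = search(EXAMPLE_RECORDS, organ="kidney", status="published")
```

Each keyword argument acts as one facet, and a record must match all of them to be returned. Rejecting unknown facet names keeps a typo from silently returning no results.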
The project required the group to overcome a number of challenges, chief among them providing multiple levels of data access and management to a broad, global community that needs to upload, transfer and share large amounts of data in a secure, fast and reliable manner. The Globus platform enabled the group to meet these challenges using the Globus Auth, Transfer and Sharing services. The high assurance option that Globus offers enabled the group to comply with HIPAA and NIST standards for the protected data.
The project collects three types of data: raw data collected from the scientific instruments, metadata describing the attributes of the sample, and analyzed data. All the raw and analyzed data is ingested, stored and transferred through Globus. Researchers access the portal and send their data through Globus, and the portal provides a link to a Globus collection where they can retrieve the data. The portal also has search capabilities that let researchers locate data from a specific region or a particular organ.
Anyone who logs in to the portal to ingest data does so through Globus Auth, a federated identity and access management service. Globus Auth also provides the fine-grained access control within the portal that HuBMAP requires. There are three levels of data access: the most secure level is for protected data, which must adhere to the HIPAA guidelines; the consortium level is for data that has not yet been published and must still go through a quality assurance process; and the public level is for data that is published and freely accessible by anyone. Two Globus endpoints were set up to implement the required access controls: one as a consortium endpoint using Globus High Assurance features and the other as a public endpoint for public data. The portal has a link to the public data, and anyone can log in with any credential from an identity provider supported by Globus Auth.
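The three access levels can be illustrated with a small authorization check. This is a sketch under stated assumptions, not HuBMAP's or Globus Auth's actual logic: the `User` class, its two flags, and the `can_read` function are hypothetical, and the rule that protected data requires consortium membership plus a separate approval is an assumption that the text does not confirm.

```python
from dataclasses import dataclass
from enum import Enum


class AccessLevel(Enum):
    """The three data access levels described in the text."""
    PUBLIC = "public"          # published, freely accessible
    CONSORTIUM = "consortium"  # unpublished, still under quality assurance
    PROTECTED = "protected"    # must adhere to HIPAA guidelines


@dataclass(frozen=True)
class User:
    """Hypothetical authenticated user; both flags are illustrative only."""
    identity: str
    is_consortium_member: bool = False
    has_protected_approval: bool = False  # assumed extra approval step


def can_read(user: User, level: AccessLevel) -> bool:
    """Decide whether a logged-in user may read data at the given level."""
    if level is AccessLevel.PUBLIC:
        # Any authenticated identity may read public data.
        return True
    if level is AccessLevel.CONSORTIUM:
        return user.is_consortium_member
    # PROTECTED: assumed to need membership and an explicit approval.
    return user.is_consortium_member and user.has_protected_approval


guest = User("guest@example.org")
member = User("member@example.org", is_consortium_member=True)
print(can_read(guest, AccessLevel.PUBLIC))
print(can_read(member, AccessLevel.PROTECTED))
```

Modeling the levels as an enumeration keeps the policy in one function, so a change to the rules for a single level does not touch the others.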