Accelerating Research at UCSD with Globus
Globus provides researchers, system administrators and developers alike with many data management benefits. Several years ago, the San Diego Supercomputer Center (SDSC) recognized this as they stood up their high-performance cluster and needed to transfer and share large amounts of data. Since the initial trial and adoption of the research data management service, the usage at UC San Diego across multiple departments and disciplines has grown. With a high assurance subscription in place, researchers are now able to reliably transfer and share large data sets, and even share protected data with ease.
Even researchers on the same campus face data management challenges trying to share data and collaborate. Differences in identity management, networking, or the underlying storage technology can create situations where data can’t flow between groups. With the increase in Globus managed endpoints, researchers at UC San Diego can easily transfer and share data both internally and with researchers outside UC San Diego.
The Globus collection HTTPS capability, especially the ability to allow public (anonymous) access, has been critical in addressing researchers’ concerns about technology “lock-in”. Knowing that data can be made available via only a browser lets researchers know that their collaborators or users will be able to securely access data without installing additional software.
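As a rough illustration of what "only a browser" means in practice, the sketch below fetches a file from an anonymously readable collection using nothing but Python's standard library. The base URL and file path are hypothetical placeholders, not real UC San Diego addresses; the actual HTTPS base URL is the one shown for the collection in question. It assumes the collection owner has enabled public read access on that path.

```python
import shutil
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical placeholder; substitute the HTTPS base URL of a real collection.
COLLECTION_HTTPS_BASE = "https://example-collection.invalid"


def public_file_url(base_url: str, path: str) -> str:
    """Join a collection's HTTPS base URL and a file path.

    The path is percent-encoded (slashes are kept) so that names with
    spaces or other special characters form a valid URL.
    """
    return base_url.rstrip("/") + "/" + quote(path.lstrip("/"))


def download(url: str, dest: str) -> None:
    """Stream a publicly readable file to disk with a plain HTTPS GET.

    No Globus software or login is involved; this only works when the
    file has been shared for anonymous access.
    """
    with urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)


if __name__ == "__main__":
    # Hypothetical path, used only to show how the URL is built.
    url = public_file_url(COLLECTION_HTTPS_BASE, "/shared/results/summary table.csv")
    print(url)
    # download(url, "summary_table.csv")  # uncomment with a real collection URL
```

The same URL could be pasted into a browser or passed to any HTTP client, which is the point of the capability: collaborators are not tied to a particular tool.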
The Mesirov Lab at UC San Diego is one group that added Globus services to its portal. Before integrating Globus, it was difficult to transfer large datasets to and from the cloud-hosted GenePattern server because of transient errors that often occurred during transfers through the web browser. Users also needed a certain level of technical expertise to access the hundreds of available genomics analysis tools. Now that Globus is integrated into the GenePattern portal, researchers can use a point-and-click interface to log in, access and transfer large datasets, perform their analysis, and share the results.
Another UC San Diego research group, the Yeo Lab, which focuses on neural RNA binding proteins, RNA processing and single-cell analyses, wanted to build a cloud-based application for data sharing, data harmonization, and data processing and analysis across scRNA-Sequencing, Imaging, Electrophysiology and Proteomics. Rather than building everything themselves, their goal was to use services such as Globus, which provide capabilities beyond the scope of their team. So, they integrated Globus into the Cell Reprogramming Database (CReD) portal to enable transfer and sharing of data.
Sherlock, a secure enclave at the San Diego Supercomputer Center on the UC San Diego campus, incorporated Globus into their platform to provide a secure data ingress and egress mechanism for their users. Sherlock also uses the Globus AWS S3 connector as part of their data flows between on-premise and cloud systems. They rely on the features of UC San Diego’s Globus high assurance subscription to enforce the necessary compliance requirements.
Other departments at UC San Diego are also exploring ways in which Globus can increase operational efficiencies. For example, the Library’s Research Data Curation Program is testing Globus as a simpler way for their users to submit large datasets. Groups within Health Sciences are interested in cloud solutions and Globus is being evaluated to deliver data to external users of scientific instruments.
Rick Wagner, a member of the UC San Diego Research IT team, works with different research groups to integrate Globus managed endpoints into research workflows as quickly and effectively as possible. His goal is to build a frictionless data management environment and to make the groups as self-sufficient as possible, particularly those that have little to no technical support. By using Globus functionality to set up policy guidelines and rules in lieu of rigid processes, he is doing just that: enabling researchers and accelerating data-driven research and discovery.
“Our goal is to make it as fast and easy as possible for researchers to start using the Globus features that come with our subscription,” said Rick. “The subscription is a fixed cost–the more we use it, the bigger the return on investment. Besides the foundational ways to promote adoption–driving awareness, training, providing support for UCSD-specific issues–our strategy also includes making the right folks across campus subscription managers and running centralized endpoints that extend the capabilities of existing on-premise storage resources.”
As research groups and IT professionals become familiar with Globus, UC San Diego adds individuals as subscription managers. The new subscription managers are given two basic rules:
Only add UC San Diego operated endpoints to the subscription.
Don’t remove managed status (i.e., subscription coverage) from endpoints you’re not involved with.
This enables groups to independently deploy managed endpoints for production, testing, or the one-off situations that always arise. It also increases the likelihood that another group on campus will have a subscription manager near them organizationally and can avoid the classic hunt for “the right person” or where to submit a request.
UC San Diego Research IT Services also operates a central endpoint that can mount storage volumes for research groups and map those volumes to dedicated Globus collections. This is provided for groups using SDSC’s Universal Scale Storage (USS) at no additional cost. The collections enable research groups to share data with both campus affiliates and external collaborators. By using the central endpoint, research groups gain access to their data via Globus Transfer and HTTPS, while avoiding the need to run extra services on their systems.
Want to learn more? Watch a replay of “What is Globus” as it was presented to the UC San Diego Research Computing and Data community.