At GlobusWorld 2015 we announced that we are continuing to add data publication and discovery capabilities to the Globus service. These features will make it simple to ensure that data are identified, described, curated, verifiable, accessible, and preserved at the appropriate levels of durability. Equally importantly, they will enable rich discovery by making it possible to search, browse, and access large published datasets, irrespective of where they are stored.

Globus Data Publication Features

  • Publish large research data on your own storage
  • Use standard metadata and curation workflows
  • Publish to public and restricted collections

How it Works

Globus publishing capabilities are delivered through a hosted service. Metadata is stored in the cloud, while the published data are stored on campus, institutional, and group resources that are managed and operated by their own administrators. To associate a storage resource with a data collection, an administrator configures Globus Connect Server on that resource for sharing and then associates the resulting endpoint with the collection through Globus.
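The split described above, with metadata held by the hosted service and files left on an administrator-managed endpoint, can be pictured as a metadata record that carries only a pointer to where the data live. The sketch below is purely illustrative: `DataLocation`, `DatasetRecord`, `CollectionStorage`, and the endpoint ID and paths are hypothetical names, not part of any Globus API, and the per-dataset directory layout is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class DataLocation:
    """Where a dataset's files physically live: an endpoint plus a path on it."""
    endpoint_id: str  # hypothetical ID of a Globus Connect Server endpoint
    path: str         # directory on that endpoint holding the files


@dataclass
class DatasetRecord:
    """What the hosted service keeps: metadata plus a pointer to the data."""
    title: str
    metadata: dict = field(default_factory=dict)
    location: Optional[DataLocation] = None  # set once storage is assigned


@dataclass
class CollectionStorage:
    """Hypothetical binding between a collection and the endpoint an admin associated with it."""
    collection: str
    endpoint_id: str
    base_path: str = "/"

    def place(self, record: DatasetRecord, subdir: str) -> DatasetRecord:
        """Point the record at a per-dataset directory under the collection's storage."""
        path = self.base_path.rstrip("/") + "/" + subdir.strip("/") + "/"
        record.location = DataLocation(self.endpoint_id, path)
        return record


storage = CollectionStorage("Example Collection", "example-endpoint-id", "/published")
rec = storage.place(DatasetRecord("Example dataset", {"creator": "A. Researcher"}),
                    "dataset-001")
print(rec.location.path)  # /published/dataset-001/
```

Keeping only a location in the record is what lets the files stay on institutional storage while the service can still describe and point to them.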

Published datasets are organized into "communities" and their member "collections". For example, the Argonne National Laboratory community has several member collections, including Advanced Photon Source; Center for Nanoscale Materials; and Computing, Environment and Life Sciences. Collections often map to a department or group within an institution, but this is not required. Globus users can create and manage their own communities and collections through the data publication service. A collection accepts dataset submissions and applies policies that govern access to them.
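The community and collection hierarchy can be sketched as a simple two-level structure. This is a minimal model under stated assumptions: the class and method names are hypothetical, the optional `unit` field stands in for the department or group a collection may (but need not) map to, and rejecting duplicate collection names within a community is an assumed rule.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Collection:
    name: str
    # Hypothetical: the department or group this collection maps to, if any.
    unit: Optional[str] = None


@dataclass
class Community:
    name: str
    collections: Dict[str, Collection] = field(default_factory=dict)

    def add_collection(self, name: str, unit: Optional[str] = None) -> Collection:
        """Create a member collection; duplicate names are rejected (assumed rule)."""
        if name in self.collections:
            raise ValueError(f"collection {name!r} already exists in {self.name!r}")
        coll = Collection(name, unit)
        self.collections[name] = coll
        return coll

    def collection_names(self) -> List[str]:
        """List member collections alphabetically."""
        return sorted(self.collections)


argonne = Community("Argonne National Laboratory")
for name in ["Advanced Photon Source", "Center for Nanoscale Materials",
             "Computing, Environment and Life Sciences"]:
    argonne.add_collection(name)
print(argonne.collection_names())
```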

A dataset comprises data and metadata. Policies can be set on communities or collections to manage:

  • Metadata (schema, requirements)
  • Access control (user and group based)
  • Curation workflow
  • Submission and distribution license
  • Storage
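Because policies can be set at either level, a submission needs one effective policy. The sketch below assumes that any field a collection sets overrides the community's value and that unset fields are inherited; that precedence rule, the `Policy` fields, and the example values are illustrative assumptions rather than documented Globus behavior. It also shows a check against the required-metadata part of the policy.

```python
from dataclasses import dataclass, fields, replace
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Policy:
    # None means "not set at this level; inherit from the community".
    required_metadata: Optional[FrozenSet[str]] = None
    allowed_groups: Optional[FrozenSet[str]] = None   # access control
    workflow: Optional[str] = None                    # curation workflow name
    license: Optional[str] = None
    storage_endpoint: Optional[str] = None


def effective_policy(community: Policy, collection: Policy) -> Policy:
    """Collection settings override community settings field by field (assumed rule)."""
    overrides = {f.name: getattr(collection, f.name)
                 for f in fields(Policy) if getattr(collection, f.name) is not None}
    return replace(community, **overrides)


def missing_metadata(policy: Policy, metadata: dict) -> set:
    """Return required metadata fields that a submission left empty or omitted."""
    required = policy.required_metadata or frozenset()
    return {key for key in required if not metadata.get(key)}


community = Policy(required_metadata=frozenset({"title", "creator"}),
                   workflow="two-step-review", license="CC-BY-4.0")
collection = Policy(allowed_groups=frozenset({"beamline-users"}), license="CC0-1.0")
policy = effective_policy(community, collection)
print(policy.license)                                   # CC0-1.0
print(missing_metadata(policy, {"title": "Scan 42"}))  # {'creator'}
```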

Datasets undergo curation according to a workflow defined by the community that will publish them. Each community can customize its workflow to capture its specific metadata and to reflect its own review process. Once a dataset is published, it is discoverable through faceted search, which lets researchers progressively filter results and quickly narrow in on the data of interest. The data can then be transferred to a Globus endpoint where the investigator can inspect and further process them.
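Faceted search of the kind described above can be sketched as two operations: counting the values of a facet among the current results, and narrowing the results each time the researcher selects a value. The catalog entries, facet names, and function names here are made up for illustration; a production service would use a search index rather than a linear scan.

```python
from collections import Counter
from typing import Dict, Iterable, List


def facet_counts(records: Iterable[dict], facet: str) -> Dict[str, int]:
    """Count how many records carry each value of one facet."""
    return dict(Counter(r[facet] for r in records if facet in r))


def apply_filters(records: List[dict], filters: Dict[str, str]) -> List[dict]:
    """Keep only records that match every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in filters.items())]


# Hypothetical published-dataset metadata.
catalog = [
    {"title": "Scan A", "instrument": "beamline-1", "year": "2014"},
    {"title": "Scan B", "instrument": "beamline-1", "year": "2015"},
    {"title": "Image C", "instrument": "microscope-2", "year": "2015"},
]

selected: Dict[str, str] = {}
selected["year"] = "2015"                       # first refinement
hits = apply_filters(catalog, selected)
print(facet_counts(hits, "instrument"))         # {'beamline-1': 1, 'microscope-2': 1}
selected["instrument"] = "beamline-1"           # second refinement narrows further
print([r["title"] for r in apply_filters(catalog, selected)])  # ['Scan B']
```

Recomputing facet counts on the already-filtered results is what lets each refinement show only the choices that still lead somewhere.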

We demonstrated an evolving version of data publication and discovery functionality during Ian Foster's keynote at GlobusWorld 2015.