New Year's Reflections from Globus Co-Founder Ian Foster
Happy New Year!
As 2017 fades into memory, I thought it might be interesting to reflect a bit on where we’ve been and where we plan to be this year. Here are some things I’ve observed recently and a few that I plan to keep an eye on.
In 2017 we saw machine learning recognized as an effective method for scientific discovery, whether for finding lensing events in digital sky surveys, labeling crystallographic images, or predicting new materials with useful properties. For example, at my home institution, Argonne National Laboratory, we saw the creation of a new Data Science and Learning division (I’ll be directing it) and a planned new supercomputer, Aurora, designed to support data science as well as simulation. Of course, these developments will depend on ever more sophisticated methods for moving and managing data.
Globus grew its suite of platform services in 2017, broadening and strengthening its support of data-driven science services and applications. We worked with many groups to integrate Globus Auth and Transfer into their applications, and we released the Modern Research Data Portal, a fully functional design pattern for data-intensive science. The University of Chicago’s Materials Data Facility (MDF) used the Globus platform to assemble a unique collection of materials data, including contributions from more than 100 data archives worldwide. (Now the MDF team is incorporating machine learning capabilities.) The latest addition to the Globus platform, Globus Search, was released late last year and is already a key part of the MDF infrastructure as well as the Canadian Federated Research Data Repository. I also want to mention that Globus features heavily in the new book “Cloud Computing for Science and Engineering,” which Dennis Gannon and I co-authored and which is available online at cloud4scieng.org.
Looking ahead to the coming year, it will be interesting to watch data come alive more rapidly and more often. We’ll take better care of our data, even exhuming some of our data graveyards, by indexing, annotating, and creating service interfaces, making the data both accessible and computable in ways that allow many people to add value and build on the results of other contributors. The new National Institutes of Health “Data Commons Pilot Project” is pioneering important elements of this future “living data” world. Globus is at the center of that effort, too.
To ensure that we make the most of this opportunity in 2018, Globus is completing its work to become suitable for use with protected data in HIPAA-regulated environments. Another new feature that should have a big impact: Globus Connect Servers will speak HTTPS as well as GridFTP, enabling exciting new research automation use cases. On the storage front, you can already use Globus to access data in nearly any traditional file system found in research computing centers. We also support object stores, including Google Drive, Amazon S3, Spectra, and ActiveScale, and we’ll continue adding to this set. And on the research integration and user interface fronts, we’re integrating ever more closely with Jupyter, as part of a planned evolution of Globus from a platform for managing data to a global research desktop that will allow you to locate, access, and manipulate data, regardless of location.
With so much happening in the coming year, we need to stay in touch with each other and share our experiences and observations. One great place to do that will be our annual GlobusWorld conference, April 25–26, 2018, in Chicago, IL. I look forward to seeing you there.