Q&A with 2019 Babbage Award Recipient
The award announcement cited your "outstanding contributions in the areas of parallel computing languages, algorithms, and technologies for scalable distributed applications." Can you tell us the top 2-3 projects you think contributed to your win?
[Dr. Foster] The Grid computing work that I started in the 1990s with colleagues such as Carl Kesselman and Steve Tuecke, for sure: that played a big role in the creation of the Large Hadron Collider Computing Grid, for example, that was used to process the data that identified the Higgs boson. The data-sharing mechanisms implemented in the Globus service are another. They move astonishing amounts of data over 10 and 100 gigabit/second networks, enabling a remarkable amount of science. If I were to get poetic, I could talk about the pulsing heart of science, circulating the data that is the lifeblood of discovery.
Parallel computing is a term that (for some of us) connotes old-school / foundational computing approaches, a time when cycle-scavenging was cutting-edge. Why was that work important, and how is it relevant today?
[Dr. Foster] Parallelism used to be the concern of just a few high-performance computing enthusiasts. Now it is ubiquitous, to the extent that we perhaps don’t realize how much it underpins our lives. My iPhone contains a six-core CPU, graphics processing unit, and neural engine. Whenever I watch an animated movie, buy products on Amazon, perform a Google search, or look at the weather forecast, I benefit from computations performed on enormous parallel computers. Science depends on parallelism, whether via computations performed on a multi-core workstation or a million-core supercomputer, or because data sits in parallel file systems and is transferred via parallel data movers like Globus. Parallelism is everywhere.
In your role at Argonne, you're exposed to some of the most interesting – and challenging – computing projects out there. What are a few of the biggest challenges you see facing data science and learning (and you can't just say the sheer volume of data!)?
[Dr. Foster] We are at an exciting time for science, as AI methods (that is, methods that allow computers to learn from data, rather than being programmed explicitly) become relevant to a growing number of research tasks. For computer scientists, these developments translate into a lot of work, because they introduce a variety of new challenges. Where do we get the data required to train science AI? How do we allow science AI to leverage the vast theoretical knowledge accumulated over hundreds of years? How do we organize science AI to permit productive human-AI partnerships? What computer hardware and software will be needed to harness, apply, and manage AI?
Speaking of computers, the news about Aurora, the first US exascale computer coming to Argonne in 2021, was very exciting. What are your hopes for what scientists will be able to achieve with this resource?
[Dr. Foster] Aurora is exciting because it won’t just provide a factor of 100 increase in computational speed: it will be a computer designed to support both traditional numerical simulation and AI applications. We see Aurora enabling new approaches to discovery across a broad range of disciplines, based on new approaches to research in which human ingenuity leverages both simulation and AI capabilities at the same time. We’re interested, for example, in active learning approaches in which AI models are used to guide the choice of the next simulation or the next experiment.
The field of data science has exploded and continues to do so. What advice do you have for aspiring engineers looking to break into this field?
[Dr. Foster] There are amazing educational resources available online. Use them. And then there’s no substitute for experience. Start using data science and AI methods.
In closing, what do you most want people to know about the work being done in your division at Argonne?
[Dr. Foster] The establishment of the Data Science and Learning division is a recognition of the central role of data science and AI methods in science. DSL scientists are applying these methods to important problems in areas like cancer, microbiome, and materials science, and also pioneering new technologies that will allow for the large-scale application of those methods across science as a whole. The "Learning" is key here—our focus is not just on the practice of data science methods such as neural networking and data-intensive computing, but also on their application to learn or achieve something new. Data science isn't much good if we don't learn something in the end. We also believe in the power of collaboration to drive breakthroughs in both human and machine learning, so we take an interdisciplinary approach by working in teams that integrate mathematics, computer science, advanced architectures, and core domain science to solve problems. Our division is just over a year old; I'm proud of the work we've accomplished and look forward to seeing what our outstanding scientists will accomplish this year.
Many thanks, Ian, and we look forward to seeing what awe-inspiring projects you’ll continue to direct in the future!