New AI Inference Service Now Ready for Science at Argonne
June 01, 2026 | HPCwire
By Alex Woodie
AI can help accelerate scientific discovery, but setting up and running a foundation model is not a simple task. Thanks to the work of the Argonne Leadership Computing Facility, scientists affiliated with DOE National Labs and the Genesis Mission can now tap into a new AI inference service running on ALCF supercomputers.
Dubbed the ALCF Inference Service, the new service enables DOE scientists to interact with dozens of open foundation models in batch and interactive modes. This includes Google’s Gemma series, Meta’s LLaMA models, and OpenAI’s GPT-OSS family, as well as domain-specific foundation models, computer vision models and in-house models developed at Argonne, like AuroraGPT.
“There are people doing science on it now,” Papka said. “What we really see is the scientific community building this into their workflows.”
The inference service will span roughly 35 AI models, with about 10 of them loaded at any given time. If a user requests a model that is not active there, they can make a request and the model will enter a queue to get spun up.
Dealing with a large number of concurrent users is no trivial task, but it’s something that the National Labs will need to address if it’s going to provide the sort of “dial tone” service for AI inference that the large commercial AI providers can deliver. The ALCF Inference Service is based on a 2025 paper by ANL scientists and University of Chicago professors on a product dubbed Federated Inference Resource Scheduling Toolkit, or FIRST.
FIRST consists of three main components, including an Inference Gateway API (based on OpenAI’s API) to process user requests; Globus Compute to execute tasks on HPC resources; and Model Serving Tools to efficiently perform the LLM inference. “The framework addresses the growing demand for private, secure, and scalable AI inference in scientific workflows, allowing researchers to generate billions of tokens daily on-premises without relying on commercial cloud infrastructure,” the paper’s authors write.
The AI inference service isn’t open just to ALCF users, but any users across 12 DOE Labs as part of the Genesis Mission projects, including the American Science Cloud (AmSc) and Transformational AI Models Consortium (ModCon), Papka said.
“What we’ve been slowly doing with DOE as part of the Genesis Mission is opening this up to the ModCon team, so researchers that are part of ModCon can say ‘We want to use this’ and then they just get added to a list where we’re leveraging Globus authentication,” Papka said. “They authenticate to their local Globus accounts, so they don’t necessarily need an ALCF account.”
This opens the door for researchers from Brookhaven National Lab, Pacific Northwest National Laboratory, and other labs to get access to the ALCF chat interface as well as the API, Papka said. When they register using their DOE accounts, they get tokens to use on the new inference service.
Those tokens are good for a length of time, which means that people don’t need to continually re-log-in to the system, Papka said. That’s a little different than how commercial AI services work, he said.
“We want to be very responsive to scientists. Their asks are likely to be different than what maybe the commercial vendors will be pursuing,” Papka said. “As an HPC facility, allowing for these long running tokens is something different and that’s how we’ve adapted to this new workflow.”