Globus: The Invisible Infrastructure Behind Global Research Data Exchange
January 27, 2026 | vmblog.com
By David Marshall
At the 66th edition of the IT Press Tour in Palo Alto, Rachana Ananthakrishnan, Executive Director of Globus at the University of Chicago, walked a room full of journalists through something most data center professionals have probably never heard of, yet something that has become absolutely foundational to modern research. Her message was simple but profound: for the past three decades, Globus has quietly grown into the backbone connecting research institutions, national laboratories, and supercomputing facilities worldwide. And here's the thing nobody talks about: there's no real competition for what it does.
That’s not hyperbole. It’s just the way the research infrastructure market actually works.
The Problem That Started It All
Let me set the stage for why Globus exists in the first place. Modern research isn’t conducted in isolation anymore. A geneticist at Stanford collaborates with colleagues at Oxford and Tokyo. Climate scientists share simulation data across continents. Physics researchers at CERN pool computational resources from dozens of facilities. That’s the reality of contemporary science-it’s distributed, international, and absolutely dependent on moving and sharing massive amounts of data across institutional boundaries.
But here’s where it gets messy. Each institution has its own security policies. Different storage systems. Different authentication systems. Different compliance requirements. A researcher at one university can’t just create an account at another institution’s data center. There’s no simple way to share sensitive research data without recreating infrastructure at each location. And when you’re talking about petabytes of climate data or real-time streams from scientific instruments, traditional file-sharing solutions completely fall apart.
This wasn’t a new problem in 2024-it’s been a challenge since the late 1990s, when researchers first realized that well-provisioned networks made distributed computing possible. But solving it required something most organizations have never attempted: building a genuinely vendor-agnostic, security-aware platform that could sit on top of existing research infrastructure without replacing it.
That’s exactly what Globus did.
From Toolkit to Platform: Three Decades of Evolution
The Globus story starts in 1996, when distributed computing was still largely theoretical. Ian Foster, Carl Kesselman, and Steve Tuecke-all working in research institutions-began exploring whether you could treat networked computing resources the way we think about electricity grids: plug in, and it works because everyone agrees on the same interfaces.
By the early 2000s, they'd developed the Globus Toolkit, open-source software that research institutions could download and run locally. It worked. Better than worked, actually. The grid computing approach became so influential that it played an instrumental role in three separate Nobel Prize-winning efforts. The IPCC's climate assessment used grid computing to distribute simulation outputs across institutions. CERN's discovery of the Higgs boson relied on it. And LIGO's detection of gravitational waves couldn't have happened without researchers pooling computational resources across observatories using grid technologies.
But by the mid-2000s, Ananthakrishnan’s team noticed something important: the organizations benefiting most from Globus were the ones with dedicated IT staff. Large national labs. Well-funded university research centers. Institutions that could afford to download the toolkit, install it, and maintain it. The problem was, there were thousands of smaller research labs-brilliant scientists with limited resources-who couldn’t access any of this. They lacked the IT expertise to install distributed software. They didn’t have the infrastructure budget to manage it.
That realization triggered a major product pivot around 2009. Instead of shipping software that researchers had to install and manage themselves, Globus moved to a software-as-a-service model. Browser-based. Hosted. Managed. Designed for three audiences: researchers who needed simplicity, system administrators who needed policy controls, and developers who wanted an extensible platform they could build on.
The shift paid off immediately. By 2012, when Globus won an R&D 100 Award for the rebranded service, adoption had expanded dramatically. Users in 80+ countries. 2,100+ institutional identity providers connected. Over 500 storage systems integrated into the platform worldwide.
By 2024, Globus had become something most people in IT infrastructure have never heard of-and yet something that’s now standard infrastructure at research institutions globally.
The Architecture: Not Cloud, Not On-Premises, Something Different
Here’s where understanding Globus gets interesting. It’s not a typical SaaS. You don’t migrate your data to Globus servers. Your storage stays where it is-in your data center, on your cloud provider, wherever you’ve decided to keep it.
Instead, Globus provides something more subtle: a management and orchestration layer that sits between your institutional infrastructure and researchers who need to access it. The way Ananthakrishnan described it during the briefing is worth repeating: “We can’t take over institutions’ resources. We have to figure out how to add valuable services as a layer to that resource.”
Here's how it actually works. Your institution runs what Globus calls "agents": small software components deployed on your infrastructure. These agents know how to talk to your specific storage systems and compute schedulers, and how to enforce your security policies. They're equipped with plugins for different storage types (POSIX filesystems, object stores, tape archives, cloud storage). They speak different job scheduler languages (Slurm, Kubernetes, PBS).
When a researcher wants to move data or run a computation, they interact with Globus’s web interface or APIs. Globus’s central management service determines what’s needed-checking bandwidth availability, security requirements, institutional policies-then coordinates with the agents to actually perform the work. The researcher’s data doesn’t flow through Globus’s cloud infrastructure. Commands do. Metadata does. The actual data moves directly between your storage systems and wherever it needs to go, optimized for whatever network capacity you have available.
This matters more than it might sound. If your research network has 400-gigabit connections between sites, Globus knows how to use all 400 gigabits. If one endpoint only has 10-gigabit connectivity, Globus knows not to overwhelm it. It’s not magic-it’s heuristics-based optimization built on 30 years of research into how to move data reliably across heterogeneous networks.
And because your data never leaves your infrastructure, there’s a layer of protection against the kind of geopolitical complications that worry European research institutions collaborating with US-based systems. Your storage systems remain under your control. Your native interfaces remain functional. Globus becomes one more way to access your systems, not the only way.
What Globus Actually Does: Five Core Capabilities
Globus’s product isn’t a single tool-it’s a platform made up of several interconnected services. Understanding each one explains why organizations keep choosing it.
Managed Data Transfer: The Original Problem Solver
The oldest part of the Globus platform is also still the most-used: managed data transfer. Before Globus solved this, researchers had to establish direct connections between systems and manually manage the process. Did my file transfer successfully? Did I lose connection mid-transfer? Do I need to restart? These weren’t theoretical problems-they were daily frustrations.
Globus’s data transfer service works like this: a researcher identifies source and destination systems, logs in with their institutional credentials, and submits a transfer request. Then they walk away. Globus handles everything-establishing connections, monitoring progress, retrying if something fails, verifying data integrity via automatic checksums. It’s “fire and forget” at its core.
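For researchers who script transfers instead of using the web interface, the same request can be expressed through the Globus Python SDK (globus_sdk). What follows is a minimal sketch, assuming you already hold a transfer-scoped access token; the collection UUIDs and paths are placeholders.

```python
# Minimal sketch of a "fire and forget" transfer via the Globus Python SDK.
# Assumes a transfer-scoped access token is already in hand; the collection
# UUIDs and paths below are placeholders, not real endpoints.
import globus_sdk

TRANSFER_TOKEN = "..."  # obtained separately via Globus Auth
SOURCE_COLLECTION = "aaaaaaaa-0000-0000-0000-000000000001"  # placeholder UUID
DEST_COLLECTION = "bbbbbbbb-0000-0000-0000-000000000002"    # placeholder UUID

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(TRANSFER_TOKEN)
)

# Describe the transfer; checksum verification gives the integrity and
# retry behavior described above.
tdata = globus_sdk.TransferData(
    tc,
    SOURCE_COLLECTION,
    DEST_COLLECTION,
    label="climate-simulation-outputs",
    sync_level="checksum",
    verify_checksum=True,
)
tdata.add_item("/projects/climate/run42/", "/archive/run42/", recursive=True)

# Submit and walk away: the Transfer service monitors, retries, and verifies.
task = tc.submit_transfer(tdata)
print("Submitted task:", task["task_id"])
```

Once submitted, the task runs on the service side; the researcher can log out and check on the task ID later.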
For massive transfers, this is genuinely life-changing. Globus recently moved 7.5 petabytes of climate data between three national laboratories. For the researchers involved, it was a single request. Months of actual transfer time became invisible complexity handled by a backend system.
The protocol underlying this is called GridFTP, a high-performance FTP variant that’s been published and peer-reviewed across three decades of research. It’s not proprietary. It’s not fancy. It’s just really good at what it does.
Secure Data Sharing: Permission Without Provisioning
The second major capability is data sharing, and it solves a problem that every organization with sensitive data struggles with: how do you let collaborators at other institutions access your files without creating accounts for them, without replicating data, without complex VPN setups?
Globus’s approach is elegant. It creates what Ananthakrishnan called a “permission overlay” on top of your existing storage. You navigate to files you want to share, select specific collaborators (using their institutional email, GitHub account, or organization), set whatever permissions and time restrictions you need, and that’s it. The actual files never move. The collaborators authenticate using their own institutional credentials. They access the files through Globus. Globus maintains the permission layer between them and the storage system.
No account provisioning. No data staging areas. No local file system changes. The person who owns the files still owns them in your system-Globus just manages who can see them.
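In SDK terms, that permission overlay shows up as access rules attached to a shared (guest) collection. Here is a rough sketch using globus_sdk, assuming the guest collection already exists and the collaborator's Globus identity UUID has been looked up; the IDs and path are placeholders.

```python
# Sketch: grant a collaborator read-only access to one directory on a guest
# collection. Nothing is copied or staged; this only adds a permission rule.
import globus_sdk

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer("...")  # transfer-scoped token
)

GUEST_COLLECTION = "cccccccc-0000-0000-0000-000000000003"       # placeholder
COLLABORATOR_IDENTITY = "dddddddd-0000-0000-0000-000000000004"  # placeholder

rule = {
    "DATA_TYPE": "access",
    "principal_type": "identity",
    "principal": COLLABORATOR_IDENTITY,
    "path": "/sequencing-results/",  # directory within the collection
    "permissions": "r",              # read-only
}
tc.add_endpoint_acl_rule(GUEST_COLLECTION, rule)
```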
This became critical for the Telomere-to-Telomere (T2T) Consortium, an international effort to fill in the final 4% of the human genome that the original Human Genome Project left unsequenced. Researchers across dozens of institutions needed to securely share genetic data. Rather than creating accounts and managing access at each site, they used Globus to grant permissions. It scaled to international collaboration with essentially no additional infrastructure burden on any participating institution.
Compute Orchestration: Python Anywhere
About four years ago, Globus introduced Globus Compute, which applies similar thinking to computational resources. The idea is to take the function-as-a-service model that cloud providers popularized and make it work in research environments where compute resources live in different institutions, with different schedulers, different security models.
Here’s how it works: a researcher writes Python code-just regular Python-describing the computation they want to run. They submit it to Globus Compute with a list of target resources where it can execute. Globus handles shipping the code to those resources, interfacing with whatever job scheduler is running there (Kubernetes, Slurm, PBS), executing the code, and returning results.
The researcher doesn’t change their code based on where it runs. They don’t need to learn Slurm syntax or Kubernetes YAML. They just say “run this function wherever you have cycles.”
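A rough sketch of what that looks like with the Globus Compute SDK (globus_compute_sdk), assuming a Compute endpoint is already running on the target cluster; the endpoint UUID is a placeholder and the scoring function is a toy stand-in.

```python
# Sketch: run an ordinary Python function on a remote cluster via Globus Compute.
# The endpoint UUID is a placeholder; the function contains no scheduler-specific
# code and is shipped to whatever system the endpoint fronts.
from globus_compute_sdk import Executor

def score_molecule(smiles: str) -> float:
    # Toy stand-in for a real scoring model.
    return float(len(smiles))

ENDPOINT_ID = "eeeeeeee-0000-0000-0000-000000000005"  # placeholder UUID

with Executor(endpoint_id=ENDPOINT_ID) as gce:
    futures = [gce.submit(score_molecule, s) for s in ("CCO", "c1ccccc1", "CC(=O)O")]
    for future in futures:
        print(future.result())
```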
During COVID research, this capability became particularly visible. ML-based drug screening projects needed to evaluate 4 billion molecules as quickly as possible. Rather than being tied to a single supercomputer, researchers could submit the same code to Argonne, to Texas, to NVIDIA GPU systems-wherever resources were available. Globus Compute handled all the complexity underneath.
Metadata & Data Discovery: Search for the Research Era
Globus Search provides hosted search infrastructure, but with features built specifically for research. It supports fine-grained visibility controls-important when datasets are embargoed or contain sensitive metadata. It handles dynamic schema detection, so researchers starting new projects don’t have to pre-define exactly what their data structure will look like. It’s built on AWS infrastructure but adds a research-aware layer on top.
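Queries against a Search index are small JSON documents. The sketch below uses globus_sdk's SearchClient anonymously, which returns only entries marked publicly visible; the index UUID and the "keywords" field are placeholders for whatever schema an index actually uses.

```python
# Sketch: query a Globus Search index. The index UUID and field names are
# placeholders; visibility rules on each entry control who sees which results.
import globus_sdk

sc = globus_sdk.SearchClient()  # anonymous client; sees only public entries

INDEX_ID = "ffffffff-0000-0000-0000-000000000006"  # placeholder UUID

results = sc.post_search(
    INDEX_ID,
    {
        "q": "sea surface temperature",
        "limit": 10,
        "filters": [
            {"type": "match_any", "field_name": "keywords", "values": ["CMIP6"]}
        ],
    },
)

for entry in results["gmeta"]:
    print(entry["subject"])
```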
The Canadian Federated Research Data Repository uses Globus Search to let researchers across the country discover published datasets, including support for geospatial search. It’s the kind of quiet infrastructure that enables entire research communities to function more efficiently.
Workflow Automation: Connecting Everything Together
The newest part of the Globus platform is workflow automation through Globus Flow. The concept is simple: many research projects involve multiple steps. Move data from an instrument. Process it. Run analysis. Check results. Move processed data to cloud storage. Publish metadata. Traditionally, researchers write scripts to automate this sequence.
Globus Flow lets researchers define these workflows declaratively using JSON, using a visual editor, or through command-line tools. The workflows are event-driven-they can trigger automatically when data arrives, or be manually initiated. They can incorporate compute steps, data transfer steps, calls to external services. Globus manages the reliability, error handling, and orchestration.
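A flow definition is a JSON document describing states and transitions, closely modeled on Amazon States Language. Below is a minimal single-step sketch, written as a Python dict, that moves data using Globus's hosted transfer action; the input field names are illustrative and would be pinned down by the flow's input schema.

```python
# Sketch of a one-step flow definition: transfer data from an instrument's
# collection to an analysis cluster. Endpoint IDs and paths arrive as flow
# input at run time; the field names under "$.input" are illustrative.
flow_definition = {
    "Comment": "Move raw instrument data to the analysis cluster",
    "StartAt": "MoveRawData",
    "States": {
        "MoveRawData": {
            "Type": "Action",
            "ActionUrl": "https://actions.globus.org/transfer/transfer",
            "Parameters": {
                "source_endpoint_id.$": "$.input.source",
                "destination_endpoint_id.$": "$.input.destination",
                "transfer_items": [
                    {
                        "source_path.$": "$.input.source_path",
                        "destination_path.$": "$.input.destination_path",
                        "recursive": True,
                    }
                ],
            },
            "ResultPath": "$.TransferResult",
            "End": True,
        }
    },
}
```

Registering a definition like this with the Flows service (for example via globus_sdk's FlowsClient) turns it into a reusable workflow that can be started from the web app, the CLI, or an event trigger.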
What's interesting is how different organizations use this. The Advanced Photon Source, a Department of Energy facility, uses Globus Flow to automatically process data coming off its beamlines. A structure that used to take weeks to solve now takes hours because processing happens automatically as data is collected. But Franklin & Marshall, a small liberal arts college, found an entirely different use. A biology professor defined a workflow for processing field data. Instead of training volunteers on 10 separate steps, they now click a button in the Globus web app, answer a questionnaire, and the automated workflow handles the rest. It wasn't about scale; it was about accessibility and bringing more people into research.
The Network Effect: Why There’s No Competition
Here’s the thing about Globus that makes it nearly impossible for competitors to emerge: it’s not just a product, it’s an ecosystem built on two decades of standardization and integration.
Globus is connected to 2,100+ institutional identity providers through research federation networks like InCommon (US), eduGAIN (Europe), and AARNet (Australia). Researchers can log in with their home institution credentials automatically, with no separate provisioning. That federated authentication layer is something individual vendors can't replicate; it requires buy-in from thousands of institutions across multiple countries.
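From a script's point of view, that federation is a standard OAuth2 flow against Globus Auth, which hands the user off to their home institution's login page. A minimal sketch with globus_sdk, using a placeholder ID for a registered native-app client:

```python
# Sketch of the native-app login flow: Globus Auth brokers the login, while
# the user's home institution does the actual authentication. CLIENT_ID is a
# placeholder for a registered native-app client.
import globus_sdk
from globus_sdk.scopes import TransferScopes

CLIENT_ID = "00000000-0000-0000-0000-000000000007"  # placeholder

client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
client.oauth2_start_flow(requested_scopes=TransferScopes.all)

print("Log in through your institution at:", client.oauth2_get_authorize_url())
auth_code = input("Paste the authorization code here: ").strip()

tokens = client.oauth2_exchange_code_for_tokens(auth_code)
transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]
```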
The platform also has 500+ storage systems integrated worldwide. When a researcher wants to move data between systems, both are already plugged into Globus. Most organizations trying to build something similar would be starting from zero on integration work.
There are point solutions in each category. Vendors offer specialized data transfer tools, compute schedulers, or metadata search engines. But nobody else has built an end-to-end platform that sits on top of distributed institutional infrastructure, handles security across multiple domains, and provides all these capabilities in one place.
The Compliance Story: An Emerging Growth Market
One of the more revealing parts of Ananthakrishnan’s briefing was about how Globus is expanding into healthcare and research institutions dealing with protected data. HIPAA-regulated data (PHI), personally identifiable information (PII), controlled unclassified information (CUI)-all of it requires special handling.
Globus can manage protected data. It complies with NIST 800-53 and 800-171 controls, which map to GDPR and other international privacy standards. Over 55 US institutions have Business Associate Agreements with the University of Chicago to use Globus for HIPAA-regulated research. That's a relatively small number today, but it's growing. Hospital systems and academic medical centers increasingly need tools for collaborating across institutions on sensitive research, and Globus is becoming the go-to platform.
This is where new revenue comes from. Basic data transfer is free for nonprofit research. But compliance-related subscriptions, advanced features for administrators, and dedicated support are all part of the paid tier.
The Business Model: Freemium with Research Budgets
Understanding how Globus makes money actually reveals something interesting about the research market. It’s not like SaaS for enterprises, where you charge per user or per gigabyte transferred.
Globus uses what they call a freemium model. Basic features are free for nonprofit research institutions anywhere in the world. You can move data. You can access shared files. Basic level of security and compliance. As long as you’re doing academic research, it costs nothing.
Paid subscriptions unlock additional features: advanced data sharing controls, administrator visibility dashboards, priority support, and access to sensitive data management. The subscription is institution-wide and annual. Flat pricing, no overages. The price tier is determined by the institution’s research budget-not the IT operations budget, specifically research funding. That’s used as a proxy for how many researchers might use it and how heavily they’ll use the platform.
Commercial organizations (pharma, biotech, oil and gas companies) get a completely separate pricing model. No free tier. Premium pricing. It’s a different market segment.
As of the briefing, Globus had over 250 paid subscriber institutions worldwide. But there’s also massive free usage-millions of researchers moving data and sharing files without paying anything.
For the University of Chicago, which operates Globus as a nonprofit auxiliary unit, the subscription revenue funds the platform team. There's also a separate Globus Labs team conducting ongoing research, which produces new capabilities that eventually make their way into the product. It's a sustainable model for research software, rare enough to be worth noting.
Why IT Infrastructure Leaders Should Pay Attention
If you work in research computing infrastructure, whether as a data center administrator, an engineer, or an IT leader at a university or national laboratory, Globus should already be on your radar. But if it's not, here's why it matters.
First, your institution is probably already using Globus in some way. Researchers may be leveraging it without IT even knowing. That’s both opportunity and risk. You could be managing agents and connectors as part of your infrastructure strategy, or you could be leaving capacity and security on the table.
Second, understanding Globus’s architecture tells you something about where institutional infrastructure is heading. The model isn’t about moving everything to cloud. It’s about providing services on top of distributed resources while maintaining institutional control. That’s increasingly how research computing works-and it’s probably how your organization needs to think about infrastructure anyway.
Third, there's a real lesson in how Globus evolved. It started as a point solution (the toolkit for moving data) and gradually expanded into a platform as it understood what researchers actually needed. That kind of evolutionary thinking, solving one problem really well and then expanding based on what you learn, is worth studying.
What’s Next: The Labs Are Already Exploring AI
During Q&A, Ananthakrishnan fielded questions about how Globus is preparing for what’s coming. AI-driven research is obviously a big part of that conversation. Globus Labs is actively investigating how agentic AI systems could use research infrastructure through the platform. The security fabric that connects everything would need to support autonomous agents making decisions about where to run computations and how to move data.
The team is also exploring how AI could augment documentation and support-though they’re being cautious about implementation, given compliance requirements. And there are ongoing investments in helping administrators understand their infrastructure better: storage system insights, usage patterns, capacity planning.
The broader investment areas are around integrated solutions. Rather than requiring researchers to build custom workflows from scratch, Globus is working on providing pre-built patterns. And there’s increasing focus on policy-driven automation-particularly for secure research environments where data has to pass through automated scanning and approval workflows before moving anywhere.
The Bigger Picture
What makes Globus interesting isn’t that it solved a problem-it’s that it solved a problem so specific to research that it became nearly impossible for commercial cloud providers to compete. AWS doesn’t understand research federations. Microsoft doesn’t care about GridFTP. Google isn’t managing institutional identity providers.
Globus exists in a gap between traditional enterprise IT infrastructure and what the cloud industry builds. It’s been working in that gap for 30 years, solving increasingly complex problems as research has become more collaborative and data-intensive.
The reason Ananthakrishnan was at the IT Press Tour talking to journalists is partly marketing-a small team spreading the word about capabilities people don’t know exist. But it’s also about recognizing that research infrastructure managers are now central to whether institutions can compete for research funding and talent. Helping those professionals understand what Globus does, what it can enable, and how to integrate it into their infrastructure planning is increasingly important.
If you’re managing research infrastructure, spend some time exploring what Globus offers. Chances are good your researchers are already using it. The question is whether you’re supporting it strategically or just letting it happen.