In “Anatomy of the Grid,” published in 2001, Ian Foster, Carl Kesselman, and Steve Tuecke posited that the central problem in Grid computing is “coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations.” Ten years later, the virtual organization problem is still very much at the heart of multi-institutional scientific collaboration.
Through their parent organizations, scientists are often beholden to particular sets of tools and authentication mechanisms that present barriers to sharing data and knowledge. One university may be using Confluence as a wiki, configured to authenticate users with a local campus LDAP server, while another university may be using a wiki plugin for Drupal, configured to use an intra-campus InCommon provider for authentication. Sharing documents across those two wikis will always require out-of-band communication between users via email or through some third-party application, such as Google Docs. That potentially means that information that ought to be managed in one system ends up being replicated in several less secure systems, with no reliable way to audit access to the information and no easy way to track changes to it. Another side effect is that there may be less incentive for scientists to keep using the systems their IT departments have invested considerable time and effort in deploying and operating.
How to Approach the Problem?
There are several ways we can approach this problem. We could try to convince all universities to adopt the same sets of tools, but that approach would be infeasible for a number of obvious reasons. Another approach would be to see if the tools universities have adopted could interoperate in some manner. For instance, both Confluence and Drupal have support for plugging in external authentication services. We might convince the universities in our example above to configure their wikis to allow scientists to authenticate with their Google email accounts via OpenID. This latter approach happens to be something we are exploring at Globus Online. We are developing support for several standard authentication protocols such that Globus Online can act as an identity provider via OpenID, OAuth, SAML and Shibboleth. However, taking this action alone is not sufficient. Beyond establishing trust as an identity provider, there are still a number of security, technical and socio-political hurdles we will need to overcome to enable scientists to share resources more easily across institutional boundaries. University A would still need to create accounts on its wiki for specific members of University B, and would need to be sure appropriate permissions are in place to prevent unwanted access to pages by University B members, and so forth.
Looking Through the Policy Lens
Building a robust, long-term strategy requires us to revisit the virtual organization problem, which ultimately defines collaborative computing as a matter of policy. Looking through the policy lens allows us to ask under what circumstances a research group at University A would be willing and able to share access to its managed resources with a research group at University B. We would need to be cognizant of the needs of University A as well, so we should also ask under what conditions University A would be willing to allow non-members access to its resources. Moreover, we’d need to recognize the needs of University B and its members, and to understand whether University B places restrictions on its members and their use of resources. Finally, of course, the resources in question may have specific policies associated with them. For example, if we are talking about a wiki page, that wiki page may contain sensitive information about an upcoming proposal or research results that could be misused by competing researchers. Or perhaps we are talking about medical data pertaining to a sampling of patients, in which case access control may be governed by the Health Insurance Portability and Accountability Act (HIPAA) and auditing mechanisms may need to be supported. Thus, a key part of the virtual organization approach is to build a system that allows us to accurately model the world of our users, the institutes to which they belong, the resources they want to share, the tools they are using and the policies that need to be enforced.
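To make the modeling idea a little more concrete, here is a minimal Python sketch of users, institutes, resources and policies in one place. It is only an illustration under my own assumptions, not the Globus Online design: every class, field and name below (User, Resource, Policy, VirtualOrganization, allow_sensitive and so on) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class User:
    name: str
    institution: str  # the institute the user belongs to


@dataclass(frozen=True)
class Resource:
    name: str
    owner_institution: str
    sensitive: bool = False  # e.g. an unpublished proposal or patient data


@dataclass(frozen=True)
class Policy:
    """Hypothetical rule: members of grantee_institution may perform
    the listed actions on the named resource."""
    resource: str
    grantee_institution: str
    actions: FrozenSet[str]
    allow_sensitive: bool = False  # must be set explicitly for sensitive resources


@dataclass
class VirtualOrganization:
    resources: Dict[str, Resource] = field(default_factory=dict)
    policies: List[Policy] = field(default_factory=list)
    audit_log: List[Tuple[str, str, str, bool]] = field(default_factory=list)

    def add_resource(self, resource: Resource) -> None:
        self.resources[resource.name] = resource

    def grant(self, policy: Policy) -> None:
        self.policies.append(policy)

    def is_allowed(self, user: User, resource_name: str, action: str) -> bool:
        resource = self.resources.get(resource_name)
        allowed = False
        if resource is not None:
            # Assumption: members of the owning institute always have access.
            allowed = user.institution == resource.owner_institution
            if not allowed:
                allowed = any(
                    p.resource == resource_name
                    and p.grantee_institution == user.institution
                    and action in p.actions
                    and (p.allow_sensitive or not resource.sensitive)
                    for p in self.policies
                )
        # Every decision is recorded so that access can be audited later.
        self.audit_log.append((user.name, resource_name, action, allowed))
        return allowed
```

In this sketch, University A could register a wiki page as a resource and grant University B read access to it, while a page marked sensitive would stay closed to University B unless a policy explicitly sets allow_sensitive. Keeping the audit log next to the decision logic is one simple way to address the auditing concern raised above.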
Can Cloud Help?
Another key to the solution will be to build a system that enables policy management in a coordinated fashion. This is where the Cloud computing paradigm complements the virtual organization model. In the past, and I would call this an example of the Grid computing paradigm, we might have tried building a federated solution that required each participating institute to host its own business logic for publishing resources and enforcing policies. A key premise of the Grid paradigm was that each resource provider should maintain autonomy over its own resources and policy enforcement. But taking this approach would come at a significant cost to each institute in terms of development time, operations and support. Another cost perspective to consider is that many universities get their research funding from government grants, and a solution that requires each university to build or host its own resource management services may not be the best use of government funding. Instead, we can greatly reduce overall costs and the barrier to entry by building a cloud-hosted solution that models all pertinent resources in one system and supports interoperability through Internet standards. And we can still offer autonomy by delegating all access rights to resources and policies to their providers.
The Security Question
Arguably the most attractive feature of the Grid approach is that, in theory, it distributes the security problem among its resource providers such that a security breach of one resource provider does not imply that other providers will be affected. However, this notion of security is something of a myth. The more complicated and distributed a system is, the more likely its users are to circumvent policies to make their lives easier. Back in 2001, when “Anatomy of the Grid” was written, I observed that many users would copy their X.509 certificates and highly sensitive private key files to multiple computers just to make it easier to use GSS-API based applications. Also, if a system-wide breach is found in a distributed system, getting each institute to deploy the relevant fixes efficiently is non-trivial.
In place of the federated model, Globus Online is adopting the software-as-a-service (SaaS) model to define virtual organizations of users, resources and policies. Our aim is to provide a simple yet powerful online tool that integrates with applications through commonly supported standards to enable research teams to manage access to applications and data. Of course, we recognize there are many challenges ahead, so we are taking great care to build a system that protects the privacy and security of our users. We are also taking every precaution to sandbox the access policies of each major community we aim to support. After all, for reasons I hope I’ve established here, policy is our #1 concern. In future posts, I’ll discuss the Globus Online virtual organization approach in more technical detail, including how we model the world, how we integrate with applications and how we intend for users to manage policies.