GASS: Requirements
- Make the design as simple as possible. We do not want to write another distributed filesystem.
- Support global name spaces for files via URL syntax, allowing access to files via HTTP, FTP, and GASS servers:
http://SERVER-NAME/REMOTE-PATHNAME
Read/write a file on a public HTTP server. This works either without authentication or with SSL authentication, depending on the web server. (SSL is of limited utility, since the HTTP server must be using the Globus CA or similar, but we get this more or less for free.)
ftp://SERVER-NAME/REMOTE-PATHNAME
Read/write a file on an anonymous FTP server, without authentication. This will likely also work with SSL-enhanced ftp (see http://www.gbnet.com/public/security/Crypto/SSLapps/).
x-gass://SERVER-NAME:SERVER-PORT/REMOTE-PATHNAME
Access file on GASS server, with authentication.
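The three URL forms above can be told apart with a standard URL parser. The following is a minimal sketch, not part of any GASS API: parse_file_url and the keys of the dictionary it returns are hypothetical names, and the rule that an x-gass URL must carry a port is an assumption read off the form shown above.

```python
from urllib.parse import urlsplit

# The three URL schemes listed in the requirements.
SUPPORTED_SCHEMES = ("http", "ftp", "x-gass")


def parse_file_url(url):
    """Split a file URL into scheme, server, port, and remote path.

    parse_file_url is a hypothetical helper, not part of any GASS API.
    """
    parts = urlsplit(url)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError("unsupported scheme: %r" % parts.scheme)
    if not parts.hostname:
        raise ValueError("missing server name in %r" % url)
    if parts.scheme == "x-gass" and parts.port is None:
        # Assumption: the x-gass form always carries SERVER-PORT.
        raise ValueError("x-gass URLs need an explicit port")
    return {
        "scheme": parts.scheme,
        "server": parts.hostname,
        "port": parts.port,  # None when the URL gives no port
        "path": parts.path,
        # Only GASS servers are stated to require authenticated access.
        "needs_gsi": parts.scheme == "x-gass",
    }
```

The host names used to exercise such a helper (for example gass.example.org) would be placeholders, not real servers.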
- Support the following security model:
- Access to anonymous ftp or http servers is provided in the usual way, with no authentication.
- A GASS server can be accessed only by a computation operating on behalf of the user that started it, with process-to-process authentication performed by the Globus Security Infrastructure (GSI).
- Access to SSL-authenticated ftp or http servers may be supported in the future through the use of the GSI.
In the future, we may decide to support GASS servers that can run as root, so as to provide access to multiple Globus users. In this case, authentication and global-to-local credential mapping would be done in the same way as in the GRAM gatekeeper to control access. However, we don't intend to support this type of access initially.
- Support three access patterns within an application program:
- Read of a remote file using conventional Unix file input functions
Achieved by copying the entire file from the remote to local system when opening the file.
- Write a remote file using conventional Unix file output functions
- Make the simplifying assumption that we do NOT support coherency or locking between multiple writers
- Note that changes to the file need not be reflected remotely until the file is closed; hence, one can write locally and transfer the file on close
- Append
- Support append operations to a remote file
- To support logging, perform a "flush" after each write so that appending happens incrementally as writes occur
- Enable multiple writers to operate on the same file simultaneously, but allow concurrent writes to be interleaved
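The read and write patterns can be illustrated with a small wrapper that copies the whole file in at open and copies it back at close. This is only a sketch under stated assumptions: RemoteFile is a hypothetical name, and fetch and store are hypothetical caller-supplied transfer hooks standing in for the real HTTP, FTP, or GASS transfers. The append pattern, with its flush after each write, is not covered here.

```python
import os
import tempfile


class RemoteFile:
    """Context manager giving ordinary file I/O on a local copy of a remote file.

    fetch(url, local_path) and store(local_path, url) are hypothetical
    transfer hooks supplied by the caller.
    """

    def __init__(self, url, mode, fetch, store):
        if mode not in ("r", "w"):
            raise ValueError("this sketch handles only 'r' and 'w'")
        self.url, self.mode = url, mode
        self.fetch, self.store = fetch, store

    def __enter__(self):
        fd, self.local_path = tempfile.mkstemp()
        os.close(fd)
        if self.mode == "r":
            # Read pattern: copy the entire remote file at open time.
            self.fetch(self.url, self.local_path)
        self.handle = open(self.local_path, self.mode)
        return self.handle

    def __exit__(self, exc_type, exc, tb):
        self.handle.close()
        try:
            if self.mode == "w" and exc_type is None:
                # Write pattern: changes reach the remote side only at close.
                self.store(self.local_path, self.url)
        finally:
            os.remove(self.local_path)
        return False
```

Inside the with-block the program uses the returned handle like any local file; no coherency or locking between writers is attempted, in line with the simplifying assumption above.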
- Support the concept of a "file cache" at a site
- Caches are associated with users, hence allowing use of local resource management facilities to control user consumption of disk.
- "File open" operation places a remote file in local cache, hence avoiding repeat fetches when multiple processes at a site access the same remote file.
- Reference counting via tags on open and close is used to manage cache.
- There is no persistent cache daemon. Any program with access to the file cache can use it through a cache API.
- Multiple caches can be defined for a user, hence allowing the advanced user to stage data to different locations, e.g., /tmp or a parallel file system on a parallel computer.
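One way to picture a daemon-free cache with tag-based reference counting is to keep all bookkeeping in files next to the cached data. The sketch below makes several assumptions that are not in the requirements: the names FileCache, add, release, and contents, the JSON tag file, the SHA-256 naming of cache entries, and the fetch transfer hook are all hypothetical. Locking between concurrent processes is left out.

```python
import hashlib
import json
import os


class FileCache:
    """Directory-backed cache with per-tag reference counts.

    All state lives on disk under `root`, so any process can use the
    cache without a daemon. Names and on-disk layout are hypothetical.
    """

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _paths(self, url):
        key = hashlib.sha256(url.encode("utf-8")).hexdigest()
        base = os.path.join(self.root, key)
        return base + ".data", base + ".tags"

    def _load(self, tags_path):
        if not os.path.exists(tags_path):
            return {"url": None, "tags": {}}
        with open(tags_path) as f:
            return json.load(f)

    def add(self, url, tag, fetch):
        """Reference `url` under `tag`, fetching it only if not yet cached."""
        data_path, tags_path = self._paths(url)
        entry = self._load(tags_path)
        if not os.path.exists(data_path):
            fetch(url, data_path)  # hypothetical transfer hook
        entry["url"] = url
        entry["tags"][tag] = entry["tags"].get(tag, 0) + 1
        with open(tags_path, "w") as f:
            json.dump(entry, f)
        return data_path

    def release(self, url, tag):
        """Drop one reference; delete the cached file when none remain."""
        data_path, tags_path = self._paths(url)
        entry = self._load(tags_path)
        if tag not in entry["tags"]:
            raise KeyError(tag)
        entry["tags"][tag] -= 1
        if entry["tags"][tag] == 0:
            del entry["tags"][tag]
        if entry["tags"]:
            with open(tags_path, "w") as f:
                json.dump(entry, f)
        else:
            os.remove(tags_path)
            if os.path.exists(data_path):
                os.remove(data_path)

    def contents(self):
        """Return {url: {tag: count}} for every cached file."""
        result = {}
        for name in os.listdir(self.root):
            if name.endswith(".tags"):
                with open(os.path.join(self.root, name)) as f:
                    entry = json.load(f)
                result[entry["url"]] = entry["tags"]
        return result
```

Because each FileCache object reads its state from disk on every call, two processes pointing at the same directory see the same entries, and a second cache for the same user is simply a second root directory.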
- Allow users to manage the cache remotely, via GRAM requests. In particular, allow them to:
- Request information about contents
- Put items into the cache (hence performing prestaging)
- Delete items from the cache
We may also want to allow remote "get" from a cache (??)
Two APIs have been discussed for achieving this remote management:
- Simple "listcontents", "put", and "delete" programs that can be called via GRAM
- Extensions to the GRAM Resource Specification Language (RSL)
Note:
- Prestaging can be either coupled with or decoupled from application submission
- Prestage request increments reference count, cache delete decrements
- Make the libraries stateless, so that things can be restarted, etc.
- Support customized GASS servers. A simple gass-server will be provided to give access to remote files. However, a user should be able to extend the GASS server capabilities to return data from any data source.
- The APIs should allow for very high performance implementations.