Software Links
Getting Started
- Doc Structure
- A Globus Primer
- Globus Is Modular!
- Quickstart
- Installing GT
- Platform Notes
- GT Developer's Guide
- GT User's Guide (coming soon)
- Migrating from GT2
- Migrating from GT3
Reference
- Best Practices
- Coding Guidelines
- API docs
- Public Interfaces (coming soon)
- Resource Properties
- Samples
- Glossary
- Performance Studies (coming soon)
Manuals
Common Runtime
Security
- Non-WS (General) Security
- WS Java Security
- Message-level
- Authz Framework
- CAS
- Delegation Service
- MyProxy
- GSI-OpenSSH
- SimpleCA
- SGAS
Data Mgt
MDS4
Execution Mgt
Table of Contents
GridFTP represents a service that a host is providing. Therefore, the service must be listening on a port waiting for client to request access to that service. This is generally handled one of two ways:
- Either an application daemon is running listening for connections, or
- inetd/xinetd is used.
The following list describes the process between the service listening for connection and an exchange of data taking place:
- These services (application daemon or inetd/xinetd) listen for connections.
- When a connection is received on a "well known" port such as 2811 for GridFTP, inetd does a fork/exec to start up a GridFTP server process and then does a Switch User (SU) so that the server is running in a user account rather than as root for security reasons. At this point, the client has established a control channel to the server.
- The client will then send a series of commands to configure or describe the transfer that it wants to take place.
There are basically four important components of the exchange:
- The first is security. You must authenticate, and for GridFTP, you must establish encryption on the control channel. The control channel is encrypted by default, though it can be switched off (see the security section for more detail).
- The second is setup and informational exchanges. The client may specify the type of the file (Binary or ASCII), the MODE of the transfer, he might request the size of a file before transferring it, etc..
- Third, the information and negotiation for the data channel must be done. How this is handled, depends on whether you are doing a client/server transfer or third party transfer.
- Finally, a store (STOR), retrieve (RETR), extended store (ESTO) or extended retrieve (ERET) to indicate direction of the transfer and to start data moving.
Architecturally, the Globus GridFTP server can be divided into 3 modules:
- the GridFTP protocol module,
- the (optional) data transform module, and
- the Data Storage Interface (DSI).
In the GT 4.1.3 implementation, the data transform module and the DSI have been merged, although we plan to have separate, chainable, data transform modules in the future.
![]() | Note |
|---|---|
This architecture does NOT apply to the WU-FTPD implementation (GT3.2.1 and lower). |
The GridFTP protocol module is the module that reads and writes to the network and implements the GridFTP protocol. This module should not need to be modified since to do so would make the server non-protocol compliant, and unable to communicate with other servers.
The data transform functionality is invoked by using the ERET (extended retrieve) and ESTO (extended store) commands. It is seldom used and bears careful consideration before it is implemented, but in the right circumstances can be very useful. In theory, any computation could be invoked this way, but it was primarily intended for cases where some simple pre-processing (such as a partial get or sub-sampling) can greatly reduce the network load. The disadvantage to this is that you remove any real option for planning, brokering, etc., and any significant computation could adversely affect the data transfer performance. Note that the client must also support the ESTO/ERET functionality as well.
The Data Storage Interface (DSI) / Data Transform module knows how to read and write to the "local" storage system and can optionally transform the data. We put local in quotes because in a complicated storage system, the storage may not be directly attached, but for performance reasons, it should be relatively close (for instance on the same LAN).
The interface consists of functions to be implemented such as send (get), receive (put), command (simple commands that simply succeed or fail like mkdir), etc..
Once these functions have been implemented for a specific storage system, a client should not need to know or care what is actually providing the data. The server can either be configured specifically with a specific DSI, i.e., it knows how to interact with a single class of storage system, or one particularly useful function for the ESTO/ERET functionality mentioned above is to load and configure a DSI on the fly.
[TODO: pointer to DSI development docs]
Last Update: August 2005
Working with Los Alamos National Laboratory and the High Performance Storage System (HPSS) collaboration (http://www.hpss-collaboration.org), we have written a Data Storage Interface (DSI) for read/write access to HPSS. This DSI would allow an existing application that uses a GridFTP compliant client to utilize an HPSS data resources.
This DSI is currently in testing. Due to changes in the HPSS security mechanisms, it requires HPSS 6.2 or later, which is due to be released in Q4 2005. Distribution for the DSI has not been worked out yet, but it will *probably* be available from both Globus and the HPSS collaboration. While this code will be open source, it requires underlying HPSS libraries which are NOT open source (proprietary).
![]() | Note |
|---|---|
This is a purely server side change, the client does not know what DSI is running, so only a site that is already running HPSS and wants to allow GridFTP access needs to worry about access to these proprietary libraries. |
Last Update: August 2005
Working with the SRB team at the San Diego Supercomputing Center, we have written a Data Storage Interface (DSI) for read/write access to data in the Storage Resource Broker (SRB) (http://www.npaci.edu/DICE/SRB). This DSI will enable GridFTP compliant clients to read and write data to an SRB server, similar in functionality to the sput/sget commands.
This DSI is currently in testing and is not yet publicly available, but will be available from both the SRB web site (here) and the Globus web site (here). It will also be included in the next stable release of the toolkit. We are working on performance tests, but early results indicate that for wide area network (WAN) transfers, the performance is comparable.
When might you want to use this functionality:
- You have existing tools that use GridFTP clients and you want to access data that is in SRB
- You have distributed data sets that have some of the data in SRB and some of the data available from GridFTP servers.
![[Note]](/docbook-images/note.gif)