If you just want the "rules of thumb" on getting started (without all the details), the following globus-url-copy options will normally give acceptable performance:
globus-url-copy -vb -tcp-bs 2097152 -p 4 source_url destination_url
where:
- -vb
specifies verbose mode and displays:
- number of bytes transferred,
- performance since the last update (currently every 5 seconds), and
- average performance for the whole transfer.
- -tcp-bs
specifies the size (in bytes) of the TCP buffer to be used by the underlying FTP data channels. This is critical to good performance over the WAN.
- -p
specifies the number of parallel data connections that should be used. This is one of the most commonly used options.
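A common rule of thumb (not stated in this guide, so treat it as an assumption) is to size the -tcp-bs buffer to at least the path's bandwidth-delay product, i.e., bandwidth times round-trip time. The sketch below computes that value in bytes and rounds it up to a power of two; the 1 Gbit/s and 16 ms inputs are hypothetical and only show how a value such as 2097152 can arise.

```python
def bandwidth_delay_product(bandwidth_bits_per_s: int, rtt_ms: int) -> int:
    """Bytes in flight on a path: bandwidth (bits/s) x round-trip time (ms)."""
    if bandwidth_bits_per_s <= 0 or rtt_ms <= 0:
        raise ValueError("bandwidth and RTT must be positive")
    # Integer arithmetic: divide by 8 (bits -> bytes) and by 1000 (ms -> s).
    return bandwidth_bits_per_s * rtt_ms // (8 * 1000)


def round_up_to_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (n - 1).bit_length()


# Hypothetical path: 1 Gbit/s with a 16 ms round-trip time.
bdp = bandwidth_delay_product(1_000_000_000, 16)
tcp_bs = round_up_to_power_of_two(bdp)
print(bdp, tcp_bs)  # 2000000 2097152
```

The resulting number is what you would pass to -tcp-bs; a larger buffer than the bandwidth-delay product generally does no harm, while a smaller one limits throughput on long paths.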
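The source/destination URLs will normally be one of the following:
- file:///path/to/my/file if you are accessing a file on a file system accessible by the host on which you are running your client.
- gsiftp://hostname/path/to/remote/file if you are accessing a file from a GridFTP server.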
One of the most basic tasks in GridFTP is to "put" files, i.e., to move a file from your file system to the server. For example, if you want to move the file /tmp/foo from a file system accessible to the host on which you are running your client to a file named /tmp/bar on a host named remote.machine.my.edu running a GridFTP server, you would use this command:
globus-url-copy -vb -tcp-bs 2097152 -p 4 file:///tmp/foo gsiftp://remote.machine.my.edu/tmp/bar
Note: In theory,
A get, i.e., moving a file from a server to your file system, simply reverses the source and destination URLs:
globus-url-copy -vb -tcp-bs 2097152 -p 4 gsiftp://remote.machine.my.edu/tmp/bar file:///tmp/foo
Tip: Remember
Finally, if you want to move a file between two GridFTP servers (a third-party transfer), both URLs use gsiftp: as the protocol:
globus-url-copy -vb -tcp-bs 2097152 -p 4 gsiftp://other.machine.my.edu/tmp/foo gsiftp://remote.machine.my.edu/tmp/bar
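The three examples differ only in their URL schemes. The following sketch, with hypothetical helper names, classifies a transfer as a put, get, or third-party transfer from those schemes and assembles the same "rules of thumb" argument list. It only covers the file:// and gsiftp:// combinations described above, and it does not run globus-url-copy itself.

```python
from urllib.parse import urlparse


def transfer_kind(source_url: str, dest_url: str) -> str:
    """Classify a transfer by URL scheme, following the examples above."""
    kinds = {
        ("file", "gsiftp"): "put",            # local file -> GridFTP server
        ("gsiftp", "file"): "get",            # GridFTP server -> local file
        ("gsiftp", "gsiftp"): "third-party",  # server -> server
    }
    key = (urlparse(source_url).scheme, urlparse(dest_url).scheme)
    if key not in kinds:
        raise ValueError(f"combination not covered here: {key}")
    return kinds[key]


def rule_of_thumb_command(source_url: str, dest_url: str,
                          tcp_bs: int = 2097152, parallel: int = 4) -> list[str]:
    """Argument list for the 'rules of thumb' globus-url-copy invocation."""
    return ["globus-url-copy", "-vb", "-tcp-bs", str(tcp_bs),
            "-p", str(parallel), source_url, dest_url]


src = "file:///tmp/foo"
dst = "gsiftp://remote.machine.my.edu/tmp/bar"
print(transfer_kind(src, dst))  # put
cmd = rule_of_thumb_command(src, dst)
# To actually run it (requires globus-url-copy and valid credentials):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building an argument list rather than a single shell string avoids quoting problems if a path contains spaces.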
For more information on URLs and the command-line options, see the Key Concepts section, which gives basic definitions and an overview of the GridFTP protocol as well as our implementation of it.
If you want to access data in a non-POSIX data source that has a POSIX interface, the standard server will do just fine. Just make sure the interface is really POSIX-like (out-of-order writes, contiguous byte writes, etc.).
The following information is helpful if you want to use GridFTP to access data in storage systems served through DSIs (such as HPSS and SRB) or in other non-POSIX data sources.
Architecturally, the Globus GridFTP server can be divided into three modules:
- the GridFTP protocol module,
- the (optional) data transform module, and
- the Data Storage Interface (DSI).
In the GT 4.2.0 implementation, the data transform module and the DSI have been merged, although we plan to have separate, chainable data transform modules in the future.
Note: This architecture does NOT apply to the WU-FTPD implementation (GT 3.2.1 and lower).
The GridFTP protocol module is the module that reads from and writes to the network and implements the GridFTP protocol. This module should not need to be modified, since doing so would make the server non-compliant with the protocol and unable to communicate with other servers.
The data transform functionality is invoked by using the ERET (extended retrieve) and ESTO (extended store) commands. It is seldom used and bears careful consideration before it is implemented, but it can be very useful in the right circumstances. In theory, any computation could be invoked this way, but it was primarily intended for cases where some simple pre-processing (such as a partial get or sub-sampling) can greatly reduce the network load. The disadvantage is that you remove any real option for planning, brokering, etc., and any significant computation could adversely affect data transfer performance. Note that the client must also support the ESTO/ERET functionality.
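To make the "partial get or sub-sampling" idea concrete, the following sketch shows the kind of server-side pre-processing an ERET handler might apply before sending data. It is plain Python operating on an in-memory byte string: the function names and the fixed-size record layout are hypothetical, and it does not show how ERET parameters are actually encoded on the wire.

```python
def partial_get(data: bytes, offset: int, length: int) -> bytes:
    """Return only the requested byte range (clipped at the end of the data)."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    return data[offset:offset + length]


def subsample_records(data: bytes, record_size: int, step: int) -> bytes:
    """Keep every step-th fixed-size record; a trailing partial record is dropped."""
    if record_size < 1 or step < 1:
        raise ValueError("record_size and step must be positive")
    out = bytearray()
    for start in range(0, len(data) - record_size + 1, record_size * step):
        out += data[start:start + record_size]
    return bytes(out)


dataset = bytes(range(256)) * 4096  # 1 MiB of sample data
reduced = subsample_records(dataset, record_size=64, step=10)
print(len(dataset), len(reduced))   # reduced is roughly 1/10 the size of dataset
```

Both transforms are cheap relative to the transfer itself, which is the situation the text describes; an expensive computation in the same place would slow the transfer down.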
The Data Storage Interface (DSI) / Data Transform module knows how to read from and write to the "local" storage system and can optionally transform the data. We put local in quotes because in a complicated storage system, the storage may not be directly attached, but for performance reasons it should be relatively close (for instance, on the same LAN).
The interface consists of functions to be implemented, such as send (get), receive (put), and command (simple commands, like mkdir, that either succeed or fail).
Once these functions have been implemented for a specific storage system, a client should not need to know or care what is actually providing the data. The server can either be configured with a specific DSI, so that it knows how to interact with a single class of storage system, or it can load and configure a DSI on the fly; the latter is one particularly useful application of the ESTO/ERET functionality mentioned above.
See Appendix A, Developing DSIs for GridFTP for more information.
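Real DSIs are written in C against the Globus GridFTP server's DSI interface (see Appendix A). Purely as an illustration of the separation described above, the following Python sketch models a storage interface with send, receive, and command operations and one in-memory backend. Every class and method name here is hypothetical and is not the Globus API.

```python
from abc import ABC, abstractmethod


class StorageInterface(ABC):
    """Hypothetical stand-in for a DSI: what the protocol layer calls."""

    @abstractmethod
    def send(self, path: str) -> bytes:
        """Get: read data from storage so the server can send it to the client."""

    @abstractmethod
    def receive(self, path: str, data: bytes) -> None:
        """Put: store data the server received from the client."""

    @abstractmethod
    def command(self, name: str, path: str) -> bool:
        """Simple commands (e.g. mkdir) that either succeed or fail."""


class InMemoryStorage(StorageInterface):
    """Toy backend; a real DSI would talk to HPSS, SRB, a file system, etc."""

    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}
        self.dirs: set[str] = {"/"}

    def send(self, path: str) -> bytes:
        return self.files[path]

    def receive(self, path: str, data: bytes) -> None:
        self.files[path] = bytes(data)

    def command(self, name: str, path: str) -> bool:
        if name == "mkdir" and path not in self.dirs:
            self.dirs.add(path)
            return True
        return False  # unknown command, or directory already exists


def handle_get(storage: StorageInterface, path: str) -> bytes:
    # The protocol side sees only the interface, never the backend type.
    return storage.send(path)


storage = InMemoryStorage()
storage.receive("/tmp/bar", b"hello")
print(handle_get(storage, "/tmp/bar"))  # b'hello'
```

Swapping InMemoryStorage for another subclass changes where the data lives without touching handle_get, which mirrors the point that the client need not know what is providing the data.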
Last Update: August 2005
Working with Los Alamos National Laboratory and the High Performance Storage System (HPSS) collaboration (http://www.hpss-collaboration.org), we have written a Data Storage Interface (DSI) for read/write access to HPSS. This DSI would allow an existing application that uses a GridFTP-compliant client to use HPSS data resources.
This DSI is currently in testing. Due to changes in the HPSS security mechanisms, it requires HPSS 6.2 or later, which is due to be released in Q4 2005. Distribution of the DSI has not been worked out yet, but it will *probably* be available from both Globus and the HPSS collaboration. While this code will be open source, it requires underlying HPSS libraries that are proprietary (NOT open source).
Note: This is a purely server-side change; the client does not know which DSI is running, so only a site that is already running HPSS and wants to allow GridFTP access needs to worry about access to these proprietary libraries.
Last Update: August 2005
Working with the SRB team at the San Diego Supercomputer Center, we have written a Data Storage Interface (DSI) for read/write access to data in the Storage Resource Broker (SRB) (http://www.npaci.edu/DICE/SRB). This DSI will enable GridFTP-compliant clients to read data from and write data to an SRB server, similar in functionality to the sput/sget commands.
This DSI is currently in testing and is not yet publicly available, but it will be available from both the SRB web site and the Globus web site. It will also be included in the next stable release of the toolkit. We are working on performance tests, but early results indicate that for wide area network (WAN) transfers, the performance is comparable.
You might want to use this functionality if:
- you have existing tools that use GridFTP clients and you want to access data that is in SRB, or
- you have distributed data sets with some of the data in SRB and some of the data available from GridFTP servers.
Pipelining allows the client to have many outstanding, unacknowledged transfer commands at once. Instead of being forced to wait for the "Finished response" message, the client is free to send transfer commands at any time.
Pipelining is enabled by using the -pp option:
globus-url-copy -pp
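Why pipelining helps is easiest to see with a deliberately simplified model (an assumption, not a measurement): without pipelining, each file costs one command/response round trip plus its transfer time, while with pipelining the round-trip latency is paid roughly once because the commands are already queued. The numbers below are hypothetical and in milliseconds; the effect is largest for many small files on high-latency links.

```python
def total_time_ms(n_files: int, rtt_ms: int, per_file_ms: int, pipelined: bool) -> int:
    """Simplified model of the time to transfer n_files one after another."""
    if n_files < 0 or rtt_ms < 0 or per_file_ms < 0:
        raise ValueError("inputs must be non-negative")
    if pipelined:
        # Commands are queued ahead of time, so only one round trip is exposed.
        return rtt_ms + n_files * per_file_ms
    # Each transfer waits for the previous "Finished" response before the next command.
    return n_files * (rtt_ms + per_file_ms)


# Hypothetical workload: 1000 small files, 50 ms RTT, 10 ms of data time per file.
print(total_time_ms(1000, 50, 10, pipelined=False))  # 60000
print(total_time_ms(1000, 50, 10, pipelined=True))   # 10050
```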
GridFTP Where There Is FTP (GWTFTP) is an intermediate program that acts as a proxy between existing FTP clients and GridFTP servers. Users can connect to GWTFTP with their favorite standard FTP client, and GWTFTP will then connect to a GridFTP server on the client’s behalf. To clients, GWTFTP looks much like an FTP proxy server: when they wish to contact a GridFTP server, FTP clients contact GWTFTP instead.
Clients tell GWTFTP their ultimate destination via the FTP USER <username> command. Instead of entering only their username, users send the following:
USER <GWTFTP username>::<GridFTP server URL>
This command tells GWTFTP the GridFTP endpoint with which the client wants to communicate. For example:
USER bresnaha::gsiftp://wiggum.mcs.anl.gov:2811/
Note: GWTFTP requires GSI C security.
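Because the destination is packed into the user name, a client only needs to build one string. The helpers below (hypothetical names) format and split the USER argument in the <GWTFTP username>::<GridFTP server URL> form shown above. Splitting at the first "::" works for the example because neither the user name nor the URL's "gsiftp://" scheme separator contains a double colon.

```python
def gwtftp_user_argument(username: str, server_url: str) -> str:
    """Build the argument sent with the FTP USER command to GWTFTP."""
    if "::" in username:
        raise ValueError("username must not contain '::'")
    return f"{username}::{server_url}"


def parse_gwtftp_user(line: str) -> tuple[str, str]:
    """Split 'USER <name>::<url>' into (name, url)."""
    prefix = "USER "
    if not line.startswith(prefix):
        raise ValueError("not a USER command")
    username, sep, url = line[len(prefix):].partition("::")
    if not sep or not url:
        raise ValueError("expected <GWTFTP username>::<GridFTP server URL>")
    return username, url


line = "USER " + gwtftp_user_argument("bresnaha", "gsiftp://wiggum.mcs.anl.gov:2811/")
print(line)                     # USER bresnaha::gsiftp://wiggum.mcs.anl.gov:2811/
print(parse_gwtftp_user(line))  # ('bresnaha', 'gsiftp://wiggum.mcs.anl.gov:2811/')
```

With an ordinary FTP client library, the string returned by gwtftp_user_argument is what you would pass as the user name at login (for example, the user argument of Python's ftplib.FTP.login); how GWTFTP handles the password is not covered here.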
To transfer a single file to many destinations in a multicast/broadcast, use the new -mc option.
Note: To use this option, the administrator must enable multicasting.
globus-url-copy -vb -tcp-bs 2097152 -p 4 -mc filename source_url
The file named by filename must contain a line-separated list of destination urls. For example:
gsiftp://localhost:5000/home/user/tst1
gsiftp://localhost:5000/home/user/tst3
gsiftp://localhost:5000/home/user/tst4
For more flexibility, you can also specify a single destination url on the command line in addition to the urls in the file. Examples are:
globus-url-copy -mc multicast.file gsiftp://localhost/home/user/src_file
or
globus-url-copy -mc multicast.file gsiftp://localhost/home/user/src_file gsiftp://localhost/home/user/dest_file1
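The multicast file is simply one destination url per line. As a hedged sketch (helper names are hypothetical; the URLs are the examples above), the following writes such a file and assembles the matching globus-url-copy argument list, optionally with an extra destination on the command line. It does not run the command.

```python
import tempfile
from pathlib import Path
from typing import Optional


def write_multicast_file(path: Path, dest_urls: list[str]) -> None:
    """Write one destination url per line, as the -mc file expects."""
    path.write_text("\n".join(dest_urls) + "\n")


def multicast_command(mc_file: Path, source_url: str,
                      extra_dest: Optional[str] = None) -> list[str]:
    """Argument list for a multicast transfer from a single source."""
    cmd = ["globus-url-copy", "-mc", str(mc_file), source_url]
    if extra_dest is not None:
        cmd.append(extra_dest)  # optional extra destination on the command line
    return cmd


dests = [
    "gsiftp://localhost:5000/home/user/tst1",
    "gsiftp://localhost:5000/home/user/tst3",
    "gsiftp://localhost:5000/home/user/tst4",
]
with tempfile.TemporaryDirectory() as tmp:
    mc_path = Path(tmp) / "multicast.file"
    write_multicast_file(mc_path, dests)
    written = mc_path.read_text().splitlines()
    cmd = multicast_command(mc_path, "gsiftp://localhost/home/user/src_file",
                            "gsiftp://localhost/home/user/dest_file1")
print(written == dests)  # True
```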
Along with the list of destination urls in a file, you can specify a set of options for each url. To do this, append a ? to the resource string in the url, followed by semicolon-separated key=value pairs. For example:
gsiftp://dst1.domain.com:5000/home/user/tst1?cc=1;tcpbs=10M;P=4
This indicates that the receiving host dst1.domain.com will use 4 parallel streams and a TCP buffer size of 10 MB, and will select 1 host when forwarding data blocks. This url is specified in the -mc file as described above.
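Splitting such a url back into its base and options amounts to cutting at the first ? and then at each ;. The sketch below does that; the helper names are hypothetical, and parse_size assumes K, M, and G are binary (1024-based) multiples, which this guide does not specify.

```python
def split_node_options(url: str) -> tuple[str, dict[str, str]]:
    """Split 'base?k1=v1;k2=v2' into ('base', {'k1': 'v1', 'k2': 'v2'})."""
    base, _, query = url.partition("?")
    options: dict[str, str] = {}
    for pair in filter(None, query.split(";")):  # skip empty pieces
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"malformed option: {pair!r}")
        options[key] = value
    return base, options


def parse_size(text: str) -> int:
    """Parse a 'formatted integer' such as 10M (ASSUMPTION: 1024-based suffixes)."""
    multipliers = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    suffix = text[-1:].upper()
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)


base, opts = split_node_options(
    "gsiftp://dst1.domain.com:5000/home/user/tst1?cc=1;tcpbs=10M;P=4")
print(base)  # gsiftp://dst1.domain.com:5000/home/user/tst1
print(int(opts["P"]), int(opts["cc"]), parse_size(opts["tcpbs"]))  # 4 1 10485760
```

This simple split does not handle option values that themselves contain ? or ; characters.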
The following is a list of key=value options and their meanings:
- P=
integer - The number of parallel streams this node will use when forwarding.
- cc=
integer - The number of urls to which this node will forward data.
- tcpbs=
formatted integer - The TCP buffer size this node will use when forwarding.
- urls=
string list - The list of urls that must be children of this node when the spanning tree is complete.
- local_write=
boolean: y|n - Determines whether the data is written to a local disk or only forwarded to the next hop. This is explained further in the Network Overlay section.
- subject=
string - The DN (distinguished name) to expect from the servers this node is connecting to.
In addition to allowing multicast, this function also allows for creating user-defined network routes.
If the local_write option is set to n, no data is written to the local disk; the data is only forwarded.
If local_write=n is used together with the cc=1 option, the data is forwarded to exactly 1 location.
This allows the user to create a network overlay of data hops, using each GridFTP server as a router to the ultimate destination.
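To build per-url option strings programmatically, the following sketch (a hypothetical helper, limited to the keys listed above) appends options with basic type checks. The second call produces a relay entry with local_write=n and cc=1, the combination described for network overlays. Host names are placeholders, and how the servers arrange relays into the final spanning tree is not modeled here.

```python
ALLOWED_KEYS = {"P", "cc", "tcpbs", "urls", "local_write", "subject"}


def node_url(base: str, **options: object) -> str:
    """Append '?key=value;...' options to a destination url, in the given order."""
    unknown = set(options) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    p = options.get("P", 1)
    if not isinstance(p, int) or p < 1:
        raise ValueError("P must be a positive integer")
    cc = options.get("cc", 0)
    if not isinstance(cc, int) or cc < 0:
        raise ValueError("cc must be a non-negative integer")
    if options.get("local_write", "y") not in ("y", "n"):
        raise ValueError("local_write must be 'y' or 'n'")
    if not options:
        return base
    return base + "?" + ";".join(f"{k}={v}" for k, v in options.items())


print(node_url("gsiftp://dst1.domain.com:5000/home/user/tst1", cc=1, tcpbs="10M", P=4))
# gsiftp://dst1.domain.com:5000/home/user/tst1?cc=1;tcpbs=10M;P=4
print(node_url("gsiftp://relay.example.org:5000/tmp/hop", local_write="n", cc=1))
# gsiftp://relay.example.org:5000/tmp/hop?local_write=n;cc=1
```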