If you just want the "rules of thumb" on getting started (without all the details), the
following options using globus-url-copy will normally give
acceptable performance:
globus-url-copy -vb -tcp-bs 2097152 -p 4 source_url destination_url
where:
- -vb
specifies verbose mode and displays:
- number of bytes transferred,
- performance since the last update (currently every 5 seconds), and
- average performance for the whole transfer.
- -tcp-bs
specifies the size (in bytes) of the TCP buffer to be used by the underlying ftp data channels. This is critical to good performance over the WAN.
- -p
Specifies the number of parallel data connections that should be used. This is one of the most commonly used options.
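As a rough guide for choosing the -tcp-bs value, a common networking rule of thumb (not stated in this guide, so treat it as an assumption) is to size the TCP buffer to the bandwidth-delay product of the path. The sketch below only does that arithmetic; the function name and the example link figures are illustrative.

```python
def bandwidth_delay_product(bandwidth_bps: int, rtt_ms: int) -> int:
    """Return the bandwidth-delay product of a path, in bytes.

    bandwidth_bps: path bandwidth in bits per second (an assumed figure).
    rtt_ms: round-trip time in milliseconds (an assumed figure).
    """
    # bits/s * ms gives bits * 1000; dividing by 8000 converts to bytes.
    return bandwidth_bps * rtt_ms // 8000


if __name__ == "__main__":
    # Hypothetical 100 Mbit/s path with a 160 ms round trip.
    print(bandwidth_delay_product(100_000_000, 160))
```

For this hypothetical path the result is close to the 2097152 bytes used in the command above.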
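The source/destination URLs will normally be one of the following:
- file:///path/to/file
for a file on a file system accessible to the host on which you are running your client.
- gsiftp://hostname/path/to/file
for a file on a host running a GridFTP server.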
One of the most basic tasks in GridFTP is to "put" a file, i.e., to move a file from your file system to the server. For example, if you want to move the file /tmp/foo from a file system accessible to the host on which you are running your client to a file named /tmp/bar on a host named remote.machine.my.edu running a GridFTP server, you would use this command:
globus-url-copy -vb -tcp-bs 2097152 -p 4 file:///tmp/foo gsiftp://remote.machine.my.edu/tmp/bar
![]() | Note |
|---|---|
In theory, |
A get, i.e., moving a file from a server to your file system, would just reverse the source and destination URLs:
![]() | Tip |
|---|---|
Remember |
globus-url-copy -vb -tcp-bs 2097152 -p 4 gsiftp://remote.machine.my.edu/tmp/bar file:///tmp/foo
Finally, if you want to move a file between two GridFTP servers (a third party transfer), both URLs would use gsiftp: as the protocol:
globus-url-copy -vb -tcp-bs 2097152 -p 4 gsiftp://other.machine.my.edu/tmp/foo gsiftp://remote.machine.my.edu/tmp/bar
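The put, get, and third party commands above differ only in their two URLs, so a script can build all three from one helper. The following is a minimal sketch, not part of the toolkit: the function name build_copy_command is hypothetical, and only the options shown above (-vb, -tcp-bs, -p) are used.

```python
import shlex


def build_copy_command(source_url, destination_url,
                       tcp_buffer_bytes=2097152, parallel_streams=4):
    """Return the globus-url-copy argument list for one transfer."""
    return [
        "globus-url-copy", "-vb",
        "-tcp-bs", str(tcp_buffer_bytes),
        "-p", str(parallel_streams),
        source_url, destination_url,
    ]


if __name__ == "__main__":
    local = "file:///tmp/foo"
    remote = "gsiftp://remote.machine.my.edu/tmp/bar"
    other = "gsiftp://other.machine.my.edu/tmp/foo"
    # put, get, and third party transfer, in that order
    for source, destination in [(local, remote), (remote, local), (other, remote)]:
        print(shlex.join(build_copy_command(source, destination)))
```

The returned list can be passed to subprocess.run on a host where globus-url-copy is installed.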
If you want more information and details on URLs and the command line options, the Key Concepts section gives basic definitions and an overview of the GridFTP protocol as well as our implementation of it.
If you want to access data in a non-POSIX file data source that has a POSIX interface, the standard server will do just fine. Just make sure it is really POSIX-like (out of order writes, contiguous byte writes, etc.).
The following information is helpful if you want to use GridFTP to access data in DSIs (such as HPSS and SRB), and non-POSIX data sources.
Architecturally, the Globus GridFTP server can be divided into 3 modules:
- the GridFTP protocol module,
- the (optional) data transform module, and
- the Data Storage Interface (DSI).
In the GT 4.2.0 implementation, the data transform module and the DSI have been merged, although we plan to have separate, chainable, data transform modules in the future.
![]() | Note |
|---|---|
This architecture does NOT apply to the WU-FTPD implementation (GT3.2.1 and lower). |
The GridFTP protocol module is the module that reads and writes to the network and implements the GridFTP protocol. This module should not need to be modified since to do so would make the server non-protocol compliant, and unable to communicate with other servers.
The data transform functionality is invoked by using the ERET (extended retrieve) and ESTO (extended store) commands. It is seldom used and bears careful consideration before it is implemented, but in the right circumstances can be very useful. In theory, any computation could be invoked this way, but it was primarily intended for cases where some simple pre-processing (such as a partial get or sub-sampling) can greatly reduce the network load. The disadvantage to this is that you remove any real option for planning, brokering, etc., and any significant computation could adversely affect the data transfer performance. Note that the client must also support the ESTO/ERET functionality.
The Data Storage Interface (DSI) / Data Transform module knows how to read and write to the "local" storage system and can optionally transform the data. We put local in quotes because in a complicated storage system, the storage may not be directly attached, but for performance reasons, it should be relatively close (for instance on the same LAN).
The interface consists of functions to be implemented such as send (get), receive (put), command (simple commands that simply succeed or fail like mkdir), etc.
Once these functions have been implemented for a specific storage system, a client should not need to know or care what is actually providing the data. The server can either be configured with a specific DSI, in which case it knows how to interact with a single class of storage system, or it can load and configure a DSI on the fly, which is one particularly useful application of the ESTO/ERET functionality mentioned above.
See Appendix A, Developing DSIs for GridFTP for more information.
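To make the division of labor concrete, the sketch below models the idea of a DSI as an abstract interface with the three kinds of functions named above (send, receive, command) and a toy in-memory storage system behind it. This is a conceptual illustration only: the real DSI is the interface described in Appendix A, and every class and method name here is hypothetical.

```python
from abc import ABC, abstractmethod


class StorageInterface(ABC):
    """Conceptual stand-in for a DSI (not the real API)."""

    @abstractmethod
    def send(self, path: str) -> bytes:
        """Serve a get: return the stored bytes for path."""

    @abstractmethod
    def receive(self, path: str, data: bytes) -> None:
        """Serve a put: store data under path."""

    @abstractmethod
    def command(self, name: str, path: str) -> bool:
        """Run a simple command that only succeeds or fails."""


class InMemoryStorage(StorageInterface):
    """Toy storage system that keeps everything in dictionaries."""

    def __init__(self):
        self.files = {}
        self.directories = {"/"}

    def send(self, path):
        return self.files[path]

    def receive(self, path, data):
        self.files[path] = data

    def command(self, name, path):
        # Only mkdir is modeled; anything else simply fails.
        if name == "mkdir" and path not in self.directories:
            self.directories.add(path)
            return True
        return False


if __name__ == "__main__":
    storage = InMemoryStorage()
    storage.receive("/tmp/bar", b"example data")
    print(storage.send("/tmp/bar"))
```

Code that talks to StorageInterface never needs to know which storage system is behind it, which is the property the DSI gives GridFTP clients.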
Last Update: August 2005
Working with Los Alamos National Laboratory and the High Performance Storage System (HPSS) collaboration (http://www.hpss-collaboration.org), we have written a Data Storage Interface (DSI) for read/write access to HPSS. This DSI would allow an existing application that uses a GridFTP compliant client to utilize HPSS data resources.
This DSI is currently in testing. Due to changes in the HPSS security mechanisms, it requires HPSS 6.2 or later, which is due to be released in Q4 2005. Distribution for the DSI has not been worked out yet, but it will *probably* be available from both Globus and the HPSS collaboration. While this code will be open source, it requires underlying HPSS libraries which are NOT open source (proprietary).
![]() | Note |
|---|---|
This is a purely server-side change; the client does not know what DSI is running, so only a site that is already running HPSS and wants to allow GridFTP access needs to worry about access to these proprietary libraries. |
Last Update: August 2005
Working with the SRB team at the San Diego Supercomputer Center, we have written a Data Storage Interface (DSI) for read/write access to data in the Storage Resource Broker (SRB) (http://www.npaci.edu/DICE/SRB). This DSI will enable GridFTP compliant clients to read and write data to an SRB server, similar in functionality to the sput/sget commands.
This DSI is currently in testing and is not yet publicly available, but will be available from both the SRB web site and the Globus web site. It will also be included in the next stable release of the toolkit. We are working on performance tests, but early results indicate that for wide area network (WAN) transfers, the performance is comparable.
You might want to use this functionality when:
- You have existing tools that use GridFTP clients and you want to access data that is in SRB.
- You have distributed data sets that have some of the data in SRB and some of the data available from GridFTP servers.
Pipelining allows the client to have many outstanding, unacknowledged transfer commands at once. Instead of being forced to wait for the "Finished response" message, the client is free to send transfer commands at any time.
Pipelining is enabled by using the -pp option:
globus-url-copy -pp
GridFTP Where There Is FTP (GWTFTP) is an intermediate program that acts as a proxy between existing FTP clients and GridFTP servers. Users can connect to GWTFTP with their favorite standard FTP client, and GWTFTP will then connect to a GridFTP server on the client’s behalf. To clients, GWTFTP looks much like an FTP proxy server. When wishing to contact a GridFTP server, FTP clients instead contact GWTFTP.
Clients tell GWTFTP their ultimate destination via the FTP USER <username> command. Instead of entering their username, client users send the following:
USER <GWTFTP username>::<GridFTP server URL>
This command tells GWTFTP the GridFTP endpoint with which the client wants to communicate. For example:
USER bresnaha::gsiftp://wiggum.mcs.anl.gov:2811/
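A script that drives a standard FTP client through GWTFTP needs to assemble and, on the proxy side of a test harness, take apart this USER argument. The sketch below shows one way to do both; the function names are hypothetical, and the only format assumed is the username::URL form shown above.

```python
def gwtftp_user_command(gwtftp_username, gridftp_url):
    """Build the USER command that names the GridFTP endpoint."""
    return f"USER {gwtftp_username}::{gridftp_url}"


def split_user_command(command):
    """Return (username, url) from a command built as above."""
    verb, _, argument = command.partition(" ")
    if verb != "USER":
        raise ValueError("not a USER command: " + command)
    # Split on the first "::" only; the URL itself contains "://".
    username, separator, url = argument.partition("::")
    if not separator:
        raise ValueError("missing '::' separator: " + command)
    return username, url


if __name__ == "__main__":
    print(gwtftp_user_command("bresnaha", "gsiftp://wiggum.mcs.anl.gov:2811/"))
```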
![]() | Note |
|---|---|
Requires GSI C security. |
To transfer a single file to many destinations in a multicast/broadcast, use the new -mc option.
![]() | Note |
|---|---|
To use this option, the admin must enable multicasting. Click here for more information. |
globus-url-copy -vb -tcp-bs 2097152 -p 4 -mc filename source_url
The filename must contain a line-separated list of destination URLs. For example:
gsiftp://localhost:5000/home/user/tst1
gsiftp://localhost:5000/home/user/tst3
gsiftp://localhost:5000/home/user/tst4
For more flexibility, you can also specify a single destination URL on the command line in addition to the URLs in the file. Examples are:
globus-url-copy -mc multicast.file gsiftp://localhost/home/user/src_file
or
globus-url-copy -mc multicast.file gsiftp://localhost/home/user/src_file gsiftp://localhost/home/user/dest_file1
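The destination file is plain text with one URL per line, so it is easy to generate from a list. The following is a minimal sketch; the function name and the temporary file location are illustrative, and the file name multicast.file is taken from the examples above.

```python
import os
import tempfile


def write_multicast_file(path, destination_urls):
    """Write one destination URL per line, as the multicast option expects."""
    with open(path, "w", encoding="ascii") as handle:
        for url in destination_urls:
            handle.write(url + "\n")


if __name__ == "__main__":
    destinations = [
        "gsiftp://localhost:5000/home/user/tst1",
        "gsiftp://localhost:5000/home/user/tst3",
        "gsiftp://localhost:5000/home/user/tst4",
    ]
    # Written to a scratch directory here; any writable path will do.
    target = os.path.join(tempfile.mkdtemp(), "multicast.file")
    write_multicast_file(target, destinations)
    print(target)
```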
Along with specifying the list of destination URLs in a file, a set of options for each URL can be specified. This is done by appending a ? to the resource string in the URL followed by semicolon-separated key=value pairs. For example:
gsiftp://dst1.domain.com:5000/home/user/tst1?cc=1;tcpbs=10M;P=4
This indicates that the receiving host dst1.domain.com will use 4 parallel streams, a TCP buffer size of 10 MB, and will select 1 host when forwarding on data blocks. This URL is specified in the -mc file as described above.
The following is a list of key=value options and their meanings:
- P=
integer - The number of parallel streams this node will use when forwarding.
- cc=
integer - The number of urls to which this node will forward data.
- tcpbs=
formatted integer - The TCP buffer size this node will use when forwarding.
- urls=
string list - The list of urls that must be children of this node when the spanning tree is complete.
- local_write=
boolean: y|n - Determines if this data will be written to a local disk, or just forwarded on to the next hop. This is explained more in the Network Overlay section.
- subject=
string - The DN name to expect from the servers this node is connecting to.
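Because the per-URL options are just semicolon-separated key=value pairs after a ?, they can be attached to and read back from a destination URL with a few lines of string handling. This is a sketch under that assumption only; the function names are hypothetical and no validation of keys or values is attempted.

```python
def join_destination(base_url, options):
    """Append key=value options to a destination URL."""
    if not options:
        return base_url
    pairs = ";".join(f"{key}={value}" for key, value in options.items())
    return f"{base_url}?{pairs}"


def split_destination(url):
    """Return (base_url, options) for a destination URL with options."""
    base_url, _, option_text = url.partition("?")
    options = {}
    for pair in option_text.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            options[key] = value
    return base_url, options


if __name__ == "__main__":
    url = join_destination(
        "gsiftp://dst1.domain.com:5000/home/user/tst1",
        {"cc": 1, "tcpbs": "10M", "P": 4},
    )
    print(url)
    print(split_destination(url))
```

Values come back as strings; converting a formatted integer such as 10M is left to the caller.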
In addition to allowing multicast, this function also allows for creating user-defined network routes.
If the local_write option is set to n,
then no data will be written to the local disk, the data will only be forwarded on.
If the local_write option is set to n and is used with the cc=1 option, the data will be
forwarded on to exactly 1 location.
This allows the user to create a network overlay of data hops using each GridFTP server as a router to the ultimate destination.
The Java client commands, rft and rft-delete, are available for very simple transfers. For more options, use the programming instructions here.
Beginning with 4.2.0, RFT also offers a new C client, globus-crft.
To submit a transfer request, the user must first create a 'transfer file'. Each line of this ASCII text file is a source/destination URL pair. There can be any number of lines per file. An example file follows:
gsiftp://localhost:2811/etc/group gsiftp://localhost:2811/tmp/test_crft
gsiftp://ftp.globus.org:2811/pub/README gsiftp://myhost.here/home/user/file
This file requests two transfers. The first will use the GridFTP server running on the localhost to transfer /etc/group to /tmp/test_crft. The second will transfer the file /pub/README on ftp.globus.org to the file /home/user/file located on myhost.here.
Once the transfer file is created, globus-crft can be used in a variety of ways to transfer a file. The simplest is the blocking transfer:
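A transfer file like the one above can be produced and checked programmatically. The sketch below assumes, based only on the example, that the two URLs on a line are separated by whitespace; the function names are hypothetical.

```python
def format_transfer_file(pairs):
    """Return transfer file text: one 'source destination' pair per line."""
    return "".join(f"{source} {destination}\n" for source, destination in pairs)


def parse_transfer_file(text):
    """Return the (source, destination) pairs found in transfer file text."""
    pairs = []
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected 2 URLs, found {len(fields)}")
        pairs.append((fields[0], fields[1]))
    return pairs


if __name__ == "__main__":
    text = format_transfer_file([
        ("gsiftp://localhost:2811/etc/group", "gsiftp://localhost:2811/tmp/test_crft"),
        ("gsiftp://ftp.globus.org:2811/pub/README", "gsiftp://myhost.here/home/user/file"),
    ])
    print(text, end="")
```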
% globus-crft -c -s -m -vb -f <transfer file> -e <container contact string>
Looking at each option individually, this command line does the following:
- -c
Create a new RFT server.
- -s
Submit the transfer request. Since RFT is a 2 phase commit, we allow the client the ability to do them in separate stages; however, it is expected that the vast majority of the time -c and -s will be used together.
- -m
Monitor the transfers. When this option is used the client will block until all transfers have completed. It monitors the status of the transfers along the way and can report it to the user.
- -vb
Display verbose output. This just increases the level of diagnostic messages sent to stdout. When combined with -m it will allow the user to see the status of a transfer.
- -f <transfer file>
This option is a pointer to the transfer file described above.
- -e <container contact string>
The contact string is in the following form:
https://hostname.com:8443/wsrf/services/
The strings ___ and ___ will be appended to the given string in order for the client to interact with that container's delegation service and RFT service.
The client can do non-blocking RFT submission. It can submit an RFT request and then terminate, returning later to monitor the status of the request. To accomplish this the client saves the EPR of the newly created RFT service to disk.
% globus-crft -c -s -f <transfer file> -e <container contact string> \
-ef <epr output file>
At some point later the client uses this same file to monitor the state of the transfer:
% globus-crft -ef <epr input file> --getOverallStatus
![]() | Note |
|---|---|
Note that in both cases the option |
Once a transfer request completes, the user should destroy the resources associated with it. If the user stored the EPR of the service it created, this can be done with:
% globus-crft -ef <epr input file> --destroy
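The blocking and non-blocking workflows above use the same small set of options, so a wrapper script can generate each step's command line. The sketch below is illustrative: the function names are hypothetical, and only options that appear in the commands above are used.

```python
def crft_submit(transfer_file, contact_string, epr_file=None, monitor=True):
    """Create and submit a request; optionally monitor it or save its EPR."""
    args = ["globus-crft", "-c", "-s"]
    if monitor:
        args += ["-m", "-vb"]  # block and report status until finished
    args += ["-f", transfer_file, "-e", contact_string]
    if epr_file is not None:
        args += ["-ef", epr_file]  # save the EPR for later status checks
    return args


def crft_status(epr_file):
    """Ask for the overall status of a previously submitted request."""
    return ["globus-crft", "-ef", epr_file, "--getOverallStatus"]


def crft_destroy(epr_file):
    """Destroy the resources of a completed request."""
    return ["globus-crft", "-ef", epr_file, "--destroy"]


if __name__ == "__main__":
    contact = "https://hostname.com:8443/wsrf/services/"
    # Non-blocking workflow: submit, check later, then clean up.
    for step in (crft_submit("transfers.txt", contact, "request.epr", monitor=False),
                 crft_status("request.epr"),
                 crft_destroy("request.epr")):
        print(" ".join(step))
```

The file names transfers.txt and request.epr are placeholders for the transfer file and EPR file described above.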
To check whether your server is active you may use the globus-rls-admin(1) ping command.
% $GLOBUS_LOCATION/bin/globus-rls-admin -p rls://localhost
ping rls://localhost: 0 seconds
When the RLS server is first installed its database of replica location information will be empty, as expected. To create a replica location mapping, use the globus-rls-cli(1) create command. Replica information in RLS is represented as mappings from logical names to target names. Typically, the logical name will be a unique identifier for a given replicated data set and the target name will be a URL identifying a particular replica of the data set.
% $GLOBUS_LOCATION/bin/globus-rls-cli create my-logical-name-1 url-for-target-name-1 rls://localhost
![]() | Note |
|---|---|
The create command is intended for creating the initial replica mapping entry for a given logical name. If the user attempts to create another entry using an existing logical name, RLS will report a user error. To map additional target names to an existing logical name, see Section 4, “Adding replica location mappings”. |
To map additional target names to a logical name created by the previously described create command, use the globus-rls-cli(1) add command.
% $GLOBUS_LOCATION/bin/globus-rls-cli add my-logical-name-1 url-for-target-name-2 rls://localhost
Once your RLS server is populated with replica location mappings, you can query the server for useful information using the globus-rls-cli(1) query command.
% $GLOBUS_LOCATION/bin/globus-rls-cli query lrc lfn my-logical-name-1 rls://localhost
my-logical-name-1: url-for-target-name-1
my-logical-name-1: url-for-target-name-2
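When the query command is run from a script, its output can be collected into a dictionary of logical names and their target names. The sketch below assumes only the "logical-name: target-name" line format shown above; the function name is hypothetical.

```python
def parse_query_output(text):
    """Map each logical name to the list of target names printed for it."""
    mappings = {}
    for line in text.splitlines():
        # Target URLs contain "://", never ": ", so this split is unambiguous.
        logical, separator, target = line.partition(": ")
        if not separator:
            continue  # skip lines that are not 'name: value'
        mappings.setdefault(logical.strip(), []).append(target.strip())
    return mappings


if __name__ == "__main__":
    sample = (
        "my-logical-name-1: url-for-target-name-1\n"
        "my-logical-name-1: url-for-target-name-2\n"
    )
    print(parse_query_output(sample))
```

Error messages use the same "name: text" shape, so a caller should check the command's exit status before trusting the parsed result.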
To remove unwanted replica location mappings from your RLS server, use the globus-rls-cli(1) delete command. The delete operation works directly on the mapping and indirectly on the logical and target names. When the delete operation is performed by the RLS server the association between the specified logical name and the specified target name is eliminated. However, there may still be other target names associated with the logical name, and there could still be other logical names associated with the target name, though the latter scenario is less likely. Only when all mapping associations for a given logical name (or a given target name) are eliminated (i.e., the specified logical name has no target names associated with it) will the logical (or target) name be deleted from the RLS server.
% $GLOBUS_LOCATION/bin/globus-rls-cli delete my-logical-name-1 url-for-target-name-1 rls://localhost
% $GLOBUS_LOCATION/bin/globus-rls-cli query lrc lfn my-logical-name-1 rls://localhost
my-logical-name-1: url-for-target-name-2
% $GLOBUS_LOCATION/bin/globus-rls-cli delete my-logical-name-1 url-for-target-name-2 rls://localhost
% $GLOBUS_LOCATION/bin/globus-rls-cli query lrc lfn my-logical-name-1 rls://localhost
globus_rls_client: LFN doesn't exist: my-logical-name-1
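The create, add, and delete rules described above can be summarized with a toy in-memory model. It is not how the RLS server is implemented and it tracks only the logical-name side of each mapping; the class and method names are hypothetical.

```python
class MappingCatalog:
    """Toy model of logical-name to target-name mappings."""

    def __init__(self):
        self._targets = {}

    def create(self, logical, target):
        # create is only for the first mapping of a logical name
        if logical in self._targets:
            raise KeyError("LFN already exists: " + logical)
        self._targets[logical] = [target]

    def add(self, logical, target):
        # add maps another target to an existing logical name
        if logical not in self._targets:
            raise KeyError("LFN doesn't exist: " + logical)
        self._targets[logical].append(target)

    def delete(self, logical, target):
        if logical not in self._targets:
            raise KeyError("LFN doesn't exist: " + logical)
        self._targets[logical].remove(target)
        # The logical name disappears with its last mapping.
        if not self._targets[logical]:
            del self._targets[logical]

    def query(self, logical):
        if logical not in self._targets:
            raise KeyError("LFN doesn't exist: " + logical)
        return list(self._targets[logical])


if __name__ == "__main__":
    catalog = MappingCatalog()
    catalog.create("my-logical-name-1", "url-for-target-name-1")
    catalog.add("my-logical-name-1", "url-for-target-name-2")
    catalog.delete("my-logical-name-1", "url-for-target-name-1")
    print(catalog.query("my-logical-name-1"))
```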
The globus-rls-cli(1) supports a variety of bulk operations that enhance productivity for users and reduce network connection overhead from making multiple, separate invocations of the client. The general pattern for bulk operation support as implemented by the client is a parameter list consisting of bulk command-name [command-modifiers] param-1 param-2 param-N, such as bulk query lrc lfn my-logical-name-1 my-logical-name-2 my-logical-name-3.
% $GLOBUS_LOCATION/bin/globus-rls-cli bulk create my-logical-name-1 url-for-target-name-1-1 my-logical-name-2 url-for-target-name-2-1 rls://localhost
% $GLOBUS_LOCATION/bin/globus-rls-cli bulk add my-logical-name-1 url-for-target-name-1-2 my-logical-name-2 url-for-target-name-2-2 rls://localhost
% $GLOBUS_LOCATION/bin/globus-rls-cli bulk query lrc lfn my-logical-name-1 my-logical-name-2 my-logical-name-3 rls://localhost
my-logical-name-3: LFN doesn't exist
my-logical-name-2: url-for-target-name-2-1
my-logical-name-2: url-for-target-name-2-2
my-logical-name-1: url-for-target-name-1-1
my-logical-name-1: url-for-target-name-1-2
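For bulk create, add, and delete, the parameter list is a flat sequence of logical-name/target-name pairs followed by the server URL. The sketch below flattens a list of pairs into that argument order; the function name is hypothetical and the command is assumed to be on the PATH.

```python
def bulk_arguments(command, pairs, server_url):
    """Return globus-rls-cli arguments for a bulk create, add, or delete."""
    args = ["globus-rls-cli", "bulk", command]
    for logical_name, target_name in pairs:
        args += [logical_name, target_name]  # pairs are passed flattened
    args.append(server_url)
    return args


if __name__ == "__main__":
    pairs = [
        ("my-logical-name-1", "url-for-target-name-1-1"),
        ("my-logical-name-2", "url-for-target-name-2-1"),
    ]
    print(" ".join(bulk_arguments("create", pairs, "rls://localhost")))
```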
The globus-rls-cli(1) supports an interactive mode in addition to the general command-line mode. To enter the interactive mode, simply invoke the client without any command.
% $GLOBUS_LOCATION/bin/globus-rls-cli rls://localhost
rls> query lrc lfn my-logical-name-2
my-logical-name-2: url-for-target-name-2-1
my-logical-name-2: url-for-target-name-2-2
rls> query lrc lfn my-logical-name-1
my-logical-name-1: url-for-target-name-1-1
my-logical-name-1: url-for-target-name-1-2
rls> bulk delete my-logical-name-1 url-for-target-name-1-1 my-logical-name-1
url-for-target-name-1-2 my-logical-name-2 url-for-target-name-2-1
my-logical-name-2 url-for-target-name-2-2
rls> bulk query lrc lfn my-logical-name-2 my-logical-name-1
my-logical-name-1: LFN doesn't exist
my-logical-name-2: LFN doesn't exist
rls> exit
Use the globus-replicalocation-createmappings(1) tool to create mappings.
% $GLOBUS_LOCATION/bin/globus-replicalocation-createmappings \
-s https://localhost:8443/wsrf/services/ReplicaLocationCatalogService \
mydata1 gsiftp://path/a/to/mydata1
No output is expected from this command when successful.
Use the globus-replicalocation-addmappings(1) tool to add mappings.
% $GLOBUS_LOCATION/bin/globus-replicalocation-addmappings \
-s https://localhost:8443/wsrf/services/ReplicaLocationCatalogService \
mydata1 gsiftp://path/b/to/mydata1
No output is expected from this command when successful.
Use the globus-replicalocation-defineattributes(1) tool to define attribute definitions.
% $GLOBUS_LOCATION/bin/globus-replicalocation-defineattributes \
-s https://localhost:8443/wsrf/services/ReplicaLocationCatalogService \
myattr1 logical string
No output is expected from this command when successful.
Use the globus-replicalocation-addattributes(1) tool to add attributes.
% $GLOBUS_LOCATION/bin/globus-replicalocation-addattributes \
-s https://localhost:8443/wsrf/services/ReplicaLocationCatalogService \
mydata1 myattr1 logical string attribute-value-goes-here
No output is expected from this command when successful.
Use the wsrf-query tool to query mappings.
% $GLOBUS_LOCATION/bin/wsrf-query \
-s https://localhost:8443/wsrf/services/ReplicaLocationCatalogService \
"query-target: mydata1" \
"http://globus.org/replica/location/06/01/QueryDialect"
<ns1:MappingStatusType ns1:logical="mydata1"
ns1:target="gsiftp://path/a/to/mydata1"
xmlns:ns1="http://www.globus.org/namespaces/2005/08/replica/location"/>
<ns1:MappingStatusType ns1:logical="mydata1"
ns1:target="gsiftp://path/b/to/mydata1"
xmlns:ns1="http://www.globus.org/namespaces/2005/08/replica/location"/>
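The query result is a sequence of XML elements rather than a single document, so a script has to wrap it in a root element before parsing. The sketch below does that with the Python standard library and extracts the logical and target attributes; the function name and the wrapper element name are hypothetical, and the input is assumed to contain only elements like those shown above.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example output above.
LOCATION_NS = "http://www.globus.org/namespaces/2005/08/replica/location"


def parse_mappings(query_output):
    """Return (logical, target) pairs from wsrf-query mapping output."""
    # Wrap the element sequence so it parses as one well-formed document.
    root = ET.fromstring("<results>" + query_output + "</results>")
    pairs = []
    for element in root.iter(f"{{{LOCATION_NS}}}MappingStatusType"):
        pairs.append((
            element.get(f"{{{LOCATION_NS}}}logical"),
            element.get(f"{{{LOCATION_NS}}}target"),
        ))
    return pairs


if __name__ == "__main__":
    sample = (
        '<ns1:MappingStatusType ns1:logical="mydata1" '
        'ns1:target="gsiftp://path/a/to/mydata1" '
        f'xmlns:ns1="{LOCATION_NS}"/>'
    )
    print(parse_mappings(sample))
```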
Use the wsrf-query tool to query attributes.
% $GLOBUS_LOCATION/bin/wsrf-query \
-s https://localhost:8443/wsrf/services/ReplicaLocationCatalogService \
"query-logical-attributes: mydata1" \
"http://globus.org/replica/location/06/01/QueryDialect"
<ns1:AttributeStatusType ns1:key="mydata1" ns1:name="myattr1"
ns1:objtype="logical" ns1:status="attributeExists" ns1:valtype="string"
xmlns:ns1="http://www.globus.org/namespaces/2005/08/replica/location">
<_value xmlns="">attribute-value-goes-here</_value>
</ns1:AttributeStatusType>
A key parameter for any replication request is the request file. The replication request file is a text file containing CRLF-terminated rows of tab-delimited pairs of Logical Filename (LFN) names and destination (URL) locations. An example of such a file is shown.
% cat testrun.req
testrun-1 gsiftp://myhost:9001/sandbox/files/testrun-1
testrun-2 gsiftp://myhost:9001/sandbox/files/testrun-2
testrun-3 gsiftp://myhost:9001/sandbox/files/testrun-3
testrun-4 gsiftp://myhost:9001/sandbox/files/testrun-4
testrun-5 gsiftp://myhost:9001/sandbox/files/testrun-5
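Since the request file format is strict about its delimiters (a tab between the two fields and CRLF at the end of each row), it is safer to generate it than to type it. The following is a minimal sketch; the function names are hypothetical.

```python
def format_request(entries):
    """Return request file text: 'LFN<TAB>URL' rows terminated by CRLF."""
    return "".join(f"{lfn}\t{url}\r\n" for lfn, url in entries)


def write_request(path, entries):
    """Write the request file without any newline translation."""
    # newline="" keeps the explicit CRLF terminators exactly as written.
    with open(path, "w", encoding="ascii", newline="") as handle:
        handle.write(format_request(entries))


if __name__ == "__main__":
    entries = [
        (f"testrun-{n}", f"gsiftp://myhost:9001/sandbox/files/testrun-{n}")
        for n in range(1, 6)
    ]
    print(repr(format_request(entries)))
```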
![]() | Note |
|---|---|
The LFNs in the left column of the request file (e.g., |
The initial step for any replication is to create the replication resource. Creating the resource depends on the availability of a DRS service, a delegated credential, and a properly formatted replication request file. The replication request file must be specified by its URL. Currently supported URL schemes for the request file include file, http, and ftp. If the replication client is run local to the service the file scheme is appropriate, whereas if the client is remote then the latter schemes must be used. It is good practice to specify a filename to save the replication resource's endpoint reference. The endpoint reference is required for all other operations on the resource, such as getting resource properties, starting/stopping, and destroying it. Numerous options are available to influence the behavior of the data replication activities (see globus-replication-create(1)). One option of particular interest is the --start option, which immediately starts the replication activities following creation of the replication resource. An example of using the globus-replication-create(1) tool is shown.
% $GLOBUS_LOCATION/bin/globus-replication-create -s \
https://myhost:8443/wsrf/services/ReplicationService \
-C mycredential.epr -V myreplicator.epr file:///scratch/testrun.req
This command does not write to stdout when successful unless the --debug option is specified.
Once a replication resource has been created, the replication activities may be started. As mentioned in Create replication resource, the replication may be started immediately after it is created. If the immediate start option is not specified, the globus-replication-start(1) tool must be used to start the replication.
% $GLOBUS_LOCATION/bin/globus-replication-start -e myreplicator.epr
No output is expected from this command when successful.
Throughout the lifecycle and after the completion of the replication resource, it will be important to look up its Resource Properties. The standard WS-RF port types are supported and the supplied tools (e.g., wsrf-get-property) may be used with the DRS and its resources.
% $GLOBUS_LOCATION/bin/wsrf-get-property -e myreplicator.epr \
"{http://www.globus.org/namespaces/2005/05/replica/replicator}status"
<ns1:status xmlns:ns1="http://www.globus.org/namespaces/2005/05/replica/replicator">
Active</ns1:status>
And,
% $GLOBUS_LOCATION/bin/wsrf-get-property -e myreplicator.epr \
"{http://www.globus.org/namespaces/2005/05/replica/replicator}count"
<ns1:count xmlns:ns1="http://www.globus.org/namespaces/2005/05/replica/replicator">
<ns1:total>10</ns1:total>
<ns1:finished>0</ns1:finished>
<ns1:failed>0</ns1:failed>
<ns1:terminated>0</ns1:terminated>
</ns1:count>
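A monitoring script can turn the count property into numbers and estimate how many items are still outstanding. The sketch below parses the XML shown above with the Python standard library. Treating finished, failed, and terminated as the only final states is an assumption made for illustration, not something this guide states, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_count(xml_text):
    """Return the counters of a count resource property as integers."""
    root = ET.fromstring(xml_text)
    counts = {}
    for child in root:
        # Tags look like '{namespace}total'; keep only the local name.
        local_name = child.tag.split("}", 1)[-1]
        counts[local_name] = int(child.text)
    return counts


def outstanding(counts):
    """Items not yet finished, failed, or terminated (assumed final states)."""
    done = counts["finished"] + counts["failed"] + counts["terminated"]
    return counts["total"] - done


if __name__ == "__main__":
    sample = (
        '<ns1:count xmlns:ns1="http://www.globus.org/namespaces/2005/05/replica/replicator">'
        "<ns1:total>10</ns1:total><ns1:finished>0</ns1:finished>"
        "<ns1:failed>0</ns1:failed><ns1:terminated>0</ns1:terminated>"
        "</ns1:count>"
    )
    counts = parse_count(sample)
    print(counts, outstanding(counts))
```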
Throughout the lifecycle and after the completion of the replication resource, it may be helpful to find individual replication items in the replication resource to inspect the detailed status of the replication activities. The globus-replication-finditems(1) tool is used to find replication items. The following example demonstrates the usage when finding a limited number of items, offset into the lookup results set, for a specified status.
% $GLOBUS_LOCATION/bin/globus-replication-finditems -e myreplicator.epr -S Pending -O 1 -L 2
<ns1:FindItemsResponse xmlns:ns1="http://www.globus.org/namespaces/2005/05/replica/replicator">
<ns1:items xsi:type="ns1:ReplicationItemType" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ns1:uri xsi:type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema">testrun-2</ns1:uri>
<ns1:priority xsi:type="xsd:int" xmlns:xsd="http://www.w3.org/2001/XMLSchema">1000</ns1:priority>
<ns1:status xsi:type="ns1:ReplicationItemStatusEnumerationType">Pending</ns1:status>
<ns1:destinations xsi:type="ns1:DestinationType">
<ns1:uri xsi:type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
gsiftp://myhost:9001/sandbox/files/testrun-2</ns1:uri>
<ns1:status xsi:type="ns1:DestinationStatusEnumerationType">Pending</ns1:status>
</ns1:destinations>
</ns1:items>
<ns1:items xsi:type="ns1:ReplicationItemType" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ns1:uri xsi:type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema">testrun-3</ns1:uri>
<ns1:priority xsi:type="xsd:int" xmlns:xsd="http://www.w3.org/2001/XMLSchema">1000</ns1:priority>
<ns1:status xsi:type="ns1:ReplicationItemStatusEnumerationType">Pending</ns1:status>
<ns1:destinations xsi:type="ns1:DestinationType">
<ns1:uri xsi:type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
gsiftp://myhost:9001/sandbox/files/testrun-3</ns1:uri>
<ns1:status xsi:type="ns1:DestinationStatusEnumerationType">Pending</ns1:status>
</ns1:destinations>
</ns1:items>
</ns1:FindItemsResponse>
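The response above is verbose, so a script will usually reduce it to the fields it needs. The sketch below extracts each item's URI, status, and destinations using the Python standard library. The function name is hypothetical, and the embedded sample is a trimmed version of the output above with the xsi:type attributes and the priority element omitted.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example output above.
NS = {"r": "http://www.globus.org/namespaces/2005/05/replica/replicator"}

SAMPLE = """<ns1:FindItemsResponse xmlns:ns1="http://www.globus.org/namespaces/2005/05/replica/replicator">
<ns1:items>
<ns1:uri>testrun-2</ns1:uri>
<ns1:status>Pending</ns1:status>
<ns1:destinations>
<ns1:uri>
gsiftp://myhost:9001/sandbox/files/testrun-2</ns1:uri>
<ns1:status>Pending</ns1:status>
</ns1:destinations>
</ns1:items>
</ns1:FindItemsResponse>"""


def parse_items(xml_text):
    """Return one dictionary per replication item in the response."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("r:items", NS):
        destinations = [
            (dest.findtext("r:uri", namespaces=NS).strip(),
             dest.findtext("r:status", namespaces=NS).strip())
            for dest in item.findall("r:destinations", NS)
        ]
        items.append({
            # 'r:uri' matches direct children only, not destination URIs
            "uri": item.findtext("r:uri", namespaces=NS).strip(),
            "status": item.findtext("r:status", namespaces=NS).strip(),
            "destinations": destinations,
        })
    return items


if __name__ == "__main__":
    for entry in parse_items(SAMPLE):
        print(entry)
```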
When the replication is complete, the replication resource may be destroyed. Destroying the replication resource will help to free up system resources (namely, memory), especially in the case that the replication entailed a large number of individual replication activities (i.e., many files specified in the replication request file). The standard WS-RF port types are supported and the supplied wsrf-destroy tool may be used to destroy the DRS resource.
% $GLOBUS_LOCATION/bin/wsrf-destroy -e myreplicator.epr
Destroy operation was successful