GT 3.9.4 Component Guide to Public Interfaces: GridFTP
- Semantics and syntax of APIs
- Semantics and syntax of WSDL
- Command-line tools
- GUIs
- Description of domain-specific interface data
- Configuration settings
- Environment variables
Semantics and syntax of APIs
Programming Model Overview
The Globus FTP Client library provides a convenient way of accessing files on remote FTP servers. In addition to supporting the basic FTP protocol, the FTP Client library supports several security and performance extensions to make FTP more suitable for Grid applications. These extensions are described in the Grid FTP Protocol document.
In addition to protocol support for grid applications, the FTP Client library provides a plugin architecture for installing application or grid-specific fault recovery and performance tuning algorithms within the library. Application writers may then target their code toward the FTP Client library, and by simply enabling the appropriate plugins, easily tune their application to run it on a different grid.
All applications which use the Globus FTP Client API must include the header file "globus_ftp_client.h" and activate the GLOBUS_FTP_CLIENT_MODULE.
To use the Globus FTP Client API, one must create an FTP Client handle. This structure contains context information about FTP operations which are being executed, a cache of FTP control and data connections, and information about plugins which are being used. The specifics of the connection caching and plugins are found in the "Handle Attributes" section of the API documentation.
Once the handle is created, one may begin transferring files or doing other FTP operations by calling the functions in the "FTP Operations" section of the API documentation. In addition to whole-file transfers, the API supports partial file transfers, restarting transfers from a known point, and various FTP directory management commands. All FTP operations may have a set of attributes, defined in the operationattr section, associated with them to tune various FTP parameters. The data structures and functions needed to restart a file transfer are described in the "Restart Markers" section of the API documentation.
For operations which require the user to send data to or receive data from an FTP server, the user must call the functions in the "globus_ftp_client_data" section of the manual.
The globus_ftp_control library provides low-level services needed to implement FTP clients and servers. The API provided is protocol-specific. The data transfer portion of this API provides support for the standard data methods described in the FTP Specification as well as extensions for parallel, striped, and partial data transfer.
Component API
For information on the internationalization API, see the C Common Libraries Public Interface.
Semantics and syntax of the WSDL
GridFTP has no WSDL as it is not Web Service based at this time.
Command-line tools
globus-url-copy for GridFTP
Tool description
globus-url-copy is a scriptable command-line tool that can do multi-protocol data movement. It supports gsiftp:// (GridFTP), ftp://, http://, https://, and file:/// protocol specifiers in the URL. For GridFTP, globus-url-copy supports all implemented functionality. Versions GT 3.2 and later support file globbing and directory moves.
Before you begin
YOU MUST HAVE A CERTIFICATE TO USE globus-url-copy!
1. First, as with all things Grid, you must have a valid proxy certificate to run globus-url-copy. If you do not have a certificate, you must obtain one. If you are doing this for testing in your own environment, the Simple CA provided with the Globus Toolkit should suffice. If not, you must contact the Virtual Organization (VO) with which you are associated to see from whom you should request a certificate. One common source is the DOE Science Grid CA, although you must confirm whether or not the resources you wish to access will accept their certificates. Instructions for proper installation of the certificate should be provided by the source of the certificate.
2. Now that you have a certificate, you must generate a temporary proxy. Do this by running:
    grid-proxy-init
   Further documentation for grid-proxy-init can be found here.
3. You are now ready to use globus-url-copy! See the following sections for syntax and command line options.
Command syntax
The basic syntax for globus-url-copy is:
globus-url-copy [optional command line switches] <sourceURL> <destURL>
where:
- [optional command line switches]
  See Command line options below for a list of available options.
- <sourceURL>
  Specifies the original URL of the file(s) to be copied. If this is a directory, all files within that directory will be copied.
- <destURL>
  Specifies the URL where you want to copy the files. If you want to copy multiple files, this must be a directory.
Note: Any URL specifying a directory must end with /
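The syntax rules above can be checked before the tool is ever invoked. The following is a minimal sketch, not part of the Globus Toolkit: build_copy_command is a hypothetical helper that assembles the argument list in the documented order and enforces the trailing-slash rule. It assumes that a source URL ending in / always means a multi-file copy, and it does not run globus-url-copy itself.

```python
def build_copy_command(source_url, dest_url, switches=()):
    """Assemble a globus-url-copy argument list (hypothetical helper).

    Applies the rule from the text: a directory URL ends with "/",
    and copying multiple files requires a directory destination.
    """
    if source_url.endswith("/") and not dest_url.endswith("/"):
        raise ValueError(
            "copying a directory requires a destination URL ending in '/'"
        )
    # Documented order: optional switches, then source, then destination.
    return ["globus-url-copy", *switches, source_url, dest_url]


if __name__ == "__main__":
    print(build_copy_command(
        "gsiftp://myhost.mydomain.com/data/",
        "file:///tmp/data/",
    ))
```

The returned list can be handed to a process launcher such as subprocess.run once a valid proxy exists.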
URL prefixes
As of GT 3.2, we support the following URL prefixes:
- file:// (on a local machine only)
- ftp://
- gsiftp://
- http://
- https://
By default, globus-url-copy expects the same kind of host certificates that globusrun expects from gatekeepers.
Note: We do not provide an interactive client similar to the generic FTP client provided with Linux. See Interactive Client for information on an interactive client developed by NCSA / NMI / TeraGrid .
URL formats
A URL can be any valid URL, as defined by RFC 1738, that uses a protocol we support. In general, URLs have the following format:
protocol://[host]:[port]/path
For example:
- gsiftp://myhost.mydomain.com:2812/data/foo.dat
  Fully specified.
- http://myhost.mydomain.com/mywebpage/default.html
  Port not specified, so the protocol default is used (80 in this case).
- file:///foo.dat
  Host not specified, so your local host is used; port not specified, as before.
- file:/foo.dat
  This form is also valid, but is not recommended.
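To make the defaulting rules in these examples concrete, here is a small sketch that splits each example URL with Python's standard urllib.parse. It is an illustration only, not how globus-url-copy parses URLs. The describe_url helper and the DEFAULT_PORTS table are hypothetical; the value 80 for http comes from the text, while the other port numbers are the well-known defaults and are an addition here.

```python
from urllib.parse import urlsplit

# Default ports. 80 for http is stated in the text; the others are the
# well-known defaults and are not taken from this guide.
DEFAULT_PORTS = {"ftp": 21, "gsiftp": 2811, "http": 80, "https": 443}


def describe_url(url):
    """Split a URL into (protocol, host, port, path), applying defaults."""
    parts = urlsplit(url)
    # No host given (file:///... or file:/...): assume the local machine.
    host = parts.hostname or "localhost"
    # No port given: fall back to the protocol default (None for file).
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, host, port, parts.path


if __name__ == "__main__":
    for example in (
        "gsiftp://myhost.mydomain.com:2812/data/foo.dat",
        "http://myhost.mydomain.com/mywebpage/default.html",
        "file:///foo.dat",
        "file:/foo.dat",
    ):
        print(describe_url(example))
```

Both file forms yield the same path, which is why the shorter form is accepted even though it is not recommended.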
Note: For FTP URLs, it is legal to specify a user name and password in the URL as follows:
ftp://myname:mypassword@myhost.mydomain.com/foo.dat
This is highly discouraged, as you will be sending your username and password in plain text over the network. For servers provided in the Globus Toolkit, username and password are not a permitted authentication method, so this format will result in an error. The exception to this is anonymous FTP access.
Command line options
Notes about globus-url-copy
- A globus-url-copy using the gsiftp protocol with no options (that is, using all the defaults) will do a binary, stream mode (which implies no parallelism) transfer, with whatever the host default TCP buffer size is, an encrypted and checksummed control channel, and an authenticated data channel.
- GridFTP (as well as normal FTP) defines multiple wire protocols, or MODES, for the data channel.
  Most normal FTP servers only implement stream mode, i.e. the bytes flow in order over a single TCP connection. GridFTP defaults to this mode so that it is compatible with normal FTP servers.
  However, GridFTP has another MODE, called Extended Block Mode, or MODE E. This mode sends the data over the data channel in blocks. Each block consists of 8 bits of flags, a 64 bit integer indicating the offset from the start of the transfer, and a 64 bit integer indicating the length of the block in bytes, followed by a payload of length bytes. Because the offset and length are provided, out of order arrival is acceptable, i.e., the 10th block could arrive before the 9th because you know explicitly where it belongs. This allows us to use multiple TCP channels. If you use the -p | -parallelism option, globus-url-copy automatically puts the servers into MODE E.
  Note: Putting -p 1 is not the same as no -p at all. Both will use a single stream, but the default will use stream mode and -p 1 will use MODE E.
- For more information on TCP buffer sizes and related information, try <here>.
- If you run a GridFTP server by hand, you will need to explicitly specify the subject name to expect. You can use the -ss flag to set the sourceURL subject, and -ds to set the destURL subject. If you use -s alone, it will set both to be the same. You can see an example of this usage under the Verification section of this guide. Please note: This is the unusual case of using this client. Most times you only need to specify both URLs.
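To illustrate why the explicit offset carried by each MODE E block makes out-of-order arrival harmless, here is a small sketch. It is not GridFTP code and deliberately does not model the wire encoding of the block header. Each block is represented as a hypothetical (offset, payload) pair, as if blocks from several TCP streams had arrived interleaved, and reassemble is a name invented for this example.

```python
def reassemble(blocks, total_length):
    """Rebuild the transferred bytes from (offset, payload) blocks.

    The blocks may arrive in any order; each one says where it belongs.
    """
    image = bytearray(total_length)
    received = 0
    for offset, payload in blocks:
        end = offset + len(payload)
        if offset < 0 or end > total_length:
            raise ValueError("block falls outside the transfer")
        # Place the payload at its stated offset, whatever the arrival order.
        image[offset:end] = payload
        received += len(payload)
    if received != total_length:
        raise ValueError("byte count does not match the transfer length")
    return bytes(image)


if __name__ == "__main__":
    data = b"The quick brown fox jumps over the lazy dog"
    # Cut the data into 8-byte blocks, then deliver them in reverse order.
    blocks = [(i, data[i:i + 8]) for i in range(0, len(data), 8)]
    print(reassemble(reversed(blocks), len(data)) == data)
```

In stream mode there is no offset to consult, so the bytes must arrive in order over a single connection, which is the reason parallel transfers require MODE E.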
Limitations
There are no limitations for globus-url-copy in GT 3.9.4.
Interactive clients for GridFTP
The Globus Project does not provide an interactive client for GridFTP. Any normal FTP client will work with a GridFTP server, but it cannot take advantage of the advanced features of GridFTP. The interactive clients listed below take advantage of the advanced features of GridFTP.
There is no endorsement implied by their presence here. We make no assertion as to the quality or appropriateness of these tools; we simply provide this list for your convenience. We will not answer questions, accept bugs, or in any way, shape, or form be responsible for these tools, although they should have mechanisms of their own for such things.
UberFTP was developed at the NCSA under the auspices of NMI and TeraGrid. It is available through NMI (a convenient place to get Globus and other tools as well, btw), or directly from NCSA:
- NMI Download: http://nsf-middleware.org/
- NCSA UberFTP-only download: http://dims.ncsa.uiuc.edu/set/uberftp/download/index.html
- UberFTP User's Guide: http://teragrid.ncsa.uiuc.edu/Doc/Data/uberftp.html
Overview of Graphical User Interface
Globus does not provide any interactive client for GridFTP, either GUI or text-based. However, NCSA, as part of their TeraGrid activity, produces a text-based interactive client called UberFTP, which you may want to check out. See Interactive clients for more information.
Semantics and syntax of domain-specific interface
Interface introduction
The Globus implementation of the GridFTP server draws on:
- three IETF RFCs:
- RFC 959
- RFC 2228
- RFC 2389
- an IETF Draft: MLST-16
- the GridFTP protocol specification, which is a Global Grid Forum (GGF) Standard: GFD.020
Syntax of the interface
The command line tools and the client library completely hide the details of the protocol from the user and developer. Unless you choose to use the control library, it is not necessary to have a detailed knowledge of the protocol.
Configuration interface
GridFTP Server Configuration Overview
Note: Command line options and configuration file options may both be used but the command line overrides the config file.
The configuration file is read from the following locations, in the given order. Only the first found will be loaded.
- Path specified with the -c <configfile> command line option.
- $GLOBUS_LOCATION/etc/gridftp.conf
- /etc/grid-security/gridftp.conf
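The search order above can be expressed as a short lookup routine. This is a sketch for illustration, not the server's actual implementation; find_config is a hypothetical helper. One assumption is labeled in the code: if a path is given with -c but no file exists there, this sketch falls through to the next location, whereas the real server may instead report an error.

```python
import os


def find_config(cli_path=None, globus_location=None):
    """Return the first existing config file in the documented order."""
    candidates = []
    if cli_path:
        # 1. Path given with the -c <configfile> command line option.
        #    Assumption: a missing file here falls through to the next entry.
        candidates.append(cli_path)
    if globus_location:
        # 2. $GLOBUS_LOCATION/etc/gridftp.conf
        candidates.append(os.path.join(globus_location, "etc", "gridftp.conf"))
    # 3. The system-wide location.
    candidates.append("/etc/grid-security/gridftp.conf")
    for path in candidates:
        if os.path.isfile(path):
            return path  # only the first file found is loaded
    return None


if __name__ == "__main__":
    print(find_config(globus_location=os.environ.get("GLOBUS_LOCATION")))
```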
Options are allowed one per line, with the format:
<option> <value>
If the value contains spaces, it should be enclosed in double quotes (").
Flags or boolean options should only have a value of 0 or 1.
Blank lines and lines beginning with # are ignored.
For example:
port 5000
allow_anonymous 1
anonymous_user bob
banner "Welcome!"
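The file format described above is simple enough to parse in a few lines, which can be handy for validating a configuration before deploying it. The following sketch is not the server's parser; parse_gridftp_conf is a hypothetical helper. It assumes that values are kept as strings, that surrounding double quotes are simply stripped, and that an option with no value maps to an empty string. How the real server treats those edge cases is not specified here.

```python
def parse_gridftp_conf(text):
    """Parse "<option> <value>" lines into a dict of strings."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and lines beginning with # are ignored.
        if not line or line.startswith("#"):
            continue
        # Split the option name from the rest of the line.
        parts = line.split(None, 1)
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        # A value containing spaces is enclosed in double quotes.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        options[name] = value
    return options


if __name__ == "__main__":
    sample = "\n".join([
        "# sample settings",
        "port 5000",
        "allow_anonymous 1",
        "",
        'banner "Welcome to the grid!"',
    ])
    print(parse_gridftp_conf(sample))
```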
GridFTP Server Configuration Options
The table below lists config file options, associated command line options (if available) and descriptions. Note that any boolean option can be negated on the command line by preceding the specified option with '-no-' or '-n', for example -no-cas or -nf.
- Informational Options
- Modes of Operation
- Authentication, Authorization, and Security Options
- Logging Options
- Single and Striped Remote Data Node Options
- Disk Options
- Network Options
- Timeouts
- User Messages
- Module Options
- Other
Configuring the GridFTP server to run under xinetd/inetd
Note: The service name used (gsiftp in this case) should be defined in /etc/services with the desired port.
Here is a sample GridFTP server xinetd config entry:
service gsiftp
{
    instances       = 100
    socket_type     = stream
    wait            = no
    user            = root
    env             += GLOBUS_LOCATION=(globus_location)
    env             += LD_LIBRARY_PATH=(globus_location)/lib
    server          = (globus_location)/sbin/globus-gridftp-server
    server_args     = -i
    log_on_success  += DURATION
    nice            = 10
    disable         = no
}
Here is a sample gridftp server inetd config entry: (read as a single line)
gsiftp stream tcp nowait root /usr/bin/env env \
GLOBUS_LOCATION=(globus_location) \
LD_LIBRARY_PATH=(globus_location)/lib \
(globus_location)/sbin/globus-gridftp-server -i
Environment variable interface
Neither the GridFTP server nor the client libraries read any environment variables directly, but the security and networking related variables described below may be useful.