- Aggregator Framework
A software framework used to build services that collect and aggregate data. MDS4 Services (such as the Index and Trigger services) are built on the Aggregator Framework, and are sometimes called Aggregator Services.
[link to agg framework doc]
- aggregator services
Services that are built on the Aggregator Framework, such as the MDS4 Index Service and Trigger Service.
- aggregator source
A Java class that implements an interface (defined as part of the Aggregator Framework) to collect XML-formatted data. MDS4 contains three aggregator sources: the query aggregator source, the subscription aggregator source, and the execution aggregator source.
- Apache Axis
The SOAP engine implementation used within the Globus Toolkit. See the Apache Axis website for details.
- batch scheduler
- Bloom filter
Compression scheme used by the Replica Location Service (RLS) that is intended to reduce the size of soft state updates between Local Replica Catalogs (LRCs) and Replica Location Index (RLI) servers. A Bloom filter is a bit map that summarizes the contents of a Local Replica Catalog (LRC). An LRC constructs the bit map by applying a series of hash functions to each logical name registered in the LRC and setting the corresponding bits.
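The bit map construction described above can be sketched in a few lines. This is a minimal illustration only: the class name, the bit map size, the number of hash functions, and the use of salted SHA-256 are all assumptions made for the example, not the parameters or hash functions RLS actually uses, and the logical names are hypothetical.

```python
import hashlib


class BloomFilter:
    """Bit map summarizing a set of logical names (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, name):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        # This hash family is an assumption, not the one RLS uses.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{name}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, name):
        # Set the bit for every hash of the logical name.
        for pos in self._positions(name):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, name):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(name))


# Hypothetical logical names registered in an LRC.
lrc_summary = BloomFilter()
for logical_name in ["lfn://example/run1.dat", "lfn://example/run2.dat"]:
    lrc_summary.add(logical_name)
```

Only the fixed-size bit map needs to be sent in a soft state update, which is why the summary is smaller than the full list of names. The trade-off is that a membership test can report a false positive, but never a false negative.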
- Certificate Authority ( CA )
An entity that issues certificates. [re: security - link to useful page]
- CA Certificate
The CA's certificate. This certificate is used to verify the signature on certificates issued by the CA. GSI typically stores a given CA certificate in
/etc/grid-security/certificates/, where <hash> is the hash code of the CA identity.
See Also Certificate Authority.
- CA Signing Policy
The CA signing policy is used to place constraints on the information you trust a given CA to bind to public keys. Specifically it constrains the identities a CA is trusted to assert in a certificate. In GSI the signing policy for a given CA can typically be found in
/etc/grid-security/certificates/, where <hash> is the hash code of the CA identity. For more information see [add link].
- certificate
A public key plus information about the certificate owner, bound together by the digital signature of a CA. In the case of a CA certificate, the certificate is self-signed, i.e. it was signed using its own private key.
- Certificate Revocation List (CRL)
A list of revoked certificates generated by the CA that originally issued them. When using GSI [not really defined in gt4...] this list is typically found in
/etc/grid-security/certificates/, where <hash> is the hash code of the CA identity.
See Also Certificate Authority.
- certificate subject
An identifier for the certificate owner, e.g. "
/DC=org/DC=doegrids/OU=People/CN=John Doe 123456". The subject is part of the information the CA binds to a public key when creating a certificate.
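As a rough illustration of how the example subject breaks down into attribute/value components, the sketch below splits the slash-delimited form shown above. The function name is hypothetical, and the parsing is deliberately naive: it assumes no component value contains a "/" character.

```python
def parse_subject(subject):
    """Split a slash-delimited subject into (attribute, value) pairs."""
    pairs = []
    for component in subject.strip().split("/"):
        if not component:
            continue  # skip the empty piece before the leading slash
        attribute, _, value = component.partition("=")
        pairs.append((attribute, value))
    return pairs


subject = "/DC=org/DC=doegrids/OU=People/CN=John Doe 123456"
components = parse_subject(subject)
```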
- client
[connect this with GridFTP] FTP is a command/response protocol. The defining characteristic of a client is that it is the process sending the commands and receiving the responses. It may or may not take part in the actual movement of data.
Axis client-side WSDD configuration file. It contains information about the type mappings, the transport and other handlers. [need a little more info tying it to Java WS Core...]
- client/server transfer
[tie this to GridFTP or 'data management'] In a client/server transfer, there are only two entities involved in the transfer, the client entity and the server entity. We use the term entity here rather than process because in the implementation provided in GT4, the server entity may actually run as two or more separate processes.
The client moves data either to or from its local host. The client also decides whether it will connect to the server to establish the data channel or whether the server should connect to it (MODE E dictates who must connect).
If the client wishes to connect to the server, it sends the PASV (passive) command. The server starts listening on an ephemeral (random, non-privileged) port and returns the IP address and port in its response to the command. The client then connects to that IP/port.
If the client wishes to have the server connect to it, the client starts listening on an ephemeral port and then sends the PORT command, which includes the IP/port, to the server; the server then initiates the TCP connection. Note that this decision affects firewall traversal. For instance, the client's host may be behind a firewall, in which case the server may not be able to connect.
Finally, now that the data channel is established, the client will send either the RETR “filename” command to transfer a file from the server to the client (GET), or the STOR “filename” command to transfer a file from the client to the server (PUT).
See Also extended block mode (MODE E).
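The IP/port exchange in the PASV and PORT steps can be sketched as follows. This assumes the comma-separated `h1,h2,h3,h4,p1,p2` encoding from RFC 959. The helper names and the addresses (taken from a documentation address range) are hypothetical, and the sketch handles only the address encoding, not a live control channel.

```python
import re


def parse_pasv_reply(reply):
    """Extract (ip, port) from a '227 ... (h1,h2,h3,h4,p1,p2)' reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError("not a PASV reply: %r" % reply)
    h1, h2, h3, h4, p1, p2 = (int(g) for g in match.groups())
    # The port is sent as two bytes, high byte first.
    return "%d.%d.%d.%d" % (h1, h2, h3, h4), p1 * 256 + p2


def format_port_command(ip, port):
    """Build the PORT command announcing where the client is listening."""
    return "PORT %s,%d,%d" % (ip.replace(".", ","), port // 256, port % 256)


# Passive case: the server tells the client where to connect.
ip, port = parse_pasv_reply("227 Entering Passive Mode (192,0,2,10,195,80)")

# Active case: the client tells the server where to connect.
command = format_port_command("192.0.2.20", 50000)
```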
- Condor
A job scheduler mechanism supported by GRAM. See http://www.cs.wisc.edu/condor/ for more information.
- container
Also referred to as the "hosting environment." Provides a common runtime environment for web services. It manages the execution of services and resources, and manages their lifecycles. Provides security and data persistence infrastructure, and other functionality such as managed threading and registry.
A default "standalone" container is provided with a default GT installation.
- End Entity Certificate (EEC)
A certificate belonging to a non-CA entity, e.g. you, me or the computer on your desk.
- execution aggregator source
An Aggregator Source (included in MDS4) that executes an administrator-supplied program to collect information and make it available to an Aggregator Service such as the Index Service.
[link to execution agg source doc]
See Also aggregator source.
- extended block mode (MODE E)
MODE E is a critical GridFTP component because it allows for out-of-order reception of data. This, in turn, means we can send the data down multiple paths and do not need to worry if one of the paths is slower than the others and the data arrives out of order. This enables parallelism and striping within GridFTP. In MODE E, a series of “blocks” are sent over the data channel. Each block consists of:
- an 8 bit flag field,
- a 64 bit field indicating the offset in the transfer,
- and a 64 bit field indicating the length of the payload,
- followed by length bytes of payload.
Note that since the offset and length are included in the block, out of order reception is possible, as long as the receiving side can handle it, either via something like a seek on a file, or via some application level buffering and ordering logic that will wait for the out of order blocks. [TODO: LINK TO GRAPHIC]
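To make the block framing and out-of-order reassembly concrete, here is a minimal sketch. The field order simply mirrors the list above, and the big-endian byte order, the flag value of 0, and the helper names are assumptions made for illustration. GFD.020 remains the authority on the actual wire layout.

```python
import struct

# Layout assumed here follows the field order listed above:
# 8-bit flags, 64-bit offset, 64-bit length, all big-endian.
HEADER = struct.Struct(">BQQ")


def pack_block(flags, offset, payload):
    """Prefix the payload with its flags, offset, and length."""
    return HEADER.pack(flags, offset, len(payload)) + payload


def unpack_block(data):
    """Return (flags, offset, payload) for one complete block."""
    flags, offset, length = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    return flags, offset, payload


def reassemble(blocks, total_size):
    """Place each payload at its offset, regardless of arrival order."""
    out = bytearray(total_size)
    for block in blocks:
        _, offset, payload = unpack_block(block)
        out[offset:offset + len(payload)] = payload
    return bytes(out)


# The second half of the data arrives first; the offsets restore the order.
blocks = [pack_block(0, 6, b"world"), pack_block(0, 0, b"hello ")]
result = reassemble(blocks, 11)
```

Writing each payload at its offset plays the role of the "seek on a file" mentioned above; an application that must consume data strictly in order would buffer blocks instead.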
- GAA configuration file
A file that configures the Generic Authorization and Access control (GAA) libraries. When using GSI [term not well described in gt4], this file is typically found in
- Ganglia
A cluster monitoring tool (re: MDS4). See http://ganglia.sourceforge.net.
- GAR
The GAR (Grid ARchive) file is a single file that contains all the files and information that the container needs to deploy a service. See the Java WS Core Developer's Guide for details.
See Also container.
A command line program used to submit jobs to a WS GRAM service. See the WS GRAM Commandline page.
- grid map file
A file containing entries mapping certificate subjects to local user names. This file can also serve as an access control list for GSI-enabled services and is typically found in
/etc/grid-security/grid-mapfile. For more information see the Gridmap file in Pre-WS Authorization & Authentication Developer's Guide ("Environmental Variables" section).
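The mapping and access-control roles of this file can be sketched with a small parser. The entry format assumed here (a double-quoted certificate subject followed by one or more comma-separated local user names, with `#` comment lines) and all sample entries and account names are assumptions for illustration; check the referenced guide for the authoritative syntax.

```python
import shlex


def parse_gridmap(text):
    """Return {certificate subject: [local user names]}."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = shlex.split(line)  # honors the quotes around the subject
        if len(fields) != 2:
            raise ValueError("unexpected entry: %r" % line)
        subject, users = fields
        mapping[subject] = users.split(",")
    return mapping


# Hypothetical file contents; the local account names are made up.
sample = '''
# hypothetical entries
"/DC=org/DC=doegrids/OU=People/CN=John Doe 123456" jdoe
"/DC=org/DC=example/OU=People/CN=Jane Roe" jroe,griduser
'''
gridmap = parse_gridmap(sample)

# Used as an access control list: a subject absent from the map is refused.
allowed = "/DC=org/DC=example/OU=People/CN=Unknown" in gridmap
```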
- grid security directory
The directory containing GSI configuration files such as the GSI authorization callout configuration and GAA configuration files. Typically this directory is
/etc/grid-security. For more information see Grid security directory in Pre-WS Authorization & Authentication Developer's Guide ("Environmental Variables" section).
- Grid Security Infrastructure (GSI)
- GSI authorization callout configuration file
A file that configures authorization callouts to be used for mapping and authorization in GSI [not really defined in gt4] enabled services. When using GSI this file is typically found in
A monitoring service for Condor Pools (re: GRAM). See http://www.cs.wisc.edu/condor/hawkeye/.
- host certificate
An EEC belonging to a host. When using GSI this certificate is typically stored in
/etc/grid-security/hostcert.pem. For more information on possible host certificate locations see the Pre-WS Authentication & Authorization Developer's Guide ("Environmental Variables" section) on Credentials.
- host credentials
The combination of a host certificate and its corresponding private key.
- improved extended block mode (MODE X)
[this term does not appear anywhere in the gridftp docs] This protocol is still under development. It is intended to address a number of the deficiencies found in MODE E. For instance, it will have explicit negotiation for use of a data channel, thus removing the race condition and the requirement for the sender to be the connector. This will help with firewall traversal. A method will be added to allow the filename to be provided prior to the data channel connection being established to help large data farms better allocate resources. Other additions under consideration include block checksumming, resends of blocks that fail checksums, and inclusion of a transfer ID to allow pipelining and de-multiplexing of commands.
See Also extended block mode (MODE E).
- Index Service
An aggregator service in MDS4 that serves as a registry similar to UDDI, but much more flexible. Indexes collect information and publish that information as WSRF resource properties.
- information provider
A "helper" software component that collects or formats resource information, for use in MDS4 by an aggregator source or by a WSRF service when creating resource properties.
- JNDI
The Java Naming and Directory Interface (JNDI) API is used to access a central transient container registry. The registry is mainly used for discovery of the ResourceHome implementations. However, the registry can also be used to store and retrieve arbitrary information. The jndi-config.xml files are used to populate the registry. See the JNDI Tutorial for details.
- jndi-config.xml
An XML-based configuration file used to populate the container registry accessible via the JNDI API. See the Java WS Core Developer's Guide for details.
- job description
Term used to describe a WS GRAM job for GT4. [any useful link?]
- job scheduler
- Local Replica Catalog (LRC)
Stores mappings between logical names for data items and the target names (often the physical locations) of replicas of those items. Clients query the LRC to discover replicas associated with a logical name. Also may associate attributes with logical or target names. Each LRC periodically sends information about its logical name mappings to one or more RLIs.
See Also Replica Location Index.
- logical file name
A unique identifier for the contents of a file. (re: RLS)
- logical name
A unique identifier for the contents of a data item. (re: RLS)
A job scheduler mechanism supported by GRAM.
For more information, see http://www.platform.com/Products/Platform.LSF.Family/Platform.LSF/.
- Managed Executable Job Service (MEJS)
- Managed Job Factory Service (MJFS)
- Managed Multi Job Service (MMJS)
- MODE command
In reality, GridFTP is not one protocol, but a collection of several protocols. There is a protocol used on the control channel, but there is a range of protocols available for use on the data channel. Which protocol is used is selected by the MODE command. Four modes are defined: STREAM (S), BLOCK (B), COMPRESSED (C) in RFC 959 for FTP, and EXTENDED BLOCK (E) in GFD.020 for GridFTP. There is also a new data channel protocol, or mode, being defined in the GGF GridFTP Working group which, for lack of a better name at this point, is called MODE X.
A job that is itself composed of several executable jobs; these are processed by the MMJS subjob. (re: GRAM)
See Also MMJS subjob.
- parallelism
When speaking about GridFTP transfers, parallelism refers to having multiple TCP connections between a single pair of network endpoints. This is used to improve performance of transfers on connections with light to moderate packet loss.
- Portable Batch System (PBS)
A job scheduler mechanism supported by GRAM. For more information, see http://www.openpbs.org.
See Also scheduler.
- physical file name
The address or the location of a copy of a file on a storage system. (re: RLS)
- private key
The private part of a key pair. Depending on the type of certificate the key corresponds to it may typically be found in
$HOME/.globus/userkey.pem (for user certificates),
/etc/grid-security/hostkey.pem (for host certificates) or
/etc/grid-security/ (for service certificates).
For more information on possible private key locations see Credentials in the Pre-WS Authentication & Authorization Developer's Guide ("Environmental Variables" section).
- proxy certificate
A short-lived certificate issued using an EEC. A proxy certificate typically has the same effective subject as the EEC that issued it and can thus be used in its place. GSI uses proxy certificates for single sign-on and delegation of rights to other entities.
- proxy credentials
The combination of a proxy certificate and its corresponding private key. GSI typically stores proxy credentials in
/tmp/x509up_u, where <uid> is the user id of the proxy owner.
- public key
The public part of a key pair used for cryptographic operations (e.g. signing, encrypting).
- Replica Location Index (RLI)
Collects information about the logical name mappings stored in one or more Local Replica Catalogs (LRCs) and answers queries about those mappings. Each RLI periodically receives updates from one or more LRCs that summarize their contents. (re: RLS)
- resource properties
A resource is composed of zero or more resource properties which describe the resource. For example, a resource can have the following three resource properties: Filename, Size, and Descriptors. The resource properties are defined in the web service's WSDL interface description.
- Resource Specification Language (RSL)
Term used to describe a GRAM job for GT2 and GT3. (Note: This is not the same as RLS - the Replica Location Service)
- ResourceHome
In Java WS Core, resources are managed and discovered via ResourceHome implementations. The ResourceHome implementations can also be responsible for creating new resources, performing operations on a set of resources at a time, etc. ResourceHomes are configured in JNDI and are associated with a particular web service.
See Also JNDI.
- RLS attribute
Descriptive information that may be associated with a logical or target name mapping registered in a Local Replica Catalog (LRC). Clients can query the LRC to discover logical names or target names that have specified RLS attributes.
- scheduler
Term used to describe a job scheduler mechanism to which GRAM interfaces. It is a networked system for submitting, controlling, and monitoring the workload of batch jobs on one or more computers. The jobs or tasks are scheduled for execution at a time chosen by the subsystem according to the configured policy and the availability of resources. Popular job schedulers include Portable Batch System (PBS), Platform LSF, and IBM LoadLeveler.
- scheduler adapter
The interface used by GRAM to communicate/interact with a job scheduler mechanism. In GT 4.x, this is both the perl submission scripts and the SEG program.
See Also scheduler.
- Scheduler Event Generator (SEG)
- server
[re: GridFTP] The complement to the client is the server. Its defining characteristic is that it receives commands and sends responses to those commands. Since it is a server or service, and it receives commands, it must be listening on a port somewhere to receive the commands. Both FTP and GridFTP have IANA registered ports. For FTP it is port 21, for GridFTP it is port 2811. This is normally handled via inetd or xinetd on Unix variants. However, it is also possible to implement a daemon that listens on the specified port. This is described more fully in the Architecture section of the GridFTP Developer's Guide.
See Also client.
Axis server-side WSDD configuration file. It contains information about the services, the type mappings and various handlers. [re: a particular part of gt4? does this affect all ws parts?]
- service certificate
An EEC for a specific service (e.g. FTP or LDAP). When using GSI this certificate is typically stored in
/etc/grid-security/. For more information on possible service certificate locations, see Credentials in the Pre-WS Authentication & Authorization Developer's Guide ("Environmental Variables" section).
- service credentials
The combination of a service certificate and its corresponding private key.
- SOAP
SOAP provides a standard, extensible, composable framework for packaging and exchanging XML messages between a service provider and a service requester. SOAP is independent of the underlying transport protocol, but is most commonly carried on HTTP. See the SOAP specifications for details.
- stream mode (MODE S)
The only mode normally implemented for FTP is MODE S. This is simply sending each byte, one after another over the socket in order, with no application level framing of any kind. This is the default and is what a standard FTP server will use. This is also the default for GridFTP.
- striping
When speaking about GridFTP transfers, striping refers to having multiple network endpoints at the source, destination, or both participating in the transfer of the same file. This is normally accomplished by having a cluster with a parallel shared file system. Each node in the cluster reads a section of the file and sends it over the network. This mode of transfer is necessary if you wish to transfer a single file faster than a single host is capable of. It also tends to be effective only for large files, though how large depends on how many hosts are involved and how fast the end-to-end transfer is. Note that while it is theoretically possible to use NFS for the shared file system, performance would be poor enough to make striping pointless.
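One way to picture "each node reads a section of the file" is to divide the file into contiguous byte ranges, one per node. The sketch below does exactly that and nothing more. The function name and the even, contiguous partitioning are assumptions for illustration, and the layout a real striped GridFTP server uses may differ.

```python
def stripe_ranges(file_size, num_nodes):
    """Split [0, file_size) into num_nodes contiguous (offset, length) ranges."""
    if num_nodes < 1:
        raise ValueError("need at least one node")
    base, extra = divmod(file_size, num_nodes)
    ranges = []
    offset = 0
    for node in range(num_nodes):
        # The first `extra` nodes take one additional byte each.
        length = base + (1 if node < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges


# A 1000-byte file shared among three nodes.
ranges = stripe_ranges(1000, 3)
```

Because each range carries its own offset, the sections can be sent independently and placed correctly at the destination, which is the property extended block mode (MODE E) provides.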
- subscription aggregator source
An aggregator source (included in MDS4) that collects data from a WSRF service via WSRF subscription/notification.
See Also aggregator source.
- superuser do (sudo)
Allows a system administrator to give certain users (or groups of users) the ability to run some (or all) commands as root or another user while logging the commands and arguments. See http://www.courtesan.com/sudo/ for more information.
- target name
The address or location of a copy of a data item on a storage system. [re: RLS and anything else?]
- third party transfers
In the simplest terms, a third party transfer moves a file between two GridFTP servers.
The following is a more detailed, programmatic description [should be in documentation somewhere-key concepts?].
In a third party transfer, there are three entities involved: the client, which only orchestrates the transfer and does not take part in the data movement itself, and two servers, one of which sends data to the other. This scenario is common in Grid applications where you may wish to stage data from a data store to a supercomputer you have reserved. The commands are quite similar to those of the client/server transfer. However, the client must now establish two control channels, one to each server. It then chooses one server to listen and sends it the PASV command. When that server responds with the IP/port it is listening on, the client sends that IP/port as part of the PORT command to the other server. This causes the second server to connect to the first server rather than to the client. To initiate the actual movement of the data, the client then sends the RETR “filename” command to the server that will read from disk and write to the network (the “sending” server) and the STOR “filename” command to the other server, which will read from the network and write to disk (the “receiving” server).
See Also client/server transfer.
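The command sequence in a third party transfer can be summarized as an ordered plan of which command goes down which control channel. This sketch is a summary of the exchange rather than a working client. The function name, the "sender"/"receiver" labels, the file paths, and the choice of the receiving server as the listener are assumptions made for the example.

```python
def third_party_plan(src_file, dst_file, pasv_host_port):
    """Return ordered (server, command) pairs for a third party transfer.

    pasv_host_port is the 'h1,h2,h3,h4,p1,p2' string taken from the
    listening server's reply to PASV. Here the receiving server is
    assumed to be the listener.
    """
    return [
        ("receiver", "PASV"),                    # receiver starts listening
        ("sender", "PORT " + pasv_host_port),    # sender learns where to connect
        ("sender", "RETR " + src_file),          # read from disk, write to network
        ("receiver", "STOR " + dst_file),        # read from network, write to disk
    ]


# Hypothetical file names and listener address.
plan = third_party_plan("/data/input.dat", "/scratch/input.dat",
                        "192,0,2,10,195,80")
```

Note that the data never passes through the client: the PORT command hands the listener's address to the other server, so the data channel is opened directly between the two servers.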
- transport-level security
Uses transport-level security (TLS) mechanisms. [link to something useful]
- Trigger Service
An aggregator service (in MDS4) that collects information and compares that data against a set of conditions defined in a configuration file. When a condition is met, or triggered, the specified action takes place (for example, an email is sent to a system administrator when the disk space on a server reaches a threshold).
See Also aggregator services.
- trusted CAs directory
The directory containing the CA certificates and signing policy files of the CAs trusted by GSI. Typically this directory is
/etc/grid-security/certificates. For more information see Grid security directory in the Pre-WS A&A Developer's Guide ("Environmental Variables" section).
- Universally Unique Identifier (UUID)
Identifier that is immutable and unique across time and space. [re: what in gt4?]
- user certificate
An EEC belonging to a user. When using GSI, this certificate is typically stored in
$HOME/.globus/usercert.pem. For more information on possible user certificate locations, see Credentials in the Pre-WS A&A Developer's Guide ("Environmental Variables" section).
- user credentials
The combination of a user certificate and its corresponding private key.
- web service
- WebMDS
In MDS4, WebMDS is a web-based interface to WSRF resource property information that can be used as a user-friendly front-end to the Index Service or other WSRF services.
- Web Services Addressing (WSA)
The WS-Addressing specification defines transport-neutral mechanisms to address web services and messages. Specifically, it defines XML elements to identify web service endpoints and to secure end-to-end endpoint identification in messages. See the W3C WS Addressing Working Group for details.
See Also web service.
- Web Services Deployment Descriptor (WSDD)
An Axis XML-based configuration file. [re: which part of gt4?]
- Web Services Description Language (WSDL)
WSDL is an XML-based language for describing Web services. Standardized binding conventions define how to use WSDL in conjunction with SOAP and other messaging substrates. WSDL interfaces can be compiled to generate proxy code that constructs messages and manages communications on behalf of the client application. The proxy automatically maps the XML message structures into native language objects that can be directly manipulated by the application. The proxy frees the developer from having to understand and manipulate XML. See the WSDL 1.1 specification for details.
- Web Services Interoperability Basic Profile (WS-I Basic Profile)
The WS-I Basic Profile specification is a set of recommendations on how to use the different web services specifications such as SOAP, WSDL, etc. to maximize interoperability.
- Web Services Notification (WSN)
The WS-Notification family of specifications defines a pattern-based approach to allowing Web services to disseminate information to one another. This framework comprises mechanisms for basic notification (WS-BaseNotification), topic-based notification (WS-Topics), and brokered notification (WS-BrokeredNotification). See the OASIS Web Services Notification (WSN) TC for details.
- Web Services Resource Framework (WSRF)
Web Services Resource Framework (WSRF) is a specification that extends web services for grid applications by giving them the ability to retain state information while at the same time retaining statelessness (using resources). The combination of a web service and a resource is referred to as a WS-Resource. WSRF is a collection of different specifications that manage WS-Resources.
This framework comprises mechanisms to describe views on the state (WS-ResourceProperties), to support management of the state through properties associated with the Web service (WS-ResourceLifetime), to describe how these mechanisms are extensible to groups of Web services (WS-ServiceGroup), and to deal with faults (WS-BaseFaults).