This document is a work-in-progress and applies to this development release. The latest drafts of docs can be found in the Development Documentation directory. You are strongly encouraged to file bugs for both the development documentation and software on our Bugzilla page. We appreciate your participation.

GT 3.9.5 WS GRAM: User's Guide

Introduction

GRAM services provide secure job submission to many types of job schedulers for users who have the right to access a job hosting resource in a Grid environment. A valid proxy is required for job submission. All GRAM job submission options are supported transparently through the embedded request document input: a job is started by submitting a client-side job description to the GRAM services. End users can make this submission with the GRAM command-line tools.

New Functionality

Substitution variables

In GT 3.9.2, RSL substitution variables were removed from GRAM. Starting with GT 3.9.5, substitution variables are available again, while preserving the simplicity of the job description XML schema (relative to the GT3.2 RSL schema). Substitution variables can be used in any path-like string or URL specified in the job description. They are special strings that the GRAM services replace with actual values that the client side does not know in advance. An example of a substitution variable is ${GLOBUS_USER_HOME}, which represents the path to the HOME directory, on the file system visible to the GRAM services, of the user on whose behalf the job is executed.

Details of the RSL variables are in the job description document.
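The replacement step can be pictured with a small Python sketch. This is not how the GRAM services implement substitution; it only illustrates the idea that ${NAME} placeholders are filled in with values known on the server side. The function name resolve and the values in SERVER_VALUES are assumptions made for the illustration.

```python
from string import Template

# Hypothetical server-side values; a real GRAM service determines these itself.
SERVER_VALUES = {
    "GLOBUS_USER_HOME": "/home/jdoe",
    "GLOBUS_LOCATION": "/opt/globus",
}

def resolve(path_or_url, values=SERVER_VALUES):
    """Replace every ${NAME} placeholder in a path-like string."""
    # Template.substitute raises KeyError if a variable name is unknown.
    return Template(path_or_url).substitute(values)

print(resolve("${GLOBUS_USER_HOME}/stdout"))
print(resolve("${GLOBUS_LOCATION}"))
```

The client only ever writes the placeholder; the concrete path is never needed on the client side.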

Submission ID

A submission ID may be used in the GRAM protocol for reliability in the face of message faults or other transient errors. It ensures that at most one instance of a job is executed, i.e. it prevents accidental duplication of jobs in the rare cases where a client retries after a failure. The managed-job-globusrun tool always uses this feature: either a submission ID is passed in as input or the tool creates a new unique ID itself. If a new ID is created, a user who wishes to exploit this reliability interface should capture it. The ID in use, whether created or passed as input, is written to the first line of standard output unless quiet mode is in effect.

A user who is unsure whether a job was submitted successfully should resubmit using the same ID as was used for the previous attempt.
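The at-most-once behaviour can be sketched in Python as follows. FakeJobFactory is an in-memory stand-in invented for this illustration, not the GRAM API, and the ID format produced by new_submission_id is likewise an assumption.

```python
import uuid

class FakeJobFactory:
    """In-memory stand-in for a job factory (hypothetical, not the GRAM API)."""

    def __init__(self):
        self.jobs = {}    # submission ID -> job description
        self.started = 0  # number of jobs actually started

    def submit(self, submission_id, description):
        # A repeated ID maps to the existing job instead of starting a new one.
        if submission_id not in self.jobs:
            self.jobs[submission_id] = description
            self.started += 1
        return submission_id

def new_submission_id():
    # Assumed format; the real tool chooses its own unique ID.
    return str(uuid.uuid4())

factory = FakeJobFactory()
sid = new_submission_id()
factory.submit(sid, "<job>...</job>")
factory.submit(sid, "<job>...</job>")  # retry after an uncertain outcome
print(factory.started)
```

Because the retry carries the same ID, the second call does not start a second job.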

Job hold and release

It is possible to specify in a job description that the job be put on hold when it reaches a chosen state (see the GRAM Approach documentation for more information about the executable job state machine, and see the job description XML schema documentation for information about how to specify a held state). This is useful, for instance, when a GRAM client wishes to directly access output files written by the job (as opposed to waiting for the stage-out step to transfer files from the job host). The client would request that the file cleanup process be held until released, giving the client an opportunity to fetch all remaining or buffered data after the job completes but before the output files are deleted.

This is used by globusrun-ws in order to ensure client-side streaming of remote files in batch mode.
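A minimal Python sketch of the hold-and-release idea follows. It is a simplified model, not the GRAM state machine: the three states are only those visible in the command output shown in this guide, and the class HeldJob and its methods are hypothetical names.

```python
class HeldJob:
    """Toy job that can be held at a chosen state until released."""

    # Simplified state list (assumption); the real state machine has more states.
    STATES = ["Active", "CleanUp", "Done"]

    def __init__(self, hold_state=None):
        self.hold_state = hold_state
        self.index = 0
        self.released = False

    @property
    def state(self):
        return self.STATES[self.index]

    def advance(self):
        """Move to the next state unless the job is held in the current one."""
        if self.state == self.hold_state and not self.released:
            return self.state  # held: stay here until release() is called
        if self.index < len(self.STATES) - 1:
            self.index += 1
        return self.state

    def release(self):
        self.released = True

job = HeldJob(hold_state="CleanUp")
print(job.advance())  # reaches CleanUp
print(job.advance())  # still CleanUp: the client can fetch output files now
job.release()
print(job.advance())  # cleanup proceeds and the job finishes
```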

MultiJobs

The new job description XML schema allows for the specification of a multijob, i.e. a job that is itself composed of several executable jobs. This is useful for bundling a group of jobs together and submitting them as a whole to a remote GRAM installation.

Job and process rendezvous

WS GRAM services implement a rendezvous mechanism to perform synchronization between job processes in a multiprocess job and between subjobs in a multijob. The job application can in fact register binary information, for instance process information or subjob information, and get notified when all the other processes or subjobs have registered their own information. This is for instance useful for parallel jobs which need to rendezvous at a "barrier" before proceeding with computations, in the case when no native application API is available to help do the rendezvous.
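The register-then-wait pattern described above can be sketched with Python threads standing in for job processes. This is an illustration of the barrier idea only; the class Rendezvous, the function run_demo, and the registered byte strings are assumptions and do not reflect the WS GRAM rendezvous interface.

```python
import threading

class Rendezvous:
    """Collect one piece of binary information per participant, then release all."""

    def __init__(self, expected):
        self.expected = expected
        self.registered = {}
        self.cond = threading.Condition()

    def register(self, name, info):
        """Register info and block until every participant has registered."""
        with self.cond:
            self.registered[name] = info
            if len(self.registered) == self.expected:
                self.cond.notify_all()  # last arrival wakes everyone up
            else:
                self.cond.wait_for(lambda: len(self.registered) == self.expected)
            return dict(self.registered)

def run_demo(n=3):
    rv = Rendezvous(n)
    results = {}

    def worker(i):
        # Each "process" registers its own (made-up) contact information.
        results[i] = rv.register(f"proc{i}", f"host{i}:900{i}".encode())

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(sorted(run_demo(3)[0]))
```

Every participant returns from register with the full set of registered information, which is what a parallel job needs before proceeding past the barrier.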

Common job submission tasks

Generating a valid proxy

In order to generate a valid proxy file, use the grid-proxy-init tool available under $GLOBUS_LOCATION/bin:
% bin/grid-proxy-init
Your identity: /O=Grid/OU=GlobusTest/OU=simpleCA.mymachine/OU=mymachine/CN=John Doe
Enter GRID pass phrase for this identity:
Creating proxy ................................. Done
Your proxy is valid until: Tue Oct 26 01:33:42 2004

Specifying and submitting a simple job

The specification of a job to submit is to be written by the user in a job description XML file.

Here is an example of a simple job description:

<?xml version="1.0" encoding="UTF-8"?>
<job>
    <executable>/bin/echo</executable>
    <argument>this is an example_string </argument>
    <argument>Globus was here</argument>
    <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
    <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
</job>

Submitting this job description using the globusrun-ws tool will give:

% bin/globusrun-ws -submit -f test_super_simple.xml
Submitting job...Done.
Job ID: uuid:c51fe35a-4fa3-11d9-9cfc-000874404099
Termination time: 12/17/2004 20:47 GMT
Current job state: Active
Current job state: CleanUp
Current job state: Done
Destroying job...Done.

Note the use of the substitution variable ${GLOBUS_USER_HOME}, which resolves to the user's home directory.
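A job description like the simple one above can also be generated by a script rather than written by hand. The following Python sketch uses only the elements shown in that example; the function name build_job_description is an assumption, and the sketch does not validate the result against the job description schema.

```python
import xml.etree.ElementTree as ET

def build_job_description(executable, arguments, stdout, stderr):
    """Return a job description as an XML string."""
    job = ET.Element("job")
    ET.SubElement(job, "executable").text = executable
    for arg in arguments:
        # One <argument> element per command-line argument, in order.
        ET.SubElement(job, "argument").text = arg
    ET.SubElement(job, "stdout").text = stdout
    ET.SubElement(job, "stderr").text = stderr
    return ET.tostring(job, encoding="unicode")

xml_text = build_job_description(
    "/bin/echo",
    ["this is an example_string ", "Globus was here"],
    "${GLOBUS_USER_HOME}/stdout",
    "${GLOBUS_USER_HOME}/stderr",
)
print(xml_text)
```

The resulting string can be written to a file and passed to the submission tool in the same way as a hand-written description.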

Here is an example with more job description parameters:

<?xml version="1.0" encoding="UTF-8"?>
<job>
    <executable>/bin/echo</executable>
    <directory>/tmp</directory>
    <argument>12</argument>
    <argument>abc</argument>
    <argument>34</argument>
    <argument>this is an example_string </argument>
    <argument>Globus was here</argument>
    <environment>
        <name>PI</name>
        <value>3.141</value>
    </environment>
    <stdin>/dev/null</stdin>
    <stdout>stdout</stdout>
    <stderr>stderr</stderr>
    <count>2</count>
</job>

Note that in this example, a <directory> element specifies the current directory for the execution of the command on the execution machine to be /tmp, and the standard output is specified as the relative path stdout. The output is therefore written to /tmp/stdout:

% cat /tmp/stdout
12 abc 34 this is an example_string  Globus was here
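The path resolution described above behaves like an ordinary path join. A minimal Python sketch follows, with the hypothetical helper name resolve_output_path; the handling of absolute paths is an assumption based on usual POSIX conventions, not a statement about the GRAM services.

```python
import posixpath

def resolve_output_path(directory, path):
    """Resolve a stdout/stderr path against the job's working directory."""
    # posixpath.join returns `path` unchanged when it is already absolute.
    return posixpath.join(directory, path)

print(resolve_output_path("/tmp", "stdout"))
```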

Obtaining reusable delegated credentials

It is possible to use delegation command-line clients to obtain and potentially refresh delegated credentials in order to use them when submitting jobs to WS GRAM. For instance, this enables the submission of many jobs using the same delegated credentials (or set thereof), which can significantly decrease the number of remote calls when the number of jobs is large.

Finding which schedulers are interfaced by the WS GRAM installation

Unfortunately there is no option yet to print the list of local resource managers supported by a given GRAM service installation. Such information must currently be provided out of band to the user. The GRAM name of local resource managers for which GRAM support has been installed can be obtained by looking at the GRAM configuration on the GRAM server-side machine, as explained here.

The GRAM name of the local resource manager can be used with the factory type option of the job submission command-line tool to specify which factory resource to use when submitting a job.

Specifying file staging in the job description

In order to do file staging one must add specific elements to the job description. The file transfer directives follow the RFT syntax, which enables third-party transfers. Each file transfer must therefore specify a source URL and a destination URL. URLs are specified as GridFTP URLs (for remote files) or as file URLs (for local files).

For instance, when staging a file in, the source URL would be a GridFTP URL (for instance gsiftp://job.submitting.host:2811/tmp/mySourceFile) resolving to a source document accessible on the file system of the job submission machine (for instance /tmp/mySourceFile). At run time, the Reliable File Transfer service used by the GRAM service on the remote machine would fetch the remote file using the GridFTP protocol and write it reliably to the specified local file (for instance file:///${GLOBUS_USER_HOME}/my_transfered_file, which resolves to ~/my_transfered_file). Here is what the stage-in directive would look like:

    <fileStageIn>
        <transfer>
            <sourceUrl>gsiftp://job.submitting.host:2811/tmp/mySourceFile</sourceUrl>
            <destinationUrl>file:///${GLOBUS_USER_HOME}/my_transfered_file</destinationUrl>
        </transfer>
    </fileStageIn>

Note: additional RFT-defined quality of service requirements can be specified for each transfer. See the RFT documentation for more information.

Here is an example job description with file stage-in and stage-out:

<job>
    <executable>my_echo</executable>
    <directory>${GLOBUS_USER_HOME}</directory>
    <argument>Hello</argument>
    <argument>World!</argument>
    <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
    <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
    <fileStageIn>
        <transfer>
            <sourceUrl>gsiftp://job.submitting.host:2811/bin/echo</sourceUrl>
            <destinationUrl>file:///${GLOBUS_USER_HOME}/my_echo</destinationUrl>
        </transfer>
    </fileStageIn>
    <fileStageOut>
        <transfer>
            <sourceUrl>file://${GLOBUS_USER_HOME}/stdout</sourceUrl>
            <destinationUrl>gsiftp://job.submitting.host:2811/tmp/stdout</destinationUrl>
        </transfer>
    </fileStageOut>
    <fileCleanUp>
        <deletion>
            <file>file://${GLOBUS_USER_HOME}/my_echo</file>
        </deletion>
    </fileCleanUp>
</job>

Note that the job description XML does not need to include a reference to the schema that describes its syntax. It is also possible to omit the namespace in the GRAM job description XML elements. The submission of this job to the GRAM services causes the following sequence of actions:

  1. The /bin/echo executable is transferred from the submission machine to the GRAM host file system. The destination location is the HOME directory of the user on whose behalf the job is executed by the GRAM services (see <fileStageIn>).
  2. The transferred executable is used to print a test string (see <executable>, <directory> and the <argument> elements) on the standard output, which is redirected to a local file (see <stdout>).
  3. The standard output file is transferred to the submission machine (see <fileStageOut>).
  4. The file that was initially transferred during the stage-in phase is removed from the file system of the GRAM installation (see <fileCleanUp>).
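The sequence above can be read directly off the job description. The following Python sketch parses a shortened copy of the example and lists the steps in the order described; the function name plan and the step labels are assumptions for illustration, and nothing is actually transferred or executed.

```python
import xml.etree.ElementTree as ET

JOB_XML = """<job>
    <executable>my_echo</executable>
    <fileStageIn>
        <transfer>
            <sourceUrl>gsiftp://job.submitting.host:2811/bin/echo</sourceUrl>
            <destinationUrl>file:///${GLOBUS_USER_HOME}/my_echo</destinationUrl>
        </transfer>
    </fileStageIn>
    <fileStageOut>
        <transfer>
            <sourceUrl>file://${GLOBUS_USER_HOME}/stdout</sourceUrl>
            <destinationUrl>gsiftp://job.submitting.host:2811/tmp/stdout</destinationUrl>
        </transfer>
    </fileStageOut>
    <fileCleanUp>
        <deletion>
            <file>file://${GLOBUS_USER_HOME}/my_echo</file>
        </deletion>
    </fileCleanUp>
</job>"""

def plan(job_xml):
    """List (step, source, destination) tuples in the order the text describes."""
    job = ET.fromstring(job_xml)
    steps = []
    for t in job.findall("fileStageIn/transfer"):
        steps.append(("stage-in", t.findtext("sourceUrl"), t.findtext("destinationUrl")))
    steps.append(("run", job.findtext("executable"), None))
    for t in job.findall("fileStageOut/transfer"):
        steps.append(("stage-out", t.findtext("sourceUrl"), t.findtext("destinationUrl")))
    for d in job.findall("fileCleanUp/deletion"):
        steps.append(("clean-up", d.findtext("file"), None))
    return steps

for step in plan(JOB_XML):
    print(step)
```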

Specifying and submitting a multijob

The job description XML schema allows for the specification of a multijob, i.e. a job that is itself composed of several executable jobs, which we will refer to as subjobs (note: subjobs cannot be multijobs, so the structure is not recursive). This is useful, for instance, for bundling a group of jobs together and submitting them as a whole to a remote GRAM installation.

Note that no relationship can be specified between the subjobs of a multijob. The subjobs are submitted to job factory services in their order of appearance in the multijob description.

Within a multijob description, each subjob description must come along with an endpoint for the factory to submit the subjob to. This enables the at-once submission of several jobs to different hosts. The factory to which the multijob is submitted acts as an intermediary tier between the client and the eventual executable job factories.

Here is an example of a multijob description:

<?xml version="1.0" encoding="UTF-8"?>
<multiJob xmlns:gram="http://www.globus.org/namespaces/2004/10/gram/job" 
     xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
    <factoryEndpoint>
        <wsa:Address>
            https://localhost:8443/wsrf/services/ManagedJobFactoryService
        </wsa:Address>
        <wsa:ReferenceProperties>
            <gram:ResourceID>Multi</gram:ResourceID>
        </wsa:ReferenceProperties>
    </factoryEndpoint>
    <directory>${GLOBUS_LOCATION}</directory>
    <count>1</count>

    <job>
        <factoryEndpoint>
            <wsa:Address>https://localhost:8443/wsrf/services/ManagedJobFactoryService</wsa:Address>
            <wsa:ReferenceProperties>
                <gram:ResourceID>Fork</gram:ResourceID>
            </wsa:ReferenceProperties>
        </factoryEndpoint>
        <executable>/bin/date</executable>
        <stdout>${GLOBUS_USER_HOME}/stdout.p1</stdout>
        <stderr>${GLOBUS_USER_HOME}/stderr.p1</stderr>
        <count>2</count>
    </job>

    <job>
        <factoryEndpoint>
            <wsa:Address>https://localhost:8443/wsrf/services/ManagedJobFactoryService</wsa:Address>
            <wsa:ReferenceProperties>
                <gram:ResourceID>Fork</gram:ResourceID>
            </wsa:ReferenceProperties>
        </factoryEndpoint>
        <executable>/bin/echo</executable>
        <argument>Hello World!</argument>        
        <stdout>${GLOBUS_USER_HOME}/stdout.p2</stdout>
        <stderr>${GLOBUS_USER_HOME}/stderr.p2</stderr>
        <count>1</count>
    </job>

</multiJob>

Notes:

  • The <ResourceID> element within the <factoryEndpoint> WS-Addressing endpoint structures must be qualified with the appropriate GRAM namespace.
  • Apart from the factoryEndpoint element, all elements at the enclosing multijob level act as defaults for the subjob parameters, in this example <directory> and <count>.
  • The default <count> value is overridden in the subjob descriptions.
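The defaulting rule in these notes can be sketched in Python with plain dictionaries standing in for the XML elements. The function name effective_subjobs and the dictionary representation are assumptions; the GRAM services operate on the XML itself.

```python
def effective_subjobs(multijob_level, subjobs):
    """Apply multijob-level values as defaults to each subjob."""
    # factoryEndpoint is never inherited: each subjob names its own factory.
    defaults = {k: v for k, v in multijob_level.items() if k != "factoryEndpoint"}
    # Values given in the subjob take precedence over the defaults.
    return [{**defaults, **sub} for sub in subjobs]

multijob_level = {
    "factoryEndpoint": "Multi",
    "directory": "${GLOBUS_LOCATION}",
    "count": 1,
}
subjobs = [
    {"factoryEndpoint": "Fork", "executable": "/bin/date", "count": 2},
    {"factoryEndpoint": "Fork", "executable": "/bin/echo", "count": 1},
]

for sub in effective_subjobs(multijob_level, subjobs):
    print(sub["executable"], sub["directory"], sub["count"])
```

In the example, both subjobs inherit the directory, and each count given in a subjob replaces the multijob-level value.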

In order to submit a multijob description, use a job submission command-line tool and specify the Managed Job Factory resource to be Multi. For instance, submitting the multijob description above using globusrun-ws, we obtain:

% bin/globusrun-ws -submit -f test_multi.xml
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:bd9cd634-4fc0-11d9-9ee1-000874404099
Termination time: 12/18/2004 00:15 GMT
Current job state: Active
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.

A multijob resource is created by the factory and exposes a set of WSRF resource properties different from those of an executable job. The state machine of a multijob is also different, since the multijob represents the overall execution of all the executable jobs it is composed of.

Command-line tools

Job submission

Semantics and syntax of domain-specific interface data

Please see the job description document for details about the job description language used to define GRAM jobs.

Graphical user interfaces

There is no support for this type of interface for WS GRAM.

Usage statistics collection by the Globus Alliance

The following usage statistics are sent by default in a UDP packet (in addition to the GRAM component code, packet version, timestamp, and source IP address) at the end of each job (i.e. when Done or Failed state is entered).

  • job creation timestamp (helps determine the rate at which jobs are submitted)
  • scheduler type (Fork, PBS, LSF, Condor, etc.)
  • jobCredentialEndpoint present in RSL flag (to determine if server-side user proxies are being used)
  • fileStageIn present in RSL flag (to determine if the staging in of files is used)
  • fileStageOut present in RSL flag (to determine if the staging out of files is used)
  • fileCleanUp present in RSL flag (to determine if the cleaning up of files is used)
  • CleanUp-Hold requested flag (to determine if streaming is being used)
  • job type (Single, Multiple, MPI, or Condor)
  • gt2 error code if job failed (to determine common scheduler script errors users experience)
  • fault class name if job failed (to determine general classes of common faults users experience)

If you wish to disable this feature, please see the Java WS Core System Administrator's Guide section on Usage Statistics Configuration for instructions.

Also, please see our policy statement on the collection of usage statistics.

 

 


