GT 3.9.5 Component Guide to Public Interfaces: WS GRAM

Semantics and syntax of APIs

Programming Model Overview

This component consists abstractly of two interfaces: the managed job factory port type (MJFPT) and the managed job port type (MJPT).

In actuality there are three service/resource implementations, two of which implement the basic MJPT. The first is the service that actually talks to a particular local resource manager to execute a process on the remote computer or cluster. It is called the managed executable job service (MEJS), and its resource is called the managed executable job resource (MEJR). The second is a special implementation that accepts a multi-job description, breaks it up into single-job descriptions, and then submits each of these so-called "sub-jobs" to an MEJS. This implementation is called the managed multi-job service (MMJS), and its resource is called the managed multi-job resource (MMJR).

Because these two job services use the same port type, the API for accessing the MEJR and the MMJR is identical. The managed job factory service (MJFS) creates the appropriate job resource depending on the factory resource used to qualify the operation call. Most of the factory resources represent local resource managers used by the MEJS (PBS, LSF, Condor). There is a special Multi factory resource which represents an abstract multi-job resource manager. Each of the two types of managed job requires the matching job description type.
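The dispatch described above can be pictured with a small Python model. This is only a sketch of the programming model, not GRAM code: the names JobResource and create_managed_job, the description type labels "single-job" and "multi-job", and the set of manager names are assumptions made for illustration.

```python
from dataclasses import dataclass

# Factory resource names mentioned in this guide. "Multi" is the special
# multi-job factory resource. The set itself is illustrative only.
EXECUTABLE_MANAGERS = {"Fork", "PBS", "LSF", "Condor"}
MULTI_MANAGER = "Multi"


@dataclass
class JobResource:
    kind: str     # "MEJR" or "MMJR"
    manager: str  # factory resource that qualified the call


def create_managed_job(factory_resource: str, description_type: str) -> JobResource:
    """Hypothetical model of the factory choosing which job resource to create."""
    if factory_resource == MULTI_MANAGER:
        if description_type != "multi-job":
            raise ValueError("the Multi factory resource needs a multi-job description")
        return JobResource("MMJR", factory_resource)
    if factory_resource in EXECUTABLE_MANAGERS:
        if description_type != "single-job":
            raise ValueError("an executable job needs a single-job description")
        return JobResource("MEJR", factory_resource)
    raise ValueError(f"unknown factory resource: {factory_resource}")
```

In this model a call qualified with "Fork" yields an MEJR and a call qualified with "Multi" yields an MMJR, while the client-facing interface stays the same for both.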

Component API

  • Java API Documentation Links (Javadoc)
  • C API Documentation Links

    to be completed...

Semantics and syntax of the WSDL

Protocol overview

WS-GRAM allows for remote execution and management of programs through the creation of a managed job. The management of the job is taken care of primarily by core toolkit functionality (the WS-ResourceLifetime and WS-BaseNotification implementations). Please see the Java WS Core documentation on notifications and resource lifetime (destruction) for more information.

Managed Job Factory Service (MJFS)

A single MJFS is used to create all jobs for all users. For each local resource manager, a dedicated Managed Job Factory Resource (MJFR) enables the MJFS to publish information about the characteristics of the compute resource, for example:

  • host information
  • GridFTP URL (for file staging and streaming)
  • compute cluster size and configuration, and so on...

In addition, there is a special MJFR which is used for creating MMJRs.

Managed Executable Job Service (MEJS)

A single MEJS is used to manage all executable jobs for all users. Each Managed Executable Job Resource (MEJR) enables the MEJS to publish information about the individual job the MEJR represents. This information can be accessed by querying the MEJS for the resource properties of a given MEJR, such as the:

  • current job state
  • stdout location
  • stderr location
  • exit code, and so on.

Managed Multi-Job Service (MMJS)

A single MMJS is used to manage all multi-jobs for all users. Each Managed Multi-Job Resource (MMJR) enables the MMJS to publish information about the individual multi-job the MMJR represents. This information can be accessed by querying the MMJS for the resource properties of a given MMJR, such as the:

  • current overall job state
  • list of sub-job EPRs

Operations

There are just two operations defined in the GRAM port types (not counting the Rendezvous port type which is used for MPI job synchronization): "createManagedJob" in the Managed Job Factory port type, and "release" in the Managed Job port type. All other operations (such as canceling/killing the job and querying for resource properties) are provided by the underlying WSRF implementation of the toolkit.

ManagedJobFactoryPortType

  • createManagedJob: This operation creates either an MEJR or an MMJR, subscribes the client for notifications if requested, and replies with one or two endpoint references (EPRs). The input of this operation consists of a job description, an optional initial termination time for the job resource, and an optional state notification subscription request.

The first EPR:

  • is qualified with the identifier to the newly created MEJR or MMJR
  • points to either the MEJS or MMJS.

The second EPR:

  • is only present if a notification subscription was requested
  • is qualified with the identifier to the newly created subscription resource
  • points to the subscription manager service.

It should be noted that using the optional subscription request is the only way to guarantee that all notification messages will reach the client.
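The shape of the createManagedJob exchange can be sketched as plain data types. This is a hedged illustration rather than the generated client API: all class and field names below are hypothetical, the two job service names are taken from the deployment section of this guide, and "SubscriptionManagerService" is a placeholder for the subscription manager service.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class EndpointReference:
    service: str       # service the EPR points to
    resource_key: str  # identifier of the resource that qualifies the EPR


@dataclass
class CreateManagedJobInput:
    job_description: str
    initial_termination_time: Optional[str] = None  # optional
    subscribe: bool = False  # optional state notification subscription request


@dataclass
class CreateManagedJobOutput:
    job_epr: EndpointReference
    subscription_epr: Optional[EndpointReference] = None


def create_managed_job(request: CreateManagedJobInput,
                       multi: bool = False) -> CreateManagedJobOutput:
    """Hypothetical stand-in for the factory operation; it only builds the reply."""
    service = "ManagedMultiJobService" if multi else "ManagedExecutableJobService"
    job_epr = EndpointReference(service, uuid.uuid4().hex)
    subscription_epr = None
    if request.subscribe:
        # Placeholder service name; the second EPR exists only on request.
        subscription_epr = EndpointReference("SubscriptionManagerService",
                                             uuid.uuid4().hex)
    return CreateManagedJobOutput(job_epr, subscription_epr)
```

The point of the sketch is the optionality: the reply always carries the job EPR, and carries the subscription EPR only when the request asked for a subscription.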

The ManagedJobFactoryPortType also has all the operations and publishes all the resource properties (via the MJFR) defined in the following WS-ResourceProperties port types:

  • GetResourceProperty
  • GetMultipleResourceProperties
  • QueryResourceProperties

ManagedJobPortType

  • release: This operation takes no parameters and returns nothing. Its purpose is to release a hold placed on a state through the use of the "holdState" field in the job description. See the domain-specific WS GRAM component documentation for more information on the "holdState" field.

The ManagedJobPortType also has all the operations and publishes all the resource properties (via the MEJR or MMJR) defined in the following port types:

WS-ResourceProperties port types:

  • GetResourceProperty
  • GetMultipleResourceProperties
  • QueryResourceProperties

WS-ResourceLifetime port types:

  • ScheduledResourceTermination
  • ImmediateResourceTermination

WS-BaseNotification port type:

  • NotificationProducer

Managed Executable Job Port Type

This port type does not define any new operations. See the section on resource properties.

Managed Multi-Job Port Type

This port type does not define any new operations. See the section on resource properties.

Resource properties

Managed Job Factory Port Type

  • localResourceManager: The local resource manager type (e.g. Condor, Fork, LSF, Multi, PBS)
  • globusLocation: The location of the Globus Toolkit installation that these services are running under.
  • hostCPUType: The job host CPU architecture (i686, x86_64, etc...)
  • hostManufacturer: The host manufacturer name. May be "unknown".
  • hostOSName: The host OS name (Linux, Solaris, etc...)
  • hostOSVersion: The host OS version.
  • scratchBaseDirectory: The directory recommended by the system administrator to be used for temporary job data.
  • delegationFactoryEndpoint: The endpoint reference to the delegation factory used to delegate credentials to the job.
  • stagingDelegationFactoryEndpoint: The endpoint reference to the delegation factory used to delegate credentials to the staging service (RFT).
  • condorArchitecture: Condor architecture label.
  • condorOS: Condor OS label.
  • GLUECE: GLUE data
  • GLUECESummary: GLUE data summary

Managed Job Port Type

  • serviceLevelAgreement: A wrapper around fields containing the single-job and multi-job descriptions or RSLs. Only one of these sub-fields shall have a non-null value.
  • state: The current state of the job.
  • fault: The fault (if generated) indicating the reason for failure of the job to complete.
  • localUserId: The job owner's local user account name.
  • userSubject: The GSI certificate DN of the job owner.
  • holding: Indicates whether a hold has been placed on this job.

Managed Executable Job Port Type

  • stdoutURL: A GridFTP URL to the file generated by the job which contains the stdout.
  • stderrURL: A GridFTP URL to the file generated by the job which contains the stderr.
  • credentialPath: The path (relative to the job process) to the file containing the user proxy used by the job to authenticate out to other services.
  • exitCode: The exit code generated by the job process.

Managed Multi-Job Port Type

  • subJobEndpoint: A set of endpoint references to the sub-jobs created by this multi-job.

Faults

  • FaultType: This is the base fault for runtime errors that occur while managing a job. It extends the OGSI FaultType.
  • CredentialSerializationFaultType: This fault indicates that the managed job service was unable to serialize or deserialize a delegated credential.
  • InsufficientCredentialsFaultType: This fault indicates that the managed job service was unable to perform some action on behalf of the owner of the job submission because the owner has delegated insufficient credentials.
  • InternalFaultType: This fault indicates that an internal operation failed.
  • InvalidCredentialsFaultType: This fault indicates that the managed job service was unable to use a delegated credential.
  • ServiceLevelAgreementFaultType: Fault for runtime errors which are directly related to a particular part of the ServiceLevelAgreement document passed to the createService method. This fault type contains the fragment of the ServiceLevelAgreement related to the fault as one of its elements.
  • ExecutionFailedFaultType: This fault indicates that the Managed Job service was unable to begin the execution of the job.
  • FilePermissionsFaultType: This fault indicates that the ManagedJob service does not have permissions to access a file referenced in the ServiceLevelAgreement.
  • InvalidPathFaultType: This fault indicates that a file or directory path referenced in the ServiceLevelAgreement contains an invalid path.
  • StagingFaultType: This fault indicates that part of the file staging requirements of the ServiceLevelAgreement could not be completed.
  • UnsupportedFeatureFaultType: This fault indicates that an error occurred because the RSL depended on a feature not implemented by a particular GRAM scheduler.

WSDL and Schema Definition

Command-line tools

Job submission

Graphical User Interface

There is no support for this type of interface for WS GRAM.

Semantics and syntax of domain-specific interface data

Please see the job description document for details about the job description language used to define GRAM jobs.

Configuration interface

Locating configuration files

All the GRAM service configuration files are located in subdirectories of the $GLOBUS_LOCATION/etc directory. The names of the GRAM configuration directories all start with gram-service. For instance, with a default GRAM installation, the command line:

    % ls etc | grep gram-service

gives the following output:

    gram-service
    gram-service-Fork
    gram-service-Multi
Web service deployment configuration

The file $GLOBUS_LOCATION/etc/gram-service/server-config.wsdd contains information necessary to deploy and instantiate the GRAM services in the Globus container.

Three GRAM services are deployed:

  • ManagedExecutableJobService: service invoked when querying or managing an executable job
  • ManagedMultiJobService: service invoked when querying or managing a multijob
  • ManagedJobFactoryService: service invoked when submitting a job

The deployment information for each service contains the name of the Java service implementation class, the path to the WSDL service file, the names of the operation providers that the service reuses for its implementation of WSDL-defined operations, etc. More information about the service deployment configuration can be found here.

JNDI application configuration

The configuration of WSRF resources and application-level service configuration not related to service deployment is contained in JNDI files. The JNDI-based GRAM configuration is of two kinds:

Common job factory configuration

The file $GLOBUS_LOCATION/etc/gram-service/jndi-config.xml contains configuration information that is common to every local resource manager.

More precisely, the configuration data it contains pertains to the implementation of the GRAM WSRF resources (factory resources and job resources), as well as initial values of WSRF resource properties that are always published by any Managed Job Factory WSRF resource.

The data is categorized by service because, according to WSRF, in spite of the service/resource separation of concerns, a given service will use only one XML Schema type of resource. In practice it is therefore clearer to categorize the configuration of the resource implementation by service, even if theoretically speaking a given resource implementation could be used by several services. For more information, refer to the Java WS Core documentation.

Here is the decomposition, in JNDI objects, of the common configuration data, categorized by service. Each XYZHome object contains the same Globus Core-defined information for the implementation of the WSRF resource, such as the Java implementation class for the resource (resourceClass datum), the Java class for the resource key (resourceKeyType datum), etc.

  • ManagedExecutableJobService
    • ManagedExecutableJobHome: configuration of the implementation of resources for the service.
  • ManagedMultiJobService
    • ManagedMultiJobHome: configuration of the implementation of resources for the service
  • ManagedJobFactoryService
    • FactoryServiceConfiguration: this encapsulates configuration information used by the factory service. Currently this identifies the service to associate to a newly created job resource in order to create an endpoint reference and return it.
    • ManagedJobFactoryHome: configuration of the implementation of resources for the service.
    • FactoryHomeConfiguration: this contains GRAM application-level configuration data, i.e. values for resource properties common to all factory resources, for instance the path to the Globus installation and host information such as CPU type, manufacturer, operating system name and version, etc.

Local resource manager configuration

When a SOAP call is made to a GRAM factory service in order to submit a job, the call is actually made to a GRAM service-resource pair, where the factory resource represents the local resource manager to be used to execute the job.

There is one directory gram-service-<manager>/ for each local resource manager supported by the GRAM installation.

For instance, let's assume the command line:

    % ls etc | grep gram-service-

gives the following output:

    gram-service-Fork
    gram-service-LSF
    gram-service-Multi

In this example, the Multi, Fork and LSF job factory resources have been installed. Multi is a special kind of local resource manager which enables the GRAM services to support multijobs.

The JNDI configuration file located under each manager directory contains configuration information for the GRAM support of the given local resource manager, such as the name that GRAM uses to designate the given resource manager. This is referred to as the GRAM name of the local resource manager.

For instance, $GLOBUS_LOCATION/etc/gram-service-Fork/jndi-config.xml contains the following XML element structure:

    <service name="ManagedJobFactoryService">
        <!-- LRM configuration:  Fork -->
        <resource
            name="ForkResourceConfiguration"
            type="org.globus.exec.service.factory.FactoryResourceConfiguration">
            <resourceParams>
                [...]
                <parameter>
                    <name>
                        localResourceManagerName
                    </name>
                    <value>
                        Fork
                    </value>
                </parameter>           
                <parameter>
                    <name>
                        scratchDirectory
                    </name>
                    <value>
                        ${GLOBUS_USER_HOME}
                    </value>
                </parameter>           
            </resourceParams>
        </resource>        
    </service>

In the example above, the GRAM name of the local resource manager is Fork. This value can be used with the GRAM command line client in order to specify which factory resource to use when submitting a job. Similarly, it is used to construct an endpoint reference to the chosen factory service-resource pair when using the GRAM client API.

In the example above, the scratchDirectory is set to ${GLOBUS_USER_HOME}. This is the default setting. It can be configured to point to an alternate network file system path (e.g. /scratch) that is common to the compute cluster and is typically less reliable (auto purging), while offering a greater amount of disk space.
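To illustrate how the parameters of such a resource element can be read, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The SAMPLE text is a trimmed copy of the Fork element shown above, the function name resource_params is hypothetical, and stripping any namespace prefix from tag names is a precaution taken here, not a documented requirement of the file format.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Fork example; the elided parameters are left out.
SAMPLE = """
<service name="ManagedJobFactoryService">
    <resource
        name="ForkResourceConfiguration"
        type="org.globus.exec.service.factory.FactoryResourceConfiguration">
        <resourceParams>
            <parameter>
                <name>
                    localResourceManagerName
                </name>
                <value>
                    Fork
                </value>
            </parameter>
            <parameter>
                <name>
                    scratchDirectory
                </name>
                <value>
                    ${GLOBUS_USER_HOME}
                </value>
            </parameter>
        </resourceParams>
    </resource>
</service>
"""


def _local(tag):
    """Drop a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def resource_params(xml_text, resource_name):
    """Return the name/value parameters of the named <resource> as a dict."""
    root = ET.fromstring(xml_text.strip())
    for res in root.iter():
        if _local(res.tag) == "resource" and res.get("name") == resource_name:
            params = {}
            for param in res.iter():
                if _local(param.tag) != "parameter":
                    continue
                # Whitespace around names and values is layout only.
                fields = {_local(c.tag): (c.text or "").strip() for c in param}
                if "name" in fields:
                    params[fields["name"]] = fields.get("value", "")
            return params
    raise KeyError(resource_name)
```

Calling resource_params(SAMPLE, "ForkResourceConfiguration") returns a dictionary with the localResourceManagerName and scratchDirectory entries of the sample.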

Security descriptor

The file $GLOBUS_LOCATION/etc/gram-service/managed-job-factory-security-config.xml contains the Core security configuration for the GRAM ManagedJobFactory service:

  • default security information for all remote invocations, such as:
    • the authorization method, based on a Gridmap file (in order to resolve user credentials to local user names)
    • limited proxy credentials will be rejected
  • security information for the createManagedJob operation

The file $GLOBUS_LOCATION/etc/gram-service/managed-job-security-config.xml contains the Core security configuration for the GRAM job resources:

  • The default is to only allow the identity that called the createManagedJob operation to access the resource.

Note: GRAM does not override the container security credentials defined in $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml. These are the credentials used to authenticate all service requests.

GRAM and GridFTP file system mapping

The file $GLOBUS_LOCATION/etc/gram-service/globus_gram_fs_map_config.xml contains information to associate local resource managers with GridFTP servers. GRAM uses the GridFTP server (via RFT) to perform all file staging directives. Since the GridFTP server and the Globus service container can be run on separate hosts, a mapping is needed between the common file system paths of these 2 hosts. This enables the GRAM services to resolve file:/// staging directives to the local GridFTP URLs.

Below is the default Fork entry. Mapping a jobPath of / to an ftpPath of / allows any file staging directive to be attempted.

    <map>
        <scheduler>Fork</scheduler>
        <ftpServer>
           <protocol>gsiftp</protocol>
           <host>myhost.org</host>
           <port>2811</port>
        </ftpServer>
        <mapping>
           <jobPath>/</jobPath>
           <ftpPath>/</ftpPath>
        </mapping>
    </map>

For a scheduler where jobs will typically run on a compute node, a default entry is not provided. This means staging directives will fail until a mapping is entered. Here is an example for a compute cluster that has PBS installed and has 2 common mount points between the front end host and the GridFTP server host.

    <map>
        <scheduler>PBS</scheduler>
        <ftpServer>
           <protocol>gsiftp</protocol>
           <host>myhost.org</host>
           <port>2811</port>
        </ftpServer>
        <mapping>
           <jobPath>/pvfs/mount1/users</jobPath>
           <ftpPath>/pvfs/mount2/users</ftpPath>
        </mapping>
        <mapping>
           <jobPath>/pvfs/jobhome</jobPath>
           <ftpPath>/pvfs/ftphome</ftpPath>
        </mapping>
    </map>

The file system mapping schema doc is here.
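The effect of such a map entry on a local path can be sketched as follows. This is an illustration under stated assumptions, not the GRAM implementation: the function name to_gridftp_url is hypothetical, and choosing the longest matching jobPath prefix is an assumption about how overlapping mappings would be resolved. The PBS_MAP text repeats the PBS example above.

```python
import xml.etree.ElementTree as ET

PBS_MAP = """
<map>
    <scheduler>PBS</scheduler>
    <ftpServer>
       <protocol>gsiftp</protocol>
       <host>myhost.org</host>
       <port>2811</port>
    </ftpServer>
    <mapping>
       <jobPath>/pvfs/mount1/users</jobPath>
       <ftpPath>/pvfs/mount2/users</ftpPath>
    </mapping>
    <mapping>
       <jobPath>/pvfs/jobhome</jobPath>
       <ftpPath>/pvfs/ftphome</ftpPath>
    </mapping>
</map>
"""


def to_gridftp_url(map_xml, local_path):
    """Translate a local path into a GridFTP URL using one <map> entry."""
    entry = ET.fromstring(map_xml.strip())
    server = entry.find("ftpServer")
    protocol = server.findtext("protocol").strip()
    host = server.findtext("host").strip()
    port = server.findtext("port").strip()

    best = None  # (job_path, ftp_path) with the longest matching job_path
    for mapping in entry.findall("mapping"):
        # Trailing slashes are dropped so that "/" becomes the empty prefix.
        job_path = mapping.findtext("jobPath").strip().rstrip("/")
        ftp_path = mapping.findtext("ftpPath").strip().rstrip("/")
        matches = local_path == job_path or local_path.startswith(job_path + "/")
        if matches and (best is None or len(job_path) > len(best[0])):
            best = (job_path, ftp_path)

    if best is None:
        # Mirrors the text: without a mapping the directive cannot be resolved.
        raise ValueError(f"no mapping covers {local_path}")
    remote_path = best[1] + local_path[len(best[0]):]
    return f"{protocol}://{host}:{port}{remote_path}"
```

With the PBS entry, a path under /pvfs/mount1/users is rewritten to the corresponding path under /pvfs/mount2/users on myhost.org, and a path outside both mount points is rejected.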

Scheduler-specific configuration files

In addition to the service configuration described above, there are scheduler-specific configuration files for the Scheduler Event Generator modules. These files consist of name=value pairs separated by newlines. These files are:

  • $GLOBUS_LOCATION/etc/globus-fork.conf: Configuration for the Fork SEG module implementation. The attribute names for this file are:
    • log_path: Path to the SEG Fork log (used by the globus-fork-starter and the SEG). The value should be the path to a world-writable file. The default value created by the Fork setup package is $GLOBUS_LOCATION/var/globus-fork.log. This file must be readable by the account that the SEG is running as.
  • $GLOBUS_LOCATION/etc/globus-condor.conf: Configuration for the Condor SEG module implementation. The attribute names for this file are:
    • log_path: Path to the SEG Condor log (used by the Globus::GRAM::JobManager::condor Perl module and the Condor SEG module). The value should be the path to a world-readable and world-writable file. The default value created by the Condor setup package is $GLOBUS_LOCATION/var/globus-condor.log.
  • $GLOBUS_LOCATION/etc/globus-pbs.conf: Configuration for the PBS SEG module implementation. The attribute names for this file are:
    • log_path: Path to the SEG PBS logs (used by the Globus::GRAM::JobManager::pbs Perl module and the PBS SEG module). The value should be the path to the directory containing the server logs generated by PBS. For the SEG to operate, these files must be readable by the user the SEG is running as.
  • $GLOBUS_LOCATION/etc/globus-lsf.conf: Configuration for the LSF SEG module implementation. The attribute names for this file are:
    • log_path: Path to the SEG LSF log directory. This is used by the LSF SEG module. The value should be the path to the directory containing the server logs generated by LSF. For the SEG to operate, these files must be readable by the user the SEG is running as.
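Because these files are plain name=value pairs separated by newlines, they are easy to read with a few lines of code. The sketch below assumes nothing beyond that format; the function name parse_seg_conf is hypothetical, the path in the usage note is an example value, and silently skipping lines without an "=" sign is a choice made here, not documented behavior.

```python
def parse_seg_conf(text):
    """Parse newline-separated name=value pairs into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # assumption: ignore blank or malformed lines
        name, value = line.split("=", 1)  # split on the first "=" only
        settings[name.strip()] = value.strip()
    return settings
```

For example, parsing the text "log_path=/opt/globus/var/globus-fork.log" gives a dictionary whose log_path entry holds that path. The path could then be checked with os.access to confirm that the account running the SEG can read it.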

Environment variable interface

There is no support for this type of interface for WS GRAM.