GT 3.9.4 WS GRAM: Developer's Guide

Introduction

This guide is intended to help a developer create compatible WS-GRAM clients and alternate service implementations.

The key concepts for the GRAM component have not changed. Its purpose is still to provide the mechanisms to execute remote applications for a user. Given an RSL (Resource Specification Language) job description, GRAM submits the job to a scheduling system such as PBS or Condor, or to a simple fork-based way of spawning processes, and monitors it until completion. More details can be found here:

http://www-unix.globus.org/toolkit/docs/3.2/gram/key

Architecture and design overview

The GRAM services in GT 3.9.4 are WSRF compliant. A key concept in the WSRF specification is the decoupling of a service from its public "state", which the interface exposes through the implied resource pattern. Following this concept, the data of GT 3.9.4 GRAM jobs is published as part of WSRF resources, and a single service is used to start jobs and to query and monitor their state. This differs from the OGSI model of GT3, where each job was represented as a separate service. There is still a job factory service that can be called to create job instances (represented as WSRF resources). Each scheduling system that GRAM interfaces with is represented as a separate factory resource. By calling the factory service and associating the call with the appropriate factory resource, the actor submitting the job creates a job resource that maps to a job in the chosen scheduling system.
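
The relationship between the single factory service, its per-scheduler factory resources, and the job resources it creates can be illustrated with a small in-memory model. This is only a Python sketch of the pattern described above, not GRAM code: the class names, method names, the "Pending" state, and the "Fork" and "PBS" resource keys are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class JobResource:
    """State of one job, addressed by a resource key instead of its own service."""
    job_id: str
    scheduler: str
    description: str
    state: str = "Pending"  # hypothetical initial state name


@dataclass
class ManagedJobService:
    """A single service that operates on many job resources."""
    jobs: dict = field(default_factory=dict)

    def get_state(self, job_id: str) -> str:
        return self.jobs[job_id].state


@dataclass
class ManagedJobFactoryService:
    """A single factory service; each scheduler is a separate factory resource."""
    job_service: ManagedJobService
    factory_resources: set = field(default_factory=set)

    def create_job(self, factory_key: str, description: str) -> str:
        # The call is associated with exactly one factory resource (one scheduler).
        if factory_key not in self.factory_resources:
            raise KeyError(f"no factory resource for scheduler {factory_key!r}")
        job_id = str(uuid.uuid4())
        self.job_service.jobs[job_id] = JobResource(job_id, factory_key, description)
        return job_id


job_service = ManagedJobService()
factory = ManagedJobFactoryService(job_service, {"Fork", "PBS"})
job_id = factory.create_job("PBS", "<job description>")
print(job_service.get_state(job_id))
```

The point of the model is that adding a job never adds a service: only the set of resources grows, and the resource key selects which job (or which scheduler) a call applies to.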

Public interface

The semantics and syntax of the APIs and WSDL for the component, along with descriptions of domain-specific structured interface data, can be found in the public interface guide.

Usage scenarios

[describe how to use the programmatic interfaces of the component, provide examples]

Tutorials

The following tutorials are available for WS GRAM developers:

Feature summary

Features new in release 3.9.4

  • Improved service performance:
    • Job concurrency
    • Throughput
    • Latency
  • Improved service reliability/recovery
  • Support for mpich-g2 jobs:
    • multi-job submission capabilities
    • ability to coordinate processes in a job
    • ability to coordinate subjobs in a multi-job
  • Publishing of the job's exit code
  • The ability to select the account under which the remote job will be run. If a user's grid credential is mapped to multiple accounts, then the user can specify, in the RSL, under which account the job should be run.
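
The account-selection feature can be pictured as a lookup against the set of accounts a credential maps to. The following is a minimal sketch under stated assumptions, not the service's actual logic: the function name is hypothetical, and falling back to the first mapped account when none is requested is an assumption made for illustration.

```python
def select_account(mapped_accounts, requested=None):
    """Pick the local account a job runs under.

    mapped_accounts: accounts the user's grid credential maps to, in order.
    requested: account named in the job description, or None.
    """
    if not mapped_accounts:
        raise PermissionError("credential is not mapped to any local account")
    if requested is None:
        # Assumption: with no explicit request, use the first mapped account.
        return mapped_accounts[0]
    if requested not in mapped_accounts:
        raise PermissionError(f"credential is not mapped to account {requested!r}")
    return requested


accounts = ["alice", "alice_prod"]  # hypothetical account names
print(select_account(accounts))
print(select_account(accounts, "alice_prod"))
```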

Other Supported Features

  • Remote job execution and management
  • Uniform and flexible interface to batch scheduling systems
  • File staging before and after job execution

Deprecated Features

  • Service-managed data streaming of a job's stdout/stderr during execution
  • File staging using the GASS protocol
  • File caching of staged files, e.g. GASS Cache

Tested platforms

Tested platforms for GRAM:

  • Linux

Backward compatibility summary

Protocol changes since GT version 3.2:

  • The protocol has been changed to be WSRF compliant. There is no backward compatibility between this version and any previous versions.

API changes since GT version 3.2:

  • The MJFS create() operation has become createManagedJob() and now provides the option to send a uuid. A client can use this uuid to recover a job handle in the event that the reply message is not received. Given this new method, the start() operation can be removed. createManagedJob() also allows a notification subscription to be specified. Without the start() operation, this is the only way to reliably get all job state notifications.
  • The MJS start() operation has been removed. Its purpose was to ensure that the client had received the job handle prior to the job being submitted (and thus consuming resources), and is redundant with the uuid functionality.
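
The reason a client-supplied uuid makes start() redundant can be shown with a toy model. This sketch assumes that the client recovers the handle by repeating the call with the same uuid; the stub class, its method names, and the handle format are hypothetical and are not the real service interface.

```python
import uuid


class FactoryStub:
    """In-memory stand-in for the factory; not the real service."""

    def __init__(self):
        self._handles = {}    # client-supplied uuid -> job handle
        self.submissions = 0  # number of jobs actually created

    def create_managed_job(self, description, job_uuid):
        # A repeated call with the same uuid returns the existing handle
        # instead of creating a second job.
        if job_uuid not in self._handles:
            self.submissions += 1
            self._handles[job_uuid] = f"job-handle-{job_uuid}"
        return self._handles[job_uuid]


def submit_with_retry(factory, description, attempts=3, lose_first_reply=True):
    job_uuid = str(uuid.uuid4())  # chosen by the client before the first call
    for attempt in range(attempts):
        handle = factory.create_managed_job(description, job_uuid)
        if lose_first_reply and attempt == 0:
            continue  # simulate a reply that never reached the client
        return handle
    raise RuntimeError("no reply received")


factory = FactoryStub()
handle = submit_with_retry(factory, "<job description>")
print(factory.submissions)
```

Even though the first reply is dropped, the retry yields the same handle and only one job is created, so no separate confirmation step is needed before the job consumes resources.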

Exception changes since GT version 3.2:

  • list the new/changed exceptions here...

Schema changes since GT version 3.2 (see the 3.9.4 User's Guide for more information about the new RSL syntax):

  • Executable is now a single local file path. Remote URLs are no longer allowed. If executable staging is desired, it should be added to the fileStageIn directive.
  • stdin is now a single local file path. Remote URLs are no longer allowed. If stdin staging is desired, it should be added to the fileStageIn directive.
  • stdout is now a single local file path, instead of a list of remote URLs. If stdout staging is desired, it should be added to the fileStageOut directive.
  • stderr is now a single local file path, instead of a list of remote URLs. If stderr staging is desired, it should be added to the fileStageOut directive.
  • scratch_directory has been removed.
  • GramMyJobType has been removed. "Collective" functionality is always available if a job chooses to use it.
  • File staging related RSL attributes have been replaced with RFT file transfer attributes/syntax.
  • RSL substitution definitions and substitution references have been removed in order to be able to use standard XML parsing/serialization tools.
  • RSL variables have been added. These are keywords denoted in the form of ${variable name} that can be found in certain RSL attributes.
  • Explicit credential references have been added, which, along with use of the new DelegationFactory service, replace the old implicit delegation model.
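
To illustrate the ${variable name} form of RSL variables listed above, the following sketch expands such references in an attribute value. It is an illustration only: the variable names and values are hypothetical, and leaving unknown variables untouched is an assumption rather than documented service behavior.

```python
import re

# Matches ${name}, capturing the variable name between the braces.
_VARIABLE = re.compile(r"\$\{([^}]+)\}")


def expand_variables(value, variables):
    """Replace each ${name} in value with variables[name].

    Names missing from the mapping are left in place unchanged.
    """
    def replace(match):
        name = match.group(1)
        return variables.get(name, match.group(0))

    return _VARIABLE.sub(replace, value)


# Hypothetical variable names and values, for illustration only.
variables = {"USER_HOME": "/home/alice", "SCRATCH": "/tmp/alice"}
print(expand_variables("${USER_HOME}/job/stdout", variables))
print(expand_variables("${UNKNOWN}/x", variables))
```

Because the references are ordinary text inside attribute values, a job description that uses them can still be handled by standard XML parsing and serialization tools.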

Technology dependencies

GRAM depends on the following GT components:

  • Java WS Core
  • Transport-Level Security
  • Delegation Service
  • RFT
  • GridFTP
  • MDS - internal libraries

GRAM depends on the following third-party software. A dependency exists only for each batch scheduler that is configured, since that software is what makes job submission to the corresponding scheduling service possible:

  • PBS
  • Condor
  • LSF
  • other batch schedulers... (where the GRAM scheduler interface has been implemented)

Security considerations

[describe security considerations relevant for this component]

Troubleshooting

[help for common problems developers may experience]

Related Documentation

[could link to pdfs and whitepapers about protocols, etc re: the component]