GT 3.9.5 WS GRAM : Developer's Guide
- Introduction
- Architecture and design overview
- Public interface
- Usage scenarios
- Tutorials
- Feature summary
- Tested platforms
- Backward compatibility summary
- Technology dependencies
- Security considerations
- Debugging
- Troubleshooting
- Related Documentation
Introduction
This guide is intended to help a developer create compatible WS GRAM clients and alternate service implementations. The key concepts for the GRAM component have not changed. Its purpose is still to provide the mechanisms to execute remote applications for a user. Given an RSL (Resource Specification Language) job description, GRAM submits the job to a scheduling system such as PBS or Condor, or to a simple fork-based way of spawning processes, and monitors it until completion. More details can be found here:
http://www-unix.globus.org/toolkit/docs/3.2/gram/key
Architecture and design overview
The GRAM services in GT 3.9.5 are WSRF compliant. One of the key concepts in the WSRF specification is the decoupling of a service from the public "state" of the service in the interface via the implied resource pattern. Following this concept, the data of GT 3.9.5 GRAM jobs is published as part of WSRF resources, while there is only one service to start jobs or query and monitor their state. This is different from the OGSI model of GT3, where each job was represented as a separate service. There is still a job factory service that can be called in order to create job instances (represented as WSRF resources). Each scheduling system that GRAM is interfaced with is represented as a separate factory resource. By making a call to the factory service while associating the call with the appropriate factory resource, the job submitting actor can create a job resource mapping to a job in the chosen scheduling system.
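The relationship between the single service and its many resources can be pictured with a small in-memory model. This is only an illustrative sketch: the class and method names (`JobServiceModel`, `createJob`, `queryState`), the string keys, and the "Pending" state label are all hypothetical and are not part of the GRAM or WSRF APIs. Real resources are addressed through endpoint references rather than plain strings.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the implied resource pattern: one service, many keyed resources. */
class JobServiceModel {
    // One factory resource per scheduler type (the names used here are examples only).
    private final Map<String, Integer> jobsPerFactory = new HashMap<String, Integer>();
    // Job state lives in resources selected by a key, not in per-job services.
    private final Map<String, String> jobStates = new HashMap<String, String>();

    JobServiceModel(String... schedulerTypes) {
        for (String type : schedulerTypes) {
            jobsPerFactory.put(type, 0);
        }
    }

    /** Factory call associated with a factory resource; returns the new job's key. */
    String createJob(String schedulerType) {
        Integer count = jobsPerFactory.get(schedulerType);
        if (count == null) {
            throw new IllegalArgumentException("No factory resource for " + schedulerType);
        }
        jobsPerFactory.put(schedulerType, count + 1);
        String key = schedulerType + "-" + (count + 1);
        jobStates.put(key, "Pending"); // placeholder state label
        return key;
    }

    /** The same service answers for every job; the key selects the resource. */
    String queryState(String jobKey) {
        String state = jobStates.get(jobKey);
        if (state == null) {
            throw new IllegalArgumentException("Unknown job resource " + jobKey);
        }
        return state;
    }

    public static void main(String[] args) {
        JobServiceModel service = new JobServiceModel("Fork", "PBS");
        String forkJob = service.createJob("Fork");
        String pbsJob = service.createJob("PBS");
        System.out.println(forkJob + " -> " + service.queryState(forkJob));
        System.out.println(pbsJob + " -> " + service.queryState(pbsJob));
    }
}
```

The point of the model is that adding a job adds an entry to a map, not a new service object, which mirrors the difference from the OGSI model described above.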
Public interface
The semantics and syntax of the APIs and WSDL for the component, along with descriptions of domain-specific structured interface data, can be found in the public interface guide.
Usage scenarios
Java
The following is a general scenario for submitting a job using the Java stubs and APIs. Please consult the Java WS Core API, Delegation API, Reliable File Transfer API, and WS-GRAM API documentation for details on package names for classes referenced in the code excerpts.
- Load the RSL
File rslFile = new File("myrsl.xml");
JobDescriptionType rsl = RSLHelper.readRSL(rslFile);
The object rsl will be of sub-type MultiJobDescriptionType if the file contains a multi-job RSL.
- Create the factory service stub
URL factoryUrl = ManagedJobFactoryClientHelper.getServiceURL(contactString).getURL();
String factoryType = ManagedJobFactoryConstants.FACTORY_TYPE.<factory type constant>;
EndpointReferenceType factoryEndpoint = ManagedJobFactoryClientHelper.getFactoryEndpoint(
    factoryUrl, factoryType);
ManagedJobFactoryPortType factoryPort = ManagedJobFactoryClientHelper.getPort(factoryEndpoint);
The format of contactString is [protocol://]host[:port][/servicepath].
- Set stub security parameters
ClientSecurityDescriptor secDesc = new ClientSecurityDescriptor();
secDesc.setGSITransport(Constants.<protection level constant>);
secDesc.setAuthz(<Authorization sub-class instance>);
if (proxy != null) {
    secDesc.setGSSCredential(proxy);
}
((Stub) factoryPort)._setProperty(Constants.CLIENT_DESCRIPTOR, secDesc);
Use setGSISecureMsg() for GSI Secure Message.
- Query factory resource properties
One at a time
GetResourcePropertyResponse response = factoryPort.getResourceProperty(ManagedJobConstants.<RP constant>);
SOAPElement[] any = response.get_any();
... = ObjectDeserializer.toObject(any[0], <RP type>.class);
Many at a time
GetMultipleResourceProperties_Element rpRequest = new GetMultipleResourceProperties_Element();
rpRequest.setResourceProperty(new QName[] {
    ManagedJobFactoryConstants.<RP constant #1>,
    ManagedJobFactoryConstants.<RP constant #2>,
    ManagedJobFactoryConstants.<RP constant #N>
});
GetMultipleResourcePropertiesResponse response = factoryPort.getMultipleResourceProperties(rpRequest);
SOAPElement[] any = response.get_any();
// the indices below assume one returned element per requested property, in request order
... = ObjectDeserializer.toObject(any[0], <RP #1 type>.class);
... = ObjectDeserializer.toObject(any[1], <RP #2 type>.class);
... = ObjectDeserializer.toObject(any[N-1], <RP #N type>.class);
- Delegate credentials (if needed)
X509Certificate certToSign = DelegationUtil.getCertificateChainRP(
    delegFactoryEndpoint,  // EndpointReferenceType
    secDesc                // ClientSecurityDescriptor
)[0];                      // first element in the returned array
EndpointReferenceType credentialEndpoint = DelegationUtil.delegate(
    delegFactoryurl,       // String
    credential,            // GlobusCredential
    certToSign,            // X509Certificate
    lifetime,              // int (seconds)
    fullDelegation,        // boolean
    secDesc);              // ClientSecurityDescriptor
There are three types of delegated credentials:
- Credential used by the job to generate user-owned proxy:
rsl.setJobCredential(credentialEndpoint);
- Credential used to contact RFT for staging and file clean up:
rsl.setStagingCredentialEndpoint(credentialEndpoint);
- Credential used by RFT to contact GridFTP servers:
TransferRequestType stageOut = rsl.getFileStageOut();
stageOut.setTransferCredential(credentialEndpoint);
Do the same for fileStageIn and fileCleanUp.
- Create the job resource
CreateManagedJobInputType jobInput = new CreateManagedJobInputType();
jobInput.setJobID(new AttributedURI("uuid:" + UUIDGenFactory.getUUIDGen().nextUUID()));
jobInput.setInitialTerminationTime(<Calendar instance>);
if (multiJob) {
    jobInput.setMultiJob(rsl);
} else {
    jobInput.setJob(rsl);
}
if (subscribeOnCreate) {
    jobInput.setSubscribe(subscriptionReq);
}
CreateManagedJobOutputType createResponse = factoryPort.createManagedJob(jobInput);
EndpointReferenceType jobEndpoint = createResponse.getManagedJobEndpoint();
- Create the job service stub
ManagedJobPortType jobPort = ManagedJobClientHelper.getPort(jobEndpoint);
You must set the appropriate security parameters for the job service stub (jobPort) as well.
- Subscribe for job state notifications
NotificationConsumerManager notifConsumerManager = NotificationConsumerManager.getInstance();
List topicPath = new LinkedList();
topicPath.add(ManagedJobConstants.RP_STATE);
ResourceSecurityDescriptor resourceSecDesc = new ResourceSecurityDescriptor();
resourceSecDesc.setAuthz(Authorization.<authz type constant>);
Vector authMethods = new Vector();
authMethods.add(GSISecureTransportAuthMethod.BOTH);
resourceSecDesc.setAuthMethods(authMethods);
EndpointReferenceType notificationConsumerEndpoint = notifConsumerManager.createNotificationConsumer(
    topicPath, this, resourceSecDesc);
Subscribe subscriptionReq = new Subscribe();
subscriptionReq.setConsumerReference(notificationConsumerEndpoint);
TopicExpressionType topicExpression = new TopicExpressionType(
    WSNConstants.SIMPLE_TOPIC_DIALECT, ManagedJobConstants.RP_STATE);
subscriptionReq.setTopicExpression(topicExpression);
EndpointReferenceType subscriptionEndpoint;
- Subscribe on creation
jobInput.setSubscribe(subscriptionReq);
- Subscribe after creation
SubscribeResponse subscribeResponse = jobPort.subscribe(subscriptionReq);
subscriptionEndpoint = subscribeResponse.getSubscriptionReference();
- Release any state holds (if necessary)
jobPort.release(new ReleaseInputType());
- Destroy resources
/* destroy subscription resource */
SubscriptionManager subscriptionManagerPort = new WSBaseNotificationServiceAddressingLocator()
    .getSubscriptionManagerPort(subscriptionEndpoint);
// set stub security parameters on subscriptionManagerPort
subscriptionManagerPort.destroy(new Destroy());
/* destroy the job resource */
jobPort.destroy(new Destroy());
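The contact string format used when creating the factory service stub, [protocol://]host[:port][/servicepath], can be illustrated with a small standalone parser. This is a rough sketch and not the logic of ManagedJobFactoryClientHelper: the class name `ContactString` and the three default values are placeholders invented for the example, and the real helper may apply different defaults. IPv6 literals are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative parser for [protocol://]host[:port][/servicepath]. */
class ContactString {
    // Placeholder defaults for this sketch only; the real helper's defaults may differ.
    static final String DEFAULT_PROTOCOL = "https";
    static final int DEFAULT_PORT = 8443;
    static final String DEFAULT_PATH = "/example/service/path";

    // Groups: 1 = protocol, 2 = host, 3 = port, 4 = service path (1, 3, 4 optional).
    private static final Pattern FORMAT = Pattern.compile(
        "^(?:([A-Za-z][A-Za-z0-9+.-]*)://)?([^:/]+)(?::(\\d+))?(/.*)?$");

    final String protocol;
    final String host;
    final int port;
    final String path;

    private ContactString(String protocol, String host, int port, String path) {
        this.protocol = protocol;
        this.host = host;
        this.port = port;
        this.path = path;
    }

    /** Splits a contact string and fills in a default for each omitted part. */
    static ContactString parse(String contact) {
        Matcher m = FORMAT.matcher(contact.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a valid contact string: " + contact);
        }
        String protocol = m.group(1) != null ? m.group(1) : DEFAULT_PROTOCOL;
        int port = m.group(3) != null ? Integer.parseInt(m.group(3)) : DEFAULT_PORT;
        String path = m.group(4) != null ? m.group(4) : DEFAULT_PATH;
        return new ContactString(protocol, m.group(2), port, path);
    }

    String toUrl() {
        return protocol + "://" + host + ":" + port + path;
    }

    public static void main(String[] args) {
        System.out.println(parse("example.org").toUrl());
        System.out.println(parse("http://example.org:9000/a/b").toUrl());
    }
}
```

Only the host is mandatory in this format, which is why each of the other three parts falls back to a default when absent.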
C
No C developer scenarios have been created yet.
Tutorials
The following tutorials are available for WS GRAM developers:
Feature summary
Features new in release 3.9.5
- Improved service performance:
- Job concurrency
- Throughput
- Latency
- Improved service reliability/recovery
- Support for mpich-g2 jobs:
- multi-job submission capabilities
- ability to coordinate processes in a job
- ability to coordinate subjobs in a multi-job
- Publishing of the job's exit code
- The ability to select the account under which the remote job will be run. If a user's grid credential is mapped to multiple accounts, then the user can specify, in the RSL, under which account the job should be run.
- Optional client-specified hold on a state. Released with the new "release" operation.
Other Supported Features
- Remote job execution and management
- Uniform and flexible interface to batch scheduling systems
- File staging before and after job execution
- File / directory clean up after job execution (after file stage out)
Deprecated Features
- Service managed data streaming of job's stdout/err during execution.
- File staging using the GASS protocol
- File caching of staged files, e.g. GASS Cache
Tested platforms
Tested platforms for GRAM:
- Linux
Backward compatibility summary
Protocol changes since GT version 3.2:
- The protocol has been changed to be WSRF compliant. There is no backward compatibility between this version and any previous versions.
API changes since GT version 3.2:
- The MJFS create operation has become createManagedJob and now provides the option to send a uuid. A client can use this uuid to recover a job EPR in the event that the reply message is not received. Given this new scheme, the start operation was removed. The createManagedJob() operation also allows a notification subscription request to be specified. This is the only way to reliably get all job state notifications.
- The MJS start operation has been removed. Its purpose was to ensure that the client had received the job EPR prior to the job being executed (and thus consuming resources), and is redundant with the uuid functionality.
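The recovery scheme described above (resend the create request with the same uuid if the reply is lost) can be sketched with a simulated factory. Everything in this example is a stand-in: `FakeFactory`, `createWithRetry`, and the endpoint strings are hypothetical, and the real call is createManagedJob() on the factory port with the uuid set on the input message. The sketch only shows why reusing one ID across attempts yields a single job.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class IdempotentCreateDemo {
    /** Hypothetical stand-in for the factory service; not a GRAM class. */
    static class FakeFactory {
        private final Map<String, String> endpointsByJobId = new HashMap<String, String>();
        private boolean dropNextReply;
        private int jobsCreated;

        FakeFactory(boolean dropFirstReply) {
            this.dropNextReply = dropFirstReply;
        }

        /** A repeat create with a known ID returns the existing endpoint. */
        String create(String jobId) throws IOException {
            String endpoint = endpointsByJobId.get(jobId);
            if (endpoint == null) {
                jobsCreated++;
                endpoint = "job-endpoint-" + jobsCreated;
                endpointsByJobId.put(jobId, endpoint);
            }
            if (dropNextReply) {
                dropNextReply = false;
                // The job now exists, but the client never sees its endpoint.
                throw new IOException("reply lost");
            }
            return endpoint;
        }

        int jobsCreated() {
            return jobsCreated;
        }
    }

    /** Retries with the SAME job ID so that a lost reply cannot create a second job. */
    static String createWithRetry(FakeFactory factory, String jobId, int maxAttempts)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return factory.create(jobId);
            } catch (IOException e) {
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) throws IOException {
        String jobId = "uuid:" + UUID.randomUUID(); // generated once, before the first attempt
        FakeFactory factory = new FakeFactory(true);
        String endpoint = createWithRetry(factory, jobId, 3);
        System.out.println(endpoint + " (jobs created: " + factory.jobsCreated() + ")");
    }
}
```

The key design point is that the ID is generated on the client before the first attempt; generating a fresh ID per attempt would defeat the recovery mechanism.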
Fault changes since GT version 3.2:
- CacheFaultType was removed since there is no longer a GASS cache.
- RepeatedlyStartedFaultType was removed since there is no longer a start operation. Repeat creates with the same submission ID simply return the job EPR.
- SLAFaultType was changed to ServiceLevelAgreementFaultType for clarification.
- StreamServiceCreationFaultType was removed since there is no longer a stream service.
- UnresolvedSubstitutionReferencesFaultType was removed since there is no longer support for substitution definitions and references in the RSL.
- DatabaseAccessFaultType was removed since a database is no longer used to save job data.
RSL schema changes since GT version 3.2. See the 3.9.5 User's Guide for more information about the new RSL syntax:
- executable is now a single local file path. Remote URLs are no longer allowed. If executable staging is desired, it should be added to the fileStageIn directive.
- stdin is now a single local file path. Remote URLs are no longer allowed. If stdin staging is desired, it should be added to the fileStageIn directive.
- stdout is now a single local file path, instead of a list of remote URLs. If stdout staging is desired, it should be added to the fileStageOut directive.
- stderr is now a single local file path, instead of a list of remote URLs. If stderr staging is desired, it should be added to the fileStageOut directive.
- scratchDirectory has been removed.
- gramMyJobType has been removed. "Collective" functionality is always available if a job chooses to use it.
- dryRun has been removed. This is obsolete given the addition of the holdState attribute. Setting holdState to "StageIn" should prevent the job from being submitted to the local scheduler. It can then be canceled once the StageIn-Hold state notification is received.
- remoteIoUrl has been removed. This was a hack for pre-ws GRAM involved with staging via GASS, and has no relevancy in the current implementation.
- File staging related RSL attributes have been replaced with RFT file transfer attributes/syntax.
- RSL substitution definitions and substitution references have been removed in order to be able to use standard XML parsing/serialization tools.
- RSL variables have been added. These are keywords denoted in the form of ${variable name} that can be found in certain RSL attributes.
- Explicit credential references have been added, which, along with use of the new DelegationFactory service, replace the old implicit delegation model.
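The ${variable name} syntax for RSL variables mentioned above can be illustrated with a short substitution routine. In GRAM the substitution is performed by the service, so this is only a sketch of the syntax and not the service's implementation. The class name `RslVariableDemo`, the variable name `EXAMPLE_USER_HOME`, and the choice to leave unknown variables untouched are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RslVariableDemo {
    // Matches ${name} and captures the name between the braces.
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${name} with its value; unknown names are left as written. */
    static String expand(String value, Map<String, String> variables) {
        Matcher m = VARIABLE.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = variables.get(m.group(1));
            if (replacement == null) {
                replacement = m.group(); // keep the original ${...} text
            }
            // quoteReplacement stops '$' and '\' in values being treated as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> variables = new HashMap<String, String>();
        variables.put("EXAMPLE_USER_HOME", "/home/alice"); // hypothetical variable name
        System.out.println(expand("${EXAMPLE_USER_HOME}/results/stdout.txt", variables));
    }
}
```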
Technology dependencies
GRAM depends on the following GT components:
- Java WS Core
- Transport-Level Security
- Delegation Service
- RFT
- GridFTP
- MDS - internal libraries
GRAM depends on the following 3rd party software. A dependency exists only for the batch schedulers that are actually configured, since these are what make job submission to a batch scheduling service possible:
- PBS
- Condor
- LSF
- other batch schedulers... (where the GRAM scheduler interface has been implemented)
Security considerations
No special security considerations exist at this time.
Debugging
Enabling debug logging for GRAM classes
For starters, consult the Debugging section of the Java WS Core Developer's Guide for details about what files to edit and other general log4j configuration information.
To turn on debug logging for the Managed Executable Job Service (MEJS), add the following entry to the container-log4j.properties file:
log4j.category.org.globus.exec.service.exec=DEBUG
To turn on debug logging for the Managed Multi Job Service (MMJS), add the following entry to the container-log4j.properties file:
log4j.category.org.globus.exec.service.multi=DEBUG
To turn on debug logging for the Managed Job Factory Service (MJFS), add the following entry to the container-log4j.properties file:
log4j.category.org.globus.exec.service.factory=DEBUG
To turn on debug logging for all GRAM code, add the following entry to the container-log4j.properties file:
log4j.category.org.globus.exec=DEBUG
Follow the pattern to turn on logging for other specific packages or classes.
Debugging script execution
It may be necessary to debug the scheduler scripts if jobs aren't being submitted correctly, and either no fault or a less-than-helpful fault is generated. Ideally this should not be necessary, so if you find that you must resort to it, please file a bug report or let us know on the discuss e-mail list.
By turning on debug logging for the MEJS (see above), you should be able to search for "Perl Job Description" in the logging output to find the perl form of the job description that is sent to the scheduler scripts.
Also by turning on debug logging for the MEJS, you should be able to search for "Executing command" in the logging output to find the specific commands that are executed when the scheduler scripts are invoked from the service code.
Beyond the above advice, you may want to edit the perl scripts themselves to print more detailed information like the native scheduler job description. For more information on the location and composition of the scheduler scripts, please consult the WS-GRAM Scheduler Interface Tutorial.
Troubleshooting
The job manager detected an invalid script response
- Check for a restrictive umask. When the service writes the native scheduler job description to a file, an overly restrictive umask will cause the permissions on the file to be such that the submission script run through sudo as the user cannot read the file (bug #2655).
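The effect of the umask on the job description file can be worked through with a little bit arithmetic. The sketch below assumes the usual POSIX convention that a new regular file is requested with mode 0666 and the umask bits are then cleared. Whether the service creates the file this way is an assumption, and the class and method names are made up for the example.

```java
class UmaskCheck {
    /** Mode of a newly created regular file: 0666 with the umask bits cleared. */
    static int fileModeFor(int umask) {
        return 0666 & ~umask;
    }

    /** True if members of the file's group may read it. */
    static boolean groupReadable(int mode) {
        return (mode & 0040) != 0;
    }

    /** True if every user on the system may read it. */
    static boolean worldReadable(int mode) {
        return (mode & 0004) != 0;
    }

    public static void main(String[] args) {
        int[] umasks = {0022, 0027, 0077};
        for (int umask : umasks) {
            int mode = fileModeFor(umask);
            System.out.println(String.format(
                "umask %03o -> mode %03o, group read: %b, world read: %b",
                umask, mode, groupReadable(mode), worldReadable(mode)));
        }
    }
}
```

With a umask of 077 the resulting mode is 600, so only the account that wrote the file can read it. With 027 the file is group-readable only, which helps only if the job's user account shares the file's group.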
Related Documentation
No related documentation links have been determined at this time.