Table of Contents
- 1. Introduction
- 2. New Functionality in GT4
- 3. Changed Functionality in GT4
- 4. Usage scenarios
- 4.1. Generating a valid proxy
- 4.2. Submitting a simple job
- 4.3. Submitting a job with the contact string
- 4.4. Submitting a job with the job description
- 4.5. Delegating credentials
- 4.6. Finding which schedulers are interfaced by the WS GRAM installation
- 4.7. Specifying file staging in the job description
- 4.8. Specifying and submitting a multijob
- 4.9. Lifetime of jobs
- 4.10. Specifying substitution variables in a job description
- 4.11. Specifying a self-generated resource key during job submission
- 4.12. Specifying and handling custom job description extensions (4.0.5+, update pkg available)
- 4.13. Specifying SoftEnv keys in the job description (4.0.5+ only)
- 5. Command-line tools
- 6. Submitting MPI Jobs
- 7. Graphical user interfaces
- 8. Troubleshooting
- 9. Usage statistics collection by the Globus Alliance
GRAM services provide secure job submission to many types of job schedulers for users who have the right to access a job hosting resource in a Grid environment. A valid proxy is required for job submission. All GRAM job submission options are supported transparently through the embedded request document input: a job is started by submitting a job description, provided on the client side, to the GRAM services. End users can make this submission with the GRAM command-line tools.
A submission ID may be used in the GRAM protocol for reliability in the face of message faults or other transient errors in order to ensure that at most one instance of a job is executed, i.e. to prevent accidental duplication of jobs under rare circumstances with client retry on failure. By default, the globusrun-ws program will generate a submission ID (uuid). One can override this behavior by supplying a submission ID as a command line argument.
If a user is unsure whether a job was submitted successfully, they should resubmit using the same ID that was used for the previous attempt.
It is possible to specify in a job description that the job be put on hold when it reaches a chosen state (see GRAM Approach documentation for more information about the executable job state machine, and see the job description XML schema documentation for information about how to specify a held state). This is useful, for example, when a GRAM client wishes to directly access output files written by the job (as opposed to waiting for the stage-out step to transfer files from the job host). The client would request that the file cleanup process be held until released, giving the client an opportunity to fetch all remaining/buffered data after the job completes but before the output files are deleted.
globusrun-ws uses job hold and release to ensure client-side streaming of remote files in batch mode.
The new job description XML schema allows for specification of a multijob, i.e., a job that is itself composed of several executable jobs. This is useful in order to bundle a group of jobs together and submit them as a whole to a remote GRAM installation.
WS GRAM services implement a rendezvous mechanism to perform synchronization between job processes in a multiprocess job and between subjobs in a multijob. The job application can in fact register binary information, for instance process information or subjob information, and get notified when all the other processes or subjobs have registered their own information. This is useful for parallel jobs which need to rendezvous at a "barrier" before proceeding with computations, in the case when no native application API is available to help do the rendezvous.
Important: This change in functionality is only available starting with GT 4.0.5.
WS GRAM enables the client to add a self-generated resource key to the input type when submitting a new job request to the ManagedJobFactoryService (MJFS). This enables the client to keep in contact with the job in case the server fails after the job was created but before the EndpointReference (EPR) of the newly created job was sent to the client.
The client is then able to create an EPR itself with the self-generated job UUID and the address of the ManagedExecutableJobService (MEJS) and then query for the state of the job.
In former versions of WS GRAM, the job UUID that was generated on the client-side was used in WS GRAM as the resource key of the created job resource. This has changed: starting with GT 4.0.5, WS GRAM now creates its own job UUID, even if the client provides one in the input of its call to the MJFS, and returns this job key inside the EPR which is then returned to the client. With the self-generated job key, the client can still contact the MJFS and the MJFS will simply use that mapping. But the client cannot contact the MEJS with that self-generated job key as part of an EPR in order to query for job state.
The following scenarios walk you through tasks typically performed by WS GRAM users.
- Finding which schedulers are interfaced by the WS GRAM installation
- Specifying and handling custom job description extensions (4.0.5+, update pkg available)
- Specifying SoftEnv keys in the job description (4.0.5+ only)
- Specifying substitution variables in a job description (new info for 4.0.5+)
- Specifying a self-generated resource key during job submission (new info for 4.0.5+)
In order to generate a valid proxy file, use the grid-proxy-init tool available under $GLOBUS_LOCATION/bin:
% bin/grid-proxy-init
Your identity: /O=Grid/OU=GlobusTest/OU=simpleCA.mymachine/OU=mymachine/CN=John Doe
Enter GRID pass phrase for this identity:
Creating proxy ................................. Done
Your proxy is valid until: Tue Oct 26 01:33:42 2004
Use the globusrun-ws command to submit a simple job without writing a job description document. With the -c option, a job description is generated on the assumption that the first argument is the executable and the remaining arguments are passed to it. For example:
% globusrun-ws -submit -c /bin/touch touched_it
Submitting job...Done.
Job ID: uuid:4a92c06c-b371-11d9-9601-0002a5ad41e5
Termination time: 04/23/2005 20:58 GMT
Current job state: Active
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Confirm that the job worked by verifying the file was touched:
% ls -l ~/touched_it
-rw-r--r-- 1 smartin globdev 0 Apr 22 15:59 /home/smartin/touched_it
% date
Fri Apr 22 15:59:20 CDT 2005
![]() | Note |
|---|---|
You did not tell globusrun-ws where to run your job, so the default
of |
Use globusrun-ws to submit the same touch job, but this time specify the contact string.
% globusrun-ws -submit -F https://lucky0.mcs.anl.gov:8443/wsrf/services/ManagedJobFactoryService -c /bin/touch touched_it
Submitting job...Done.
Job ID: uuid:3050ad64-b375-11d9-be11-0002a5ad41e5
Termination time: 04/23/2005 21:26 GMT
Current job state: Active
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Try submitting the same job to a remote host. Run globusrun-ws -help to learn the details about the contact string.
The user writes the specifications of a job submission to a job description XML file.
Here is an example of a simple job description:
<job>
<executable>/bin/echo</executable>
<argument>this is an example_string </argument>
<argument>Globus was here</argument>
<stdout>${GLOBUS_USER_HOME}/stdout</stdout>
<stderr>${GLOBUS_USER_HOME}/stderr</stderr>
</job>
Tell globusrun-ws to read the job description from a file, using the -f option:
% bin/globusrun-ws -submit -f test_super_simple.xml
Submitting job...Done.
Job ID: uuid:c51fe35a-4fa3-11d9-9cfc-000874404099
Termination time: 12/17/2004 20:47 GMT
Current job state: Active
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Note the usage of the substitution variable ${GLOBUS_USER_HOME}
which resolves to the user home directory.
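A job description like the simple one above can also be generated programmatically. The following is a minimal sketch in Python using only the standard library; the helper name build_job_description is made up for this illustration and is not part of any Globus API.

```python
import xml.etree.ElementTree as ET


def build_job_description(executable, arguments, stdout, stderr):
    """Return a GRAM-style <job> document as a string (illustrative helper)."""
    job = ET.Element("job")
    ET.SubElement(job, "executable").text = executable
    for arg in arguments:
        # One <argument> element per command-line argument, in order.
        ET.SubElement(job, "argument").text = arg
    ET.SubElement(job, "stdout").text = stdout
    ET.SubElement(job, "stderr").text = stderr
    return ET.tostring(job, encoding="unicode")


xml_text = build_job_description(
    "/bin/echo",
    ["this is an example_string ", "Globus was here"],
    "${GLOBUS_USER_HOME}/stdout",
    "${GLOBUS_USER_HOME}/stderr",
)
print(xml_text)
```

The resulting string can be written to a file and passed to globusrun-ws with the -f option, as in the submission shown earlier.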
Here is an example with more job description parameters:
<?xml version="1.0" encoding="UTF-8"?>
<job>
<executable>/bin/echo</executable>
<directory>/tmp</directory>
<argument>12</argument>
<argument>abc</argument>
<argument>34</argument>
<argument>this is an example_string </argument>
<argument>Globus was here</argument>
<environment>
<name>PI</name>
<value>3.141</value>
</environment>
<stdin>/dev/null</stdin>
<stdout>stdout</stdout>
<stderr>stderr</stderr>
<count>2</count>
</job>
Note that in this example:
- A <directory> element specifies that the command will be executed in the /tmp directory on the execution machine.
- A <stdout> element specifies the standard output as the relative path stdout.
The output is therefore written to /tmp/stdout:
% cat /tmp/stdout
12 abc 34 this is an example_string Globus was here
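The way the relative stdout path combines with the job's working directory can be sketched in Python. This is only an illustration of the path arithmetic, assuming relative paths are resolved against <directory> as the output location above suggests; the function name effective_path is hypothetical.

```python
import posixpath
import xml.etree.ElementTree as ET

JOB_XML = """<job>
  <executable>/bin/echo</executable>
  <directory>/tmp</directory>
  <stdout>stdout</stdout>
  <stderr>stderr</stderr>
</job>"""


def effective_path(job, element_name):
    """Resolve a possibly relative path element against <directory>."""
    directory = job.findtext("directory", default="")
    path = job.findtext(element_name)
    # posixpath.join returns `path` unchanged when it is already absolute.
    return posixpath.join(directory, path)


job = ET.fromstring(JOB_XML)
print(effective_path(job, "stdout"))  # /tmp/stdout
```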
There are three different uses of delegated credentials:
- for use by the MEJS to create a remote user proxy,
- for use by the MEJS to contact RFT, and
- for use by RFT to contact the GridFTP servers.
The EPRs to each of these are specified in three job description elements, respectively:
- jobCredentialEndpoint
- stagingCredentialEndpoint
- transferCredentialEndpoint
Please see the job description schema and RFT transfer request schema documentation for more details about these elements.
The globusrun-ws command can either delegate these credentials automatically for a particular job or reuse pre-delegated credentials (see next paragraph) through the use of command-line arguments for specifying the credentials' EPR files. Please see the globusrun-ws documentation for details on these command-line arguments.
It is possible to use delegation command-line clients to obtain and refresh delegated credentials in order to use them when submitting jobs to WS GRAM. This enables the submission of many jobs using a shared set of delegated credentials. This can significantly decrease the number of remote calls for a set of jobs, thus improving performance.
Unfortunately there is no option yet to print the list of local resource managers supported by a given WS-GRAM service installation. But there is a way to check whether or not WS-GRAM supports a certain local resource manager. The following command gives an example of how a client could find out if Condor is available at the remote site:
wsrf-query \
-s https://<hostname>:<port>/wsrf/services/ManagedJobFactoryService \
-key "{http://www.globus.org/namespaces/2004/10/gram/job}ResourceID" Condor \
"//*[local-name()='version']"
Replace host and port settings with the values you need. If Condor is available on the server-side, the output should look something like the following:
<ns1:version xmlns:ns1="http://mds.globus.org/metadata/2005/02">4.0.3</ns1:version>
In this example, the output indicates that a GT is listening on the server-side, that Condor is available and that the GT version is 4.0.3. If no GT is running at all on the specified host and/or port or if the specified local resource manager is not available on the server-side, the output will be an error message.
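A client script that wraps the query might interpret the result along the following lines. This is a hedged sketch: it assumes the successful output is a single version element like the sample above and treats anything that does not parse as XML as an error message. The function name parse_gt_version is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE_OUTPUT = (
    '<ns1:version xmlns:ns1="http://mds.globus.org/metadata/2005/02">'
    "4.0.3</ns1:version>"
)


def parse_gt_version(query_output):
    """Return the GT version string, or None if no version element is found."""
    try:
        root = ET.fromstring(query_output.strip())
    except ET.ParseError:
        # Assumption: output that is not well-formed XML is an error message.
        return None
    # ElementTree spells qualified names as "{namespace}localname".
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name == "version" and root.text:
        return root.text.strip()
    return None


print(parse_gt_version(SAMPLE_OUTPUT))  # 4.0.3
```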
On the server-side, the GRAM name of local resource managers for which GRAM support has been installed can be obtained by looking at the GRAM configuration on the GRAM server-side machine, as explained here.
The GRAM name of the local resource manager can be used with the factory type option of the job submission command-line tool to specify which factory resource to use when submitting a job.
In order to do file staging, one must add specific elements to the job description and delegate credentials appropriately (see Delegating credentials). The file transfer directives follow the RFT syntax, which allows only for third-party transfers. Each file transfer must therefore specify a source URL and a destination URL. URLs are specified as GridFTP URLs (for remote files) or as file URLs (for files local to the service--these are converted internally to full GridFTP URLs by the service).
For instance, in the case of staging a file in, the source
URL would be a GridFTP URL (for example,
gsiftp://job.submitting.host:2811/tmp/mySourceFile) resolving to a source document accessible on the file system
of the job submission machine (for instance /tmp/mySourceFile).
At run-time, the Reliable File Transfer service used by the
MEJS on the remote machine would reliably fetch the remote file using the
GridFTP protocol and write it to the specified local file (for example,
file:///${GLOBUS_USER_HOME}/my_transfered_file,
which resolves to ~/my_transfered_file). Here
is how the stage-in directive would look:
<fileStageIn>
<transfer>
<sourceUrl>gsiftp://job.submitting.host:2811/tmp/mySourceFile</sourceUrl>
<destinationUrl>file:///${GLOBUS_USER_HOME}/my_transfered_file</destinationUrl>
</transfer>
</fileStageIn>
Note: Additional RFT-defined quality of service requirements may be specified for each transfer. See the RFT documentation for more information.
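Before submitting, it can help to check that every transfer directive names both a source and a destination URL. The sketch below does this in Python for a staging element. The accepted schemes (gsiftp and file) are taken from the description above; whether the service accepts other schemes is not checked here, and the function name check_transfers is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

STAGE_IN = """<fileStageIn>
  <transfer>
    <sourceUrl>gsiftp://job.submitting.host:2811/tmp/mySourceFile</sourceUrl>
    <destinationUrl>file:///${GLOBUS_USER_HOME}/my_transfered_file</destinationUrl>
  </transfer>
</fileStageIn>"""

ALLOWED_SCHEMES = {"gsiftp", "file"}


def check_transfers(staging_xml):
    """Return a list of human-readable problems found in the transfer directives."""
    problems = []
    root = ET.fromstring(staging_xml)
    for index, transfer in enumerate(root.findall("transfer"), start=1):
        for name in ("sourceUrl", "destinationUrl"):
            url = transfer.findtext(name)
            if not url:
                problems.append(f"transfer {index}: missing <{name}>")
            elif urlparse(url.strip()).scheme not in ALLOWED_SCHEMES:
                problems.append(f"transfer {index}: unexpected scheme in <{name}>")
    return problems


print(check_transfers(STAGE_IN))  # []
```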
Here is an example job description with file stage-in and stage-out:
<job>
<executable>my_echo</executable>
<directory>${GLOBUS_USER_HOME}</directory>
<argument>Hello</argument>
<argument>World!</argument>
<stdout>${GLOBUS_USER_HOME}/stdout</stdout>
<stderr>${GLOBUS_USER_HOME}/stderr</stderr>
<fileStageIn>
<transfer>
<sourceUrl>gsiftp://job.submitting.host:2811/bin/echo</sourceUrl>
<destinationUrl>file:///${GLOBUS_USER_HOME}/my_echo</destinationUrl>
</transfer>
</fileStageIn>
<fileStageOut>
<transfer>
<sourceUrl>file:///${GLOBUS_USER_HOME}/stdout</sourceUrl>
<destinationUrl>gsiftp://job.submitting.host:2811/tmp/stdout</destinationUrl>
</transfer>
</fileStageOut>
<fileCleanUp>
<deletion>
<file>file:///${GLOBUS_USER_HOME}/my_echo</file>
</deletion>
</fileCleanUp>
</job>
Note that the job description XML does not need to include a reference to the schema that describes its syntax. As a matter of fact, it is also possible to omit the namespace in the GRAM job description XML elements. The submission of this job to the GRAM services causes the following sequence of actions:
- The /bin/echo executable is transferred from the submission machine to the GRAM host file system. The destination location is the HOME directory of the user on behalf of whom the GRAM services executed the job (see <fileStageIn>).
- The transferred executable is used to print a test string (see <executable>, <directory> and the <argument> elements) on the standard output, which is redirected to a local file (see <stdout>).
- The standard output file is transferred to the submission machine (see <fileStageOut>).
- The file that was initially transferred during the stage-in phase is removed from the file system of the GRAM installation (see <fileCleanUp>).
The job description XML schema allows for specification of a multijob i.e. a job that is itself composed of several executable jobs, which we will refer to as subjobs.
Note: Subjobs cannot be multijobs, so the structure is not recursive.
This is useful, for example, if you want to bundle a group of jobs together and submit them as a whole to a remote GRAM installation.
Note: No relationship can be specified between the subjobs of a multijob. The subjobs are submitted to job factory services in order of appearance in the multijob description.
Within a multijob description, each subjob description must include an endpoint to which the factory submits the subjob. This enables the at-once submission of several jobs to different hosts. The factory to which the multijob is submitted acts as an intermediary tier between the client and the eventual executable job factories.
Here is an example of a multijob description:
<?xml version="1.0" encoding="UTF-8"?>
<multiJob xmlns:gram="http://www.globus.org/namespaces/2004/10/gram/job"
xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
<factoryEndpoint>
<wsa:Address>
https://localhost:8443/wsrf/services/ManagedJobFactoryService
</wsa:Address>
<wsa:ReferenceProperties>
<gram:ResourceID>Multi</gram:ResourceID>
</wsa:ReferenceProperties>
</factoryEndpoint>
<directory>${GLOBUS_LOCATION}</directory>
<count>1</count>
<job>
<factoryEndpoint>
<wsa:Address>https://localhost:8443/wsrf/services/ManagedJobFactoryService</wsa:Address>
<wsa:ReferenceProperties>
<gram:ResourceID>Fork</gram:ResourceID>
</wsa:ReferenceProperties>
</factoryEndpoint>
<executable>/bin/date</executable>
<stdout>${GLOBUS_USER_HOME}/stdout.p1</stdout>
<stderr>${GLOBUS_USER_HOME}/stderr.p1</stderr>
<count>2</count>
</job>
<job>
<factoryEndpoint>
<wsa:Address>https://localhost:8443/wsrf/services/ManagedJobFactoryService</wsa:Address>
<wsa:ReferenceProperties>
<gram:ResourceID>Fork</gram:ResourceID>
</wsa:ReferenceProperties>
</factoryEndpoint>
<executable>/bin/echo</executable>
<argument>Hello World!</argument>
<stdout>${GLOBUS_USER_HOME}/stdout.p2</stdout>
<stderr>${GLOBUS_USER_HOME}/stderr.p2</stderr>
<count>1</count>
</job>
</multiJob>
Notes:
- The <ResourceID> element within the <factoryEndpoint> WS-Addressing endpoint structures must be qualified with the appropriate GRAM namespace.
- Apart from the <factoryEndpoint> element, all elements at the enclosing multijob level act as defaults for the subjob parameters, in this example <directory> and <count>.
- The default <count> value is overridden in the subjob descriptions.
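The default-and-override rule in the notes above can be illustrated with a short Python sketch. It uses a trimmed multijob description (the factoryEndpoint elements are omitted, and the second subjob deliberately has no <count> so that the default applies). It only mirrors the rule as described and is not the service's implementation; the function name effective_subjobs is hypothetical.

```python
import xml.etree.ElementTree as ET

MULTI_JOB = """<multiJob>
  <directory>${GLOBUS_LOCATION}</directory>
  <count>1</count>
  <job>
    <executable>/bin/date</executable>
    <count>2</count>
  </job>
  <job>
    <executable>/bin/echo</executable>
  </job>
</multiJob>"""

INHERITED = ("directory", "count")


def effective_subjobs(multijob_xml):
    """Return one dict per subjob with multijob-level defaults applied."""
    root = ET.fromstring(multijob_xml)
    defaults = {name: root.findtext(name) for name in INHERITED}
    result = []
    for job in root.findall("job"):
        settings = {"executable": job.findtext("executable")}
        for name in INHERITED:
            # A value given in the subjob wins over the multijob-level default.
            own = job.findtext(name)
            settings[name] = own if own is not None else defaults[name]
        result.append(settings)
    return result


for subjob in effective_subjobs(MULTI_JOB):
    print(subjob)
```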
In order to submit a multijob description, use a job submission command-line tool
and specify the Managed Job Factory resource to be Multi.
For example, submitting the multijob description above using globusrun-ws, we obtain:
% bin/globusrun-ws -submit -f test_multi.xml
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:bd9cd634-4fc0-11d9-9ee1-000874404099
Termination time: 12/18/2004 00:15 GMT
Current job state: Active
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.
A multijob resource is created by the factory and exposes a set of WSRF resource properties different than the resource properties of an executable job. The state machine of a multijob is also different since the multijob represents the overall execution of all the executable jobs of which it is composed.
Jobs submitted to WS-GRAM have a lifetime. If the lifetime of a ManagedJob resource expires, the job is destroyed after the cleanup steps have been performed, and the job's persistence data is removed.
For executable jobs the user-relevant steps in cleanup are:
- Cancellation of the job at the local resource manager if it's still running.
- Performing fileCleanUp if specified in the job description and the job did not already pass this step.
If a multijob expires, all of its subjobs are destroyed.
The default C-client globusrun-ws and the
Java API GramJob (for developers) set the lifetime by default to 24 hours. If
a user wants a job to have a longer lifetime he/she must explicitly specify it.
With the C client globusrun-ws, the lifetime of a job can be set in two ways at submission time. The first example shows how to set a relative lifetime, i.e. the job will expire 48 hours from now:
globusrun-ws -submit -term "+48:00" -b -o myJob.epr -f myJob.xml
The second example shows how to set an absolute lifetime. The job will expire at the given date:
globusrun-ws -submit -term "10/23/2008 12:00" -b -o myJob.epr -f myJob.xml
In both examples the job is submitted in batch mode, which makes sense for longer-running jobs.
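To make the two forms concrete, the Python sketch below computes the resulting expiry time for each. The accepted formats ("+HH:MM" and "MM/DD/YYYY HH:MM") are inferred from the two examples above and are an assumption, not the tool's documented grammar. Time zones are ignored, and the function name termination_time is hypothetical.

```python
from datetime import datetime, timedelta


def termination_time(term, now):
    """Interpret a -term style value relative to `now` (illustrative parser)."""
    if term.startswith("+"):
        # Relative form, assumed to be "+HH:MM".
        hours, minutes = term[1:].split(":")
        return now + timedelta(hours=int(hours), minutes=int(minutes))
    # Absolute form, assumed to be "MM/DD/YYYY HH:MM".
    return datetime.strptime(term, "%m/%d/%Y %H:%M")


now = datetime(2008, 10, 20, 9, 30)
print(termination_time("+48:00", now))            # 2008-10-22 09:30:00
print(termination_time("10/23/2008 12:00", now))  # 2008-10-23 12:00:00
```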
![]() | Note |
|---|---|
Specifying the lifetime in the short way using
|
The lifetime of a job can also be changed after the submission. The following
example shows how to set a new termination time of a job resource, assuming
that the Endpoint Reference (EPR) of the job is stored in the file
myJob.epr. The new lifetime is provided in
seconds (604800 in this example [one week]):
[martin@osg-test1 ~]$ wsrf-set-termination-time -e myJob.epr 604800
The output could be something like this:
requested: Tue May 13 09:27:15 CDT 2008
scheduled: Tue May 13 09:27:15 CDT 2008
Job description variables are special strings in a job description that are replaced by the GRAM service with values that the client-side does not a priori know. Job description variables can be used in any path-like string or URL specified in the job description.
An example of a variable is
${GLOBUS_USER_HOME}, which represents the
path to the HOME directory on the file system where the job is executed.
The set of variables is fixed in the GRAM service implementation. This is
different from previous implementations of RSL
substitutions in GT2 and GT3,
where a user could define a new variable for use inside a job description
document. This was done to preserve the simplicity of the job description
XML schema (relative to the GT3.2 RSL schema), which does not require a
specialized XML parser to serialize a job description document.
Details of the RSL variables are in the job description doc and the substitution variable section of the admin guide.
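The idea of a fixed variable set can be sketched in Python as follows. The substitution itself is performed by the GRAM service, so this is only a client-side illustration. The values in VARIABLES are invented, and rejecting unknown names is a choice made for this sketch rather than documented service behavior.

```python
import re

# Hypothetical values; the real ones are determined by the GRAM service.
VARIABLES = {
    "GLOBUS_USER_HOME": "/home/jdoe",
    "GLOBUS_LOCATION": "/usr/local/globus",
}

_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute(text, variables=VARIABLES):
    """Replace every ${NAME} in `text`; reject names outside the fixed set."""
    def replace(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"unknown job description variable: {name}")
        return variables[name]
    return _PATTERN.sub(replace, text)


print(substitute("${GLOBUS_USER_HOME}/stdout"))  # /home/jdoe/stdout
```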
If you are using 4.0.5+: Beginning with version 4.0.5, additional variables can be defined on the server side for use in the job description. Currently, users cannot get information from WS GRAM about whether or not additional variables are defined on the server side and, if so, what their names and values are. For now, this information must be published by the provider.
WS GRAM enables a client to add a self-generated resource key to the input type when submitting a new job request to the ManagedJobFactoryService (MJFS). The client should make sure to provide a universally unique identifier (UUID) as the job resource key. For information about UUIDs please read here.
Providing its own UUID enables a client to resubmit a job in case the server did not respond to a prior job submission request (due to network failures, for example). If the client submits a job with an already existing resource key a second time, the job will not be started again because it is already running. This avoids unnecessary and undesired resource usage and enables a reliable job submission.
If you are using 4.0.5+: Beginning with version 4.0.5, WS GRAM creates its own job UUID, even if the client provides one in the input of its call to the MJFS, and returns this job UUID inside the endpoint reference (EPR) to the client. The client can still contact the ManagedJobFactoryService (MJFS) with the self-generated job resource key in order to resubmit a job whose submission may or may not have been lost. But the client can no longer contact the ManagedExecutableJobService (MEJS) with that self-generated job key as part of an EPR in order to query for job state. If it is unclear whether a job request has been started by the server, the client must submit the job with the same job UUID again in order to get an EPR from the MJFS. The client can then query for job state or destroy the job.
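The retry behavior described in this section can be modelled with a toy, in-memory factory in Python. This is not the MJFS and uses no Globus API; the class ToyJobFactory and its methods are invented purely to show why resubmitting with the same client-generated key does not start a second job.

```python
import uuid


class ToyJobFactory:
    """In-memory stand-in for a job factory; not the real MJFS."""

    def __init__(self):
        self._jobs_by_client_key = {}
        self.started = 0

    def submit(self, client_key, description):
        # A repeated client key returns the existing job instead of starting a new one.
        if client_key not in self._jobs_by_client_key:
            self.started += 1
            server_key = str(uuid.uuid4())  # the factory picks its own job key
            self._jobs_by_client_key[client_key] = server_key
        return self._jobs_by_client_key[client_key]


factory = ToyJobFactory()
client_key = str(uuid.uuid4())
first = factory.submit(client_key, "<job>...</job>")
# Pretend the first response was lost and the client retries with the same key.
second = factory.submit(client_key, "<job>...</job>")
print(first == second, factory.started)  # True 1
```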
Important: This feature has been added as of GT 4.0.5. For versions older than 4.0.5, an update package is available to upgrade your installation. See the GT Development Downloads page for the latest links.
Basic support is provided for specifying custom extensions to the job description. There are plans to improve the usability of this feature, but at this time it involves a bit of work.
Specifying the actual custom elements in the job description is trivial. Simply add any elements that you need between the beginning and ending
extensions tags at the bottom of the job
description as in the following basic example:
<job>
<executable>/home/user1/myapp</executable>
<extensions>
<myData>
<var1>hello</var1>
<var2>world</var2>
</myData>
</extensions>
</job>
To handle this data, you must alter the appropriate Perl scheduler script (for example, fork.pm for the Fork scheduler) to parse the data returned from the $description->extensions() sub.
More information about job description extension support can be found here.
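The actual handling takes place in the Perl scheduler script, and the format returned by $description->extensions() is not shown here. Purely to illustrate the nested shape of the extension data in the example above, the following Python sketch turns the <extensions> element into nested dictionaries; the function name element_to_dict is hypothetical.

```python
import xml.etree.ElementTree as ET

JOB_XML = """<job>
  <executable>/home/user1/myapp</executable>
  <extensions>
    <myData>
      <var1>hello</var1>
      <var2>world</var2>
    </myData>
  </extensions>
</job>"""


def element_to_dict(element):
    """Convert an element into nested dicts, with leaf text as values."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    # Assumes child tags are unique; repeated tags would overwrite each other.
    return {child.tag: element_to_dict(child) for child in children}


extensions = ET.fromstring(JOB_XML).find("extensions")
print(element_to_dict(extensions))
```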
Note: This feature is only available beginning with version 4.0.5 of the toolkit.
For a short introduction to SoftEnv please have a look at the SoftEnv chapter.
If SoftEnv is enabled on the server-side, nothing needs to be added to a job
description to set up the environment which is specified in the
.soft file in the remote home directory of
the user before the job is submitted to the scheduler.
If a different software
environment should be used than the one specified in the remote
.soft file, the user must provide
SoftEnv parameters in the extensions element of the job
description.
The schema of the extension element for software selection in the job description is as follows:
<element name="softenv" type="xsd:string"/>
For example, to add the SoftEnv commands @teragrid-basic,
+intel-compilers, +atlas, and +tgcp to the job process'
environment, the user would specify the following <extensions> element
in the job description:
<extensions>
<softenv>@teragrid-basic</softenv>
<softenv>+intel-compilers</softenv>
<softenv>+atlas</softenv>
<softenv>+tgcp</softenv>
</extensions>
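When the list of SoftEnv keys is held in a script, the <extensions> element can be generated rather than written by hand. The following Python sketch shows one way to do this with the standard library; the helper name softenv_extensions is hypothetical.

```python
import xml.etree.ElementTree as ET


def softenv_extensions(keys):
    """Build an <extensions> element with one <softenv> child per key."""
    extensions = ET.Element("extensions")
    for key in keys:
        ET.SubElement(extensions, "softenv").text = key
    return extensions


element = softenv_extensions(
    ["@teragrid-basic", "+intel-compilers", "+atlas", "+tgcp"]
)
print(ET.tostring(element, encoding="unicode"))
```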
So far there is no way for a user to learn from the remote service
itself whether or not SoftEnv support is enabled. Currently, the only way to check this is to submit a job
with /bin/env as the executable and watch the results.
The following table describes what happens in various scenarios if SoftEnv is disabled or enabled on the server side:
| Disabled on server side | Enabled on server side | |
|---|---|---|
User provides no SoftEnv extensions: |
No SoftEnv environment is configured before job submission, even
if the user has a |
If the user has a If the user has a |
User provides valid SoftEnv extensions: |
If SoftEnv is not installed on the server then no environment will be configured
If SoftEnv is installed, the environment the user specifies in the |
The specified environment overwrites any SoftEnv configuration the user
specifies in a |
User provides invalid SoftEnv extensions: |
If SoftEnv is not installed on the server, then no environment will be configured.
If SoftEnv is installed, the environment the user specifies in the |
The specified environment overwrites any SoftEnv configuration the user
specifies in a |
|
In general, jobs do not fail if they have SoftEnv extensions in their description and SoftEnv is disabled (or not even installed) on the server side. But they will fail if they rely on environments being set up before job submission. |
Note: In the current implementation, an executable whose path is defined only through SoftEnv cannot be called directly; the complete path to the executable must be specified. For example, if a database query must be executed using the mysql command and mysql is not in the default path, then the direct use of mysql as an executable in the job description document will fail, even if the use of SoftEnv is configured. The mysql command must be written to a script which is in the default path. Thus a job submission with the following job description document will fail:
<job>
...
<executable>mysql</executable>
...
</job>
But when the command is embedded inside a shell script which is specified as the executable in the job description document, it will work:
#!/bin/sh
...
mysql ...
...
Note: The use of invalid SoftEnv keys in the extension part of the job description document does not generate errors.
Please see the GT 4.0 WS GRAM Command-line Reference.
This document from DGrid describes how to submit MPI batch jobs to compute clusters using GRAM4.
When I submit a streaming or staging job, I get the following error: ERROR service.TransferWork Terminal transfer error: [Caused by: Authentication failed[Caused by: Operation unauthorized(Mechanism level: Authorization failed. Expected"/CN=host/localhost.localdomain" target but received "/O=Grid/OU=GlobusTest/OU=simpleCA-my.machine.com/CN=host/my.machine.com")
Check $GLOBUS_LOCATION/etc/gram-service/globus_gram_fs_map_config.xml to see if it uses localhost or 127.0.0.1 instead of the public hostname (in the example above, my.machine.com). Change these uses of the loopback hostname or IP to the public hostname as necessary.
Fork jobs work fine, but submitting PBS jobs with globusrun-ws hangs at "Current job state: Unsubmitted"
Make sure the log_path in $GLOBUS_LOCATION/etc/globus-pbs.conf points to locally accessible scheduler logs that are readable by the user running the container. The Scheduler Event Generator (SEG) will not work without local scheduler logs to monitor. This can also apply to other resource managers, but is most commonly seen with PBS.
If the SEG configuration looks sane, try running the SEG tests. They are located in $GLOBUS_LOCATION/test/globus_scheduler_event_generator_*_test/. If Fork jobs work, you only need to run the PBS test. Run each test by going to the associated directory and running ./TESTS.pl. If any tests fail, report this to the gram-dev@globus.org mailing list.
If the SEG tests succeed, the next step is to figure out the ID assigned by PBS to the queued job. Enable GRAM debug logging by uncommenting the appropriate line in the $GLOBUS_LOCATION/container-log4j.properties configuration file. Restart the container, run a PBS job, and search the container log for a line that contains "Received local job ID" to obtain the local job ID.
Once you have the local job ID, you can find out if the PBS status is being logged by checking the latest PBS logs pointed to by the value of "log_path" in $GLOBUS_LOCATION/etc/globus-pbs.conf.
If the status is not being logged, check the documentation for your flavor of PBS to see if there's any further configuration that needs to be done to enable job status logging. For example, PBS Pro requires a sufficient -e <bitmask> option added to the pbs_server command line to enable enough logging to satisfy the SEG.
If the correct status is being logged, try running the SEG manually to see if it is reading the log file properly. The general form of the SEG command line is as follows:
$GLOBUS_LOCATION/libexec/globus-scheduler-event-generator -s pbs -t <timestamp>
The timestamp is in seconds since the epoch and dictates how far back in the log history the SEG should scan for job status events. The command should hang after dumping some status data to stdout.
If no data appears, change the timestamp to an earlier time.
If nothing ever appears, report this to the gram-user@globus.org mailing list.
If running the SEG manually succeeds, try running another job and make sure the job process actually finishes and PBS has logged the correct status before giving up and cancelling globusrun-ws. If things are still not working, report your problem and exactly what you have tried to remedy the situation to the gram-user@globus.org mailing list.
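Since the -t value for the manual SEG run above is given in seconds since the epoch, a small helper can compute a value that lies a chosen number of hours in the past. This is a minimal Python sketch; the function name seg_timestamp is hypothetical and the six-hour window is only an example.

```python
import time


def seg_timestamp(hours_back, now=None):
    """Return seconds since the epoch for a point `hours_back` hours ago."""
    if now is None:
        now = time.time()
    return int(now - hours_back * 3600)


# Example: a timestamp that makes the scan start six hours ago.
print(seg_timestamp(6))
```

If no events appear, increasing hours_back yields an earlier timestamp, as suggested in the steps above.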
The job manager detected an invalid script response
Check for a restrictive umask. When the service writes the native scheduler job description to a file, an overly restrictive umask will cause the permissions on the file to be such that the submission script run through sudo as the user cannot read the file (bug #2655).
When restarting the container, I get the following error: Error getting delegation resource
Most likely this is simply a case of the delegated credential expiring. Either refresh it for the affected job or destroy the job resource. For more information, see delegation command-line clients.
The user's home directory has not been determined correctly
This occurs when the administrator changed the location of the users' home directory and did not restart the GT4 container afterwards. Beginning with version 4.0.3, WS-GRAM determines a user's home directory only once in the lifetime of a container (when the user submits the first job). Subsequently, submitted jobs will use the cached home directory during job execution.
The following usage statistics are sent by default in a UDP packet (in addition to the GRAM component code, packet version, timestamp, and source IP address) at the end of each job (i.e. when Done or Failed state is entered).
- job creation timestamp (helps determine the rate at which jobs are submitted)
- scheduler type (Fork, PBS, LSF, Condor, etc...)
- jobCredentialEndpoint present in RSL flag (to determine if server-side user proxies are being used)
- fileStageIn present in RSL flag (to determine if the staging in of files is used)
- fileStageOut present in RSL flag (to determine if the staging out of files is used)
- fileCleanUp present in RSL flag (to determine if the cleaning up of files is used)
- CleanUp-Hold requested flag (to determine if streaming is being used)
- job type (Single, Multiple, MPI, or Condor)
- gt2 error code if job failed (to determine common scheduler script errors users experience)
- fault class name if job failed (to determine general classes of common faults users experience)
If you wish to disable this feature, please see the Java WS Core System Administrator's Guide section on Usage Statistics Configuration for instructions.
Also, please see our policy statement on the collection of usage statistics.