GT 5.0.2 GRAM5: User's Guide

Introduction

GRAM services provide secure, remote job submission to different local resource managers in a Grid environment. This document describes how to use the RSL language and command-line interfaces provided in GT 5.0.2 to submit jobs to Grid resources.


Table of Contents

1. Using GRAM5
1. Preparing to use GRAM
1.1. Proxy credentials with grid-proxy-init
2. Java Client API Download
3. Delegating credentials
3.1. Delegated Credential Usage
4. Submitting jobs
4.1. Resource Names
4.2. Running Jobs with globus-job-run
4.3. Submitting Jobs with globus-job-submit
4.4. Using the globusrun tool
4.4.1. Checking RSL Syntax
4.4.2. Checking Service Contacts
4.4.3. Checking GRAM service version
4.4.4. Basic Interactive job with globusrun
4.4.5. Basic batch job with globusrun
4.4.6. Refreshing a GRAM5 Credential
4.4.7. Dealing with credential expiration
4.4.8. File staging
4.4.9. Temporary files and cleanup
4.4.10. Reliable job submit
4.4.11. Reconnecting to a job
4.4.12. Submitting a Java job
5. Using GRAM5 with Condor-G
GRAM5 Commands
globusrun - Execute and manage jobs via GRAM
globus-job-cancel - Cancel a GRAM batch job
globus-job-clean - Cancel and clean up a GRAM batch job
globus-job-get-output - Retrieve the output and error streams from a GRAM job
globus-job-run - Execute a job using GRAM
globus-job-status - Check the status of a GRAM5 job
globus-job-submit - Submit a batch job using GRAM
globus-personal-gatekeeper - Manage a user's personal gatekeeper daemon
globus-gram-audit - Load GRAM4 and GRAM5 audit records into a database
globus-gatekeeper - Authorize and execute a grid service on behalf of a user
globus-job-manager - Execute and monitor jobs
globus-job-manager-event-generator - Create LRM-independent SEG files for the job manager to use
globus-fork-starter - Start and monitor a fork job
2. Troubleshooting
1. Troubleshooting tips
2. Errors
3. Known Problems in GRAM5
1. Known Problems
1.1. Limitations
1.2. Outstanding bugs
4. Usage statistics collection by the Globus Alliance
1. GRAM5-specific usage statistics
Glossary
Index

Chapter 1. Using GRAM5

1. Preparing to use GRAM

After installation, the first step in using GRAM5 is to acquire a temporary Grid credential, which authenticates you to the GRAM5 service and to any file services your job requires. Normally this is done with either grid-proxy-init or the MyProxy service.

1.1. Proxy credentials with grid-proxy-init

To generate a proxy credential using the grid-proxy-init program, execute the command with no arguments. By default, it will generate an impersonation proxy with a lifetime of 12 hours.

Example 1.1. Generating a proxy with grid-proxy-init

This example creates a 12-hour impersonation proxy to use to authenticate with grid services such as GRAM5:

% bin/grid-proxy-init
Your identity: /O=Grid/OU=Example/CN=Joe User
Enter GRID pass phrase for this identity:
Creating proxy ................................. Done
Your proxy is valid until: Tue Oct 26 01:33:42 2010
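The "valid until" line printed above can be used by a script to decide when a new proxy is needed. The following is a minimal sketch, not part of the toolkit: the helper names parse_valid_until and time_left are hypothetical, the timestamp layout is taken from the example output, and English day and month names with no time zone handling are assumed.

```python
from datetime import datetime, timedelta

# Prefix printed by grid-proxy-init in the example above.
PREFIX = "Your proxy is valid until: "


def parse_valid_until(line):
    """Return the expiry time from a 'valid until' line, or None if absent."""
    if not line.startswith(PREFIX):
        return None
    # Layout as shown in the example: "Tue Oct 26 01:33:42 2010".
    # Assumption: English names and a naive (time-zone-free) timestamp.
    stamp = line[len(PREFIX):].strip()
    return datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")


def time_left(expiry, now):
    """Remaining proxy lifetime; an expired proxy yields a zero duration."""
    return max(expiry - now, timedelta(0))


expiry = parse_valid_until("Your proxy is valid until: Tue Oct 26 01:33:42 2010")
# With the default 12-hour lifetime, the proxy was created 12 hours earlier.
created = expiry - timedelta(hours=12)
```

A caller would typically compare time_left(expiry, datetime.now()) against the expected run time of the job before submitting it.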

Important

In order to generate a proxy credential, you must have first been issued an identity credential by some certificate authority that is trusted by the GRAM5 resource you want to use. To learn more about certificates and Grid security in general, please read Security Key Concepts.

2. Java Client API Download

GT 5.0 does not include any of the CoG JGlobus Java APIs that were included in the GT4 release series. However, the JGlobus APIs can still be used with the GT5 services. You can get them directly from the CoG JGlobus releases at the following link:

http://dev.globus.org/wiki/CoG_jglobus

Consider the following when determining which version of CoG JGlobus to use:

  • The GRAM development team used CoG JGlobus version 1.6.0 for performance testing.

  • The BIRN project used CoG JGlobus version 1.6.0 (plus patches) for GridFTP testing. All patches are included in 1.8.0.

  • At the time of the GT 5.0.2 release, 1.8.0 was the recommended version. In general, the latest recommended CoG JGlobus version should be used.

3. Delegating credentials

The proxy credential created in Section 1 is used to authenticate with the GRAM5 service as well as to delegate a limited proxy of that credential to the service so that it can process the job. This credential delegation occurs when the globus-gatekeeper service is first contacted to submit a job. By default, the tools provided with GT 5.0.2 delegate a limited proxy. This limited proxy can be used to authenticate with other services on the client's behalf, but those services can tell that the proxy is not under the direct control of the user.

3.1. Delegated Credential Usage

The delegated proxy can be used by the GRAM5 service and the job in a few different ways:

  1. The GRAM5 service uses the credential to send job state notification messages to clients which have registered to receive them.
  2. The GRAM5 service uses the credential to contact GASS and GridFTP file servers to stage files to and from the execution resource.
  3. The job executed by the GRAM5 service can use the delegated credential for application-specific purposes.

Note

In GRAM5, the Job Manager may manage multiple jobs simultaneously. It will use the delegated proxy with the most time left for authentication. Individual GRAM5 jobs will have separate proxies.

The globusrun, globus-job-run, and globus-job-submit commands delegate credentials automatically when submitting a job. Additionally, globusrun can refresh the credentials used by the job and job manager after the job manager has started.

4. Submitting jobs

This section describes the steps needed to submit jobs to resources managed by GRAM5 services. It describes how resources are named, tools for submitting and monitoring jobs, and the RSL language which describes requirements for jobs.

4.1. Resource Names

In GRAM5, a Gatekeeper Service Contact contains the host, port, service name, and service identity required to contact a particular GRAM service. For convenience, default values are used when parts of the contact are omitted. An example of a full gatekeeper service contact is grid.example.org:2119/jobmanager:/C=US/O=Example/OU=Grid/CN=host/grid.example.org.

The various forms of the resource name using default values follow:

  • HOST
  • HOST:PORT
  • HOST:PORT/SERVICE
  • HOST/SERVICE
  • HOST:/SERVICE
  • HOST:PORT:SUBJECT
  • HOST/SERVICE:SUBJECT
  • HOST:/SERVICE:SUBJECT
  • HOST:PORT/SERVICE:SUBJECT

Where the various values have the following meaning:

HOST
Network name of the machine hosting the service.
PORT
Network port number that the service is listening on. If not specified, the default of 2119 is used.
SERVICE
Path of the service entry in $GLOBUS_LOCATION/etc/grid-services. If not specified, the default of jobmanager is used.
SUBJECT
X.509 identity of the credential used by the service. If not specified, the default of host@HOST is used.

Example 1.2. Gatekeeper Service Contact Examples

The following strings all name the service grid.example.org:2119/jobmanager:/C=US/O=Example/OU=Grid/CN=host/grid.example.org using the formats with the various defaults described above.

  • grid.example.org
  • grid.example.org:2119
  • grid.example.org:2119/jobmanager
  • grid.example.org/jobmanager
  • grid.example.org:/jobmanager
  • grid.example.org:2119:/C=US/O=Example/OU=Grid/CN=host/grid.example.org
  • grid.example.org/jobmanager:/C=US/O=Example/OU=Grid/CN=host/grid.example.org
  • grid.example.org:/jobmanager:/C=US/O=Example/OU=Grid/CN=host/grid.example.org
  • grid.example.org:2119/jobmanager:/C=US/O=Example/OU=Grid/CN=host/grid.example.org
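The defaulting rules above can be sketched as a small parser. This is an illustrative reading of the nine listed forms only, not the parser used by the GRAM client library; the names Contact and parse_contact are hypothetical, and the default subject is written as host@HOST exactly as described above.

```python
import re
from typing import NamedTuple


class Contact(NamedTuple):
    host: str
    port: int
    service: str
    subject: str


# HOST, then optional :PORT (may be empty, as in HOST:/SERVICE),
# optional /SERVICE, and optional :SUBJECT running to the end of the string.
_CONTACT_RE = re.compile(
    r"^(?P<host>[^:/]+)"
    r"(?::(?P<port>\d*))?"
    r"(?:/(?P<service>[^:]+))?"
    r"(?::(?P<subject>.+))?$"
)


def parse_contact(text):
    """Split a gatekeeper service contact and fill in the documented defaults."""
    match = _CONTACT_RE.match(text)
    if match is None:
        raise ValueError("not a recognized contact form: %r" % text)
    host = match.group("host")
    port = int(match.group("port")) if match.group("port") else 2119
    service = match.group("service") or "jobmanager"
    subject = match.group("subject") or "host@" + host
    return Contact(host, port, service, subject)
```

For example, parse_contact("grid.example.org:/jobmanager-pbs") fills in port 2119 and the subject host@grid.example.org, while a contact that ends in an explicit subject keeps that subject unchanged.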

4.2. Running Jobs with globus-job-run

The globus-job-run program provides a simple blocking command-line interface to the GRAM service. It submits a job to a GRAM5 resource and waits for the job to terminate. After the job terminates, the output and error streams of the job are sent to the output and error streams of globus-job-run as if the job were run interactively. Note that input to the job must be located in a file prior to running the job; true interactive I/O is not supported by GRAM5.

The globus-job-run program has command-line options to control most aspects of jobs run by GRAM5. However, certain behaviors must be specified by defining an RSL string containing various job attributes. A more detailed description of the RSL language is included in the section on running jobs with globusrun below.

The following examples show some of the common command-line options to globus-job-run. Full globus-job-run documentation is available in the GRAM5 public interface guide.

Example 1.3. Minimal job using globus-job-run

The following command line submits a single instance of the /bin/hostname executable to the resource named by grid.example.org:2119/jobmanager-pbs.

% globus-job-run grid.example.org:2119/jobmanager-pbs /bin/hostname
node1.grid.example.org

Example 1.4. Multiprocess job using globus-job-run

The following command line submits ten instances of an executable a.out, staging it from the client host to the service node using GASS. The a.out program prints the name of the host it is executing on.

% globus-job-run grid.example.org:2119/jobmanager-pbs -np 10 -s a.out
node1.grid.example.org
node3.grid.example.org
node2.grid.example.org
node5.grid.example.org
node4.grid.example.org
node8.grid.example.org
node6.grid.example.org
node9.grid.example.org
node7.grid.example.org
node10.grid.example.org

Example 1.5. Canceling an interactive job

This example shows how Control+C (or another system-specific mechanism for sending the SIGINT signal) can be used to cancel a GRAM job.

% globus-job-run grid.example.org:2119/jobmanager-pbs /bin/sleep 90
Control-C
GRAM Job failed because the user cancelled the job (error code 8)

Example 1.6. Setting job environment variables with globus-job-run

The following command line submits one instance of the executable /usr/bin/env, setting some environment variables in the job environment beyond those set by GRAM5.

% globus-job-run grid.example.org:2119/jobmanager-pbs -env TEST=1 -env GRID=1 /usr/bin/env
HOME=/home/juser
LOGNAME=juser
GLOBUS_GRAM_JOB_CONTACT=https://client.example.org:3882/16001579536700793196/5295612977485997184/
GLOBUS_LOCATION=/opt/globus-5.0.2
GLOBUS_GASS_CACHE_DEFAULT=/home/juser/.globus/.gass_cache
TEST=1
X509_USER_PROXY=/home/juser/.globus/job/mactop.local/16001579536700793196.5295612977485997184/x509_user_proxy
GRID=1

Example 1.7. Using custom RSL clauses with globus-job-run

The following command line submits an MPI job using globus-job-run, setting the jobtype RSL attribute to mpi. Any RSL attribute understood by the LRM can be added to a job via this method.

% globus-job-run grid.example.org:2119/jobmanager-pbs -np 5 -x '&(jobtype=mpi)' a.out
Hello, MPI (rank: 0, count: 5)
Hello, MPI (rank: 3, count: 5)
Hello, MPI (rank: 1, count: 5)
Hello, MPI (rank: 4, count: 5)
Hello, MPI (rank: 2, count: 5)
                

Example 1.8. Constructing RSL strings with globus-job-run

The globus-job-run program can also generate the RSL language description of a job based on the command-line options given to it. This example combines some of the features above and prints out the resulting RSL. This RSL string can be passed to tools such as globusrun to be run later.

% globus-job-run -dumprsl grid.example.org:2119/jobmanager-pbs -np 5 -x '&(jobtype=mpi)' -env GRID=1 -env TEST=1 a.out
 &(jobtype=mpi)
    (executable="a.out")
    (environment= ("GRID" "1") ("TEST" "1"))
    (count=5)
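The shape of the RSL printed above can also be assembled by a script. The sketch below is a hypothetical helper, not part of GT 5.0.2: build_rsl and rsl_quote are made-up names, only the attributes shown in the example are handled, and doubling an embedded double quote is assumed to be the RSL escaping rule.

```python
def rsl_quote(value):
    """Wrap a value in double quotes.

    Assumption: an embedded double quote is escaped by doubling it.
    """
    return '"' + str(value).replace('"', '""') + '"'


def build_rsl(executable, count=None, environment=None, extra="&"):
    """Assemble a single-line RSL string similar to the -dumprsl output.

    `extra` is the leading clause, for example "&(jobtype=mpi)".
    """
    parts = [extra, "(executable=" + rsl_quote(executable) + ")"]
    if environment:
        pairs = " ".join(
            "(" + rsl_quote(name) + " " + rsl_quote(value) + ")"
            for name, value in environment.items()
        )
        parts.append("(environment= " + pairs + ")")
    if count is not None:
        parts.append("(count=" + str(int(count)) + ")")
    return "".join(parts)


rsl = build_rsl(
    "a.out",
    count=5,
    environment={"GRID": "1", "TEST": "1"},
    extra="&(jobtype=mpi)",
)
```

The resulting string carries the same clauses as the -dumprsl output above, without the line breaks, and could be passed to globusrun as its RSL argument.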

4.3. Submitting Jobs with globus-job-submit

A related tool to globus-job-run is globus-job-submit. This command submits a job to a GRAM5 service then exits without waiting for the job to terminate. Other tools (globus-job-cancel, globus-job-clean, and globus-job-get-output) allow further interaction with the job.

Important

When using globus-job-submit, the job output and state will remain on disk on the GRAM resource until one of globus-job-clean or globus-job-cancel is run for that job. Be sure to clean up your jobs!

The globus-job-submit program has most of the same command-line options as globus-job-run. When run, instead of displaying the output and error streams of the job, it prints the job contact, which is used with the other globus-job tools to interact with the job.

Example 1.9. globus-job-submit

This example shows the interaction of submitting a job via globus-job-submit, checking its status with globus-job-status, getting its output with globus-job-get-output, and then cleaning the job with globus-job-clean.

% globus-job-submit grid.example.org:2119/jobmanager-pbs /bin/hostname
https://grid.example.org:38843/16001600430615223386/5295612977486013582/
% globus-job-status https://grid.example.org:38843/16001600430615223386/5295612977486013582/
PENDING
% globus-job-status https://grid.example.org:38843/16001600430615223386/5295612977486013582/
ACTIVE
% globus-job-status https://grid.example.org:38843/16001600430615223386/5295612977486013582/
DONE
% globus-job-get-output -r grid.example.org:2119/jobmanager-fork \
    https://grid.example.org:38843/16001600430615223386/5295612977486013582/
node1.grid.example.org
% globus-job-clean -r grid.example.org:2119/jobmanager-fork \
    https://grid.example.org:38843/16001600430615223386/5295612977486013582/

    WARNING: Cleaning a job means:
        - Kill the job if it still running, and
        - Remove the cached output on the remote resource

    Are you sure you want to cleanup the job now (Y/N) ?

y

Cleanup successful.
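The repeated globus-job-status calls in the example can be automated with a simple polling loop. The following is a sketch under stated assumptions: wait_for_job and query_status are hypothetical names, the 30-second interval is arbitrary, and FAILED is assumed to be printed as a terminal state alongside the DONE state shown above.

```python
import subprocess
import time

# DONE appears in the example above; FAILED is an assumed terminal state.
TERMINAL_STATES = {"DONE", "FAILED"}


def query_status(job_contact):
    """Run globus-job-status for one job contact and return the printed state."""
    result = subprocess.run(
        ["globus-job-status", job_contact],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def wait_for_job(job_contact, get_status=query_status, interval=30.0,
                 sleep=time.sleep):
    """Poll until a terminal state is reported; return the distinct states seen."""
    seen = []
    while True:
        state = get_status(job_contact)
        if not seen or seen[-1] != state:
            seen.append(state)
        if state in TERMINAL_STATES:
            return seen
        sleep(interval)
```

Once wait_for_job returns, the output can be fetched with globus-job-get-output and the job removed with globus-job-clean, as in the example. The get_status and sleep parameters exist so that the loop can be exercised without a GRAM service.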

4.4. Using the globusrun tool

The globusrun tool provides a more flexible tool for submitting, monitoring, and canceling jobs. With this tool, most of the functionality of the GRAM5 APIs is made available.

One major difference between globusrun and the other tools described above is that globusrun uses the RSL language to provide the job description, instead of multiple command-line options to describe the various aspects of the job. The section on globus-job-run contained a brief example RSL in the -dumprsl example above.

The following sections show examples of the different modes that globusrun can run in. Full information about globusrun command-line options is available in the public interface guide.

4.4.1. Checking RSL Syntax

This example shows how to check that an RSL document contains a syntactically correct job description. Note that this mode does not do semantic validation of the RSL, so an RSL document that passes this test may not work when submitted to a GRAM5 service.

Example 1.10. Checking RSL Syntax

% globusrun -p "&(executable=a.out)"

RSL Parsed Successfully...

% globusrun -p "&/executable=a.out)"

ERROR: cannot parse RSL &/executable=a.out)

Syntax: globusrun [-help] [-f RSL file] [-s][-b][-d][...] [-r RM] [RSL]


Use -help to display full usage
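Before invoking globusrun -p, a script can catch the most common mistake, unbalanced parentheses, locally. The function below is a rough pre-check only and is not the RSL parser: the name rsl_looks_balanced is hypothetical, quoting is assumed to use single or double quotes, and RSL comments are ignored. A string that passes may still be rejected by globusrun -p.

```python
def rsl_looks_balanced(rsl):
    """Return True if parentheses outside quoted strings are balanced."""
    depth = 0
    quote = None  # the quote character currently open, if any
    for ch in rsl:
        if quote:
            # Inside a quoted string only the matching quote matters.
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a ")" with no matching "("
    return depth == 0 and quote is None
```

Applied to the two strings in the example, the first is accepted and the second is rejected because its closing parenthesis has no opening partner.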

4.4.2. Checking Service Contacts

This example shows how to check that a globus-gatekeeper is running at a particular contact and that the client and service have mutually-trusted credentials.

Example 1.11. GRAM Authentication test

% globusrun -a -r grid.example.org:2119/jobmanager-pbs
GRAM Authentication test successful
% globusrun -a -r grid.example.org:2119/jobmanager-lsf
GRAM Authentication test failure: the gatekeeper failed to find the requested service
% globusrun -a -r grid.example.org:2119/jobmanager-pbs:host@not.example.org
GRAM Authentication test failure: an authorization operation failed
globus_xio_gsi: gss_init_sec_context failed.
GSS Major Status: Unexpected Gatekeeper or Service Name
globus_gsi_gssapi: Authorization denied: The name of the remote host
(host@not.example.org), and the expected name for the remote host
(grid.example.org) do not match. This happens when the name in the host
certificate does not match the information obtained from DNS and is often a DNS
configuration problem.
                

Note

This DNS configuration problem was a common issue in GRAM2, but GRAM5 does not depend on DNS to resolve names for mutual authentication.

4.4.3. Checking GRAM service version

This example shows how to determine what software version of GRAM5 is deployed at a particular service contact.

Example 1.12. GRAM version check

% globusrun -j -r grid.example.org:2119/jobmanager-pbs
Toolkit version: 4.3.0-HEAD
Job Manager version: 10.5 (1256257907-0)
                

Note

This example shows the version number for an unreleased development version of GRAM5. The actual numbers returned will be different.

Note

This feature is new in GRAM5. When contacting a GRAM2 service, globusrun will display the following error message:

GRAM version check failed : an incoming HTTP message did not contain the expected information

4.4.4. Basic Interactive job with globusrun

This example shows how to submit an interactive job with globusrun. When the -s option is used, the output of the job command is returned to the client and displayed as if the command ran locally. This is similar to the behavior of the globus-job-run program described above.

Example 1.13. Basic Interactive Job

% globusrun -s -r grid.example.org/jobmanager-pbs "&(executable=/bin/hostname)(count=5)"
node03.grid.example.org
node01.grid.example.org
node02.grid.example.org
node05.grid.example.org
node04.grid.example.org

4.4.5. Basic batch job with globusrun

This example shows how to submit, monitor, and cancel a batch job using globusrun. This method is useful for the case where the job may run for a long time, the job may be queued for a long time, or when there are network reliability issues between the client and service.

Example 1.14. Basic Batch Job

% globusrun -b -r grid.example.org:2119/jobmanager-pbs "&(executable=/bin/sleep)(arguments=500)"
globus_gram_client_callback_allow successful
GRAM Job submission successful
https://grid.example.org:38824/16001608125017717261/5295612977486019989/
GLOBUS_GRAM_PROTOCOL_JOB_STATE_PENDING
% globusrun -status https://grid.example.org:38824/16001608125017717261/5295612977486019989/
PENDING
% globusrun -k https://grid.example.org:38824/16001608125017717261/5295612977486019989/
% 

4.4.6. Refreshing a GRAM5 Credential

The following example shows how to refresh the credential used by a job manager and a job.

Example 1.15. Refreshing a Credential

% globusrun -refresh-proxy https://grid.example.org:38824/16001608125017717261/5295612977486019989/
% echo $?
0

Note

In GT 5.0.2, globusrun does not print any diagnostics when given the -refresh-proxy command-line option. Therefore, check the exit code as above to ensure that the refresh is successful.
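Because the exit code is the only indication of success, a script that refreshes credentials should test it explicitly. A minimal sketch follows, assuming only the globusrun -refresh-proxy invocation shown above; the function name refresh_proxy and its run parameter are hypothetical and exist so the check can be exercised without a GRAM service.

```python
import subprocess


def refresh_proxy(job_contact, run=subprocess.run):
    """Return True when `globusrun -refresh-proxy JOB_ID` exits with status 0."""
    completed = run(["globusrun", "-refresh-proxy", job_contact])
    # globusrun prints nothing in this mode, so the exit code is the result.
    return completed.returncode == 0
```

A caller might retry or alert the user when refresh_proxy returns False, for example after generating a new proxy with grid-proxy-init.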

4.4.7. Dealing with credential expiration

When the Job Manager's credential is about to expire, it sends a message to all clients registered for GLOBUS_GRAM_PROTOCOL_JOB_STATE_FAILED notifications that the job manager is terminating and that the job will continue to run without the job manager.

Any client which receives such a message can (if necessary) generate a new proxy as described above and then submit a restart request to start a job manager with a new credential. This job manager will resume monitoring the jobs which were started prior to proxy expiration.

In this example, globusrun displays an error message when the job manager's proxy is about to expire. The user creates a new proxy and resumes monitoring the job with globusrun.

Example 1.16. Proxy Expiration Example

% globusrun -r grid.example.org "&(executable=a.out)"
globus_gram_client_callback_allow successful
GRAM Job submission successful
GLOBUS_GRAM_PROTOCOL_JOB_STATE_ACTIVE
GLOBUS_GRAM_PROTOCOL_JOB_STATE_FAILED
GRAM Job failed because the user proxy expired (job is still running) (error code 131)
% grid-proxy-init
Your identity: /DC=org/DC=example/OU=grid/CN=Joe User
Enter GRID pass phrase for this identity:
Creating proxy ........................................................................... Done
Your proxy is valid until: Tue Nov 10 04:25:03 2009
% globusrun -r grid.example.org "&(restart=https://grid.example.org:1997/16001700477575114131/5295612977486005428/)"
globus_gram_client_callback_allow successful
GRAM Job submission successful
GLOBUS_GRAM_PROTOCOL_JOB_STATE_ACTIVE
GLOBUS_GRAM_PROTOCOL_JOB_STATE_DONE
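The recovery steps in this example can be sketched as a small decision helper. It relies only on what the transcript shows: the "(error code N)" suffix in the failure message, the value 131 for proxy expiration, and the restart RSL attribute. The function names are hypothetical, and the caller is assumed to know the job contact already.

```python
import re

PROXY_EXPIRED = 131  # error code shown in the example output above


def gram_error_code(output):
    """Return the last '(error code N)' value in globusrun output, or None."""
    codes = re.findall(r"\(error code (\d+)\)", output)
    return int(codes[-1]) if codes else None


def restart_rsl(job_contact):
    """Build the RSL that asks for a new job manager to resume monitoring."""
    return "&(restart=" + job_contact + ")"


def next_step(output, job_contact):
    """Return the restart RSL if the failure was a proxy expiration, else None."""
    if gram_error_code(output) == PROXY_EXPIRED:
        return restart_rsl(job_contact)
    return None
```

When next_step returns an RSL string, the user would first create a new proxy with grid-proxy-init and then pass that string to globusrun, as in the transcript.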

4.4.8. File staging

In addition to the standard output and error stream output done by globusrun, GRAM5 can do basic file management tasks to stage files to the GRAM5 service node before submitting a job and to stage files from the GRAM5 service node to a file service after the job completes.

GRAM5 file staging supports four URL schemes: ftp, gsiftp, http, and https. Note that for the https scheme, GRAM expects the file server to be running with the same identity as the client.

General file staging is controlled by three RSL attributes: file_stage_in, file_stage_in_shared, and file_stage_out. In addition, the files named by the RSL attributes executable and stdin may be staged in, and the files named by the RSL attributes stdout and stderr may be staged out.

The file_stage_in_shared RSL attribute instructs GRAM to store a local copy of the resource named by the URL in the GASS cache. This is useful if multiple concurrent jobs will be accessing one or more common files. The GASS cache will manage a reference count for files in the cache and remove them when all jobs that refer to them complete.

The following example shows how to stage a file from a GridFTP server to the GRAM node. It uses the rsl_substitution mechanism to define a substitution variable to reduce the amount of redundancy in the job description.

Example 1.17. File stage in

% globusrun -s -r grid.example.org:2119/jobmanager-pbs \
    "&(rsl_substitution = (GRIDFTP_SERVER gsiftp://gridftp.example.org)) \
      (executable=/bin/ls) \
      (arguments=/tmp/staged_file) \
      (file_stage_in = (\$(GRIDFTP_SERVER)/staged_file /tmp/staged_file))"
/tmp/staged_file

The next example uses the file_stage_in_shared RSL attribute to stage a file into the cache. The file is transferred from the client using the GASS https server embedded in the globusrun program when the -s option is used.

Example 1.18. File stage in shared

% globusrun -s -r grid.example.org:2119/jobmanager-pbs \
    "&(executable=/bin/ls) \
      (arguments = -l /tmp/staged_file_link1) \
      (file_stage_in_shared = \
          (\$(GLOBUSRUN_GASS_URL)/staged_file1 /tmp/staged_file_link1))"
lrwxr-xr-x  1 juser   juser  120 Nov 11 20:37 /tmp/staged_file_link1 -> /home/juser/.globus/.gass_cache/local/md5/ff/771bded8a2c7dacc1a1c0fecafa0ce/md5/39/13ab3db7fc002ed54012083ae6ed1c/data

The final staging example uses the file_stage_out RSL attribute to transfer a file from the GRAM service to an FTP server using anonymous FTP.

Example 1.19. File stage out

% globusrun -r grid.example.org:2119/jobmanager-pbs \
    "&(executable=a.out) \
      (file_stage_out = (results.txt ftp://anonymous:nopass@ftp.example.org/incoming/results.txt))"
% 

Note

In all of the above cases, multiple files may be staged using any combination of the supported URL schemes.
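When several files are staged, the clause can be generated rather than typed by hand. The helper below is a sketch, not a toolkit API: stage_clause is a hypothetical name, it accepts only literal URLs (not RSL substitutions such as $(GLOBUSRUN_GASS_URL)), and it checks each URL against the four schemes listed above.

```python
from urllib.parse import urlparse

SUPPORTED_SCHEMES = {"ftp", "gsiftp", "http", "https"}
STAGING_ATTRIBUTES = ("file_stage_in", "file_stage_in_shared", "file_stage_out")


def stage_clause(attribute, pairs):
    """Build one staging clause from (source, destination) pairs.

    For the stage-in attributes the URL is the source; for file_stage_out
    the URL is the destination.
    """
    if attribute not in STAGING_ATTRIBUTES:
        raise ValueError("unknown staging attribute: " + attribute)
    items = []
    for source, destination in pairs:
        url = destination if attribute == "file_stage_out" else source
        scheme = urlparse(url).scheme
        if scheme not in SUPPORTED_SCHEMES:
            raise ValueError("unsupported URL scheme in " + url)
        items.append("(" + source + " " + destination + ")")
    return "(" + attribute + " = " + " ".join(items) + ")"
```

The returned clause can be appended to the rest of the job description, for example after the executable and arguments clauses.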

4.4.9. Temporary files and cleanup

GRAM5 supports creating a per-job scratch directory which can be used as a place to store files that will be automatically removed by GRAM when the job completes. It also supports an explicit list of files to remove when the job completes.

This example shows how to stage files into a scratch directory. It again uses the embedded GASS https server, stages to the GRAM service, then runs /bin/ls in the temporary directory. After the job completes, the contents of $(SCRATCH_DIRECTORY) and the directory itself are removed.

Example 1.20. Staging to scratch directory

% globusrun -s -r grid.example.org:2119/jobmanager-pbs \
    "&(scratch_dir = \$(HOME)) \
      (directory = \$(SCRATCH_DIRECTORY)) \
      (file_stage_in = \
          (\$(GLOBUSRUN_GASS_URL)/inputfile \$(SCRATCH_DIRECTORY)/inputfile)) \
      (executable = /bin/ls)"
inputfile

This example shows how to explicitly remove a file that was created by the job.

Example 1.21. Cleaning up a file

% globusrun -s -r grid.example.org:2119/jobmanager-pbs \
    "&(executable = /bin/touch) \
      (arguments = temporary_file) \
      (file_clean_up = temporary_file)"
% 

4.4.10. Reliable job submit

The globusrun command supports a two-phase commit protocol to ensure that the client knows the contact of the job which has been created so that it can be monitored or canceled in the case of a client or service error. The two-phase commit affects both job submission and termination.

The two-phase protocol is enabled by using the two_phase RSL attribute, as in the next example. When this is enabled, job submission will fail with the error GLOBUS_GRAM_PROTOCOL_ERROR_WAITING_FOR_COMMIT. The client must respond to this error by sending either the GLOBUS_GRAM_PROTOCOL_JOB_SIGNAL_COMMIT_REQUEST signal, to commit the job to execution, or the GLOBUS_GRAM_PROTOCOL_JOB_SIGNAL_COMMIT_EXTEND signal, to delay the commit timeout. One of these signals must be sent prior to the two-phase commit timeout, or the job will be discarded by the GRAM service.

A two-phase protocol is also used at job termination if the save_state RSL attribute is used along with the two_phase attribute. When the job manager sends a callback with the job state set to GLOBUS_GRAM_PROTOCOL_JOB_STATE_DONE or GLOBUS_GRAM_PROTOCOL_JOB_STATE_FAILED, it will wait to clean up the job until the two-phase commit occurs. The client must reply with the GLOBUS_GRAM_PROTOCOL_JOB_SIGNAL_COMMIT_END signal to cause the job to be cleaned. Otherwise, the job will be unloaded from memory until a client restarts the job and sends the signal.

Example 1.22. Two phase commit example

In this example, the user submits a job with a two_phase timeout of 30 seconds and the save_state attribute. The client must send commit signals to ensure the job runs.

% globusrun -r grid.example.org:2119/jobmanager-pbs \
    "&(two_phase = 30) \
      (save_state = yes) \
      (executable = a.out)"

globus_gram_client_callback_allow successful
GRAM Job submission successful
GLOBUS_GRAM_PROTOCOL_JOB_STATE_PENDING
GLOBUS_GRAM_PROTOCOL_JOB_STATE_ACTIVE
GLOBUS_GRAM_PROTOCOL_JOB_STATE_DONE
% 
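The submission half of the two-phase commit can be pictured with a toy model of the commit window. This is not the GRAM protocol implementation: the class and state names are hypothetical, time is a plain number of seconds supplied by the caller, and treating an extend signal as adding seconds to the current deadline is an assumption made for illustration.

```python
class TwoPhaseSubmission:
    """Toy model of the commit window that follows a two_phase submission."""

    def __init__(self, submitted_at, timeout):
        # The service waits `timeout` seconds for a commit signal.
        self.deadline = submitted_at + timeout
        self.state = "WAITING_FOR_COMMIT"

    def _expire_if_late(self, now):
        # No signal arrived before the deadline: the job is discarded.
        if self.state == "WAITING_FOR_COMMIT" and now > self.deadline:
            self.state = "DISCARDED"

    def commit_extend(self, now, extra):
        """Model COMMIT_EXTEND: push the deadline back (assumed semantics)."""
        self._expire_if_late(now)
        if self.state == "WAITING_FOR_COMMIT":
            self.deadline += extra
        return self.state

    def commit_request(self, now):
        """Model COMMIT_REQUEST: commit the job to execution if still waiting."""
        self._expire_if_late(now)
        if self.state == "WAITING_FOR_COMMIT":
            self.state = "COMMITTED"
        return self.state
```

With the 30-second timeout from the example, a commit request at 10 seconds commits the job, while one arriving at 31 seconds finds the job already discarded unless an extend signal was sent in time.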

4.4.11. Reconnecting to a job

If a job manager or client exits before a job has completed, the job will continue to run. The client can reconnect to a job manager and receive job state notifications and output using the restart RSL attribute.

Example 1.23. Restart example

This example uses globus-job-submit to submit a batch job and then globusrun to reconnect to the job.

% globus-job-submit grid.example.org:2119/jobmanager-pbs /bin/sleep 90
https://grid.example.org:38824/16001746665595486521/5295612977486005662/
% globusrun -r grid.example.org:2119/jobmanager-pbs \
    "&(restart = https://grid.example.org:38824/16001746665595486521/5295612977486005662/)"
globus_gram_client_callback_allow successful
GRAM Job submission successful
GLOBUS_GRAM_PROTOCOL_JOB_STATE_DONE
% 

4.4.12. Submitting a Java job

To submit a job that runs a Java program, the client must ensure that the job can find the Java interpreter and its classes. This example sets the default PATH and CLASSPATH environment variables and uses the shell to locate the path to the java program.

Example 1.24. Java example

This example uses globusrun to submit a Java job, staging jar files from a remote service.

% globusrun -r grid.example.org:2119/jobmanager-pbs \
    "&(environment = (PATH '/usr/bin:/bin') (CLASSPATH \$(SCRATCH_DIRECTORY)))
      (scratch_dir = \$(HOME)) 
      (directory = \$(SCRATCH_DIRECTORY))
      (rsl_substitution = (JAVA_SERVER http://java.example.org))
      (file_stage_in = 
          (\$(JAVA_SERVER)/example.jar \$(SCRATCH_DIRECTORY)/example.jar) 
          (\$(JAVA_SERVER)/support.jar \$(SCRATCH_DIRECTORY)/support.jar))
      (executable=/bin/sh)
      (arguments=-c 'java -jar example.jar')"
globus_gram_client_callback_allow successful
GRAM Job submission successful
GLOBUS_GRAM_PROTOCOL_JOB_STATE_PENDING
GLOBUS_GRAM_PROTOCOL_JOB_STATE_ACTIVE
GLOBUS_GRAM_PROTOCOL_JOB_STATE_DONE
% 

5. Using GRAM5 with Condor-G

Condor-G users should upgrade their clients to Condor 7.4.0 or later to achieve the highest performance. That version includes the gt5 grid type, which includes client-side optimizations to improve performance. To use an older Condor-G client, be sure to set the GRIDMANAGER_MAX_JOBMANAGERS_PER_RESOURCE attribute to 0 to disable the Condor-G client's attempts to stop and restart the job manager service. Also, disable the Grid Monitor when using a GRAM5 resource by setting the ENABLE_GRID_MONITOR configuration attribute to FALSE.

GRAM5 Commands


Name

globusrun — Execute and manage jobs via GRAM

Synopsis

globusrun [-help] [-usage] [-version] [-versions]

globusrun { -p | -parse }
{ -f RSL_FILENAME | -file RSL_FILENAME | RSL_SPECIFICATION }

globusrun [-n] [-no-interrupt]
{ -r RESOURCE_CONTACT | -resource RESOURCE_CONTACT }
{ -a | -authenticate-only }

globusrun [-n] [-no-interrupt]
{ -r RESOURCE_CONTACT | -resource RESOURCE_CONTACT }
{ -j | -jobmanager-version }

globusrun [-n] [-no-interrupt] { -k | -kill } {JOB_ID}

globusrun [-n] [-no-interrupt] [-full-proxy] [-D] { -y | -refresh-proxy } {JOB_ID}

globusrun { -status } {JOB_ID}

globusrun [-q] [-quiet] [-o] [-output-enable] [-s] [-server] [-w] [-write-allow] [-n] [-no-interrupt] [-b] [-batch] [-F] [-fast-batch] [-full-proxy] [-D] [-d] [-dryrun]
{ -r RESOURCE_CONTACT | -resource RESOURCE_CONTACT }
{ -f RSL_FILENAME | -file RSL_FILENAME | RSL_SPECIFICATION }

Description

The globusrun program submits and manages jobs run on a local or remote job host. The jobs are controlled by the globus-job-manager program which interfaces with a local resource manager that schedules and executes the job.

The globusrun program can be run in a number of different modes chosen by command-line options.

When -help, -usage, -version, or -versions command-line options are used, globusrun will print out diagnostic information and then exit.

When the -p or -parse command-line option is present, globusrun will verify the syntax of the RSL specification and then terminate. If the syntax is valid, globusrun will print out the string "RSL Parsed Successfully..." and exit with a zero exit code; otherwise, it will print an error message and terminate with a non-zero exit code.

When the -a or -authenticate-only command-line option is present, globusrun will verify that the service named by RESOURCE_CONTACT exists and the client's credentials are granted permission to access that service. If authentication is successful, globusrun will display the string "GRAM Authentication test successful" and exit with a zero exit code; otherwise it will print an explanation of the problem and exit with a non-zero exit code.

When the -j or -jobmanager-version command-line option is present, globusrun will attempt to determine the software version that the service named by RESOURCE_CONTACT is running. If successful, it will display both the Toolkit version and the Job Manager package version and exit with a zero exit code; otherwise, it will print an explanation of the problem and exit with a non-zero exit code.

When the -k or -kill command-line option is present, globusrun will attempt to terminate the job named by JOB_ID. If successful, globusrun will exit with zero; otherwise it will display an explanation of the problem and exit with a non-zero exit code.

When the -y or -refresh-proxy command-line option is present, globusrun will attempt to delegate a new X.509 proxy to the job manager which is managing the job named by JOB_ID. If successful, globusrun will exit with zero; otherwise it will display an explanation of the problem and exit with a non-zero exit code. This behavior can be modified by the -full-proxy or -D command-line options to enable full proxy delegation. The default is limited proxy delegation.

When the -status command-line option is present, globusrun will attempt to determine the current state of the job named by JOB_ID. If successful, the state will be printed to standard output and globusrun will exit with a zero exit code; otherwise, a description of the error will be displayed and it will exit with a non-zero exit code.

Otherwise, globusrun will submit the job to a GRAM service. By default, globusrun waits until the job has terminated or failed before exiting, displaying information about job state changes and, at exit time, the job exit code if the GRAM service provides it.

The globusrun program can also function as a GASS file server, allowing the globus-job-manager program to stage files between the machine on which globusrun is executed and the GRAM service node. This behavior is controlled by the -s, -o, and -w command-line options.

Jobs submitted by globusrun can be monitored interactively or detached. To have globusrun detach from the GRAM service after submitting the job, use the -b or -F command-line options.

Options

The full set of options to globusrun consists of:

-help
Display a help message to standard error and exit.
-usage
Display a one-line usage summary to standard error and exit.
-version
Display the software version of globusrun to standard error and exit.
-versions
Display the software version of all modules used by globusrun (including DiRT information) to standard error and then exit.
-p, -parse
Do a parse check on the job specification and print diagnostics. If a parse error occurs, globusrun exits with a non-zero exit code.
-f RSL_FILENAME, -file RSL_FILENAME
Read job specification from the file named by RSL_FILENAME.
-n, -no-interrupt
Disable handling of the SIGINT signal, so that the interrupt character (typically Control-C) causes globusrun to terminate without canceling the job.
-r RESOURCE_CONTACT, -resource RESOURCE_CONTACT

Submit the request to the resource specified by RESOURCE_CONTACT. A resource may be specified in the following ways:

  • HOST
  • HOST:PORT
  • HOST:PORT/SERVICE
  • HOST/SERVICE
  • HOST:/SERVICE
  • HOST::SUBJECT
  • HOST:PORT:SUBJECT
  • HOST/SERVICE:SUBJECT
  • HOST:/SERVICE:SUBJECT
  • HOST:PORT/SERVICE:SUBJECT

If any of PORT, SERVICE, or SUBJECT is omitted, the defaults of 2119, jobmanager, and host@HOST are used respectively.

-j, -jobmanager-version
Print the software version being run by the service running at RESOURCE_CONTACT.
-k JOB_ID, -kill JOB_ID
Kill the job named by JOB_ID.
-D, -full-proxy
Delegate a full impersonation proxy to the service. By default, a limited proxy is delegated when needed.
-y, -refresh-proxy
Delegate a new proxy to the service processing JOB_ID.
-status
Display the current status of the job named by JOB_ID.
-q, -quiet
Do not display job state change or exit code information.
-o, -output-enable
Start a GASS server within the globusrun application that allows access to its standard output and standard error streams only. Also, augment the RSL_SPECIFICATION with a definition of the GLOBUSRUN_GASS_URL RSL substitution and add stdout and stderr clauses which redirect the output and error streams of the job to the output and error streams of the interactive globusrun command. If this is specified, then globusrun acts as though the -q option were also specified.
-s, -server
Start a GASS server within the globusrun application that allows access to its standard output and standard error streams for writing and any file local to the globusrun invocation for reading. Also, augment the RSL_SPECIFICATION with a definition of the GLOBUSRUN_GASS_URL RSL substitution and add stdout and stderr clauses which redirect the output and error streams of the job to the output and error streams of the interactive globusrun command. If this is specified, then globusrun acts as though the -q option were also specified.
-w, -write-allow
Start a GASS server within the globusrun application that allows access to its standard output and standard error streams for writing and any file local to the globusrun invocation for reading or writing. Also, augment the RSL_SPECIFICATION with a definition of the GLOBUSRUN_GASS_URL RSL substitution and add stdout and stderr clauses which redirect the output and error streams of the job to the output and error streams of the interactive globusrun command. If this is specified, then globusrun acts as though the -q option were also specified.
-b, -batch
Terminate after submitting the job to the GRAM service. The globusrun program will exit after the job hits any of the following states: PENDING, ACTIVE, FAILED, or DONE. The GASS-related options can be used to stage input files, but standard output, standard error, and file staging after the job completes will not be processed.
-F, -fast-batch
Terminate after submitting the job to the GRAM service. The globusrun program will exit after it receives a reply from the service. The JOB_ID will be displayed to standard output before terminating so that the job can be checked with the -status command-line option or modified by the -refresh-proxy or -kill command-line options.
-d, -dryrun
Submit the job with the dryrun attribute set to true. When this is done, the job manager will prepare to start the job but stop short of submitting it to the local resource manager. This can be used to detect problems with the RSL_SPECIFICATION.
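The RESOURCE_CONTACT forms listed under the -r option can be taken apart with plain parameter expansion. The following is an illustrative sketch only, not the parser that globusrun itself uses: the function name parse_contact is hypothetical, and it assumes that HOST contains neither ":" nor "/" and that SERVICE contains no ":".

```shell
# parse_contact RESOURCE_CONTACT
# Prints four lines: host, port, service, subject.
# An empty line means that part was omitted, so the documented default applies.
# The body runs in a subshell so its variables do not leak into the caller.
parse_contact() (
    contact=$1
    host=${contact%%[:/]*}        # everything before the first ':' or '/'
    rest=${contact#"$host"}
    port= service= subject=
    case $rest in
        :*)                       # ':' introduces a (possibly empty) PORT
            rest=${rest#:}
            port=${rest%%[:/]*}
            rest=${rest#"$port"}
            ;;
    esac
    case $rest in
        /*)                       # '/' introduces SERVICE
            rest=${rest#/}
            service=${rest%%:*}
            rest=${rest#"$service"}
            ;;
    esac
    case $rest in
        :*) subject=${rest#:} ;;  # the remainder after ':' is SUBJECT
    esac
    printf '%s\n%s\n%s\n%s\n' "$host" "$port" "$service" "$subject"
)

# Example use (placeholder contact string):
#   parse_contact "grid.example.org/jobmanager-pbs"
```

Because SUBJECT is taken as everything after the final separator, subject names that themselves contain "/" or ":" are passed through unchanged.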

Environment

The following variables affect the execution of globusrun:

X509_USER_PROXY
Path to proxy credential.
X509_CERT_DIR
Path to trusted certificate directory.

Bugs

The globusrun program assumes that any failure to contact the job manager means the job has terminated. In fact, the failure may be due to the globus-job-manager program exiting after all jobs it is managing have reached the DONE or FAILED states. To detect job termination reliably, use the two_phase RSL attribute.

See Also

globus-job-submit(1), globus-job-run(1), globus-job-clean(1), globus-job-get-output(1), globus-job-cancel(1)

Name

globus-job-cancel — Cancel a GRAM batch job

Synopsis

globus-job-cancel [ -f | -force ] [ -q | -quiet ] JOBID

globus-job-cancel [-help] [-usage] [-version] [-versions]

Description

The globus-job-cancel program cancels the job named by JOBID. Any cached files associated with the job will remain until globus-job-clean is executed for the job.

By default, globus-job-cancel prompts the user prior to canceling the job. This behavior can be overridden by specifying the -f or -force command-line options.
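In unattended scripts the confirmation prompt has to be suppressed. The sketch below cancels several jobs in one call; the function name cancel_all and the GLOBUS_JOB_CANCEL override variable are hypothetical, and the sketch assumes that globus-job-cancel exits with a non-zero status when a cancellation fails, which this page does not state.

```shell
# Hypothetical override variable: set it if globus-job-cancel is not on PATH.
GLOBUS_JOB_CANCEL=${GLOBUS_JOB_CANCEL:-globus-job-cancel}

# cancel_all JOBID...
# Cancels each job without prompting (-f) and keeps going after a failure.
# Returns non-zero if any cancellation reported an error.
cancel_all() {
    rc=0
    for job in "$@"; do
        "$GLOBUS_JOB_CANCEL" -f "$job" || rc=1
    done
    return "$rc"
}

# Example use (job IDs held in shell variables by the caller):
#   cancel_all "$job_a" "$job_b"
```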

Options

The full set of options to globus-job-cancel is:

-help, -usage
Display a help message to standard error and exit.
-version
Display the software version of the globus-job-cancel program to standard output.
-versions
Display the software version of the globus-job-cancel program including DiRT information to standard output.
-force, -f
Do not prompt to confirm job cancel and clean-up.
-quiet, -q
Do not print diagnostics for successful cancel. Implies -f.

Environment

The following variables affect the execution of globus-job-cancel:

X509_USER_PROXY
Path to proxy credential.
X509_CERT_DIR
Path to trusted certificate directory.

Name

globus-job-clean — Cancel and clean up a GRAM batch job

Synopsis

globus-job-clean [ -r RESOURCE | -resource RESOURCE ]
[ -f | -force ] [ -q | -quiet ] JOBID

globus-job-clean [-help] [-usage] [-version] [-versions]

Description

The globus-job-clean program cancels the job named by JOBID if it is still running, and then removes any cached files on the GRAM service node related to that job. In order to do the file clean up, it submits a job which removes the cache files. By default this cleanup job is submitted to the default GRAM resource running on the same host as the job. This behavior can be controlled by specifying a resource manager contact string as the parameter to the -r or -resource option.

By default, globus-job-clean prompts the user prior to canceling the job. This behavior can be overridden by specifying the -f or -force command-line options.

Options

The full set of options to globus-job-clean is:

-help, -usage
Display a help message to standard error and exit.
-version
Display the software version of the globus-job-clean program to standard output.
-versions
Display the software version of the globus-job-clean program including DiRT information to standard output.
-resource RESOURCE, -r RESOURCE
Submit the clean-up job to the resource named by RESOURCE instead of the default GRAM service on the same host as the job contact.
-force, -f
Do not prompt to confirm job cancel and clean-up.
-quiet, -q
Do not print diagnostics for successful clean-up. Implies -f.

Environment

The following variables affect the execution of globus-job-clean:

X509_USER_PROXY
Path to proxy credential.
X509_CERT_DIR
Path to trusted certificate directory.

Name

globus-job-get-output — Retrieve the output and error streams from a GRAM job

Synopsis

globus-job-get-output [ -r RESOURCE | -resource RESOURCE ]
[ -out | -err ] [ -t LINES | -tail LINES ] [ -follow LINES | -f LINES ] JOBID

globus-job-get-output [-help] [-usage] [-version] [-versions]

Description

The globus-job-get-output program retrieves the output and error streams of the job named by JOBID. By default, globus-job-get-output will retrieve all output and error data from the job and display them to its own output and error streams. Other behavior can be controlled by using command-line options. The data retrieval is implemented by submitting another job which simply displays the contents of the first job's output and error streams. By default this retrieval job is submitted to the default GRAM resource running on the same host as the job. This behavior can be controlled by specifying a particular resource manager contact string as the RESOURCE parameter to the -r or -resource option.
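When only the end of the error stream is of interest, the stream selection and tail options can be combined. This is a small sketch using only options from the synopsis above; the function name last_errors, the default of 20 lines, and the GLOBUS_JOB_GET_OUTPUT override variable are hypothetical.

```shell
# Hypothetical override variable: set it if globus-job-get-output is not on PATH.
GLOBUS_JOB_GET_OUTPUT=${GLOBUS_JOB_GET_OUTPUT:-globus-job-get-output}

# last_errors JOBID [LINES]
# Prints the last LINES lines (20 if not given) of the job's standard error.
last_errors() {
    "$GLOBUS_JOB_GET_OUTPUT" -err -t "${2:-20}" "$1"
}

# Example use (job ID held in a shell variable by the caller):
#   last_errors "$job_id" 50
```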

Options

The full set of options to globus-job-get-output is:

-help, -usage
Display a help message to standard error and exit.
-version
Display the software version of the globus-job-get-output program to standard output.
-versions
Display the software version of the globus-job-get-output program including DiRT information to standard output.
-resource RESOURCE, -r RESOURCE
Submit the retrieval job to the resource named by RESOURCE instead of the default GRAM service on the same host as the job contact.
-out
Retrieve only the standard output stream of the job. The default is to retrieve both standard output and standard error.
-err
Retrieve only the standard error stream of the job. The default is to retrieve both standard output and standard error.
-tail LINES, -t LINES
Print only the last LINES count lines of output from the data streams being retrieved. By default, the entire output and error file data is retrieved. This option cannot be used along with the -f or -follow options.
-follow LINES, -f LINES
Print the last LINES count lines of output from the data streams being retrieved and then wait until canceled, printing any subsequent job output that occurs. By default, the entire output and error file data is retrieved. This option cannot be used along with the -t or -tail options.

Environment

The following variables affect the execution of globus-job-get-output:

X509_USER_PROXY
Path to proxy credential.
X509_CERT_DIR
Path to trusted certificate directory.

Name

globus-job-run — Execute a job using GRAM

Synopsis

globus-job-run [-dumprsl] [-dryrun] [-verify]
[-file ARGUMENT_FILE]
SERVICE_CONTACT
[ -np PROCESSES | -count PROCESSES ]
[ -m MAX_TIME | -maxtime MAX_TIME ]
[ -p PROJECT | -project PROJECT ]
[ -q QUEUE | -queue QUEUE ]
[ -d DIRECTORY | -directory DIRECTORY ] [-env NAME=VALUE]...
[-stdin [ -l | -s ] STDIN_FILE ] [-stdout [ -l | -s ] STDOUT_FILE ] [-stderr [ -l | -s ] STDERR_FILE ]
[-x RSL_CLAUSE]
[ -l | -s ] EXECUTABLE [ARGUMENT...]

globus-job-run [-help] [-usage] [-version] [-versions]

Description

The globus-job-run program constructs a job description from its command-line options and then submits the job to the GRAM service running at SERVICE_CONTACT. The executable and arguments to the executable are provided on the command-line after all other options. Note that the -dumprsl, -dryrun, -verify, and -file command-line options must occur before the first non-option argument, the SERVICE_CONTACT.

The globus-job-run program provides similar functionality to globusrun in that it allows interactive start-up of GRAM jobs. However, unlike globusrun, it uses command-line parameters to define the job instead of RSL expressions.
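The ordering rule for -dumprsl, -dryrun, -verify, and -file is easy to get wrong when options are assembled by a script. The sketch below always places -dumprsl ahead of SERVICE_CONTACT and passes everything else through unchanged; the function name show_rsl and the GLOBUS_JOB_RUN override variable are hypothetical.

```shell
# Hypothetical override variable: set it if globus-job-run is not on PATH.
GLOBUS_JOB_RUN=${GLOBUS_JOB_RUN:-globus-job-run}

# show_rsl SERVICE_CONTACT [JOB_OPTION...] EXECUTABLE [ARGUMENT...]
# Asks globus-job-run to translate the given job options into RSL.
# -dumprsl is inserted before SERVICE_CONTACT, as the ordering rule requires.
show_rsl() {
    contact=$1
    shift
    "$GLOBUS_JOB_RUN" -dumprsl "$contact" "$@"
}

# Example use (host name and values are placeholders):
#   show_rsl grid.example.org -np 4 -env MODE=test -l /bin/hostname
```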

Options

The full set of options to globus-job-run is:

-help, -usage
Display a help message to standard error and exit.
-version
Display the software version of the globus-job-run program to standard output.
-versions
Display the software version of the globus-job-run program including DiRT information to standard output.
-dumprsl
Translate the command-line options to globus-job-run into an RSL expression that can be used with tools such as globusrun.
-dryrun
Submit the job request to the GRAM service with the dryrun option enabled. When this option is used, the GRAM service prepares to execute the job but stops before submitting the job to the LRM. This can be used to diagnose some problems such as missing files.
-verify
Submit the job request to the GRAM service with the dryrun option enabled and then without it enabled if the dryrun is successful.
-file ARGUMENT_FILE
Read additional command-line options from ARGUMENT_FILE.
-np PROCESSES, -count PROCESSES
Start PROCESSES instances of the executable as a single job.
-m MAX_TIME, -maxtime MAX_TIME
Schedule the job to run for a maximum of MAX_TIME minutes.
-p PROJECT, -project PROJECT
Request that the job use the allocation PROJECT when submitting the job to the LRM.
-q QUEUE, -queue QUEUE
Request that the job be submitted to the LRM using the named QUEUE.
-d DIRECTORY, -directory DIRECTORY
Run the job in the directory named by DIRECTORY. Input and output files will be interpreted relative to this directory. This directory must exist on the file system on the LRM-managed resource. If not specified, the job will run in the home directory of the user the job is running as.
-env NAME=VALUE
Define an environment variable named by NAME with the value VALUE in the job environment. This option may be specified multiple times to define multiple environment variables.
-stdin [-l | -s] STDIN_FILE
Use the file named by STDIN_FILE as the standard input of the job. If the -l option is specified, then this file is interpreted to be on a file system local to the LRM. If the -s option is specified, then this file is interpreted to be on the file system where globus-job-run is being executed, and the file will be staged via GASS. If neither is specified, the local behavior is assumed.
-stdout [-l | -s] STDOUT_FILE
Use the file named by STDOUT_FILE as the destination for the standard output of the job. If the -l option is specified, then this file is interpreted to be on a file system local to the LRM. If the -s option is specified, then this file is interpreted to be on the file system where globus-job-run is being executed, and the file will be staged via GASS. If neither is specified, the local behavior is assumed.
-stderr [-l | -s] STDERR_FILE
Use the file named by STDERR_FILE as the destination for the standard error of the job. If the -l option is specified, then this file is interpreted to be on a file system local to the LRM. If the -s option is specified, then this file is interpreted to be on the file system where globus-job-run is being executed, and the file will be staged via GASS. If neither is specified, the local behavior is assumed.
-x RSL_CLAUSE
Add a set of custom RSL attributes described by RSL_CLAUSE to the job description. The clause must be an RSL conjunction and may contain one or more attributes. This can be used to include attributes which cannot be defined by other command-line options of globus-job-run.
-l
When included outside the context of the -stdin, -stdout, or -stderr command-line options, the -l option alters the interpretation of the executable path. If the -l option is specified, then the executable is interpreted to be on a file system local to the LRM.
-s
When included outside the context of the -stdin, -stdout, or -stderr command-line options, the -s option alters the interpretation of the executable path. If the -s option is specified, then the executable is interpreted to be on the file system where globus-job-run is being executed, and the file will be staged via GASS. If neither is specified, the local behavior is assumed.
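The two contexts of -s can appear in the same command: once after -stdin to stage an input file, and once before the executable to stage the program itself. The following sketch shows that combination; the function name run_staged, the file names in the usage comment, and the GLOBUS_JOB_RUN override variable are hypothetical.

```shell
# Hypothetical override variable: set it if globus-job-run is not on PATH.
GLOBUS_JOB_RUN=${GLOBUS_JOB_RUN:-globus-job-run}

# run_staged SERVICE_CONTACT LOCAL_SCRIPT LOCAL_INPUT [ARGUMENT...]
# Runs LOCAL_SCRIPT on the remote resource with LOCAL_INPUT as standard input.
# Both files live on the submitting machine, so each is marked with -s
# and is staged via GASS.
run_staged() {
    contact=$1
    script=$2
    input=$3
    shift 3
    "$GLOBUS_JOB_RUN" "$contact" -stdin -s "$input" -s "$script" "$@"
}

# Example use (placeholder host and file names):
#   run_staged grid.example.org ./analyze.sh data.txt --verbose
```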

Environment

The following variables affect the execution of globus-job-run:

X509_USER_PROXY
Path to proxy credential.
X509_CERT_DIR
Path to trusted certificate directory.

See Also

globusrun(1), globus-job-submit(1), globus-job-clean(1), globus-job-get-output(1), globus-job-cancel(1)

Name

globus-job-status — Check the status of a GRAM5 job

Synopsis

globus-job-status JOBID

globus-job-status [-help] [-usage] [-version] [-versions]

Description

The globus-job-status program checks the status of a GRAM job by sending a status request to the job manager contact for that job specified by the JOBID parameter. If successful, it will print the job status to standard output. The states supported by globus-job-status are:

PENDING

The job has been submitted to the LRM but has not yet begun execution.

ACTIVE

The job has begun execution.

FAILED

The job has failed.

SUSPENDED

The job is currently suspended by the LRM.

DONE

The job has completed.

UNSUBMITTED

The job has been accepted by GRAM, but not yet submitted to the LRM.

STAGE_IN

The job has been accepted by GRAM and is currently staging files prior to being submitted to the LRM.

STAGE_OUT

The job has completed execution and is currently staging files from the service node to other http, GASS, or GridFTP servers.
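The states listed above can drive a simple polling loop. This is a sketch under stated assumptions rather than a supported tool: it assumes globus-job-status prints only the state name on standard output and exits with a non-zero status when the request fails; the function names, the 30-second default interval, and the GLOBUS_JOB_STATUS override variable are hypothetical.

```shell
# Hypothetical override variable: set it if globus-job-status is not on PATH.
GLOBUS_JOB_STATUS=${GLOBUS_JOB_STATUS:-globus-job-status}

# is_final_state STATE
# DONE and FAILED are treated as final; every other listed state means
# the job may still change state.
is_final_state() {
    case $1 in
        DONE|FAILED) return 0 ;;
        *) return 1 ;;
    esac
}

# wait_for_job JOBID [INTERVAL_SECONDS]
# Polls until the job reaches a final state, then prints that state.
# Returns non-zero if the status request itself fails.
wait_for_job() {
    while :; do
        state=$("$GLOBUS_JOB_STATUS" "$1") || return 1
        if is_final_state "$state"; then
            printf '%s\n' "$state"
            return 0
        fi
        sleep "${2:-30}"
    done
}

# Example use (job ID held in a shell variable by the caller):
#   final=$(wait_for_job "$job_id" 60) && echo "job ended in state $final"
```

A DONE result from this loop carries the same caveat as globus-job-status itself: it may also be reported when the job manager has terminated for another reason.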

Options

The full set of options to globus-job-status is:

-help, -usage
Display a help message to standard error and exit.
-version
Display the software version of the globus-job-status program to standard output.
-versions
Display the software version of the globus-job-status program including DiRT information to standard output.

Environment

The following variables affect the execution of globus-job-status:

X509_USER_PROXY
Path to proxy credential.
X509_CERT_DIR
Path to trusted certificate directory.

Bugs

The globus-job-status program cannot distinguish between the case of the job manager terminating for any reason and the job being in the DONE state.

See Also

globusrun(1)

Name

globus-job-submit — Submit a batch job using GRAM

Synopsis

globus-job-submit [-dumprsl] [-dryrun] [-verify]
[-file ARGUMENT_FILE]
SERVICE_CONTACT
[ -np PROCESSES | -count PROCESSES ]
[ -m MAX_TIME | -maxtime MAX_TIME ]
[ -p PROJECT | -project PROJECT ]
[ -q QUEUE | -queue QUEUE ]
[ -d DIRECTORY | -directory DIRECTORY ] [-env NAME=VALUE]...
[-stdin [ -l | -s ] STDIN_FILE ] [-stdout [ -l | -s ] STDOUT_FILE ] [-stderr [ -l | -s ] STDERR_FILE ]
[-x RSL_CLAUSE]
[ -l | -s ] EXECUTABLE [ARGUMENT...]

globus-job-submit [-help] [-usage] [-version] [-versions]

Description

The globus-job-submit program constructs a job description from its command-line options and then submits the job to the GRAM service running at SERVICE_CONTACT. The executable and arguments to the executable are provided on the command-line after all other options. Note that the -dumprsl, -dryrun, -verify, and -file command-line options must occur before the first non-option argument, the SERVICE_CONTACT.

The globus-job-submit program provides similar functionality to globusrun in that it allows batch submission of GRAM jobs. However, unlike globusrun, it uses command-line parameters to define the job instead of RSL expressions.

To retrieve the output and error streams of the job, use the program globus-job-get-output. To reclaim resources used by the job by deleting cached files and job state, use the program globus-job-clean. To cancel a batch job submitted by globus-job-submit, use the program globus-job-cancel.
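The companion programs named above fit together into a submit, collect, and clean-up sequence. The sketch below is illustrative only: it assumes globus-job-submit prints the job ID on standard output, and the function names, file names, and the three override variables are hypothetical.

```shell
# Hypothetical override variables: set them if the programs are not on PATH.
GLOBUS_JOB_SUBMIT=${GLOBUS_JOB_SUBMIT:-globus-job-submit}
GLOBUS_JOB_GET_OUTPUT=${GLOBUS_JOB_GET_OUTPUT:-globus-job-get-output}
GLOBUS_JOB_CLEAN=${GLOBUS_JOB_CLEAN:-globus-job-clean}

# submit_job SERVICE_CONTACT [JOB_OPTION...] EXECUTABLE [ARGUMENT...]
# Submits a batch job and passes through whatever globus-job-submit prints
# (assumed here to be the job ID).
submit_job() {
    "$GLOBUS_JOB_SUBMIT" "$@"
}

# collect_job JOBID OUTPUT_FILE
# Saves the job's standard output to OUTPUT_FILE, then removes the cached
# files and job state without prompting (-f). Clean-up is skipped if the
# output could not be retrieved.
collect_job() {
    "$GLOBUS_JOB_GET_OUTPUT" -out "$1" > "$2" || return 1
    "$GLOBUS_JOB_CLEAN" -f "$1"
}

# Example use (placeholders throughout):
#   job_id=$(submit_job grid.example.org -l /bin/hostname)
#   ... wait until the job has completed ...
#   collect_job "$job_id" hostname.out
```

Retrieving the output before cleaning matters because globus-job-clean deletes the cached files that hold the job's output streams.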

Options

The full set of options to globus-job-submit is:

-help, -usage
Display a help message to standard error and exit.
-version
Display the software version of the globus-job-submit program to standard output.
-versions
Display the software version of the globus-job-submit program including DiRT information to standard output.
-dumprsl
Translate the command-line options to globus-job-submit into an RSL expression that can be used with tools such as globusrun.
-dryrun
Submit the job request to the GRAM service with the dryrun option enabled. When this option is used, the GRAM service prepares to execute the job but stops before submitting the job to the LRM. This can be used to diagnose some problems such as missing files.
-verify
Submit the job request to the GRAM service with the dryrun option enabled and then without it enabled if the dryrun is successful.
-file ARGUMENT_FILE
Read additional command-line options from ARGUMENT_FILE.
-np PROCESSES, -count PROCESSES
Start PROCESSES instances of the executable as a single job.
-m MAX_TIME, -maxtime MAX_TIME
Schedule the job to run for a maximum of MAX_TIME minutes.
-p PROJECT, -project PROJECT
Request that the job use the allocation PROJECT when submitting the job to the LRM.
-q QUEUE, -queue QUEUE
Request that the job be submitted to the LRM using the named QUEUE.
-d DIRECTORY, -directory DIRECTORY
Run the job in the directory named by DIRECTORY. Input and output files will be interpreted relative to this directory. This directory must exist on the file system on the LRM-managed resource. If not specified, the job will run in the home directory of the user the job is running as.
-env NAME=VALUE
Define an environment variable named by NAME with the value VALUE in the job environment. This option may be specified multiple times to define multiple environment variables.
-stdin [-l | -s] STDIN_FILE
Use the file named by STDIN_FILE as the standard input of the job. If the -l option is specified, then this file is interpreted to be on a file system local to the LRM. If the -s option is specified, then this file is interpreted to be on the file system where globus-job-submit is being executed, and the file will be staged via GASS. If neither is specified, the local behavior is assumed.
-stdout [-l | -s] STDOUT_FILE
Use the file named by STDOUT_FILE as the destination for the standard output of the job. If the -l option is specified, then this file is interpreted to be on a file system local to the LRM. If the -s option is specified, then this file is interpreted to be on the file system where globus-job-submit is being executed, and the file will be staged via GASS. If neither is specified, the local behavior is assumed.
-stderr [-l | -s] STDERR_FILE
Use the file named by STDERR_FILE as the destination for the standard error of the job. If the -l option is specified, then this file is interpreted to be on a file system local to the LRM. If the -s option is specified, then this file is interpreted to be on the file system where globus-job-submit is being executed, and the file will be staged via GASS. If neither is specified, the local behavior is assumed.
-x RSL_CLAUSE
Add a set of custom RSL attributes described by RSL_CLAUSE to the job description. The clause must be an RSL conjunction and may contain one or more attributes. This can be used to include attributes which cannot be defined by other command-line options of globus-job-submit.
-l
When included outside the context of the -stdin, -stdout, or -stderr command-line options, the -l option alters the interpretation of the executable path. If the -l option is specified, then the executable is interpreted to be on a file system local to the LRM.
-s
When included outside the context of the -stdin, -stdout, or -stderr command-line options, the -s option alters the interpretation of the executable path. If the -s option is specified, then the executable is interpreted to be on the file system where globus-job-submit is being executed, and the file will be staged via GASS. If neither is specified, the local behavior is assumed.

Environment

The following variables affect the execution of globus-job-submit:

X509_USER_PROXY
Path to proxy credential.
X509_CERT_DIR
Path to trusted certificate directory.

See Also

globusrun(1), globus-job-run(1), globus-job-clean(1), globus-job-get-output(1), globus-job-cancel(1)

Name

globus-personal-gatekeeper — Manage a user's personal gatekeeper daemon

Synopsis

globus-personal-gatekeeper [-help] [-usage] [-version] [-versions] [-list] [-directory CONTACT]

globus-personal-gatekeeper [-debug] {-start} [-jmtype LRM] [-auditdir AUDIT_DIRECTORY] [-port PORT] [-log [=DIRECTORY]] [-seg] [-acctfile ACCOUNTING_FILE]

globus-personal-gatekeeper [-killall] [-kill CONTACT]

Description

The globus-personal-gatekeeper command is a utility which manages a gatekeeper and job manager service for a single user. Depending on the command-line arguments it will operate in one of several modes. In the first set of arguments indicated in the synopsis, the program provides information about the globus-personal-gatekeeper command or about instances of the globus-personal-gatekeeper that are running currently. The second set of arguments indicated in the synopsis provide control over starting a new globus-personal-gatekeeper instance. The final set of arguments provide control for terminating one or more globus-personal-gatekeeper instances.

The -start mode will create a new subdirectory of $HOME/.globus and write the configuration files needed to start a globus-gatekeeper daemon which will invoke the globus-job-manager service when new authenticated connections are made to its service port. The globus-personal-gatekeeper then exits, printing the contact string for the new gatekeeper prefixed by GRAM contact: to standard output. In addition to the arguments described above, any arguments described in globus-job-manager(8) can be appended to the command-line and will be added to the job manager configuration for the service started by the globus-gatekeeper.

The new globus-gatekeeper will continue to run in the background until killed by invoking globus-personal-gatekeeper with the -kill or -killall argument. When killed, it will kill the globus-gatekeeper and globus-job-manager processes, remove state files and configuration data, and then exit. Jobs which are running when the personal gatekeeper is killed will continue to run, but their job directory will be destroyed so they may fail in the LRM.

The full set of command-line options to globus-personal-gatekeeper consists of:

-help, -usage
Print command-line option summary and exit.
-version
Print software version.
-versions
Print software version including DiRT information.
-list
Print a list of all currently running personal gatekeepers. These entries will be printed one per line.
-directory CONTACT
Print the configuration directory for the personal gatekeeper with the contact string CONTACT.
-debug
Print additional debugging information when starting a personal gatekeeper. This option is ignored in other modes.
-start
Start a new personal gatekeeper process.
-jmtype LRM
Use LRM as the local resource manager interface. If not provided when starting a personal gatekeeper, the job manager will use the default fork LRM.
-auditdir AUDIT_DIRECTORY
Write audit report files to AUDIT_DIRECTORY. If not provided, the job manager will not write any audit files.
-port PORT
Listen for gatekeeper TCP/IP connections on the port PORT. If not provided, the gatekeeper will let the operating system choose.
-log[=DIRECTORY]
Write job manager log files to DIRECTORY. If DIRECTORY is omitted, the default of $HOME will be used. If this option is not present, the job manager will not write any log files.
-seg
Try to use the SEG mechanism to receive job state change information instead of polling for it. This requires either the system administrator or the user to run an instance of the globus-job-manager-event-generator program for the LRM specified by the -jmtype option.
-acctfile ACCOUNTING_FILE
Write gatekeeper accounting entries to ACCOUNTING_FILE. If not provided, no accounting records are written.

Examples

This example shows the output when starting a new personal gatekeeper which will schedule jobs via the lsf LRM.

% globus-personal-gatekeeper -start -jmtype lsf

verifying setup...
done.
GRAM contact: personal-grid.example.org:57846:/DC=org/DC=example/CN=Joe User

This example shows the output when listing the current active personal gatekeepers.

% globus-personal-gatekeeper -list

personal-grid.example.org:57846:/DC=org/DC=example/CN=Joe User

This example shows the output when querying the configuration directory for the above personal gatekeeper.

% globus-personal-gatekeeper -directory "personal-grid.example.org:57846:/DC=org/DC=example/CN=Joe User"

/home/juser/.globus/.personal-gatekeeper.personal-grid.example.org.1337

This example shows the output when killing the above personal gatekeeper.

% globus-personal-gatekeeper -kill "personal-grid.example.org:57846:/DC=org/DC=example/CN=Joe User"

killing gatekeeper: "personal-grid.example.org:57846:/DC=org/DC=example/CN=Joe User"
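Because the contact string is printed with a fixed "GRAM contact: " prefix, a script can capture it for later use with -kill or with the job submission tools. The following is a minimal sketch; the function name extract_contact is hypothetical, and the commands in the usage comment use only options documented on this page.

```shell
# extract_contact
# Reads the output of "globus-personal-gatekeeper -start" on standard input
# and prints the contact string that follows the "GRAM contact: " prefix.
# Lines without the prefix (such as progress messages) are dropped.
extract_contact() {
    sed -n 's/^GRAM contact: //p'
}

# Example use:
#   contact=$(globus-personal-gatekeeper -start -jmtype fork | extract_contact)
#   globus-job-run "$contact" -l /bin/hostname
#   globus-personal-gatekeeper -kill "$contact"
```

Quoting "$contact" in the later commands is needed because the subject part of the contact string may contain spaces, as in the examples above.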

See Also

globusrun(1), globus-job-manager(8), globus-gatekeeper(8)

Name

globus-gram-audit — Load GRAM4 and GRAM5 audit records into a database

Synopsis

globus-gram-audit [--conf CONFIG_FILE] [--check] [--delete] [--audit-directory AUDITDIR]

Description

The globus-gram-audit program loads audit records to an SQL-based database. It reads $GLOBUS_LOCATION/etc/globus-job-manager.conf by default to determine the audit directory and then uploads all files in that directory that contain valid audit records to the database configured by the globus_gram_job_manager_auditing_setup_scripts package. If the upload completes successfully, the audit files will be removed.

The full set of command-line options to globus-gram-audit consists of:

--conf CONFIG_FILE

Use CONFIG_FILE instead of the default configuration file for audit database configuration.

--check

Check whether the insertion of a record was successful by querying the database after inserting the records. This is used in tests.

--delete

Delete audit records from the database right after inserting them. This is used in tests to avoid filling the database with test records.

--audit-directory AUDITDIR

Look for audit records in AUDITDIR, instead of looking in the directory specified in the job manager configuration. This is used in tests to control which records are loaded to the database and then deleted.

--query SQL

Perform the given SQL query on the audit database. This uses the database information from the configuration file to determine how to contact the database.

Files

The globus-gram-audit program uses the following files (paths relative to $GLOBUS_LOCATION):

etc/globus-gram-job-manager.conf

GRAM5 job manager configuration. It includes the default path to the audit directory.

etc/globus-gram-audit.conf

Audit configuration. It includes the information needed to contact the audit database.

Name

globus-gatekeeper — Authorize and execute a grid service on behalf of a user

Synopsis

globus-gatekeeper [-help]
[-conf PARAMETER_FILE]
[-test] [ -d | -debug ]
{ -inetd | -f }
[ -p PORT | -port PORT ]
[-home PATH] [ -l LOGFILE | -logfile LOGFILE ]
[-acctfile ACCTFILE]
[-e LIBEXECDIR]
[-launch_method { fork_and_exit | fork_and_wait | dont_fork } ]
[-grid_services SERVICEDIR]
[-globusid GLOBUSID]
[-gridmap GRIDMAP]
[-x509_cert_dir TRUSTED_CERT_DIR]
[-x509_cert_file TRUSTED_CERT_FILE]
[-x509_user_cert CERT_PATH]
[-x509_user_key KEY_PATH]
[-x509_user_proxy PROXY_PATH]
[-k]
[-globuskmap KMAP]

Description

The globus-gatekeeper program is a meta-server similar to inetd or xinetd that starts other services after authenticating the TCP connection using GSSAPI.

The most common use for the globus-gatekeeper program is to start instances of the globus-job-manager(8) service. A single globus-gatekeeper deployment can handle multiple different service configurations by having entries in the grid-services directory.

Typically, users interact with the globus-gatekeeper program via client applications such as globusrun(1), globus-job-submit, or tools such as CoG jglobus or Condor-G.

The full set of command-line options to globus-gatekeeper consists of:

-help
Display a help message to standard error and exit
-conf PARAMETER_FILE
Load configuration parameters from PARAMETER_FILE. The parameters in that file are treated as additional command-line options.
-test
Parse the configuration file, print out the POSIX user id of the globus-gatekeeper process, the service home directory, the service execution directory, and the X.509 subject name, and then exit.
-d, -debug
Run the globus-gatekeeper process in the foreground.
-inetd
Flag to indicate that the globus-gatekeeper process was started via inetd or a similar super-server. If this flag is set and the globus-gatekeeper was not started via inetd, a warning will be printed in the gatekeeper log.
-f
Flag to indicate that the globus-gatekeeper process should run in the foreground. This flag has no effect when the globus-gatekeeper is started via inetd.
-p PORT, -port PORT
Listen for connections on the TCP/IP port PORT. This option has no effect if the globus-gatekeeper is started via inetd or a similar service. If not specified and the gatekeeper is running as root, the default of 754 is used. Otherwise, the gatekeeper defaults to an ephemeral port.
-home PATH
Sets the gatekeeper deployment directory to PATH. This is used to interpret relative paths for accounting files, libexecdir, certificate paths, and also to set the GLOBUS_LOCATION environment variable in the service environment. If not specified, the gatekeeper uses its working directory.
-l LOGFILE, -logfile LOGFILE
Write status log entries to LOGFILE
-acctfile ACCTFILE
Set the path to write accounting records to ACCTFILE. If not set, no accounting records will be written.
-e LIBEXECDIR
Look for service executables in LIBEXECDIR. If not specified, the default of HOME/libexec is used.
-launch_method fork_and_exit|fork_and_wait|dont_fork
Determine how to launch services. The method may be either fork_and_exit (the service runs completely independently of the gatekeeper, which exits after creating the new service process), fork_and_wait (the service is run in a separate process from the gatekeeper but the gatekeeper does not exit until the service terminates), or dont_fork, where the gatekeeper process becomes the service process via the exec() system call.
-grid_services SERVICEDIR
Look for service descriptions in SERVICEDIR. If this is a relative path, it is interpreted relative to the HOME value. If this is not specified, the default of HOME/etc/grid-services is used.
-globusid GLOBUSID
Sets the GLOBUSID environment variable to GLOBUSID. This variable is used to construct the gatekeeper contact string if it cannot be parsed from the service credential.
-gridmap GRIDMAP
Use the file at GRIDMAP to map GSSAPI names to POSIX user names. If not specified, the default of HOME/etc/grid-mapfile is used.
-x509_cert_dir TRUSTED_CERT_DIR
Use the directory TRUSTED_CERT_DIR to locate trusted CA X.509 certificates. The gatekeeper sets the environment variable X509_CERT_DIR to this value.
-x509_cert_file TRUSTED_CERT_FILE
OBSOLETE GSI OPTION
-x509_user_cert CERT_PATH
Read the service X.509 certificate from CERT_PATH. The gatekeeper sets the X509_USER_CERT environment variable to this value.
-x509_user_key KEY_PATH
Read the private key for the service from KEY_PATH. The gatekeeper sets the X509_USER_KEY environment variable to this value.
-x509_user_proxy PROXY_PATH
Read the X.509 proxy certificate from PROXY_PATH. The gatekeeper sets the X509_USER_PROXY environment variable to this value.
-k
Assume authentication with Kerberos 5 GSSAPI instead of X.509 GSSAPI.
-globuskmap KMAP
Assume authentication with Kerberos 5 GSSAPI instead of X.509 GSSAPI and use KMAP as the path to the kerberos principal to POSIX user mapping file.
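
The defaults described in the option list above can be summarized in a short Python sketch. This is not gatekeeper code: the function name gatekeeper_defaults is hypothetical, and representing "an ephemeral port" as 0 is an assumption made for illustration.

```python
import posixpath


def gatekeeper_defaults(home, running_as_root, port=None, libexecdir=None,
                        grid_services=None, gridmap=None):
    """Resolve the documented defaults for a few gatekeeper options."""
    if port is None:
        # 754 when running as root; otherwise an ephemeral port (shown as 0).
        port = 754 if running_as_root else 0
    if grid_services is None:
        grid_services = posixpath.join("etc", "grid-services")
    return {
        "port": port,
        "libexecdir": libexecdir or posixpath.join(home, "libexec"),
        # join() leaves absolute paths alone and resolves relative ones
        # against HOME, matching the -grid_services description.
        "grid_services": posixpath.join(home, grid_services),
        "gridmap": gridmap or posixpath.join(home, "etc", "grid-mapfile"),
    }


if __name__ == "__main__":
    print(gatekeeper_defaults("/opt/globus", running_as_root=True))
```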

ENVIRONMENT

The following variables affect the execution of globus-gatekeeper:

X509_CERT_DIR
Directory containing X.509 trust anchors and signing policy files.
X509_USER_PROXY
Path to file containing an X.509 proxy.
X509_USER_CERT
Path to file containing an X.509 user certificate.
X509_USER_KEY
Path to file containing an X.509 user key.

Files

$GLOBUS_LOCATION/etc/globus-gatekeeper.conf
Default path to gatekeeper configuration file.
$GLOBUS_LOCATION/etc/grid-services/SERVICENAME
Service configuration for SERVICENAME.

See also

globusrun(1), globus-job-manager(8)

Name

globus-job-manager — Execute and monitor jobs

Synopsis

globus-job-manager {-type LRM} [-conf CONFIG_PATH] [-help] [-globus-host-manufacturer MANUFACTURER] [-globus-host-cputype CPUTYPE] [-globus-host-osname OSNAME] [-globus-host-osversion OSVERSION] [-globus-gatekeeper-host HOST] [-globus-gatekeeper-port PORT] [-globus-gatekeeper-subject SUBJECT] [-home GLOBUS_LOCATION] [-target-globus-location TARGET_GLOBUS_LOCATION] [-condor-arch ARCH] [-condor-os OS] [-history HISTORY_DIRECTORY] [-scratch-dir-base SCRATCH_DIRECTORY] [-enable-syslog] [-stdio-log LOG_DIRECTORY] [-log-levels LEVELS] [-state-file-dir STATE_DIRECTORY] [-globus-tcp-port-range PORT_RANGE] [-x509-cert-dir TRUSTED_CERTIFICATE_DIRECTORY] [-cache-location GASS_CACHE_DIRECTORY] [-k] [-extra-envvars VAR=VAL,...] [-seg-module SEG_MODULE] [-audit-directory AUDIT_DIRECTORY] [-globus-toolkit-version TOOLKIT_VERSION] [-disable-streaming] [-disable-usagestats] [-usagestats-targets TARGET] [-service-tag SERVICE_TAG]

Description

The globus-job-manager program is a service which starts and controls GRAM jobs which are executed by a local resource management system, such as LSF or Condor. The globus-job-manager program is typically started by the globus-gatekeeper program and not directly by a user. It runs until all jobs it is managing have terminated or its delegated credentials have expired.

Typically, users interact with the globus-job-manager program via client applications such as globusrun, globus-job-submit, or tools such as CoG jglobus or Condor-G.

The full set of command-line options to globus-job-manager consists of:

-help
Display a help message to standard error and exit
-type LRM
Execute jobs using the local resource manager named LRM.
-conf CONFIG_PATH
Read additional command-line arguments from the file CONFIG_PATH. If present, this must be the first command-line argument to the globus-job-manager program.
-globus-host-manufacturer MANUFACTURER
Indicate the manufacturer of the system which the jobs will execute on. This parameter sets the value of the $(GLOBUS_HOST_MANUFACTURER) RSL substitution to MANUFACTURER
-globus-host-cputype CPUTYPE
Indicate the CPU type of the system which the jobs will execute on. This parameter sets the value of the $(GLOBUS_HOST_CPUTYPE) RSL substitution to CPUTYPE
-globus-host-osname OSNAME
Indicate the operating system type of the system which the jobs will execute on. This parameter sets the value of the $(GLOBUS_HOST_OSNAME) RSL substitution to OSNAME
-globus-host-osversion OSVERSION
Indicate the operating system version of the system which the jobs will execute on. This parameter sets the value of the $(GLOBUS_HOST_OSVERSION) RSL substitution to OSVERSION
-globus-gatekeeper-host HOST
Indicate the host name of the machine which the job was submitted to. This parameter sets the value of the $(GLOBUS_GATEKEEPER_HOST) RSL substitution to HOST
-globus-gatekeeper-port PORT
Indicate the TCP port number of the gatekeeper to which jobs are submitted. This parameter sets the value of the $(GLOBUS_GATEKEEPER_PORT) RSL substitution to PORT
-globus-gatekeeper-subject SUBJECT
Indicate the X.509 identity of the gatekeeper to which jobs are submitted. This parameter sets the value of the $(GLOBUS_GATEKEEPER_SUBJECT) RSL substitution to SUBJECT
-home GLOBUS_LOCATION
Indicate the path where the Globus Toolkit(r) is installed on the service node. This is used by the job manager to locate its support and configuration files.
-target-globus-location TARGET_GLOBUS_LOCATION
Indicate the path where the Globus Toolkit(r) is installed on the execution host. If this is omitted, the value specified as a parameter to -home is used. This parameter sets the value of the $(GLOBUS_LOCATION) RSL substitution to TARGET_GLOBUS_LOCATION
-history HISTORY_DIRECTORY
Configure the job manager to write job history files to HISTORY_DIRECTORY. These files are described in the FILES section below.
-scratch-dir-base SCRATCH_DIRECTORY
Configure the job manager to use SCRATCH_DIRECTORY as the default scratch directory root if a relative path is specified in the job RSL's scratch_dir attribute.
-enable-syslog
Configure the job manager to write log messages via syslog. Logging is further controlled by the argument to the -log-levels parameter described below.
-stdio-log LOG_DIRECTORY
Configure the job manager to write log messages to files in the LOG_DIRECTORY directory. Files will be named LOG_DIRECTORY/gram_YYYYMMDD.log. Logging is further controlled by the argument to the -log-levels parameter described below. The LOG_DIRECTORY value can include variables derived from the job manager environment using the same syntax as RSL substitutions. For example, -stdio-log $(HOME) would cause each user's logs to be stored in their individual home directories.
-log-levels LEVELS
Configure the job manager to write log messages of certain levels to syslog and/or log files. The available log levels are FATAL, ERROR, WARN, INFO, DEBUG, and TRACE. Multiple values can be combined with the | character. The default value of logging when enabled is FATAL|ERROR.
-state-file-dir STATE_DIRECTORY
Configure the job manager to write state files to STATE_DIRECTORY. If not specified, the job manager uses the default of $GLOBUS_LOCATION/tmp/gram_job_state/. This directory must be writable by all users and be on a file system which supports POSIX advisory file locks.
-globus-tcp-port-range PORT_RANGE
Configure the job manager to restrict its TCP/IP communication to use ports in the range described by PORT_RANGE. This value is also made available in the job environment via the GLOBUS_TCP_PORT_RANGE environment variable.
-x509-cert-dir TRUSTED_CERTIFICATE_DIRECTORY
Configure the job manager to search TRUSTED_CERTIFICATE_DIRECTORY for its list of trusted CA certificates and their signing policies. This value is also made available in the job environment via the X509_CERT_DIR environment variable.
-cache-location GASS_CACHE_DIRECTORY
Configure the job manager to use the path GASS_CACHE_DIRECTORY for its temporary GASS-cache files. This value is also made available in the job environment via the GLOBUS_GASS_CACHE_DEFAULT environment variable.
-k
Configure the job manager to assume it is using Kerberos for authentication instead of X.509 certificates. This disables some certificate-specific processing in the job manager.
-extra-envvars VAR=VAL,...
Configure the job manager to define a set of environment variables in the job environment beyond those defined in the base job environment. The format of the parameter to this argument is a comma-separated sequence of VAR=VAL pairs, where VAR is the variable name and VAL is the variable's value.
-seg-module SEG_MODULE
Configure the job manager to use the scheduler event generator module named by SEG_MODULE to detect job state change events from the local resource manager, in place of the less efficient polling operations used in GT2. To use this, one instance of the globus-job-manager-event-generator must be running to process events for the LRM into a generic format that the job manager can parse.
-audit-directory AUDIT_DIRECTORY
Configure the job manager to write audit records to the directory named by AUDIT_DIRECTORY. These records can be loaded into a database using the globus-gram-audit program.
-globus-toolkit-version TOOLKIT_VERSION
Configure the job manager to use TOOLKIT_VERSION as the version for audit and usage stats records.
-service-tag SERVICE_TAG
Configure the job manager to use SERVICE_TAG as a unique identifier to allow multiple GRAM instances to use the same job state directories without interfering with each other's jobs. If not set, the value untagged will be used.
-disable-streaming
Configure the job manager to disable file streaming. This is propagated to the LRM script interface but has no effect in GRAM5.
-disable-usagestats
Disable sending of any usage stats data, even if -usagestats-targets is present in the configuration.
-usagestats-targets TARGET
Send usage packets to a data collection service for analysis. The TARGET string consists of a comma-separated list of HOST:PORT combinations, each containing an optional list of data to send. See Usage Stats Packets for more information about the tags. Special tag strings of all (which enables all tags) and default may be used, or a sequence of characters for the various tags. If this option is not present in the configuration, then the default of usage-stats.globus.org:4810 is used.
-condor-arch ARCH
Set the architecture specification for condor jobs to be ARCH in job classified ads generated by the GRAM5 condor LRM script. This is required for the condor LRM but ignored for all others.
-condor-os OS
Set the operating system specification for condor jobs to be OS in job classified ads generated by the GRAM5 condor LRM script. This is required for the condor LRM but ignored for all others.
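
To make the argument formats of -log-levels and -extra-envvars concrete, here is a minimal Python sketch that parses both. The function names are hypothetical, and the strictness shown (rejecting unknown level names and entries without an = sign) is an assumption; the job manager itself may behave differently.

```python
LOG_LEVELS = ("FATAL", "ERROR", "WARN", "INFO", "DEBUG", "TRACE")


def parse_log_levels(value="FATAL|ERROR"):
    """Split a -log-levels argument into a set of level names.

    The default mirrors the documented default of FATAL|ERROR.
    """
    levels = {part.strip() for part in value.split("|") if part.strip()}
    unknown = levels - set(LOG_LEVELS)
    if unknown:
        raise ValueError("unknown log level(s): " + ", ".join(sorted(unknown)))
    return levels


def parse_extra_envvars(value):
    """Split a -extra-envvars argument (VAR=VAL,...) into a dict."""
    env = {}
    for pair in value.split(","):
        if not pair:
            continue
        # Split on the first '=' only, so values may contain '='.
        name, sep, val = pair.partition("=")
        if not sep:
            raise ValueError("expected VAR=VAL, got %r" % pair)
        env[name] = val
    return env


if __name__ == "__main__":
    print(sorted(parse_log_levels("FATAL|ERROR|WARN")))
    print(parse_extra_envvars("LD_LIBRARY_PATH=/opt/lib,MODE=batch"))
```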

Environment

The following variables affect the execution of globus-job-manager:

HOME
User's home directory.
LOGNAME
User's name.
JOBMANAGER_SYSLOG_ID
String to prepend to syslog audit messages.
JOBMANAGER_SYSLOG_FAC
Facility to log syslog audit messages as.
JOBMANAGER_SYSLOG_LVL
Priority level to use for syslog audit messages.
GATEKEEPER_JM_ID
Job manager ID to be used in syslog audit records.
GATEKEEPER_PEER
Peer information to be used in syslog audit records
GLOBUS_ID
Credential information to be used in syslog audit records
GLOBUS_JOB_MANAGER_SLEEP
Time (in seconds) to sleep when the job manager is started. [For debugging purposes only]
GRID_SECURITY_HTTP_BODY_FD
File descriptor of an open file which contains the initial job request and to which the initial job reply should be sent. This file descriptor is inherited from the globus-gatekeeper.
X509_USER_PROXY
Path to the X.509 user proxy which was delegated by the client to the globus-gatekeeper program to be used by the job manager.
GRID_SECURITY_CONTEXT_FD
File descriptor containing an exported security context that the job manager should use to reply to the client which submitted the job.
GLOBUS_USAGE_TARGETS
Default list of usagestats services to send usage packets to.

Files

$HOME/.globus/job/HOSTNAME/LRM.TAG.cred
Job manager delegated user credential.
$HOME/.globus/job/HOSTNAME/LRM.TAG.lock
Job manager state lock file.
$HOME/.globus/job/HOSTNAME/LRM.TAG.pid
Job manager pid file.
$HOME/.globus/job/HOSTNAME/LRM.TAG.sock
Job manager socket for inter-job manager communications.
$HOME/.globus/job/HOSTNAME/JOB_ID/
Job-specific state directory.
$HOME/.globus/job/HOSTNAME/JOB_ID/stdin
Standard input which has been staged from a remote URL.
$HOME/.globus/job/HOSTNAME/JOB_ID/stdout
Standard output which will be staged to a remote URL.
$HOME/.globus/job/HOSTNAME/JOB_ID/stderr
Standard error which will be staged to a remote URL.
$HOME/.globus/job/HOSTNAME/JOB_ID/x509_user_proxy
Job-specific delegated credential.
$GLOBUS_LOCATION/tmp/gram_job_state/job.HOSTNAME.JOB_ID
Job state file.
$GLOBUS_LOCATION/tmp/gram_job_state/job.HOSTNAME.JOB_ID.lock
Job state lock file. In most cases this will be a symlink to the job manager lock file.
$GLOBUS_LOCATION/etc/globus-job-manager.conf
Default location of the global job manager configuration file.
$GLOBUS_LOCATION/etc/grid-services/jobmanager-LRM
Default location of the LRM-specific gatekeeper configuration file.
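
The per-user path layout listed above can be expressed as a small Python sketch. The function names are hypothetical, and treating TAG as the -service-tag value (with its documented default of untagged) is an assumption.

```python
import posixpath


def job_manager_paths(home, hostname, lrm, tag="untagged"):
    """Paths of the per-LRM job manager lock, pid, and socket files."""
    base = posixpath.join(home, ".globus", "job", hostname)
    prefix = "%s.%s" % (lrm, tag)
    return {ext: posixpath.join(base, prefix + "." + ext)
            for ext in ("lock", "pid", "sock")}


def job_paths(home, hostname, job_id):
    """Paths inside the job-specific state directory."""
    job_dir = posixpath.join(home, ".globus", "job", hostname, job_id)
    return {name: posixpath.join(job_dir, name)
            for name in ("stdin", "stdout", "stderr", "x509_user_proxy")}


if __name__ == "__main__":
    # Host name, LRM name, and job id below are made-up sample values.
    print(job_manager_paths("/home/alice", "grid.example.org", "pbs"))
    print(job_paths("/home/alice", "grid.example.org", "1234"))
```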

See Also

globusrun(1), globus-gatekeeper(8), globus-personal-gatekeeper(1), globus-gram-audit(8)

Name

globus-job-manager-event-generator — Create LRM-independent SEG files for the job manager to use

Synopsis

globus-job-manager-event-generator [-help] {-scheduler LRM} [-background] [-pidfile PIDPATH]

Description

The globus-job-manager-event-generator program is a utility which uses LRM-specific SEG parsers to generate an LRM-independent log file that a job manager instance can use to process job status change events. This program runs independently of all globus-job-manager instances so that only one process needs to deal with the LRM interface. The globus-job-manager-event-generator program can be run as a privileged user if required to interface with the LRM.

The full set of command-line options to globus-job-manager-event-generator consists of:

-help
Print command-line option summary and exit.
-scheduler LRM
Process events for the local resource manager named by LRM.
-background
Run globus-job-manager-event-generator as a background process. It will fork a new process, print out its process ID and then the original process will terminate.
-pidfile PIDPATH
Write the process ID of an instance of globus-job-manager-event-generator to the file named by PIDPATH. This file can be used to kill or monitor the globus-job-manager-event-generator process.

Files

globus-job-manager-seg.conf
Configuration file for globus-job-manager-event-generator. Each line consists of a string of the form LRM_log_path=PATH, which indicates the directory containing LRM-independent format SEG log files for the LRM. This file is created by running the globus_scheduler_event_generator_job_manager_setup setup package.
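
As an illustration of the LRM_log_path=PATH line format, the following Python sketch reads such a configuration into a dictionary keyed by LRM name. The function name parse_seg_conf and the sample paths are hypothetical, and skipping blank or non-matching lines is an assumption about how such a file might be handled.

```python
def parse_seg_conf(text):
    """Map each LRM name to its log directory from LRM_log_path=PATH lines."""
    suffix = "_log_path"
    paths = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, path = line.partition("=")
        # Keep only keys of the form <LRM>_log_path with a non-empty LRM name.
        if sep and key.endswith(suffix) and len(key) > len(suffix):
            paths[key[:-len(suffix)]] = path
    return paths


if __name__ == "__main__":
    sample = "pbs_log_path=/var/globus/seg/pbs\ncondor_log_path=/var/globus/seg/condor\n"
    print(parse_seg_conf(sample))
```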

See Also

globus-scheduler-event-generator(8), globus-job-manager(8)

Name

globus-fork-starter — Start and monitor a fork job

Synopsis

globus-fork-starter

Description

The globus-fork-starter program executes jobs specified on its standard input stream, recording the job state changes to a file defined in the $GLOBUS_LOCATION/etc/globus-fork.conf configuration file. It runs until its standard input stream is closed and all jobs it is managing have terminated. The log generated by this program can be used by the SEG to provide job state changes and exit codes to the GRAM service. The globus-fork-starter program is typically started by the fork GRAM module.

The globus-fork-starter program expects its input to be a series of task definitions, separated by the newline character, each representing a separate job. Each task definition contains a number of fields, separated by the semicolon character. The first field is always the literal string 100, indicating the message format; the second field is a unique job tag that distinguishes the reply from this program when multiple jobs are submitted. The rest of the fields contain attribute bindings. The supported attributes are:

directory
Working directory of the job
environment
Comma-separated list of strings defining environment variables. The form of these strings is var=value
count
Number of processes to start
executable
Full path to the executable to run
arguments
Comma-separated list of command-line arguments for the job
stdin
Full path to a file containing the input of the job
stdout
Full path to a file to write the output of the job to
stderr
Full path to a file to write the error stream of the job

Within each field, the following characters may be escaped by preceding them with the backslash character:

  • backslash (\)
  • semicolon (;)
  • comma (,)
  • equal (=)

Additionally, newline can be represented within a field by using the escape sequence \n.

For each job the globus-fork-starter processes, it replies by writing a single line to standard output. The replies again consist of a number of fields separated by the semicolon character.

For a successful job start, the first field of the reply is the literal 101, the second field is the tag from the input, and the third field is a comma-separated list of SEG job identifiers, each of which consists of the concatenation of a UUID and a process id. The globus-fork-starter program will write state changes to the SEG log using these job identifiers.

For a failure, the first field of the reply is the literal 102, the second field is the tag from the input, the third field is the integer representation of a GRAM error code, and the fourth field is a string explaining the error.
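
The message format described above can be sketched in Python as follows. This is an illustration, not the program's own code. It assumes that task fields, like reply fields, are separated by semicolons, and that each attribute binding is written as name=value; all function names and the sample job identifiers are hypothetical.

```python
def escape_field(text):
    """Backslash-escape the characters listed above; newline becomes \\n."""
    out = []
    for ch in text:
        if ch == "\n":
            out.append("\\n")
        elif ch in "\\;,=":
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)


def unescape_field(text):
    """Reverse escape_field."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            out.append("\n" if nxt == "n" else nxt)
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def split_unescaped(line, sep):
    """Split on sep, skipping separators that are backslash-escaped."""
    fields, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            current.append(line[i:i + 2])  # keep the escape pair intact
            i += 2
            continue
        if ch == sep:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


def build_task(tag, attributes):
    """Build one task line; list values become comma-separated lists."""
    fields = ["100", escape_field(tag)]
    for name, value in attributes.items():
        if isinstance(value, (list, tuple)):
            encoded = ",".join(escape_field(item) for item in value)
        else:
            encoded = escape_field(value)
        fields.append(name + "=" + encoded)  # assumed name=value binding form
    return ";".join(fields) + "\n"


def parse_reply(line):
    """Parse a 101 (started) or 102 (failed) reply line."""
    fields = split_unescaped(line.rstrip("\n"), ";")
    code, tag = fields[0], unescape_field(fields[1])
    if code == "101":
        ids = [unescape_field(j) for j in split_unescaped(fields[2], ",")]
        return {"ok": True, "tag": tag, "job_ids": ids}
    if code == "102":
        return {"ok": False, "tag": tag, "error_code": int(fields[2]),
                "message": unescape_field(fields[3])}
    raise ValueError("unexpected reply code: " + code)
```

A caller would write the result of build_task to the program's standard input and pass each line read from its standard output to parse_reply.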

ENVIRONMENT

The following variables affect the execution of globus-fork-starter:

GLOBUS_LOCATION
Path to Globus Toolkit installation. This is used to locate the globus-fork.conf configuration file.

Files

$GLOBUS_LOCATION/etc/globus-fork.conf
Path to fork SEG configuration file.

Chapter 2. Troubleshooting

For a list of error codes generated by GRAM5, see Section 2, “Errors”.

For information about sys admin logging, see Chapter 9, Admin Debugging in the GRAM5 Admin Guide.

1. Troubleshooting tips

If you run into problems, you can do the following:

  • Check the GRAM5 documentation. Maybe you'll find hints here to solve your problem.
  • Check the GRAM5 log for errors.

    In case you don't find anything suspicious you can increase the log-level of GRAM5 or other relevant components. Maybe the additional logging-information will tell you what's going wrong.

  • Send e-mail to the appropriate Globus mailing list. You must subscribe to a list before you can send e-mail to it. Subscription instructions for the general e-mail lists and for the GRAM-specific lists are available on the Globus website.

2. Errors

Table 2.1. GRAM5 Errors

Error CodeReasonPossible Solutions
1one of the RSL parameters is not supportedCheck RSL documentation
2the RSL length is greater than the maximum allowedUse RSL substitutions to reduce length of RSL strings
3an I/O operation failedEnable trace logging and report to gram-dev@globus.org
4jobmanager unable to set default to the directory requestedCheck that RSL directory attribute refers to a directory that exists on the target system.
5the executable does not existCheck that the RSL executable attribute refers to an executable that exists on the target system.
6of an unused INSUFFICIENT_FUNDSUnimplemented feature.
7authentication with the remote server failedCheck that the contact string contains the proper X.509 DN.
8the user cancelled the jobDon't cancel jobs you want to complete.
9the system cancelled the jobCheck RSL requirements such as maximum time and memory are valid for the job.
10data transfer to the server failedCheck gatekeeper and/or job manager logs to see why the process failed.
11the stdin file does not existCheck that the RSL stdin attribute refers to a file that exists on the target system or has a valid ftp, gsiftp, http, or https URL.
12the connection to the server failed (check host and port)Check that the service is running on the expected TCP/IP port. Check that no firewall prevents contacting that TCP/IP port. Check $GLOBUS_LOCATION/var/globus-gatekeeper.log for runtme configuration errors.
13the provided RSL 'maxtime' value is not an integerCheck that the RSL maxtime value evaluates to an integer.
14the provided RSL 'count' value is not an integerCheck that the RSL count value evaluates to an integer.
15the job manager received an invalid RSLCheck that the RSL string can be parsed by using globusrun -p RSL.
16the job manager failed in allowing others to make contactCheck job manager log.
17the job failed when the job manager attempted to run itVerify that the LRM is configured properly.
18an invalid paradyn was specifiedOBSOLETE IN GRAM2
19the provided RSL 'jobtype' value is invalidThe RSL jobtype attribute is not indicated as supported by the LRM. Valid jobtype values are single, multiple, mpi, and condor.
20the provided RSL 'myjob' value is invalidOBSOLETE IN GRAM5
21the job manager failed to locate an internal script argument fileCheck that $GLOBUS_LOCATION/libexec/globus-job-manager-script.pl exists and is executable. Check that the LRM-specific perl module is located in $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/ directory and is valid. The command perl -I$GLOBUS_LOCATION/lib/perl $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/LRM.pm can be used to check if there are any syntax errors in the script.
22the job manager failed to create an internal script argument fileCheck that your home directory is writable and not full.
23the job manager detected an invalid job stateCheck job manager logs.
24the job manager detected an invalid script responseCheck job manager logs. This is likely a bug in the LRM script.
25the job manager detected an invalid script statusCheck job manager logs. This is likely a bug in the LRM script.
26the provided RSL 'jobtype' value is not supported by this job managerCheck that the RSL jobtype attribute is implemented by the LRM script. Note that some job types require configuration
27unused ERROR_UNIMPLEMENTEDLRM does not support some feature included in the job request.
28the job manager failed to create an internal script submission fileCheck that the user's home file system is not full. Check job manager log
29the job manager cannot find the user proxyCheck that client is delegating a proxy when authenticating with the gatekeeper. Check that the user's home filesystem and the /tmp file system are not full.
30the job manager failed to open the user proxyCheck that the user's home filesystem and the /tmp file system are not full.
31the job manager failed to cancel the job as requestedCheck that the user's home filesystem and the /tmp file system are not full.
32system memory allocation failedCheck job manager log for details.
33the interprocess job communication initialization failedOBSOLETE IN GRAM5
34the interprocess job communication setup failedOBSOLETE IN GRAM5
35the provided RSL 'host count' value is invalidCheck that the RSL host_count attribute evaluates to an integer.
36one of the provided RSL parameters is unsupportedCheck job manager log for details about invalid parameter.
37the provided RSL 'queue' parameter is invalidCheck that the RSL queue attribute evaluates to a string that corresponds to an LRM-specific queue name.
38the provided RSL 'project' parameter is invalidCheck that the RSL project attribute evaluates to a string that corresponds to an LRM-specific project name.
39the provided RSL string includes variables that could not be identifiedCheck that all RSL substitutions are defined before being used in the job description.
40the provided RSL 'environment' parameter is invalidCheck that the RSL environment attribute contains a sequence of VARIABLE VALUE pairs.
41the provided RSL 'dryrun' parameter is invalidRemove the RSL dryrun attribute from the job description.
42the provided RSL is invalid (an empty string)Include a non-empty RSL string in your job submission request.
43the job manager failed to stage the executableCheck that the file service hosting the executable is reachable from the GRAM5 service node. Check that the executable exists on the file service node. Check that there is sufficient disk space in the user's home directory on the service node to store the executable.
44the job manager failed to stage the stdin fileCheck that the file service hosting the standard input file is reachable from the GRAM5 service node. Check that the standard input file exists on the file service node. Check that there is sufficient disk space in the user's home directory on the service node to store the standard input file.
45the requested job manager type is invalidOBSOLETE IN GRAM5
46the provided RSL 'arguments' parameter is invalidOBSOLETE IN GRAM2
47the gatekeeper failed to run the job managerCheck the gatekeeper or job manager logs for more information.
48the provided RSL could not be properly parsedCheck that the RSL string can be parsed by using globusrun -p RSL.
49there is a version mismatch between GRAM componentsAsk system administrator to upgrade GRAM service to GRAM2 or GRAM5
50the provided RSL 'arguments' parameter is invalidCheck that the RSL arguments attribute evaluates to a sequence of strings.
51the provided RSL 'count' parameter is invalidCheck that the RSL count attribute evaluates to a positive integer value.
52the provided RSL 'directory' parameter is invalidCheck that the RSL directory attribute evaluates to a string.
53the provided RSL 'dryrun' parameter is invalidCheck that the RSL dryrun attribute evaluates to either yes or no.
54the provided RSL 'environment' parameter is invalidCheck that the RSL environment attribute evaluates to a sequence of VARIABLE, VALUE pairs.
55the provided RSL 'executable' parameter is invalidCheck that the RSL executable attribute evaluates to a string value.
56the provided RSL 'host_count' parameter is invalidCheck that the RSL host_count attribute evaluates to a positive integer value.
57the provided RSL 'jobtype' parameter is invalidCheck that the RSL jobtype attribute evaluates to one of single, multiple, mpi, or condor
58the provided RSL 'maxtime' parameter is invalidCheck that the RSL maxtime attribute evaluates to a positive integer value.
59the provided RSL 'myjob' parameter is invalidOBSOLETE IN GRAM5.
60the provided RSL 'paradyn' parameter is invalidOBSOLETE IN GRAM2.
61the provided RSL 'project' parameter is invalidCheck that the RSL project attribute evaluates to a string value.
62the provided RSL 'queue' parameter is invalidCheck that the RSL queue attribute evaluates to a string value.
63the provided RSL 'stderr' parameter is invalidCheck that the RSL stderr attribute evaluates to a string value or a sequence of DESTINATION URLs with optional CACHE_TAG string parameters.
64the provided RSL 'stdin' parameter is invalidCheck that the RSL stdin attribute evaluates to a string value.
65the provided RSL 'stdout' parameter is invalidCheck that the RSL stdout attribute evaluates to a string value or a sequence of DESTINATION URLs with optional CACHE_TAG string parameters.
66the job manager failed to locate an internal scriptCheck job manager log for more details.
67the job manager failed on the system call pipe()OBSOLETE IN GRAM5
68the job manager failed on the system call fcntl()OBSOLETE IN GRAM2
69the job manager failed to create the temporary stdout filenameOBSOLETE IN GRAM5
70the job manager failed to create the temporary stderr filenameOBSOLETE IN GRAM5
71the job manager failed on the system call fork()OBSOLETE IN GRAM2
72the executable file permissions do not allow executionCheck that the RSL executable attribute refers to an executable program or script.
73the job manager failed to open stdoutCheck that the RSL stdout attribute refers to one or more valid destination files or URLs.
74the job manager failed to open stderrCheck that the RSL stderr attribute refers to one or more valid destination files or URLs.
75the cache file could not be opened in order to relocate the user proxyCheck that the user's home directory is writable and not full on the GRAM5 service node.
76cannot access cache files in ~/.globus/.gass_cache, check permissions, quota, and disk spaceCheck that the user's home directory is writable and not full on the GRAM5 service node.
77the job manager failed to insert the contact in the client contact listCheck job manager log
78the contact was not found in the job manager's client contact listDon't attempt to unregister callback contacts that are not registered
79connecting to the job manager failed. Possible reasons: job terminated, invalid job contact, network problems, ...Check that the job manager process is running. Check that the job manager credential has not expired. Check that the job manager contact refers to the correct TCP/IP host and port. Check that the job manager contact is not blocked by a firewall.
80the syntax of the job contact is invalidCheck the syntax of the job contact string.
81the executable parameter in the RSL is undefinedInclude the RSL executable attribute in all job requests.
82the job manager service is misconfigured. condor arch undefinedAdd the -condor-arch option to the command line or configuration file for a job manager configured to use the condor LRM.
83the job manager service is misconfigured. condor os undefinedAdd the -condor-os option to the command line or configuration file for a job manager configured to use the condor LRM.
84the provided RSL 'min_memory' parameter is invalidCheck that the RSL min_memory attribute evaluates to a positive integer value.
85the provided RSL 'max_memory' parameter is invalidCheck that the RSL max_memory attribute evaluates to a positive integer value.
86the RSL 'min_memory' value is not zero or greaterCheck that the RSL min_memory attribute evaluates to a positive integer value.
87the RSL 'max_memory' value is not zero or greaterCheck that the RSL max_memory attribute evaluates to a positive integer value.
88the creation of a HTTP message failedCheck job manager log.
89parsing incoming HTTP message failedCheck job manager log.
90the packing of information into a HTTP message failedCheck job manager log.
91an incoming HTTP message did not contain the expected informationCheck job manager log.
92the job manager does not support the service that the client requestedCheck that the client is talking to the correct service.
93the gatekeeper failed to find the requested serviceOBSOLETE IN GRAM2
94the jobmanager does not accept any new requests (shutting down)Execute queries before the job has been cleaned up.
95the client failed to close the listener associated with the callback URLCall globus_gram_client_callback_disallow() with a valid callback contact.
96the gatekeeper contact cannot be parsedCheck the syntax of the gatekeeper contact string you are attempting to contact.
97the job manager could not find the 'poe' commandOBSOLETE IN GRAM2
98the job manager could not find the 'mpirun' commandConfigure the LRM script with mpirun in your path.
99the provided RSL 'start_time' parameter is invalidOBSOLETE IN GRAM2
100the provided RSL 'reservation_handle' parameter is invalidOBSOLETE IN GRAM2
101the provided RSL 'max_wall_time' parameter is invalidCheck that the RSL max_wall_time attribute evaluates to a positive integer.
102the RSL 'max_wall_time' value is not zero or greaterCheck that the RSL max_wall_time attribute evaluates to a positive integer.
103the provided RSL 'max_cpu_time' parameter is invalidCheck that the RSL max_cpu_time attribute evaluates to a positive integer.
104the RSL 'max_cpu_time' value is not zero or greaterCheck that the RSL max_cpu_time attribute evaluates to a positive integer.
105the job manager is misconfigured, a scheduler script is missingCheck that the administrator has configured the LRM by running its setup script.
106the job manager is misconfigured, a scheduler script has invalid permissionsCheck that the administrator has installed the $GLOBUS_LOCATION/libexec/globus-job-manager-script.pl script. Check that the file system containing that script allows file execution.
107the job manager failed to signal the jobOBSOLETE IN GRAM2
108the job manager did not recognize/support the signal typeCheck that your signal operation is using the correct signal constant.
109the job manager failed to get the job id from the local schedulerOBSOLETE IN GRAM2
110the job manager is waiting for a commit signalSend a two-phase commit signal to the job manager to acknowledge receiving the job contact from the job manager.
111the job manager timed out while waiting for a commit signalSend a two-phase commit signal to the job manager to acknowledge receiving the job contact from the job manager. Increase the two-phase commit time out for your job. Check that the job manager contact TCP/IP port is reachable from your client.
112the provided RSL 'save_state' parameter is invalidCheck that the RSL save_state attribute is set to yes or no.
113the provided RSL 'restart' parameter is invalidCheck that the RSL restart attribute evaluates to a string containing a job contact string.
114the provided RSL 'two_phase' parameter is invalidCheck that the RSL two_phase attribute evaluates to a positive integer.
115the RSL 'two_phase' value is not zero or greaterCheck that the RSL two_phase attribute evaluates to a positive integer.
116the provided RSL 'stdout_position' parameter is invalidOBSOLETE IN GRAM5
117the RSL 'stdout_position' value is not zero or greaterOBSOLETE IN GRAM5
118the provided RSL 'stderr_position' parameter is invalidOBSOLETE IN GRAM5
119the RSL 'stderr_position' value is not zero or greaterOBSOLETE IN GRAM5
120the job manager restart attempt failedOBSOLETE IN GRAM2
121the job state file doesn't existCheck that the job contact you are trying to restart matches one that the job manager returned to you.
122could not read the job state fileCheck that the state file directory is not full.
123could not write the job state fileCheck that the state file directory is not full.
124old job manager is still aliveContact the returned job manager contact to manage the job you are trying to restart.
125job manager state file TTL expiredOBSOLETE IN GRAM2
126it is unknown if the job was submittedCheck job manager log.
127the provided RSL 'remote_io_url' parameter is invalidCheck that the RSL remote_io_url attribute evaluates to a string value.
128could not write the remote io url fileCheck that the user's home file system on the job manager service node is writable and not full.
129the standard output/error size is differentSend a stdio update signal to redirect the job manager output to a new URL.
130the job manager was sent a stop signal (job is still running)Submit a restart request to monitor the job.
131the user proxy expired (job is still running)Generate a new proxy and then submit a restart request to monitor the job.
132the job was not submitted by original jobmanagerOBSOLETE IN GRAM2
133the job manager is not waiting for that commit signalDo not send a commit signal to a job that is not waiting for a commit signal.
134the provided RSL scheduler specific parameter is invalidCheck the LRM-specific documentation to determine what values are legal for the RSL extensions implemented by the LRM.
135the job manager could not stage in a fileCheck that the file service hosting the file to stage is reachable from the GRAM5 service node. Check that the file to stage exists on the file service node. Check that there is sufficient disk space in the user's home directory on the service node to store the file to stage.
136the scratch directory could not be createdCheck that the directory named by the RSL scratch_dir attribute exists and is writable. Check that the directory named by the RSL scratch_dir attribute is not full.
137the provided 'gass_cache' parameter is invalidCheck that the RSL gass_cache attribute evaluates to a string.
138the RSL contains attributes which are not valid for job submissionDo not use restart- or signal-only RSL attributes when submitting a job.
139the RSL contains attributes which are not valid for stdio updateDo not use submit- or restart-only RSL attributes when sending a stdio update signal to a job.
140the RSL contains attributes which are not valid for job restartDo not use submit- or signal-only RSL attributes when restarting a job.
141the provided RSL 'file_stage_in' parameter is invalidCheck that the RSL file_stage_in attribute evaluates to a sequence of SOURCE DESTINATION pairs.
142the provided RSL 'file_stage_in_shared' parameter is invalidCheck that the RSL file_stage_in_shared attribute evaluates to a sequence of SOURCE DESTINATION pairs.
143the provided RSL 'file_stage_out' parameter is invalidCheck that the RSL file_stage_out attribute evaluates to a sequence of SOURCE DESTINATION pairs.
144the provided RSL 'gass_cache' parameter is invalidCheck that the RSL gass_cache attribute evaluates to a string.
145the provided RSL 'file_cleanup' parameter is invalidCheck that the RSL file_clean_up attribute evaluates to a sequence of strings.
146the provided RSL 'scratch_dir' parameter is invalidCheck that the RSL scratch_dir attribute evaluates to a string.
147the provided scheduler-specific RSL parameter is invalidCheck the LRM-specific documentation to determine what values are legal for the RSL extensions implemented by the LRM.
148a required RSL attribute was not defined in the RSL specCheck that the RSL executable attribute is present in your job request RSL. Check that the RSL restart attribute is present in your restart RSL.
149the gass_cache attribute points to an invalid cache directoryCheck that the RSL gass_cache attribute evaluates to a directory that exists or can be created. Check that the user's home file system is writable and not full.
150the provided RSL 'save_state' parameter has an invalid valueCheck that the RSL save_state attribute has a value of yes or no.
151the job manager could not open the RSL attribute validation fileCheck that $GLOBUS_LOCATION/share/globus_gram_job_manager/globus-gram-job-manager.rvf is present and readable on the job manager service node. Check that $GLOBUS_LOCATION/share/globus_gram_job_manager/LRM.rvf is readable on the job manager service node if present.
152the job manager could not read the RSL attribute validation fileCheck that $GLOBUS_LOCATION/share/globus_gram_job_manager/globus-gram-job-manager.rvf is valid. Check that $GLOBUS_LOCATION/share/globus_gram_job_manager/LRM.rvf is valid if present.
153the provided RSL 'proxy_timeout' is invalidCheck that RSL proxy_timeout attribute evaluates to a positive integer.
154the RSL 'proxy_timeout' value is not greater than zeroCheck that RSL proxy_timeout attribute evaluates to a positive integer.
155the job manager could not stage out a fileCheck that the source file being staged exists on the job manager service node. Check that the directory of the destination file being staged exists on the file service node. Check that the directory of the destination file being staged is writable by the user. Check that the destination file service is reachable by the job manager service node.
156the job contact string does not match any which the job manager is handlingCheck that the job contact string matches one returned from a job request.
157proxy delegation failedCheck that the job manager service node trusts the signer of your credential. Check that you trust the signer of the job manager service node's credential.
158the job manager could not lock the state lock fileCheck that the file system holding the job state directory supports POSIX advisory locking. Check that the job state directory is writable by the user on the service node. Check that the job state directory is not full.
159an invalid globus_io_clientattr_t was used.Check that you have initialized the globus_io_clientattr_t attribute prior to using it with the GRAM client API.
160an null parameter was passed to the gram libraryCheck that you are passing legal values to all GRAM API calls.
161the job manager is still streaming outputOBSOLETE IN GRAM5
162the authorization system denied the requestAsk your GRAM system administrator to authorize your certificate.
163the authorization system reported a failureCheck with your system administrator to verify that the authorization system is configured properly.
164the authorization system denied the request - invalid job idCheck with your system administrator to verify that the authorization system is configured properly. Use a credential which is authorized to interact with a particular GRAM job.
165the authorization system denied the request - not authorized to run the specified executableCheck with your system administrator to verify that the authorization system is configured properly. Use a credential which is authorized to interact with a particular GRAM job.
166the provided RSL 'user_name' parameter is invalid.Check that the RSL user_name attribute evaluates to a string.
167the job is not running in the account named by the 'user_name' parameter.Ask the GRAM system administrator to add an authorization entry to allow your credential to run jobs as the specified user account.
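
Several of the errors above come from RSL attribute values of the wrong type, so it can help to check a job description on the client before submitting it. The following is a minimal sketch of such a check, assuming the job description is already available as a Python dictionary of attribute names to string values. The function names and the CHECKS table are hypothetical and are not part of GRAM5; the rules only follow the advice given in the table for error codes 81, 101, 103, 112, 114 and 153, and the job manager's own validation may differ.

```python
# Client-side pre-check of a few RSL attribute values.
# Illustration only: this is NOT the job manager's validation code.


def is_positive_int(value):
    """True if value is an ASCII digit string representing an integer > 0."""
    return (
        isinstance(value, str)
        and value.isascii()
        and value.isdigit()
        and int(value) > 0
    )


def is_yes_no(value):
    """True if value is exactly 'yes' or 'no'."""
    return value in ("yes", "no")


# attribute name -> (error code from the table above, check function)
CHECKS = {
    "max_wall_time": (101, is_positive_int),
    "max_cpu_time": (103, is_positive_int),
    "save_state": (112, is_yes_no),
    "two_phase": (114, is_positive_int),
    "proxy_timeout": (153, is_positive_int),
}


def precheck(attributes):
    """Return a sorted list of (error_code, attribute_name) problems."""
    problems = []
    # Error 81: the executable attribute must be present in job requests.
    if "executable" not in attributes:
        problems.append((81, "executable"))
    for name, value in attributes.items():
        if name in CHECKS:
            code, check = CHECKS[name]
            if not check(value):
                problems.append((code, name))
    return sorted(problems)


if __name__ == "__main__":
    job = {"executable": "/bin/date", "max_wall_time": "0", "save_state": "maybe"}
    for code, name in precheck(job):
        print(f"attribute {name!r} would likely trigger GRAM error {code}")
```

An empty result only means that none of these particular rules failed; the job manager can still reject the request for other reasons listed in the table.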

Chapter 3. Known Problems in GRAM5

1. Known Problems

The following problems and limitations are known to exist for GRAM5 at the time of the 5.0.2 release:

1.1. Limitations

  • None at this time.

1.2. Outstanding bugs

  • GRAM-2: Investigate how to setup GRAM5 services in a HA setup
  • GRAM-4: Add support for a "managed fork" service
  • GRAM-5: Add gram-level prologue and epilogue script execution for mpi jobs
  • GRAM-12: Gatekeeper's syslog output cannot be controlled
  • GRAM-15: transition from httpg to https
  • GRAM-22: client connections can't be timed out
  • GRAM-23: Improved error codes and error reporting for users
  • GRAM-24: Debug/verbose flags for globusrun, globus-job-run
  • GRAM-51: configurable control of number of perl scripts that can run simultaneously
  • GRAM-53: Generalize log path configuration
  • GRAM-79: Add support for OSG's "NFS Lite" concept
  • GRAM-99: Add a high-level diagram for the approach doc
  • GRAM-104: globus-job-manager-event-generator loads all historical events the first time run
  • GRAM-105: Held Condor jobs should be reported as SUSPENDED
  • GRAM-110: softenv extensions for GRAM5
  • GRAM-119: improve the GRAM LRM adapter doc
  • GRAM-122: tracking gram client software
  • GRAM-135: Improve developer doc for a reliable client
  • GRAM-138: GRAM5 job manager uses a lot of memory when SEG is pointed to incorrect log path
  • GRAM-139: SEG may deadlock with threads
  • GRAM-149: GRAM5 Unix domain socket misbehaves on Snow Leopard
  • GRAM-154: GASS Cache doesn't check for updates
  • GRAM-159: GRAM5 Migration guide is outdated
  • GRAM-163: improve error output for globusrun
  • Bug 5621: gram2 credential refresh problems in 4.0.5
  • Bug 1934: Gatekeeper's syslog output cannot be controlled
  • Bug 2739: Gatekeeper AuthZ/Gridmap Callout result logging
  • Bug 2741: catching SIGSEGV if dynamic loading of authorization modules fails
  • Bug 4199: Patch pre-WS GRAM to use individual condor logs for jobs
  • Bug 3795: jobmanager perl modules issues
  • Bug 4235: globus-job-manager doesn't exit if the job fails.
  • Bug 4730: MPI Jobs using Globus LSF in HP XC Cluster....
  • Bug 4747: Need evaluation of patch to JobManager.pm
  • Bug 4779: gram GT2 log files: timestamps are not ISO 8601 compatible
  • Bug 5143: DONE state never reported for Condor jobs when using Condor-G grid monitor
  • Bug 5429: stdin is lost when jobtype=multiple with jobmanager-lsf
  • Bug 5554: GRAM2 4.0.5 setup-globus-job-manager-fork.pl silent failure
  • Bug 5556: Audit directory setup instructions are insecure
  • Bug 5775: gram status of old jobs incorrect on some lsf systems
  • Bug 6184: pbs.pm jobmanager fails jobs on qstat failure
  • Bug 6337: Cannot configure globus to use different certificate path than default
  • Bug 6703: PBS scheduler adapter assumes that Globus is installed in the same location on the headnode of a cluster and on the work nodes.
  • Bug 6768: Held Condor jobs should be reported as SUSPENDED by GRAM
  • Bug 6815: Support standard install locations for globus-gram-protocol
  • Bug 6819: Missing metatdata in globus-scheduler-event-generator
  • Bug 6820: Support standard install locations for globus-gatekeeper
  • Bug 6821: Support standard install locations for globus-gatekeeper-setup
  • Bug 6822: Support standard install locations for globus-gram-job-manager-scripts
  • Bug 6823: Support standard install locations for globus-gram-job-manager
  • Bug 6824: Support standard install locations for globus-gram-job-manager-setup
  • Bug 6825: Remove hardcoded paths in globus-gram-job-manager-setup-fork
  • Bug 6826: Remove hardcoded paths in globus-gram-job-manager-setup-condor
  • Bug 6840: The PBS job manager doesn't handle large environments well
  • Bug 6855: Undefined variable in Makefiles
  • Bug 6862: PBS job manager fails if job history is enabled
  • Bug 6927: A Loadleveler LRM for GRAM5 should be very welcome
  • Bug 720: allow gram client to detect the version of a gram server
  • Bug 851: Add "cleanup" RSL attribute for cleaning up a job submission
  • Bug 5536: Missing dependency in package globus_gram_job_manager_auditing
  • Bug 5537: Missing dependency in package globus_gram_job_manager_auditing
  • Bug 3373: globus removes the temporary job directory before pbs writes back into it
  • Bug 5200: GRAM (pre-webservices) from OSG 0.6.0 (VDT 1.6.1) has bad syslog format
  • Bug 5207: GRAM SoftEnv extension bug
  • Bug 5250: Does not support mpi jobtype of RSL script
  • Bug 5272: Invalid parsing of RSL file

Chapter 4. Usage statistics collection by the Globus Alliance

1. GRAM5-specific usage statistics

The following usage statistics are sent by default in a UDP packet (in addition to the GRAM component code, packet version, timestamp, and source IP address) at the end of each job.

  • Job Manager Session ID
  • dryrun used
  • RSL Host Count
  • Timestamp when job hit GLOBUS_GRAM_PROTOCOL_JOB_STATE_UNSUBMITTED
  • Timestamp when job hit GLOBUS_GRAM_PROTOCOL_JOB_STATE_FILE_STAGE_IN
  • Timestamp when job hit GLOBUS_GRAM_PROTOCOL_JOB_STATE_PENDING
  • Timestamp when job hit GLOBUS_GRAM_PROTOCOL_JOB_STATE_ACTIVE
  • Timestamp when job hit GLOBUS_GRAM_PROTOCOL_JOB_STATE_FAILED
  • Timestamp when job hit GLOBUS_GRAM_PROTOCOL_JOB_STATE_FILE_STAGE_OUT
  • Timestamp when job hit GLOBUS_GRAM_PROTOCOL_JOB_STATE_DONE
  • Job Failure Code
  • Number of times status is called
  • Number of times register is called
  • Number of times signal is called
  • Number of times refresh is called
  • Number of files named in file_clean_up RSL
  • Number of files being staged in (including executable, stdin) from http servers
  • Number of files being staged in (including executable, stdin) from https servers
  • Number of files being staged in (including executable, stdin) from ftp servers
  • Number of files being staged in (including executable, stdin) from gsiftp servers
  • Number of files being staged into the GASS cache from http servers
  • Number of files being staged into the GASS cache from https servers
  • Number of files being staged into the GASS cache from ftp servers
  • Number of files being staged into the GASS cache from gsiftp servers
  • Number of files being staged out (including stdout and stderr) to http servers
  • Number of files being staged out (including stdout and stderr) to https servers
  • Number of files being staged out (including stdout and stderr) to ftp servers
  • Number of files being staged out (including stdout and stderr) to gsiftp servers
  • Bitmask of used RSL attributes (values are 2^id from the gram5_rsl_attributes table)
  • Number of times unregister is called
  • Value of the count RSL attribute
  • Comma-separated list of string names of other RSL attributes not in the set defined in globus-gram-job-manager.rvf
  • Job type string
  • Number of times the job was restarted
  • Total number of state callbacks sent to all clients for this job
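
The bitmask of used RSL attributes listed above sets one bit per attribute, with the bit value 2^id. A minimal sketch of how such a mask can be built and decoded follows. The id assignments in EXAMPLE_ATTRIBUTE_IDS are hypothetical placeholders chosen for illustration; the real ids come from the gram5_rsl_attributes table, which is not reproduced in this guide.

```python
# Encode and decode a "used RSL attributes" bitmask where each
# attribute contributes the value 2**id.

# HYPOTHETICAL ids for illustration only; the real values are defined
# in the gram5_rsl_attributes table.
EXAMPLE_ATTRIBUTE_IDS = {
    0: "executable",
    1: "arguments",
    2: "stdout",
    3: "count",
}


def encode_bitmask(used_names, ids=EXAMPLE_ATTRIBUTE_IDS):
    """Return the integer mask with bit `id` set for each used attribute."""
    name_to_id = {name: attr_id for attr_id, name in ids.items()}
    mask = 0
    for name in used_names:
        mask |= 1 << name_to_id[name]  # 1 << id is 2**id
    return mask


def decode_bitmask(mask, ids=EXAMPLE_ATTRIBUTE_IDS):
    """Return the attribute names whose bit is set, ordered by id."""
    return [name for attr_id, name in sorted(ids.items()) if mask & (1 << attr_id)]


if __name__ == "__main__":
    mask = encode_bitmask(["executable", "stdout"])
    print(mask, decode_bitmask(mask))
```

Because each attribute owns a distinct bit, the mask records which attributes appeared in a job request but not their values or how often they were used.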

The following information can also be sent in a job status packet, but only if the system administrator explicitly enables it:

  • Value of the executable RSL attribute
  • Value of the arguments RSL attribute
  • IP address and port of the client that submitted the job
  • User DN of the client that submitted the job

In addition to job-related status, the job manager periodically sends information about its own execution status. The following information is sent by default in a UDP packet (in addition to the GRAM component code, packet version, timestamp, and source IP address) at job manager start and every hour during the job manager lifetime:

  • Job Manager Start Time
  • Job Manager Session ID
  • Job Manager Status Time
  • Job Manager Version
  • LRM
  • Poll used
  • Audit used
  • Number of restarted jobs
  • Total number of jobs
  • Total number of failed jobs
  • Total number of canceled jobs
  • Total number of completed jobs
  • Total number of dry-run jobs
  • Peak number of concurrently managed jobs
  • Number of jobs currently being managed
  • Number of jobs currently in the UNSUBMITTED state
  • Number of jobs currently in the STAGE_IN state
  • Number of jobs currently in the PENDING state
  • Number of jobs currently in the ACTIVE state
  • Number of jobs currently in the STAGE_OUT state
  • Number of jobs currently in the FAILED state
  • Number of jobs currently in the DONE state

Also, please see our policy statement on the collection of usage statistics.
