Introduction
This guide contains advanced configuration information for system administrators working with GRAM5. It provides references to information on procedures typically performed by system administrators, including installing, configuring, deploying, and testing GRAM5. It also describes additional prerequisites and host settings necessary for GRAM5 operation. Readers should be familiar with the Key Concepts and Implementation Approach for GRAM5 to understand the motivation for and interaction between the various deployed components.
![]() | Important |
|---|---|
The information in this GRAM5 Admin Guide is in addition to the basic Globus Toolkit prerequisite, overview, installation, and security configuration instructions in Installing GT 5.0.0. Read through that guide before continuing! |
Table of Contents
- 1. Building and Installing
- 2. Configuring
  - 1. Typical Configuration
  - 2. Non-default Configuration
  - 3. Configuration Details
- 3. Deploying
- 4. Running the SEG
- 5. Scalability and Performance Recommendations
- 6. Audit Logging
- 7. Testing
- 8. Security Considerations
- 9. Admin Debugging
- 10. GRAM5 Admin Programs
  - globus-gram-audit - Load GRAM4 and GRAM5 audit records into a database
  - globus-job-manager-event-generator - Create LRM-independent SEG files for the job manager to use
  - globus-job-manager - Execute and monitor jobs
- 11. Troubleshooting
- 12. Usage statistics collection by the Globus Alliance
- Glossary
- Index
Chapter 1. Building and Installing
GRAM5 is built and installed as part of a default GT 5.0.0 installation. For basic installation instructions, see Installing GT 5.0.0.
GRAM5 depends on a local mechanism for starting and controlling jobs. GRAM5 includes a fork local resource manager which requires no special software to execute jobs on the local host. GRAM5 can also be configured to use additional batch facilities and schedulers such as Condor, Torque, or LSF. Install and configure any local resource managers you intend to use prior to configuring GRAM5.
GRAM5 depends on LRM adapters to execute jobs described by GRAM5 RSL documents on a resource managed by an LRM.
LRM adapters included in the GT 5.0.0 release are:
Table 1.1. Supported LRM Adapters
| LRM Adapter Name | LRM Supported |
|---|---|
| fork | Unscheduled local execution |
| pbs | Torque |
| condor | Condor |
| lsf | LSF |
| sge | Grid Engine |
![]() | Note |
|---|---|
The |
For configuration details, see "Configuring LRM adapters" in the Configuring section.
Chapter 2. Configuring
- 1. Typical Configuration
- 2. Non-default Configuration
- 3. Configuration Details
Additional GRAM5 LRM adapters are included in the GT 5.0.0 installer besides the default fork LRM adapter. These are installed by using the following make rules from the installer directory:
Table 2.1. LRM Adapter Make targets
| LRM | Installer make target |
|---|---|
| Fork | gram5-fork |
| Condor | gram5-condor |
| LSF | gram5-lsf |
| PBS | gram5-pbs |
| SGE | gram5-sge |
Example 2.1. Installing PBS LRM Adapter
% make gram5-pbs
cd gpt && OBJECT_MODE=32 ./build_gpt
build_gpt ====> installing GPT into /opt/globus
build_gpt ====> building /usr/src/globus/gpt/support/Compress-Zlib-1.21
build_gpt ====> building /usr/src/globus/gpt/support/IO-Zlib-1.01
build_gpt ====> building /usr/src/globus/gpt/support/makepatch-2.00a
build_gpt ====> building /usr/src/globus/gpt/support/Archive-Tar-0.22
build_gpt ====> building /usr/src/globus/gpt/support/PodParser-1.18
build_gpt ====> building /usr/src/globus/gpt/support/Digest-MD5-2.20
build_gpt ====> building /usr/src/globus/gpt/packaging_tools
/opt/globus/sbin/gpt-build -srcdir=source-trees/core/source gcc64dbg
gpt-build ====> Changing to /usr/src/globus/source-trees/core/source
gpt-build ====> BUILDING FLAVOR gcc64dbg
additional lines omitted
% make install
if [ ! -L /opt/globus/etc/globus_packages ]; then \
cd /opt/globus/etc/; \
ln -sf gpt/packages globus_packages; \
fi; \
/opt/globus/sbin/gpt-postinstall
running /opt/globus/setup/globus/setup-globus-common..[ Changing to /opt/globus/setup/globus ]
creating globus-sh-tools-vars.sh
creating globus-script-initializer
creating Globus::Core::Paths
checking globus-hostname
Done
additional lines omitted
The LRM make targets will check all dependencies they need and try to build them if they are not yet installed.
In addition to the service configuration described above, there are LRM-specific configuration files for the Scheduler Event Generator modules. These files consist of name=value pairs separated by newlines. These files are:
Configuration for the Fork SEG module implementation. The attribute names for this file are:
- log_path
- Path to the SEG Fork log (used by the globus-fork-starter and the SEG). The value of this should be the path to a world-writable file. The default value for this created by the Fork setup package is $GLOBUS_LOCATION/var/globus-fork.log. This file must be readable by the account that the SEG is running as.
Configuration for the Condor SEG module implementation. The attribute names for this file are:
- log_path
- Path to the SEG Condor log (used by the Globus::GRAM::JobManager::condor perl module and the Condor SEG module). The value of this should be the path to a world-readable and world-writable file. The default value for this created by the Condor setup package is $GLOBUS_LOCATION/var/globus-condor.log.
Configuration for the PBS SEG module implementation. The attribute names for this file are:
- log_path
- Path to the SEG PBS logs (used by the Globus::GRAM::JobManager::pbs perl module and the PBS SEG module). The value of this should be the path to the directory containing the server logs generated by PBS. For the SEG to operate, these files must have file permissions such that the files may be read by the user the SEG is running as.
Configuration for the LSF SEG module implementation. The attribute names for this file are:
- log_path
- Path to the SEG LSF log directory. This is used by the LSF SEG module. The value of this should be the path to the directory containing the server logs generated by LSF. For the SEG to operate, these files must have file permissions such that the files may be read by the user the SEG is running as.
Configuration for the SGE SEG module implementation. The attribute names for this file are:
- log_path
- Path to the SGE reporting file, used by the SGE SEG module. The value of this should be the path to the cell's accounting file generated by SGE. For the SEG to operate, this file must have file permissions such that it may be read by the user the SEG is running as, and the ARCO database uploader must not be running.
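The name=value format of the SEG module configuration files described above can be read with a few lines of code. The following is a minimal sketch, not part of GRAM5: the function name `parse_seg_config` and the sample contents are assumptions made for illustration, and the sketch simply skips blank lines and lines without an `=` sign, which the guide does not specify.

```python
def parse_seg_config(text):
    """Parse newline-separated name=value pairs into a dictionary.

    Hypothetical helper: blank lines and lines without '=' are skipped,
    which is an assumption rather than documented GRAM5 behavior.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '='.
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip()
    return settings


# Sample contents modeled on the Fork default described above,
# with GLOBUS_LOCATION assumed to be /opt/globus.
sample = "log_path=/opt/globus/var/globus-fork.log\n"
print(parse_seg_config(sample)["log_path"])
```

A script like this could be used to find the configured log path and then check its permissions before starting the SEG.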
In most situations, the SEG interface provides a more efficient way to monitor jobs than the poll method. This is configured by adding the -seg-module LRM command-line option to the job manager configuration in $GLOBUS_LOCATION/etc/grid-services/jobmanager-LRM.
A client can submit a job without specifying the local resource manager (LRM) that should execute the job.
The default job manager for a particular host is determined by the contents of the file $GLOBUS_LOCATION/etc/grid-services/jobmanager. By default, this file is a symbolic link to the service definition for the first LRM adapter that is installed. To change the default LRM, change the link to point to a different LRM service definition.
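Repointing the default job manager link can be scripted. The sketch below is illustrative only: the helper name `set_default_lrm` is hypothetical, it assumes the link is relative and that service definitions are named jobmanager-LRM in the same directory, and the demonstration runs against a temporary directory rather than a real installation.

```python
import os
import tempfile


def set_default_lrm(services_dir, lrm):
    """Point the 'jobmanager' symlink at jobmanager-<lrm> (hypothetical helper)."""
    target = "jobmanager-" + lrm
    if not os.path.exists(os.path.join(services_dir, target)):
        raise FileNotFoundError("no service definition named " + target)
    link = os.path.join(services_dir, "jobmanager")
    tmp_link = link + ".new"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    # Create the new link under a temporary name, then rename it over the
    # old one so there is never a moment without a default job manager.
    os.symlink(target, tmp_link)
    os.replace(tmp_link, link)
    return os.readlink(link)


# Demonstration in a scratch directory standing in for
# $GLOBUS_LOCATION/etc/grid-services.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("jobmanager-fork", "jobmanager-pbs"):
        open(os.path.join(demo_dir, name), "w").close()
    os.symlink("jobmanager-fork", os.path.join(demo_dir, "jobmanager"))
    print(set_default_lrm(demo_dir, "pbs"))
```

Creating the link under a temporary name and renaming it is a design choice of this sketch; removing and recreating the link by hand achieves the same end result.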
When the GRAM5 gatekeeper accepts a new connection, it checks the
contents of the service directory to determine what services are
configured. To disable a service, remove the service entry from the
service directory. The service entry can be recreated by running the
gpt-postinstall -force command.
Example 2.3. Disabling the PBS LRM Adapter
% cd $GLOBUS_LOCATION/etc/grid-services
% rm jobmanager-pbs
If the PBS LRM was the default job manager, then the $GLOBUS_LOCATION/etc/grid-services/jobmanager symlink will no longer point to a valid file, and there will be no default job manager configured. Clients must explicitly choose some other LRM which is configured.
Chapter 3. Deploying
GRAM5 is installed as part of a standard toolkit installation. By
default the fork LRM interface is installed and configured
to use the poll interface.
In order to run the service, the globus-gatekeeper program must be started on a service node. This may be done in two supported ways, either via a super-server such as inetd or xinetd, or as a stand-alone daemon process.
To deploy GRAM5 to be started by inetd, define a service entry for the gsigatekeeper service in the /etc/services file if it is not already present. The IANA-assigned port number for the service is 2119, so add an entry like:
gsigatekeeper 2119/tcp
To deploy GRAM5 to be started via
inetd, modify
/etc/inetd.conf (or your operating-system specific
path to this file) by adding the following entry (on one line):
gsigatekeeper stream tcp nowait root /usr/bin/env env LD_LIBRARY_PATH=GLOBUS_LOCATION/lib GLOBUS_LOCATION/sbin/globus-gatekeeper -conf GLOBUS_LOCATION/etc/globus-gatekeeper.conf
After doing so, run the operating-system specific command to have inetd reload its configuration.
![]() | Important |
|---|---|
This example assumes that the The |
![]() | Note |
|---|---|
When deploying GRAM5 via xinetd, be sure to include the command-line
option |
To deploy GRAM5 to be started via
xinetd, create a file in the
/etc/xinetd.d/ directory (or the
operating-system specific path to xinetd configuration files) called
gsigatekeeper with the following contents:
service gsigatekeeper
{
socket_type = stream
protocol = tcp
wait = no
user = root
env = LD_LIBRARY_PATH=GLOBUS_LOCATION/lib
server = GLOBUS_LOCATION/sbin/globus-gatekeeper
server_args = -conf GLOBUS_LOCATION/etc/globus-gatekeeper.conf
disable = no
}
After doing so, run the operating-system specific command to have xinetd reload its configuration.
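The xinetd entry above uses GLOBUS_LOCATION where the installation path belongs. As a rough sketch, assuming that GLOBUS_LOCATION is a placeholder to be replaced with the actual installation directory, the entry could be generated as follows. The function name `render_xinetd_entry` is hypothetical, the column alignment is cosmetic, and /opt/globus is only an example path.

```python
# Template copied from the xinetd example above; GLOBUS_LOCATION is treated
# as a placeholder for the installation directory (an assumption).
XINETD_TEMPLATE = """\
service gsigatekeeper
{
    socket_type = stream
    protocol    = tcp
    wait        = no
    user        = root
    env         = LD_LIBRARY_PATH=GLOBUS_LOCATION/lib
    server      = GLOBUS_LOCATION/sbin/globus-gatekeeper
    server_args = -conf GLOBUS_LOCATION/etc/globus-gatekeeper.conf
    disable     = no
}
"""


def render_xinetd_entry(globus_location):
    """Return the service entry with the placeholder filled in (hypothetical helper)."""
    prefix = globus_location.rstrip("/")
    return XINETD_TEMPLATE.replace("GLOBUS_LOCATION", prefix)


print(render_xinetd_entry("/opt/globus"))
```

The output would be written to the gsigatekeeper file in the xinetd configuration directory by an administrator with the appropriate privileges.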
![]() | Important |
|---|---|
This example assumes that the The |
![]() | Note |
|---|---|
When deploying GRAM5 via xinetd, be sure to include the command-line
option |
To deploy GRAM5 to be started as a daemon, run the globus-gatekeeper at boot time. This may be done in an init script, via cron, or other system-specific methods. The typical command-line to run is
globus-gatekeeper -conf $GLOBUS_LOCATION/etc/globus-gatekeeper.conf
![]() | Note |
|---|---|
When deploying GRAM5 as a daemon, be sure to include the command-line
option |
Chapter 4. Running the SEG
GRAM5 can be configured to use the globus-job-manager-event-generator program to monitor job state changes. This is often more efficient than using an LRM adapter's poll method. This program is configured when the LRM-specific bundle is installed. However, by default, the job manager does not use the SEG. It must be explicitly enabled by adding the -seg-module LRM option to the job manager configuration.
To start the globus-job-manager-event-generator program, add the following command to your system init scripts or crontab to be run at boot time:
$GLOBUS_LOCATION/sbin/globus-job-manager-event-generator -scheduler LRM -background -pidfile PIDFILE
This will start the globus-job-manager-event-generator
to monitor jobs for the resource named by LRM
in the background. The process id of the command will be written to the
file named by PIDFILE, so that processes can
check if it is running and kill it if necessary.
![]() | Important |
|---|---|
If the job manager is configured to use a SEG module but the globus-job-manager-event-generator is not running for that LRM, jobs will appear to hang. It is important that the program be running whenever GRAM5 jobs might be run. |
To stop the SEG, kill the
globus-job-manager-event-generator process. The
-pidfile option makes it easy to know which process to
kill. When the SEG terminates, it will remove that file.
Example 4.1. Starting and Stopping the SEG
This example shows how to start and stop the SEG using the command-line options described above.
% globus-job-manager-event-generator -scheduler pbs -background -pidfile $GLOBUS_LOCATION/var/globus-job-manager-seg-pbs.pid
Running in background (78258)
% kill `cat $GLOBUS_LOCATION/var/globus-job-manager-seg-pbs.pid`
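Because jobs appear to hang when the SEG is not running, a monitoring script may want to check the PID file. The following is a minimal sketch for POSIX systems, assuming the PID file holds the process ID as decimal text; the function name `seg_is_running` is hypothetical and not part of GRAM5.

```python
import os


def seg_is_running(pidfile):
    """Return True if the process named in pidfile appears to be alive.

    Hypothetical helper: assumes the file contains a decimal process ID.
    """
    try:
        with open(pidfile) as handle:
            pid = int(handle.read().strip())
    except (FileNotFoundError, ValueError):
        # No PID file, or contents that are not a number.
        return False
    if pid <= 0:
        return False
    try:
        # Signal 0 performs error checking only; nothing is delivered.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


print(seg_is_running("/nonexistent/globus-job-manager-seg-pbs.pid"))
```

Since the SEG removes the PID file when it terminates, a missing file is treated here as "not running". A stale file left behind by a crash is caught by the signal check.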
Chapter 5. Scalability and Performance Recommendations
This document includes recommendations for increasing the scalability and performance of GRAM5 in a Grid.
The GRAM5 service stores job state for crash recovery on disk. By default, the $GLOBUS_LOCATION/tmp/gram_job_state directory is used for these files. If this path is located on a distributed file system mount, locking and updating the job state files can negatively affect performance.
To configure GRAM5 to use a local disk for job state files, modify $GLOBUS_LOCATION/etc/globus-job-manager.conf or $GLOBUS_LOCATION/etc/grid-services/jobmanager-LRM so that the argument to the -state-file-dir option is a local directory path. That directory must be world writable with the "sticky bit" set (mode 1777).
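The mode requirement for the state file directory can be verified before pointing the job manager at it. This is a small sketch using only the Python standard library; the function name `is_valid_state_dir` is hypothetical, and the check covers the permission bits only, not whether the path is on a local disk.

```python
import os
import stat


def is_valid_state_dir(path):
    """Return True if path is a directory with mode 1777 (hypothetical helper)."""
    try:
        info = os.stat(path)
    except FileNotFoundError:
        return False
    # S_IMODE keeps the permission bits, including the sticky bit.
    return stat.S_ISDIR(info.st_mode) and stat.S_IMODE(info.st_mode) == 0o1777


# /tmp is usually mode 1777 on POSIX systems, so it makes a handy probe.
print(is_valid_state_dir("/tmp"))
```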
The GRAM5 service can use two different interfaces to obtain job state changes, polling and using the SEG. The default method is to poll the LRM via the GRAM5 script interface. This polling method is often less efficient than the SEG method and results in a higher load on the GRAM5 service node, even when all managed jobs are waiting in the queue for execution.
The other method uses a program implementing the SEG interface to generate LRM events, which can be stored in a log file for job managers run by many different users to process. When this is used, the multiple job managers may detect job state changes without having to directly interact with the LRM.
To enable the SEG to monitor jobs for a particular LRM, install and configure the LRM-specific gram5 bundle, run the globus-job-manager-event-generator program on a node which can access the LRM interfaces needed by the LRM-specific SEG module, and configure the LRM-specific service instance to use the SEG to monitor jobs for state changes.
Chapter 6. Audit Logging
GRAM5 includes mechanisms to provide access to audit and accounting information associated with jobs that GRAM5 submits to a local resource manager (LRM) such as PBS, LSF, or Condor.
![]() | Note |
|---|---|
Remember, GRAM is not a local resource manager but rather a protocol engine for communicating with a range of different local resource managers using a standard message format. |
In some scenarios, it is desirable to get general information about the usage of the underlying LRM, such as:
What kinds of jobs were submitted via GRAM?
How long did the processing of a job take?
How many jobs were submitted by user X?
The following three use cases give a better overview of the meaning and purpose of auditing and accounting:
Group Access. A grid resource provider allows a remote service (e.g., a gateway or portal) to submit jobs on behalf of multiple users. The grid resource provider only obtains information about the identity of the remote submitting service and thus does not know the identity of the users for which the grid jobs are submitted. This group access is allowed under the condition that the remote service stores audit information so that, if and when needed, the grid resource provider can request and obtain information to track a specific job back to an individual user.
Query Job Accounting. A client that submits a job needs to be able to obtain, after the job has completed, information about the resources consumed by that job. In portal and gateway environments where many users submit many jobs against a single allocation, this per-job accounting information is needed soon after the job completes so that client-side accounting can be updated. Accounting information is sensitive and thus should only be released to authorized parties.
Auditing. In a distributed multi-site environment, it can be necessary to investigate various forms of suspected intrusion and abuse. In such cases, we may need to access an audit trail of the actions performed by a service. When accessing this audit trail, it will frequently be important to be able to relate specific actions to the user.
Audit logging in GRAM5 is done when a job completes.
While audit and accounting records may be generated and stored by different entities in different contexts, we make the following assumptions in this chapter:
| | Audit Records | Accounting Records |
|---|---|---|
| Generated by: | GRAM service | LRM to which the GRAM service submits jobs |
| Stored in: | Database, indexed by GJID | LRM, indexed by JID |
| Data that is stored: | See list below. | May include all information about the duration and resource-usage of a job |
The audit record of each job contains the following data:
job_grid_id: String representation of the resource EPR
local_job_id: Job/process id generated by the scheduler
subject_name: Distinguished name (DN) of the user
username: Local username
idempotence_id: Job id generated on the client-side
creation_time: Date when the job resource is created
queued_time: Date when the job is submitted to the scheduler
stage_in_grid_id: String representation of the stageIn-EPR (RFT)
stage_out_grid_id: String representation of the stageOut-EPR (RFT)
clean_up_grid_id: String representation of the cleanUp-EPR (RFT)
globus_toolkit_version: Version of the server-side GT
resource_manager_type: Type of the resource manager (Fork, Condor, ...)
job_description: Complete job description document
success_flag: Flag that shows whether the job failed or finished successfully
finished_flag: Flag that shows whether the job is already fully processed or still in progress
gateway_user: TeraGrid identity of the user who submitted the job.
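Once audit records are in a database, questions such as "How many jobs were submitted by user X?" become simple queries. The sketch below is illustrative only: it uses an in-memory SQLite database, the table name `gram_audit_table`, the column types, and the sample rows are all assumptions, and only a subset of the fields listed above is included. Consult the schema created by the audit setup package for the real layout.

```python
import sqlite3

# In-memory database with an assumed, simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gram_audit_table ("
    " job_grid_id TEXT PRIMARY KEY,"
    " subject_name TEXT,"
    " username TEXT,"
    " resource_manager_type TEXT,"
    " success_flag INTEGER)"
)

# Made-up sample records for illustration only.
rows = [
    ("job-0001", "/DC=org/O=example/OU=grid/CN=Joe User", "joe", "Fork", 1),
    ("job-0002", "/DC=org/O=example/OU=grid/CN=Joe User", "joe", "Condor", 0),
    ("job-0003", "/DC=org/O=example/OU=grid/CN=Jane User", "jane", "Fork", 1),
]
conn.executemany("INSERT INTO gram_audit_table VALUES (?, ?, ?, ?, ?)", rows)


def jobs_per_subject(connection):
    """Count audit records per subject_name (hypothetical helper)."""
    cursor = connection.execute(
        "SELECT subject_name, COUNT(*) FROM gram_audit_table"
        " GROUP BY subject_name ORDER BY subject_name"
    )
    return dict(cursor.fetchall())


for subject, count in jobs_per_subject(conn).items():
    print(subject, count)
```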
The rest of this chapter focuses on how to configure GRAM5 to enable Audit-Logging. A case study for TeraGrid can be read here, which also includes more information about how to use this data to get accounting information of a job, query the audit database for information via a Web Services interface, etc.
Audit logging is turned off by default. To enable GRAM5 audit logging in the job manager, add the command-line option -audit-directory AUDIT_DIRECTORY to the job manager configuration in either $GLOBUS_LOCATION/etc/globus-job-manager.conf to enable it for all job manager services, or in $GLOBUS_LOCATION/etc/grid-services/LRM_SERVICE_NAME to enable it for a particular job manager service for a particular LRM.
The globus-gram-audit program reads GRAM5 audit records and loads those records into an SQL database. This program is available as part of the globus_gram_job_manager_auditing package. It must be configured by installing and running the globus_gram_job_manager_auditing_setup_scripts setup package via gpt-postinstall. This setup script creates the $GLOBUS_LOCATION/etc/globus-job-manager-audit.conf configuration file described below and creates database tables needed by the audit system.
The globus-gram-audit program supports three database systems: MySQL, PostgreSQL, and SQLite.
The auditing configuration file consists of a series of line-oriented
records that define various configuration attributes used by the
globus-gram-audit program. This file can be edited
by hand or using the
$GLOBUS_LOCATION/setup/globus/setup-globus-gram-auditing
program. That program is run automatically with the default values
described below when gpt-postinstall is run after
installing the audit setup package.
The values which may be defined in the configuration file are:
| Attribute Name | Values | Default when not specified on the setup command-line |
|---|---|---|
| DRIVER | The name of the Perl 5 DBI driver for the database to be used. The supported drivers correspond to the database systems listed above: MySQL, PostgreSQL, and SQLite. | SQLite |
| DATABASE | The DBI data source specification to contact the audit database. | dbname= |
| USERNAME | Username to authenticate as to the database | |
| PASSWORD | Password to use to authenticate with the database | |
| AUDITVERSION | Version of the audit database table schemas to use. May be 1 or 1TG for this version of the software. | 1 |
Thus, the default configuration file when GLOBUS_LOCATION is
/opt/globus is
DRIVER:SQLite
DATABASE:dbname=/opt/globus/var/gram_audit_database/gram_audit.db
USERNAME:
PASSWORD:
AUDITVERSION:1
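The line-oriented NAME:VALUE records shown above are straightforward to read programmatically. The following is a minimal sketch, not the parser used by globus-gram-audit: the function name `parse_audit_config` is hypothetical, and splitting on the first colon only is an assumption made so that values may themselves contain colons.

```python
def parse_audit_config(text):
    """Parse NAME:VALUE lines into a dictionary (hypothetical helper)."""
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        # Split on the first ':' only; the value may be empty.
        name, separator, value = line.partition(":")
        if not separator:
            continue
        config[name] = value
    return config


# The default configuration shown above for GLOBUS_LOCATION=/opt/globus.
default_text = """\
DRIVER:SQLite
DATABASE:dbname=/opt/globus/var/gram_audit_database/gram_audit.db
USERNAME:
PASSWORD:
AUDITVERSION:1
"""

settings = parse_audit_config(default_text)
print(settings["DRIVER"], settings["DATABASE"])
```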
Chapter 7. Testing
There are three test suites available to verify that the GRAM5 client and service are installed correctly.
The GRAM protocol test suite tests the implementation of the GRAM protocol library which is used by the job manager and GRAM clients to process messages. The following example shows how to run the test suite.
Example 7.1. Running the GRAM Protocol Test Suite
% cd $GLOBUS_LOCATION/test/globus_gram_protocol_test
% grid-proxy-init
Your identity: /DC=org/O=example/OU=grid/CN=Joe User
Enter GRID pass phrase for this identity:
Creating proxy ................................... Done
Your proxy is valid until: Thu Nov 12 23:28:05 2009
% ./TESTS.pl
globus-gram-protocol-allow-attach-test...........ok
globus-gram-protocol-error-test..................ok
globus-gram-protocol-io-test.....................ok
globus-gram-protocol-pack-test...................ok
pack-with-extensions-test........................ok
create-extensions-test...........................ok
unpack-message-test..............................ok
unpack-with-extensions-test......................ok
unpack-job-request-reply-with-extensions-test....ok
unpack-status-reply-with-extensions-test.........ok
All tests successful.
Files=10, Tests=42, 1 wallclock secs ( 0.37 cusr + 0.24 csys = 0.61 CPU)
The GRAM client test suite tests the interactions between the GRAM client API implementation and the job manager service. These tests include authentication, callback, signal, job, and cancellation tests.
Example 7.2. Running the GRAM Client Test Suite
% cd $GLOBUS_LOCATION/test/globus_gram_client_test
% grid-proxy-init
Your identity: /DC=org/O=example/OU=grid/CN=Joe User
Enter GRID pass phrase for this identity:
Creating proxy ................................... Done
Your proxy is valid until: Thu Nov 12 23:28:05 2009
% ./TESTS.pl
globus-gram-client-activate-test................ok
globus-gram-client-callback-contact-test........ok
globus-gram-client-cancel-test..................ok
globus-gram-client-nonblocking-register-test....ok 1/4Failed submitting job request because an authorization operation failed
globus_xio_gsi: gss_init_sec_context failed.
GSS Major Status: Unexpected Gatekeeper or Service Name
globus_gsi_gssapi: Authorization denied: The name of the remote entity (/DC=org/O=example/OU=grid/CN=Joe User), and the expected name for the remote entity (/DC=org/O=example/OU=grid/CN=Joe UserX) do not match .
globus-gram-client-nonblocking-register-test....ok
globus-gram-client-refresh-credentials-test.....ok
globus-gram-client-register-test................ok
globus-gram-client-register-callback-test.......ok 1/4Failed submitting job request because an authorization operation failed
globus_xio_gsi: gss_init_sec_context failed.
GSS Major Status: Unexpected Gatekeeper or Service Name
globus_gsi_gssapi: Authorization denied: The name of the remote entity (/DC=org/O=example/OU=grid/CN=Joe User), and the expected name for the remote entity (/DC=org/O=example/OU=grid/CN=Joe UserX) do not match .
globus-gram-client-register-callback-test.......ok
globus-gram-client-register-cancel-test.........ok
globus-gram-client-ping-test....................ok
globus-gram-client-status-test..................Made 3961 calls to status in 60.416390 seconds
globus-gram-client-status-test..................ok
globus-gram-client-two-phase-commit-test........ok
globus-gram-client-register-ping-test...........ok
globus-gram-client-stdio-size-test..............job manager returned 0 (Success) when I expected it to still be streaming output
globus-gram-client-stdio-size-test..............ok
version-test....................................ok
All tests successful.
Files=14, Tests=33, 791 wallclock secs (32.10 cusr + 5.34 csys = 37.44 CPU)
![]() | Note |
|---|---|
Some of the test cases display messages that look like errors when running. This is to be expected. The only concern should be the final lines indicating if the tests are successful or not. |
![]() | Note |
|---|---|
By default, this suite uses tests against a personal
gatekeeper running the |
The GRAM job manager test suite tests the features provided by the LRM scripts, including detecting failures, staging files, and submitting different types of jobs. The following example shows how to run the job manager tests.
Example 7.3. Running the GRAM Job Manager Test Suite
% cd $GLOBUS_LOCATION/test/globus_gram_job_manager_test
% grid-proxy-init
Your identity: /DC=org/O=example/OU=grid/CN=Joe User
Enter GRID pass phrase for this identity:
Creating proxy ................................... Done
Your proxy is valid until: Thu Nov 12 23:28:05 2009
% ./TESTS.pl
job-manager-script-test..................ok
globus-gram-job-manager-stdio-test.......ok
globus-gram-job-manager-submit-test......ok
globus-gram-job-manager-failure-test.....ok
globus-gram-job-manager-rsl-size-test....ok
globus-gram-job-manager-user-test........ok
All tests successful.
Files=6, Tests=137, 200 wallclock secs (32.10 cusr + 5.34 csys = 37.44 CPU)
![]() | Note |
|---|---|
This test requires a GridFTP server to be running on the
host running the test suite. If one is not running, then the
following test cases will fail:
|
![]() | Note |
|---|---|
By default, this suite uses tests against a personal
gatekeeper running the |
Chapter 9. Admin Debugging
By default, GRAM5 logs errors to $HOME/gram_YYYYMMDD.log, where YYYYMMDD is the date of the log event in GMT. The log file format conforms to CEDPS Logging Best Practices. GRAM5 log files are governed by the log levels defined in the job manager configuration file. The log levels available are defined below:
Table 9.1. GRAM5 Log Levels
| Level | Meaning | Default Behavior |
|---|---|---|
| FATAL | Problems which cause the job manager to terminate prematurely | Enabled |
| ERROR | Problems which cause a job or operation to fail | Enabled |
| WARN | Problems which cause minor problems with job execution or monitoring | Disabled |
| INFO | Major events in the lifetime of the job manager and its jobs | Disabled |
| DEBUG | Minor events in the lifetime of jobs | Disabled |
| TRACE | Job processing details | Disabled |
To enable logging for GRAM5, modify $GLOBUS_LOCATION/etc/globus-job-manager.conf so that it has either -stdio-log PATH to log to a file or -enable-syslog to log using the syslog service. To select log levels, add -log-levels "LEVELS" to the configuration file. The LEVELS string can contain any of the log levels mentioned above joined by the vertical bar character '|'.
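The LEVELS string is easy to mistype, so it may help to build it from a checked list. The sketch below is an illustration only: the function name `log_levels_option` is hypothetical, and the set of valid names is taken from Table 9.1.

```python
# Level names from Table 9.1.
VALID_LEVELS = ("FATAL", "ERROR", "WARN", "INFO", "DEBUG", "TRACE")


def log_levels_option(levels):
    """Build a -log-levels argument from a list of level names (hypothetical helper)."""
    unknown = [level for level in levels if level not in VALID_LEVELS]
    if unknown:
        raise ValueError("unknown log levels: " + ", ".join(unknown))
    # Levels are joined with the vertical bar character and quoted.
    return '-log-levels "{}"'.format("|".join(levels))


print(log_levels_option(["FATAL", "ERROR", "WARN"]))
```

The resulting text would be added to the job manager configuration file alongside -stdio-log or -enable-syslog.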
Chapter 10. GRAM5 Admin Programs
- globus-gram-audit - Load GRAM4 and GRAM5 audit records into a database
- globus-job-manager-event-generator - Create LRM-independent SEG files for the job manager to use
- globus-job-manager - Execute and monitor jobs
Name
globus-gram-audit — Load GRAM4 and GRAM5 audit records into a database
Synopsis
globus-gram-audit [--conf CONFIG_FILE] [--check] [--delete] [--audit-directory AUDITDIR]
Description
The globus-gram-audit program loads audit records to an SQL-based database. It reads $GLOBUS_LOCATION/etc/globus-job-manager.conf by default to determine the audit directory and then uploads all files in that directory that contain valid audit records to the database configured by the globus_gram_job_manager_auditing_setup_scripts package. If the upload completes successfully, the audit files will be removed.
The full set of command-line options to globus-gram-audit consists of:
--conf |
Use |
--check | Check whether the insertion of a record was successful by querying the database after inserting the records. This is used in tests. |
--delete | Delete audit records from the database right after inserting them. This is used in tests to avoid filling the database with test records. |
--audit-directory | Look for audit records in AUDITDIR instead of looking in the directory specified in the job manager configuration. This is used in tests to control which records are loaded to the database and then deleted. |
--query | Perform the given SQL query on the audit database. This uses the database information from the configuration file to determine how to contact the database. |
Files
The globus-gram-audit program uses the following files (paths relative to $GLOBUS_LOCATION):
etc/globus-gram-job-manager.conf | GRAM5 job manager configuration. It includes the default path to the audit directory. |
etc/globus-gram-audit.conf | Audit configuration. It includes the information needed to contact the audit database. |
Name
globus-job-manager-event-generator — Create LRM-independent SEG files for the job manager to use
Synopsis
globus-job-manager-event-generator [-help] {-scheduler LRM} [-background] [-pidfile PIDPATH]
Description
The globus-job-manager-event-generator program is a utility which uses LRM-specific SEG parsers to generate an LRM-independent log file that a job manager instance can use to process job status change events. This program runs independently of all globus-job-manager instances so that only one process needs to deal with the LRM interface. The globus-job-manager-event-generator program can be run as a privileged user if required to interface with the LRM.
In order for globus-job-manager-event-generator to handle events for a particular LRM, the
globus_scheduler_event_generator_job_manager_setup
setup package must be configured after the LRM-specific setup package has been
run. This can be forced by gpt-postinstall -force or running
the command cd $GLOBUS_LOCATION/setup/globus;
./setup-seg-job-manager.pl.
The full set of command-line options to globus-job-manager-event-generator consists of:
- -help
- Print command-line option summary and exit.
- -scheduler LRM
- Process events for the local resource manager named by LRM.
- -background
- Run globus-job-manager-event-generator as a background process. It will fork a new process, print out its process ID and then the original process will terminate.
- -pidfile PIDPATH
- Write the process ID of an instance of globus-job-manager-event-generator to the file named by PIDPATH. This file can be used to kill or monitor the globus-job-manager-event-generator process.
Files
- globus-job-manager-seg.conf
- Configuration file for globus-job-manager-event-generator. Each line consists of a string of the form LRM_log_path=PATH, which indicates the directory containing LRM-independent format SEG log files for the LRM. This file is created by running the globus_scheduler_event_generator_job_manager_setup setup package.
Name
globus-job-manager — Execute and monitor jobs
Synopsis
globus-job-manager {-type LRM} [-conf CONFIG_PATH] [-help] [-globus-host-manufacturer MANUFACTURER] [-globus-host-cputype CPUTYPE] [-globus-host-osname OSNAME] [-globus-host-osversion OSVERSION] [-globus-gatekeeper-host HOST] [-globus-gatekeeper-port PORT] [-globus-gatekeeper-subject SUBJECT] [-home GLOBUS_LOCATION] [-target-globus-location TARGET_GLOBUS_LOCATION] [-condor-arch ARCH] [-condor-os OS] [-history HISTORY_DIRECTORY] [-scratch-dir-base SCRATCH_DIRECTORY] [-enable-syslog] [-stdio-log LOG_DIRECTORY] [-log-levels LEVELS] [-state-file-dir STATE_DIRECTORY] [-globus-tcp-port-range PORT_RANGE] [-x509-cert-dir TRUSTED_CERTIFICATE_DIRECTORY] [-cache-location GASS_CACHE_DIRECTORY] [-k] [-extra-envvars VAR=VAL,...] [-seg-module SEG_MODULE] [-audit-directory AUDIT_DIRECTORY] [-globus-toolkit-version TOOLKIT_VERSION] [-disable-streaming] [-disable-usagestats] [-usagestats-targets TARGET] [-service-tag SERVICE_TAG]
Description
The globus-job-manager program is a service which starts and controls GRAM jobs which are executed by a local resource management system, such as LSF or Condor. The globus-job-manager program is typically started by the globus-gatekeeper program and not directly by a user. It runs until all jobs it is managing have terminated or its delegated credentials have expired.
Typically, users interact with the globus-job-manager program via client applications such as globusrun, globus-job-submit, or tools such as CoG jglobus or Condor-G.
The full set of command-line options to globus-job-manager consists of:
-help- Display a help message to standard error and exit.
-type LRM- Execute jobs using the local resource manager named LRM.
-conf CONFIG_PATH- Read additional command-line arguments from the file CONFIG_PATH. If present, this must be the first command-line argument to the globus-job-manager program.
-globus-host-manufacturer MANUFACTURER- Indicate the manufacturer of the system on which the jobs will execute. This parameter sets the value of the $(GLOBUS_HOST_MANUFACTURER) RSL substitution to MANUFACTURER.
-globus-host-cputype CPUTYPE- Indicate the CPU type of the system on which the jobs will execute. This parameter sets the value of the $(GLOBUS_HOST_CPUTYPE) RSL substitution to CPUTYPE.
-globus-host-osname OSNAME- Indicate the operating system type of the system on which the jobs will execute. This parameter sets the value of the $(GLOBUS_HOST_OSNAME) RSL substitution to OSNAME.
-globus-host-osversion OSVERSION- Indicate the operating system version of the system on which the jobs will execute. This parameter sets the value of the $(GLOBUS_HOST_OSVERSION) RSL substitution to OSVERSION.
-globus-gatekeeper-host HOST- Indicate the host name of the machine to which the job was submitted. This parameter sets the value of the $(GLOBUS_GATEKEEPER_HOST) RSL substitution to HOST.
-globus-gatekeeper-port PORT- Indicate the TCP port number of the gatekeeper to which jobs are submitted. This parameter sets the value of the $(GLOBUS_GATEKEEPER_PORT) RSL substitution to PORT.
-globus-gatekeeper-subject SUBJECT- Indicate the X.509 identity of the gatekeeper to which jobs are submitted. This parameter sets the value of the $(GLOBUS_GATEKEEPER_SUBJECT) RSL substitution to SUBJECT.
-home GLOBUS_LOCATION- Indicate the path where the Globus Toolkit(r) is installed on the service node. This is used by the job manager to locate its support and configuration files.
-target-globus-location TARGET_GLOBUS_LOCATION- Indicate the path where the Globus Toolkit(r) is installed on the execution host. If this is omitted, the value specified as a parameter to -home is used. This parameter sets the value of the $(GLOBUS_LOCATION) RSL substitution to TARGET_GLOBUS_LOCATION.
-history HISTORY_DIRECTORY- Configure the job manager to write job history files to HISTORY_DIRECTORY. These files are described in the FILES section below.
-scratch-dir-base SCRATCH_DIRECTORY- Configure the job manager to use SCRATCH_DIRECTORY as the default scratch directory root if a relative path is specified in the job RSL's scratch_dir attribute.
-enable-syslog- Configure the job manager to write log messages via syslog. Logging is further controlled by the argument to the -log-levels parameter described below.
-stdio-log LOG_DIRECTORY- Configure the job manager to write log messages to files in the LOG_DIRECTORY directory. Files will be named LOG_DIRECTORY/gram_YYYYMMDD.log. Logging is further controlled by the argument to the -log-levels parameter described below. The LOG_DIRECTORY value can include variables derived from the job manager environment using the same syntax as RSL substitutions. For example, -stdio-log $(HOME) would cause each user's logs to be stored in their individual home directories.
-log-levels LEVELS- Configure the job manager to write log messages of certain levels to syslog and/or log files. The available log levels are FATAL, ERROR, WARN, INFO, DEBUG, and TRACE. Multiple values can be combined with the | character. The default value of logging when enabled is FATAL|ERROR.
-state-file-dir STATE_DIRECTORY- Configure the job manager to write state files to STATE_DIRECTORY. If not specified, the job manager uses the default of $GLOBUS_LOCATION/tmp/gram_job_state/. This directory must be writable by all users and be on a file system which supports POSIX advisory file locks.
-globus-tcp-port-range PORT_RANGE- Configure the job manager to restrict its TCP/IP communication to use ports in the range described by PORT_RANGE. This value is also made available in the job environment via the GLOBUS_TCP_PORT_RANGE environment variable.
-x509-cert-dir TRUSTED_CERTIFICATE_DIRECTORY- Configure the job manager to search TRUSTED_CERTIFICATE_DIRECTORY for its list of trusted CA certificates and their signing policies. This value is also made available in the job environment via the X509_CERT_DIR environment variable.
-cache-location GASS_CACHE_DIRECTORY- Configure the job manager to use the path GASS_CACHE_DIRECTORY for its temporary GASS-cache files. This value is also made available in the job environment via the GLOBUS_GASS_CACHE_DEFAULT environment variable.
-k- Configure the job manager to assume it is using Kerberos for authentication instead of X.509 certificates. This disables some certificate-specific processing in the job manager.
-extra-envvars VAR=VAL,...- Configure the job manager to define a set of environment variables in the job environment beyond those defined in the base job environment. The format of the parameter to this argument is a comma-separated sequence of VAR=VAL pairs, where VAR is the variable name and VAL is the variable's value.
-seg-module SEG_MODULE- Configure the job manager to use the scheduler event generator module named by SEG_MODULE to detect job state change events from the local resource manager, in place of the less efficient polling operations used in GT2. To use this, one instance of the globus-job-manager-event-generator must be running to process events for the LRM into a generic format that the job manager can parse.
-audit-directory AUDIT_DIRECTORY- Configure the job manager to write audit records to the directory named by AUDIT_DIRECTORY. These records can be loaded into a database using the globus-gram-audit program.
-globus-toolkit-version TOOLKIT_VERSION- Configure the job manager to use TOOLKIT_VERSION as the version for audit and usage stats records.
-service-tag SERVICE_TAG- Configure the job manager to use SERVICE_TAG as a unique identifier to allow multiple GRAM instances to use the same job state directories without interfering with each other's jobs. If not set, the value untagged will be used.
-disable-streaming- Configure the job manager to disable file streaming. This is propagated to the LRM script interface but has no effect in GRAM5.
-disable-usagestats- Disable sending of any usage stats data, even if -usagestats-targets is present in the configuration.
-usagestats-targets TARGET- Send usage packets to a data collection service for analysis. The TARGET string consists of a comma-separated list of HOST:PORT combinations, each containing an optional list of data to send. See Usage Stats Packets for more information about the tags. Special tag strings of all (which enables all tags) and default may be used, or a sequence of characters for the various tags.
-condor-arch ARCH- Set the architecture specification for condor jobs to be ARCH in job classified ads generated by the GRAM5 condor LRM script. This is required for the condor LRM but ignored for all others.
-condor-os OS- Set the operating system specification for condor jobs to be OS in job classified ads generated by the GRAM5 condor LRM script. This is required for the condor LRM but ignored for all others.
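
The -state-file-dir option requires a file system that supports POSIX advisory file locks. The following is a rough sketch of how an administrator might probe a candidate directory before using it. It assumes a POSIX host with Python's standard `fcntl` module; the helper name `supports_posix_locks` is hypothetical and not part of GRAM5, and the probe does not check that the directory is writable by all users.

```python
import fcntl
import os
import tempfile


def supports_posix_locks(directory):
    """Return True if a file created in `directory` can be locked with fcntl."""
    # mkstemp raises OSError if the directory is missing or not writable.
    fd, path = tempfile.mkstemp(prefix="lockcheck.", dir=directory)
    try:
        try:
            # Take a non-blocking exclusive advisory lock, then release it.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.lockf(fd, fcntl.LOCK_UN)
            return True
        except OSError:
            return False
    finally:
        os.close(fd)
        os.unlink(path)  # remove the probe file in every case


# Probe the system temporary directory as a stand-in for STATE_DIRECTORY.
print(supports_posix_locks(tempfile.gettempdir()))
```

A False result suggests choosing a different file system for the state directory, for example a local disk instead of a network mount without lock support.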
Environment
The following variables affect the execution of globus-job-manager:
HOME- User's home directory.
LOGNAME- User's name.
JOBMANAGER_SYSLOG_ID- String to prepend to syslog audit messages.
JOBMANAGER_SYSLOG_FAC- Facility to log syslog audit messages as.
JOBMANAGER_SYSLOG_LVL- Priority level to use for syslog audit messages.
GATEKEEPER_JM_ID- Job manager ID to be used in syslog audit records.
GATEKEEPER_PEER- Peer information to be used in syslog audit records
GLOBUS_ID- Credential information to be used in syslog audit records
GLOBUS_JOB_MANAGER_SLEEP- Time (in seconds) to sleep when the job manager is started. [For debugging purposes only]
GRID_SECURITY_HTTP_BODY_FD- File descriptor of an open file which contains the initial job request and to which the initial job reply should be sent. This file descriptor is inherited from the globus-gatekeeper.
X509_USER_PROXY- Path to the X.509 user proxy which was delegated by the client to the globus-gatekeeper program to be used by the job manager.
GRID_SECURITY_CONTEXT_FD- File descriptor containing an exported security context that the job manager should use to reply to the client which submitted the job.
Files
$HOME/.globus/job/HOSTNAME/LRM.TAG.red- Job manager delegated user credential.
$HOME/.globus/job/HOSTNAME/LRM.TAG.lock- Job manager state lock file.
$HOME/.globus/job/HOSTNAME/LRM.TAG.pid- Job manager pid file.
$HOME/.globus/job/HOSTNAME/LRM.TAG.sock- Job manager socket for inter-job manager communications.
$HOME/.globus/job/HOSTNAME/JOB_ID/- Job-specific state directory.
$HOME/.globus/job/HOSTNAME/JOB_ID/stdin- Standard input which has been staged from a remote URL.
$HOME/.globus/job/HOSTNAME/JOB_ID/stdout- Standard output which will be staged to a remote URL.
$HOME/.globus/job/HOSTNAME/JOB_ID/stderr- Standard error which will be staged to a remote URL.
$HOME/.globus/job/HOSTNAME/JOB_ID/x509_user_proxy- Job-specific delegated credential.
$GLOBUS_LOCATION/tmp/gram_job_state/job.HOSTNAME.JOB_ID- Job state file.
$GLOBUS_LOCATION/tmp/gram_job_state/job.HOSTNAME.JOB_ID.lock- Job state lock file. In most cases this will be a symlink to the job manager lock file.
$GLOBUS_LOCATION/etc/globus-job-manager.conf- Default location of the global job manager configuration file.
$GLOBUS_LOCATION/etc/grid-services/jobmanager-LRM- Default location of the LRM-specific gatekeeper configuration file.
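
The per-user file layout listed above can be expressed as a small path builder, which may help when writing cleanup or monitoring scripts. This is a sketch under stated assumptions: the function names are hypothetical, the example host and user are invented, and it assumes that TAG in the file names is the -service-tag value (defaulting to untagged).

```python
from pathlib import Path


def job_manager_files(home, hostname, lrm, tag="untagged"):
    """Paths of the per-LRM job manager lock, pid, and socket files."""
    base = Path(home) / ".globus" / "job" / hostname
    # The delegated-credential file is omitted from this sketch.
    return {suffix: base / f"{lrm}.{tag}.{suffix}"
            for suffix in ("lock", "pid", "sock")}


def job_files(home, hostname, job_id):
    """Paths inside the job-specific state directory for one job."""
    job_dir = Path(home) / ".globus" / "job" / hostname / job_id
    return {name: job_dir / name
            for name in ("stdin", "stdout", "stderr", "x509_user_proxy")}


# Invented example values for illustration only.
for name, path in job_manager_files("/home/alice", "grid.example.org", "fork").items():
    print(name, path)
```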
For a list of error codes generated by GRAM5, see Section 2, “Errors”.
For information about sys admin logging, see Chapter 9, Admin Debugging in the GRAM5 Admin Guide.
In case you run into problems, you can do the following:
- Check the GRAM5 documentation. You may find hints there that solve your problem.
- Check the GRAM5 log for errors. If you don't find anything suspicious, increase the log level of GRAM5 or other relevant components; the additional logging information may show what is going wrong.
- Send e-mail to <gram-user@globus.org>. You must subscribe to a list before you can send e-mail to it. The Globus e-mail list pages describe the general lists, how to subscribe, and the GRAM-specific lists.
Table 11.1. GRAM5 Errors
| Error Code | Reason | Possible Solutions |
|---|---|---|
| 1 | one of the RSL parameters is not supported | Check RSL documentation |
| 2 | the RSL length is greater than the maximum allowed | Use RSL substitutions to reduce length of RSL strings |
| 3 | an I/O operation failed | Enable trace logging and report to gram-dev@globus.org |
| 4 | jobmanager unable to set default to the directory requested | Check that RSL directory attribute refers to a directory that exists on the target system. |
| 5 | the executable does not exist | Check that the RSL executable attribute refers to an executable that exists on the target system. |
| 6 | of an unused INSUFFICIENT_FUNDS | Unimplemented feature. |
| 7 | authentication with the remote server failed | Check that the contact string contains the proper X.509 DN. |
| 8 | the user cancelled the job | Don't cancel jobs you want to complete. |
| 9 | the system cancelled the job | Check RSL requirements such as maximum time and memory are valid for the job. |
| 10 | data transfer to the server failed | Check gatekeeper and/or job manager logs to see why the process failed. |
| 11 | the stdin file does not exist | Check that the RSL stdin attribute refers to a file that exists on the target system or has a valid ftp, gsiftp, http, or https URL. |
| 12 | the connection to the server failed (check host and port) | Check that the service is running on the expected TCP/IP port. Check that no firewall prevents contacting that TCP/IP port. Check for runtime configuration errors. |
| 13 | the provided RSL 'maxtime' value is not an integer | Check that the RSL maxtime value evaluates to an integer. |
| 14 | the provided RSL 'count' value is not an integer | Check that the RSL count value evaluates to an integer. |
| 15 | the job manager received an invalid RSL | Check that the RSL string can be parsed by using globusrun -p RSL. |
| 16 | the job manager failed in allowing others to make contact | Check job manager log. |
| 17 | the job failed when the job manager attempted to run it | Verify that the LRM is configured properly. |
| 18 | an invalid paradyn was specified | OBSOLETE IN GRAM2 |
| 19 | the provided RSL 'jobtype' value is invalid | The RSL jobtype attribute is not indicated as supported by the LRM. Valid jobtype values are single, multiple, mpi, and condor. |
| 20 | the provided RSL 'myjob' value is invalid | OBSOLETE IN GRAM5 |
| 21 | the job manager failed to locate an internal script argument file | Check that the job manager's internal script exists and is executable. Check that the LRM-specific perl module is installed in the expected directory and is valid. The command perl -I$GLOBUS_LOCATION/lib/perl $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/LRM.pm can be used to check if there are any syntax errors in the script. |
| 22 | the job manager failed to create an internal script argument file | Check that your home directory is writable and not full. |
| 23 | the job manager detected an invalid job state | Check job manager logs. |
| 24 | the job manager detected an invalid script response | Check job manager logs. This is likely a bug in the LRM script. |
| 25 | the job manager detected an invalid script status | Check job manager logs. This is likely a bug in the LRM script. |
| 26 | the provided RSL 'jobtype' value is not supported by this job manager | Check that the RSL jobtype attribute is implemented by the LRM script. Note that some job types require configuration |
| 27 | unused ERROR_UNIMPLEMENTED | LRM does not support some feature included in the job request. |
| 28 | the job manager failed to create an internal script submission file | Check that the user's home file system is not full. Check job manager log |
| 29 | the job manager cannot find the user proxy | Check that client is delegating a proxy when authenticating with the gatekeeper. Check that the user's home filesystem and the /tmp file system are not full. |
| 30 | the job manager failed to open the user proxy | Check that the user's home filesystem and the /tmp file system are not full. |
| 31 | the job manager failed to cancel the job as requested | Check that the user's home filesystem and the /tmp file system are not full. |
| 32 | system memory allocation failed | Check job manager log for details. |
| 33 | the interprocess job communication initialization failed | OBSOLETE IN GRAM5 |
| 34 | the interprocess job communication setup failed | OBSOLETE IN GRAM5 |
| 35 | the provided RSL 'host count' value is invalid | Check that the RSL host_count attribute evaluates to an integer. |
| 36 | one of the provided RSL parameters is unsupported | Check job manager log for details about invalid parameter. |
| 37 | the provided RSL 'queue' parameter is invalid | Check that the RSL queue attribute evaluates to a string that corresponds to an LRM-specific queue name. |
| 38 | the provided RSL 'project' parameter is invalid | Check that the RSL project attribute evaluates to a string that corresponds to an LRM-specific project name. |
| 39 | the provided RSL string includes variables that could not be identified | Check that all RSL substitutions are defined before being used in the job description. |
| 40 | the provided RSL 'environment' parameter is invalid | Check that the RSL environment attribute contains a sequence of VARIABLE VALUE pairs. |
| 41 | the provided RSL 'dryrun' parameter is invalid | Remove the RSL dryrun attribute from the job description. |
| 42 | the provided RSL is invalid (an empty string) | Include a non-empty RSL string in your job submission request. |
| 43 | the job manager failed to stage the executable | Check that the file service hosting the executable is reachable from the GRAM5 service node. Check that the executable exists on the file service node. Check that there is sufficient disk space in the user's home directory on the service node to store the executable. |
| 44 | the job manager failed to stage the stdin file | Check that the file service hosting the standard input file is reachable from the GRAM5 service node. Check that the standard input file exists on the file service node. Check that there is sufficient disk space in the user's home directory on the service node to store the standard input file. |
| 45 | the requested job manager type is invalid | OBSOLETE IN GRAM5 |
| 46 | the provided RSL 'arguments' parameter is invalid | OBSOLETE IN GRAM2 |
| 47 | the gatekeeper failed to run the job manager | Check the gatekeeper or job manager logs for more information. |
| 48 | the provided RSL could not be properly parsed | Check that the RSL string can be parsed by using globusrun -p RSL. |
| 49 | there is a version mismatch between GRAM components | Ask system administrator to upgrade GRAM service to GRAM2 or GRAM5 |
| 50 | the provided RSL 'arguments' parameter is invalid | Check that the RSL arguments attribute evaluates to a sequence of strings. |
| 51 | the provided RSL 'count' parameter is invalid | Check that the RSL count attribute evaluates to a positive integer value. |
| 52 | the provided RSL 'directory' parameter is invalid | Check that the RSL directory attribute evaluates to a string. |
| 53 | the provided RSL 'dryrun' parameter is invalid | Check that the RSL dryrun attribute evaluates to either yes or no. |
| 54 | the provided RSL 'environment' parameter is invalid | Check that the RSL environment attribute evaluates to a sequence of VARIABLE, VALUE pairs. |
| 55 | the provided RSL 'executable' parameter is invalid | Check that the RSL executable attribute evaluates to a string value. |
| 56 | the provided RSL 'host_count' parameter is invalid | Check that the RSL host_count attribute evaluates to a positive integer value. |
| 57 | the provided RSL 'jobtype' parameter is invalid | Check that the RSL jobtype attribute evaluates to one of single, multiple, mpi, or condor |
| 58 | the provided RSL 'maxtime' parameter is invalid | Check that the RSL maxtime attribute evaluates to a positive integer value. |
| 59 | the provided RSL 'myjob' parameter is invalid | OBSOLETE IN GRAM5. |
| 60 | the provided RSL 'paradyn' parameter is invalid | OBSOLETE IN GRAM2. |
| 61 | the provided RSL 'project' parameter is invalid | Check that the RSL project attribute evaluates to a string value. |
| 62 | the provided RSL 'queue' parameter is invalid | Check that the RSL queue attribute evaluates to a string value. |
| 63 | the provided RSL 'stderr' parameter is invalid | Check that the RSL stderr attribute evaluates to a string value or a sequence of DESTINATION URLs with optional CACHE_TAG string parameters. |
| 64 | the provided RSL 'stdin' parameter is invalid | Check that the RSL stdin attribute evaluates to a string value. |
| 65 | the provided RSL 'stdout' parameter is invalid | Check that the RSL stdout attribute evaluates to a string value or a sequence of DESTINATION URLs with optional CACHE_TAG string parameters. |
| 66 | the job manager failed to locate an internal script | Check job manager log for more details. |
| 67 | the job manager failed on the system call pipe() | OBSOLETE IN GRAM5 |
| 68 | the job manager failed on the system call fcntl() | OBSOLETE IN GRAM2 |
| 69 | the job manager failed to create the temporary stdout filename | OBSOLETE IN GRAM5 |
| 70 | the job manager failed to create the temporary stderr filename | OBSOLETE IN GRAM5 |
| 71 | the job manager failed on the system call fork() | OBSOLETE IN GRAM2 |
| 72 | the executable file permissions do not allow execution | Check that the RSL executable attribute refers to an executable program or script. |
| 73 | the job manager failed to open stdout | Check that the RSL stdout attribute refers to one or more valid destination files or URLs. |
| 74 | the job manager failed to open stderr | Check that the RSL stderr attribute refers to one or more valid destination files or URLs. |
| 75 | the cache file could not be opened in order to relocate the user proxy | Check that the user's home directory is writable and not full on the GRAM5 service node. |
| 76 | cannot access cache files in ~/.globus/.gass_cache, check permissions, quota, and disk space | Check that the user's home directory is writable and not full on the GRAM5 service node. |
| 77 | the job manager failed to insert the contact in the client contact list | Check job manager log |
| 78 | the contact was not found in the job manager's client contact list | Don't attempt to unregister callback contacts that are not registered |
| 79 | connecting to the job manager failed. Possible reasons: job terminated, invalid job contact, network problems, ... | Check that the job manager process is running. Check that the job manager credential has not expired. Check that the job manager contact refers to the correct TCP/IP host and port. Check that the job manager contact is not blocked by a firewall. |
| 80 | the syntax of the job contact is invalid | Check the syntax of job contact string. |
| 81 | the executable parameter in the RSL is undefined | Include the RSL executable in all job requests. |
| 82 | the job manager service is misconfigured. condor arch undefined | Add the -condor-arch to the command-line or configuration file for a job manager configured to use the condor LRM. |
| 83 | the job manager service is misconfigured. condor os undefined | Add the -condor-os to the command-line or configuration file for a job manager configured to use the condor LRM. |
| 84 | the provided RSL 'min_memory' parameter is invalid | Check that the RSL min_memory attribute evaluates to a positive integer value. |
| 85 | the provided RSL 'max_memory' parameter is invalid | Check that the RSL max_memory attribute evaluates to a positive integer value. |
| 86 | the RSL 'min_memory' value is not zero or greater | Check that the RSL min_memory attribute evaluates to a positive integer value. |
| 87 | the RSL 'max_memory' value is not zero or greater | Check that the RSL max_memory attribute evaluates to a positive integer value. |
| 88 | the creation of a HTTP message failed | Check job manager log. |
| 89 | parsing incoming HTTP message failed | Check job manager log. |
| 90 | the packing of information into a HTTP message failed | Check job manager log. |
| 91 | an incoming HTTP message did not contain the expected information | Check job manager log. |
| 92 | the job manager does not support the service that the client requested | Check that the client is talking to the correct service. |
| 93 | the gatekeeper failed to find the requested service | OBSOLETE IN GRAM2 |
| 94 | the jobmanager does not accept any new requests (shutting down) | Execute queries before the job has been cleaned up. |
| 95 | the client failed to close the listener associated with the callback URL | Call globus_gram_client_callback_disallow() with a valid callback contact. |
| 96 | the gatekeeper contact cannot be parsed | Check the syntax of the gatekeeper contact string you are attempting to contact. |
| 97 | the job manager could not find the 'poe' command | OBSOLETE IN GRAM2 |
| 98 | the job manager could not find the 'mpirun' command | Configure the LRM script with mpirun in your path. |
| 99 | the provided RSL 'start_time' parameter is invalid | OBSOLETE IN GRAM2 |
| 100 | the provided RSL 'reservation_handle' parameter is invalid | OBSOLETE IN GRAM2 |
| 101 | the provided RSL 'max_wall_time' parameter is invalid | Check that the RSL max_wall_time attribute evaluates to a positive integer. |
| 102 | the RSL 'max_wall_time' value is not zero or greater | Check that the RSL max_wall_time attribute evaluates to a positive integer. |
| 103 | the provided RSL 'max_cpu_time' parameter is invalid | Check that the RSL max_cpu_time attribute evaluates to a positive integer. |
| 104 | the RSL 'max_cpu_time' value is not zero or greater | Check that the RSL max_cpu_time attribute evaluates to a positive integer. |
| 105 | the job manager is misconfigured, a scheduler script is missing | Check that the administrator has configured the LRM by running its setup script. |
| 106 | the job manager is misconfigured, a scheduler script has invalid permissions | Check that the administrator has installed the script. Check that the file system containing that script allows file execution. |
| 107 | the job manager failed to signal the job | OBSOLETE IN GRAM2 |
| 108 | the job manager did not recognize/support the signal type | Check that your signal operation is using the correct signal constant. |
| 109 | the job manager failed to get the job id from the local scheduler | OBSOLETE IN GRAM2 |
| 110 | the job manager is waiting for a commit signal | Send a two-phase commit signal to the job manager to acknowledge receiving the job contact from the job manager. |
| 111 | the job manager timed out while waiting for a commit signal | Send a two-phase commit signal to the job manager to acknowledge receiving the job contact from the job manager. Increase the two-phase commit time out for your job. Check that the job manager contact TCP/IP port is reachable from your client. |
| 112 | the provided RSL 'save_state' parameter is invalid | Check that the RSL save_state attribute is set to yes or no. |
| 113 | the provided RSL 'restart' parameter is invalid | Check that the RSL restart attribute evaluates to a string containing a job contact string. |
| 114 | the provided RSL 'two_phase' parameter is invalid | Check that the RSL two_phase attribute evaluates to a positive integer. |
| 115 | the RSL 'two_phase' value is not zero or greater | Check that the RSL two_phase attribute evaluates to a positive integer. |
| 116 | the provided RSL 'stdout_position' parameter is invalid | OBSOLETE IN GRAM5 |
| 117 | the RSL 'stdout_position' value is not zero or greater | OBSOLETE IN GRAM5 |
| 118 | the provided RSL 'stderr_position' parameter is invalid | OBSOLETE IN GRAM5 |
| 119 | the RSL 'stderr_position' value is not zero or greater | OBSOLETE IN GRAM5 |
| 120 | the job manager restart attempt failed | OBSOLETE IN GRAM2 |
| 121 | the job state file doesn't exist | Check that the job contact you are trying to restart matches one that the job manager returned to you. |
| 122 | could not read the job state file | Check that the state file directory is not full. |
| 123 | could not write the job state file | Check that the state file directory is not full. |
| 124 | old job manager is still alive | Contact the returned job manager contact to manage the job you are trying to restart. |
| 125 | job manager state file TTL expired | OBSOLETE IN GRAM2 |
| 126 | it is unknown if the job was submitted | Check job manager log. |
| 127 | the provided RSL 'remote_io_url' parameter is invalid | Check that the RSL remote_io_url attribute evaluates to a string value. |
| 128 | could not write the remote io url file | Check that the user's home file system on the job manager service node is writable and not full. |
| 129 | the standard output/error size is different | Send a stdio update signal to redirect the job manager output to a new URL |
| 130 | the job manager was sent a stop signal (job is still running) | Submit a restart request to monitor the job. |
| 131 | the user proxy expired (job is still running) | Generate a new proxy and then submit a restart request to monitor the job. |
| 132 | the job was not submitted by original jobmanager | OBSOLETE IN GRAM2 |
| 133 | the job manager is not waiting for that commit signal | Do not send a commit signal to a job that is not waiting for a commit signal. |
| 134 | the provided RSL scheduler specific parameter is invalid | Check the LRM-specific documentation to determine what values are legal for the RSL extensions implemented by the LRM. |
| 135 | the job manager could not stage in a file | Check that the file service hosting the file to stage is reachable from the GRAM5 service node. Check that the file to stage exists on the file service node. Check that there is sufficient disk space in the user's home directory on the service node to store the file to stage. |
| 136 | the scratch directory could not be created | Check that the directory named by the RSL scratch_dir attribute exists and is writable. Check that the directory named by the RSL scratch_dir attribute is not full. |
| 137 | the provided 'gass_cache' parameter is invalid | Check that the RSL gass_cache attribute evaluates to a string. |
| 138 | the RSL contains attributes which are not valid for job submission | Do not use restart- or signal-only RSL attributes when submitting a job. |
| 139 | the RSL contains attributes which are not valid for stdio update | Do not use submit- or restart-only RSL attributes when sending a stdio update signal to a job. |
| 140 | the RSL contains attributes which are not valid for job restart | Do not use submit- or signal-only RSL attributes when restarting a job. |
| 141 | the provided RSL 'file_stage_in' parameter is invalid | Check that the RSL file_stage_in attribute evaluates to a sequence of SOURCE DESTINATION pairs. |
| 142 | the provided RSL 'file_stage_in_shared' parameter is invalid | Check that the RSL file_stage_in_shared attribute evaluates to a sequence of SOURCE DESTINATION pairs. |
| 143 | the provided RSL 'file_stage_out' parameter is invalid | Check that the RSL file_stage_out attribute evaluates to a sequence of SOURCE DESTINATION pairs. |
| 144 | the provided RSL 'gass_cache' parameter is invalid | Check that the RSL gass_cache attribute evaluates to a string. |
| 145 | the provided RSL 'file_cleanup' parameter is invalid | Check that the RSL file_clean_up attribute evaluates to a sequence of strings. |
| 146 | the provided RSL 'scratch_dir' parameter is invalid | Check that the RSL scratch_dir attribute evaluates to a string. |
| 147 | the provided scheduler-specific RSL parameter is invalid | Check the LRM-specific documentation to determine what values are legal for the RSL extensions implemented by the LRM. |
| 148 | a required RSL attribute was not defined in the RSL spec | Check that the RSL executable attribute is present in your job request RSL. Check that the RSL restart attribute is present in your restart RSL. |
| 149 | the gass_cache attribute points to an invalid cache directory | Check that the RSL gass_cache attribute evaluates to a directory that exists or can be created. Check that the user's home file system is writable and not full. |
| 150 | the provided RSL 'save_state' parameter has an invalid value | Check that the RSL save_state attribute has a value of yes or no. |
| 151 | the job manager could not open the RSL attribute validation file | Check that the RSL attribute validation file is present and readable on the job manager service node. Check that any additional validation file is readable on the job manager service node if present. |
| 152 | the job manager could not read the RSL attribute validation file | Check that the RSL attribute validation file is valid. Check that any additional validation file is valid if present. |
| 153 | the provided RSL 'proxy_timeout' is invalid | Check that RSL proxy_timeout attribute evaluates to a positive integer. |
| 154 | the RSL 'proxy_timeout' value is not greater than zero | Check that RSL proxy_timeout attribute evaluates to a positive integer. |
| 155 | the job manager could not stage out a file | Check that the source file being staged exists on the job manager service node. Check that the directory of the destination file being staged exists on the file service node. Check that the directory of the destination file being staged is writable by the user. Check that the destination file service is reachable by the job manager service node. |
| 156 | the job contact string does not match any which the job manager is handling | Check that the job contact string matches one returned from a job request. |
| 157 | proxy delegation failed | Check that the job manager service node trusts the signer of your credential. Check that you trust the signer of the job manager service node's credential. |
| 158 | the job manager could not lock the state lock file | Check that the file system holding the job state directory supports POSIX advisory locking. Check that the job state directory is writable by the user on the service node. Check that the job state directory is not full. |
| 159 | an invalid globus_io_clientattr_t was used. | Check that you have initialized the globus_io_clientattr_t attribute prior to using it with the GRAM client API. |
| 160 | an null parameter was passed to the gram library | Check that you are passing legal values to all GRAM API calls. |
| 161 | the job manager is still streaming output | OBSOLETE IN GRAM5 |
| 162 | the authorization system denied the request | Ask your GRAM system administrator to authorize the certificate you are using. |
| 163 | the authorization system reported a failure | Check with your system administrator to verify that the authorization system is configured properly. |
| 164 | the authorization system denied the request - invalid job id | Check with your system administrator to verify that the authorization system is configured properly. Use a credential which is authorized to interact with a particular GRAM job. |
| 165 | the authorization system denied the request - not authorized to run the specified executable | Check with your system administrator to verify that the authorization system is configured properly. Use a credential which is authorized to interact with a particular GRAM job. |
| 166 | the provided RSL 'user_name' parameter is invalid. | Check that the RSL user_name attribute evaluates to a string. |
| 167 | the job is not running in the account named by the 'user_name' parameter. | Ask the GRAM system administrator to add an authorization entry to allow your credential to run jobs as the specified user account. |
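Several of the errors above (148, 150, 153, and 154) come from attribute values that can be checked before a job is submitted. The following is a minimal sketch of such a pre-check, assuming the RSL attributes have already been parsed into a plain Python dictionary. The function name `precheck_rsl` and the dictionary representation are hypothetical and are not part of GRAM5; the split between 153 (not an integer) and 154 (not greater than zero) is also an assumption based on the error descriptions.

```python
def precheck_rsl(attrs):
    """Return (error_code, hint) pairs for problems found in a dict of
    RSL attribute names to string values (a hypothetical representation)."""
    problems = []

    # Error 148: a job request must name an executable.
    if "executable" not in attrs:
        problems.append((148, "RSL executable attribute is missing"))

    # Error 150: save_state, when given, must be yes or no.
    save_state = attrs.get("save_state")
    if save_state is not None and save_state not in ("yes", "no"):
        problems.append((150, "save_state must be yes or no"))

    # Errors 153/154: proxy_timeout, when given, must be a positive integer.
    timeout = attrs.get("proxy_timeout")
    if timeout is not None:
        try:
            value = int(timeout)
        except (TypeError, ValueError):
            problems.append((153, "proxy_timeout must be an integer"))
        else:
            if value <= 0:
                problems.append((154, "proxy_timeout must be greater than zero"))

    return problems
```

A check like this only mirrors the hints in the table; the job manager remains the authority on what it accepts.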
The following usage statistics are sent by default in a UDP packet (in addition to the GRAM component code, packet version, timestamp, and source IP address) at the end of each job.
- Job Manager Session ID
- dryrun used
- RSL Host Count
- Timestamp when job hit GLOBUS_GRAM_PROTOCOL_JOB_STATE_UNSUBMITTED
- Timestamp when job hit GLOBUS_GRAM_PROTOCOL_JOB_STATE_FILE_STAGE_IN
- Timestamp when job hit GLOBUS_GRAM_PROTOCOL_JOB_STATE_PENDING
- Timestamp when job hit GLOBUS_GRAM_PROTOCOL_JOB_STATE_ACTIVE
- Timestamp when job hit GLOBUS_GRAM_PROTOCOL_JOB_STATE_FAILED
- Timestamp when job hit GLOBUS_GRAM_PROTOCOL_JOB_STATE_FILE_STAGE_OUT
- Timestamp when job hit GLOBUS_GRAM_PROTOCOL_JOB_STATE_DONE
- Job Failure Code
- Number of times status is called
- Number of times register is called
- Number of times signal is called
- Number of times refresh is called
- Number of files named in file_clean_up RSL
- Number of files being staged in (including executable, stdin) from http servers
- Number of files being staged in (including executable, stdin) from https servers
- Number of files being staged in (including executable, stdin) from ftp servers
- Number of files being staged in (including executable, stdin) from gsiftp servers
- Number of files being staged into the GASS cache from http servers
- Number of files being staged into the GASS cache from https servers
- Number of files being staged into the GASS cache from ftp servers
- Number of files being staged into the GASS cache from gsiftp servers
- Number of files being staged out (including stdout and stderr) to http servers
- Number of files being staged out (including stdout and stderr) to https servers
- Number of files being staged out (including stdout and stderr) to ftp servers
- Number of files being staged out (including stdout and stderr) to gsiftp servers
- Bitmask of used RSL attributes (values are 2^id from the gram5_rsl_attributes table)
- Number of times unregister is called
- Value of the count RSL attribute
- Comma-separated list of string names of other RSL attributes not in the set defined in globus-gram-job-manager.rvf
- Job type string
- Number of times the job was restarted
- Total number of state callbacks sent to all clients for this job
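The RSL attribute bitmask in the list above sets the bit 2^id for each attribute used in the job. As a rough illustration of how such a value can be read back, the sketch below recovers the set of ids from a bitmask. The function name `decode_rsl_bitmask` is hypothetical, and the ids themselves must be looked up in the gram5_rsl_attributes table; no id-to-name mapping is assumed here.

```python
def decode_rsl_bitmask(mask):
    """Return the sorted list of attribute ids whose 2**id bit is set."""
    if mask < 0:
        raise ValueError("bitmask must be non-negative")
    ids = []
    bit = 0
    # Walk the bits from least significant upward until none remain.
    while mask >> bit:
        if (mask >> bit) & 1:
            ids.append(bit)
        bit += 1
    return ids
```

For example, a bitmask of 5 (binary 101) decodes to ids 0 and 2, which can then be matched against the ids recorded in the gram5_rsl_attributes table.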
The following information can be sent as well in a job status packet but it is not sent unless explicitly enabled by the system administrator:
- Value of the executable RSL attribute
- Value of the arguments RSL attribute
- IP address and port of the client that submitted the job
- User DN of the client that submitted the job
In addition to job-related status, the job manager periodically sends information about its own execution status. The following information is sent by default in a UDP packet (in addition to the GRAM component code, packet version, timestamp, and source IP address) at job manager start and every hour during the job manager lifetime:
- Job Manager Start Time
- Job Manager Session ID
- Job Manager Status Time
- Job Manager Version
- LRM
- Poll used
- Audit used
- Number of restarted jobs
- Total number of jobs
- Total number of failed jobs
- Total number of canceled jobs
- Total number of completed jobs
- Total number of dry-run jobs
- Peak number of concurrently managed jobs
- Number of jobs currently being managed
- Number of jobs currently in the UNSUBMITTED state
- Number of jobs currently in the STAGE_IN state
- Number of jobs currently in the PENDING state
- Number of jobs currently in the ACTIVE state
- Number of jobs currently in the STAGE_OUT state
- Number of jobs currently in the FAILED state
- Number of jobs currently in the DONE state
Also, please see our policy statement on the collection of usage statistics.
C
- Condor
A job scheduler mechanism supported by GRAM. See http://www.cs.wisc.edu/condor/ for more information.
L
- LSF
A job scheduler mechanism supported by GRAM.
For more information, see http://www.platform.com/Products/Platform.LSF.Family/Platform.LSF/.
S
- Scheduler Event Generator (SEG)
The Scheduler Event Generator (SEG) is a program which uses scheduler-specific monitoring modules to generate job state change events. Depending on scheduler-specific requirements, the SEG may need to run with privileges that enable it to obtain scheduler event notifications. As such, one SEG runs per scheduler resource. For example, on a host which provides access to both PBS and fork jobs, two SEGs will be running, potentially at different privilege levels. One SEG instance exists for any particular scheduled resource instance (one for all homogeneous PBS queues, one for all fork jobs, etc.). The SEG is implemented in an executable called globus-scheduler-event-generator, located in the Globus Toolkit's libexec directory.
A
- audit logging, Audit Logging
C
- configuration interface, Configuring
- default local resource manager, Defining a default local resource manager
- disabling local resource manager adapter, Disabling an already installed local resource manager adapter
- lrm-specific, LRM-Specific Scheduler Event Generator configuration files
- non-default, Non-default Configuration
- authorization, Authorization
- typical, Typical Configuration
- LRM adapters, Configuring LRM Adapters
D
- deploying, Deploying
E
- errors, Errors
I
- installing
- prerequisites
- local scheduler, Local Resource Manager
- lrm adapter, LRM Adapters
P
- performance guide, Scalability and Performance Recommendations
- server-side, Server-side Recommendations
S
- SEG, Running the SEG
T
- troubleshooting, Troubleshooting
- check documentation, Troubleshooting tips
- errors, Troubleshooting
- gram log, Troubleshooting tips
- mailing lists, Troubleshooting tips