The Grid Resource Allocation and Management (GRAM5) component is used to locate, submit, monitor, and cancel jobs on Grid computing resources. GRAM5 is not a job scheduler, but rather a set of services and clients for communicating with a range of different batch/cluster job schedulers using a common protocol. GRAM5 is meant to address a range of jobs where reliable operation, stateful monitoring, credential management, and file staging are important.
New Features since 5.0.1
- A new tool, globus-gram-streamer, implements stdio streaming similar to GRAM2, for use with the grid-monitor.sh script from Condor. As a result, a Condor-G client that does not know about GRAM5 features will be able to submit many jobs to a GRAM5 server.
Other Standard Supported Features
- Remote job execution and management
- Uniform and flexible interface to local resource managers
- File staging before and after job execution
- File and directory clean up after job termination
- Service auditing for each submitted job
Removed Features
- Condor SEG module is no longer included. Its functionality has been moved into the core of the job manager program.
Changes since 5.0.1
- New RSL attribute save_job_description. If set to yes, the job manager writes a file in the user's HOME directory containing the Perl representation of the job. This can be used to debug LRM interface problems.
- Added command-line help for the globus-job-manager-script.pl command, to simplify debugging LRM interface problems.
- Eliminated the need for a global Condor log file and SEG module to parse it. In this version, the Condor logs are per-job and removed automatically by the job manager when the job is cleaned up.
- Resolved the TeraGrid SGE issues on Ranger and PBS issues on Queen Bee.
The following bugs were fixed in the 5.0.2 release:
- GRAM-130: Individual Condor Logs per Job
- GRAM-136: Error message not precise when disk quota is exceeded
- GRAM-146: GRAM5 Usage stats ignores GLOBUS_USAGE_TARGETS environment
- GRAM-155: Leak in file_clean_up
- GRAM-156: job-failure-code not provided by jobmanager in response to status query
- GRAM-157: RSL substitutions not updated on a restart request
- GRAM-158: Remove references to setup-seg-job-manager.pl from documentation
- GRAM-160: double free in job manager
- GRAM-161: Don't count two phase commit time until state is sent
- GRAM-164: Invalid RSL value error doesn't indicate what values are valid
- GRAM-167: SGE LRM doesn't interpret path to stdout and stderr relative to directory
- GRAM-168: zombie job manager processes on Ranger SGE
- GRAM-169: Error messages in globus-job-get-output
- GRAM-170: globus-job-submit fails
- GRAM-171: Simplify debugging of script errors
- GRAM-172: Job Manager doesn't exit when proxy expires if jobs are present
- GRAM-173: SGE LRM doesn't set exit code for multiple jobs
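The save_job_description attribute listed among the changes above is set in the job's RSL like any other attribute. The following is a minimal sketch, not part of GRAM5 itself, that assembles such an RSL string in Python. The helper names build_rsl and rsl_value are hypothetical, and the quoting rule (double-quote any value that is not a plain token, and double embedded quotes) is a simplifying assumption about RSL string literals.

```python
def rsl_value(value):
    """Return an RSL value, quoted when it is not a plain token.

    Assumption: plain tokens (letters, digits, and / . _ - :) may be
    written unquoted; anything else is wrapped in double quotes with
    embedded double quotes doubled.
    """
    text = str(value)
    if text and all(c.isalnum() or c in "/._-:" for c in text):
        return text
    return '"' + text.replace('"', '""') + '"'


def build_rsl(attributes):
    """Build a conjunction RSL string of the form &(name=value)(name=value)."""
    relations = "".join(
        f"({name}={rsl_value(value)})" for name, value in attributes.items()
    )
    return "&" + relations


# Hypothetical job: run /bin/date and ask the job manager to save
# the Perl representation of the job for debugging.
job = {"executable": "/bin/date", "save_job_description": "yes"}
print(build_rsl(job))
```

The resulting string is the kind of job description a GRAM client submits. The name of the file that the job manager writes in the HOME directory is not stated in these notes, so it is left out here.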
The following problems and limitations are known to exist for GRAM5 at the time of the 5.0.2 release:
- GRAM-2: Investigate how to setup GRAM5 services in a HA setup
- GRAM-4: Add support for a "managed fork" service
- GRAM-5: Add gram-level prologue and epilogue script execution for mpi jobs
- GRAM-12: Gatekeeper's syslog output cannot be controlled
- GRAM-15: transition from httpg to https
- GRAM-22: client connections can't be timed out
- GRAM-23: Improved error codes and error reporting for users
- GRAM-24: Debug/verbose flags for globusrun, globus-job-run
- GRAM-51: configurable control of number of perl scripts that can run simultaneously
- GRAM-53: Generalize log path configuration
- GRAM-79: Add support for OSG's "NFS Lite" concept
- GRAM-99: Add a high-level diagram for the approach doc
- GRAM-104: globus-job-manager-event-generator loads all historical events the first time run
- GRAM-105: Held Condor jobs should be reported as SUSPENDED
- GRAM-110: softenv extensions for GRAM5
- GRAM-119: improve the GRAM LRM adapter doc
- GRAM-122: tracking gram client software
- GRAM-135: Improve developer doc for a reliable client
- GRAM-138: GRAM5 job manager uses a lot of memory when SEG is pointed to incorrect log path
- GRAM-139: SEG may deadlock with threads
- GRAM-149: GRAM5 Unix domain socket misbehaves on Snow Leopard
- GRAM-154: GASS Cache doesn't check for updates
- GRAM-159: GRAM5 Migration guide is outdated
- GRAM-163: improve error output for globusrun
- Bug 5621: gram2 credential refresh problems in 4.0.5
- Bug 1934: Gatekeeper's syslog output cannot be controlled
- Bug 2739: Gatekeeper AuthZ/Gridmap Callout result logging
- Bug 2741: catching SIGSEGV if dynamic loading of authorization modules fails
- Bug 4199: Patch pre-WS GRAM to use individual condor logs for jobs
- Bug 3795: jobmanager perl modules issues
- Bug 4235: globus-job-manager doesn't exit if the job fails.
- Bug 4730: MPI Jobs using Globus LSF in HP XC Cluster....
- Bug 4747: Need evaluation of patch to JobManager.pm
- Bug 4779: gram GT2 log files: timestamps are not ISO 8601 compatible
- Bug 5143: DONE state never reported for Condor jobs when using Condor-G grid monitor
- Bug 5429: stdin is lost when jobtype=multiple with jobmanager-lsf
- Bug 5554: GRAM2 4.0.5 setup-globus-job-manager-fork.pl silent failure
- Bug 5556: Audit directory setup instructions are insecure
- Bug 5775: gram status of old jobs incorrect on some lsf systems
- Bug 6184: pbs.pm jobmanager fails jobs on qstat failure
- Bug 6337: Cannot configure globus to use different certificate path than default
- Bug 6703: PBS scheduler adapter assumes that Globus is installed in the same location on the headnode of a cluster and on the work nodes.
- Bug 6768: Held Condor jobs should be reported as SUSPENDED by GRAM
- Bug 6815: Support standard install locations for globus-gram-protocol
- Bug 6819: Missing metadata in globus-scheduler-event-generator
- Bug 6820: Support standard install locations for globus-gatekeeper
- Bug 6821: Support standard install locations for globus-gatekeeper-setup
- Bug 6822: Support standard install locations for globus-gram-job-manager-scripts
- Bug 6823: Support standard install locations for globus-gram-job-manager
- Bug 6824: Support standard install locations for globus-gram-job-manager-setup
- Bug 6825: Remove hardcoded paths in globus-gram-job-manager-setup-fork
- Bug 6826: Remove hardcoded paths in globus-gram-job-manager-setup-condor
- Bug 6840: The PBS job manager doesn't handle large environments well
- Bug 6855: Undefined variable in Makefiles
- Bug 6862: PBS job manager fails if job history is enabled
- Bug 6927: A Loadleveler LRM for GRAM5 should be very welcome
- Bug 720: allow gram client to detect the version of a gram server
- Bug 851: Add "cleanup" RSL attribute for cleaning up a job submission
- Bug 5536: Missing dependency in package globus_gram_job_manager_auditing
- Bug 5537: Missing dependency in package globus_gram_job_manager_auditing
- Bug 3373: globus removes the temporary job directory before pbs writes back into it
- Bug 5200: GRAM (pre-webservices) from OSG 0.6.0 (VDT 1.6.1) has bad syslog format
- Bug 5207: GRAM SoftEnv extension bug
- Bug 5250: Does not support mpi jobtype of RSL script
- Bug 5272: Invalid parsing of RSL file
Protocol changes in GRAM since the GT 5.0.1 series:
- The GRAM5 service uses a superset of the GRAM2 protocol for communication between the client and service. The extensions supported in GRAM5 are implemented in such a way that they are ignored by GRAM2 services or clients. These extensions provide improved error messages and version detection.
- GRAM5 does not support task coallocation using DUROC and its related protocols. Jobs submitted using DUROC directives will fail.
- GRAM5 does not support file streaming. The standard output and standard error streams are sent after the job completes instead of during execution.
See GRAM5 for more information about this component.