GT 4.2.0 Release Notes: GRAM4


1. Component Overview

The Web Services Grid Resource Allocation and Management (GRAM4) component comprises a set of WSRF-compliant Web services to locate, submit, monitor, and cancel jobs on Grid computing resources. GRAM4 is not a job scheduler, but rather a set of services and clients for communicating with a range of different batch/cluster job schedulers using a common protocol. GRAM4 is meant to address a range of jobs where reliable operation, stateful monitoring, credential management, and file staging are important.

Note

The GRAM server is typically deployed in conjunction with the Delegation and RFT services to address data staging, delegation of proxy credentials, and computation monitoring and management in an integrated manner.

2. Feature summary

Features new since 4.0.x

  • New terminate method in the client-side GramJob API
  • Improved job lifetime management for users and admins
  • Added configuration for "default" Local Resource Managers

Other Standard Supported Features

  • Remote job execution and management
  • Uniform and flexible interface to batch scheduling systems
  • File staging before and after job execution
  • File / directory clean up after job execution (after file stage out)
  • Service auditing for each submitted job

Deprecated Features

  • With the addition of the new terminate method in the GramJob API, the destroy method is no longer necessary. For backward compatibility, destroy remains in the GramJob API, but it simply calls terminate. During the 4.2.x series, clients that call destroy should switch to terminate. The plan is to remove the destroy method in GT 4.4.
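
The deprecation pattern described above can be sketched as follows. This is a minimal illustration only: SketchJob and its methods are hypothetical stand-ins and do not reproduce the real GramJob class or its signatures. It shows a deprecated method that forwards to its replacement, so legacy callers keep working while new code calls terminate directly.

```java
// Hypothetical stand-in for a job handle; NOT the real GramJob class.
class SketchJob {
    private boolean terminated = false;
    private int terminateCalls = 0;

    /** Preferred call for new client code. */
    void terminate() {
        terminateCalls++;
        terminated = true;
    }

    /** Kept for backward compatibility only; forwards to terminate(). */
    @Deprecated
    void destroy() {
        terminate();
    }

    boolean isTerminated() {
        return terminated;
    }

    int getTerminateCalls() {
        return terminateCalls;
    }

    public static void main(String[] args) {
        SketchJob legacy = new SketchJob();
        legacy.destroy();      // legacy client code still works

        SketchJob migrated = new SketchJob();
        migrated.terminate();  // migrated client code

        System.out.println(legacy.isTerminated() + " " + migrated.isTerminated());
    }
}
```

Because destroy only forwards, replacing a destroy call with terminate in client code should not change behavior, which is what makes the migration safe to do before the method is removed.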

3. Summary of Changes in GRAM4

For a summary of the changes, please see the Migration Guide from 4.0 GRAM.

4. Bug Fixes

  • Bug 2049: Batch providers need a namespace
  • Bug 2243: globus_scheduler_event_generator should use globus_extension
  • Bug 2306: Unique job id as RSL substitution variable
  • Bug 2527: Add a failure case for non-existant queue to scheduler test suite
  • Bug 2996: ws-gram does not handle Windows paths in RSL's
  • Bug 3005: Extra slashes stops PBS from mapping output files
  • Bug 3067: globusrun-ws -dumprsl argument
  • Bug 3204: InternalStateEnumeration schema cleanup
  • Bug 3212: Rename RSLHelper and it's methods
  • Bug 3241: Specifying multiple (node type
  • Bug 3372: -self option on globusrun-ws not passed to subjobs
  • Bug 3420: CAMPAIGN: WS-GRAM Command-line Client Tools
  • Bug 3534: globusrun-ws needs a -release option
  • Bug 3540: Throughput Tester hanging
  • Bug 3571: ant not found during install
  • Bug 3629: new tests to improve code coverage
  • Bug 3740: Recovery INFO lines with full EPRs instead of resource IDs
  • Bug 3766: CAMPAIGN: WS-GRAM Job Description Extensions Handling Support
  • Bug 3773: ws gram job description extensions for condor
  • Bug 3775: CAMPAIGN: Investigate job info from schedulers
  • Bug 3778: WS-GRAM dependant on GT2 job manager
  • Bug 3801: condor commands need CONDOR_CONFIG
  • Bug 3842: stdoutUrl and stderrUrl resource properties being set repeatedly
  • Bug 3905: NPE in state transition
  • Bug 3909: IllegalMonitorStateException
  • Bug 3967: CAMPAIGN: WS-GRAM TeraGrid SoftEnv Extensions
  • Bug 4154: a few gram scheduler tests from the trunk are broken
  • Bug 4211: -Sf and -Tf options not working for multi jobs
  • Bug 4312: WS GRAM resourceAllocationGroup bug
  • Bug 4518: SEG issues DONE for FAILED jobs
  • Bug 4664: performance improvements for large run job submissions using condor-g
  • Bug 4687: Notifications from RFT to WS-GRAM
  • Bug 4729: Add local_job_id RP
  • Bug 4748: Adding a submit-site chosen job name to the remote job manager
  • Bug 4749: client receives no state notifications if RSL gram:count is big
  • Bug 4785: globus-scheduler-provider-fork return wrong information - Solved
  • Bug 5016: globusrun-ws fails due to formatting issues in factoryEndpoint element
  • Bug 5247: job cancellation can lead to container hanging
  • Bug 5512: Review synchronization in MJFS.createManagedJob()
  • Bug 5513: Persisting a StagingFaultType with local invocations to RFT enabled fails
  • Bug 5598: LocalInvocationHelper: wrong way of getting container configuration values
  • Bug 5671: Problem with notification message type in ws-rendezvous
  • Bug 5672: Problem with notification message type in ws-gram
  • Bug 5682: Change name of topic in ws-gram currently known as RP_STATE
  • Bug 5683: Error in subject creation in LocalInvocationHelper
  • Bug 5695: problem with local user id and streaming in globusrun-ws
  • Bug 5744: Add a default factory
  • Bug 5767: Removing notification wrapper types from globusrun-ws code
  • Bug 5799: adjust globusrun-ws to changes in release method in gram in GT4.2
  • Bug 5800: adjust globusrun-ws to changes in job termination in gram in GT4.2
  • Bug 5801: adjust globusrun-ws to not destroy subscription resources in GT4.2
  • Bug 5802: adjust default termination time in globusrun-ws in GT4.2
  • Bug 5803: Change of lifetime policy in gram4's job resources.
  • Bug 5804: Destroy subscription resources of a job at the end of processing
  • Bug 5806: Change of GramJob.setDuration()
  • Bug 5830: Don't fetch a delegated job credential during job creation
  • Bug 5832: Remove InternalStateEnumeration from schema in Gram 4.2
  • Bug 5833: Setting termination time on MultiJob resources
  • Bug 5834: QName change of the state change topic of job resources
  • Bug 5835: New operation providers for Gram4
  • Bug 5836: Change in persistence behavior of Gram4 job resources
  • Bug 5837: Change processing the restart state of resources in a container start
  • Bug 5838: Share a threadpool for SEG event processing
  • Bug 5839: Finish currently running state processing tasks in an orderly JVM shutdown
  • Bug 5840: Adding priorities to internal job states regarding processing order
  • Bug 5841: StateMachine is unreadable and hardly maintainable
  • Bug 5842: Adapt tests of gram4 to changes in 4.2
  • Bug 5852: Make use of default factory resource in job submission with globusrun-ws
  • Bug 5858: Fix host authorization in GRAM4
  • Bug 5927: Cloning EPR's from stubs before storing them
  • Bug 5929: Log an info statement when factory returns EPR of existing job
  • Bug 5931: Make SchedulerEventGenerator.run() (Java) more robust
  • Bug 5936: Make job creation slimmer
  • Bug 5957: one rendezvous test fails
  • Bug 5958: Exceptions ws-gram recovery
  • Bug 5959: globusrun-ws error in job with invalid staging request
  • Bug 5986: New fault type for job resource expiration
  • Bug 5995: globusrun-ws problems with extensions element in job description
  • Bug 6004: adapt globusrun-ws to handle new faults and change of RP fault to array
  • Bug 6027: delegated user proxy job file is not being removed
  • Bug 6028: globusrun-ws fails in termination in special situation
  • Bug 6071: info about how to disable an already installed LRM adapter is missing
  • Bug 6116: Limited delegation in tests for MultiJobs not sufficient
  • Bug 6117: Error in termination of jobs in internal state None
  • Bug 6131: 4.2 rc2 GRAM4 server doesn't send usage packets
  • Bug 6134: GT4.2 RC2: error reporting in globusrun-ws
  • Bug 6138: Interrupt staging if jobs are terminated and staging is active
  • Bug 6139: Deadlock situation in job termination
  • Bug 6140: Staging response handlers destroy transfers if they are still active
  • Bug 6141: Getting the service hosts in DelegatedCredentialDestroyHelper is ambiguous

5. Known Problems

The following problems and limitations are known to exist for GRAM4 at the time of the 4.2.0 release:

5.1. Limitations

  • No limitations are listed for this release.

5.2. Outstanding bugs

  • Bug 2250: delegation required resource property
  • Bug 2578: reliable state change notification
  • Bug 2579: reliable state change notification
  • Bug 2623: service summary/diagnostics
  • Bug 2624: Multiple job hold states and parameterized release operation
  • Bug 2629: performance timings in globusrun-ws
  • Bug 2734: non-shared FS scheduler file list
  • Bug 3088: Default Job Environment
  • Bug 3242: Software selection thru WS GRAM RSL
  • Bug 3569: Selectable jobType default per factory
  • Bug 3575: SEG dependent on GLOBUS_LOCATION env var
  • Bug 3948: Service must release all of its resources on deactivation
  • Bug 4009: Use pbsdsh if available
  • Bug 4181: Allow File Staging To/From globusrun-ws application without an external server
  • Bug 4791: Job state history RP
  • Bug 5017: gram[24] tests that need to be updated
  • Bug 5402: invalid password error messages
  • Bug 5433: public interface doc lists non-public/internal APIs
  • Bug 5484: Review and Update 4.2 GRAM doc
  • Bug 5510: Reduction of notification consumers in WS-GRAM RFT interaction
  • Bug 5745: Allow users to specify JDD in a different order than RSL schema
  • Bug 5805: Change threadpools from Gram custom implementation to java.util.concurrent
  • Bug 5853: create automated tests for globus-job-*-ws programs
  • Bug 6019: CEDPS: Add executable path to log statement
  • Bug 6024: GRAM Audit v2
  • Bug 6102: GRAM4 throughput tester for 4.2
  • Bug 6109: Add ability for setuid programs to be plugged into GRAM4
  • Bug 6110: Add comments to GRAM4 service code for audit v2
  • Bug 6172: Bad error message for file not found

6. Technology dependencies

GRAM depends on the following GT components:

Other scheduler adapters available for GT 4.2.0 release:

7. Tested platforms

Tested platforms for GRAM4:

  • Linux

    • Fedora Core 1 i686
    • Fedora Core 3 i686
    • Fedora Core 3 yup xeon
    • RedHat 7.3 i686
    • RedHat 9 x86
    • Debian Sarge x86
    • Debian 3.1 i686

Tested containers for GRAM4:

  • Java WS Core container

8. Backward compatibility summary

Protocol changes since GRAM4 in the GT4.0 series:

  • The Java WS Core Framework has been updated from the draft versions of the WSRF/WSN and WS Addressing specifications to the final versions WSRF 1.2, WSN 1.3 and WS Addressing 1.0. There is no backward compatibility between this version and any previous versions.
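
Because the final and draft specifications use different XML namespaces, a client can tell which addressing version a message uses by inspecting its headers. The following is a minimal sketch under stated assumptions: WsaVersionCheck, its sample messages, the action URI, and the urn:example:draft-addressing namespace are all hypothetical and not part of GRAM4 or Java WS Core. Only the WS-Addressing 1.0 namespace URI is taken from the W3C Recommendation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Illustrative helper; not part of GRAM4 or Java WS Core.
class WsaVersionCheck {
    // Namespace URI of the final W3C WS-Addressing 1.0 Recommendation.
    static final String WSA_10_NS = "http://www.w3.org/2005/08/addressing";
    // SOAP 1.1 envelope namespace.
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    // Sample message using the final namespace (the action URI is made up).
    static final String SAMPLE_FINAL = envelope(WSA_10_NS);
    // Sample message using a placeholder namespace standing in for any older draft.
    static final String SAMPLE_OTHER = envelope("urn:example:draft-addressing");

    static String envelope(String wsaNs) {
        return "<s:Envelope xmlns:s=\"" + SOAP_NS + "\" xmlns:wsa=\"" + wsaNs + "\">"
             + "<s:Header><wsa:Action>urn:example:action</wsa:Action></s:Header>"
             + "<s:Body/></s:Envelope>";
    }

    /** True if the message has an Action header in the WS-Addressing 1.0 namespace. */
    static boolean usesWsAddressing10(String soapXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(soapXml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagNameNS(WSA_10_NS, "Action").getLength() > 0;
        } catch (Exception e) {
            throw new IllegalArgumentException("message is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(usesWsAddressing10(SAMPLE_FINAL));
        System.out.println(usesWsAddressing10(SAMPLE_OTHER));
    }
}
```

A check like this could help diagnose a version mismatch between a client and a container, since a message whose addressing headers are in any other namespace would not be understood by a 4.2 service.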

9. Associated Standards

See the Java WS Core related standards.

10. For More Information

See GRAM4 for more information about this component.

Glossary

J

job scheduler

See the term scheduler.