Starting with version 4.0.5, GT includes the GridWay Metascheduler, which enables large-scale, reliable and efficient sharing of computing resources (clusters, computing farms, servers, supercomputers, etc.) managed by different LRM (Local Resource Management) systems, such as PBS, SGE, LSF and Condor, within a single organization (enterprise grid) or scattered across several administrative domains (partner or supply-chain grid).
A number of commercial and open-source workload management and scheduling systems are available today, each suited to different underlying computing infrastructures and execution profiles. GridWay stands out from other metascheduling systems because it has been specifically designed to work on top of Globus services, offering a high level of functionality, quality of service and reliability on this kind of infrastructure. Its benefits for each audience are:
For project and infrastructure directors: GridWay is an open-source community project, adhering to Globus philosophy and guidelines for collaborative development.
For system integrators: GridWay is highly modular, allowing adaptation to different grid infrastructures, and supports several OGF standards.
For system managers: GridWay provides a scheduling framework similar to that found on local LRM systems, supporting resource accounting and the definition of state-of-the-art scheduling policies.
For application developers: GridWay implements the OGF standard DRMAA API (C and Java bindings), ensuring that applications are compatible with any LRM system that implements the standard, such as SGE, Condor and Torque.
For end users: GridWay provides an LRM-like CLI for submitting, monitoring, synchronizing and controlling jobs, which can be described using the OGF standard JSDL.
GridWay gives end users, application developers and managers of Globus infrastructures scheduling functionality similar to that found in local DRM systems.
- Advanced Scheduling Capabilities
GridWay implements several state-of-the-art Grid-aware scheduling policies, comprising job prioritization policies (fixed priority, urgency, share, deadline and waiting-time) and resource prioritization policies (fixed priority, usage, failure and rank).
These policies are combined with:
Adaptive Scheduling, to periodically adapt the schedule considering applications' demands and Grid resource characteristics.
Adaptive Execution, to migrate running applications based on resource availability, capacity or cost, or on new application requirements or preferences.
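To make the idea of combining job and resource prioritization policies concrete, here is a minimal sketch. It assumes (hypothetically) that each policy yields a normalized contribution in [0, 1] and that contributions are combined as a weighted sum; the weights, field names and greedy dispatch loop are illustrative only and are not GridWay's actual configuration keys or algorithm.

```python
from dataclasses import dataclass, field

# Hypothetical policy weights; GridWay's real configuration names and formulas may differ.
DEFAULT_JOB_WEIGHTS = {"fixed": 1.0, "urgency": 1.0, "share": 1.0,
                       "deadline": 1.0, "waiting": 1.0}
DEFAULT_HOST_WEIGHTS = {"fixed": 1.0, "usage": 1.0, "failure": 1.0, "rank": 1.0}


@dataclass
class Job:
    job_id: int
    # Policy name -> normalized contribution in [0, 1]; higher means "run sooner".
    contributions: dict = field(default_factory=dict)


@dataclass
class Host:
    name: str
    free_slots: int
    # Policy name -> normalized contribution in [0, 1]; higher means "more desirable"
    # (e.g. a host with fewer past failures gets a higher "failure" contribution).
    contributions: dict = field(default_factory=dict)


def weighted_priority(contributions, weights):
    """Weighted sum of per-policy contributions; missing policies count as 0."""
    return sum(w * contributions.get(name, 0.0) for name, w in weights.items())


def schedule(jobs, hosts, job_weights=DEFAULT_JOB_WEIGHTS,
             host_weights=DEFAULT_HOST_WEIGHTS):
    """Greedy pass: best job first, each placed on the best host with a free slot."""
    free = {h.name: h.free_slots for h in hosts}
    ranked_hosts = sorted(hosts, key=lambda h: weighted_priority(h.contributions, host_weights),
                          reverse=True)
    ranked_jobs = sorted(jobs, key=lambda j: weighted_priority(j.contributions, job_weights),
                         reverse=True)
    plan = []
    for job in ranked_jobs:
        for host in ranked_hosts:
            if free[host.name] > 0:
                free[host.name] -= 1
                plan.append((job.job_id, host.name))
                break  # job placed; jobs that find no free slot simply wait
    return plan
```

In this sketch, adaptive scheduling would amount to calling schedule() periodically with refreshed contributions, so that changes in resource characteristics or application demands are reflected in the next dispatch round.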
- Transparent Grid Access
GridWay interfaces infrastructures with different middleware stacks, so users can access heterogeneous resources transparently. For example, it can access resources configured with both pre-web services GRAM (GRAM 2) and web services GRAM (GRAM 4). It also permits the use of different grids with different software stacks, for example, TeraGrid with Globus and EGEE with gLite.
- Flexible Deployment Capabilities
GridWay supports a multi-user operation mode and does not require the installation of additional middleware (apart from standard Globus services). A Globus installation is not required on each end-user system.
GridWay allows different Grid deployment strategies, like Enterprise Grids, Partner Grids or Utility Grids.
- Different Application Profiles
GridWay executes different Grid application profiles:
Array (Bulk) jobs, for parameter sweep applications
DAG Workflows
Single-site MPI applications
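The array and DAG profiles above can be illustrated with a short sketch. It assumes a hypothetical template in which ${TASK_ID}, ${TOTAL_TASKS} and ${PARAM} placeholders are substituted per task (these names are illustrative, not a statement of GridWay's template syntax), and it orders a workflow with a standard topological sort; GridWay's own expansion and dependency handling may differ.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from string import Template


def expand_array_job(arguments, total_tasks, start=0, increment=1):
    """Expand one parameterized argument string into one string per task.

    ${TASK_ID}, ${TOTAL_TASKS} and ${PARAM} are illustrative placeholder names.
    """
    template = Template(arguments)
    return [
        template.substitute(TASK_ID=task_id, TOTAL_TASKS=total_tasks,
                            PARAM=start + task_id * increment)
        for task_id in range(total_tasks)
    ]


def workflow_order(dependencies):
    """Return a submission order in which every job follows the jobs it depends on.

    `dependencies` maps a job name to the set of job names it must wait for.
    Raises graphlib.CycleError if the graph is not a DAG.
    """
    return list(TopologicalSorter(dependencies).static_order())
```

For example, expand_array_job("in.${TASK_ID} -p ${PARAM}", 3, start=10, increment=5) yields one argument string per task of a parameter sweep, while workflow_order({"b": {"a"}, "d": {"b"}}) places "a" before "b" and "b" before "d".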
- Fault Detection and Recovery
GridWay is able to detect several problems that can occur when executing a remote job. It also implements mechanisms that make the execution more reliable. It can detect a remote system crash, a job failure (via the job exit code) or even a network disconnection (using the polling mechanism) and migrate the problematic job to another resource.
GridWay also periodically saves its state so that it can recover from local failures.
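The detection and recovery steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not GridWay's implementation: the poll timeout value, the state file layout and the helper names are hypothetical. A job is classified from its exit code or from how long it has gone without a successful poll, a migration target is chosen among resources not yet tried, and state is written atomically so a local crash never leaves a half-written file.

```python
import json
import os
import tempfile

POLL_TIMEOUT = 300.0  # seconds without a successful poll before giving up (hypothetical)


def classify(last_successful_poll, now, exit_code, poll_timeout=POLL_TIMEOUT):
    """Classify a remote job from polling results."""
    if exit_code is not None:
        # The job finished: a non-zero exit code signals a job failure.
        return "done" if exit_code == 0 else "failed"
    if now - last_successful_poll > poll_timeout:
        # No answer for too long: remote crash or network disconnection.
        return "unreachable"
    return "running"


def pick_migration_target(tried_hosts, candidate_hosts):
    """Return the first candidate the job has not already been tried on, or None."""
    for host in candidate_hosts:
        if host not in tried_hosts:
            return host
    return None


def save_state(path, state):
    """Persist scheduler state atomically: write a temp file, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_state(path):
    """Reload the last saved state after a local failure."""
    with open(path) as f:
        return json.load(f)
```

A job classified as "failed" or "unreachable" would then be resubmitted to the host returned by pick_migration_target(), if any.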
- Reporting and Accounting
GridWay provides detailed statistics of Grid usage. In this way, the Grid administrator can properly plan usage policies and forecast workload. In addition, these statistics can be used by the scheduler to predict the response time of Grid resources on a per-user basis.
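One simple way accounting data could feed a per-user response-time prediction is an exponential moving average per (user, host) pair. The sketch below assumes that model purely for illustration; the smoothing factor and the class name are hypothetical, and GridWay's actual estimator may be different.

```python
from collections import defaultdict


class ResponseTimeStats:
    """Per-user, per-host accounting of observed response times (hypothetical model).

    A response time here means the total time from submission to completion,
    including queue wait, execution and file transfers.
    """

    def __init__(self, alpha=0.3):
        self.alpha = alpha              # weight given to the newest observation
        self.estimate = {}              # (user, host) -> smoothed response time
        self.count = defaultdict(int)   # (user, host) -> number of finished jobs

    def record(self, user, host, response_time):
        key = (user, host)
        previous = self.estimate.get(key)
        if previous is None:
            self.estimate[key] = response_time
        else:
            self.estimate[key] = self.alpha * response_time + (1 - self.alpha) * previous
        self.count[key] += 1

    def predict(self, user, host, default=None):
        """Predicted response time for this user on this host, or `default` if unseen."""
        return self.estimate.get((user, host), default)
```

An exponential average favours recent behaviour, which suits resources whose load changes over time; a plain mean over all past jobs would react more slowly.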
- Standard Compliance
GridWay is a flexible open-source project based entirely on standards, which improves its usability and interoperability. For example, users can describe their jobs using JSDL, and programmers can build grid-enabled applications using the DRMAA standard.
- User Interface
GridWay provides end users with a familiar environment similar to that found on classical LRM systems. The GridWay CLI therefore eases the adoption of Grid technologies.
GridWay uses different Globus services to perform the tasks of information gathering, job execution and data transfer.
GridWay depends on the following GT components for job execution:
- GRAM: GridWay can interface with both GRAM 2 (pre-web services) and GRAM 4 (web services)
GridWay depends on the following GT components for data staging:
- RFT
- GridFTP
GridWay depends on the following GT component for information gathering:
- MDS
GridWay relies on the Globus basic security infrastructure for authentication and authorization. On top of that, GridWay can use other Globus services and components to complement this infrastructure:
- Delegation Service
- MyProxy
Tested platforms for GridWay:
GridWay builds successfully for the following platforms:
- Linux
- Tru64
- Mac OS X
- Solaris
- AIX
In addition, GridWay has been tested with the major Grid infrastructures, including EGEE, TeraGrid and OSG.
GridWay 5.2.1 is a new component in GT as of this release. The following information describes compatibility with the previous GridWay version, 5.2:
API changes since GridWay 5.2:
- None
CLI changes since GridWay 5.2:
- None
Configuration interface changes since GridWay 5.2:
- Arguments can be specified for the EM and TM drivers. Beware that gwd.conf configuration files from previous versions of GridWay will not work with version 5.2.1
Since GridWay 5.2, development activities have focused on easing the integration of GridWay with the major Grid infrastructures: EGEE, TeraGrid and OSG. As a result, the flexibility of GridWay has been considerably improved. In addition, significant effort has gone into improving the reliability of GridWay's core.
Also, GridWay 5.2.1 is the first release shipped with the Globus Toolkit. Therefore, some modifications have been introduced to build and install GridWay in a GT tree: the directory layout has been slightly changed, and support for building GridWay with GPT has been added.
New to GridWay 5.2.1:
- Integration with major Grid Infrastructures
GridWay 5.2.1 can be easily integrated with all the major Grid infrastructures. Its functionality has been extended to operate with different Grid deployments, including support for different execution/transfer schemes, information models and service configurations.
- Improved Reliability
Previous GridWay releases did not handle crashes of MADs (Middleware Access Drivers). GridWay 5.2.1 reloads a MAD process whenever it is killed or crashes.
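The reload behaviour can be pictured as a small supervisor loop. The sketch below is a hypothetical illustration, not GridWay's code: it launches a driver command, waits for it to exit, and relaunches it after any abnormal exit (a non-zero code, or a negative code on POSIX when the process was killed by a signal), up to an assumed restart limit. Real MADs talk to the GridWay daemon over their standard streams, which is omitted here.

```python
import subprocess
import sys


def run_with_reload(cmd, max_restarts=3):
    """Run a driver command, relaunching it each time it dies abnormally.

    Returns the exit codes observed, one per launch. A code of 0 is treated
    as a clean shutdown and stops the loop; anything else (including a
    negative code, meaning the process was killed by a signal on POSIX)
    triggers a reload until `max_restarts` is exhausted.
    """
    codes = []
    for _ in range(max_restarts + 1):
        proc = subprocess.Popen(cmd)
        code = proc.wait()
        codes.append(code)
        if code == 0:
            break
    return codes


if __name__ == "__main__":
    # Simulate a driver that always crashes: it is launched 1 + 2 times.
    print(run_with_reload([sys.executable, "-c", "import sys; sys.exit(3)"], max_restarts=2))
```

Capping the number of restarts keeps a driver that fails immediately on every launch from being respawned forever.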
- New Information model
The dynamic information of a host gathered from the Grid Information server can be mixed with custom variables defined by the GridWay administrator. This functionality is very useful when you need to extend the information scheme but have no access to the Grid server. For example, you can use this new feature to add software or license attributes to grid resources that can subsequently be used for resource requirement expressions.
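The static-dynamic information model described above can be sketched as overlaying administrator-defined attributes on the attributes gathered from the information server, then matching job requirements against the result. Assumptions: a static value overrides a dynamic attribute with the same name, requirements are expressed as Python predicates rather than GridWay's requirement syntax, and attribute names such as MATLAB_LICENSE are hypothetical.

```python
def merge_host_info(dynamic, static):
    """Overlay administrator-defined attributes on dynamically gathered ones.

    Assumption: a static value overrides a dynamic attribute with the same name.
    Neither input dictionary is modified.
    """
    merged = dict(dynamic)
    merged.update(static)
    return merged


def matches(host_attrs, requirements):
    """True if every required attribute exists on the host and satisfies its predicate."""
    return all(name in host_attrs and check(host_attrs[name])
               for name, check in requirements.items())


if __name__ == "__main__":
    dynamic = {"HOSTNAME": "node1", "FREE_MEM_MB": 2048}  # from the information server
    static = {"MATLAB_LICENSE": "yes"}                    # added by the administrator
    requirements = {"MATLAB_LICENSE": lambda v: v == "yes",
                    "FREE_MEM_MB": lambda v: v >= 1024}
    print(matches(merge_host_info(dynamic, static), requirements))
```

Without the static overlay, the license attribute is missing and the host would not match, which is exactly the case where the administrator has no access to the Grid information server.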
- Configuration Interface for MADs
Middleware Access Drivers need some environment variables to work properly. Usually, sudo must be configured to preserve these variables (e.g. GLOBUS_LOCATION or GW_LOCATION). GridWay 5.2.1 has a new configuration interface to ease this configuration, in which global (and per-user) environment variables can be defined for MAD execution. This new feature allows GridWay to work with delegated credentials when configured with a GRAM interface.
- Flexible definition of file transfer servers
A GridFTP server other than the GRAM server can be defined for file staging. For resources operated in this way, the storage server must be defined through the SE_HOSTNAME attribute. This can be done either by modifying the IM MAD or by using the new static-dynamic information model.
- Support for JSDL HPC profile
Job templates can now also be defined using the OGF standard JSDL HPC Profile.
Bugs fixed in GridWay 5.2.1:
- Bug 5090: Autotools broken when --disable-jsdl and --disable-ws flags enabled
- Bug 5113: GridWay does not execute custom monitor scripts
- Bug 5333: An alternative wrapper can not be specified with relative paths
- Bug 5231: Possible errors for the deadline policy
- Bug 5114: Host information is not updated when a parse error occurs
- Bug 5202: Buffer overflow in gwhost
- Bug 5302: Scheduler dies when a user is reloaded
- Bug 4922: Middleware drivers are not reloaded when they crash
- Bug 5251: WS IM_MAD launches one java process per host for monitoring
- Bug 5320: Control the number of active IM MADs
The following is a list of all bugs known at the time of the 4.0.5 release:
- Bug 5308: gwd doesn't recover AIDs and TIDs