Problem: If RFT is not configured properly to talk to a PostgreSQL database, you will see this message displayed on the console when you start the container:
"Error creating RFT Home: Failed to connect to database ... Until this is corrected all RFT request will fail and all GRAM jobs that require staging will fail".
Solution: The usual cause is that the PostgreSQL postmaster is not accepting TCP connections. Restart the postmaster with the -i option, which enables TCP connections (see Configuring RFT).
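One quick way to confirm that the postmaster is reachable over TCP is to try opening a connection to its port. The following is a minimal sketch, not part of RFT: the helper name accepts_tcp is hypothetical, and the host and port you pass are assumptions about your setup (5432 is PostgreSQL's default port). A successful connection only shows that the port is open; it does not verify database authentication or the RFT database settings.

```python
import socket


def accepts_tcp(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it only tests reachability
    and sends no PostgreSQL protocol traffic.
    """
    try:
        # create_connection raises OSError (refused, timeout, DNS failure)
        # when the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, accepts_tcp("localhost", 5432) returning False on the database host suggests that the postmaster was started without TCP support or is listening on a different port.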
Problem: RFT error messages are not verbose enough.
Solution: Edit $GLOBUS_LOCATION/container-log4j.properties and add the following line to it:
log4j.category.org.globus.transfer=DEBUG
For more verbosity, also add the following line, which prints GridFTP messages too:
log4j.category.org.globus.ftp=DEBUG
RFT uses PostgreSQL to checkpoint transfer state in the form of restart markers, and it recovers from transient failures during a transfer by using a retry mechanism with exponential backoff. RFT has been tested to recover from source and/or destination server crashes during a transfer, network failures, container failures (when the machine running the container goes down), file system failures, etc. The RFT resource is implemented as a PersistentResource, so ReliableFileTransferHome is initialized every time the container is restarted. A more detailed description of fault tolerance and recovery in RFT follows:
- Source and/or destination GridFTP failures: In this case RFT retries the transfer up to a configurable maximum number of attempts, with exponential backoff between retries (the backoff period is also configurable). If a failure happens in the midst of a transfer, RFT takes the last restart marker stored in the database for that transfer and resumes the transfer from the point where it failed, instead of restarting the whole file. This failure is treated as a container-wide backoff for the server in question: all other transfers going to/from that server, across all the requests in a container, are backed off and retried. This prevents further transfer failures by using the knowledge available in the database.
- Network failures: These occur when heavy load on the network, or any other cause, leads to lost packets or timed-out connections. This failure is considered transient, and RFT retries the affected transfer with exponential backoff. The backoff applies to that particular transfer only, not to the whole container as with source and/or destination GridFTP failures.
- Container failures: This type of failure occurs when the machine running the container goes down or when the container is restarted with active transfers. When the container is restarted, it reinitializes ReliableFileTransferHome, which looks in the database for any active RFT resources and restarts them.
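The per-transfer recovery described above (retry with exponential backoff, resuming from the last stored restart marker) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not RFT's actual implementation: MarkerStore, TransferError, and run_with_retries are hypothetical names, the in-memory dictionary stands in for the PostgreSQL database, a restart marker is simplified to a single byte offset, and the container-wide backoff for a failing server is not modeled.

```python
import time


class TransferError(Exception):
    """Transient failure raised by the (hypothetical) transfer function."""


class MarkerStore:
    """In-memory stand-in for the database that holds restart markers."""

    def __init__(self):
        self._markers = {}

    def load(self, transfer_id):
        # A transfer with no stored marker starts from offset 0.
        return self._markers.get(transfer_id, 0)

    def save(self, transfer_id, offset):
        self._markers[transfer_id] = offset


def run_with_retries(transfer_id, do_transfer, store, max_attempts=5,
                     initial_delay=1.0, factor=2.0, sleep=time.sleep):
    """Run do_transfer up to max_attempts times with exponential backoff.

    do_transfer(offset, checkpoint) should call checkpoint(new_offset) as
    data is committed and raise TransferError on a transient failure.
    Each attempt resumes from the last checkpointed offset.
    Returns True on success, False once the attempts are exhausted.
    """
    def checkpoint(offset):
        store.save(transfer_id, offset)

    for attempt in range(max_attempts):
        start = store.load(transfer_id)  # resume point from the last marker
        try:
            do_transfer(start, checkpoint)
            return True
        except TransferError:
            if attempt < max_attempts - 1:
                # Wait initial_delay, then initial_delay * factor, and so on.
                sleep(initial_delay * factor ** attempt)
    return False
```

Because the marker is saved outside the retry loop's local state, a fresh call to run_with_retries with the same store and transfer_id also resumes from the last checkpoint, which loosely mirrors how a restarted container picks up active transfers from the database.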