Globus Toolkit RLS FAQ

Understanding RLS

What is the RLS?

The replica location service (RLS) maintains and provides access to mapping information from logical names for data items to target names. These target names may represent physical locations of data items, or an entry in the RLS may map to another level of logical naming for the data item.

The RLS is not intended to provide data storage, data transfer or other replication services. Additional replication services (outside of the scope of the RLS) are required to physically replicate files, to provide consistency guarantees between entries relating to physical files, and to maintain metadata pertaining to logical or target names.

The RLS is a component of the Globus Toolkit and runs as a server on most Linux and UNIX platforms.

What is the Local Replica Catalog (LRC)?

The LRC is a service of the RLS that maintains mappings between logical file names (LFNs) and physical file names (PFNs), and sends updates to the RLI as required.

What is the Replica Location Index (RLI)?

The RLI is a service of the RLS that maintains an index of logical file names (LFNs). For each LFN in the index, the RLI identifies one or more LRCs that maintain a mapping for that LFN. The mapping between the LFN and the PFN itself is stored only at the LRC.
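As a rough illustration of how the two services divide the work, the following Python sketch models an LRC as a table from LFNs to PFNs and an RLI as an index from LFNs to the LRCs that know them. The class names, method names, LFNs, and URLs are all hypothetical and chosen for the example; this is not the RLS API.

```python
from collections import defaultdict


class LocalReplicaCatalog:
    """Toy LRC: holds LFN -> set of PFNs (hypothetical, not the RLS API)."""

    def __init__(self, name):
        self.name = name
        self.mappings = defaultdict(set)

    def add(self, lfn, pfn):
        self.mappings[lfn].add(pfn)

    def lookup(self, lfn):
        return sorted(self.mappings.get(lfn, ()))


class ReplicaLocationIndex:
    """Toy RLI: holds LFN -> set of LRC names, but no PFNs."""

    def __init__(self):
        self.index = defaultdict(set)

    def update_from(self, lrc):
        # An update tells the RLI only which LFNs that LRC knows about.
        for lfn in lrc.mappings:
            self.index[lfn].add(lrc.name)

    def lookup(self, lfn):
        return sorted(self.index.get(lfn, ()))


lrc_a = LocalReplicaCatalog("lrc-a")
lrc_a.add("lfn://run42/events", "gsiftp://storage-a.example.org/data/run42.dat")
lrc_b = LocalReplicaCatalog("lrc-b")
lrc_b.add("lfn://run42/events", "gsiftp://storage-b.example.org/data/run42.dat")

rli = ReplicaLocationIndex()
rli.update_from(lrc_a)
rli.update_from(lrc_b)

# Step 1: ask the RLI which LRCs know the LFN.
lrcs = rli.lookup("lfn://run42/events")
# Step 2: ask one of those LRCs for the actual PFNs.
pfns = lrc_a.lookup("lfn://run42/events")
print(lrcs, pfns)
```

The point of the split is that the index stays small: it records where to ask, while the answers remain at the catalogs.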

What is a storage element?

A storage element is a separate service (unrelated to the RLS) that maintains physical collections of data. Examples of storage elements include file systems, FTP or GridFTP servers, and database management systems.

What is a Logical File Name (LFN)?

The LFN is a logical name representing a collection of semantically equivalent physical files, which may be dispersed across any number of storage elements.

What is a Physical File Name (PFN)?

The PFN is a name representing a physical element of data. PFNs may take the form of file names, URIs, or any other identifier meaningful to a storage element.

Is there or will there be a WSRF RLS?

The Globus RLS team is planning a WSRF RLS implementation following the GT 4.0 release to comply with the OGSA Replication (OREP) standard currently being developed. It is important to note that at this time we do not foresee an end of support for the current implementation of the RLS. We welcome your input and involvement in this process.

What interfaces exist for the RLS?

The RLS provides administrative and client command-line interfaces, a C API, and a Java API (JNI wrapper to the C API).

Getting started with RLS

Where can I find the official RLS download page?

Various download links are available here.

What is the most recent, stable, and supported release of RLS?

Stable, supported releases include:

  • RLS 2.2 (for use with GT 3.2)
  • RLS 2.1.5 (for use with GT 2.4)

Note: RLS 2.2 and 2.1.5 provide the same functionality; the only difference is the supported GT release.

What platforms are supported?

The RLS is expected to work well with Linux and most UNIX variants. The Globus RLS team is most familiar with RedHat, Solaris, and Debian.

What database management systems are supported?

The RLS supports certain versions of MySQL, PostgreSQL, and Oracle* database management systems.

* Please note that Oracle support is relatively new. We would appreciate comments and feedback from individuals working with Oracle-based deployments.

Which database do you recommend?

We recommend using MySQL 4.0.1 (and above) or PostgreSQL 7.2.3 (and above). The Globus RLS team is most familiar with the MySQL database management system.

What ODBC managers and drivers are supported?

The RLS supports certain versions of iODBC, MyODBC and psqlODBC.

The Globus RLS team is most familiar with the following versions:

  • iODBC version 3.0.5
  • MyODBC version 3.52
  • psqlODBC version 7.2.5

Which Globus Toolkit is required?

If your environment is based on GT 2.4, use the RLS 2.1.5 release. If your environment is based on GT 3.2, use the RLS 2.2 release.

Does RLS support 64-bit platforms?

Currently, RLS is supported on 32-bit platforms only. Testing and evaluation on 64-bit platforms is pending. Please contact us at discuss@globus.org if you are interested in a 64-bit deployment of RLS.

What kind of performance can I expect from RLS?

The Globus RLS team conducted performance tests and published the results as of early 2004. Please see http://www.isi.edu/~annc/papers/chervenakhpdc13.pdf.

At what scale has RLS been deployed in production environments?

Typical deployments involve 10 or fewer servers and range up to tens of millions of mappings.

I don’t have host certificates. How do I use RLS?

The RLS may be deployed without authentication enabled.

To start the RLS server without authentication, use the -N option. For example, from the command line, run:

$GLOBUS_LOCATION/bin/globus-rls-server -N

Then, to connect to the unauthenticated RLS server via one of the client interfaces, use the rlsn protocol. For example, from the command line, run:

$GLOBUS_LOCATION/bin/globus-rls-admin -p rlsn://<your server>

How many clients can RLS support simultaneously?

By default, the RLS is configured to accept up to 100 concurrent client connections.

What are the system requirements for deploying RLS?

On x86 architecture, a minimum of a 1 GHz CPU and 1 GB RAM should be used for deployments managing thousands of mappings.

For larger deployments, we recommend dual 1 GHz CPUs and 2+ GB RAM.

Disk space will be dependent upon the scale of your local replica catalog.

Can I run RLS on a box with CPU speed X and memory Y?

Occasionally, we receive support requests for an RLS deployed on a sub-optimal system – one that does not meet the recommended (or even minimum) system requirements. Please understand that operating the RLS on a sub-optimal system may significantly degrade the performance of your system. We strongly advise users to deploy RLS on systems meeting the minimum requirements. In addition, proper capacity planning for your RLS deployment will help to ensure acceptable performance and reliability of your system.

Can RLS be replicated between two servers for failover protection?

At present, the RLS may be deployed on a single node only. Suggestions for improving fault-tolerance for your system include:

  • deploy RLS and your RDBMS on separate nodes.
  • deploy your RDBMS in a clustered configuration, if supported.
  • run a backup RLS and use scripts (e.g., a cron job) to duplicate your mappings on the backup machine.

Getting support for RLS

How can I get support for my RLS-related questions?

Subscribe and submit your questions to discuss@globus.org or developer-discuss@globus.org.

How can I submit a bug?

Search for and submit bugs at http://bugzilla.globus.org/bugzilla.

What information should I include along with my bug submission?

Please include as much detail as you can to help us resolve your issue. We ask that, at a minimum, you include all of the following information if possible:

  1. RLS version
  2. Operating System
    1. OS name and version
    2. GLIBC version
    3. Pthread Lib version
  3. Database
    1. RDBMS name and version
    2. ODBC Manager name and version
    3. DB Driver name and version
  4. Globus Toolkit
    1. GT version
    2. Globus IO library version
  5. RLS Configuration information
  6. If the problem can be reproduced, a test case (for example, any set of interactions with the server that causes it to hang)

Additionally, if the bug is severe (involving a hang or crash) it may be helpful to attach:

  1. RLS log file
  2. GDB stack trace

General usage

How do I change the RLS default configuration?

  1. Locate the RLS configuration file:

$GLOBUS_LOCATION/etc/globus-rls-server.conf

You will find the complete list of parameter names and default values towards the bottom of the file.

  2. To change any value from the default, uncomment the parameter by deleting the '#' at the beginning of the line, then change the value of the parameter.

We recommend copying the parameter to the top of the file so that the change is immediately visible.
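To review which parameters are currently active (that is, not commented out), a short script can help. The Python sketch below assumes that each setting is a parameter name followed by whitespace and a value, and that a leading '#' marks a commented-out line, as in the examples shown in this FAQ. The function name and the sample text are hypothetical; the server's own parsing rules may differ.

```python
def active_settings(conf_text):
    """Return {parameter: value} for lines that are not commented out.

    Assumes "name value" lines and '#' comments. In this sketch the last
    occurrence of a parameter wins; the server's own rule may differ.
    """
    settings = {}
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out defaults
        parts = line.split(None, 1)
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        settings[name] = value
    return settings


# Illustrative sample only, not a copy of the shipped configuration file.
sample = """\
# Overrides copied to the top of the file
update_ll_int 3600
rli_bloomfilter true

#update_ll_int 86400
#rli_bloomfilter false
"""

settings = active_settings(sample)
print(settings)
```

To inspect a real installation, read the text from $GLOBUS_LOCATION/etc/globus-rls-server.conf instead of using the embedded sample.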

Do I have to restart RLS in order for configuration changes to take effect?

To change a configuration value at runtime you may use the -C option of the globus-rls-admin client. For a description of this command, please see the globus-rls-admin usage instructions.

Please note that configuration options changed at runtime will NOT change the globus-rls-server.conf settings. You must change the permanent configuration separately.

How can I increase client side timeouts?

If for any reason you need to increase the client timeouts, use the -t <timeout value> option. You may use this option with the globus-rls-cli and globus-rls-admin clients.

Where is the RLS log file?

The RLS server logs messages through syslog using the LOG_DAEMON facility. If you do not find the log messages in the default syslog location, you should check your syslog configuration. You will not find a separate log file for RLS.

Where are the RLS log messages when the server is started in debug mode (using the -d option)?

When the RLS server is started in debug mode, by using the -d command-line option, the log messages are displayed directly to the user console and are not sent to the syslog.

Are the LFNs, PFNs and associated attributes that are maintained by the RLS case sensitive?

Case sensitivity depends on the underlying database management system. MySQL performs case-insensitive string comparisons, while Oracle performs case-sensitive string comparisons. Presently, the RLS makes no attempt to enforce case sensitivity.
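Because the behavior depends on the database, it can be useful to check a set of names for entries that differ only by letter case before loading them. The Python sketch below is one way to do that; the function name and the sample LFNs are hypothetical, and str.casefold() is only an approximation of how a case-insensitive database comparison behaves.

```python
from collections import defaultdict


def case_collisions(names):
    """Return groups of names that differ only by letter case."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]


lfns = ["Run42/Events.dat", "run42/events.dat", "run43/events.dat"]
collisions = case_collisions(lfns)
print(collisions)
```

Names reported in the same group would be treated as the same string by a case-insensitive database and as different strings by a case-sensitive one.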

Using the RLI index

What is “immediate mode”?

When configured for “immediate mode”, the LRC updates the RLI as soon as one of the following happens:

  1. About 100 mappings have been added to the LRC.
  2. 30 seconds have elapsed since the latest mapping has been added to the LRC.

When not configured for “immediate mode”, the LRC updates the RLI with all of its LFNs only once a day (by default).

The 30 seconds mentioned above is controlled by the update_buftime parameter in the configuration file. If you want to change this without shutting down the server, you can use:

globus-rls-admin -C update_buftime <new value> rls://<your server>

Be careful when setting this value. A very low value could cause the LRC to send many small updates to the RLI when it is under heavy load, which is likely to be inefficient.
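The buffering rule described above can be sketched as a small model. The Python code below is not RLS code: the class and method names are hypothetical, the threshold of exactly 100 stands in for "about 100 mappings", and time is passed in explicitly so the behavior is easy to follow.

```python
class ImmediateUpdateBuffer:
    """Toy model of the immediate mode buffering rule (not RLS code)."""

    def __init__(self, max_pending=100, buftime=30.0):
        self.max_pending = max_pending  # stands in for "about 100 mappings"
        self.buftime = buftime          # plays the role of update_buftime
        self.pending = []
        self.last_add = None
        self.sent = []                  # each entry is one update sent to the RLI

    def add(self, lfn, now):
        self.pending.append(lfn)
        self.last_add = now
        if len(self.pending) >= self.max_pending:
            self._flush()

    def tick(self, now):
        """Call periodically; flushes if buftime has passed since the last add."""
        if self.pending and now - self.last_add >= self.buftime:
            self._flush()

    def _flush(self):
        self.sent.append(list(self.pending))
        self.pending.clear()


buf = ImmediateUpdateBuffer()
for i in range(250):
    buf.add(f"lfn-{i}", now=0.0)  # burst: two full batches of 100 are sent
buf.tick(now=10.0)                # only 10 s since the last add: nothing sent
buf.tick(now=30.0)                # 30 s elapsed: the remaining 50 are sent
batch_sizes = [len(batch) for batch in buf.sent]
print(batch_sizes)
```

In this model, lowering buftime makes the time-based flush fire more often, so a steady trickle of new mappings turns into many small updates rather than a few large ones.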

How can I change the update/expire intervals while using LFN list updates?

The update interval at the LRC and the expire interval at the RLI are interrelated. The expire interval at the RLI should be equal to or greater than the update interval at the LRC. This ensures that a new full update will be sent to the RLI before it expires stale entries in its database.

The following example illustrates the LRC and RLI settings.

At the LRC:

update_ll_int 3600

Full LFN list updates are sent every hour (3600 seconds).

update_factor 1

With update_factor set to 1, the update_ll_int value above is effectively left unchanged (it is multiplied by 1). The multiplication is applied when update_immediate is set to true. In this example we want immediate updates, but we also want to send a full update every hour.

At the RLI:

rli_expire_stale 3600

Now, the RLI will expire entries that have not been refreshed within the last hour. This can be changed to a higher value if you wish.

rli_expire_int 1800

The thread that checks for stale entries will now run every 30 minutes. This interval should be less than the rli_expire_stale value to ensure that entries are expired on time.
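The two rules from this answer can be checked mechanically. The Python sketch below takes the LRC and RLI settings as plain dictionaries and reports any violation. The function name is hypothetical, and the handling of update_factor follows only what is described above (the factor multiplies update_ll_int when update_immediate is true).

```python
def check_intervals(lrc, rli):
    """Return a list of warnings for LRC/RLI interval settings (seconds)."""
    warnings = []
    # Per the text above, update_factor is applied when update_immediate is true.
    factor = lrc["update_factor"] if lrc["update_immediate"] else 1
    full_update_interval = lrc["update_ll_int"] * factor
    if rli["rli_expire_stale"] < full_update_interval:
        warnings.append(
            f"rli_expire_stale ({rli['rli_expire_stale']}) is less than the "
            f"effective LRC update interval ({full_update_interval})"
        )
    if rli["rli_expire_int"] >= rli["rli_expire_stale"]:
        warnings.append(
            f"rli_expire_int ({rli['rli_expire_int']}) should be less than "
            f"rli_expire_stale ({rli['rli_expire_stale']})"
        )
    return warnings


# The settings from the example above pass both checks.
lrc = {"update_ll_int": 3600, "update_factor": 1, "update_immediate": True}
rli = {"rli_expire_stale": 3600, "rli_expire_int": 1800}
ok = check_intervals(lrc, rli)

# A larger factor stretches the full update interval past the expiry time.
risky = check_intervals(dict(lrc, update_factor=10), rli)
print(ok, risky)
```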

What are Bloom filters?

The Bloom filter algorithm provides an efficient way to perform a membership test. The caveat when using Bloom filters is that they allow for a small chance (~1%) of false positives. In the context of the RLS, Bloom filters are used to efficiently send updates from an LRC to an RLI. The RLI uses the Bloom filter to check for the existence of an LFN in one of its corresponding LRCs. We also refer to these as “compressed” updates because the Bloom filter only requires 1 bit of memory for every LFN.
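For readers unfamiliar with the technique, here is a minimal Bloom filter in Python. It is a generic sketch, not the RLS implementation: the hash construction (SHA-256 with an index prefix), the filter size, and the number of hash functions are all assumptions chosen for the demonstration.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter sketch (not the RLS implementation)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by hashing the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


lfns = [f"lfn-{i}" for i in range(1000)]
# Illustrative sizing only; these values are not taken from RLS.
bf = BloomFilter(num_bits=10 * len(lfns), num_hashes=3)
for lfn in lfns:
    bf.add(lfn)

# Every name that was added is reported as present (no false negatives).
all_present = all(lfn in bf for lfn in lfns)
# Some names that were never added may still be reported as present.
false_positives = sum(f"other-{i}" in bf for i in range(10000))
print(all_present, false_positives)
```

The filter never misses a name that was added, but it can occasionally report a name that was not. That is the source of the false positives mentioned above.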

Should I use LFN lists or Bloom filters?

We recommend using Bloom filters when the LRC manages a large number of LFNs, on the order of a few hundred thousand or more. Large-scale users of RLS will find that using Bloom filters significantly reduces the memory usage at the LRC node and reduces the processor consumption when sending updates to RLIs.

How can I configure the LRC to send updates to RLIs using Bloom filters?

Use the -A <rli url> option of the globus-rls-admin tool.

How can I configure the RLI to accept Bloom filters sent from LRCs?

In the RLS configuration file ($GLOBUS_LOCATION/etc/globus-rls-server.conf), find the rli_bloomfilter property. You should find it commented out, with a value of false. Either uncomment it (remove the # at the beginning of the line) and change the value, or create a new entry in the file, so that it reads rli_bloomfilter true. The next time you restart the RLI, it will begin accepting Bloom filter updates from LRCs.

How can I save Bloom filters for faster RLI startup?

In the RLS configuration file ($GLOBUS_LOCATION/etc/globus-rls-server.conf), find the rli_bloomfilter_dir property. You should find it commented out, with a value of none. Either uncomment it (remove the # at the beginning of the line) and change the value, or create a new entry in the file, as follows:

rli_bloomfilter_dir <path to a valid dir>

The next time you restart the RLI, it will save Bloom filters from LRCs. On subsequent restarts, it will load the Bloom filters from the specified directory.

If I use Bloom filters, do I need an RLI database?

Yes. The RLI uses the database to maintain other important information.

Are there any disadvantages to using Bloom filter (compressed) updates?

Despite the significant advantages of using Bloom filter (or “compressed”) updates, there are two disadvantages that should be understood when using them.

  • False Positives: There is a small chance (~1%) of getting a false positive when querying the RLI for an LFN. When developing RLS client applications, we recommend that the client validate that the LRC does contain the LFN.
  • Wildcards Not Supported: Because of the nature of the Bloom filter membership test, it is not possible to support wildcard queries when using Bloom filters.
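The recommended validation step for false positives can be sketched as follows. This Python example uses in-memory stand-ins rather than real RLS client calls: the function names, the two callables passed in, and the sample data are all hypothetical.

```python
def locate(lfn, rli_candidates, lrc_lookup):
    """Return {lrc: pfns} for the LRCs that really hold the LFN.

    rli_candidates(lfn) returns the LRC names reported by the RLI, which
    may include false positives. lrc_lookup(lrc, lfn) returns a list of
    PFNs, empty if the LRC has no mapping. Both callables are hypothetical
    stand-ins for real client calls.
    """
    confirmed = {}
    for lrc in rli_candidates(lfn):
        pfns = lrc_lookup(lrc, lfn)
        if pfns:  # drop candidates that turn out to be false positives
            confirmed[lrc] = pfns
    return confirmed


# In-memory stand-ins for the demonstration.
catalogs = {
    "lrc-a": {"lfn-1": ["gsiftp://a.example.org/f1"]},
    "lrc-b": {},
}


def fake_rli(lfn):
    # Pretend the Bloom filter for lrc-b produced a false positive.
    return ["lrc-a", "lrc-b"]


def fake_lrc_lookup(lrc, lfn):
    return catalogs[lrc].get(lfn, [])


result = locate("lfn-1", fake_rli, fake_lrc_lookup)
print(result)
```

The extra query to each candidate LRC is the cost of compressed updates: the RLI answer is treated as a hint, and the LRC answer is treated as authoritative.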

How do I stop an LRC from sending updates to an RLI?

Use the -d <rli url> option of the globus-rls-admin tool to remove an RLI from the list of RLIs that the LRC updates.

How can I find out if my LRC is sending updates to any RLIs or whether my RLI has accepted updates from any LRCs?

Use the -S option of the globus-rls-admin tool to show statistics of the RLS.