GT 3.0 CAS: Performance Notes
Summary of performance tests
- Basic query test - whoami
- Query test - findPolicyData
- Credential test - getMaximalAssertion
- Size of credential test
Each of the first three tests was run once with 200 clients talking to the server serially and once with all of them talking to the server concurrently. Each number is the average time taken to get a locator and a handle to the CASPort and to perform the actual method invocation. All numbers are in milliseconds.
The stress test framework in GT3 Core was used for testing. This framework splits a test into three parts: an "init" method for initialization, a "run" method containing the code to be timed, and finally a "postRun" method. Two of the numbers reported by the stress framework are relevant here:
- "totalAverageTime" - the average time that each performance run takes (initialization, run and postRun). In the tables below, these are the numbers given outside the brackets. (This number looks odd in the concurrent case, since it does not account for the long time each thread takes.)
- "totalCallAverageTime" - the average time that each run of the relevant code takes (run). In the tables below, these are the numbers given inside the brackets. In concurrent runs these times are very high. (See the end of this document for what the infrastructure contributes.)
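The two averages can be pictured with a small timing harness. The sketch below is not the GT3 stress framework; it is a minimal Python illustration, and the names StressCase, time_runs and the injectable clock are hypothetical. It times each serial run in three phases and averages the whole run and the "run" phase separately, following the definitions given above.

```python
import time
from statistics import mean


class StressCase:
    """Hypothetical stand-in for a test with init/run/postRun hooks."""

    def init(self):
        pass

    def run(self):
        pass

    def post_run(self):
        pass


def time_runs(case, runs, clock=time.perf_counter):
    """Time `runs` serial executions of `case`.

    Returns (total_average_ms, call_average_ms):
    - total_average_ms averages init + run + postRun per execution
    - call_average_ms averages only the run phase
    """
    totals, calls = [], []
    for _ in range(runs):
        t0 = clock()
        case.init()
        t1 = clock()
        case.run()
        t2 = clock()
        case.post_run()
        t3 = clock()
        totals.append((t3 - t0) * 1000.0)  # whole execution, in ms
        calls.append((t2 - t1) * 1000.0)   # timed code only, in ms
    return mean(totals), mean(calls)
```

With per-run timing like this, the first average always includes the second. The serial results fit that picture, while the concurrent results do not, which is the oddity noted in the first bullet.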
The server and client were run on the same machine. A PostgreSQL database was used and was also on the same machine.
Platform
All tests were run on an i686, 2.2GHz machine running Linux 2.4.18.
Database status
All tests were run with two different sets of data in the backend database. In both cases the default bootstrap was done, which populated the database with the service/action entries implicit to the CAS service.
For the smaller data set, in addition to the data described above, a single trust anchor, user group, user and object were added. Two policies were added: one granting that user group superuser permissions on the CAS server and another granting it access to the external object that was added. (Only the latter is returned on an assertion generation request.)
For the larger data set, 100 user groups, 100 users, 100 resources and 1000 policies were added on top of the default data. Each user group was given ten policies across the resources.
Test results
Basic query test
This test measures the time taken to query the database for the CAS nickname of the user making the call. The call has no parameters to process and is the most rudimentary query in the CAS service. Every CAS invocation performs this query to get the name of the user in order to ascertain permissions.
The test was run with 200 client threads, serially and concurrently.
| Data set | Serial | Concurrent |
| Small | 253 (253) | 155 (26832) |
| Large | 253 (253) | 214 (40958) |
Query test
This test measures the time taken to query the database for all policies applicable to the user making the call. The CAS server checks permissions for the operation and then retrieves the policies, which are returned as an array of PolicyData objects.
The test was run with 200 client threads, serially and concurrently. With the smaller data set only one policy is returned; with the larger data set 10 of the 1000 policies are returned.
| Data set | Serial | Concurrent |
| Small | 315 (315) | 219 (34425) |
| Large | 479 (479) | 286 (40298) |
Credential test
This test measures the time taken to query the database for all assertions applicable to the user making the invocation. The CAS server checks permissions for the operation and then retrieves the assertion. Since the assertion is returned as xsd:any, the time taken to convert it into a SAMLAssertion object is also measured.
The test was run with 200 client threads, serially and concurrently. With the smaller data set the assertion contains only one authorization policy; with the larger data set 10 of the 1000 policies are embedded in the assertion.
| Data set | Serial | Concurrent |
| Small | 416 (416) | 303 (50032) |
| Large | 568 (568) | 400 (52304) |
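As a quick reading of the serial columns above, the following sketch computes how much each test slows down when moving from the small to the large data set. The figures are copied from the three result tables; the dictionary layout and the function name growth are only for illustration.

```python
# Serial times in ms, copied from the result tables above:
# test name -> (small data set, large data set)
serial_ms = {
    "whoami": (253, 253),
    "findPolicyData": (315, 479),
    "getMaximalAssertion": (416, 568),
}


def growth(small, large):
    """Return (absolute increase in ms, relative increase in percent)."""
    delta = large - small
    return delta, round(100.0 * delta / small, 1)


for name, (small, large) in serial_ms.items():
    delta, pct = growth(small, large)
    print(f"{name}: +{delta} ms ({pct}%)")
# whoami: +0 ms (0.0%)
# findPolicyData: +164 ms (52.1%)
# getMaximalAssertion: +152 ms (36.5%)
```

The basic query is unaffected by the size of the data set, while the two calls that retrieve policies take noticeably longer with the larger one.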
Size of credential test
The database was populated appropriately to generate credentials of various sizes, which were saved to file. The largest credential size tested was approximately 70K. This required that the default timeout in the Stub be increased to allow for the long call time.
For a credential of size 70995, using 50 threads serially, the average time taken was 6833ms.
Running performance tests
All the classes needed to run these tests are in the package org.globus.ogsa.impl.base.cas.server.test.performance and targets have been provided in the build file for running them.
Setting up database
These targets add a specific number of users to the database. One of the users needs to have the subject DN of the credential that will be used while running the tests. This DN is read from the variable "subject1DN" in a file whose name is passed as a system property to each of these targets. By default, the file name is "casTestProperties". To override it, set -Dcas.test.properties when invoking the target.
To get configuration parameters for database access, a file with database information is required. By default, this is "casDBProperties". To override it, set -Dcas.db.properties when invoking the target.
To set up the small data set in the database, run the following from CAS_HOME:
psql -U tester -d casDatabase -f etc/delete.sql
ant bootstrap
ant populateDB -Dargs="small"
To set up the large data set with a credential size of less than 50K, run the following from CAS_HOME:
psql -U tester -d casDatabase -f etc/delete.sql
ant bootstrap
ant populateDB -Dargs="large"
To set up the large data set with a credential size greater than 50K, run the following from CAS_HOME:
psql -U tester -d casDatabase -f etc/delete.sql
ant bootstrap
ant populateDB -Dargs="large largeCred"
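The three command sequences above differ only in the argument passed to populateDB, so they can be generated from one helper. This is a minimal sketch: the function names setup_commands and run_setup are hypothetical, and passing the property-file overrides only to populateDB is an assumption, since the document does not say which targets read them.

```python
import subprocess

DATASETS = ("small", "large", "large largeCred")


def setup_commands(dataset, test_props=None, db_props=None):
    """Build the delete/bootstrap/populate commands for one data set."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown data set: {dataset!r}")
    # One argv entry per option, so no shell quoting is needed for
    # the two-word value "large largeCred".
    populate = ["ant", "populateDB", f"-Dargs={dataset}"]
    # Assumption: the overrides are only needed by populateDB.
    if test_props:
        populate.append(f"-Dcas.test.properties={test_props}")
    if db_props:
        populate.append(f"-Dcas.db.properties={db_props}")
    return [
        ["psql", "-U", "tester", "-d", "casDatabase", "-f", "etc/delete.sql"],
        ["ant", "bootstrap"],
        populate,
    ]


def run_setup(dataset, cas_home, **overrides):
    """Run the commands in order from CAS_HOME, stopping at the first failure."""
    for cmd in setup_commands(dataset, **overrides):
        subprocess.run(cmd, cwd=cas_home, check=True)
```

For example, run_setup("large largeCred", "/path/to/cas") would reset the database and load the data set used for the size of credential test; the path is a placeholder.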
Setting up GT3 container
For large runs (more than 200 clients) and/or large credential generation (greater than 50K), the default timeout on the Stub needs to be increased. This can be done by editing the stress target in GLOBUS_LOCATION/build.xml and adding the following as another system property. The value needs to be set appropriately.
<sysproperty key="org.globus.ogsa.client.timeout" value="200000"/>
The above is the solution if "Read timeout error" is seen on the client side. In all other cases, a default GT3 container with CAS service deployed should suffice.
Running test targets
All tests use the stress framework in GT3 core. For more details, refer to the test documentation in the GT3 distribution. All targets have the following options:
- -Dstress.threads = number of threads
- -Dstress.processes = number of processes
- -Dstress.concurrent = set to "yes" or "no"
Basic Query tests
Running the following target after setting up the database and GT3 container runs a test that measures the time taken for a call to get the CAS nickname of the user.
ant basicQueryStress -Dstress.threads=300 -Dstress.processes=1 -Dstress.concurrent=yes
Query tests
Running the following target after setting up the database and GT3 container runs a test that measures the time taken to get all applicable policies of the user.
ant queryStress -Dstress.threads=300 -Dstress.processes=1 -Dstress.concurrent=no
Credential tests
Running the following target after setting up the database and GT3 container runs a test that measures the time taken to generate a credential containing all policies of the user.
ant credentialStress -Dstress.threads=200 -Dstress.processes=1 -Dstress.concurrent=yes
Size of Credential
This test is used to ensure that credentials of large sizes can be generated. To run this test, set up the database using the "populateDB" target with options "large largeCred". This ensures that the credential returned on running the following target is around 70K. This test requires that the timeout be set to a high value, as described under "Setting up GT3 container". Since this is a test of large credential generation, it may be prudent to run it with just one or a few threads rather than many.
ant credentialStress -Dstress.threads=1 -Dstress.processes=1 -Dstress.concurrent=no
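Since every stress target takes the same three options, the command lines in this section can be built from a single helper. The sketch below is illustrative only: stress_command and result_matrix are hypothetical names, and the default of 200 threads simply mirrors the client count used for the results reported earlier.

```python
TARGETS = ("basicQueryStress", "queryStress", "credentialStress")


def stress_command(target, threads, concurrent, processes=1):
    """Build the ant command line for one stress target."""
    if target not in TARGETS:
        raise ValueError(f"unknown target: {target!r}")
    return [
        "ant",
        target,
        f"-Dstress.threads={threads}",
        f"-Dstress.processes={processes}",
        f"-Dstress.concurrent={'yes' if concurrent else 'no'}",
    ]


def result_matrix(threads=200):
    """Commands for a serial and a concurrent run of each target."""
    return [
        stress_command(target, threads, concurrent)
        for target in TARGETS
        for concurrent in (False, True)
    ]


for cmd in result_matrix():
    print(" ".join(cmd))
```

The printed lines can be pasted into a shell one at a time, after the database and container have been set up for the data set being measured.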
Infrastructure overhead test
The whoami method on the CAS port was overridden to just return a dummy string without any processing. The "Basic Query Test" described above was then run to measure performance with 200 serial and 200 concurrent clients. In the serial test the time taken was 252ms (252), and in the concurrent test it was 187ms (34209). So it appears that, irrespective of the application code, the infrastructure takes a considerably long time on concurrent calls.
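The comparison behind this conclusion can be made explicit with a little arithmetic. The sketch below uses only figures reported in this document (the dummy whoami above and the small data set row of the basic query test); the variable and function names are illustrative.

```python
# (totalAverageTime, totalCallAverageTime) in ms, copied from this document
dummy_whoami = {"serial": (252, 252), "concurrent": (187, 34209)}
real_whoami_small = {"serial": (253, 253), "concurrent": (155, 26832)}


def slowdown(results):
    """Concurrent per-call time divided by serial per-call time."""
    return results["concurrent"][1] / results["serial"][1]


# Serial cost added by the real whoami over the dummy implementation.
app_cost_serial = real_whoami_small["serial"][1] - dummy_whoami["serial"][1]

print(app_cost_serial)                         # 1
print(round(slowdown(dummy_whoami), 2))        # 135.75
print(round(slowdown(real_whoami_small), 2))   # 106.06
```

In the serial case the real query adds about 1ms over the dummy method, and the per-call time under concurrency grows by two orders of magnitude even when the method does no work, which supports attributing the concurrent cost to the infrastructure rather than to the CAS code.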