Sessions are one of the key components of OpenAM, so it is quite beneficial to have a good understanding of how they are modelled, stored and used. In the following I will take a deep dive into how sessions work, what crosstalk means, and how session failover with CTS works.
Originally this post was also meant to discuss how the CTS reduced crosstalk mode changes things, but unfortunately I couldn't finish all the tests I needed to write it up 100% accurately… Hopefully this will still be a good read on sessions 🙂
Firstly, let's clarify that OpenAM does NOT use HttpSessions (i.e. the session that the container maintains through the JSESSIONID cookie) for session tracking purposes. Instead, OpenAM uses its own session storage system with its own session ID format. By default, the sessions are stored in memory in a Hashtable. This means that if you have more than one OpenAM server in a deployment, each server will hold a different set of sessions in memory. How can one server then validate another server's session?
Since each session is essentially bound to an OpenAM server, in a multi-node deployment a session can only be validated by the server that actually owns it. Cross-server session validation is done by one server making an HTTP request to the other server (i.e. to the server that owns the session), and this is what we call crosstalk in the OpenAM world.
Before going into any further details though, we should talk a bit about the format of the SessionID:
The first part of the session ID up until the .* contains the random identifier of the session, and the value between the .* and the trailing * is the extension part of the session. After correctly decoding the extension, one can see that the following information is stored in there:
- Server ID: 01
- Site ID: 02
- Storage Key: -4410267496494311876
In short, the Server ID and the Site ID allow the OpenAM servers to figure out which server actually owns a given session, while the Storage Key becomes important when we look at session failover in more detail. If you want to read more about sessions, I would strongly suggest checking out Bill Nelson's blog post about it.
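To make the format a bit more tangible, here is a minimal Java sketch that decodes the extension of the session ID shown in the CTS entry further down (its coreTokenString02 value). It rests on my own assumptions, not on anything stated in this post: the extension is standard Base64 with '+', '/' and '=' swapped for '-', '_' and '.', and it contains key/value pairs written with DataOutputStream.writeUTF, where SI, S1 and SK appear to be the Site ID, Server ID and Storage Key. SessionIdExtension is a hypothetical helper, not an OpenAM class.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical decoder for the extension part of a session ID (format is assumed). */
final class SessionIdExtension {

    /** Returns the key/value pairs stored between ".*" and the trailing "*". */
    static Map<String, String> decode(String sessionId) {
        int start = sessionId.indexOf(".*");
        int end = sessionId.lastIndexOf('*');
        if (start < 0 || end <= start + 1) {
            throw new IllegalArgumentException("No extension in: " + sessionId);
        }
        // Undo the assumed URL-safe substitutions, then use the standard Base64 decoder.
        String encoded = sessionId.substring(start + 2, end)
                .replace('-', '+').replace('_', '/').replace('.', '=');
        byte[] raw = Base64.getDecoder().decode(encoded);

        Map<String, String> fields = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            while (in.available() > 0) {
                String key = in.readUTF();   // length-prefixed key, e.g. "SI"
                String value = in.readUTF(); // length-prefixed value, e.g. "02"
                fields.put(key, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;
    }

    public static void main(String[] args) {
        String sid = "AQIC5wM2LY4SfcyiG2X5Sgn2teO8deDdR6NUPHxmITkXNhg.*"
                + "AAJTSQACMDIAAlNLABI0NzQ4MDY4MjY1MTc3Mzg5ODEAAlMxAAIwMQ..*";
        System.out.println(decode(sid)); // prints {SI=02, SK=474806826517738981, S1=01}
    }
}
```

Note that the decoded SK value matches the coreTokenId of that CTS entry, which hints at why the Storage Key matters once session failover comes into play.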
Now that we know how OpenAM servers can determine who owns a given session, it’s time to demonstrate how crosstalk really works. Let’s assume that we have the following (simplified) deployment diagram:
A session validation request is received by Server 03 for a session that is actually owned by Server 01.
Server 03 will first look for the session locally, then check which server is actually meant to own the session. Since in this case the session is owned by Server 01, Server 03 will perform an HTTP request against Server 01's sessionservice PLL (Platform Low Level, essentially an XML-over-HTTP protocol) endpoint. At this point, Server 01 checks for the session locally and, if it exists, returns it to Server 03 in an XML format (called SessionInfo). Server 03 then processes the returned SessionInfo (or error) and is able to answer whether the session in question is actually valid. The SessionInfo retrieved from Server 01 is then stored on Server 03 for caching purposes.
A sequence diagram of Scenario 1 would look something like this (the session listener part is explained a bit further down):
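The Scenario 1 steps can be modelled in a few lines of Java. This is a simplified sketch, not OpenAM's code: the PLL call is replaced by a hypothetical BiFunction, SessionInfo is reduced to a validity flag, the owner ID is passed in directly (in reality it comes from the session ID extension), and cache expiry is ignored.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical, simplified model of crosstalk-based session validation. */
final class CrosstalkValidator {
    /** Stand-in for the XML SessionInfo document returned over PLL. */
    record SessionInfo(String sessionId, boolean valid) {}

    private final String localServerId;
    private final Map<String, SessionInfo> localSessions = new HashMap<>(); // sessions this server owns
    private final Map<String, SessionInfo> remoteCache = new HashMap<>();   // SessionInfo fetched via crosstalk
    private final BiFunction<String, String, SessionInfo> pllClient;        // (ownerId, sessionId) -> info, null on error

    CrosstalkValidator(String localServerId, BiFunction<String, String, SessionInfo> pllClient) {
        this.localServerId = localServerId;
        this.pllClient = pllClient;
    }

    void addLocal(SessionInfo info) {
        localSessions.put(info.sessionId(), info);
    }

    boolean isValid(String sessionId, String ownerServerId) {
        // 1. Look for the session locally (owned, or previously cached).
        SessionInfo known = localSessions.get(sessionId);
        if (known == null) {
            known = remoteCache.get(sessionId);
        }
        if (known != null) {
            return known.valid();
        }
        // 2. If this server is the owner and does not have it, the session does not exist.
        if (localServerId.equals(ownerServerId)) {
            return false;
        }
        // 3. Blocking crosstalk request to the owner's sessionservice endpoint.
        SessionInfo remote = pllClient.apply(ownerServerId, sessionId);
        if (remote == null) {
            return false; // error or unknown session
        }
        // 4. Cache the SessionInfo on this server for subsequent requests.
        remoteCache.put(sessionId, remote);
        return remote.valid();
    }
}
```

The second validation of the same session on Server 03 is answered from the cache, so only the first one costs a crosstalk round-trip.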
In this scenario a session validation request is received by Server 04 for a session that is actually owned by Server 01.
In this case the flow is slightly different, since Server 04 will send the sessionservice request to Site 02 instead. Depending on LB configuration, stickiness and luck, there is a chance that the request will be routed to Server 03, in which case Scenario 1 kicks in.
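The difference between the two scenarios boils down to which URL the crosstalk request is sent to. A minimal sketch, assuming the requesting server knows its own site ID plus hypothetical lookup tables of server URLs and site (load balancer) URLs; the helper and the example URLs are made up for illustration.

```java
import java.util.Map;

/** Hypothetical sketch of choosing the crosstalk target in a multi-site deployment. */
final class CrosstalkRouting {

    static String targetUrl(String localSiteId, String sessionSiteId, String sessionServerId,
                            Map<String, String> serverUrls, Map<String, String> siteUrls) {
        if (localSiteId.equals(sessionSiteId)) {
            // Scenario 1: same site, talk to the owning server directly.
            return serverUrls.get(sessionServerId);
        }
        // Scenario 2: different site, go through that site's load balancer URL.
        // The LB may route to any server of the site, possibly triggering Scenario 1 there.
        return siteUrls.get(sessionSiteId);
    }
}
```

For a session with Site ID 02 and Server ID 01, Server 03 (in Site 02) would get Server 01's own URL, while Server 04 (outside Site 02) would get the Site 02 URL.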
Moral of these scenarios
One of the most important things to understand about crosstalk is that all these HTTP requests sent between the different instances are blocking calls. For example, in the second scenario Server 04 has a request processing thread waiting for the Site 02 request to complete; if that request lands on Server 03, Server 03 also has a thread waiting for Server 01, while Server 01 uses a thread to serve the request itself. A single user request can therefore keep 3 different request processing threads busy. This is why, when sizing a deployment, it is key to understand how much crosstalk to expect in the environment (which very much depends on how sticky your LB is).
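As a rough back-of-the-envelope illustration (my addition, not based on measurements from this post), Little's law says the average number of busy request threads N on a server equals the arrival rate λ times the time each request holds a thread. Assuming a fraction p of requests needs a crosstalk call taking t_xtalk, and using purely hypothetical numbers (200 requests/s, 5 ms of local work, 10% misrouted, 50 ms per crosstalk, or a remote that stalls for 30 seconds):

```latex
\begin{aligned}
N &\approx \lambda \,\bigl(t_{\mathrm{local}} + p\, t_{\mathrm{xtalk}}\bigr) \\
N_{\mathrm{healthy}} &\approx 200 \times (0.005 + 0.1 \times 0.05) = 200 \times 0.010 = 2 \\
N_{\mathrm{stalled}} &\approx 200 \times (0.005 + 0.1 \times 30) = 200 \times 3.005 = 601
\end{aligned}
```

The formula only counts threads on the receiving server; the servers on the other end of the crosstalk hold their own threads on top of this.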
Since we are talking about HTTP requests, it's also key to ensure that the HTTP connect and read timeout settings are configured to sensible values. With improper settings, there is a chance that an unresponsive instance takes down another (otherwise perfectly healthy) instance, because the good instance ends up waiting 30 seconds (or potentially more) on the bad server. During this period the request processing threads are occupied with user requests (which in turn are waiting on the crosstalk responses), essentially making the container inaccessible to other users. The connect and read timeout settings are advanced server properties (stored under Configuration – Servers and sites – Default Server Settings – Advanced tab), and they should probably look something like:
Or something similar. These are tuning parameters, so please take this example only as a guideline, and make sure you tune and test these settings in your environment before changing anything.
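To illustrate what such settings bound at the JDK level, here is a minimal Java sketch using HttpURLConnection. It does not claim to be how OpenAM applies its properties internally; the helper name, URL and millisecond values are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper: opens a connection whose blocking phases are bounded. */
final class BoundedCrosstalk {

    static HttpURLConnection open(String url, int connectTimeoutMs, int readTimeoutMs) {
        try {
            // openConnection() only creates the object; no network I/O happens yet.
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(connectTimeoutMs); // max wait for the TCP connection to be established
            conn.setReadTimeout(readTimeoutMs);       // max wait for data on each read of the response
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Once the request is actually sent (e.g. via getInputStream()), an expired timeout surfaces as java.net.SocketTimeoutException, so the calling thread fails fast instead of staying parked. A value of 0 means "wait forever", which is exactly the situation to avoid.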
By now you are probably wondering how this scales, and what prevents the OpenAM servers from querying each other for the same sessions over and over again. As I briefly mentioned in Scenario 1, the retrieved SessionInfo is cached on the server that made the crosstalk request. How long the SessionInfo is cached is controlled by the Maximum Caching Time setting under Configuration – Global – Session. Within this interval OpenAM will not update the idle timeout of the session, and will simply use the cached SessionInfo instead of asking for the same information again. Once the interval passes and a request comes in for the corresponding session, OpenAM obtains the SessionInfo again from the authoritative server and caches it again.
Performing non-read operations (like setting a session property, or logging out) will result in a crosstalk request regardless of the caching setting, as changes to sessions can only be made by the authoritative server.
To prevent the cached session data from becoming stale, the servers also register notification listeners. This way, when a session is updated, the authoritative server can notify all the interested OpenAM servers about the change.
As the purpose of a multi-node environment is usually to achieve high availability of the service, we should think about what happens when an OpenAM instance goes down. Since OpenAM stores sessions in memory, once the JVM shuts down, all the sessions hosted by that OpenAM instance are lost, forcing end-users to reauthenticate.
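The read and write behavior described above can be sketched as a small time-based cache. This is a hypothetical model, not OpenAM's implementation: the crosstalk read is a Function, writes are a Runnable, the clock is injectable for testing, and dropping the cached copy after a write is a choice made for this sketch.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical sketch of "Maximum Caching Time" on a server that does not own the session. */
final class RemoteSessionCache {
    private record Entry(String sessionInfo, Instant fetchedAt) {}

    private final Map<String, Entry> cache = new HashMap<>();
    private final Duration maxCachingTime;
    private final Supplier<Instant> clock;
    private final Function<String, String> fetchFromOwner; // stand-in for a crosstalk read
    private int crosstalkCalls = 0;

    RemoteSessionCache(Duration maxCachingTime, Supplier<Instant> clock,
                       Function<String, String> fetchFromOwner) {
        this.maxCachingTime = maxCachingTime;
        this.clock = clock;
        this.fetchFromOwner = fetchFromOwner;
    }

    /** Reads are served from the cache while the entry is younger than maxCachingTime. */
    String getSessionInfo(String sessionId) {
        Instant now = clock.get();
        Entry entry = cache.get(sessionId);
        if (entry != null && Duration.between(entry.fetchedAt(), now).compareTo(maxCachingTime) < 0) {
            return entry.sessionInfo(); // no crosstalk, so the owner's idle timer is not touched
        }
        crosstalkCalls++;
        String fresh = fetchFromOwner.apply(sessionId);
        cache.put(sessionId, new Entry(fresh, now));
        return fresh;
    }

    /** Writes (set property, logout) always go to the owner, regardless of the cache. */
    void modify(String sessionId, Runnable writeOnOwner) {
        crosstalkCalls++;
        writeOnOwner.run();
        cache.remove(sessionId); // sketch choice: drop the now-outdated local copy
    }

    int crosstalkCalls() {
        return crosstalkCalls;
    }
}
```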
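The notification mechanism follows the observer pattern. In the minimal sketch below, servers holding a cached copy register a callback with the owner, and the owner invokes them when the session changes. In reality the notifications travel over HTTP between servers; here everything is in-process, and the class and event names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical observer-style sketch of session change notifications. */
final class SessionNotifier {
    // sessionId -> callbacks registered by servers that hold a cached copy
    private final Map<String, List<BiConsumer<String, String>>> listeners = new HashMap<>();

    void register(String sessionId, BiConsumer<String, String> listener) {
        listeners.computeIfAbsent(sessionId, k -> new ArrayList<>()).add(listener);
    }

    /** Called on the owning server whenever the session changes (event names are made up). */
    void sessionChanged(String sessionId, String eventType) {
        for (BiConsumer<String, String> listener : listeners.getOrDefault(sessionId, List.of())) {
            listener.accept(sessionId, eventType);
        }
    }
}
```

A typical listener on the non-owning server simply evicts or refreshes its cached SessionInfo, so it no longer has to wait for the Maximum Caching Time to expire.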
In certain deployments this behavior may not be acceptable. To make this more seamless for end-users, OpenAM’s session failover solution can be enabled. Session failover essentially means the following three things:
- Storing session information in a persistent storage system.
- Monitoring the other servers in the deployment for availability.
- Recovering sessions if the host server is down.
To be able to discuss these, first let’s assume that we have the following slightly more complex deployment:
Since OpenAM 11.0.0, sessions are stored by the Core Token Service (CTS) in an embedded/external OpenDJ instance as directory entries.
When it comes to session failover, we are talking about the storage and retrieval of session tokens. These session tokens are converted into a generic token format and then stored in the directory server. An example session looks something like this in OpenDJ:
dn: coreTokenId=474806826517738981,ou=famrecords,ou=openam-session,ou=tokens,dc=openam,dc=forgerock,dc=org
objectClass: top
objectClass: frCoreToken
coreTokenString02: AQIC5wM2LY4SfcyiG2X5Sgn2teO8deDdR6NUPHxmITkXNhg.*AAJTSQACMDIAAlNLABI0NzQ4MDY4MjY1MTc3Mzg5ODEAAlMxAAIwMQ..*
coreTokenType: SESSION
coreTokenId: 474806826517738981
coreTokenString03: shandle:AQIC5wM2LY4Sfcw2KnP6-SuDR0OueZg90KT938-gFM6jWY4.*AAJTSQACMDIAAlMxAAIwMQACU0sAEjQ3NDgwNjgyNjUxNzczODk4MQ..*
coreTokenUserId: id=demo,ou=user,dc=openam,dc=forgerock,dc=org
coreTokenExpirationDate: 20140728160134+0100
coreTokenString01: 1406557594
coreTokenObject: <long JSON blob representing a session>
The most important part of the above token is the coreTokenObject field, which is what actually represents the full session object that can be restored at a later time if necessary.
It is good to keep in mind that the CTS stores these entries in a directory server, more specifically in OpenDJ. Amongst many other things, this means that the tokens need to be transferred over the network from OpenAM to one of the OpenDJ instances using the LDAP protocol. A regular session token's size can vary between 3 KB and 10 KB depending on deployment size and session usage, and all of this data needs to be transferred over the network for persistence.
In a multi-node OpenDJ deployment where replication has been set up, each token-related (ADD/MODIFY/DELETE) LDAP operation is tracked in the changelog, which means all the token information essentially gets persisted twice (note that the changelog is periodically purged, see replication-purge-delay). Reducing the token size therefore means writing less data to the disk (less disk IO) and generating less network traffic (both for the initial operation and for the replication afterwards).
The CTS framework implements two kinds of compression mechanisms for the coreTokenObject field in order to reduce the token size:
- attribute compression: the JSON object itself is compressed by shortening the field names, for example sessionProperties -> sP (com.sun.identity.session.repository.enableAttributeCompression advanced server property).
- compression: GZIP compression (com.sun.identity.session.repository.enableCompression advanced server property).
Both of these approaches should reduce the size of the session tokens, but they also increase the processing time on the OpenAM side. Personally I would strongly recommend enabling GZIP compression to reduce the token size in deployments where session failover is enabled.
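To see how the two mechanisms differ, here is a minimal Java sketch using java.util.zip. The attribute compression part only mimics the idea with a naive string replacement of the one field name mentioned above (sessionProperties -> sP); the real CTS mapping table and serializer are not shown. TokenCompression is a hypothetical class name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical illustration of attribute compression versus GZIP compression. */
final class TokenCompression {
    // Only this mapping comes from the post; a real table would cover many more fields.
    private static final Map<String, String> SHORT_NAMES = Map.of("\"sessionProperties\"", "\"sP\"");

    /** Naively shortens quoted JSON field names (would also hit identical string values). */
    static String attributeCompress(String json) {
        String result = json;
        for (Map.Entry<String, String> e : SHORT_NAMES.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    static byte[] gzip(String json) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // read after close(), so the GZIP trailer is included
    }

    static String gunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

GZIP tends to pay off on JSON because repeated keys and values compress well; the trade-off mentioned above is the extra CPU spent on every gzip on write and gunzip on read.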
As we already know, under normal circumstances OpenAM uses crosstalk when requests get misrouted, and this does not change once session failover is enabled. Although the sessions are stored in OpenDJ, OpenAM will continue to send crosstalk requests to the other instances (in essence, this is the behavior that reduced crosstalk mode attempts to change).
In the case of session failover, it doesn't really help if AM waits on a crosstalk response from a node that may actually be unavailable. To prevent this scenario, there is a component that is automatically enabled once session failover has been enabled: the ClusterStateService (or CSS, as I like to call it). The ClusterStateService monitors the servers within the current site (and since OpenAM 12 it also monitors remote site URLs), so that when a crosstalk is needed, OpenAM first checks with the ClusterStateService whether the given node is available before sending any kind of crosstalk request.
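The "check before you crosstalk" idea can be sketched as a guard around the crosstalk call. This is a hypothetical model, not the ClusterStateService API: the health check is an injected Predicate (in practice something like a short-timeout connection attempt), refresh() would run periodically, and unknown servers are treated as down in this sketch.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical sketch of an availability check guarding crosstalk requests. */
final class ClusterStateGuard {
    private final Map<String, Boolean> serverUp = new ConcurrentHashMap<>();
    private final Predicate<String> healthCheck;

    ClusterStateGuard(Predicate<String> healthCheck) {
        this.healthCheck = healthCheck;
    }

    /** Meant to be run periodically, e.g. from a scheduled executor. */
    void refresh(Iterable<String> serverIds) {
        for (String id : serverIds) {
            serverUp.put(id, healthCheck.test(id));
        }
    }

    boolean isUp(String serverId) {
        return serverUp.getOrDefault(serverId, false); // unknown servers count as down here
    }

    /** Empty result means "owner is down": recover the session instead of waiting on it. */
    <T> Optional<T> crosstalkIfUp(String ownerId, Function<String, T> crosstalk) {
        return isUp(ownerId) ? Optional.ofNullable(crosstalk.apply(ownerId)) : Optional.empty();
    }
}
```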
To see which settings are available to control CSS, have a look at the Admin Guide.
If the server that is meant to host the session is down, OpenAM will recover the session.
Before the session can be recovered from CTS, OpenAM first needs to figure out who will own the session in the absence of its current owner. Each OpenAM server has a component called the PermutationGenerator, which essentially ensures that all the OpenAM servers within the same deployment always generate numbers in the exact same order. The new owner of the session is determined by the ClusterStateService and the PermutationGenerator, i.e. the first server that is available and is next in line for the current session takes on the responsibility of hosting the session for the currently unavailable server.
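The key property is that every server independently arrives at the same answer. Here is a minimal sketch of that idea, assuming (my assumption, not confirmed by the post) that the order is derived from a per-session seed such as the storage key. It relies on java.util.Random and Collections.shuffle, whose algorithms are specified, so the same seed and input order yield the same permutation on every JVM. FailoverOwner is a hypothetical class, not OpenAM's PermutationGenerator.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

/** Hypothetical sketch of deterministic failover owner selection. */
final class FailoverOwner {

    /** Same seed and same server set give the same order on every server. */
    static List<String> order(List<String> serverIds, long seed) {
        List<String> ordered = new ArrayList<>(serverIds);
        Collections.sort(ordered); // identical starting order regardless of input order
        Collections.shuffle(ordered, new Random(seed));
        return ordered;
    }

    /** Picks the first available server in the permutation, skipping the failed owner. */
    static String newOwner(List<String> serverIds, long seed, String failedOwner, Predicate<String> isUp) {
        for (String id : order(serverIds, seed)) {
            if (!id.equals(failedOwner) && isUp.test(id)) {
                return id;
            }
        }
        throw new IllegalStateException("No available server to host the session");
    }
}
```

Because no coordination is needed, two servers that receive requests for the same orphaned session will both route them to the same new owner.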
The recovery itself is quite simple: the session gets retrieved from CTS, and a session object gets created based on the information stored in the token.
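A minimal sketch of that last step, assuming the token is looked up by the storage key from the session ID (which matched the coreTokenId in the example entry above). The in-memory map stands in for OpenDJ, and deserializing coreTokenObject is reduced to keeping the raw JSON; all type names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of recovering a session from a CTS-like store. */
final class SessionRecovery {
    record Token(String coreTokenId, String userId, String coreTokenObject) {}
    record Session(String storageKey, String userId, String serializedState) {}

    private final Map<String, Token> cts = new HashMap<>(); // stand-in for OpenDJ, keyed by coreTokenId

    void store(Token token) {
        cts.put(token.coreTokenId(), token);
    }

    /** Fetches the token by storage key and rebuilds a session object from it. */
    Optional<Session> recover(String storageKey) {
        return Optional.ofNullable(cts.get(storageKey))
                .map(t -> new Session(t.coreTokenId(), t.userId(), t.coreTokenObject()));
    }
}
```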
That’s it for now
Hopefully you found this post helpful.
Once I finally get some time to test reduced crosstalk in a real agent-enabled environment there will be a new post coming your way. 😉