How to compile 3.3.3 web agents on Linux

Building the web policy agents can be a little frustrating at times, especially because the process is generally not that well documented. As part of the 3.3.3 agent release I had the joy of going through the build process for the Linux web agents, so here are the steps I went through to build them on a freshly installed CentOS 5.10 (32 & 64 bit). For the very keen, I've also aggregated all the commands into shell scripts, which you can find at the bottom of this post.

Packages

On an out-of-the-box CentOS 5.10 installation gcc is already installed, but there are some other components you should make sure are installed before attempting to compile the agents:

  • subversion: To check out the source code of the agents.
  • ant: To perform the build; the agents use Ant scripts to define the build steps.
  • zlib-devel: Compile-time dependency.
  • gcc-c++: C++ compiler.
  • rpm-build: RPM packages are generated as part of the build process.
  • gcc44: Building NSS requires a newer GCC than the default one.
  • java-1.6.0-openjdk-devel: JDK to be able to build the agent installers (and run the Ant scripts).

The agent dependencies

The agents rely on the following external libraries:

  • NSS 3.16.3 & NSPR 4.10.6
    On 32 bit the component should be built with:

    make BUILD_OPT=1 NS_USE_GCC=1 NSS_ENABLE_ECC=1 NSDISTMODE="copy" CC="/usr/bin/gcc44 -Wl,-R,'$$ORIGIN/../lib' -Wl,-R,'$$ORIGIN'" nss_build_all

    On 64 bit the command is:

    make USE_64=1 BUILD_OPT=1 NS_USE_GCC=1 NSS_ENABLE_ECC=1 NSDISTMODE="copy" CC="/usr/bin/gcc44 -Wl,-R,'$$ORIGIN/../lib' -Wl,-R,'$$ORIGIN'" nss_build_all
  • PCRE 8.31
    On 32 bit:

    ./configure --with-pic --enable-newline-is-anycrlf --disable-cpp

    On 64 bit:

    CC="gcc -m64" CXX="g++ -m64" ./configure --with-pic --enable-newline-is-anycrlf --disable-cpp
  • libxml2 2.9.1
    On either platform:

    ./configure --prefix=/opt/libxml2 --with-threads --with-pic --without-iconv --without-python --without-readline
  • The header files of various web containers

Once all the libraries are in place, you can run the build on 32 bit:

ant nightly

on 64 bit:

ant nightly -Dbuild.type=64

And here are the shell scripts:
  • 64 bit
  • 32 bit

Cross-Domain Single Sign-On

A couple of blog posts ago I detailed how regular cookie-based Single Sign-On (SSO) works. I believe it is now time to have a look at this again, but make it a bit more difficult: let's see what happens if you want to achieve SSO across different domains.

So let’s say that you have an OpenAM instance on the login.example.com domain and the goal is to achieve SSO with a site on the cool.application.com domain. As you can see right away, the domains are different, which means that regular SSO won’t be enough here (if you don’t see why, have a read of my SSO post).

Cross-Domain Single Sign-On

We already know that in the regular SSO flow the key point is that the application is in the same domain as OpenAM, hence it has access to the session cookie, which in turn allows the application to identify the user. Since in the cross-domain scenario the session cookie isn't accessible, our goal is to somehow get the session cookie from the OpenAM domain to the application's domain. More precisely:
We need a mechanism, implemented by both OpenAM and the application, that allows us to transfer the session cookie from one place to the other.
Whilst this sounds great, it would be quite cumbersome for all the different applications to implement this cookie sharing mechanism, so what else can we do?

Policy Agents to the rescue

For CDSSO OpenAM has its own proprietary implementation to transfer the session ID across domains. It's proprietary in that it does not follow any SSO standard (for example SAML), but it does attempt to mimic them to some degree.
OpenAM has the concept of Policy Agents (PA), which are (relatively) easily installable modules for web servers (or Java EE containers) that can add extra security to your applications. They do so by ensuring that the end users are always properly authenticated and authorized to view your protected resources.
As the Policy Agents are OpenAM components, they implement this proprietary CDSSO mechanism in order to simplify SSO integrations.

Without further ado, let’s have a look at the CDSSO sequence diagram:

[Sequence diagram: cdsso-web-sequence]

A bit more detail about each step:

  1. The user attempts to access a resource that is protected by the Policy Agent.
  2. The PA is unable to find the user’s session cookie, so it redirects the user to…
  3. …the cdcservlet. (The Cross-Domain Controller Servlet is the component that will eventually share the session ID value with the PA.)
  4. When the user accesses the cdcservlet, OpenAM is able to detect whether the user has an active session (the cdcservlet is on OpenAM’s domain, hence a previously created session cookie should be visible there), and…
  5. when the token is invalid…
  6. we redirect to…
  7. …the sign in page,
  8. which will be displayed to the user…
  9. …and then the user submits her credentials.
  10. If the credentials were correct, we redirect the user back to…
  11. …the cdcservlet, which will…
  12. …ensure…
  13. …that the user’s session ID is actually valid, and then…
  14. …displays a self-submitting HTML form to the user, which contains a huge Base64 encoded XML that holds the user's session ID (a small decoding sketch follows this list).
  15. The user then auto-submits the form to the PA, where…
  16. …the PA checks the validity…
  17. …of the session ID extracted from the POST payload…
  18. …and then performs the necessary authorization checks…
  19. …to ensure that the user is allowed to access the protected resource…
  20. …and then the PA creates the session cookie for the application’s domain, and the user either sees the requested content or an HTTP 403 Forbidden page. For subsequent requests the PA will see the session cookie on the application domain, hence this whole authentication / authorization process will become much simpler.
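
By the way, the LARES value that gets POSTed in step 15 is nothing magical: it is just Base64 encoded XML. If you ever want to peek inside it while debugging, something along these lines will do (a minimal sketch assuming Java 8's java.util.Base64; the class name is made up for illustration):

import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DecodeLares {
    public static void main(String[] args) throws Exception {
        // args[0] is the raw LARES form parameter captured from the browser's POST (step 15)
        String lares = URLDecoder.decode(args[0], "UTF-8");
        // The parameter is just Base64 encoded XML, so decoding it reveals the AuthnResponse
        byte[] xml = Base64.getMimeDecoder().decode(lares);
        System.out.println(new String(xml, StandardCharsets.UTF_8));
    }
}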

An example CDSSO LARES (Liberty Alliance RESponse) response that gets submitted to the PA (like in step 15 above) looks like this:

<lib:AuthnResponse xmlns:lib="http://projectliberty.org/schemas/core/2002/12" xmlns:saml="urn:oasis:names:tc:SAML:1.0:assertion" xmlns:samlp="urn:oasis:names:tc:SAML:1.0:protocol" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ResponseID="sb976ed48177fd6c052e2241229cca5dee6b62617"  InResponseTo="s498ed3a335122c67461c145b2349b68e5e08075d" MajorVersion="1" MinorVersion="0" IssueInstant="2014-05-22T20:29:46Z"><samlp:Status>
<samlp:StatusCode Value="samlp:Success">
</samlp:StatusCode>
</samlp:Status>
<saml:Assertion  xmlns:saml="urn:oasis:names:tc:SAML:1.0:assertion" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  xmlns:lib="http://projectliberty.org/schemas/core/2002/12"  id="s7e0dad257500d7b92aca165f258a88caadcc3e9801" MajorVersion="1" MinorVersion="0" AssertionID="s7e0dad257500d7b92aca165f258a88caadcc3e9801" Issuer="https://login.example.com:443/openam/cdcservlet" IssueInstant="2014-05-22T20:29:46Z" InResponseTo="s498ed3a335122c67461c145b2349b68e5e08075d" xsi:type="lib:AssertionType">
<saml:Conditions  NotBefore="2014-05-22T20:29:46Z" NotOnOrAfter="2014-05-22T20:30:46Z" >
<saml:AudienceRestrictionCondition>
<saml:Audience>https://cool.application.com:443/?Realm=%2F</saml:Audience>
</saml:AudienceRestrictionCondition>
</saml:Conditions>
<saml:AuthenticationStatement  AuthenticationMethod="vir" AuthenticationInstant="2014-05-22T20:29:46Z" ReauthenticateOnOrAfter="2014-05-22T20:30:46Z" xsi:type="lib:AuthenticationStatementType"><saml:Subject   xsi:type="lib:SubjectType"><saml:NameIdentifier NameQualifier="https://login.example.com:443/openam/cdcservlet">AQIC5wM2LY4SfcxADqjyMPRB8ohce%2B6kH4VGD408SnVyfUI%3D%40AAJTSQACMDEAAlNLABQtMzk5MDEwMTM3MzUzOTY5MTcyOQ%3D%3D%23</saml:NameIdentifier>
<saml:SubjectConfirmation>
<saml:ConfirmationMethod>urn:oasis:names:tc:SAML:1.0:cm:bearer</saml:ConfirmationMethod>
</saml:SubjectConfirmation>
<lib:IDPProvidedNameIdentifier  NameQualifier="https://login.example.com:443/openam/cdcservlet" >AQIC5wM2LY4SfcxADqjyMPRB8ohce%2B6kH4VGD408SnVyfUI%3D%40AAJTSQACMDEAAlNLABQtMzk5MDEwMTM3MzUzOTY5MTcyOQ%3D%3D%23</lib:IDPProvidedNameIdentifier>
</saml:Subject><saml:SubjectLocality  IPAddress="127.0.0.1" DNSAddress="localhost" /><lib:AuthnContext><lib:AuthnContextClassRef>http://www.projectliberty.org/schemas/authctx/classes/Password</lib:AuthnContextClassRef><lib:AuthnContextStatementRef>http://www.projectliberty.org/schemas/authctx/classes/Password</lib:AuthnContextStatementRef></lib:AuthnContext></saml:AuthenticationStatement></saml:Assertion>
<lib:ProviderID>https://login.example.com:443/openam/cdcservlet</lib:ProviderID></lib:AuthnResponse>

If you look carefully you can see the important bit right in the middle:

<saml:NameIdentifier NameQualifier="https://login.example.com:443/openam/cdcservlet">AQIC5wM2LY4SfcxADqjyMPRB8ohce%2B6kH4VGD408SnVyfUI%3D%40AAJTSQACMDEAAlNLABQtMzk5MDEwMTM3MzUzOTY5MTcyOQ%3D%3D%23</saml:NameIdentifier>

As you can see the value of the NameIdentifier element is the user's session ID. Once the PA creates the session cookie on the application's domain you've pretty much achieved single sign-on across domains, well done! 😉

P.S.: If you are looking for a guide on how to set up Policy Agents for CDSSO, check out the documentation.

Certificate authentication over REST

A little bit of background

Amongst the various authentication mechanisms that OpenAM supports, there is one particular module that always proves difficult to get working correctly: client certificate authentication, or the Certificate authentication module as it is called in OpenAM. The setup is complex mainly due to the technology (SSL/TLS) itself, and quite frankly, in most cases the concept of SSL is simply not well understood by users.

Disclaimer: I have to admit I'm certainly not an expert on SSL, so I'm not going to deep dive into the details of how client certificate authentication itself works; instead, I'm just going to highlight the important bits that everyone who wants to set up simple certificate-based authentication should know.

The main thing to understand is that client cert authentication happens as part of the SSL handshake. That is it… It will NOT work if you access your site over HTTP. The authentication MUST happen at the network component that provides HTTPS for the end users.
Again, due to SSL's complexity there are several possibilities: it could be that SSL is provided by the web container itself, but it is also possible that there is a network component (like a load balancer or a reverse proxy) where SSL is terminated. In the latter case it is quite common for these components to embed the authenticated certificate in a request header for the underlying application (remember: client cert authentication is part of the SSL handshake, so by the time OpenAM is hit, authentication WAS already performed by the SSL-terminating component).
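
Just to illustrate what "the component passes the certificate on to the application" means in practice, here is a rough Servlet API view of it (this is not OpenAM's actual code, and the header name is only an example; it depends entirely on your load balancer configuration):

import java.security.cert.X509Certificate;
import javax.servlet.http.HttpServletRequest;

public class ClientCertExtractor {

    // Returns the client certificate as a servlet-based application would see it,
    // or null if no certificate is available on this request.
    public X509Certificate extract(HttpServletRequest request) {
        // Case 1: the container terminated SSL and did the handshake itself
        X509Certificate[] chain =
                (X509Certificate[]) request.getAttribute("javax.servlet.request.X509Certificate");
        if (chain != null && chain.length > 0) {
            return chain[0];
        }
        // Case 2: SSL was terminated at a load balancer / reverse proxy that forwards the
        // already authenticated certificate in a header ("ssl-client-cert" is just an example name)
        String pem = request.getHeader("ssl-client-cert");
        if (pem != null) {
            // ...parse the PEM/Base64 encoded header value into an X509Certificate here
        }
        return null;
    }
}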

Now this is all nice, but how do you actually authenticate using your client certificate over REST?

Setting it all up

Now some of this stuff may look a bit familiar to you, but for the sake of simplicity let me repeat the exact steps of setting this up:

  • Go to the Access Control -> your realm -> Authentication page and add a new Module Instance called cert with type Certificate.
  • Open the Certificate module configuration, and make sure the LDAP Server Authentication User/Password settings are correct.
  • Generate a new self-signed certificate by following this guide, but make sure that in the CSR you set the CN to "demo". The resulting certificate and private key for me were client.crt and client.key respectively.
  • Create PKCS#12 keystore for the freshly generated private key and certificate:
    openssl pkcs12 -export -inkey client.key.org -in client.crt -out client.p12
  • Install the PKCS#12 keystore in your browser (Guide for Firefox)
  • Enable Client Authentication on the container. For example on Tomcat 7.0.53 you would have to edit conf/server.xml and set up the SSL connector like this:
    <Connector port="8443" protocol="org.apache.coyote.http11.Http11Protocol"
     maxThreads="150" SSLEnabled="true" scheme="https" secure="true"
     keystoreFile="/Users/aldaris/server.jks" keystorePass="changeit"
     truststoreFile="/Users/aldaris/trust.jks" truststorePass="changeit"
     clientAuth="true" sslProtocol="TLS" />
    
  • Add your user’s public certificate into the truststore (since the container performs the authentication, it needs to trust the client):
    keytool -import -keystore /Users/aldaris/trust.jks -file client.crt -alias demo

    In cases when the truststoreFile attribute is missing from the Connector settings, Tomcat by default falls back to the JVM’s truststore, so make sure you update the correct truststore.

  • Restart the container

Now if you did everything correctly, you should be able to go to /openam/UI/Login?module=cert and the browser should ask for the client certificate. After selecting the correct certificate you should get access to OpenAM right away.

It’s time to test it via REST

For my tests I'm going to use curl. I must say though that the curl shipped with OS X Mavericks (7.30) isn't really capable of doing client certificate authentication, hence I would suggest installing curl from MacPorts (7.36) instead.
To perform the login via REST one would run:

$ curl -X POST -v -k --cert-type pem --cert client.crt --key client.key.org "https://openam.example.com:8443/openam/json/authenticate?authIndexType=module&authIndexValue=cert"

And if you want to authenticate into a subrealm:

$ curl -X POST -v -k --cert-type pem --cert client.crt --key client.key.org "https://openam.example.com:8443/openam/json/subrealm/authenticate?authIndexType=module&authIndexValue=cert"

Simple as that. 😉
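
And if you would rather do the same from Java code instead of curl, a rough equivalent could look like this (a sketch only: it reuses the client.p12 keystore created earlier, assumes the keystore password is changeit, and assumes the server certificate is already trusted by the JVM's default truststore):

import java.io.FileInputStream;
import java.io.InputStream;
import java.net.URL;
import java.security.KeyStore;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;

public class CertLogin {
    public static void main(String[] args) throws Exception {
        // Load the same PKCS#12 keystore we created for the browser
        KeyStore clientStore = KeyStore.getInstance("PKCS12");
        try (InputStream in = new FileInputStream("client.p12")) {
            clientStore.load(in, "changeit".toCharArray());
        }
        KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(clientStore, "changeit".toCharArray());

        // null trust managers / secure random: fall back to the JVM defaults
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(kmf.getKeyManagers(), null, null);

        URL url = new URL("https://openam.example.com:8443/openam/json/authenticate"
                + "?authIndexType=module&authIndexValue=cert");
        HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
        conn.setSSLSocketFactory(ctx.getSocketFactory());
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.getOutputStream().close(); // empty POST body, just like the curl call

        System.out.println("HTTP " + conn.getResponseCode());
    }
}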

How to use the mailing lists

In this article I will explain how to effectively ask for help on ForgeRock mailing lists. Following these simple guidelines will help you to solve your problem much more quickly and will benefit others in the community.

Look up the documentation first

If you have a question about a certain behavior, the product documentation should be the first place to check. If the documentation is sparse or ambiguous on a given subject, please do file a documentation bug about it in our public JIRA.

Check if your question has been asked already

The mailing list has an official archive (and a non-official mirror), both of which should be searchable using your favorite search engine. When using Google you can use advanced search filters like "site:lists.forgerock.org" which can increase the number of useful hits.

Check debug logs for clues

OpenAM has very verbose debug logging, so be sure that you've checked it for additional clues. Please note that in case of OpenAM, the debug logs are under the debug folder and the log folder actually only contains audit logs. For more details on debug logs please see this wiki article; for other useful troubleshooting tips see my blog post.

Provide version details

When encountering a bug/issue it is very useful to have version details, as it makes quite a difference whether you are using a 3.5-year-old version of the product or the latest nightly. With older versions it is more likely that you've run into an issue that has already been fixed, while with a more recent version of the product we can rule all those out and suspect a possible regression/new bug.

Provide error messages

You've run into some odd behavior and see some error message in your browser? Provide the exact error message you've seen. If it is a generic error message (like Authentication failed), then assume that it won't be useful on its own and think about including additional details of the use-case.

Be clear in your e-mails

Think of the mailing list members as people who spend their time trying to help you; the least you can do is be as clear as possible, so make sure that

  • the problem statement is understandable and has the necessary level of detail
  • it is absolutely clear what isn’t working under what scenario.

When the problem is resolved, provide the solution

So you just spent a good day looking at some odd problem? Be sure to share your freshly gained knowledge and provide the solution for the other members.

Help others if you can

The mailing list is a community effort, so don't just ask for help, but help others if you can. It is a really good feeling to resolve someone's problem, believe me. Plus you never know how many free beer offers you can get out of these. 🙂

One-Time Passwords: HOTP and TOTP

About One-Time Passwords in general

One-Time Passwords (OTP) are pretty much what their name says: passwords that can only be used one time. Compared to regular passwords, OTPs are considered safer since the password keeps changing, meaning that it isn't vulnerable to replay attacks.

When it comes to authentication mechanisms, OTP is usually used as an additional authentication mechanism (hence OTP is commonly referred to as two-factor authentication/second-factor authentication/step-up authentication). The main/first authentication step still uses regular passwords, so in order for your authentication to be successful you need to prove two things: knowledge of a secret (your password) and possession of one (your token).

Since an OTP is only usable once, memorizing them isn't really an option. There are two main ways of acquiring these One-Time Passwords:

  • hardware tokens: for example YubiKey devices, which you can plug into a USB port and which will automatically type in the OTP code for you.
  • software tokens: like Google Authenticator, where a simple Android application displays the OTP code, which you can then enter on your login form.

There are two main standards for generating One-Time Passwords: HOTP and TOTP, both of which are governed by the Initiative For Open Authentication (OATH). In the following we will discuss the differences between these algorithms, and finally we will attempt to use these authentication mechanisms with OpenAM.

HMAC-based One-Time Password algorithm

This algorithm relies on two basic things: a shared secret and a moving factor (a.k.a. counter). As part of the algorithm an HmacSHA1 hash (to be precise, a hash-based message authentication code) of the moving factor is generated using the shared secret. The algorithm is event-based, meaning that whenever a new OTP is generated, the moving factor is incremented, hence the subsequently generated passwords should be different each time.
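
To make this a bit more concrete, here is a minimal, illustrative HOTP sketch along the lines of RFC 4226 (not OpenAM's code, just the bare algorithm producing 6-digit codes):

import java.nio.ByteBuffer;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public final class Hotp {

    // Generates a 6-digit HOTP value from the shared secret and the moving factor (counter):
    // HMAC-SHA1 over the counter, then dynamic truncation as described in RFC 4226.
    public static String generate(byte[] sharedSecret, long movingFactor) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(sharedSecret, "HmacSHA1"));
        byte[] hash = mac.doFinal(ByteBuffer.allocate(8).putLong(movingFactor).array());

        int offset = hash[hash.length - 1] & 0x0f;           // dynamic truncation offset
        int binary = ((hash[offset] & 0x7f) << 24)
                   | ((hash[offset + 1] & 0xff) << 16)
                   | ((hash[offset + 2] & 0xff) << 8)
                   |  (hash[offset + 3] & 0xff);
        return String.format("%06d", binary % 1000000);      // keep the last 6 digits
    }
}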

Time-based One-Time Password algorithm

This algorithm works similarly to HOTP: it also relies on a shared secret and a moving factor, however the moving factor works a bit differently. In case of TOTP, the moving factor constantly changes based on the time passed since an epoch. The HmacSHA1 is calculated in the same way as with HOTP.
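
Reusing the HOTP sketch above, the only TOTP-specific part is how the moving factor is derived: it is simply the number of time steps (commonly 30 seconds each) elapsed since the Unix epoch. A small method that could sit next to the HOTP one:

// TOTP = HOTP where the counter is the number of time steps since the Unix epoch.
public static String generateTotp(byte[] sharedSecret, long timeStepSeconds) throws Exception {
    long movingFactor = (System.currentTimeMillis() / 1000L) / timeStepSeconds;
    return generate(sharedSecret, movingFactor);
}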

Which one is better?

The main difference between HOTP and TOTP is that HOTP passwords can remain valid for an unknown amount of time, while TOTP passwords keep changing and are only valid for a short window in time. Because of this difference, generally speaking, TOTP is considered the more secure One-Time Password solution.

So how does OpenAM implement OTP?

In OpenAM there are currently two authentication modules offering One-Time Password capabilities: HOTP and OATH. So let's see what the difference is between them and how to use them in practice.

HOTP authentication module

The HOTP module, as you may guess from its name, tries to implement the HMAC-based One-Time Password algorithm, but it does so by circumventing some portions of the specification. Let me try to summarize exactly what's different:

  • The shared secret isn't really shared between the user and OpenAM; it is actually just a freshly generated random number, so each HOTP code is based on a different shared secret.
  • The moving factor is held statically in OpenAM, and it just keeps incrementing as users perform HOTP-based authentications.
  • A given HOTP code is only valid for a limited amount of time (so it’s a bit similar to TOTP in this sense).

These differences also mean that the module does not work with a hardware/software token, since the OTP code generation depends on OpenAM's internal state and not on shared information. So in order for the user to be able to use the generated OTP code, OpenAM has to share it with them somehow: this can be via SMS, e-mail, or both.

So let’s create an example HOTP module using ssoadm:

openam/bin/ssoadm create-auth-instance --realm / --name HOTP --authtype HOTP --adminid amadmin --password-file .pass

Now we have an HOTP module in our root realm, let’s configure it:

openam/bin/ssoadm update-auth-instance --realm / --name HOTP --adminid amadmin --password-file .pass -D hotp.properties

Where hotp.properties contains:

sunAMAuthHOTPAuthLevel=0
sunAMAuthHOTPSMSGatewayImplClassName=com.sun.identity.authentication.modules.hotp.DefaultSMSGatewayImpl
sunAMAuthHOTPSMTPHostName=smtp.gmail.com
sunAMAuthHOTPSMTPHostPort=465
sunAMAuthHOTPSMTPUserName=example
sunAMAuthHOTPSMTPUserPassword=secret123
sunAMAuthHOTPSMTPSSLEnabled=SSL
sunAMAuthHOTPSMTPFromAddress=demo@example.com
sunAMAuthHOTPPasswordValidityDuration=5
sunAMAuthHOTPPasswordLength=8
sunAMAuthHOTPasswordDelivery=E-mail
sunAMAuthHOTPAutoClicking=true
openamEmailAttribute=mail

NOTE: The names of the service attributes can be found in the OPENAM_HOME/config/xml/amAuthHOTP.xml file.

At this stage I’ve modified the built-in demo user’s e-mail address, so I actually receive the OTP codes for my tests.

The last step is to create an authentication chain with the new HOTP module:

openam/bin/ssoadm create-auth-cfg --realm / --name hotptest --adminid amadmin --password-file .pass
openam/bin/ssoadm add-auth-cfg-entr --realm / --name hotptest --modulename DataStore --criteria REQUISITE --adminid amadmin --password-file .pass
openam/bin/ssoadm add-auth-cfg-entr --realm / --name hotptest --modulename HOTP --criteria REQUIRED --adminid amadmin --password-file .pass

Here the first command created the chain itself, and the other two commands just added the necessary modules to the chain.

We can test this now by visiting /openam/UI/Login?service=hotptest. Since I don't like taking screenshots, you'll just have to believe me that the authentication was successful and I received my HOTP code via e-mail, which was in turn accepted by the HOTP module. 🙂

OATH authentication module

The OATH module is a more recent addition to OpenAM which implements both HOTP and TOTP as they are defined in the corresponding RFCs. This is how the two modes work under the hood:

  • Both OTP modes retrieve the shared secret from the user profile (e.g. it is defined in an LDAP attribute of the user's entry).
  • HOTP retrieves the moving factor from the user profile.
  • TOTP does not let a user authenticate twice within the same TOTP window (since a given TOTP may be valid for several time windows), hence it stores the last login date in the user profile.

For the following tests I’ve been using the Google Authenticator Android app to generate my OTP codes.

Testing OATH + HOTP

First set up the authentication module instance:

openam/bin/ssoadm create-auth-instance --realm / --name OATH --authtype OATH --adminid amadmin --password-file .pass
openam/bin/ssoadm update-auth-instance --realm / --name OATH --adminid amadmin --password-file .pass -D oathhotp.properties

Where oathhotp.properties contains:

iplanet-am-auth-oath-auth-level=0
iplanet-am-auth-oath-password-length=6
iplanet-am-auth-oath-min-secret-key-length=32
iplanet-am-auth-oath-secret-key-attribute=givenName
iplanet-am-auth-oath-algorithm=HOTP
iplanet-am-auth-oath-hotp-window-size=100
iplanet-am-auth-oath-hotp-counter-attribute=sn
iplanet-am-auth-oath-add-checksum=False
iplanet-am-auth-oath-truncation-offset=-1

NOTE: here I'm using the givenName and the sn attributes for my personal tests, but in a real environment I would of course use dedicated attributes for these.

Again let’s create a test authentication chain:

openam/bin/ssoadm create-auth-cfg --realm / --name oathtest --adminid amadmin --password-file .pass
openam/bin/ssoadm add-auth-cfg-entr --realm / --name oathtest --modulename DataStore --criteria REQUISITE --adminid amadmin --password-file .pass
openam/bin/ssoadm add-auth-cfg-entr --realm / --name oathtest --modulename OATH --criteria REQUIRED --adminid amadmin --password-file .pass

In order to actually test this setup I’m using a QR code generator to generate a scannable QR code for the Google Authenticator with the following value:

otpauth://hotp/Example:demo@example.com?secret=JBSWY3DPK5XXE3DEJ5TE6QKUJA======&issuer=Example&counter=1

NOTE: Google Authenticator needs the secret key in Base32 encoded format, but the module needs the key to be in HEX encoded format ("48656c6c6f576f726c644f664f415448") in the user profile.
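
If you need to convert between the two representations, a tiny sketch like the following can help (assuming Apache Commons Codec is on the classpath; the secret below is just the example value used above and decodes to the hex string from the NOTE):

import org.apache.commons.codec.binary.Base32;
import org.apache.commons.codec.binary.Hex;

public class SecretConverter {
    public static void main(String[] args) {
        // The Base32 value that goes into the otpauth:// URI for Google Authenticator
        String base32Secret = "JBSWY3DPK5XXE3DEJ5TE6QKUJA======";
        // The same secret, hex encoded, is what goes into the user profile attribute
        byte[] raw = new Base32().decode(base32Secret);
        System.out.println(Hex.encodeHexString(raw));
    }
}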

After all these changes, bring up /openam/UI/Login?service=oathtest and simply request a new HOTP code on your phone; you should see something like this:

[Screenshot: Google Authenticator - HOTP]

Enter the same code on the login screen and you are in, yay. 🙂

Testing OATH + TOTP

For testing TOTP I’ll be reusing the authentication modules and chains from the OATH HOTP test:

openam/bin/ssoadm update-auth-instance --realm / --name OATH --adminid amadmin --password-file .pass -D oathtotp.properties

Where oathtotp.properties contains:

iplanet-am-auth-oath-auth-level=0
iplanet-am-auth-oath-password-length=6
iplanet-am-auth-oath-min-secret-key-length=32
iplanet-am-auth-oath-secret-key-attribute=givenName
iplanet-am-auth-oath-algorithm=TOTP
iplanet-am-auth-oath-size-of-time-step=30
iplanet-am-auth-oath-steps-in-window=2
iplanet-am-auth-oath-last-login-time-attribute-name=sn

We need to generate a new QR code now with the following content:

otpauth://totp/Example:demo@example.com?secret=JBSWY3DPK5XXE3DEJ5TE6QKUJA======&issuer=Example

Again, let’s test it at /openam/UI/Login?service=oathtest and this time we should have a slightly different UI on the Google Authenticator:

[Screenshot: Google Authenticator - TOTP]

As you can see there is a nice indicator now showing us how long this TOTP code will remain valid.

I think we now know more than enough about OTP codes; I hope you've found this article useful. 😉

Troubleshooting 101

When running an application in a production environment, I would say the most important thing is to ensure that the application keeps behaving correctly and the provided services remain in a nice, working state. Usually people use some sort of monitoring solution (for example Icinga) for this, which periodically checks the health of the service, and sometimes even asserts that the service produces expected results. While this is an important part of the administration process, today we are going to talk about something else (though closely related), namely what to do when things go wrong: the service becomes unavailable, or operates with degraded performance/functionality. In the following I will try to demonstrate some possible problems with OpenAM, but within reasonable limits the troubleshooting techniques mentioned here should be applicable to any Java-based application.

Step-by-step

As the very first step we always need to determine what is actually not working, for example: is OpenAM accessible at all? Does the container react to user requests? Is it that just certain components are not functioning correctly (e.g. authentication or policy), or is everything just completely “broken”?

The second step is simply: DON'T PANIC. The service is currently unavailable, users may already be affected by the outage, but you need to stay calm and think about the root causes. I've seen it way too many times that a service was restarted right away after an outage without collecting any sort of debug information. This is almost the worst thing you can possibly do to resolve the underlying problem, as without these details it is not possible to guarantee that the problem won't reoccur some time later. It is not even guaranteed that a service restart resolves the problem. So basically, if you look at it, in the end you have two choices really:

  • Either restart the service right away, in which case you potentially end up missing crucial debug data to identify the root cause of the issue, and you essentially risk running into this problem again, causing yet another outage;
  • OR collect as much (and hopefully relevant) debug data from your system as you can for later investigation (bearing in mind that during this period users are still unable to access the application), and then restart the service.

I hope I don't have to tell you that the second option is the right choice.

In any case

Always look at the application logs (in case of OpenAM, debug logs are under the debug folder and audit logs are under the log directory). If there is nothing interesting in the logs, then have a look at the web container’s log files for further clues.

When functionality is partially unavailable

Let's say authentication does not work in OpenAM, e.g. every user who tries to authenticate gets an authentication failure screen. In this case one of the first things you need to look at is the OpenAM debug logs (namely Authentication), to determine which of the following causes the problem:

  • It could be that a dependent service (like the LDAP server) is not operational, causing the application level error.
  • It could be that there is a network error between OpenAM and the remote service, e.g. there is a network partition, or the firewall decided to block some connections.
  • It could be that everything else works fine, but OpenAM just is in an erroneous state (like thinking that the LDAP server is not accessible, but actually it is).
  • Or my favorite one: a combination of these. 🙂

Based on the findings you are either going to need to look at the remote service or maybe even at the network components to see whether the service is otherwise accessible. In the local scenario it may be that the logs are all you've got, so preferably you should get as much debug information out of the working system as you can, i.e. enable message level debug logging (if amadmin can still authenticate), and then reproduce the scenario.

Upon finding clues (through swift and not necessarily thorough analysis) it may become obvious that some other debugging information needs to be collected, so collect that too and hope for the best when restarting the system.

When everything is just broken 🙁

Well this is not the best situation really, but there are several things that you can check, so let’s go through them one by one.

Dealing with OutOfMemoryError

Firstly, OutOfMemoryErrors are usually visible in the container-specific log files, and they tend to look like this:

java.lang.OutOfMemoryError: Java heap space

If you see this sort of error message you should:

  • Verify that you have configured the process to meet the application's minimum requirements (for example OpenAM likes to run with -Xmx1g -XX:MaxPermSize=256m settings as a minimum).
  • Try to collect a heap dump using the jmap command, for example:
    jmap -dump:live,format=b,file=heap.bin <PID>
    

    if that doesn’t work, then try to use the force switch:

    jmap -F -dump:format=b,file=heap.bin <PID>
    

To make sure that you have good debug data you should also do the following (potentially before the problem actually happens):

  • Set the following JVM properties to enable automatic heap dump generation upon OOME:
    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./oome.hprof
    
  • Enable GC logging with the following JVM properties, so you can see the memory usage of the application over time:
    -Xloggc:/home/myuser/gc.log -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -verbose:gc
    

In case of an OutOfMemoryError usually the container stops responding to any requests, so in such cases it is useful to check the container logs or the heap dump path to see if there was an OOME.

Handling deadlocks

Deadlocks mostly show up quite similarly to OOMEs, in that the container stops responding to requests (or in certain cases only a certain component within the application is affected). This is why, if you don't get any response from the application, it may well be due to a deadlock. To handle these cases it is advised to collect several thread dumps from the JVM using the jstack tool:

jstack <PID>

NOTE: jstack generally seems to work better when it is run by the same user who runs the target JVM. Regardless, if it still doesn't give useful output, try to use:

jstack -F <PID>

If it still doesn’t want to work, you can still attempt to run:

kill -3 <PID>

In this case the java process is not really killed, but it is instructed to generate a thread dump and print it to the container logs.

Generate a few thread dumps and save each output to a different file (incremental file names are helpful); this way it should be possible to detect long-running or deadlocked threads.
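
Besides eyeballing the thread dumps, the JVM can also report deadlocks directly via JMX. A small illustrative check (it has to run inside the affected JVM, or be adapted to connect to it remotely over JMX):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockCheck {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Returns the IDs of threads deadlocked on monitors or ownable synchronizers, or null
        long[] deadlocked = threads.findDeadlockedThreads();
        if (deadlocked == null) {
            System.out.println("No deadlock detected");
            return;
        }
        for (ThreadInfo info : threads.getThreadInfo(deadlocked, Integer.MAX_VALUE)) {
            System.out.println(info.getThreadName() + " is blocked on " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
    }
}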

High CPU load

In this case the service is usually functional, however the CPU usage appears to be unusually high (based on previous monitoring data, of course). It is very likely that there is an application error (like an endless loop), but in certain cases it can simply be the JVM running GCs more often than expected. To hunt down the source of the problem, three things are needed:

  • thread dumps: tells you which thread is doing what exactly.
  • system monitoring output: tells you which application thread consumes the most CPU.
  • GC logs: these will tell you how often the JVM performed GC and how long those collections took, just in case the high CPU utilization is due to frequent GCs.

On Linux systems you should run:

top -H

and that should give you the necessary details about per-thread CPU usage. Note that top reports thread IDs in decimal, while thread dumps show the native thread ID (nid) in hexadecimal, so convert the ID before comparing. Match that information with the thread dump output and you've got yourself a rogue thread. 🙂

Conclusion

Monitoring is really nice when actively done, and sometimes can even help to identify if a given service is about to go BOOM. When there is a problem with the service, just try to collect information about the environment (use common sense!), and only attempt to restore things afterwards.

Setting up Java Fedlet with Shibboleth IdP

The Java Fedlet is basically a lightweight SAML Service Provider (SP) implementation that can be used to add SAML support to existing Java EE applications. Today we are going to try to set up the fedlet sample application with a Shibboleth IdP (available at testshib.org).

Preparing the fedlet

There are two kinds of Java fedlet in general: configured and unconfigured. The configured fedlet is what you can generate on the OpenAM admin console; it basically preconfigures the fedlet to use the hosted OpenAM IdP instance, and it also sets up the necessary SP settings. The unconfigured fedlet on the other hand is more like starting from scratch (as the name itself suggests 🙂 ) and you have to perform all the configuration steps manually. To simplify things, today we are going to use the configured fedlet for our little demo.

To get a configured fedlet, first you have to install OpenAM of course. Once you have OpenAM set up, create a new dummy Hosted IdP (to generate a fedlet it is currently required to also have a hosted IdP):

  • On the Common Tasks page click on Create Hosted Identity Provider.
  • Leave the entity ID as the default value.
  • For the name of the New Circle Of Trust enter cot.
  • Click on the Configure button.

Now, to generate the configured fedlet, let's go back to the Common Tasks page and click on the Create Fedlet option.

  • Here you can set the Name to any arbitrary string; this will be the fedlet's entity ID. For the sake of simplicity let's use the fedlet's URL as the entity ID, e.g., http://fedlet.example.com:18080/fedlet.
  • The Destination URL of the Service Provider which will include the Fedlet setting on the other hand needs to be the exact URL of the fedlet, so for me this is just a matter of copy paste: http://fedlet.example.com:18080/fedlet.
  • Click on the Create button.

This will generate a fedlet that should be available under the OpenAM configuration directory (in my case it was under ~/openam/myfedlets/httpfedletexamplecom18080fedlet/Fedlet.zip), so let's grab this file and unzip it to a convenient location. Now we need to edit the contents of the fedlet.war itself and modify the files under the conf folder. As a first step open sp.xml and remove the RoleDescriptor and XACMLAuthzDecisionQueryDescriptor elements from the end of the XML.

At this point we have everything we need to register our fedlet on the testshib.org site, so let's head there and upload our metadata (sp.xml). In order to prevent clashes with other entity configurations, we should rename the sp.xml file to something more unique first, like fedlet.example.com.xml.

After successful registration the next step is to grab the testshib IdP metadata and add it to the fedlet as idp.xml, but there are some small changes we need to make to the metadata to make it actually work with the fedlet:

  • Remove the EntitiesDescriptor wrapping element, but make sure you copy the xmlns* attributes to the EntityDescriptor element.
  • Since the XML now has two EntityDescriptor root elements, you should only keep the one made for the IdP (i.e. the one that has the "https://idp.testshib.org/idp/shibboleth" entityID), and remove the other.

The next step is to update the idp-extended.xml file by replacing the entityID attribute's value in the EntityConfig element with the actual entity ID of the testshib instance, which should be https://idp.testshib.org/idp/shibboleth.

After all of this we should have all the standard and extended metadata files sorted, so the last thing to sort out is to set up the Circle Of Trust between the remote IdP and the fedlet. To do that we need to edit the fedlet.cot file and update the sun-fm-trusted-providers property to have the correct IdP entity ID:

cot-name=cot
sun-fm-cot-status=Active
sun-fm-trusted-providers=https://idp.testshib.org/idp/shibboleth,http://fedlet.example.com:18080/fedlet
sun-fm-saml2-readerservice-url=
sun-fm-saml2-writerservice-url=

It's time to start testing now, so let's repackage the WAR (so it has all the updated configuration files) and deploy it to an actual web container. After deploying the WAR, let's access it at http://fedlet.example.com:18080/fedlet. Since there is no fedlet home directory yet, the fedlet suggests clicking on a link to create one based on the configuration in the WAR file, so let's click on it and hope for the best. 🙂

Testing the fedlet

If we did everything correctly, we finally end up on a page showing some details about the fedlet configuration, along with some links to initiate the authentication process. As a test let's click on the Run Fedlet (SP) initiated Single Sign-On using HTTP POST binding one, and now we should be facing the testshib login page where you can provide one of the suggested credentials. After performing the login an error message is shown at the fedlet saying Invalid Status code in Response. After investigating a bit further, the debug logs under ~/fedlet/debug/libSAML2 tell us what's going on:

SPACSUtils.getResponse: got response=<samlp:Response xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" ID="_0b33f19185348a26fffe9c3a1aa6e652" InResponseTo="s2be040cf929456a64f444527dfc1d7413ce178531" Version="2.0" IssueInstant="2013-12-04T18:39:32Z" Destination="http://agent.sch.bme.hu:18080/fedlet/fedletapplication"><saml:Issuer xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion" Format="urn:oasis:names:tc:SAML:2.0:nameid-format:entity">https://idp.testshib.org/idp/shibboleth</saml:Issuer><samlp:Status xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol">
<samlp:StatusCode xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol"
Value="urn:oasis:names:tc:SAML:2.0:status:Responder">
</samlp:StatusCode>
<samlp:StatusMessage xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol">
Unable to encrypt assertion
</samlp:StatusMessage>
</samlp:Status></samlp:Response>

Now we can also look into the testshib logs (see the TEST tab), and that will tell us what the real problem was:

Could not resolve a key encryption credential for peer entity: http://fedlet.example.com:18080/fedlet

So this just tells us that the Shibboleth IdP tries to generate an encrypted assertion for our fedlet instance, however it fails to do so, because it is unable to determine the public certificate of the fedlet. This is happening because the basic fedlet metadata does not include a certificate by default. To remedy this, let's do the following:

  • Acquire the PEM encoded certificate for the default OpenAM certificate:
    $ keytool -exportcert -keystore ~/openam/openam/keystore.jks -alias test -file openam.crt -rfc
  • Drop the BEGIN and the END CERTIFICATE lines from the cert, so you only have the PEM encoded data, and then you’ll need to add some extra XML around it to look something like this:
    <KeyDescriptor>
    <ds:KeyInfo xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
    <ds:X509Data>
    <ds:X509Certificate>
    MIICQDCC...
    </ds:X509Certificate>
    </ds:X509Data>
    </ds:KeyInfo>
    </KeyDescriptor>
    
  • Add the KeyDescriptor under the SPSSODescriptor as a first element.
  • Upload the updated SP metadata with the same filename (fedlet.example.com.xml) again at the testshib site.

Since the decryption process requires the presence of the private key belonging to the certificate, we need to ensure that the private key is available, so let's do the following:

  • Copy the ~/openam/openam/keystore.jks file to the fedlet home directory
  • Visit the http://fedlet.example.com:18080/fedlet/fedletEncode.jsp page and enter changeit (the password of the default keystore and private key as well).
  • Grab the encrypted value and create ~/fedlet/.storepass and ~/fedlet/.keypass files containing only the encrypted password.
  • Open up ~/fedlet/sp-extended.xml and ensure that the encryptionCertAlias setting has the value of test.
  • Restart the container, so the changes are picked up.

At this stage we should retry the login process, and if nothing went wrong you should see all the nice details of the SAML assertion received from testshib.org. It works! 🙂

Fun with Active Directory

As we all know, Active Directory can cause quite a few head-scratching moments in general, so here I'm trying to collect some of my findings so that maybe others won't suffer from the same things.

Creating user fails with LDAP 53

LDAP error 53 is UNWILLING_TO_PERFORM; there are two main candidate reasons for this to happen during an ADD operation:

  • SSL isn't used. AD will not process LDAP operations that involve setting/modifying the unicodePwd attribute if the request was made on a non-secure connection. Always make sure you use LDAPS when connecting to AD.
  • The provided password does not satisfy the password policies configured in AD. This is a bit harder to identify, but for the sake of testing try to use complex passwords and see if the user gets created like that.

Changing password fails

As stated already, AD requires an SSL connection when sending the unicodePwd attribute. If that weren't enough already, you also have to enclose the password value in double quote characters, and the password has to be in UTF-16LE encoding. A sample code snippet would look something like this:

public byte[] encodePassword(String password) {
    // AD expects the new password enclosed in double quotes and encoded as UTF-16LE
    return ("\"" + password + "\"").getBytes(Charset.forName("UTF-16LE"));
}

By the way, there are two main ways to change passwords (a minimal JNDI sketch follows the list):

  • A non-administrative password change: you BIND as the user whose password needs to be changed, and then you send a ModifyRequest with a DELETE and an ADD ModificationItem containing the old and the new password respectively (in the encoded format of course).
  • Administrative password reset: in this case you BIND as an admin user and then send a ModifyRequest with a REPLACE ModificationItem only containing the new password value.
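
Here is that minimal JNDI sketch of both approaches (the DirContext setup, the DN and the error handling are of course environment specific; note that JNDI calls the DELETE modification REMOVE_ATTRIBUTE):

import java.nio.charset.Charset;
import javax.naming.directory.BasicAttribute;
import javax.naming.directory.DirContext;
import javax.naming.directory.ModificationItem;

public class AdPassword {

    // Non-administrative change: bind as the user, then DELETE the old value and ADD the new one.
    public static void changePassword(DirContext userContext, String userDn,
            String oldPassword, String newPassword) throws Exception {
        ModificationItem[] mods = new ModificationItem[] {
            new ModificationItem(DirContext.REMOVE_ATTRIBUTE,
                    new BasicAttribute("unicodePwd", encodePassword(oldPassword))),
            new ModificationItem(DirContext.ADD_ATTRIBUTE,
                    new BasicAttribute("unicodePwd", encodePassword(newPassword)))
        };
        userContext.modifyAttributes(userDn, mods);
    }

    // Administrative reset: bind as an admin, then send a single REPLACE with the new value.
    public static void resetPassword(DirContext adminContext, String userDn, String newPassword)
            throws Exception {
        ModificationItem replace = new ModificationItem(DirContext.REPLACE_ATTRIBUTE,
                new BasicAttribute("unicodePwd", encodePassword(newPassword)));
        adminContext.modifyAttributes(userDn, new ModificationItem[] { replace });
    }

    // Same encoding rule as above: quoted and UTF-16LE, sent over LDAPS only.
    private static byte[] encodePassword(String password) {
        return ("\"" + password + "\"").getBytes(Charset.forName("UTF-16LE"));
    }
}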

Easy, right?

UPDATE

It appears that the DELETE/ADD combination can fail with the following error:

Constraint Violation: 0000052D: AtrErr: DSID-03190F80, #1:
	0: 0000052D: DSID-03190F80, problem 1005 (CONSTRAINT_ATT_TYPE), data 0, Att 9005a (unicodePwd)

In our case the root cause was the default minimum password age policy in AD (1 day). Once I ran the following command from an administrator CMD prompt:

net accounts /MINPWAGE:0

AD started to like my ModifyRequests.

Identity Repositories

Identity Repositories (or Data Stores if you like) are one of the most important parts of OpenAM. They enable you to manage identities within your organization (to a certain degree), and are involved in many processes in general:

  • during authentication it is often necessary to contact the IdRepo to get information about the user (like the e-mail address, so an OTP code can be sent out)
  • after authentication takes place, the user's profile is looked up (for example to check that the user account has not been deactivated)
  • once you have a session you can easily configure a PA to retrieve profile attributes in HTTP headers
  • when you authenticate via SAML, the IdRepo is contacted to retrieve the values for the SAML attributes in the assertion
  • etc

Some (boring) background about IdRepos

As you can see, the IdRepo is quite an important part of the product, so let's try to get a better understanding of it.
In general, a given identity repository represents a single source of identity data. In this context identity can mean users and groups (and with ODSEE it can even mean roles and filtered roles). While it is common to use a regular relational database to store application data, in enterprise systems it is much more common to use some LDAP-capable directory server (OpenDJ, Active Directory, etc.) instead. Since OpenAM is meant to be used as an enterprise solution, it has been designed to work mainly with directory servers, but it is not limited to those (there is also a Database IdRepo implementation, and with some patience any other sort of implementation can be written just as well). To make it simple, let's only talk about directory servers for now…

So as you can already guess, OpenAM has an IdRepo implementation named LDAPv3Repo, which basically should be able to communicate with any LDAPv3 compliant directory server. While this sounds great, it also means that the implementation has grown quite big with time. There are parts, for example, where special handling is required for a given directory server type (yes, I'm talking about AD).
In versions prior to 10.2.0, the LDAPv3Repo implementation used the quite old Netscape LDAP SDK to perform the LDAP operations. Although this solution works, I must say that support for the Netscape SDK is pretty much non-existent (and on top of that the whole SDK has been basically repackaged within OpenAM), so when it comes to debugging, it can become quite difficult to track down a given problem (again, we are talking about legacy code here). Because of this, and the fact that the OpenDJ team has started to develop their own LDAP SDK (which by the way ROCKS!), we decided to drop Netscape from the product. The good news is that this change already starts with 10.2.0 (though unfortunately there will still be parts of OpenAM using the old SDK, but hopefully not for too long).

So how do IdRepos work?

You can define any number of Identity Repositories (Data Stores) within a realm; the only caveat is that all of the configured IdRepos will be contacted whenever an operation needs to be performed. You may ask why, well here it is:
OpenAM uses IdRepos to abstract away the identity repository differences, and within a realm it uses AMIdentityRepository to abstract away the multiple-IdRepo concept. This basically means that OpenAM only knows about the single AMIdentityRepository, which then knows everything about the different IdRepo configurations within the realm. To complicate things, AMIdentityRepository uses IdServices to access all the IdRepo implementations, so let's see the four different IdServices implementations for reference:

  • IdServicesImpl: the most basic implementation; this one just loops through the IdRepo list and executes the given operation on all of them, then at the end combines the results from the different IdRepos.
  • IdCachedServicesImpl: this implementation stores operation results in a cache, and when a given piece of requested data cannot be found in the cache (which by the way is invalidated based on persistent search results), it just asks IdServicesImpl to contact the IdRepos for it.
  • IdRemoteServicesImpl: a remote implementation which uses JAX-RPC to contact an OpenAM server (which in turn will use Id(Cached)ServicesImpl) to perform operations remotely.
  • IdRemoteCachedServicesImpl: as its name suggests, it caches on the client side and asks OpenAM via IdRemoteServicesImpl if there is a cache miss.

Because IdRepos are implemented this way, you can hopefully see why we say that OpenAM wasn’t meant to be used for identity management. Creating/reading/updating/deleting entries from all the configured IdRepos cannot always fulfill all the requirements, and the expected behavior is poorly defined (for example what happens if you run a delete on an entry that only exists in one IdRepo?).

Alright, alright, but what’s new in OpenAM 10.2.0?

We have implemented a new IdRepo implementation which only uses the OpenDJ LDAP SDK to perform LDAP operations, keeping in mind that the behavior should be similar to the old Netscape-based IdRepo implementation. The main changes are:

  • The LDAP failover code has been rewritten, and it should be much more reliable with the new release.
  • In the old version the user search attribute was misused: it was basically handled as the naming attribute of the entry (i.e. the first attribute name in the DN). This has been changed, so now the user search attribute is only used when performing search operations in the IdRepo, and the already existing naming attribute setting is used as the naming attribute.
  • Two new options have been introduced: heartbeat interval and heartbeat timeunit. These settings control the interval for sending out heartbeat search requests on idle connections. This means that if the connection is not idle, there won't be any heartbeat requests either.
  • Because of this, the idle timeout setting has been removed. When there is a network component closing idle connections after a certain period of time, you should just configure the heartbeat interval correctly, and that should basically prevent the connections from becoming idle.
  • The data store level cache has been removed (not implemented).

I think we can all agree on the fact that 10.2.0 will be an exciting release. 🙂