IDM Deployment Patterns — Centralized Repo-Based vs. Immutable File-Based

Introduction

I recently blogged about how customers can architect ForgeRock Access Management to support an immutable, DevOps style deployment pattern — see link. In this post, we’ll take a look at how to do this for ForgeRock Identity Management (IDM).

IDM is a modern OSGi-based application, with its configuration stored as a set of JSON files. This lends itself well either to a centralized, repository-based (repo-based) deployment pattern, or to a file-based, immutable pattern. This blog explores both options and summarizes the advantages and disadvantages of each.

IDM Architecture

Before delving into the deployment patterns, it is useful to summarize the IDM architecture. IDM provides centralized, simple management and synchronization of users, devices, and things. It is a highly flexible product, and caters to a multitude of different identity management use cases, from provisioning, self-service, password management, synchronization, and reconciliation, to workflow, relationships, and task execution. For more on IDM’s architecture, check out the following link.

IDM’s core deployment architecture is split between a web application running in an OSGi framework within a Jetty Web Server, and a supported repo. See this link for a list of supported repos.

Within the IDM web application, the following components are stored:

  • Apache Felix and Jetty Web Server hosting the IDM binaries
  • Secrets
  • Configuration and scripts — this is the topic of this blog.
  • Policy scripts
  • Audit logs (optional)
  • Workflow BAR files
  • Bundles and application JAR files (connectors, dependencies, repo drivers, etc)
  • UI files

Within the IDM repo the following components are stored:

  • Centralized copies of configuration and policies — again, the topic of this blog
  • Cluster configuration
  • Managed and system objects
  • Audit logs (optional)
  • Scheduled tasks and jobs
  • Workflow
  • Relationships and link data

Notice that configuration is listed twice: both on the IDM node’s filesystem and within the IDM repo. This is the focus of this blog — how this configuration is managed determines whether the deployment follows a centralized, repo-based pattern or a file-based, immutable pattern.

Centralized, Repo-Based Deployment Pattern

This is the out-of-the-box (OOTB) deployment pattern for IDM. In this model, all IDM nodes share the same repository to pull down their configuration on startup, and if necessary, overwrite their local files. Any configuration changes made through the UI or over REST (REST ensures consistency) are pushed to the repo and then down to each IDM node via the cluster service. The JSON configuration files within the ../conf directory on the IDM web application are present, but should not be manipulated directly, as this can lead to inconsistencies in the configuration between the local file system and the authoritative repo configuration.

The following component-level diagram illustrates this deployment pattern:

Configuration Settings

This is IDM’s default behavior, so no special settings are required. The openidm.config.repo.enabled and felix.fileinstall.enableConfigSave properties (described under the immutable pattern below) are left commented out, meaning configuration is stored centrally in the repo, and changes made via the UI or REST are propagated to every node. The openidm.node.id property in the ../resolver/boot.properties file must still be a string unique to each IDM node so the cluster service can identify each host.

Following are key advantages and disadvantages of this deployment pattern:

Advantages

  • Works OOTB with no additional build or deployment tooling.
  • Configuration changes made through the UI or over REST are automatically propagated to every node via the repo and cluster service, keeping all nodes consistent.
  • Well-suited to customers who are new to IDM, or who do not modify their IDM configuration often.

Disadvantages

  • The authoritative configuration lives in the repo rather than in files, making it harder to version-control, parameterize, and promote through environments using DevOps tooling.
  • The local JSON files in the ../conf directory must not be edited directly, as this can lead to inconsistencies between the local filesystem and the authoritative repo configuration.
  • Accidental UI or REST-based changes immediately alter the running configuration across all nodes.

Immutable, File-Based Deployment Pattern

The key difference in this model is that IDM’s configuration is not stored in the repository. Instead, IDM pulls the configuration from the local filesystem and stores it in memory. The repo is still the authoritative source for all other IDM components (cluster configuration, schedules, and optionally, audit logs, system and managed objects, links, relationships, and others).

The following component level diagram illustrates this deployment pattern:

Configuration Settings

The main configuration items for a multi-instance, immutable, file-based deployment pattern are:

  • The ../resolver/boot.properties file — This file stores IDM boot specifics like the IDM host, ports, SSL settings, and more. The key configuration item in this file for this blog post is openidm.node.id, which needs to be a string unique to each IDM node to let the cluster service identify each host.
  • The ../conf folder — This contains all JSON configuration files. In this pattern, IDM reads these files on startup and holds the configuration in memory rather than pushing it to the repo. As a best practice (see link), the OOTB ../conf directory should not be used. Instead, a project folder containing the contents of the ../conf and ../script directories should be created, and IDM started with the “-p </path/to/my/project/location>” flag. This ensures OOTB and custom configurations are kept separate, easing version control, upgrades, and rollbacks.
  • The ../<my_project>/conf/system.properties file. This file contains two key settings:

openidm.fileinstall.enabled=true

This setting can either be left commented out (it defaults to true) or uncommented and explicitly set to true. It ensures IDM loads its configuration from the files in your project’s directory (../conf and ../script). Combined with the setting below, configuration is neither read from nor pushed to the repo:

openidm.config.repo.enabled=false 

This setting needs to be uncommented to ensure IDM does not read configuration from the repo or push configuration to the repo.

  • The ../<my_project>/conf/config.properties file. The key setting in this file is:
felix.fileinstall.enableConfigSave=false 

This setting needs to be uncommented. This means any changes made via REST or the UI are not pushed down to the local IDM filesystem. This effectively makes the IDM configuration read-only, which is key to immutability.
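Taken together, an immutable, file-based node’s property files look like the following. The node ID value is only an example, and exact file contents vary by IDM version:

```properties
# ../resolver/boot.properties
# Unique per node so the cluster service can identify each host
openidm.node.id=idm-node-1

# ../<my_project>/conf/system.properties
# Left commented out (defaults to true): IDM reads configuration from the
# project's conf/ and script/ directories
# openidm.fileinstall.enabled=true

# Uncommented: IDM neither reads configuration from, nor pushes it to, the repo
openidm.config.repo.enabled=false

# ../<my_project>/conf/config.properties
# Uncommented: UI/REST changes are never written back to the filesystem,
# making the configuration effectively read-only
felix.fileinstall.enableConfigSave=false
```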

Note: Direct manipulation of configuration files and promotion to other IDM environments can fail if the JSON files contain crypto material. See the following KB article for information on how to handle this. You can also use the IDM configexport tool (IDM version 6.5 and above).

The following presents key advantages and disadvantages of this deployment pattern:

Advantages

  • Follows core DevOps patterns for immutable configuration: push configuration into a repo like Git, parameterize it, and promote it up to production. A customer knows without a doubt which configuration is running in production.
  • This pattern offers the ability to pre-bake the configuration into an image (such as a Docker image, an Amazon Machine Image, and others) for auto-deployment of IDM configuration using orchestration tools.
  • Supports “stack by stack” deployments, as configuration changes can be made to a single node without impacting the others. Rollback is also far simpler—restore the previous configuration.
  • The IDM configuration is read-only, meaning accidental UI- or REST-based configuration changes cannot alter the configuration and go on to impact functionality.

Disadvantages

  • As each IDM node holds its own configuration, the UI cannot be used to make configuration changes. This could present a challenge to customers new to IDM.
  • The customer is responsible for putting processes in place to ensure all IDM nodes run from exactly the same configuration. This requires strong DevOps methodologies and experience.
  • Limited benefit for customers who do not modify their IDM configuration often.

Summary of Configuration Parameters

The following table summarizes the key configuration parameters used in the centralized, repo-based and the immutable, file-based deployment patterns:

Conclusion

There you have it: two different deployment patterns. The centralized, repo-based pattern suits customers who wish to go with the OOTB configuration and/or do not update the IDM configuration often; the immutable, file-based pattern suits customers who demand immutability and/or are well-versed in DevOps methodologies and wish to treat IDM configuration like code.

An All Active Persistent Data Layer? No Way! Yes Way!

Problem statement

Most database technologies (Cloud DB-as-a-Service offerings, traditional DBs, LDAP services, etc.) typically run in a single-primary mode, with multiple secondary nodes to ensure high availability. The main rationale is that this is the only surefire way to ensure data consistency and integrity are maintained.

If an all-active topology were enabled, replication delay (the amount of time it takes for a data WRITE operation on one node to propagate to a peer) could cause the following to occur:

  1. The client executes a WRITE operation on node 1.
  2. The client then executes a READ operation soon after the WRITE.
  3. In an all active topology this READ operation may target node 2, but because the data has not yet been replicated (due to load, network latency, etc) you get a data miss and application level chaos ensues.

Another scenario is lock counters:

  1. User A is an avid Manchester UTD football fan (alas more of a curse than a blessing nowadays!) and is keen to watch the game. In haste, they try to login but supply an incorrect password. The lock counter increments by +1 on User A’s profile on node 1. Counter moves from 0 to 1.
  2. User A, desperate to catch the game then quickly tries to login again, but again supplies an incorrect password.
  3. This time, if replication is not quick enough, node 2 may be targeted and thus the lock counter moves from 0 to 1 instead of from 1 to 2. Fail!!!
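To make the lost update concrete, here is a minimal, purely illustrative Python simulation (none of this is ForgeRock code) of two replicas with asynchronous replication, where the two failed logins land on different nodes, as a round-robin balancer might route them:

```python
# Hypothetical simulation: two replicated nodes, asynchronous replication.
# The second lock-counter increment is lost because it lands on a node that
# has not yet received the first increment.

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {"lock_counter": 0}
        self.pending = []  # replication events not yet applied on this node

    def write(self, key, value, peers):
        self.data[key] = value
        for peer in peers:
            peer.pending.append((key, value))  # replication is asynchronous

    def apply_replication(self):
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()

node1, node2 = Node("node1"), Node("node2")

# Failed login 1 hits node1: counter 0 -> 1 (replication to node2 is delayed).
node1.write("lock_counter", node1.data["lock_counter"] + 1, peers=[node2])

# Failed login 2 hits node2 before replication arrives: counter 0 -> 1 again.
node2.write("lock_counter", node2.data["lock_counter"] + 1, peers=[node1])

# Replication finally applies; the increments overwrite rather than add.
node1.apply_replication()
node2.apply_replication()
print(node1.data["lock_counter"])  # 1, not the expected 2 -- an update was lost
```

The same mechanism produces the read-after-write data miss in the first scenario: a READ routed to a node whose pending queue has not yet been applied simply does not see the WRITE.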

These scenarios, and others like them, mandate a single-primary topology, which for high-load environments results in high cost: the primary needs to be vertically scaled to handle all of the load (plus headroom), while the secondaries (same spec as the primary) sit idle, costing $$$ for no gain in wasted compute resource.

Tada — Roll up Affinity Based Load Balancing

ForgeRock Directory Services (DS) is the high-performance, high-scale, LDAP-based persistence layer product within the ForgeRock Identity Platform. Any DS instance can take both WRITE and READ operations at scale; for many customers, enabling an all-active infrastructure without Affinity Based Load Balancing is viable.

However, for high-scale customers and/or those who need to guarantee absolute consistency of data, Affinity Based Load Balancing, a technology unique to the ForgeRock Identity Platform, is the key to enabling an all-active persistence layer. Nice!

Affinity what now?

It is a load-balancing algorithm built into the DS SDK, which is part of both the ForgeRock Directory Proxy product and the ForgeRock Access Management (AM) product.

It works like this:

For each and every inbound LDAP request which contains a distinguished name (DN), like uid=Darinder,ou=People,o=ForgeRock, the SDK computes a hash and allocates the result to a specific DS instance. In the case of AM, all servers in the pool compute the same hash, and thus send all READ/MODIFY/DELETE requests for uid=Darinder to, say, DS Node 1 (the origin node).

A request with a different DN (e.g. uid=Ronaldo,ou=People,o=ForgeRock) is again hashed but may be sent to DS Node 2; all READ/MODIFY/DELETE operations for uid=Ronaldo target this specific origin node and so on. This means all READ/MODIFY/DELETE operations for a specific DN always target the same DS instance, thus eliminating issues caused by replication delay and solving the scenarios (and others) described in the Problem statement above. Sweet!
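The routing behaviour can be sketched in a few lines of Python. This is an illustration of the idea only, not the DS SDK’s actual algorithm; the function name and the md5-based hash are assumptions:

```python
# Illustrative affinity routing sketch (not the real DS SDK algorithm).
# A stable stdlib hash (md5) is used rather than Python's per-process hash().
import hashlib

def affinity_route(dn, nodes):
    """Pick a node for this DN. Every caller computing the same hash over
    the same node list selects the same 'origin' node for that DN."""
    digest = hashlib.md5(dn.lower().encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

nodes = ["ds-node-1", "ds-node-2", "ds-node-3"]
dn = "uid=Darinder,ou=People,o=ForgeRock"

# Any AM server computing the same hash picks the same origin node, so
# READ/MODIFY/DELETE operations for this DN never race replication.
origin = affinity_route(dn, nodes)
assert affinity_route(dn, nodes) == origin

# If the origin node fails, routing over the surviving pool sends the DN to a
# new node, and requests stay sticky there until the pool changes again.
surviving = [n for n in nodes if n != origin]
failover = affinity_route(dn, surviving)
print(origin, "->", failover)
```

A different DN (such as uid=Ronaldo,ou=People,o=ForgeRock) hashes independently, so ADD load spreads across the pool while per-DN operations stay pinned.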

The following topology depicts this architecture:

All requests from AM1 and AM2 for uid=Darinder target DS Node 1. All requests from AM1 and AM2 for uid=Ronaldo target DS Node 2.

What else does this trick DS SDK do then?

Well… The SDK also makes sure ADD requests are spread evenly across all DS nodes in the pool, so that no single DS node is overloaded while the others remain idle.

Also, as a bit of icing on top, the SDK is instance-aware: if the origin DS node becomes unavailable (in our example, say DS Node 1 for uid=Darinder), the SDK detects this and re-routes all requests for uid=Darinder to another DS node in the pool, and then (here’s the cherry) ensures all further requests remain sticky to this new DS node (it becomes the new origin node). Assuming data has been replicated in time, there will be no functional impact.

Oh, and when the original DS node comes back online, all requests fail back for any DNs where it was the origin server (so, in our case, uid=Darinder would flip back to DS Node 1). Booom!

Which components of the ForgeRock Platform support Affinity Based Load Balancing?

  • Directory Proxy
  • ForgeRock AM’s DS Core Token Service (CTS)
  • ForgeRock AM’s DS User / Identity Store
  • ForgeRock AM’s App and Policy Stores

Note: the AM Configuration Store does not support affinity, but this is intentional, as AM configuration will soon move to file-based configuration (FBC); in the interim, customers can look to deploy like this.

What are the advantages of Affinity?

  • As the title says, Affinity Based Load Balancing enables an all-active persistent storage layer.
  • Instead of having a single massively vertically scaled primary DS instance, DS can be horizontally scaled so all nodes are primary to increase throughput and maximise compute resource.
  • As the topology is all active, smaller (read: cheaper) instances can be used; thus, significantly reducing costs, especially in a Cloud environment.
  • Eliminates functional, data integrity, and data consistency issues caused by replication delay.

More Innnnput!

To learn more about how to configure ForgeRock AM for Affinity Based Load Balancing check out this.

This blog post was first published @ https://medium.com/@darinder.shokar included here with permission.

Immutable Deployment Pattern for ForgeRock Access Management (AM) Configuration without File Based Configuration (FBC)

Introduction

The standard Production Grade deployment pattern for ForgeRock AM is to use replicated sets of Configuration Directory Server instances to store all of AM’s configuration. This deployment pattern has worked well in the past, but is less suited to the immutable, DevOps-enabled environments of today.

This blog presents an alternative view of how an immutable deployment pattern could be applied to AM in lieu of the upcoming full File Based Configuration (FBC) for AM in version 7.0 of the ForgeRock Platform. This pattern could also support easier transition to FBC.

Current Common Deployment Pattern

Currently most customers deploy AM with externalised Configuration, Core Token Service (CTS) and UserStore instances.

The following diagram illustrates such a topology spread over two sites. The focus is on the DS Config Stores, hence the CTS and DS Userstore connections and replication topology have been simplified. Note that this blog is still applicable to single-site deployments.

Dual site AM deployment pattern. Focus is on the DS Configuration stores

In this topology, AM uses connection strings to the DS Config stores to enable an all-active Config store architecture, with each AM targeting one DS Config store as primary and the second as failover per site. Note that in this model there is no cross-site failover for AM-to-Config-store connections (possible, but discouraged). The DS Config stores do communicate across sites for replication, creating a full mesh, as do the User and CTS stores.

A slight divergence from this model, and one applicable to cloud environments, is to use a load balancer between AM and its DS Config Stores; however, we have observed many customers experience problems with features such as Persistent Searches failing due to dropped connections. Hence, where possible, Consulting Services recommends the use of AM Connection Strings.

It should be noted that the use of AM Connection Strings specific to each AM can only be used if each AM has a unique FQDN — for example: https://openam1.example.com:8443/openam, https://openam2.example.com:8443/openam and so on.
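As an illustration, an AM connection string can scope each DS server to a particular AM instance by server ID. The hostnames, ports, and IDs below are examples only; check the AM documentation for the exact syntax supported by your version:

```
dsconfig1.example.com:1636|01,dsconfig2.example.com:1636|01
```

Read this way, the AM instance whose server ID is 01 targets dsconfig1 first and fails over to dsconfig2; entries tagged with other server IDs are ignored by that instance, which is why each AM needs its own identity.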

For more on AM Connection Strings click here

Problem Statement

This model has worked well in the past; the DS Config stores contain all the stuff AM needs to boot and operate plus a handful of runtime entries.

However, times are a changing!

The advent of Open Banking introduces potentially hundreds of thousands of OAuth2 clients, AM policy entry counts are ever increasing, and with UMA thrown in for good measure, the previously small, minimal-footprint, and fairly static DS Config Stores are suddenly much more dynamic and contain many thousands of entries. Managing the stuff AM needs to boot and operate alongside all this runtime data suddenly becomes much more complex.

TADA! Roll up the new DS App and Policy Stores. These new data stores address this by separating the stuff AM needs to boot and operate from long-lived, environment-specific data such as policies, OAuth2 clients, SAML entities, etc. Nice!

However, one problem still remains: it is still difficult to do stack-by-stack deployments, blue/green-type deployments, or rolling deployments, and/or to support immutable-style deployments, as DS Config Store replication is in place and needs to be very carefully managed during deployment scenarios.

Some common issues:

  • Making a change to one AM can quite easily have a ripple effect through DS replication, which impacts and/or impairs the other AM nodes, both within the same site and at the remote site. This behaviour can make customers more hesitant to introduce patches, config, or code changes.
  • In a dual site environment the typical deployment pattern is to stop cross site replication, force traffic to site B, disable site A, upgrade site A, test it in isolation, force traffic back to the newly deployed site A, ensure production is functional, disable traffic to site B, push replication from site A to site B and re-enable replication, upgrade site B before finally returning to normal service.
  • Complexity is further increased if App and Policy stores are not in use, as the in-service DS Config stores may have new OAuth2 clients, UMA data, etc. created during transition which need to be preserved. So in the above scenario, an LDIF export of site B’s DS Config Stores for such data needs to be taken and imported into site A prior to site A going live (to catch changes made while site A’s deployment was in progress), and after site B is disabled, another LDIF export needs to be taken from B and imported into A to catch any last-minute changes between the first LDIF export and the switch-over. Sheesh!
  • Even in a single site deployment model managing replication as well as managing the AM upgrade/deployment itself introduces risk and several potential break points.

New Deployment Model

The real enabler for a new deployment model for AM is the introduction of App and Policy stores, which will be replicated across sites. They enable full separation of the stuff AM needs to boot and run from environmental runtime data. In such a model, the DS Config stores return to a minimal footprint, containing only AM boot data, with the App and Policy Stores containing the long-lived environmental runtime data, which is typically subject to zero-loss SLAs and long-term preservation.

Another enabler is a different configuration pattern for AM, where each AM effectively has the same FQDN and serverId, allowing AM to be built once and then cloned into an image for rapid expansion and contraction of the AM farm, without having to interact with the DS Config Store to add/delete new instances or go through the build process again and again.

Finally, the last key component of this model is Affinity Based Load Balancing for the Userstore, CTS, App, and Policy stores. It both simplifies the configuration and enables an all-active datastore architecture immune to data misses caused by replication delay, and it is central to this new model.

Affinity is a unique feature of the ForgeRock platform and is used extensively by many customers. For more on Affinity click here.

The proposed topology below illustrates this new deployment model and is applicable to both active-active and active-standby deployments. Note that cross-site replication for the User, App, and CTS stores is depicted but, for global/isolated deployments, may well not be required.

Localised DS Config Store for each AM with replication disabled

As the DS Config store footprint will be minimal, to enable immutable configuration and massively simplify stack-by-stack/blue-green/rolling deployments, the proposal is to move the DS Config Stores local to AM, with each AM built with exactly the same FQDN and serverId. Each local DS Config Store lives in isolation, and replication is not enabled between these stores.

In order to provision each DS Config Store in lieu of replication, either the same build script can be executed on each host, or, as a quicker and more optimised approach, one AM-DS Config Store instance/Pod can be built in full, cloned, and the complete image deployed to stand up a new AM-DS instance. The latter approach removes the need to interact with Amster to build additional instances, or with Git to pull configuration artefacts. With this model, any new configuration change requires a new package/Docker image/AMI, etc.; that is, an immutable build.
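The clone-and-deploy approach could be sketched as a container build. This is purely an illustration: every path, base image choice, and script name below is hypothetical and would need adapting to a real AM and DS installation:

```dockerfile
# Hypothetical sketch only: bake AM and a local DS Config Store into a
# single immutable image. Paths and scripts are placeholders, not real
# product file names.
FROM eclipse-temurin:11-jre

# Pre-configured artefacts produced by a one-off full build of AM + DS
# (e.g. via Amster), then exported for cloning.
COPY build/am-webapp/   /opt/am/
COPY build/ds-config/   /opt/ds-config/
COPY build/start-all.sh /opt/start-all.sh

# Every instance boots identically: same FQDN, same serverId, local DS
# Config Store started first, then AM.
CMD ["/opt/start-all.sh"]
```

A configuration change is then released by rebuilding this image and rolling it out, rather than by mutating running instances.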

At boot time, AM uses its local address to connect to its DS Config Store, and Affinity to connect to the User Store, CTS, and the App/Policy stores.

Advantages of this model:

  • As the DS Config Stores are not replicated most AM configuration and code level changes can be implemented or rolled back (using a new image or similar) without impacting any of the other AM instances and without the complexity of managing replication. Blue/green, rolling and stack by stack deployments and upgrades are massively simplified as is rollback.
  • Enables simplified expansion and contraction of the AM pool, especially if an image/clone of a full AM instance and associated DS Config instance is used. This cloning approach also protects against configuration changes in Git or other code repositories inadvertently rippling to new AM instances; the same code and configuration base is deployed everywhere.
  • Promotes the cattle vs pet paradigm, for any new configuration deploy a new image/package.
  • This approach does not require any additional instances; the existing DS Config Stores are repurposed as App/Policy stores and the DS Config Stores are hosted locally to AM (or in a small Container in the same Pod as AM).
  • The existing DS Config Stores can be quickly repurposed as App/Policy Stores; no new instances or data-level deployment steps are required, other than tuning up the JVM and potentially uprating storage, enabling rapid switching from DS Config to App/Policy Stores.
  • Enabler for FBC; when FBC becomes available, the local DS Config stores are simply stopped in favour of FBC. Also, if the transition to FBC becomes problematic, rollback is easy: fire up the local DS Config stores and revert.

Disadvantages of this model:

  • No DS Config Store failover; if the local DS Config Store fails the AM connected to it would also fail and not recover. However, this fits well with the pets vs cattle paradigm; if a local component fails, kill the whole instance and instantiate a new one.
  • Any log systems which have logic based on individual FQDNs for AM (Splunk, etc) would need their configuration to be modified to take into account each AM now has the same FQDN.
  • This deployment pattern is only suitable for customers who have mature DevOps processes. The expectation is that no changes are made in production; instead, a new release/build is produced and promoted to production. If, for example, a customer makes changes via REST or the UI directly, then these changes will not be replicated to the other AM instances in the cluster, which would severely impair performance and stability.

Conclusions

This suggested model would significantly improve a customer’s ability to take on new configuration/code changes and potentially rollback without impacting other AM servers in the pool, makes effective use of the App/Policy stores without additional kit, allows easy transition to FBC and enables DevOps style deployments.

This blog post was first published @ https://medium.com/@darinder.shokar included here with permission.