5 Minute Briefing: Designing for Security Outcomes

This is the first in a set of blogs focused on high-level briefings – typically 5-minute reads – covering design patterns and meta-trends relating to security architecture and design.

When it comes to cyber security design, there have been numerous attempts to devise investment profiles and allocative-efficiency metrics. Should we protect a $10 bike with a $6 lock if the chance of loss is 10%? That sort of thing. I don’t want to tackle the measurement process per se.

I want to focus on the generic business concept of outcomes, alongside some of the uncertainty that is often associated with complex security and access control investments.

I guess, to start with, a few definitions to get us all on the same page. Firstly, what are outcomes? In a simple business context, an outcome is really just a forward-looking statement – where do we want to get to? What do we want to achieve? In the objective, strategy and tactics (OST) model of analysis, the outcome is likely to fall somewhere in the objective and possibly strategy blocks.

A really basic example of an OST breakdown could be the following:

    • Objective: fit in to my wedding dress by July 1st
    • Strategy: eat less and exercise more
    • Tactic: don’t eat snacks between meals and walk to work



So how does this fit into security? Well, cyber is typically seen as a business cost, with risk management a parallel cost centre used to manage the implementation of cyber security investment and the subsequent returns – or associated loss reduction.

The end result of traditional information security management is something resembling a control – essentially a fine-grained, repeatable step that can be measured. Maybe something like “run anti-virus version x or above on all managed desktops”.

But how does something so linear, and in some cases pretty abstract, flow back to the business objectives? I think in general it doesn’t (or even can’t), which results in the inertia associated with security investment – the overall security posture is compromised and business investment in those controls is questioned.

The security control could be seen as a tactic – but is often not associated with any strategy or IT objective – and certainly very rarely associated with a business objective.  The business wants to sell widgets, not worry about security controls and quite rightly so.

Improve Security Design


So how are outcomes better than simple controls? I think there are two aspects to this. The first is about security design and the second is about security communications.

If we take the AV control previously described – what is that trying to achieve? Maybe a broader-brush outcome is that malware isn’t great and should be avoided. Why should malware be avoided? Perhaps metrics can attribute an 18% reduction in firewall performance to malware call-home activity, which in turn reduces the ability of the business to uphold a 5-minute response SLA for customer support calls?

Or that two recent data breaches were attributable to a botnet miner browser plug-in, which resulted in 20,000 identity records being leaked at a cost of $120 per record in fines?

Does a security outcome such as “a 25% reduction in malware activity” result in a more productive, accountable and business understandable work effort?  

It would certainly require multiple different strategies and tactics to make it successful, covering lots of different aspects of people, process and technology. Perhaps one of the tactics involved is indeed running up-to-date AV. I guess the outcome can act both as a modular umbrella and as a future-proofed, autonomous way of identifying the most value-driven technical control.

Perhaps outcomes really are more about reporting and accountability?


Improve Security Communications


Accountability and communications are often a major weakness of security design and risk management.  IT often doesn’t understand the nuance of certain security requirements – anyone heard of devsecops (secdevops)?

Business understanding is vitally important when it comes to security design and that really is what “security communications” is all about.  I’m not talking about TLS (nerd joke alert), but more about making sure both the business and IT functions not only use a common language, but also work towards common goals. 

Security controls tend to be less effective when seen as checkbox exercises, powered by internal and external audit processes (audit functions tend to exist in areas of market failure, where the equilibrium state of the market results in externalities….but I won’t go there here).

Controls are often abstracted away from business objectives via a risk management layer and can lose their overall effectiveness – and in turn business confidence. Controls also tend to be implicitly out of date by the time they are designed, and certainly by the time they are implemented.

If controls are emphasised less and security outcomes more – and outcomes are tied more closely to business objectives – then alignment on accountability, and in turn on investment profiles, can be achieved.

Summary


So what are we trying to say? At a high level, try to move away from controls and encourage more goals- and outcomes-based design when it comes to security. By leveraging an outcomes-based model, procurement and investment decisions can be crystallised and made more accountable.

Business objectives can be contributed towards, and security essentially becomes more effective – resulting in fewer data breaches, better returns on investment and greater clarity on where investment should be made.

Principles of Usable Security

I want to talk about the age-old trade-off between the simplicity of a website or app and the level of friction, restriction and inhibition associated with applying security controls. There has always been a tendency to place security at the other end of the cool-and-usable spectrum. If it was secure, it was ugly. If it was easy to use and cool, it was likely full of exploitable vulnerabilities. Is that still true?

In recent years, there have been significant attempts – certainly by vendors, but also by designers and architects – to meet somewhere in the middle and deliver usable yet highly secure and robust systems. But how to do it? I want to try and capture some of those points here.

Most Advanced Yet Acceptable


I first want to introduce the concept of MAYA: Most Advanced Yet Acceptable. MAYA was a concept created by the famous industrial design genius Raymond Loewy [1] in the 1950s. The premise was that to create successful innovative products, you had to reach a point of inflexion between novelty and familiarity. If something was unusual in its nature or use, it would only appeal to a small audience. An element of familiarity had to anchor the viewer or user in order to allow incremental changes to take the product in new directions.

Observation


When it comes to designing – or redesigning – a product or piece of software, it is often the case that observation is the best ingredient. Attempting to design in isolation can often lead to solutions looking for problems, or, in the case of MAYA, something so novel that it is not fit for purpose. A key focus of Loewy’s modus operandi was to observe users of the product he was aiming to improve, be it a car, a locomotive engine or a copying machine. He wanted to see how it was being used and how it was being broken: the good things, the bad, the obstacles, the areas which required no explanation and the areas not being used at all. The same applies when improving a software flow.

Take the classic sign-up and sign-in flows seen on nearly every website and mobile application. To the end user, these flows are the application. If they fail, create unnecessary friction, or are difficult to understand, the end user will become so frustrated they are likely to attribute the entire experience to the service or product they are trying to access – and go to the nearest competitor.

In order to improve, there needs to be a mechanism to view, track and observe how the typical end user will use and interact with the flow. Capture clicks, drop-outs and the time it takes to perform certain operations. All of these data points provide invaluable input into how to create a more optimal set of interactions. These observations, of course, need comparing to a baseline or some sort of acceptable SLA.


Define Usable?


But why are we observing, and how do we define usable in the first place? Security can be a relatively simple metric. Define some controls. Compare process or software to said controls. Apply metrics. Rinse and repeat. Simple, right? But how much usability is required? And where does that usability get counted?

Usable for the End User

The most obvious standpoint is usability for the end user. If we continue with the sign-up and sign-in flows, they would need to be simply labelled and responsive – altering their expression dynamically depending on the device type, and maybe the location, the end user is accessing from.

End user choice is also critical: empowering the end user without overloading them with options and overly complex decisions. Assumptions are powerful, but only if enough information is available to the back-end system to allow for the creation of a personalised experience.

Usable for the Engineer

But the end user is only one part of the end-to-end delivery cycle for a product. The engineering team needs usability too. Complexity in code design is the enemy of security. Modularity, clean interfaces and good levels of cohesion allow for agile and rapid feature development that reduces the impact on unrelated areas. Simplicity in code design makes testing simpler and helps reduce attack vectors.

Usable for the Support Team

The other main area to think about is the post-sales teams. How do teams support, repair and patch existing systems that are in use? Does that process inhibit either end user happiness or the underlying security posture of the system? Does it allow for enhancements, or just fixes?

Reduction


A classic theme of Loewy’s designs, if you look at them over time, is that of reduction: reduction in the components, features, lines and angles involved in the overall product. By reducing the number of fields, buttons, screens and steps, the end user has fewer decisions to make. Fewer decisions result in fewer mistakes. Fewer mistakes result in less friction. Less friction seems a good design choice when it comes to usability.

Fewer components should also reduce the attack surface and the support complexity.


Incremental Change


But change needs to be incremental. An end user does not like shocks. A premise of MAYA is to think to the future, but observe and provide value today. Making radical changes will reduce usability, as features and concepts will be too alien and too novel.

Develop constructs that benefit the end user immediately and instil familiarity, allowing trust in the incremental changes that will follow – all whilst keeping those security controls in mind.


[1] - https://en.wikipedia.org/wiki/Raymond_Loewy

Leveraging AD Nested Groups With AM

This article comes from an issue raised by multiple customers, where ForgeRock Access Management (AM) was not able to retrieve a user’s group memberships when using Active Directory (AD) as a datastore with nested groups. I’ve read in different docs about the “embedded groups” expression, as well as the “transitive groups” or “recursive groups” or “indirect groups”, and finally, the “parent groups” expressions. I’m just quoting them all here for search engines.

As a consequence, it was not, for example, possible for AM agents or any policy engine client, such as a custom web application, to enforce access control rules based on these memberships. In the same manner, applications relying on the AM user session or profile, or on custom OAuth 2.0 or OpenID Connect tokens, could not safely retrieve the entire list of groups a user belonged to. In the best-case scenario, only the “direct” groups were fetched from AD, and other errors could occur. Read more about it below:

Indeed, historically, AM has used the common memberOf or isMemberOf attribute by default (depending on the type of LDAP user store), while AD had a different implementation that also evolved over time.
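
If you want to see what this looks like in practice against a ForgeRock DS user store, a base search requesting the isMemberOf attribute returns the user’s memberships (hostname, port, credentials and DNs below are placeholders):

ldapsearch -x -H ldap://opendj.example.com:1389 \
  -D "cn=Directory Manager" -w 'password' \
  -s base -b "uid=bjensen,ou=People,dc=example,dc=com" \
  "(objectClass=*)" isMemberOf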

So, initially, when AM was issuing “(member=$userdn)” LDAP searches against AD, if, for example, a user was a member of the AD “Engineering” group, and that group was itself a member of the “Staff” group, the above search only returned the user’s direct group; in this case, the “Engineering” group.

A patch was written for AM to leverage a new feature of AD 2003 SP2 and above, providing the ability to retrieve the AD groups and nested groups a user belongs to, thanks to a search like this: (member:1.2.840.113556.1.4.1941:=$userdn).

See, for example, https://social.technet.microsoft.com/wiki/contents/articles/5392.active-directory-ldap-syntax-filters.aspx on this topic.
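
With that matching rule, you can reproduce what AM retrieves by running a search directly against AD; the example below is only an illustration, with hostname, credentials, base DN and user DN as placeholders:

ldapsearch -x -H ldap://ad.example.com:389 \
  -D "cn=svc-am,cn=Users,dc=example,dc=com" -w 'password' \
  -b "dc=example,dc=com" \
  "(member:1.2.840.113556.1.4.1941:=cn=jdoe,cn=Users,dc=example,dc=com)" dn

This returns every group, direct or nested, that the user belongs to.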

This worked in some deployments. But for some large ones, that search was slow and sometimes induced timeouts and failing requests; for example, when the AM agent was retrieving a user’s session. Thus, the agent com.sun.identity.agents.config.receive.timeout parameter (4 seconds by default) had to be increased.

Fortunately, since AD 2012 R2, there’s a new feature available: a base search from the user’s DN (the LDAP DN of the user in AD) with a filter of “(objectClass=user)”. Requesting the msds-memberOfTransitive attribute will return all of the user’s groups, including the parent groups of the nested groups the user is a member of. That search can be configured from the AM console.

You can find more information about that attribute here: https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-adts/c5c7d019-8d88-4bfa-b84d-4413bbf189b5
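
For reference, the equivalent raw search can be tested with a command along these lines (again, hostname, credentials and the user DN are placeholders):

ldapsearch -x -H ldap://ad.example.com:389 \
  -D "cn=svc-am,cn=Users,dc=example,dc=com" -w 'password' \
  -s base -b "cn=jdoe,cn=Users,dc=example,dc=com" \
  "(objectClass=user)" msds-memberOfTransitive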

Also, our bug tracker holds a reference to that issue: https://bugster.forgerock.org/jira/browse/OPENAM-9674

But from now on, you should hopefully be able to leverage AD nested groups gracefully with AM.

Proof of Concept

This article is an overview of a proof of concept (PoC) we recently completed with one of our partners. The purpose was to demonstrate the ability to use the ForgeRock Identity platform to quickly provide rich authentication (such as biometric authentication by face recognition), and authorization capabilities to a custom mobile application, written from scratch.

Indeed, it took about two weeks to develop and integrate the mobile application with the ForgeRock Identity platform, thanks to the externalisation of both the registration and authentication code and logic out of the mobile application itself. The developers mostly had to use the ForgeRock REST APIs to implement these features.

Some important facts: 

  • While the authentication leveraged the ForgeRock REST API and authentication trees, the device registration was implemented using a mobile SDK provided by the biometric vendor. Another option could have been to use an existing ForgeRock authentication tree node for the registration, but it would have required the use of a browser. The ideal goal was to provide all the features using a single, custom mobile application for a better user experience.
  • It was decided to use Daon as the biometric authentication solution provider/vendor, even though the ForgeRock Identity platform can be integrated with other authentication solutions. The ForgeRock marketplace is a good place to figure out what’s available for whatever type of authentication you’re looking for.
  • Depending on the solution, we may or may not provide a node for user or device registration. When not available, it can be either developed, or the vendor will provide an SDK to implement registration directly with them.
  • Daon provides two different ways to manage user credentials; they can be left on the user’s device, leveraging the FIDO protocol, or they can be stored in a specific tenant on Daon’s Identity X server. For this PoC, we chose to use the FIDO protocol because we wanted to provide the best user experience, with as little network latency as possible; checking a face locally on a mobile device looked faster than having to send it (or a hash of it) to a server for verification.

The functional objectives of that PoC were to provide:

  • A way for the mobile application to dynamically discover the available authentication methods, rather than hardcoding or defining in the application configuration the list of methods. We did that using an authentication tree which included three authentication methods, and the ForgeRock Access Management REST API and callbacks to discover the available choices.
  • A way to provide different authentication choices based on some criteria, such as the domain of the user’s email address and the status of that user (already registered in the authentication platform or not).
  • The ability to deliver OTPs by either SMS or email, based on the user’s profile. A user profile including a mobile phone number had to trigger OTP delivery by SMS; a profile without a mobile phone number had to trigger delivery by email.
  • Biometric authentication by face recognition, embedded in the custom mobile application, thus providing the best possible user experience, without the need to rely on an extra browser session or additional device.
  • Biometric-enabled device registration: not only was face recognition used at authentication time, it was also used for device registration.
  • OAuth 2.0 access token delivery, introspection, and usage to gain access to business APIs.
  • Protection of the APIs by authorization rules and the ForgeRock authorization engine.

The PoC logical architecture was as follows:

The authentication trees looked like this, with the first couple of screenshots showing the main tree, and the third screenshot showing the biometric authentication tree, embedded in the main tree:

In the main tree, we used a few custom JavaScript scripted nodes to implement the desired logic; for example, to expose different authentication choices based on the user’s email address domain:

Below, you can see the registration flow diagram:

  • The relying party app is the custom mobile application developed during the PoC.
  • The FIDO client in our case was the Daon’s mobile SDK running in the mobile app. That SDK is especially responsible for triggering the camera to take a picture of the user’s face.
  • The relying party server is ForgeRock Access Management.
  • The FIDO server is Daon’s Identity X server:

The registration flow depicted above is actually the one that occurs when using a browser and the ForgeRock registration node for Daon. When using a custom mobile app and the Daon mobile SDK, the registration requests and responses go directly from the mobile application to Daon’s Identity X server, without going through ForgeRock Access Management.

By contrast, the authentication flow always goes through ForgeRock Access Management, leveraging the nodes developed for that purpose:

Feel free to ask questions for more details!

Overview of Options of Authentication By Face Recognition in ForgeRock Identity Platform

The following table provides solution designers and architects with a comparative overview of the different options available today for adding authentication by face recognition to a ForgeRock Identity Platform deployment.

The different columns represent some important criteria to consider when searching for such a solution; some criteria are self-explanatory, while the others are detailed below:

  • The Device agnostic column helps to figure out which type of device can be used (any vs a subset).
  • The Requires a 3rd-party solution column indicates whether or not other software is required in addition to the ForgeRock platform.
  • The Security column represents the relative level of security brought by the solution.
  • The ForgeRock supported column represents the level of effort required to integrate the solution.
  • Flows: this criterion gives an idea, from the user’s perspective (rather than from a purely technical perspective), of whether registration and/or authentication occurs with or without friction. As a rule of thumb, in-band flows can be considered frictionless (or nearly so, since they involve use cases where a user needs a single device or uses a single browser session), while out-of-band flows can be seen as more secure (in some contexts at least), since different channels are involved. Some exceptions exist, such as Face ID, which can be used in a purely mobile scenario (so rather in-band), or just as a means to register or authenticate on one side with a mobile device while accessing a website or service from another device.

An All Active Persistent Data Layer? No Way! Yes Way!

Problem statement

Most database technologies (cloud DB-as-a-service offerings, traditional DBs, LDAP services, etc.) typically run in a single-primary mode, with multiple secondary nodes to ensure high availability. The main rationale is that it’s the only surefire way to ensure data consistency and integrity are maintained.

If an all-active topology were enabled, replication delay (the amount of time it takes for a data WRITE operation on one node to propagate to a peer) may cause the following to occur:

  1. The client executes a WRITE operation on node 1.
  2. The client then executes a READ operation soon after the WRITE.
  3. In an all-active topology this READ operation may target node 2, but because the data has not yet been replicated (due to load, network latency, etc.) you get a data miss and application-level chaos ensues.

Another scenario is lock counters:

  1. User A is an avid Manchester UTD football fan (alas more of a curse than a blessing nowadays!) and is keen to watch the game. In haste, they try to login but supply an incorrect password. The lock counter increments by +1 on User A’s profile on node 1. Counter moves from 0 to 1.
  2. User A, desperate to catch the game then quickly tries to login again, but again supplies an incorrect password.
  3. This time, if replication is not quick enough, node 2 may be targeted and thus the lock counter moves from 0 to 1 instead of from 1 to 2. Fail!!!

These scenarios, and others like them, mandate a single-primary topology, which for high-load environments results in high cost: the primary needs to be vertically scaled to handle all of the load (plus headroom), and compute resource is wasted as the secondaries (same spec as the primary) sit idle, costing $$$ for no gain.

Tada — Roll up Affinity Based Load Balancing

ForgeRock Directory Services (DS) is the high-performance, high-scale, LDAP-based persistence layer product within the ForgeRock Identity Platform. Any DS instance can take both WRITE and READ operations at scale; for many customers, enabling an all-active infrastructure even without Affinity Based Load Balancing is viable.

However, for high-scale customers and/or those who need to guarantee absolute consistency of data, Affinity Based Load Balancing, a technology unique to the ForgeRock Identity Platform, is the key to enabling an all-active persistence layer. Nice!

Affinity what now?

It is a load balancing algorithm built into the DS SDK which is part of both the ForgeRock Directory Proxy product and the ForgeRock Access Management (AM) product.

It works like this:

For each and every inbound LDAP request which contains a distinguished name (DN) like uid=Darinder,ou=People,o=ForgeRock, the SDK takes a hash and allocates the result to a specific DS instance. In the case of AM, all servers in the pool compute the same hash, and thus send all READ/MODIFY/DELETE requests for uid=Darinder to, say, DS Node 1 (the origin node).

A request with a different DN (e.g. uid=Ronaldo,ou=People,o=ForgeRock) is again hashed but may be sent to DS Node 2; all READ/MODIFY/DELETE operations for uid=Ronaldo target this specific origin node and so on. This means all READ/MODIFY/DELETE operations for a specific DN always target the same DS instance, thus eliminating issues caused by replication delay and solving the scenarios (and others) described in the Problem statement above. Sweet!
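
As a purely illustrative sketch (this is not the actual DS SDK algorithm, just the idea behind it), the routing decision boils down to deterministically hashing the request DN and mapping it onto one server in the pool, so that every AM instance independently picks the same node:

# Illustrative only: hash a DN and map it to one server in the pool.
dn="uid=Darinder,ou=People,o=ForgeRock"
servers=("ds1.example.com:1636" "ds2.example.com:1636" "ds3.example.com:1636")
hash=$(printf '%s' "$dn" | cksum | cut -d ' ' -f1)
index=$(( hash % ${#servers[@]} ))
# Every node computing this sees the same result, so the DN stays "sticky" to one DS instance.
echo "Route READ/MODIFY/DELETE for $dn to ${servers[$index]}"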

The following topology depicts this architecture:

All requests from AM1 and AM2 for uid=Darinder target DS Node 1. All requests from AM1 and AM2 for uid=Ronaldo target DS Node 2.

What else does this trick DS SDK do then?

Well… the SDK also makes sure ADD requests are spread evenly across all DS nodes in the pool, so as not to overload one DS node while the others remain idle.

Also, for a bit of icing on top, the SDK is instance-aware: if the origin DS node becomes unavailable (in our example, say DS Node 1 for uid=Darinder), the SDK detects this and re-routes all requests for uid=Darinder to another DS node in the pool, and then (here’s the cherry) ensures all further requests remain sticky to this new DS node (it becomes the new origin node). Assuming data has been replicated in time, there will be no functional impact.

Oh, and when the original DS node comes back online, all requests fail back for any DNs where it was the origin server (so, in our case, uid=Darinder would flip back to DS Node 1). Booom!

Which components of the ForgeRock Platform support Affinity Based Load Balancing?

  • Directory Proxy
  • ForgeRock AM’s DS Core Token Service (CTS)
  • ForgeRock AM’s DS User / Identity Store
  • ForgeRock AM’s App and Policy Stores

Note: the AM Configuration Store does not support affinity, but this is intentional, as AM configuration will soon move to file-based configuration (FBC); in the interim, customers can look to deploy like this.

What are the advantages of Affinity?

  • As the title says, Affinity Based Load Balancing enables an all-active persistent storage layer.
  • Instead of having a single massively vertically scaled primary DS instance, DS can be horizontally scaled so all nodes are primary to increase throughput and maximise compute resource.
  • As the topology is all active, smaller (read: cheaper) instances can be used; thus, significantly reducing costs, especially in a Cloud environment.
  • Eliminates functional, data integrity, and data consistency issues caused by replication delay.

More Innnnput!

To learn more about how to configure ForgeRock AM for Affinity Based Load Balancing, check out this.

This blog post was first published @ https://medium.com/@darinder.shokar included here with permission.

Immutable Deployment Pattern for ForgeRock Access Management (AM) Configuration without File Based Configuration (FBC)

Introduction

The standard Production Grade deployment pattern for ForgeRock AM is to use replicated sets of Configuration Directory Server instances to store all of AM’s configuration. The deployment pattern has worked well in the past, but is less suited to the immutable, DevOps enabled environments of today.

This blog presents an alternative view of how an immutable deployment pattern could be applied to AM in lieu of the upcoming full File Based Configuration (FBC) for AM in version 7.0 of the ForgeRock Platform. This pattern could also support easier transition to FBC.

Current Common Deployment Pattern

Currently most customers deploy AM with externalised Configuration, Core Token Service (CTS) and UserStore instances.

The following diagram illustrates such a topology spread over two sites; the focus is on the DS Config Stores, hence the CTS and DS Userstore connections and replication topology have been simplified. Note this blog is still applicable to single-site deployments.

Dual site AM deployment pattern. Focus is on the DS Configuration stores

In this topology AM uses connection strings to the DS Config stores to enable an all-active Config store architecture, with each AM targeting one DS Config store as primary and the second as failover per site. Note that in this model there is no cross-site failover for AM-to-Config-store connections (possible but discouraged). The DS Config stores do communicate across sites for replication to create a full mesh, as do the User and CTS stores.

A slight divergence from this model, and one applicable to cloud environments, is to use a load balancer between AM and its DS Config Stores; however, we have observed many customers experience problems with features such as persistent searches failing due to dropped connections. Hence, where possible, Consulting Services recommends the use of AM connection strings.

It should be noted that AM connection strings specific to each AM can only be used if each AM has a unique FQDN – for example: https://openam1.example.com:8443/openam, https://openam2.example.com:8443/openam and so on.

For more on AM Connection Strings click here
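
As a rough illustration of the syntax (hosts, ports and server IDs below are hypothetical), a single shared connection string can give each AM its own primary and failover Config store by tagging each entry with the server ID of the AM that should use it:

amconfig1.example.com:3389|01,amconfig2.example.com:3389|01,amconfig2.example.com:3389|02,amconfig1.example.com:3389|02

Here the AM instance with server ID 01 prefers amconfig1 and fails over to amconfig2, while server ID 02 does the reverse; each AM ignores entries tagged with another server’s ID.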

Problem Statement

This model has worked well in the past; the DS Config stores contain all the stuff AM needs to boot and operate plus a handful of runtime entries.

However, times are a changing!

The advent of Open Banking introduces potentially hundreds of thousands of OAuth2 clients, AM policy entry numbers are ever increasing, and with UMA thrown in for good measure, the previously small, minimal-footprint and fairly static DS Config Stores are suddenly much more dynamic and contain many thousands of entries. Managing the stuff AM needs to boot and operate alongside all this runtime data suddenly becomes much more complex.

TADA! Roll up the new DS App and Policy Stores. These new data stores address this by separating the stuff AM needs to boot and operate from long-lived, environment-specific data such as policies, OAuth2 clients, SAML entities, etc. Nice!

However, one problem still remains: it is still difficult to do stack-by-stack deployments, blue/green-type deployments, rolling deployments and/or immutable-style deployments, as DS Config Store replication is in place and needs to be very carefully managed during deployment scenarios.

Some common issues:

  • Making a change to one AM can quite easily have a ripple effect through DS replication, which impacts and/or impairs the other AM nodes, both within the same site and remote. This behaviour can make customers more hesitant to introduce patch, config or code changes.
  • In a dual site environment the typical deployment pattern is to stop cross site replication, force traffic to site B, disable site A, upgrade site A, test it in isolation, force traffic back to the newly deployed site A, ensure production is functional, disable traffic to site B, push replication from site A to site B and re-enable replication, upgrade site B before finally returning to normal service.
  • Complexity is further increased if App and Policy stores are not in use, as the in-service DS Config stores may have new OAuth2 clients, UMA data, etc. created during transition which need to be preserved. So in the above scenario an LDIF export of site B’s DS Config Stores for such data needs to be taken and imported into site A prior to site A going live (to catch changes made while the site A deployment was in progress), and after site B is disabled another LDIF export needs to be taken from B and imported into A to catch any last-minute changes between the first LDIF export and the switch-over. Sheesh!
  • Even in a single site deployment model managing replication as well as managing the AM upgrade/deployment itself introduces risk and several potential break points.

New Deployment Model

The real enabler for a new deployment model for AM is the introduction of App and Policy stores, which will be replicated across sites. They enable full separation of the stuff AM needs to boot and run from environmental runtime data. In such a model the DS Config stores return to a minimal footprint, containing only AM boot data, with the App and Policy Stores containing the long-lived environmental runtime data which is typically subject to zero-loss SLAs and long-term preservation.

Another enabler is a different configuration pattern for AM, where each AM effectively has the same FQDN and serverId allowing AM to be built once and then cloned into an image to allow rapid expansion and contraction of the AM farm without having to interact with the DS Config Store to add/delete new instances or go through the build process again and again.

Finally, the last key component of this model is Affinity Based Load Balancing for the Userstore, CTS, App and Policy stores, which both simplifies the configuration and enables an all-active datastore architecture immune to data misses resulting from replication delay; it is central to this new model.

Affinity is a unique feature of the ForgeRock platform and is used extensively by many customers. For more on Affinity click here.

The proposed topology below illustrates this new deployment model and is applicable to both active-active deployments and active-standby. Note cross site replication for the User, App and CTS stores is depicted, but for global/isolated deployments may well not be required.

Localised DS Config Store for each AM with replication disabled

As the DS Config store footprint will be minimal, to enable immutable configuration and massively simplify stack-by-stack/blue-green/rolling deployments, the proposal is to move the DS Config Stores local to AM, with each AM built with exactly the same FQDN and serverId. Each local DS Config Store lives in isolation and replication is not enabled between these stores.

In order to provision each DS Config Store in lieu of replication, either the same build script can be executed on each host, or a quicker and more optimised approach is to build one AM-DS Config store instance/Pod in full, clone it, and deploy the complete image whenever a new AM-DS instance is needed. The latter approach removes the need to interact with Amster to build additional instances and, for example, with Git to pull configuration artefacts. With this model any new configuration change requires a new package/Docker image/AMI, etc. – i.e. an immutable build.

At boot time AM uses its local address to connect to its DS Config Store and Affinity to connect to the user Store, CTS and the App/Policy stores.

Advantages of this model:

  • As the DS Config Stores are not replicated most AM configuration and code level changes can be implemented or rolled back (using a new image or similar) without impacting any of the other AM instances and without the complexity of managing replication. Blue/green, rolling and stack by stack deployments and upgrades are massively simplified as is rollback.
  • Enables simplified expansion and contraction of the AM pool, especially if an image/clone of a full AM instance and associated DS Config instance is used. This cloning approach also protects against configuration changes in Git or other code repositories inadvertently rippling to new AM instances; the same code and configuration base is deployed everywhere.
  • Promotes the cattle vs pet paradigm, for any new configuration deploy a new image/package.
  • This approach does not require any additional instances; the existing DS Config Stores are repurposed as App/Policy stores and the DS Config Stores are hosted locally to AM (or in a small Container in the same Pod as AM).
  • The existing DS Config Stores can be quickly repurposed as App/Policy Stores; no new instances or data-level deployment steps are required other than tuning up the JVM and potentially uprating storage, enabling rapid switching from DS Config to App/Policy Stores.
  • Enabler for FBC: when FBC becomes available, the local DS Config stores are simply stopped in favour of FBC. Also, if the transition to FBC becomes problematic, rollback is easy – fire up the local DS Config stores and revert back.

Disadvantages of this model:

  • No DS Config Store failover; if the local DS Config Store fails, the AM connected to it will also fail and not recover. However, this fits well with the pets vs cattle paradigm; if a local component fails, kill the whole instance and instantiate a new one.
  • Any log systems which have logic based on individual FQDNs for AM (Splunk, etc) would need their configuration to be modified to take into account each AM now has the same FQDN.
  • This deployment pattern is only suitable for customers who have mature DevOps processes. The expectation is that no changes are made in production; instead, a new release/build is produced and promoted to production. If, for example, a customer makes changes via REST or the UI directly, then these changes will not be replicated to the other AM instances in the cluster, which would severely impair performance and stability.

Conclusions

This suggested model would significantly improve a customer’s ability to take on new configuration/code changes and potentially roll back without impacting other AM servers in the pool; it makes effective use of the App/Policy stores without additional kit, allows easy transition to FBC, and enables DevOps-style deployments.

This blog post was first published @ https://medium.com/@darinder.shokar included here with permission.

Integrating ForgeRock Identity Platform 6.5

It’s a relatively common requirement to integrate the products that make up the ForgeRock Identity Platform. The IDM Samples Guide contains a good working example of just how to do this. Each version of the ForgeRock stack has slight differences, both in the products themselves and in the integrations. As such, this blog will focus on version 6.5 of the products and will endeavour to include as much useful information as possible to speed up integrations for readers, including sample configuration files, REST calls, etc.

In this integration IDM acts as an OIDC Relying Party, talking to AM as the OIDC Provider using the OAuth 2.0 authorization grant. The following sequence diagram illustrates successful processing from the authorization request, through grant of the authorization code, and ID token from the authorization provider, AM. You can find more details in the IDM Samples Guide.

Full Stack Authorization Code Flow

Sample files

For this integration I’ve included configured sample files which can be found by accessing the link below, modified, and either used as an example or just dropped straight into your test environment. It should not have to be said but just in case…DO NOT deploy these to production without appropriate testing / hardening: https://stash.forgerock.org/users/mark.nienaber_forgerock.com/repos/fullstack6.5/.

Postman Collection

For any REST calls made in this blog you’ll find the Postman collection available here: https://documenter.getpostman.com/view/5814408/SVfJVBYR?version=latest

Sample Scripts

We will set up some basic vanilla instances of our products to get started. I’ve provided some scripts to install both DS as well as AM.

Products used in this integration

Documentation can be found https://backstage.forgerock.com/docs/.

Note: In this blog I install the products under my home directory, this is not best practice, but keep in mind the focus is on the integration, and not meant as a detailed install guide.

Setup new DS instances

On Server 1 (the AM server) install fresh DS 6.5 instances for an external AM config store (optional) and a DS repository to be shared with IDM.

Add the following entries to the hosts file:

sudo vi /etc/hosts
127.0.0.1       localhost opendj.example.com amconfig1.example.com openam.example.com

Before running the DS install script you’ll need to copy the Example.ldif file from the full-stack IDM sample to the DS/AM server. You can do this manually or use SCP from the IDM server.

scp ~/openidm/samples/full-stack/data/Example.ldif fradmin@opendj.example.com:~/Downloads

Modify the sample file installDSFullStack6.5.sh including:

  • server names
  • port numbers
  • location of DS-6.5.0.zip file
  • location of the Example.ldif from the IDM full-stack sample.

Once completed, run the script to install both the external AM Config store and the DS shared repository, i.e.:

script_dir=`pwd`
ds_zip_file=~/Downloads/DS/DS-6.5.0.zip
ds_instances_dir=~/opends/ds_instances
ds_config=${ds_instances_dir}/config
ds_fullstack=${ds_instances_dir}/fullstack
#IDMFullStackSampleRepo
ds_fullstack_server=opendj.example.com
ds_fullstack_server_ldapPort=5389
ds_fullstack_server_ldapsPort=5636
ds_fullstack_server_httpPort=58081
ds_fullstack_server_httpsPort=58443
ds_fullstack_server_adminConnectorPort=54444
#Config
ds_amconfig_server=amconfig1.example.com
ds_amconfig_server_ldapPort=3389
ds_amconfig_server_ldapsPort=3636
ds_amconfig_server_httpPort=38081
ds_amconfig_server_httpsPort=38443
ds_amconfig_server_adminConnectorPort=34444

Now let’s run the script on Server 1, the AM / DS server, to create the new DS instances.

chmod +x ./installdsFullStack.sh
./installdsFullStack.sh
Install DS instances

You now have two DS servers installed and configured; let’s install AM.

Setup new AM server

On Server 1 we will install a fresh AM 6.5.2 on Tomcat using the provided Amster script.

Assuming the Tomcat instance is started, drop the AM WAR file under the webapps directory, renaming the context to secure (change this as you wish).

cp ~/Downloads/AM-6.5.2.war ~/tomcat/webapps/secure.war

Copy the amster script install_am.amster into your amster 6.5.2 directory and make any modifications as required.

install-openam \
--serverUrl http://openam.example.com:8080/secure \
--adminPwd password \
--acceptLicense \
--cookieDomain example.com \
--cfgStoreDirMgr 'uid=am-config,ou=admins,ou=am-config' \
--cfgStoreDirMgrPwd password \
--cfgStore dirServer \
--cfgStoreHost amconfig1.example.com \
--cfgStoreAdminPort 34444 \
--cfgStorePort 3389 \
--cfgStoreRootSuffix ou=am-config \
--userStoreDirMgr 'cn=Directory Manager' \
--userStoreDirMgrPwd password \
--userStoreHost opendj.example.com \
--userStoreType LDAPv3ForOpenDS \
--userStorePort 5389 \
--userStoreRootSuffix dc=example,dc=com
:exit

Now run amster and pass it this script to install AM. You can do this manually if you like, but scripting will make your life easier and allow you to repeat it later on.

cd amster6.5.2/
./amster install_am.amster
Install AM

At the end of this you’ll have AM installed and configured to point to the DS instances you set up previously.

AM Successfully installed

Setup new IDM server

On Server 2 install/unzip a new IDM 6.5.0.1 instance.

Make sure the IDM server can reach AM and DS servers by adding an entry to the hosts file.

sudo vi /etc/hosts
Hosts file entry

Modify the hostname and port of the IDM instance in the boot.properties file. An example boot.properties file can be found HERE.

cd ~/openidm
vi resolver/boot.properties

Set appropriate openidm.host and openidm.port.http/s.

openidm/resolver/boot.properties
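
For reference, the relevant properties in resolver/boot.properties end up looking something like this (the hostname and ports shown are the ones assumed throughout this walkthrough):

openidm.host=openidm.example.com
openidm.port.http=8080
openidm.port.https=8443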

We now have the basic AM / IDM and DS setup and are ready to configure each of the products.

Configure IDM to point to shared DS repository

Modify the IDM LDAP connector file to point to the shared repository. An example file can be found HERE.

vi ~/openidm/samples/full-stack/conf/provisioner.openicf-ldap.json

Change “host” and “port” to match shared repo configured above.

provisioner.openicf-ldap.json
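
The values being edited sit under configurationProperties in that file; with the shared repository installed above, the fragment would look roughly like this (only the changed properties are shown, the rest of the sample file stays as-is):

"configurationProperties" : {
    "host" : "opendj.example.com",
    "port" : 5389
}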

Configure AM to point to shared DS repository

Login as amadmin and browse to Identity Stores in the realm you’re configuring. For simplicity I’m using the Top Level Realm; DO NOT do this in production!

Set the correct values for your environment, then select Load Schema and Save.

Shared DS Identity Store

Browse to Identities and you should now see the two identities that were imported from Example.ldif when you set up the DS shared repository.

List of identities

We will set up another user to ensure AM is configured to talk to DS correctly. Select + Add Identity, fill in the values, then press Create.

Create new identity

Modify some of the user’s values and press Save Changes.

Modify attributes

Prepare IDM

We will start IDM now and check that it’s connected to the DS shared repository.

Start the IDM full-stack sample

Enter the following commands to start the full-stack sample.

cd ~/openidm/
./startup.sh -p samples/full-stack
Start full-stack sample

Login to IDM

Login to the admin interface as openidm-admin.

http://openidm.example.com:8080/admin/

Login as openidm-admin

Reconcile DS Shared Repository

We will now run a reconciliation to pull users from the DS shared repository into IDM.

Browse to Configure, then Mappings, then on the System/ldap/account → Managed/user mapping select Reconcile.

Reconcile DS shared repository
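
If you would rather trigger the reconciliation over REST than through the UI, a call along these lines does the same thing (the mapping name shown is the one used by the full-stack sample; adjust it if yours differs):

curl -X POST \
  -H "X-OpenIDM-Username: openidm-admin" \
  -H "X-OpenIDM-Password: openidm-admin" \
  "http://openidm.example.com:8080/openidm/recon?_action=recon&mapping=systemLdapAccounts_managedUser"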

The users should now exist under Manage, Users.

User list

Let’s log in with our newly created testuser1 to make sure it’s all working.

End User UI Login

You should see the welcome screen.

Welcome Page

IDM is now ready for integration with AM.

Configure AM for integration

Now we will set up AM for integration with IDM.

Setup CORS Filter

Set up a CORS filter in AM to allow IDM as an origin. An example web.xml can be found HERE.

vi tomcat/webapps/secure/WEB-INF/web.xml

Add the appropriate CORS filter. (See sample file above)

</filter-mapping>
<filter-mapping>
<filter-name>CORSFilter</filter-name>
<url-pattern>/json/*</url-pattern>
</filter-mapping>
<filter-mapping>
<filter-name>CORSFilter</filter-name>
<url-pattern>/oauth2/*</url-pattern>
</filter-mapping>
<filter>
<filter-name>CORSFilter</filter-name>
<filter-class>org.forgerock.openam.cors.CORSFilter</filter-class>
<init-param>
<param-name>methods</param-name>
<param-value>POST,PUT,OPTIONS</param-value>
</init-param>
<init-param>
<param-name>origins</param-name>
<param-value>http://openidm.example.com:8080,http://openam.example.com:8080,http://localhost:8000,http://localhost:8080,null,file://</param-value>
</init-param>
<init-param>
<param-name>allowCredentials</param-name>
<param-value>true</param-value>
</init-param>
<init-param>
<param-name>headers</param-name>
<param-value>X-OpenAM-Username,X-OpenAM-Password,X-Requested-With,Accept,iPlanetDirectoryPro</param-value>
</init-param>
<init-param>
<param-name>expectedHostname</param-name>
<param-value>openam.example.com:8080</param-value>
</init-param>
<init-param>
<param-name>exposeHeaders</param-name>
<param-value>Access-Control-Allow-Origin,Access-Control-Allow-Credentials,Set-Cookie</param-value>
</init-param>
<init-param>
<param-name>maxAge</param-name>
<param-value>1800</param-value>
</init-param>
</filter>

Configure AM as an OIDC Provider

Login to AM as an administrator and browse to the Realm you’re configuring. As already mentioned, I’m using the Top Level Realm for simplicity.

Once you’ve browsed to the Realm, on the Dashboard, select Configure OAuth Provider.

Configure OAuth Provider

Now select Configure OpenID Connect.

Configure OIDC

If you need to modify any values, such as the Realm, then do so; I’ll just press Create.

OIDC Server defaults

You’ll get a success message.

Success

AM is now configured as an OP; let’s set some required values. Browse to Services, then click on OAuth2 Provider.

AM Services

On the Consent tab, check the box next to Allow Clients to Skip Consent, then press Save.

Consent settings

Now browse to the Advanced OpenID Connect tab and set openidm as the value for Authorized OIDC SSO Clients (this is the name of the Relying Party / Client which we will create next). Press Save.

Authorized SSO clients

Configure Relying Party / Client for IDM

You can do these steps manually, but let’s call AM’s REST interface; you can save these calls and easily replicate this step with the click of a button if you use a tool like Postman (see HERE for the Postman collection). I am using a simple curl command from the AM server.

Firstly, we’ll need an AM administrator session to create the client, so call the /authenticate endpoint. (I’ve used jq for better formatting; I’ll leave that for you to add in if required.)

curl -X POST \
http://openam.example.com:8080/secure/json/realms/root/authenticate \
-H 'Accept-API-Version: resource=2.0, protocol=1' \
-H 'Content-Type: application/json' \
-H 'X-OpenAM-Password: password' \
-H 'X-OpenAM-Username: amadmin' \
-H 'cache-control: no-cache' \
-d '{}'

The SSO Token / Session will be returned as tokenId, so save this value.

Authenticate
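
If you have jq available, the tokenId can be captured straight into a shell variable for use in the next call (purely a convenience; copying the value by hand works just as well):

tokenId=$(curl -s -X POST \
  http://openam.example.com:8080/secure/json/realms/root/authenticate \
  -H 'Accept-API-Version: resource=2.0, protocol=1' \
  -H 'Content-Type: application/json' \
  -H 'X-OpenAM-Password: password' \
  -H 'X-OpenAM-Username: amadmin' \
  -d '{}' | jq -r .tokenId)
echo "$tokenId"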

Now that we have a session, we can use it in the next call to create the client. Again, this will be done via REST; however, you can do it manually if you want. Substitute the tokenId value from above as the value of the iPlanetDirectoryPro header.

curl -X POST \
'http://openam.example.com:8080/secure/json/realms/root/agents/?_action=create' \
-H 'Accept-API-Version: resource=3.0, protocol=1.0' \
-H 'Content-Type: application/json' \
-H 'iPlanetDirectoryPro: djyfFX3h97jH2P-61auKig_i23o.*AAJTSQACMDEAAlNLABxFc2d0dUs2c1RGYndWOUo0bnU3dERLK3pLY2c9AAR0eXBlAANDVFMAAlMxAAA.*' \
-d '{
"username": "openidm",
"userpassword": "openidm",
"realm": "/",
"AgentType": ["OAuth2Client"],
"com.forgerock.openam.oauth2provider.grantTypes": [
"[0]=authorization_code"
],
"com.forgerock.openam.oauth2provider.scopes": [
"[0]=openid"
],
"com.forgerock.openam.oauth2provider.tokenEndPointAuthMethod": [
"client_secret_basic"
],
"com.forgerock.openam.oauth2provider.redirectionURIs": [
"[0]=http://openidm.example.com:8080/"
],
"isConsentImplied": [
"true"
],
"com.forgerock.openam.oauth2provider.postLogoutRedirectURI": [
"[0]=http://openidm.example.com:8080/",
"[1]=http://openidm.example.com:8080/admin/"
]
}'
Create Relying Party

The OAuth Client should be created.

RP / Client created

Integrate IDM and AM

AM is now configured as an OIDC provider and has an OIDC Relying Party for IDM to use, so now we can configure the final step, that is, tell IDM to outsource authentication to AM.

Feel free to modify and copy this authentication.json file directly into your ~/openidm/samples/full-stack/conf folder or follow these steps to configure.

Browse to Configure, then Authentication.

Configure authentication

Authentication should currently be configured as Local; select ForgeRock Identity Provider.

Outsource authentication to AM

After you click above, the Configure ForgeRock Identity Provider page will pop up.

Set the appropriate values for

  • Well-Known Endpoint
  • Client ID
  • Client Secret
  • Note that the common datastore is set to the DS shared repository, leave this as is.

You can change the others to match your environment, but be careful: the values must match those set in the AM OP/RP configuration above. You can also refer to the sample authentication.json. Once completed, press Submit. You will be asked to re-authenticate.

IDM authentication settings
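
If you want to double-check the Well-Known Endpoint value before submitting, the discovery document can be fetched directly from AM (the URL assumes the Top Level Realm and the /secure deployment context used throughout this post):

curl http://openam.example.com:8080/secure/oauth2/.well-known/openid-configuration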

Testing the integrated environment

Everything is now configured so we are ready to test end to end.

Test End User UI

Browse to the IDM End User UI.

http://openidm.example.com:8080/

You will be directed to AM for login; let’s log in with the test user we created earlier.

Login as test user

After authentication you will be directed to the IDM End User UI welcome page.

Login success

Congratulations you did it!

Test IDM Admin UI

In this test you’ll login to AM as amadmin and IDM will convert this to a session for the IDM administrator openidm-admin.

This is achieved through the following script:

~/openidm/bin/defaults/script/auth/amSessionCheck.js

Specifically this code snippet is used:

// If the AM user is amadmin, swap in the IDM administrator's authorization context
if (security.authenticationId.toLowerCase() === "amadmin") {
    security.authorization = {
        "id" : "openidm-admin",
        "component" : "internal/user",
        "roles" : ["internal/role/openidm-admin", "internal/role/openidm-authorized"],
        "moduleId" : security.authorization.moduleId
    };
}

In a live environment you should not use amadmin or openidm-admin, but rather create your own delegated administrators; in this case, however, we will stick with the sample.

Browse to the IDM Admin UI.

http://openidm.example.com:8080/admin/

You will be directed to AM for login; log in with amadmin.

Login as administrator

After authentication you will be directed to the IDM End User UI welcome page.

Note: this may seem strange since you requested the Admin UI, but it is expected because the client’s redirection URI is set to this page (don’t change it; you can simply browse to the Admin URL after authentication).

Login success

In the top right, click the drop-down and select Admin, or alternatively just hit the IDM Admin UI URL again.

Switch to Admin UI

You’ll now be directed to the Admin UI and as an IDM administrator you should be able to browse around as expected.

IDM Admin UI success

Congratulations you did it!

Finished.

References

  1. https://backstage.forgerock.com/docs/idm/6.5/samples-guide/#chap-full-stack
  2. https://forum.forgerock.com/2018/05/forgerock-identity-platform-version-6-integrating-idm-ds/
  3. https://forum.forgerock.com/2016/02/forgerock-full-stack-configuration/

This blog post was first published @ https://medium.com/@marknienaber included here with permission.

Deploying the ForgeRock platform on Kubernetes using Skaffold and Kustomize

If you are following along with the ForgeOps repository, you will see some significant changes in the way we deploy the ForgeRock IAM platform to Kubernetes.  These changes are aimed at dramatically simplifying the workflow to configure, test and deploy ForgeRock Access Manager, Identity Manager, Directory Services and the Identity Gateway.

To understand the motivation for the change, let’s recap the current deployment approach:

  • The Kubernetes manifests are maintained in one git repository (forgeops), while the product configuration is in another (forgeops-init).
  • At runtime, Kubernetes init containers clone the configuration from git and make it available to the component using a shared volume.

The advantage of this approach is that the docker container for a product can be (relatively) stable. Usually it is the configuration that is changing, not the product binary.

This approach seemed like a good idea at the time, but in retrospect it created a lot of complexity in the deployment:

  • The runtime configuration is complex, requiring orchestration (init containers) to make the configuration available to the product.
  • It creates a runtime dependency on a git repository being available. This isn’t a show stopper (you can create a local mirror), but it is one more moving part to manage.
  • The helm charts are complicated. We need to weave git repository information throughout the deployment, for example putting git secrets and configuration into each product chart. We had to invent a mechanism to allow the user to switch to a different git repo or configuration, adding further complexity. Feedback from users indicated this was a frequent source of errors.
  • Iterating on configuration during development is slow. Changes need to be committed to git and the deployment rolled to test out a simple configuration change.
  • Kubernetes rolling deployments are tricky. The product container version must be in sync with the git configuration. A mistake here might not get caught until runtime.

It became clear that it would be *much* simpler if the products could just bundle the configuration in the docker container so that it is “ready to run” without any complex orchestration or runtime dependency on git.

[As an aside, we often get asked why we don’t store configuration in ConfigMaps. The short answer is: we do, for top level configuration such as domain names and global environment variables. Products like AM have large and complex configurations (~1000 json files for a full AM export). Managing these in ConfigMaps gets to be cumbersome. We also need a hierarchical directory structure, which is an outstanding ConfigMap RFE.]

The challenge with the “bake the configuration in the docker image” approach is that it creates *a lot* of docker containers. If each configuration change results in a new (and unique) container, you quickly realize that automation is required to be successful.
About a year ago, one of my colleagues happened to stumble across a new tool from Google called skaffold. From the documentation:

“Skaffold handles the workflow for building, pushing and deploying your application.
So you can focus more on application development”
To some extent skaffold is syntactic sugar on top of this workflow:
docker build; docker tag; docker push;
kustomize build | kubectl apply -f -
Calling it syntactic sugar doesn’t really do it justice, so do read through their excellent documentation. 
There isn’t anything that skaffold does that you can’t accomplish with other tools (or a whack of bash scripts), but skaffold focuses on smoothing out and automating this basic workflow.
A key element of Skaffold is its tagging strategy. Skaffold will apply a unique tag to each docker image (the tagging strategy is pluggable, but is generally a sha256 hash, or a git commit). This is essential for our workflow where we want to ensure that combination of the product (say AM) and a specific configuration is guaranteed to be unique. By using a git commit tag on the final image, we can be confident that we know exactly how a container was built including its configuration.  This also makes rolling deployments much more tractable, as we can update a deployment tag and let Kubernetes spin down the older container and replace it with the new one.
If it isn’t clear from the above, the configuration for the product lives inside the docker image, and that in turn is tracked in a git repository. If for example you check out the source for the IDM container: https://github.com/ForgeRock/forgeops/tree/master/docker/idm 
You will see that the Dockerfile COPYs the configuration into the final image. When IDM runs, its configuration will be right there, ready to go. 
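As a rough sketch of what that buys you (skaffold normally drives this for you, and the image name below is just a placeholder rather than a real ForgeRock registry path), a plain docker build of that directory produces an image with the configuration already inside it:

cd forgeops
# the Dockerfile under docker/idm COPYs the checked-in configuration into the image,
# so the result is a self-contained, "ready to run" IDM container
docker build -t my-registry/idm:conf-test docker/idm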
Skaffold has two major modes of operation. The “run” mode is a one-shot build, tag, push and deploy. You will typically use skaffold run as part of a CD pipeline: watch for a git commit, and invoke skaffold to deploy the change. Again, you can do this with other tools, but Skaffold just makes it super convenient.
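For example, a CD job triggered by a commit might do little more than the following (the --default-repo value is a placeholder for your own container registry):

cd forgeops
# one-shot: build and tag the images, push them, and apply the rendered manifests to the cluster
skaffold run --default-repo gcr.io/my-project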
Where Skaffold really shines is in “dev” mode. If you run skaffold dev, it will run a continual loop watching the file system for changes, and rebuilding and deploying as you edit files.
This diagram (lifted from the skaffold project) shows the workflow:

Skaffold architecture

This process is really snappy. We find that we can deploy changes within 20-30 seconds (most of that is just container restarts). When pushing to a remote GKE cluster, the first deployment is a little slower as we need to push all those containers to gcr.io, but subsequent updates are fast as you are pushing configuration deltas that are just a few KB in size.
Note that git commits are not required during development.  A developer will iterate on the desired configuration, and only when they are happy will they commit the changes to git and create a pull request. At this stage a CD process will pick up the commit and deploy the change to a QA environment. We have a simple CD sample using Google Cloudbuild.
At this point we haven’t said anything about helm and why we decided to move to Kustomize.  
Once our runtime deployments became simpler (no more git init containers, simpler ConfigMaps, etc.), we found ourselves questioning the need for complex helm templates. There was some internal resistance from our developers to using golang templates (they *are* pretty ugly when combined with yaml), and the security issues around Helm’s Tiller component raised additional red flags.
Suffice to say, there was no compelling reason to stick with Helm, and transitioning to Kustomize was painless. A shout out to the folks at Replicated, who have a very nifty tool called ship that will convert your helm charts to Kustomize. The “port” from Helm to Kustomize took a couple of days. We might look at Helm 3 in the future, but for now our requirements are being met by Kustomize. One nice side effect we noticed is that Kustomize deployments with skaffold are really fast.
This work is being done on the master branch of forgeops (targeting the 7.0 release), but if you would like to try out this new workflow with the current (6.5.2) products, you are in luck! We have a preview branch that uses the current products.
The following should just work(TM) on minikube:
cd forgeops
git checkout skaffold-6.5
skaffold dev 
There are some prerequisites that you need to install. See the README-skaffold.
The initial feedback on this workflow has been very positive. We’d love for folks to try it out and let us know what you think. Feel free to reach out to me at my ForgeRock email (warren dot strange at forgerock.com).

This blog post was first published @ warrenstrange.blogspot.ca, included here with permission.