Kubernetes: Why won’t that deployment roll?

Kubernetes Deployments provide a declarative way of managing replica sets and pods.  A deployment specifies how many pods to run as part of a replica set, where to place pods, how to scale them and how to manage their availability.

Deployments are also used to perform rolling updates to a service. They can support a number of different update strategies such as blue/green and canary deployments.

The examples provided in the Kubernetes documentation explain how to trigger a rolling update by changing the Docker image. For example, suppose you edit the Docker image tag by using the kubectl edit deployment/my-app command, changing the image tag from acme/my-app:v0.2.3 to acme/my-app:v0.3.0.

When Kubernetes sees that the image tag has changed, a rolling update is triggered according to the declared strategy. This results in the pods with the 0.2.3 image being taken down and replaced with pods running the 0.3.0 image.  This is done in a rolling fashion so that the service remains available during the transition.
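For reference, the update strategy is declared on the Deployment itself. A typical rolling-update stanza looks something like this (the numbers here are illustrative):

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never drop below the desired replica count
      maxSurge: 1         # allow one extra pod while the new version comes up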

If your application is packaged up as a Helm chart, the helm upgrade command can be used to trigger a rollout:

helm upgrade my-release my-chart

Under the covers, Helm is just applying any updates in your chart to the deployment and sending them to Kubernetes. You can achieve the same thing yourself by applying updates using kubectl.
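For example, bumping the image directly with kubectl triggers the same rollout (this assumes the container inside the deployment is also named my-app):

# update the image in place
kubectl set image deployment/my-app my-app=acme/my-app:v0.3.0

# or edit the manifest and re-apply it
kubectl apply -f my-app-deployment.yaml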

Now let’s suppose that you want to trigger a rolling update even if the docker image has not changed. A likely scenario is that you want to apply an update to your application’s configuration.

As an example, the ForgeRock Identity Gateway (IG) Helm chart pulls its configuration from a git repository. If the configuration changes in git, we’d like to roll out a new deployment.

My first attempt at this was to perform a helm upgrade on the release, updating the ConfigMap in the chart with the new git branch for the release. Our Helm chart uses the ConfigMap to set an environment variable to the git branch (or commit) that we want to checkout:

kind: ConfigMap
 data:
 GIT_CHECKOUT_BRANCH: test

After editing the ConfigMap, and doing a helm upgrade, nothing terribly exciting happened.  My deployment did not “roll” as expected with the new git configuration.

As it turns out, Kubernetes needs to see a change in the pod's spec.template before it triggers a new deployment. Changing the image tag is one way to do that, but any change to the template will work. As I discovered, changes to a ConfigMap *do not* trigger deployment updates.

The solution here is to move the git branch variable out of the ConfigMap and into the pod's spec.template in the deployment object:

spec:
  initContainers:
  - name: git-init
    env:
    - name: GIT_CHECKOUT_BRANCH
      value: "{{ .Values.global.git.branch }}"

When the IG helm chart is updated (we supply a new value to the template variable above), the template’s spec changes, and Kubernetes will roll out the new deployment.
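As an aside, another common Helm pattern for forcing a template change whenever the configuration changes (not the approach used here) is to hash the ConfigMap into a pod-template annotation:

spec:
  template:
    metadata:
      annotations:
        # any change to the ConfigMap changes the checksum, which changes the template
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}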

Here is what it looks like when we put it all together (you can check out the full IG chart here).

# install IG using helm. The git branch defaults to "master"
$ helm install openig
NAME:   hissing-yak

$ kubectl get deployment
NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
openig    1         1         1            1           4m

# Take a look at our deployment history. So far, there is only one revision:
$ kubectl rollout history deployment
deployments "openig"
REVISION  CHANGE-CAUSE
1

# Let's deploy a different configuration. We can use a commit hash here instead
# of a branch name:
$ helm upgrade --set global.git.branch=d0562faec4b1621ad53b852aa5ee61f1072575cc \
    hissing-yak openig

# Have a look at our deployment. We now see a new revision:
$ kubectl rollout history deploy
deployments "openig"
REVISION  CHANGE-CAUSE
1
2

# Describe the deployment. You should see events where the original
# replica set is scaled down and a new one is scaled up:
$ kubectl describe deployment
Events:
Type    Reason             Age   From                   Message
----    ------             ----  ----                   -------
---> Time=19m ago. This is the original RS
Normal  ScalingReplicaSet  19m   deployment-controller  Scaled up replica set openig-6f6575cdfd to 1
---> Time=3m ago. The new RS being brought up
Normal  ScalingReplicaSet  3m    deployment-controller  Scaled up replica set openig-7b5788659c to 1
---> Time=3m ago. The old RS being scaled down
Normal  ScalingReplicaSet  3m    deployment-controller  Scaled down replica set openig-6f6575cdfd to 0

One of the neat things we can do with deployments is roll back to a previous revision. For example, if we have an error in our configuration, and want to restore the previous release:

$ kubectl rollout undo deployment/openig
deployment "openig" rolled back

$ kubectl describe deployment
Normal  DeploymentRollback  1m  deployment-controller  Rolled back deployment "openig" to revision 1

# We can see the old pod is being terminated and the new one has started:
$ kubectl get pod
NAME                      READY     STATUS        RESTARTS   AGE
openig-6f6575cdfd-tvmgj   1/1       Running       0          28s
openig-7b5788659c-h7kcb   0/1       Terminating   0          8m

And that, my friends, is how we roll.

This blog post was first published @ warrenstrange.blogspot.ca, included here with permission.

Save greenbacks on Google Container Engine using autoscaling and preemptible VMs

There is an awesome new feature on Google Container Engine (GKE) that lets you combine autoscaling, node pools and preemptible VMs to save big $!

The basic idea is to create a small cluster with an inexpensive VM type that will run 7×24. This primary node can be used for critical services that should not be rescheduled to another node. A good example would be a Jenkins master server. Here is an example of how to create the cluster:

gcloud alpha container clusters create $CLUSTER \
  --network "default" --num-nodes 1 \
  --machine-type ${small} --zone $ZONE \
  --disk-size 50

Now here is the money saver trick:  A second node pool is added to the cluster. This node pool is configured to auto-scale from one node up to a maximum. This additional node pool uses preemptible VMs. These are VMs that can be taken away at any time if Google needs the capacity, but in exchange you get dirt cheap images. For example, running a 4 core VM with 15GB of RAM for a month comes in under $30.

This second pool is perfect for containers that can survive a restart or migration to a new node. Jenkins slaves would be a good candidate.

Here is an example of adding the node pool to the cluster you created above:

gcloud alpha container node-pools create $NODEPOOL --cluster $CLUSTER --zone $ZONE \
  --machine-type ${medium} --preemptible --disk-size 50 \
  --enable-autoscaling --min-nodes=1 --max-nodes=4

That node pool will scale down to a single VM if the cluster is not busy, and scale up to a maximum of 4 nodes.

If your VM gets preempted (and it will at least once every 24 hours),  the pods running on that node will be rescheduled onto a new node created by the auto-scaler.

Container Engine assigns a label to preemptible nodes which you can use for scheduling. For example, to ensure your Jenkins master does not get put on a preemptible node, you can add the following to your Pod spec:

apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    !cloud.google.com/gke-preemptible
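If your cluster rejects that shorthand, the same rule can be expressed with node affinity. A sketch, using the label GKE puts on preemptible nodes:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: cloud.google.com/gke-preemptible
            operator: DoesNotExist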

See https://cloud.google.com/container-engine/docs/preemptible-vm for the details.

This blog post was first published @ warrenstrange.blogspot.ca, included here with permission.

Automating OpenDJ backups on Kubernetes

Kubernetes StatefulSets are designed to run “pet” like services such as databases.  ForgeRock’s OpenDJ LDAP server is an excellent fit for StatefulSets as it requires stable network identity and persistent storage.

The ForgeOps project contains a Kubernetes Helm chart to deploy DJ to a Kubernetes cluster. Using a StatefulSet, the cluster will auto-provision persistent storage for our pod. We configure OpenDJ to place its backend database on this storage volume.

This gives us persistence that survives container restarts, or even restarts of the cluster. As long as we don’t delete the underlying persistent volume, our data is safe.

Persistent storage is quite reliable, but we typically want additional offline backups for our database.

The high level approach to accomplish this is as follows:

  • Configure the OpenDJ container to support scheduled backups to a volume.
  • Configure a Kubernetes volume to store the backups.
  • Create a sidecar container that archives the backups. For our example we will use Google Cloud Storage.
Here are the steps in more detail:

Scheduled Backups:

OpenDJ has a built in task scheduler that can periodically run backups using a crontab(5) format.  We update the Dockerfile for OpenDJ with environment variables that control when backups run:

 

# The default backup directory. Only relevant if backups have been scheduled.
ENV BACKUP_DIRECTORY /opt/opendj/backup
# Optional full backup schedule in cron(5) format.
ENV BACKUP_SCHEDULE_FULL "0 2 * * *"
# Optional incremental backup schedule in cron(5) format.
ENV BACKUP_SCHEDULE_INCREMENTAL "15 * * * *"
# The hostname to run the backups on. If this hostname does not match the container hostname, the backups will *not* be scheduled.
# The default value below means backups will not be scheduled automatically. Set this environment variable if you want backups.
ENV BACKUP_HOST dont-run-backups

To enable backup support, the OpenDJ container runs a script on first time setup that configures the backup schedule.  A snippet from that script looks like this:

if [ -n "$BACKUP_SCHEDULE_FULL" ];
then
  echo "Scheduling full backup with cron schedule ${BACKUP_SCHEDULE_FULL}"
  bin/backup --backupDirectory ${BACKUP_DIRECTORY} -p 4444 -D "cn=Directory Manager" \
    -j ${DIR_MANAGER_PW_FILE} --trustAll --backupAll \
    --recurringTask "${BACKUP_SCHEDULE_FULL}"
fi
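The incremental schedule is wired up the same way. A sketch of the corresponding branch, assuming the same connection options and OpenDJ's --incremental flag:

if [ -n "$BACKUP_SCHEDULE_INCREMENTAL" ];
then
  echo "Scheduling incremental backup with cron schedule ${BACKUP_SCHEDULE_INCREMENTAL}"
  bin/backup --backupDirectory ${BACKUP_DIRECTORY} -p 4444 -D "cn=Directory Manager" \
    -j ${DIR_MANAGER_PW_FILE} --trustAll --backupAll --incremental \
    --recurringTask "${BACKUP_SCHEDULE_INCREMENTAL}"
fi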

 

Update the Helm Chart to support backup

Next we update the OpenDJ Helm chart to mount a volume for backups and to support our new BACKUP_ variables introduced in the Dockerfile. We use a ConfigMap to pass the relevant environment variables to the OpenDJ container:

apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ template "fullname" . }}
data:
  BASE_DN: {{ .Values.baseDN }}
  BACKUP_HOST: {{ .Values.backupHost }}
  BACKUP_SCHEDULE_FULL: {{ .Values.backupScheduleFull }}
  BACKUP_SCHEDULE_INCREMENTAL: {{ .Values.backupScheduleIncremental }}

The funny looking expressions in the curly braces are Helm templates. Those variables are expanded when the object is sent to Kubernetes. Using values allows us to parameterize the chart when we deploy it.
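The corresponding entries in the chart's values.yaml might look like this (the defaults shown are illustrative):

baseDN: "dc=example,dc=com"
backupHost: "dont-run-backups"
backupScheduleFull: "0 2 * * *"
backupScheduleIncremental: "15 * * * *"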

Next we configure the container with a volume to hold the backups:

volumeMounts:
- name: data
  mountPath: /opt/opendj/data
- name: dj-backup
  mountPath: /opt/opendj/backup

This can be any volume type supported by your Kubernetes cluster. We will use an “emptyDir” for now – which is a dynamic volume that Kubernetes creates and mounts on the container.
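With emptyDir, the pod's volumes section amounts to something like:

volumes:
- name: dj-backup
  emptyDir: {}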

Configuring a sidecar backup container

Now for the pièce de résistance. We have our scheduled backups going to a Kubernetes volume. How do we send those files to offline storage?

One approach would be to modify our OpenDJ Dockerfile to support offline storage. We could, for example, include commands to write backups to Amazon S3 or Google Cloud storage.  This works, but it would specialize our container image to a unique environment. Where practical, we want our images to be flexible so they can be reused in different contexts.

This is where sidecar containers come into play.  The sidecar container holds the specialized logic for archiving files.  In general, it is a good idea to design containers that have a single responsibility. Using sidecars helps to enable this kind of design.

If you are running on Google Cloud, there is a ready made container that bundles the “gcloud” SDK, including the “gsutil” utility for cloud storage. We update our Helm chart to include this container as a sidecar that shares the backup volume with the OpenDJ container:

{{- if .Values.enableGcloudBackups }}
# An example of enabling backup to Google Cloud Storage.
# The bucket must exist, and the cluster needs --scopes storage-full when it is created.
# This runs the gsutil command periodically to rsync the contents of the /backup folder (shared with the DJ container) to cloud storage.
- name: backup
  image: gcr.io/cloud-builders/gcloud
  imagePullPolicy: IfNotPresent
  command: [ "/bin/sh", "-c", "while true; do gsutil -m rsync -r /backup {{ .Values.gsBucket }} ; sleep 600; done"]
  volumeMounts:
  - name: dj-backup
    mountPath: /backup
{{- end }}

The above container runs in a loop that periodically rsyncs the contents of the backup volume to cloud storage.  You could of course replace this sidecar with another that sends storage to a different location (say an Amazon S3 bucket).
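A sketch of what an S3 variant might look like (the image, bucket name and credential handling here are assumptions; you would still need to supply AWS credentials, for example via a mounted secret or an IAM role):

- name: backup
  image: amazon/aws-cli
  command: [ "/bin/sh", "-c", "while true; do aws s3 sync /backup s3://my-dj-backups ; sleep 600; done"]
  volumeMounts:
  - name: dj-backup
    mountPath: /backup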

If you enable this feature and browse to your cloud storage bucket, you should see your backed up data.

To wrap it all up, here is the final helm command that will deploy a highly available, replicated two node OpenDJ cluster, and schedule backups on the second node:

helm install -f custom-gke.yaml \
  --set djInstance=userstore \
  --set numberSampleUsers=1000,backupHost=userstore-1,replicaCount=2 helm/opendj

Now we just need to demonstrate that we can restore our data. Stay tuned!
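In the meantime, for reference, a restore from that same backup volume would look roughly like this on the target instance (the backup ID comes from --listBackups; exact paths and options depend on your OpenDJ version and backend layout):

# list the backups available in the shared backup directory
bin/restore --backupDirectory /opt/opendj/backup --listBackups

# restore a specific backup ID with the server offline
bin/restore --backupDirectory /opt/opendj/backup --backupID 20180101020000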

This blog post was first published @ warrenstrange.blogspot.ca, included here with permission.

OpenDJ Pets on Kubernetes


Stateless “12-factor” applications are all the rage, but there are some kinds of services that are inherently stateful. Good examples are things like relational databases (Postgres, MySQL) and NoSQL databases (Cassandra, etc).

These services are difficult to containerize, because the default docker model favours ephemeral containers where the data disappears when the container is destroyed.

These services also have a strong need for identity. A database “primary” server is different than the “slave”. In Cassandra, certain nodes are designated as seed nodes, and so on.

OpenDJ is an open source LDAP directory server from ForgeRock. LDAP servers are inherently “pet like” insomuch as the directory data must persist beyond the container lifetime. OpenDJ nodes also replicate data between themselves to provide high-availability and therefore need some kind of stable network identity.

Kubernetes 1.3  introduces a feature called “Pet Sets” that is designed specifically for these kinds of stateful applications.   A Kubernetes PetSet provides applications with:

  • Permanent hostnames that persist across restarts
  • Automatically provisioned persistent disks per container that live beyond the life of a container
  • Unique identities in a group to allow for clustering and leader election
  • Initialization containers which are critical for starting up clustered applications

These features are exactly what we need to deploy OpenDJ instances.  If you want to give this a try, read on…

You will need access to a Kubernetes 1.3 environment. Using minikube is the recommended way to get started on the desktop.

You will need to fork and clone the ForgeRock Docker repository to build the OpenDJ base image. The repository is on our stash server:

 https://stash.forgerock.org/projects/DOCKER/repos/docker/browse

To build the OpenDJ image, you will do something like:

cd opendj
 docker build -t forgerock/opendj:latest .

If you are using minikube,  you should connect your docker client to the docker daemon running in your minikube cluster (use minikube docker-env).  Kubernetes will not need to “pull” the image from a registry – it will already be loaded.  For development this approach will speed things up considerably.
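That is, before building, point your local docker client at the cluster's daemon:

eval $(minikube docker-env)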

Take a look at the README for the OpenDJ image. There are a few environment variables that the container uses to determine how it is bootstrapped and configured.  The most important ones:

  • BOOTSTRAP: Path to a shell script that will initialize OpenDJ. This is only executed if the data/config directory is empty. Defaults to /opt/opendj/bootstrap/setup.sh
  • BASE_DN: The base DN to create. Used in setup and replication
  • DJ_MASTER_SERVER: If set, run.sh will call bootstrap/replicate.sh to enable replication to this master. This only happens if the data/config directory does not exist

There are sample bootstrap setup.sh scripts provided as part of the container, but you can override these and provide your own script.

Next,  fork and clone the ForgeRock Kubernetes project here:
https://stash.forgerock.org/projects/DOCKER/repos/fretes/browse

The opendj directory contains the Pet Set example.  You must edit the files to suit your needs, but as provided, the artifacts do the following:

  • Configures two OpenDJ servers (opendj-0 and opendj-1) in a pet set.
  • Runs the  cts/setup.sh script provided as part of the docker image to configure OpenDJ as an OpenAM CTS server.
  • Optionally assigns persistent volumes to each pet, so the data will live across restarts
  • Assigns “opendj-0” as the master.  The replicate.sh script provided as part of the Docker image will replicate each node to this master.  The script ignores any attempt by the master to replicate to itself.  As each pet is added (Kubernetes creates them in order) replication will be configured between that pet and the opendj-0 master.
  • Creates a Kubernetes service to access the OpenDJ instances. Instances can be addressed by their unique name (opendj-1), or by a service name (opendj) which will go through a primitive load balancing function (at this time round robin).  Applications can also perform DNS lookup on the opendj SRV record to obtain a list of all the OpenDJ instances in the cluster.

The replication topology is quite simple. We simply replicate each OpenDJ instance to opendj-0. This is going to work fine for small OpenDJ clusters. For more complex installations you will need to enhance this example.
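For orientation, the skeleton of a PetSet manifest in the 1.3 alpha API looks roughly like this (names and sizes are illustrative, not the exact fretes artifacts):

apiVersion: apps/v1alpha1
kind: PetSet
metadata:
  name: opendj
spec:
  serviceName: opendj          # headless service that gives each pet its stable DNS name
  replicas: 2
  template:
    metadata:
      labels:
        app: opendj
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
    spec:
      containers:
      - name: opendj
        image: forgerock/opendj:latest
        volumeMounts:
        - name: data
          mountPath: /opt/opendj/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 8Gi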

To create the petset:

kubectl create -f opendj/

If you bring up the minikube dashboard:

minikube dashboard

You should see the two pets being created (be patient, this takes a while).

Take a look at the pod logs using the dashboard or:

kubectl logs opendj-0 -f

Now try scaling up your PetSet. In the dashboard, edit the Pet Set object, and change the number of replicas from 2 to 3:

You should see a new OpenDJ instance being created. If you examine the logs for that instance, you will see it has joined the replication topology.

Note: Scaling down the Pet Set is not implemented at this time. Kubernetes will remove the pod, but the OpenDJ instances will still think the scaled down node is in the replication topology.

This blog post was first published @ blog.warrenstrange.com, included here with permission.

Creating an internal CA and signed server certificates for OpenDJ using cfssl, keytool and openssl


Yes, that title is quite a mouthful, and mostly intended to get the Google juice if I need to find this entry again.

I spent a couple of hours figuring out the magical incantations, so thought I would document this here.

The problem: You want OpenDJ to use something other than the default self-signed certificate for SSL connections.   A "real" certificate signed by a CA (Certificate Authority) is expensive and a pain to procure and install.

The next best alternative is to create your own "internal" CA, and  have that CA sign certificates for your services.   In most cases, this is going to work fine for *internal* services that do not need to be trusted by a browser.

You might ask why is this better than just using self-signed certificates?  The idea is that you can import your CA certificate once into the truststore for your various clients, and thereafter those clients will trust any certificate presented that is signed by your CA.

For example, assume I have OpenDJ servers: server-1, server-2 and server-3. Using only self-signed certificates, I will need to import the certs for each server (three in this case) into my client's truststore. If instead I use a CA, I need only import a single CA certificate. The OpenDJ server certificates will be trusted because they are signed by my CA. Once you start to get a lot of services deployed, using self-signed certificates becomes super painful. Hopefully that all makes sense...

Now how do you create all these certificates?  Using CloudFlare's open source  cfssl utility, Java keytool, and a little openssl.

I'll spare you the details, and point you to this shell script which you can edit for your environment:

Here is the gist:
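In outline (the file names and the changeit password below are placeholders, not necessarily what the script uses):

# 1. Create the internal CA. ca-csr.json holds the CA subject.
cfssl gencert -initca ca-csr.json | cfssljson -bare ca

# 2. Issue a server certificate signed by that CA. server-csr.json lists the
#    OpenDJ hostnames in its hosts field.
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem \
  -config=ca-config.json -profile=server \
  server-csr.json | cfssljson -bare server

# 3. Bundle the server key and certificate chain into PKCS12 so keytool can read it.
openssl pkcs12 -export -in server.pem -inkey server-key.pem \
  -certfile ca.pem -name server-cert \
  -out keystore.p12 -password pass:changeit

# 4. Convert the PKCS12 bundle into a JKS keystore for OpenDJ.
keytool -importkeystore -srckeystore keystore.p12 -srcstoretype PKCS12 \
  -srcstorepass changeit -destkeystore keystore.jks -deststorepass changeit

# 5. Import the CA certificate into client truststores, so anything signed by it is trusted.
keytool -importcert -trustcacerts -noprompt -alias internal-ca \
  -file ca.pem -keystore truststore.jks -storepass changeit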



Kubernetes Namespaces and OpenAM




I have been conducting some experiments running the ForgeRock stack on Kubernetes. I recently stumbled on namespaces.

In a nutshell Kubernetes (k8) namespaces provide isolation for instances. The typical use case is to provide isolated environments for dev, QA, production and so on.

I had an "Aha!" moment when it occurred to me that namespaces could also provide multi-tenancy on a k8 cluster. How might this work?

Let's create a two node OpenAM cluster using an external OpenDJ instance:

See https://github.com/ForgeRock/fretes  for some samples used in this article

kubectl create -f am-dj-idm/

The above command launches all the containers found in the given directory, wires them up together (updates DNS records), and creates a load balancer on GCE.

 If I look at my services:

 kubectl get service 

I see something like this:

NAME       LABELS          SELECTOR   IP(S) PORT(S) 
openam-svc name=openam-svc site=site1 10.215.249.206 80/TCP 
                                      104.197.122.164 

(note: I am eliding a bit of the output here for brevity)

That second IP for openam-svc is the external IP of the load balancer configured by Kubernetes. If you bring up this IP address you will see the OpenAM login page. 

Now, let's change my namespace to another instance. Say "tenant1" (I previously created this namespace)
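Creating the namespace and a matching kubectl context looks roughly like this (the cluster and user names are placeholders):

kubectl create namespace tenant1
kubectl config set-context tenant1 --namespace=tenant1 \
  --cluster=my-gke-cluster --user=my-gke-user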

kc config use-context tenant1 

A kubectl get services  should be empty, as we have no services running yet in the tenant1 namespace. 

So let's create some: 

kubectl create -f am-dj-idm/ 


This is the same command we ran before - but this time we are running against a different namespace.  

Looking at our services, we see:

NAME        LABELS            SELECTOR   IP(S) PORT(S) 
openam-svc  name=openam-svc   site=site1 10.215.255.185 80/TCP 
                                         23.251.153.176 

Pretty cool. We now have two OpenAM instances deployed, with complete isolation, on the same cluster, using only a handful of commands. 

Hopefully this gives you a sense of why I am so excited about Kubernetes.



Running OpenAM and OpenDJ on Kubernetes with Google Container Engine



Still quite experimental, but if you are adventurous, have a look at:

https://github.com/ForgeRock/frstack/tree/master/docker/k8


This will set up a two node Kubernetes cluster running OpenAM and OpenDJ.  This uses images on the Docker hub that provide nightly builds for OpenAM and OpenDJ.

I will be presenting this at the ForgeRock IRM summit this Thursday. Fingers crossed that the demo gods smile down on me!