Kubernetes Liveness and Readiness Probes

May 18, 2020 | By Eric Shanks

Just because a container is in a running state does not mean that the process running inside it is functional. We can use Kubernetes Readiness and Liveness probes to determine whether an application is ready to receive traffic or not.

Liveness and Readiness Probes – The Theory

On each node of a Kubernetes cluster, there is a Kubelet running which manages the pods on that particular node. It's responsible for pulling images down to the node, reporting the node's health, and restarting failed containers. But how does the Kubelet know if there is a failed container?

Well, it can use probes to check on the status of a container; specifically, a liveness probe.

Liveness probes indicate whether a container is running. Meaning, has the application within the container started, and is it still running? Even if you haven't explicitly configured liveness probes for your containers, you've probably still seen them in action. When a container gets restarted, it's generally because of a liveness probe failing. This can happen if your container couldn't start up, or if the application within the container crashed. The Kubelet restarts the container because the liveness probe fails in those circumstances. In some circumstances, though, the application within the container isn't working but hasn't crashed. In that case, the container won't restart unless you give the liveness probe additional information to check.

A readiness probe indicates if the application running inside the container is "ready" to serve requests. As an example, assume you have an application that starts but needs to check on other services, like a backend database, before finishing its configuration. Or an application that needs to download some data before it's ready to handle requests. A readiness probe tells Kubernetes that the application can now perform its function and that traffic can start being sent to it.

There are three different ways these probes can be checked, as sketched below.

  • ExecAction: Execute a command within the container
  • TCPSocketAction: TCP check against the container’s IP/port
  • HTTPGetAction: An HTTP Get request against the container’s IP/Port
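
The manifest later in this post uses httpGet checks, but for reference, here's a rough sketch of what the other two mechanisms look like in a container spec. The command, file path, and timing values are just illustrative placeholders, not anything from my actual deployment:

    livenessProbe:
      exec:                  # ExecAction: run a command inside the container
        command:
        - cat
        - /tmp/healthy       # placeholder path; the probe passes if the command exits 0
      initialDelaySeconds: 3
      periodSeconds: 3
    readinessProbe:
      tcpSocket:             # TCPSocketAction: attempt a TCP connection to this port
        port: 80
      initialDelaySeconds: 3
      periodSeconds: 3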

Let's look at the two probes in the context of a container starting up. The diagram below shows several states of the same container over time. We have a view into the containers to see what's going on with the application in relation to the probes.

On the left side, the pod has just been deployed. A liveness probe performed as a TCPSocketAction found that the pod is "alive," even though the application is still doing work (loading data, etc.) and isn't ready yet. As time moves on, the application finishes its startup routine and is now "ready" to serve incoming traffic.

Let's take a look at this from a different perspective. Assume we already have a deployment in our cluster, and it consists of a single replica, displayed on the right side behind our service. It's likely that we'll need to scale the app or replace it with another version. Now that we know our app isn't ready to handle traffic right away after being started, we can wait to have our service add the new pod to its list of endpoints until the application is "ready." This is an important thing to consider if your apps aren't ready as soon as the container starts up; a request could be sent to the container before it's able to handle it.

Liveness and Readiness Probes – In Action

First, we'll look to see what happens with a readiness check. For this example, I've got a very simple Apache container that displays a pretty elaborate website. I've created a YAML manifest to deploy the container, service, and ingress rule.

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: theithollow/hollowapp-blog:liveness
    livenessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 3
      periodSeconds: 3
    readinessProbe:
      httpGet:
        path: /health
        port: 80
      initialDelaySeconds: 3
      periodSeconds: 3
---
apiVersion: v1
kind: Service
metadata:
  name: liveness
spec:
  selector:
    app: liveness
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: liveness-ingress
  namespace: default
spec:
  rules:
  - host: liveness.theithollowlab.com
    http:
      paths:
      - backend:
          serviceName: liveness
          servicePort: 80

This manifest includes two probes:

  1. Liveness check doing an HTTP GET request against "/"
  2. Readiness check doing an HTTP GET request against "/health"
    livenessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 3
      periodSeconds: 3
    readinessProbe:
      httpGet:
        path: /health
        port: 80
      initialDelaySeconds: 3
      periodSeconds: 3

My container uses a script to start the HTTP daemon right away and then waits 60 seconds before creating a /health page. This simulates an application doing some startup work before it's ready for consumption. This is the entire website, for reference.

And here is my container script.

/usr/sbin/httpd > /dev/null 2>&1 &        # Start the HTTP daemon in the background
sleep 60                                  # Wait 60 seconds to simulate startup work
echo HealthStatus > /var/www/html/health  # Create the /health status page
sleep 3600                                # Keep the container running

Deploy the manifest through kubectl apply. Once deployed, I ran a --watch command to keep an eye on the pod. Here's what it looked like.
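
If you're following along, these are the commands I'm referring to. The manifest filename is my own placeholder; use whatever you saved the file as:

kubectl apply -f liveness.yaml   # filename is an assumption
kubectl get pods --watch         # watch the READY column change from 0/1 to 1/1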

You'll notice that the READY status showed 0/1 for about 60 seconds, meaning my container was not in a ready state until the /health page became available through the startup script.
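
This is also what keeps the service from routing traffic too early, as described in the theory section. While the pod shows 0/1, it's left out of the service's endpoints; one way to watch that happen is:

kubectl get endpoints liveness --watch   # the ENDPOINTS column stays empty until the readiness probe passes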

As a silly example, what if we modified our liveness probe to look for /health? Perhaps we have an application that sometimes stops working but doesn't crash. Will the application ever start up? Here's my new probe in the YAML manifest.

    livenessProbe:      
      httpGet:
        path: /health
        port: 80
      initialDelaySeconds: 3
      periodSeconds: 3

After deploying this, let's run another --watch on the pods. Here we see that the pod keeps restarting, and I'm never able to access the /health page because the container restarts before it's ready.
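
The same watch command from before shows the RESTARTS column climbing. If you'd rather pull the restart count directly, something like this works (the pod name comes from the manifest above):

kubectl get pods --watch
kubectl get pod liveness-http -o jsonpath='{.status.containerStatuses[0].restartCount}'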

We can see that the liveness probe is failing if we run a describe on the pod.
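
For reference, that's just the following; the Events section at the bottom of the output lists the failed liveness probes and the resulting restarts.

kubectl describe pod liveness-http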