Kubernetes - Taints and Tolerations

One of the best things about Kubernetes is that I don’t have to think about which piece of hardware my container will run on when I deploy it. The Kubernetes scheduler can make that decision for me. This is great until I actually DO care about which node my container runs on. This post examines one solution to pod placement: taints and tolerations.

Taints - The Theory

Suppose we had a Kubernetes cluster where we didn’t want any pods to run on a specific node. You might need to do this for a variety of reasons, such as:

one node in the cluster is reserved for special purposes because it has specialized hardware like a GPU
one node in the cluster isn’t licensed for some software running on it
one node is in a different network zone for compliance reasons
one node is in timeout for doing something naughty

Whatever the particular reason, we need a way to ensure our pods are not placed on a certain node. That’s where a taint comes in.

Taints are a way to put up a giant stop sign in front of the K8s scheduler. You can apply a taint to a K8s node to tell the scheduler that the node isn’t available for any pods.

Tolerations - The Theory

How about a use case where we have really slow spinning disks in a node? We apply a taint to that node so that our normal pods won’t be placed on that piece of hardware, due to its poor performance, but we have some pods that don’t need fast disks. This is where tolerations come into play.

A toleration is a way of ignoring a taint during scheduling. Tolerations aren’t applied to nodes, but rather to pods. So, in the example above, if we add a toleration to the PodSpec, the pod can “tolerate” the slow disks on that node and still use it. Keep in mind that a toleration only permits placement on a tainted node; it doesn’t force the pod onto it.
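To make the matching rule concrete, here is a minimal Python sketch of how a single toleration is compared against a single taint. It is a simplified model for illustration, not the scheduler's actual code; the `Taint` and `Toleration` classes and the `tolerates` function are hypothetical names, while the field names (key, operator, value, effect) and the operators "Equal" and "Exists" mirror the ones used in a PodSpec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # an empty key (with "Exists") matches every taint key
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # an empty effect matches every effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if this toleration covers this taint (simplified model)."""
    # The effect must match, unless the toleration leaves it empty.
    if toleration.effect and toleration.effect != taint.effect:
        return False
    # The key must match, unless the toleration leaves it empty.
    if toleration.key and toleration.key != taint.key:
        return False
    # "Exists" ignores the value; "Equal" requires the same value.
    if toleration.operator == "Exists":
        return True
    if toleration.operator == "Equal":
        return toleration.value == taint.value
    return False


slow = Taint("hardware", "slow", "NoSchedule")
print(tolerates(Toleration("hardware", "Equal", "slow", "NoSchedule"), slow))  # True
print(tolerates(Toleration("hardware", "Exists"), slow))                       # True
print(tolerates(Toleration("hardware", "Equal", "fast", "NoSchedule"), slow))  # False
```

The slow-disk pods from the example would carry the first toleration; a pod with no tolerations, or with one for a different value, is still kept off the node.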

Taints - In Action

Let’s apply a taint to a node in our Kubernetes cluster. But first, you might check whether a taint is applied already. Depending on how you deployed your cluster, your master node(s) might already have a taint that keeps pods from running on them. You can check by running the following command and looking at the Taints field in the output:

kubectl describe node [k8s master node]

OK, now let’s apply a taint to a couple of nodes in our cluster. I’ll create a taint with a key/value pair of “hardware=slow” to identify nodes that should not have my pods scheduled on them any longer because of their slow hardware specifications.

kubectl taint nodes [node name] [key=value]:NoSchedule

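In my case I ran this twice because I tainted two nodes. I should mention that you can also taint multiple nodes at once by passing a label selector (the -l flag) instead of a node name. Also, we ran the command with the “NoSchedule” effect, which keeps the scheduler from placing new pods on this node, but you could use other effects as well: “PreferNoSchedule” asks the scheduler to avoid the node when it can, and “NoExecute” additionally evicts pods already running on the node that don’t tolerate the taint.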
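The difference between the three effects is easiest to see side by side. The following Python sketch is a rough model of what happens to a pod on a tainted node, depending on the effect, on whether the pod tolerates the taint, and on whether it was already running there when the taint was added. The `outcome` function and its return strings are made up for illustration, and the model deliberately ignores details such as `tolerationSeconds`.

```python
def outcome(effect: str, tolerated: bool, already_running: bool) -> str:
    """Describe what a taint with the given effect means for one pod (simplified)."""
    if tolerated:
        # A matching toleration means the taint is ignored for this pod.
        return "unaffected"
    if effect == "NoSchedule":
        # Hard rule for new pods only; running pods are left alone.
        return "keeps running" if already_running else "not scheduled"
    if effect == "PreferNoSchedule":
        # Soft rule: the scheduler avoids the node but may still use it.
        return "keeps running" if already_running else "scheduled only if no better node"
    if effect == "NoExecute":
        # Hard rule for new pods, and running pods are evicted.
        return "evicted" if already_running else "not scheduled"
    raise ValueError(f"unknown taint effect: {effect}")


for effect in ("NoSchedule", "PreferNoSchedule", "NoExecute"):
    print(effect, "| new pod:", outcome(effect, False, False),
          "| running pod:", outcome(effect, False, True))
```

This is why tainting a node with “NoSchedule” does not clear off the pods that are already there; only “NoExecute” does that.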

At this point, two of my three available worker nodes are tainted with the “hardware” key. Let’s deploy some pods and see how they’re scheduled. I’ll deploy three nginx pods, which ordinarily we’d expect to be spread evenly across my cluster. The manifest file below is what will be deployed.

apiVersion: apps/v1 #version of the API to use
kind: Deployment #What kind of object we're deploying
metadata: #information about our object we're deploying
  name: nginx-deployment #Name of the deployment
  labels: #A tag on the deployments created
    app: nginx
spec: #specifications for our object
  strategy:
    type: RollingUpdate
    rollingUpdate: #Update Pods a certain number at a time
      maxUnavailable: 1 #Total number of pods that can be unavailable at once
      maxSurge: 1 #Maximum number of pods that can be deployed above desired state
  replicas: 3 #The number of pods that should always be running
  selector: #which pods the replica set should be responsible for
    matchLabels:
      app: nginx #any pods with labels matching this I'm responsible for.
  template: #The pod template that gets deployed
    metadata:
      labels: #A tag on the pods created from this template
        app: nginx
    spec:
      containers:
      - name: nginx-container #the name of the container within the pod
        image: nginx:1.7.9 #which container image should be pulled
        ports:
        - containerPort: 80 #the port of the container within the pod

After applying the nginx deployment, we’ll check our pods and see which nodes they are running on. To do this run:

kubectl get pod -o=custom-columns=NODE:.spec.nodeName,NAME:.metadata.name

As you can see, I’ve got three pods deployed and they’re all on k8s-worker-0. This is the only node that wasn’t tainted in my cluster, so this confirms that the taints on k8s-worker-1 and k8s-worker-2 are working correctly.

Tolerations - In Action

Now I’m going to delete that deployment and deploy a new deployment that tolerates our “hardware” taint.

I’ve created a new manifest file that is the same as we ran before, except this time I added a toleration for the taint we applied to our nodes.

apiVersion: apps/v1 #version of the API to use
kind: Deployment #What kind of object we're deploying
metadata: #information about our object we're deploying
  name: nginx-deployment #Name of the deployment
  labels: #A tag on the deployments created
    app: nginx
spec: #specifications for our object
  strategy:
    type: RollingUpdate
    rollingUpdate: #Update Pods a certain number at a time
      maxUnavailable: 1 #Total number of pods that can be unavailable at once
      maxSurge: 1 #Maximum number of pods that can be deployed above desired state
  replicas: 3 #The number of pods that should always be running
  selector: #which pods the replica set should be responsible for
    matchLabels:
      app: nginx #any pods with labels matching this I'm responsible for.
  template: #The pod template that gets deployed
    metadata:
      labels: #A tag on the pods created from this template
        app: nginx
    spec:
      tolerations:
      - key: "hardware"
        operator: "Equal"
        value: "slow"
        effect: "NoSchedule"
      containers:
      - name: nginx-container #the name of the container within the pod
        image: nginx:1.7.9 #which container image should be pulled
        ports:
        - containerPort: 80 #the port of the container within the pod

Let’s apply this new manifest to our cluster and see what happens to the pod placement decisions by the scheduler.

Well, look there. Now those same three pods were distributed across the three nodes evenly for this deployment that tolerated the node taints.
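The two results can be reproduced with a small Python sketch of the filtering step. It assumes the cluster described in this post (k8s-worker-0 untainted, k8s-worker-1 and k8s-worker-2 tainted with hardware=slow:NoSchedule) and only handles exact-match tolerations against “NoSchedule” taints; the `feasible_nodes` function is a hypothetical helper, not a Kubernetes API. It shows which nodes are allowed, not how the scheduler spreads pods among them.

```python
# Taints per node as (key, value, effect) tuples, mirroring the cluster in this post.
nodes = {
    "k8s-worker-0": [],
    "k8s-worker-1": [("hardware", "slow", "NoSchedule")],
    "k8s-worker-2": [("hardware", "slow", "NoSchedule")],
}


def feasible_nodes(nodes, tolerations):
    """Return the nodes on which every taint is covered by an exact-match toleration."""
    return [
        name
        for name, taints in nodes.items()
        if all(taint in tolerations for taint in taints)
    ]


# First deployment: no tolerations, so only the untainted node qualifies.
print(feasible_nodes(nodes, []))
# ['k8s-worker-0']

# Second deployment: tolerates hardware=slow:NoSchedule, so all three qualify.
print(feasible_nodes(nodes, [("hardware", "slow", "NoSchedule")]))
# ['k8s-worker-0', 'k8s-worker-1', 'k8s-worker-2']
```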

Summary

It’s hard to say what needs you might have for scheduling pods on specific nodes in your cluster, but by using taints and tolerations you can adjust where these pods are deployed.

Taints are applied at the node level and keep pods off a node unless they tolerate the taint. Tolerations are applied at the pod level and tell the scheduler which taints a pod is able to withstand.
