Kubernetes – Taints and Tolerations
July 29, 2019
One of the best things about Kubernetes is that I don't have to think about which piece of hardware my container will run on when I deploy it. The Kubernetes scheduler can make that decision for me. This is great until I actually DO care about which node my container runs on. This post examines one solution to pod placement: taints and tolerations.
Taints – The Theory
Suppose we had a Kubernetes cluster where we didn’t want any pods to run on a specific node. You might need to do this for a variety of reasons, such as:
- one node in the cluster is reserved for special purposes because it has specialized hardware like a GPU
- one node in the cluster isn’t licensed for some software running on it
- one node is in a different network zone for compliance reasons
- one node is in timeout for doing something naughty
Whatever the particular reason, we need a way to ensure our pods are not placed on a certain node. That’s where a taint comes in.
Taints are a way to put up a giant stop sign in front of the K8s scheduler. You can apply a taint to a k8s node to tell the scheduler that the node isn't available for any pods.
Tolerations – The Theory
How about a use case where a node has really slow spinning disks? We apply a taint to that node so that our normal pods won't be placed on it, due to its poor performance, but we have some pods that don't need fast disks. This is where tolerations come into play.
A toleration is a way of ignoring a taint during scheduling. Tolerations aren't applied to nodes, but rather to pods. So, in the example above, if we apply a toleration to the PodSpec, we could "tolerate" the slow disks on that node and still use it.
Taints – In Action
Let's apply a taint to our Kubernetes cluster. But first, you might check to see if you have a taint applied already. Depending upon how you deployed your cluster, your master node(s) might already have a taint applied to keep pods from running on them. You can check with:
kubectl describe node [k8s master node]
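For example, a quick way to see just the taints is to grep the describe output. The node name k8s-master-0 below is only a stand-in for your own master; on a kubeadm-built cluster you would typically see the node-role.kubernetes.io/master:NoSchedule taint listed.

kubectl describe node k8s-master-0 | grep -i taints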
OK, now let's apply a taint to a couple of nodes in our cluster. I'll create a taint with a key/value pair of "hardware=slow" to identify nodes that should no longer run my pods because of their slow hardware.
kubectl taint nodes [node name] [key=value]:NoSchedule
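For reference, the concrete commands in my lab looked like this (substitute your own node names):

kubectl taint nodes k8s-worker-1 hardware=slow:NoSchedule
kubectl taint nodes k8s-worker-2 hardware=slow:NoSchedule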
I ran the command twice because I tainted two nodes. I should mention that this can also be done with a label selector to quickly taint multiple nodes (see the sketch below). Also, we used the "NoSchedule" effect, which keeps the scheduler from placing new pods on the node, but you could also use other effects like "PreferNoSchedule" or "NoExecute".
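As a sketch, assuming the slow nodes already carried a label such as disktype=spinning (that label is hypothetical, not something set earlier in this post), a single command could taint all of them at once:

kubectl taint nodes -l disktype=spinning hardware=slow:NoSchedule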
At this point, two of my three available worker nodes are tainted with the "hardware=slow" key/value pair. Let's deploy some pods and see how they're scheduled. I'll deploy three nginx replicas, which ordinarily we'd expect to be spread evenly across my worker nodes. The manifest file below is what will be deployed.
apiVersion: apps/v1 #version of the API to use
kind: Deployment #What kind of object we're deploying
metadata: #information about our object we're deploying
  name: nginx-deployment #Name of the deployment
  labels: #A tag on the deployments created
    app: nginx
spec: #specifications for our object
  strategy:
    type: RollingUpdate
    rollingUpdate: #Update Pods a certain number at a time
      maxUnavailable: 1 #Total number of pods that can be unavailable at once
      maxSurge: 1 #Maximum number of pods that can be deployed above desired state
  replicas: 3 #The number of pods that should always be running
  selector: #which pods the replica set should be responsible for
    matchLabels:
      app: nginx #any pods with labels matching this I'm responsible for.
  template: #The pod template that gets deployed
    metadata:
      labels: #A tag on the replica sets created
        app: nginx
    spec:
      containers:
      - name: nginx-container #the name of the container within the pod
        image: nginx:1.7.9 #which container image should be pulled
        ports:
        - containerPort: 80 #the port of the container within the pod
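Assuming the manifest above is saved as nginx-deployment.yaml (the file name is my own choice), applying it is a single command:

kubectl apply -f nginx-deployment.yaml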
After applying the nginx deployment, we’ll check our pods and see which nodes they are running on. To do this run:
kubectl get pod -o=custom-columns=NODE:.spec.nodeName,NAME:.metadata.name
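In my lab the output looked something like this (the pod name suffixes are illustrative placeholders):

NODE           NAME
k8s-worker-0   nginx-deployment-66b6c48dd5-4wq7z
k8s-worker-0   nginx-deployment-66b6c48dd5-9kxp2
k8s-worker-0   nginx-deployment-66b6c48dd5-mv5hc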
As you can see, I’ve got three pods deployed and they’re all on k8s-worker-0. This is the only node that wasn’t tainted in my cluster, so this confirms that the taints on k8s-worker-1 and k8s-worker-2 are working correctly.
Tolerations – In Action
Now I’m going to delete that deployment and deploy a new deployment that tolerates our “hardware” taint.
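Assuming the deployment name from the manifest above, the cleanup is just:

kubectl delete deployment nginx-deployment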
I’ve created a new manifest file that is the same as we ran before, except this time I added a toleration for the taint we applied to our nodes.
apiVersion: apps/v1 #version of the API to use
kind: Deployment #What kind of object we're deploying
metadata: #information about our object we're deploying
  name: nginx-deployment #Name of the deployment
  labels: #A tag on the deployments created
    app: nginx
spec: #specifications for our object
  strategy:
    type: RollingUpdate
    rollingUpdate: #Update Pods a certain number at a time
      maxUnavailable: 1 #Total number of pods that can be unavailable at once
      maxSurge: 1 #Maximum number of pods that can be deployed above desired state
  replicas: 3 #The number of pods that should always be running
  selector: #which pods the replica set should be responsible for
    matchLabels:
      app: nginx #any pods with labels matching this I'm responsible for.
  template: #The pod template that gets deployed
    metadata:
      labels: #A tag on the replica sets created
        app: nginx
    spec:
      tolerations: #allow these pods onto nodes with the matching taint
      - key: "hardware"
        operator: "Equal"
        value: "slow"
        effect: "NoSchedule"
      containers:
      - name: nginx-container #the name of the container within the pod
        image: nginx:1.7.9 #which container image should be pulled
        ports:
        - containerPort: 80 #the port of the container within the pod
Let's apply this new manifest to our cluster and see what happens to the pod placement decisions made by the scheduler.
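Assuming the updated manifest is saved as nginx-tolerations.yaml (again, a file name of my own choosing), the steps are the same as before, and the node column now looks something like this (pod name suffixes are illustrative):

kubectl apply -f nginx-tolerations.yaml
kubectl get pod -o=custom-columns=NODE:.spec.nodeName,NAME:.metadata.name

NODE           NAME
k8s-worker-0   nginx-deployment-7f8d7f5bd9-2lqtj
k8s-worker-1   nginx-deployment-7f8d7f5bd9-bq6rn
k8s-worker-2   nginx-deployment-7f8d7f5bd9-xv4km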
Well, look there. Now those same three pods are distributed evenly across the three worker nodes, because this deployment tolerates the node taints.
Summary
It's hard to say what needs you might have for scheduling pods on specific nodes in your cluster, but by using taints and tolerations you can influence where those pods are deployed.
Taints are applied at the node level and prevent nodes from being used. Tolerations are applied at the pod level and can tell the scheduler which taints they are able to withstand.