Resizing Tanzu Kubernetes Grid Cluster Nodes
December 9, 2020

Have you ever missed the mark when trying to properly size a Kubernetes environment? Maybe the requirements changed, maybe there were wrong assumptions, or maybe the project took off and it just needs more resources. Under normal circumstances, I might suggest building a new Tanzu Kubernetes Grid (TKG) cluster and re-deploying your apps. Unfortunately, as much as I want to treat Kubernetes clusters as ephemeral, they can't always be treated this way. If you need to resize your TKG nodes without deploying a new cluster, then keep reading.
Tanzu Kubernetes Grid is built atop the ClusterAPI project, and as such, we can use the details of how ClusterAPI provisions our clusters to update them.
My favorite ClusterAPI reference diagram for visually understanding the components of ClusterAPI can be found on Chip Zoller's site, Neon Mirrors. Mr. Zoller visualizes the cluster object dependencies, and we will use these to update a running cluster with no downtime.
From the diagram, we can see that there is a MachineDeployment object. The machine deployment defines how our nodes are configured. It references a KubeadmConfigTemplate, which defines how nodes will join a Kubernetes cluster, and a MachineTemplate, which defines how the nodes are deployed on a given cloud. In the diagram from neonmirrors.net you see a "vSphereMachineTemplate" object, but for this example, we'll use the AWSMachineTemplate object.
Modify Node Settings
Now that you have some background on the TKG objects, we can try modifying the configuration. First, let's take a look at the objects we're working with. To get access to these objects, you'll want to set your KUBECONFIG context to the management cluster responsible for your workload clusters. Once you set your context, you can run:
kubectl get machines
You can see that in my lab I have a six-node Kubernetes cluster. I plan to update the resources of one of the workload nodes. Notice that some of the node names have md in them. The md stands for "machine deployment," and these are our workload nodes. Let's pick md-0 as the node to update.
Let's look for the machine deployment object.
kubectl get machinedeployments
Notice that I have three different machine deployments. Each machine deployment could consist of multiple machines, but we have three in this case because each AWS Availability Zone gets its own machine deployment. Let's check out the md-0 machine deployment in further detail.
kubectl get machinedeployment tanzuworkloads-md-0 -o yaml
The snippet below shows how the machine deployment references both a KubeadmConfigTemplate and an AWSMachineTemplate.
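For reference, here is a trimmed sketch of what that section of the machine deployment typically looks like under the ClusterAPI v1alpha3 schema (the names match my lab environment; yours will differ):

```yaml
spec:
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
          kind: KubeadmConfigTemplate    # how nodes join the cluster
          name: tanzuworkloads-md-0
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
        kind: AWSMachineTemplate         # how nodes are deployed on AWS
        name: tanzuworkloads-md-0
```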
You can see that our machine deployment is referencing a tanzuworkloads-md-0 AWSMachineTemplate. Let's go take a look at those.
kubectl get awsmachinetemplates
Notice that we have a template for the control plane, as well as a template for each of our availability zones. NOTE: You can create more templates for special use cases such as high-memory nodes, high-compute nodes, GPUs, etc.
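As an illustration, a high-memory node pool could get its own template. This is a hypothetical sketch; the template name and memory-optimized instance type are my own placeholders, and the AMI, instance profile, and SSH key are reused from the environment in this article:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSMachineTemplate
metadata:
  name: tanzuworkloads-md-highmem   # placeholder name for a special-purpose pool
  namespace: default
spec:
  template:
    spec:
      ami:
        id: ami-058759e4c2532dc14           # reuse the cluster's node AMI
      iamInstanceProfile: nodes.tkg.cloud.vmware.com
      instanceType: r5.2xlarge              # memory-optimized instance type
      rootVolume:
        size: 100
      sshKeyName: vmc-cna-admin
```

A machine deployment pointed at this template would provision its nodes with the larger memory footprint while the other pools stay unchanged.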
Let's dive deeper and look at the AWSMachineTemplate for md-0.
kubectl get awsmachinetemplates tanzuworkloads-md-0 -o yaml
Ah ha! We now see where the AWS instance size and root volume size are located. Under normal circumstances I'd tell you to just edit this manifest and you're all set. However, these templates are supposed to be immutable, so what we're going to do is copy this template to a file, make our changes, and apply it with a new name.
To copy the template to a file, redirect the output of the same command from above into a file.
kubectl get awsmachinetemplates tanzuworkloads-md-0 -o yaml > myfile.yaml
After the file is written to your workstation, edit it to make the changes you want to the machines and give the template a new name. I've also removed the status fields and have posted my full file below. For my environment, I changed the instanceType to t3.xlarge and the root volume to 100 GB.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSMachineTemplate
metadata:
  name: tanzuworkloads-md-0-new
  namespace: default
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1alpha3
    kind: Cluster
    name: tanzuworkloads
    uid: 89f5020a-2cf6-4c0b-be15-1c9d9c9deddb
spec:
  template:
    spec:
      ami:
        id: ami-058759e4c2532dc14
      iamInstanceProfile: nodes.tkg.cloud.vmware.com
      instanceType: t3.xlarge
      rootVolume:
        size: 100
      sshKeyName: vmc-cna-admin
After your modifications are made, you can apply the configuration to the cluster. No updates will happen to your workload cluster at this point.
kubectl apply -f myfile.yaml
kubectl get awsmachinetemplates
The last step to update our nodes is to modify our existing machinedeployment object to point at our new AWSMachineTemplate.
kubectl edit machinedeployment tanzuworkloads-md-0
Use vim to edit the configuration and save it. Note: press i to enter insert mode, and use the command :wq to save and exit vim.
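Inside the editor, the only field that needs to change is the template name under spec.template.spec.infrastructureRef. A sketch based on my object names (yours will differ):

```yaml
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
        kind: AWSMachineTemplate
        name: tanzuworkloads-md-0-new   # was: tanzuworkloads-md-0
```

Changing this reference is what triggers the rollout; everything else in the machine deployment can stay as-is.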
Once you save the configuration, you should take a look at your AWS instances. With any luck, a new instance is being provisioned with your correct settings.
You can also verify this by running a get on your machine deployments.
kubectl get machinedeployments
If you watch this operation in full, you'll see that new nodes will be provisioned and joined to your cluster. Once they are healthy, TKG will remove the old node, effectively replacing it. This same process can be used for upgrades.
Summary
Sometimes you need to resize an existing Tanzu Kubernetes Grid cluster without disrupting the containers that are already running. You can do this by creating a new MachineTemplate from the existing one and modifying the machinedeployment to use the new template. Tanzu Kubernetes Grid will then do a rolling update of the nodes in the cluster.