A multi-node Kubernetes cluster is a cluster that consists of one or more control plane nodes and one or more worker nodes. Control plane node(s) are responsible for managing the cluster's lifecycle and scheduling workloads. Worker nodes run the actual applications (Pods).
Workload Scheduling
By default, Kubernetes does not schedule regular workloads on control plane nodes. This ensures the control plane remains dedicated to managing the cluster and is not burdened with running application workloads. However, this only holds if the control plane node carries the node-role.kubernetes.io/control-plane:NoSchedule taint.
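You can check a node's taints with kubectl describe. On a kubeadm-provisioned cluster, for example, you would typically see something like the following (the node name is a placeholder):
$ kubectl describe node <control-plane-node> | grep Taints
Taints:             node-role.kubernetes.io/control-plane:NoSchedule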
Let's try it with minikube. First, we need to start a new cluster with 3 nodes using the command below.
$ minikube start --nodes 3 -p multinode
[multinode] minikube v1.34.0 on Darwin 15.3.1 (arm64)
Automatically selected the docker driver
Using Docker Desktop driver with root privileges
Starting "multinode" primary control-plane node in "multinode" cluster
Pulling base image v0.0.45 ...
Creating docker container (CPUs=2, Memory=2200MB) ...
Preparing Kubernetes v1.31.0 on Docker 27.2.0 ...
  ▪ Generating certificates and keys ...
  ▪ Booting up control plane ...
  ▪ Configuring RBAC rules ...
Configuring CNI (Container Networking Interface) ...
Verifying Kubernetes components...
  ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
Enabled addons: storage-provisioner, default-storageclass
Starting "multinode-m02" worker node in "multinode" cluster
Pulling base image v0.0.45 ...
Creating docker container (CPUs=2, Memory=2200MB) ...
Found network options:
  ▪ NO_PROXY=192.168.49.2
Preparing Kubernetes v1.31.0 on Docker 27.2.0 ...
  ▪ env NO_PROXY=192.168.49.2
Verifying Kubernetes components...
Starting "multinode-m03" worker node in "multinode" cluster
Pulling base image v0.0.45 ...
Creating docker container (CPUs=2, Memory=2200MB) ...
Found network options:
  ▪ NO_PROXY=192.168.49.2,192.168.49.3
Preparing Kubernetes v1.31.0 on Docker 27.2.0 ...
  ▪ env NO_PROXY=192.168.49.2
  ▪ env NO_PROXY=192.168.49.2,192.168.49.3
Verifying Kubernetes components...
Done! kubectl is now configured to use "multinode" cluster and "default" namespace by default
This will start a Kubernetes cluster with 3 nodes, 1 control plane node and 2 worker nodes.
$ kubectl get nodes
NAME            STATUS   ROLES           AGE   VERSION
multinode       Ready    control-plane   48s   v1.31.0
multinode-m02   Ready    <none>          35s   v1.31.0
multinode-m03   Ready    <none>          25s   v1.31.0
Then let's create a simple Deployment using the nginx image with 3 replicas and apply it.
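The manifest itself isn't reproduced in this post, so here is a minimal deployment.yaml sketch that matches the output below; the image tag and the app: nginx labels are illustrative:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.27
        ports:
        - containerPort: 80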
$ kubectl apply -f deployment.yaml
deployment.apps/nginx-deployment created
$ kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE     IP           NODE            NOMINATED NODE   READINESS GATES
nginx-deployment-54b9c68f67-fq928   1/1     Running   0          4m56s   10.244.1.2   multinode-m02   <none>           <none>
nginx-deployment-54b9c68f67-t85zh   1/1     Running   0          4m56s   10.244.2.2   multinode-m03   <none>           <none>
nginx-deployment-54b9c68f67-wvzng   1/1     Running   0          4m56s   10.244.0.3   multinode       <none>           <none>
As we can see above, the Pods are spread evenly across all nodes, including the control plane node, which should not happen. This is because minikube doesn't add the NoSchedule taint to the control plane node by default.
If no NoSchedule taint exists on the control plane node, it can accept Pods, and with 3 replicas and 3 nodes the scheduler distributes them evenly. To prevent this, we can manually add the taint using the command below.
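The original command isn't shown here, but the standard way to add the control-plane taint to the minikube node looks like this:
$ kubectl taint nodes multinode node-role.kubernetes.io/control-plane:NoSchedule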
Then delete the Pod that was scheduled on the control plane node.
$ kubectl delete pod nginx-deployment-54b9c68f67-wvzng
pod "nginx-deployment-54b9c68f67-wvzng" deleted
Check the Pod list again and you should see that the replacement Pod is not scheduled on the control plane node.
$ kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP           NODE            NOMINATED NODE   READINESS GATES
nginx-deployment-54b9c68f67-fq928   1/1     Running   0          22m   10.244.1.2   multinode-m02   <none>           <none>
nginx-deployment-54b9c68f67-hd6bq   1/1     Running   0          14s   10.244.2.3   multinode-m03   <none>           <none>
nginx-deployment-54b9c68f67-t85zh   1/1     Running   0          22m   10.244.2.2   multinode-m03   <none>           <none>
Node-Specific Deployment
Sometimes we want to place a Pod on a specific node for various reasons. For example, we may want to run a database service in the EU region for GDPR compliance, put observability services on separate nodes to improve reliability, or isolate CPU-intensive workloads on dedicated nodes.
Kubernetes provides mechanisms like nodeSelector, nodeAffinity, and taints and tolerations to control pod placement.
nodeSelector
nodeSelector is the simplest recommended way to place a Pod on a specific node. We add a nodeSelector field to the Pod specification and list the node labels we want to target. Make sure that you label the node properly: Kubernetes only schedules the Pod onto nodes that have every label we specify.
First, let's add a label to node multinode-m02 so we can use it as a node selector.
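The labeling command isn't shown in the post, but based on the nodeSelector below it would look like this:
$ kubectl label nodes multinode-m02 node-type=infra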
Then update our deployment file to add a nodeSelector field inside template.spec. This tells the Kubernetes scheduler to place the Pods only on nodes that carry the label specified in the node selector.
nodeSelector:
  node-type: infra
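For context, here is roughly where the field sits in the Deployment's Pod template, following the deployment.yaml sketch above:
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        node-type: infra
      containers:
      - name: nginx
        image: nginx:1.27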
Re-apply the YAML file and check the Pod list again. You should see that all the Pods run on the multinode-m02 node.
$ kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP           NODE            NOMINATED NODE   READINESS GATES
nginx-deployment-5579449f87-8c26w   1/1     Running   0          23s   10.244.1.3   multinode-m02   <none>           <none>
nginx-deployment-5579449f87-cjbbf   1/1     Running   0          16s   10.244.1.5   multinode-m02   <none>           <none>
nginx-deployment-5579449f87-kdk45   1/1     Running   0          19s   10.244.1.4   multinode-m02   <none>           <none>
nodeAffinity
Similar to nodeSelector, nodeAffinity also constrains which nodes a Pod can be scheduled on. The difference is that node affinity supports more expressive rules, both required and preferred:
requiredDuringSchedulingIgnoredDuringExecution: The scheduler can't schedule the Pod unless the rule is met (Hard Requirements).
preferredDuringSchedulingIgnoredDuringExecution: The scheduler tries to find a node that meets the rule. If a matching node is not available, the scheduler still schedules the Pod (Soft Requirements).
This time, let's add a label to node multinode-m03 so we can use it in an affinity rule.
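The exact label and affinity block aren't shown in the post, so the node-type=monitoring key below is illustrative. Labeling the node and expressing the constraint as a required affinity rule (replacing the nodeSelector from the previous step) might look like this:
$ kubectl label nodes multinode-m03 node-type=monitoring
And in the Pod template spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-type
                operator: In
                values:
                - monitoring
A preferred rule would instead go under preferredDuringSchedulingIgnoredDuringExecution as a weighted list; the scheduler then favors matching nodes but still schedules the Pod elsewhere if none match.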
Re-apply the YAML file and check the Pod list again. You should see that all the Pods run on the multinode-m03 node.
$ kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP           NODE            NOMINATED NODE   READINESS GATES
nginx-deployment-78b49dbd66-6nkgj   1/1     Running   0          13s   10.244.2.4   multinode-m03   <none>           <none>
nginx-deployment-78b49dbd66-brdvt   1/1     Running   0          5s    10.244.2.6   multinode-m03   <none>           <none>
nginx-deployment-78b49dbd66-j27td   1/1     Running   0          9s    10.244.2.5   multinode-m03   <none>           <none>
When both nodeSelector and a required nodeAffinity rule are present, the Kubernetes scheduler will try to satisfy both. If no node matches both requirements, the Pod gets stuck in Pending status. If we check the events, there will be a FailedScheduling warning with a message like the one below.
32s Warning FailedScheduling Pod/nginx-deployment-785475985d-vdv8q 0/3 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 2 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.
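To reproduce this, it's enough to combine the earlier nodeSelector with a required affinity that no single node can satisfy; a sketch reusing the illustrative labels from above:
    spec:
      nodeSelector:
        node-type: infra          # only multinode-m02 carries this label
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-type
                operator: In
                values:
                - monitoring      # only multinode-m03 carries this label
Since a node can only have one value for node-type, no worker satisfies both constraints, and the control plane is excluded by its taint, which is exactly what the 0/3 message above reports.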