Scaling: Horizontal Pod Autoscaler
In Kubernetes, a HorizontalPodAutoscaler automatically updates a workload resource (such as a Deployment or StatefulSet), with the aim of automatically scaling the workload to match demand.
Horizontal scaling means that the response to increased load is to deploy more Pods. This is different from vertical scaling, which for Kubernetes would mean assigning more resources (for example: memory or CPU) to the Pods that are already running for the workload.
If the load decreases, and the number of Pods is above the configured minimum, the HorizontalPodAutoscaler instructs the workload resource (the Deployment, StatefulSet, or other similar resource) to scale back down.
Update our App
Let's update our app to return the message hashed with SHA-256, which adds some extra CPU load. Add a new field called Hash to our response definition.
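A minimal sketch of the change, assuming a JSON response struct named Response (the names here are illustrative, yours may differ):

```go
// Response is the JSON payload our handler returns.
type Response struct {
	Message string `json:"message"`
	Hash    string `json:"hash"` // hex-encoded SHA-256 digest of Message
}
```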
Set the hashed message in the Hash response field, then rebuild the app.
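For example, a hypothetical helper using the standard library's crypto/sha256 package could look like this:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// hashMessage returns the hex-encoded SHA-256 digest of msg.
func hashMessage(msg string) string {
	sum := sha256.Sum256([]byte(msg))
	return hex.EncodeToString(sum[:])
}
```

In the handler, something like `resp.Hash = hashMessage(resp.Message)` would populate the new field before the response is written.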
Let's deploy the newly built app to our Kubernetes cluster. We can do this using the rollout restart command. Kubernetes will spin up new pods and remove the old ones as the new pods become ready to receive traffic. This ensures there is no downtime to the service while we deploy the new version of the app.
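Assuming the Deployment is named simple-go, the commands would look like this:

```sh
kubectl rollout restart deployment/simple-go
# optionally wait until the rollout completes
kubectl rollout status deployment/simple-go
```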
We can check with the get pod command, and we should see a new set of pods running.
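For example, filtering by the app: simple-go label from our Deployment:

```sh
kubectl get pods -l app=simple-go
```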
Define HPA
Next, let's define the horizontal pod autoscaler. We can create a new file called hpa.yaml and put the definition below in it.
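Here is a minimal sketch matching the fields we walk through next; the maxReplicas value is an assumption, so pick a ceiling that fits your cluster:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: simple-go
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: simple-go
  minReplicas: 3
  maxReplicas: 6 # assumption; adjust to your capacity
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 120
      policies:
        - type: Pods
          value: 1
          periodSeconds: 15
```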
spec.scaleTargetRef
In this section we define which object the autoscaler should target. We choose a Deployment object called simple-go as the target. The controller manager then selects the pods based on the target resource's .spec.selector labels (which in our case is app: simple-go), and obtains the metrics from either the resource metrics API (for per-pod resource metrics) or the custom metrics API (for all other metrics).
spec.minReplicas
Defines the minimum number of Pods.
spec.maxReplicas
Defines the maximum number of Pods.
spec.metrics
In this section we define which metrics the controller manager should watch. We chose cpu as the resource, with target type: Utilization and averageUtilization: 50. This means that if the pods' average CPU utilization goes above 50%, the controller manager will spin up new pods. It will keep adding pods until the average utilization drops below the specified value or maxReplicas is reached.
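For reference, the controller manager calculates the desired replica count with the formula from the Kubernetes documentation:

```
desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
```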
If the average utilization stays below the specified value for a certain amount of time (300s by default), the controller manager will remove pods until minReplicas is reached.
spec.behavior
Here we set the scaleDown behavior with stabilizationWindowSeconds: 120, which means the controller manager waits 120 seconds (2 minutes) for the metrics to stabilize before deleting any pods. The policies entry means the controller manager can remove 1 pod every 15s once the metrics have stabilized.
Apply and Validate
Let's apply our HPA using the apply command and validate it using the get horizontalpodautoscalers.autoscaling command.
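The commands look like this (kubectl get hpa is the short form of the second one):

```sh
kubectl apply -f hpa.yaml
kubectl get horizontalpodautoscalers.autoscaling
```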
Load Testing
To test whether our HPA works as expected, we can create a load test. This load test will simulate increased traffic to our service.
Basically, we will load test our service for 120s. To run the script, use the k6 run <test_file> command like this.
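A minimal k6 script sketch; the target URL and virtual-user count are assumptions, so point them at wherever your service is exposed:

```javascript
import http from 'k6/http';

export const options = {
  vus: 50,          // number of virtual users (assumption)
  duration: '120s', // matches the 120s window above
};

export default function () {
  // assumed service address; replace with your service's URL
  http.get('http://localhost:8080/');
}
```

Save it as, for example, load-test.js and start it with `k6 run load-test.js`.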
New Pods Created
After a few seconds, let's check our HPA, and we should see that the REPLICAS count is increasing. In my case I see 6 replicas, which means I have 3 new pods running. On your machine this could be different. This means that the scale-up works as expected.
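You can watch the HPA update in real time while the load test runs:

```sh
kubectl get hpa --watch
```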
If we check the pod list, we will also see the new pods running; you can tell them apart by their age.
A few minutes after the load test finishes, we can check the pods again, and we should see that we only have 3 pods now, equal to the specified minReplicas value. This means the service traffic has stabilized and the controller manager has removed the unneeded replicas until it reached the minimum number we specified. The scale-down of our HPA works as expected.