# Scaling: Horizontal Pod Autoscaler

In Kubernetes, a `HorizontalPodAutoscaler` automatically updates a workload resource (such as a `Deployment` or `StatefulSet`), with the aim of automatically scaling the workload to match demand.

Horizontal scaling means that the response to increased load is to deploy more Pods. This is different from vertical scaling, which for Kubernetes would mean assigning more resources (for example: memory or CPU) to the Pods that are already running for the workload.

If the load decreases, and the number of Pods is above the configured minimum, the HorizontalPodAutoscaler instructs the workload resource (the `Deployment`, `StatefulSet`, or other similar resource) to scale back down.

## Update our App

Let's first update our app to return a SHA-256 hash of the message, which adds some extra CPU load per request. Add a new field called `Hash` to our response definition.

```go
type Response struct {
    Message string `json:"message,omitempty"`
    Hash    string `json:"hash,omitempty"`
}
```

Compute the hash, set it on the `Hash` response field, and then rebuild the app.

```go
hasher := sha256.New()
hasher.Write([]byte(message))
hash := fmt.Sprintf("%x", hasher.Sum(nil))

res := Response{
    Message: message,
    Hash:    hash,
}
```

Let's deploy the newly built app to our Kubernetes cluster. We can do this with the `rollout restart` command. Kubernetes spins up new Pods and removes the old ones as the new ones become ready to receive traffic. This ensures there is no downtime when we deploy a new version of the app.

```bash
➜ kubectl rollout restart deployment/simple-go
deployment.apps/simple-go restarted
```

We can verify with the `kubectl get pods` command; we should see a new set of Pods running.

## Define HPA

Next, let's define the horizontal pod autoscaler. Create a new file `hpa.yaml` with the definition below.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: simple-go-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: simple-go
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 120
      policies:
      - type: Pods
        value: 1
        periodSeconds: 15
```

### `spec.scaleTargetRef`

In this section we define which object the autoscaler should target; here it is the Deployment called `simple-go`. The controller manager then selects the Pods based on the target resource's `.spec.selector` labels (in our case `app: simple-go`) and obtains the metrics from the resource metrics API.
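For reference, the relevant part of the target Deployment might look like this (a sketch; your label names may differ):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-go
spec:
  selector:
    matchLabels:
      app: simple-go   # the HPA controller finds Pods via these labels
  template:
    metadata:
      labels:
        app: simple-go
```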

### `spec.minReplicas`

Defines the minimum number of replicas the autoscaler may scale down to.

### `spec.maxReplicas`

Defines the maximum number of replicas the autoscaler may scale up to.

### `spec.metrics`

In this section we define which metrics the controller manager should watch. We chose the `cpu` resource with `type: Utilization` and `averageUtilization: 50`. This means that if the Pods' average CPU utilization rises above `50%` (relative to their CPU requests), the controller manager will spin up new Pods.

The controller manager keeps adding Pods until the average utilization drops below the specified value or `maxReplicas` is reached.

If the average utilization stays below the specified value for a certain amount of time (the default scale-down stabilization window is `300s`), the controller manager removes Pods until `minReplicas` is reached.
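The controller's replica calculation can be sketched in Go. This is a simplified illustration of the formula documented for the HPA (`desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`), ignoring stabilization windows and Pod readiness:

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the HPA scaling formula and clamps the
// result to the configured minReplicas/maxReplicas bounds.
func desiredReplicas(current int, currentUtil, targetUtil float64, min, max int) int {
	desired := int(math.Ceil(float64(current) * currentUtil / targetUtil))
	if desired < min {
		desired = min
	}
	if desired > max {
		desired = max
	}
	return desired
}

func main() {
	// 3 replicas at 93% average CPU with a 50% target -> scale up.
	fmt.Println(desiredReplicas(3, 93, 50, 3, 10)) // 6
	// 6 replicas at 10% average CPU with a 50% target -> clamped to minReplicas.
	fmt.Println(desiredReplicas(6, 10, 50, 3, 10)) // 3
}
```

Note how `3` replicas at `93%` utilization against a `50%` target yields `6` desired replicas, which matches what we observe during the load test below.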

### `spec.behavior`

We can also define how the scaling behaves. The default behavior is documented here: [HPA: Default Behavior](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#default-behavior).

Here we set `stabilizationWindowSeconds: 120` on the `scaleDown` behavior, which means the controller manager waits 120 seconds (2 minutes) for the metrics to stabilize before removing any Pods.

```yaml
    policies:
      - type: Pods
        value: 1
        periodSeconds: 15
```

The policy means the controller manager can remove at most `1` Pod every `15` seconds once the metrics have stabilized.
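As a rough back-of-the-envelope check (treating the policy as one removal per full period, and ignoring when within a period each removal actually lands), the minimum time to shrink from 6 to 3 replicas under this policy can be sketched as:

```go
package main

import "fmt"

// minScaleDownSeconds estimates the time to shrink from current to
// target replicas under a "podsPerPeriod Pods per periodSeconds"
// policy, counted after the stabilization window has elapsed.
func minScaleDownSeconds(current, target, podsPerPeriod, periodSeconds int) int {
	removals := current - target
	periods := (removals + podsPerPeriod - 1) / podsPerPeriod // ceiling division
	return periods * periodSeconds
}

func main() {
	// 3 removals at 1 Pod per 15s -> roughly 45 seconds.
	fmt.Println(minScaleDownSeconds(6, 3, 1, 15)) // 45
}
```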

### Apply and Validate

Let's apply our HPA with the `kubectl apply` command and validate it with the `kubectl get horizontalpodautoscalers.autoscaling` command.

```bash
➜ kubectl apply -f hpa.yaml 
horizontalpodautoscaler.autoscaling/simple-go-hpa created
```

```bash
➜ kubectl get horizontalpodautoscalers.autoscaling
NAME            REFERENCE              TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
simple-go-hpa   Deployment/simple-go   cpu: 10%/50%   3         10        3          26s
```

## Load Testing

To test whether our HPA works as expected, we can run a load test that simulates increased traffic to our service.

We use `k6` for load testing; you can download and install it by following the instructions in the [official documentation](https://grafana.com/docs/k6/latest/). After installing, create a new file called `load_test.js` and paste in the code below, changing the URL to your service URL from minikube.

```js
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  // A number specifying the number of VUs to run concurrently.
  vus: 100,
  // A string specifying the total duration of the test run.
  duration: '120s',
};

export default function() {
  http.get('http://127.0.0.1:64544'); // change to your service url from minikube
  sleep(1);
}
```

This load-tests our service with 100 virtual users for `120s`. Run the script with the `k6 run <test_file>` command, like this.

```bash
➜ k6 run load_test.js

          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: load_test.js
     output: -

  scenarios: (100.00%) 1 scenario, 100 max VUs, 2m30s max duration (incl. graceful stop):
           * default: 100 looping VUs for 2m0s (gracefulStop: 30s)


running (0m23.3s), 100/100 VUs, 2207 complete and 0 interrupted iterations
default   [======>-------------------------------] 100 VUs  0m23.3s/2m0s
```

### New Pods Created

After a few seconds, let's check our HPA; we should see the `REPLICAS` count increasing. In my case there are `6` replicas, meaning `3` new Pods are running (your number may differ). Scale-up is working as expected.

```bash
➜ kubectl get horizontalpodautoscalers.autoscaling
NAME            REFERENCE              TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
simple-go-hpa   Deployment/simple-go   cpu: 93%/50%   3         10        6          11m
```

If we list the Pods we will also see the new ones running; you can tell them apart by their age.

```bash
➜ kubectl get pods                                
NAME                         READY   STATUS    RESTARTS   AGE
simple-go-764bc77644-2dtbj   1/1     Running   0          33s
simple-go-764bc77644-8pfcs   1/1     Running   0          17m31s
simple-go-764bc77644-g9zph   1/1     Running   0          33s
simple-go-764bc77644-grfhg   1/1     Running   0          17m29s
simple-go-764bc77644-jf897   1/1     Running   0          33s
simple-go-764bc77644-snb8n   1/1     Running   0          17m26s
```

A few minutes after the load test finishes, check the Pods again; we should be back to `3` Pods, equal to the specified `minReplicas` value.

```bash
➜ kubectl get pods  
NAME                         READY   STATUS    RESTARTS   AGE
simple-go-764bc77644-8pfcs   1/1     Running   0          25m
simple-go-764bc77644-grfhg   1/1     Running   0          25m
simple-go-764bc77644-snb8n   1/1     Running   0          25m
```

This means the traffic has stabilized and the controller manager has removed the unneeded replicas down to the minimum we specified. Scale-down is working as expected.

## References

* <https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/>
* <https://grafana.com/docs/k6/latest/>

