The Horizontal Pod Autoscaler (HPA) automatically scales the number of Pods in a deployment based on observed CPU utilization or memory usage. The controller periodically adjusts the number of replicas in the deployment so that the observed average CPU utilization or memory usage matches the target value specified by the user.
The Horizontal Pod Autoscaler is implemented as a control loop whose period is set by the kube-controller-manager flag --horizontal-pod-autoscaler-sync-period (15 seconds by default in recent Kubernetes releases; older releases used 30 seconds). For per-Pod resource metrics such as CPU, the controller fetches the metrics from the resource metrics API for each Pod targeted by the Horizontal Pod Autoscaler. See Horizontal Pod Autoscaler for more details.
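At each sync period, the controller computes the desired replica count with the rule documented by Kubernetes: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), and it skips scaling while that ratio stays within the default tolerance of 0.1 around 1.0. The shell sketch below reproduces only that rule with integer arithmetic. desired_replicas is a hypothetical helper, not controller code, and it ignores the min/max bounds, missing metrics, Pod readiness and stabilization windows.

```shell
#!/bin/sh
# Hypothetical helper sketching the documented HPA scaling rule:
#   desired = ceil(current_replicas * current_util / target_util)
# Utilization values are whole percentages (50 means 50%).
desired_replicas() {
  cur_replicas=$1
  cur_util=$2
  target_util=$3

  # Tolerance check: |current/target - 1| <= 0.1  <=>  10 * |current - target| <= target
  diff=$((cur_util - target_util))
  if [ "$diff" -lt 0 ]; then diff=$((target_util - cur_util)); fi
  if [ $((10 * diff)) -le "$target_util" ]; then
    echo "$cur_replicas"   # close enough to the target: keep the current count
    return
  fi

  # Integer ceiling division: ceil(a / b) = (a + b - 1) / b for a >= 0, b > 0
  echo $(( (cur_replicas * cur_util + target_util - 1) / target_util ))
}

desired_replicas 1 100 50   # prints 2: utilization is twice the target
desired_replicas 4 53 50    # prints 4: within the 10% tolerance
desired_replicas 3 200 50   # prints 12
```

For example, desired_replicas 1 1012 50 prints 21; the configured maximum replica count then caps that value.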
This document walks you through configuring a Horizontal Pod Autoscaler for the hpa-example deployment. We will also create a deployment that queries the hpa-example application in an infinite loop, to demonstrate autoscaling and how HPA works.
Estimated time: about 25 minutes.
Prerequisites: a project-regular user account is required, and it must be invited into the project with the operator role. Please refer to Get started with multi-tenant management.
1.1. Log in with the project-regular account. Enter demo-project, then select Application Workloads → Services.
1.2. Click Create Service, choose Stateless service, name it hpa, then click Next.
1.3. Click Add Container Image, enter mirrorgooglecontainers/hpa-example and press Enter. The image information is searched for and loaded automatically; choose Use Default Ports.
1.4. Click √ to save it, and click Next. Skip Mount Volumes and Advanced Settings, and click Create. The stateless service hpa is now created.
Note: KubeSphere creates the corresponding Deployment and Service at the same time.
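For readers who prefer the command line, step 1 corresponds roughly to the kubectl commands in this sketch. It is not what KubeSphere executes internally, and several values are assumptions: the namespace demo-project, port 80 as the image's serving port, and a CPU request of 200m chosen for illustration. The CPU request matters because HPA measures CPU utilization as a percentage of each Pod's CPU request; without a request, the CPU target set in step 2 cannot be evaluated. The script only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Rough kubectl counterpart of step 1 (a sketch, not what KubeSphere runs).
# Assumptions: namespace "demo-project", port 80, illustrative CPU request of 200m.
# Prints the commands by default; run with RUN=1 to execute them.
PROJECT=demo-project

run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

create_hpa_service() {
  # Deployment running the hpa-example image.
  run kubectl -n "$PROJECT" create deployment hpa --image=mirrorgooglecontainers/hpa-example
  # HPA's CPU utilization is measured against the container's CPU request.
  run kubectl -n "$PROJECT" set resources deployment hpa --requests=cpu=200m
  # Service named "hpa" in front of the Deployment.
  run kubectl -n "$PROJECT" expose deployment hpa --port=80
}

create_hpa_service
```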
2.1. Choose Workloads → Deployments, then click hpa to open its details page.
2.2. Choose More → Horizontal Pod Autoscaler.
2.3. Enter the following sample values for the HPA configuration, then click OK to finish.
Target CPU utilization (%): 50
Minimum replicas: 1
Maximum replicas: 10
Note: Setting HPA for the Deployment creates a HorizontalPodAutoscaler object in Kubernetes, which performs the autoscaling.
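The HPA from step 2.3 could likewise be created with kubectl autoscale, which names the HorizontalPodAutoscaler after the deployment. This is a print-only sketch (set RUN=1 to execute) assuming the namespace demo-project; the object KubeSphere creates may be named differently.

```shell
#!/bin/sh
# Sketch: create an HPA for deployment "hpa" with the values from step 2.3.
# Prints the command by default; run with RUN=1 to execute it.
PROJECT=demo-project
TARGET_CPU=50   # target average CPU utilization, percent of the CPU request
MIN=1           # minimum replicas
MAX=10          # maximum replicas

run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

create_hpa() {
  run kubectl -n "$PROJECT" autoscale deployment hpa \
    --cpu-percent="$TARGET_CPU" --min="$MIN" --max="$MAX"
}

create_hpa
```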
3.1. In the current project, navigate to Workloads → Deployments. Click Create, fill in the basic information in the pop-up window, name it load-generator, and click Next.
3.2. Click Add Container Image, enter busybox in the Image box, and press Enter.
3.3. Scroll down to Start command and add the following command and parameters. They request the hpa service in an endless loop to create CPU load.
Command: sh,-c
Note: The HTTP address has the form http://{service-name}.{project-name}.svc.cluster.local. Replace the service and project names in the address below with your own.
Parameters: while true; do wget -q -O- http://hpa.demo-project.svc.cluster.local; done
3.4. Click √ when you are done, then click Next. This demo does not use volumes, so click Next → Create to complete the creation.
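On the command line, step 3 could be sketched as below. The URL follows the http://{service-name}.{project-name}.svc.cluster.local pattern from the note in step 3.3, with SERVICE and PROJECT as variables you would adjust. Passing a command after -- to kubectl create deployment requires a reasonably recent kubectl, so treat the exact invocation as an assumption and check kubectl create deployment --help for your version. The script prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch of step 3: a busybox Deployment that requests the hpa Service in an endless loop.
SERVICE=hpa
PROJECT=demo-project

# In-cluster DNS name: http://{service-name}.{project-name}.svc.cluster.local
URL="http://${SERVICE}.${PROJECT}.svc.cluster.local"
# Same command and parameters as in the console: sh -c "<loop>"
LOAD_CMD="while true; do wget -q -O- ${URL}; done"

if [ "${RUN:-0}" = "1" ]; then
  kubectl -n "$PROJECT" create deployment load-generator --image=busybox -- sh -c "$LOAD_CMD"
else
  # Print a copy-pasteable version of the command instead of running it.
  echo "kubectl -n $PROJECT create deployment load-generator --image=busybox -- sh -c '$LOAD_CMD'"
fi
```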
So far, we have created two deployments, hpa and load-generator, and one service, hpa.
Choose Workloads → Deployments and click the deployment hpa to open its details page. Pay attention to the replicas, Pod status and CPU utilization, as well as the Pod monitoring graphs.
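While the load runs, the same information is available from the command line. In this print-only sketch (set RUN=1 to execute), kubectl get hpa shows the current and target utilization and the replica count, and kubectl top pods shows per-Pod usage; the latter needs the resource metrics API, for example metrics-server. The HPA name hpa is an assumption; add --watch to the first command to keep following it.

```shell
#!/bin/sh
# Sketch: check the HPA and Pod CPU usage from the command line while the load runs.
# Assumes the HPA is named "hpa" in namespace "demo-project".
# Prints the commands by default; run with RUN=1 to execute them.
PROJECT=demo-project

run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

observe() {
  # Current vs. target utilization and replica count of the HPA.
  run kubectl -n "$PROJECT" get hpa hpa
  # Per-Pod CPU and memory usage (requires the resource metrics API).
  run kubectl -n "$PROJECT" top pods
}

observe
```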
Once the load-generator Pod is running, it continuously requests the hpa service. As shown in the screenshot, after refreshing the page the CPU utilization has increased significantly, to 1012% at that moment, and the desired and current replicas have risen to 10/10.
After about two minutes, the CPU utilization drops to 509%. This illustrates the principle of HPA: as more replicas share the load, the average CPU utilization per Pod falls.
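Utilization of 509% is still far above the 50% target, yet the deployment stays at 10 replicas because HPA caps its recommendation at the configured maximum. Plugging the observed readings into the documented formula makes this concrete. The sketch assumes both readings were taken with 10 replicas running, and clamped_replicas is a hypothetical helper; the tolerance check is omitted because these values are far from the target.

```shell
#!/bin/sh
# Worked example: raw HPA recommendation clamped to the configured replica range.
# Hypothetical helper; utilization values are whole percentages.
MIN=1
MAX=10
TARGET=50

clamped_replicas() {
  replicas=$1
  util=$2
  # raw = ceil(replicas * util / TARGET) via integer ceiling division
  raw=$(( (replicas * util + TARGET - 1) / TARGET ))
  if [ "$raw" -gt "$MAX" ]; then raw=$MAX; fi
  if [ "$raw" -lt "$MIN" ]; then raw=$MIN; fi
  echo "$raw"
}

clamped_replicas 10 1012   # prints 10 (raw recommendation 203)
clamped_replicas 10 509    # prints 10 (raw recommendation 102)
```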
5.1. Scroll down to the Pods list and look at the first Pod that we created. Its monitoring graph typically shows a clear upward trend in CPU usage; once HPA starts working, CPU usage drops noticeably and finally levels off.
5.2. Switch to the Monitoring tab and select Last 30 minutes in the filter.
5.3. Click View all replicas on the right of the monitoring graph to view the monitoring graphs of all replicas.
6.1. Go back to Workloads → Deployments and delete load-generator to stop generating load.
6.2. Check the status of hpa again. Within a few minutes its current CPU utilization slowly drops to 10%, and HPA eventually reduces the deployment to one replica, its initial value. The trend in the monitoring curve also helps illustrate how HPA works.
6.3. Now click a Pod in the Pod list to open its details page, and review the monitoring graphs for CPU utilization and network inbound/outbound traffic. The trends match this HPA example.
6.4. Then open the container of this Pod; its monitoring graphs show the same trend as the Pod's.
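The scale-down in step 6.2 follows the same formula, with two details worth knowing: the result never goes below the minimum of 1, and by default scale-down uses the highest recommendation computed over the previous 5 minutes (kube-controller-manager flag --horizontal-pod-autoscaler-downscale-stabilization), which is one reason the replicas shrink over several minutes rather than at once. The sketch below only computes the raw recommendation for a few hypothetical utilization readings with 10 replicas running; it does not model the stabilization window.

```shell
#!/bin/sh
# Raw scale-down recommendation with 10 replicas running, target 50%, minimum 1.
# The utilization readings are hypothetical; the stabilization window is not modeled.
REPLICAS=10
TARGET=50
MIN=1

scale_down_recommendation() {
  util=$1
  raw=$(( (REPLICAS * util + TARGET - 1) / TARGET ))
  # HPA never recommends fewer than the configured minimum.
  if [ "$raw" -lt "$MIN" ]; then raw=$MIN; fi
  echo "$raw"
}

for util in 40 10 0; do
  echo "utilization ${util}% -> $(scale_down_recommendation "$util") replicas"
done
```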
To modify the HPA settings, go to the deployment details page, click More → Horizontal Pod Autoscaler, and edit the values in the pop-up window.
If you no longer need HPA for the deployment, click ··· → Cancel.
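The cleanup and HPA management steps map to the kubectl commands below. This print-only sketch (set RUN=1 to execute) assumes the HPA is named hpa; the name KubeSphere assigns may differ, so list the HPAs in the namespace first with kubectl get hpa. Only stop_load runs by default; call edit_hpa or cancel_hpa when you need them.

```shell
#!/bin/sh
# Sketch of the cleanup and HPA management steps with kubectl.
# Assumes namespace "demo-project" and an HPA named "hpa".
# Prints the commands by default; run with RUN=1 to execute them.
PROJECT=demo-project

run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

stop_load()  { run kubectl -n "$PROJECT" delete deployment load-generator; }  # step 6.1
edit_hpa()   { run kubectl -n "$PROJECT" edit hpa hpa; }                      # change min/max/target
cancel_hpa() { run kubectl -n "$PROJECT" delete hpa hpa; }                    # remove autoscaling

stop_load
```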
Congratulations! You are now familiar with setting HPA for a deployment through the KubeSphere console.