Auto scaling in kubernetes
When we deploy a API in kubernets we must define replication number for the pod, but as we know there will be high traffic during peak time and we usually can’t estimate service capacity exactly at first time, in this case we must scale our service like creating more pods to share online traffic to avoid service crash down.
We usually scale service manually before using kubernetes, append more nodes during peak time and destroy nodes when the traffic became smooth.
In kubernetes there’s a kind of feature called HPA(Horizontal Pod Autoscaler) which could help your scale service automatically. You could specify minimum and maximum replica number in yaml file, HPA will monitor pod’s CPU and Memory by collecting pod’s metric, if HPA found your pod’s metric is over the threshold number which you defined in yaml file, it will create more pods automatically and join the service cluster to load the traffic.

Here is a simple HPA samle:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: hpa-demo
namespace: test-ns
labels:
app: hpa-demo
component: api
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: hpa-demo
minReplicas: 3
maxReplicas: 10
metrics:
- type: Resource
resource:
name: memory
targetAverageUtilization: 75
- type: Resource
resource:
name: cpu
targetAverageUtilization: 75
I defined there’s will be at least 3 replicas for the pod, if the CPU or Memory usage is over 75%, HPA will create at most 10 pods.
HPA monitor pod’s metric by using metrics-server.
Reply