Auto scaling in kubernetes

When we deploy a API in kubernets we must define replication number for the pod, but as we know there will be high traffic during peak time and we usually can’t estimate service capacity exactly at first time, in this case we must scale our service like creating more pods to share online traffic to avoid service crash down.

We usually scale service manually before using kubernetes, append more nodes during peak time and destroy nodes when the traffic became smooth.

In kubernetes there’s a kind of feature called HPA(Horizontal Pod Autoscaler) which could help your scale service automatically. You could specify minimum and maximum replica number in yaml file, HPA will monitor pod’s CPU and Memory by collecting pod’s metric, if HPA found your pod’s metric is over the threshold number which you defined in yaml file, it will create more pods automatically and join the service cluster to load the traffic.

Here is a simple HPA samle:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo
  namespace: test-ns
  labels:
    app: hpa-demo
    component: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      targetAverageUtilization: 75
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 75

I defined there’s will be at least 3 replicas for the pod, if the CPU or Memory usage is over 75%, HPA will create at most 10 pods.

HPA monitor pod’s metric by using metrics-server.