Updates from February, 2019 Toggle Comment Threads | Keyboard Shortcuts

  • Wang 22:54 on 2019-02-22 Permalink | Reply
    Tags: , ,   

    Xinhai Bay 2019!

     
  • Wang 22:12 on 2019-02-11 Permalink | Reply
    Tags: , , , , , , , , ,   

    Guarantee service availability in kubernetes 

    A good service not only provide good functionalities, but also ensure the availability and uptime.

    We reinforce our service from QoS, QPS, Throttling, Scaling, Throughput, Monitoring.

    Qos

    There’re 3 kinds of QoS in kubernetes: Guaranteed, Burstable, BestEffort. We usually use Guaranteed, Burstable for different services.

    #Guaranteed
    resources:
      requests:
        cpu: 1000m
        memory: 4Gi
      limits:
        cpu: 1000m
        memory: 4Gi
    
    #Burstable
    resources:
      requests:
        cpu: 1000m
        memory: 4Gi
      limits:
        cpu: 6000m
        memory: 8Gi
    
    QPS

    We did lots of stress test on APIs by Gatling before we release them, we mainly care about mean response time, std deviation, mean requests/sec, error rate (API Testing Report), during testing we monitor server metrics by Datadog to find out bottlenecks.

    We usually test APIs in two scenarios: internal, external. External testing result is much lower than internal testing because of network latency, network bandwidth and son on.

    Internal testing result

    ================================================================================
    ---- Global Information --------------------------------------------------------
    > request count                                     246000 (OK=246000 KO=0     )
    > min response time                                     16 (OK=16     KO=-     )
    > max response time                                   5891 (OK=5891   KO=-     )
    > mean response time                                    86 (OK=86     KO=-     )
    > std deviation                                        345 (OK=345    KO=-     )
    > response time 50th percentile                         30 (OK=30     KO=-     )
    > response time 75th percentile                         40 (OK=40     KO=-     )
    > response time 95th percentile                         88 (OK=88     KO=-     )
    > response time 99th percentile                       1940 (OK=1940   KO=-     )
    > mean requests/sec                                817.276 (OK=817.276 KO=-     )
    ---- Response Time Distraaibution ------------------------------------------------
    > t < 800 ms                                        240565 ( 98%)
    > 800 ms < t < 1200 ms                                1110 (  0%)
    > t > 1200 ms                                         4325 (  2%)
    > failed                                                 0 (  0%)
    ================================================================================
    

    External testing result

    ================================================================================
    ---- Global Information --------------------------------------------------------
    > request count                                      33000 (OK=32999  KO=1     )
    > min response time                                    477 (OK=477    KO=60001 )
    > max response time                                  60001 (OK=41751  KO=60001 )
    > mean response time                                   600 (OK=599    KO=60001 )
    > std deviation                                        584 (OK=484    KO=0     )
    > response time 50th percentile                        497 (OK=497    KO=60001 )
    > response time 75th percentile                        506 (OK=506    KO=60001 )
    > response time 95th percentile                       1366 (OK=1366   KO=60001 )
    > response time 99th percentile                       2125 (OK=2122   KO=60001 )
    > mean requests/sec                                109.635 (OK=109.631 KO=0.003 )
    ---- Response Time Distribution ------------------------------------------------
    > t < 800 ms                                         29826 ( 90%)
    > 800 ms < t < 1200 ms                                1166 (  4%)
    > t > 1200 ms                                         2007 (  6%)
    > failed                                                 1 (  0%)
    ---- Errors --------------------------------------------------------------------
    > i.g.h.c.i.RequestTimeoutException: Request timeout after 60000      1 (100.0%)
     ms
    ================================================================================
    
    Throttling

    We throttle API by Nginx limit, we configured ingress like this:

    annotations:
      nginx.ingress.kubernetes.io/limit-connections: '30'
      nginx.ingress.kubernetes.io/limit-rps: '60'
    

    And it will generate Nginx configuration dynamically like this:

    limit_conn_zone $limit_ZGVsaXZlcnktY2RuYV9kc2QtYXBpLWNkbmEtZ2F0ZXdheQ zone=xxx_conn:5m;
    limit_req_zone $limit_ZGVsaXZlcnktY2RuYV9kc2QtYXBpLWNkbmEtZ2F0ZXdheQ zone=xxx_rps:5m rate=60r/s;
    
    server {
        server_name xxx.xxx ;
        listen 80;
        
        location ~* "^/xxx/?(?<baseuri>.*)" {
            ...
            ...        
            limit_conn xxx_conn 30;
            limit_req zone=xxx_rps burst=300 nodelay;
            ...
            ...        
    }
    
    Scaling

    We use HPA in kubernetes to ensure auto (Auto scaling in kubernetes), you could check HPA status in server:

    [xxx@xxx ~]$ kubectl get hpa -n test-ns
    NAME       REFERENCE             TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
    api-demo   Deployment/api-demo   39%/30%, 0%/30%   3         10        3          126d
    
    [xxx@xxx ~]$ kubectl get pod -n test-ns
    NAME                           READY     STATUS    RESTARTS   AGE
    api-demo-76b9954f57-6hvzx      1/1       Running   0          126d
    api-demo-76b9954f57-mllsx      1/1       Running   0          126d
    api-demo-76b9954f57-s22k8      1/1       Running   0          126d
    
    
    Throughput & Monitoring

    We integrated Datadog for monitoring(Monitoring by Datadog), we could check detail API metrics from various dashboards.

    Also we could calculate throughout from user, request, request time.

     
c
Compose new post
j
Next post/Next comment
k
Previous post/Previous comment
r
Reply
e
Edit
o
Show/Hide comments
t
Go to top
l
Go to login
h
Show/Hide help
shift + esc
Cancel
%d bloggers like this: