Consul with ACL
Enable ACL in Consul to protect your configurations. I deployed Consul with Helm.
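For reference, here is a minimal sketch of how ACLs can be enabled through the chart values, assuming a recent version of the official hashicorp/consul Helm chart (the key name differs in older chart versions, and the release name and namespace below are only placeholders):
# Sketch: enable ACL bootstrapping via the hashicorp/consul chart values
cat << 'EOF' > consul-values.yaml
global:
  acls:
    # Bootstrap the ACL system and create the default tokens automatically
    manageSystemACLs: true
EOF
helm repo add hashicorp https://helm.releases.hashicorp.com
helm install consul hashicorp/consul -n consul --create-namespace -f consul-values.yaml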


A good service not only provides good functionality but also ensures availability and uptime.
We reinforce our services in terms of QoS, QPS, throttling, scaling, throughput, and monitoring.
There are three QoS classes in Kubernetes: Guaranteed, Burstable, and BestEffort. We usually use Guaranteed or Burstable for different services.
# Guaranteed
resources:
  requests:
    cpu: 1000m
    memory: 4Gi
  limits:
    cpu: 1000m
    memory: 4Gi
# Burstable
resources:
  requests:
    cpu: 1000m
    memory: 4Gi
  limits:
    cpu: 6000m
    memory: 8Gi
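To confirm which QoS class a pod actually got, you can read it straight from the pod status (a quick sketch; the pod name and namespace below are just the ones shown later in this post):
kubectl get pod api-demo-76b9954f57-6hvzx -n test-ns -o jsonpath='{.status.qosClass}'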
We did lots of stress tests on APIs with Gatling before releasing them; we mainly care about mean response time, standard deviation, mean requests/sec, and error rate (API Testing Report). During testing we monitor server metrics with Datadog to find bottlenecks.
We usually test APIs in two scenarios: internal and external. External testing results are much worse than internal ones because of network latency, network bandwidth and so on.
Internal testing result
================================================================================
---- Global Information --------------------------------------------------------
> request count 246000 (OK=246000 KO=0 )
> min response time 16 (OK=16 KO=- )
> max response time 5891 (OK=5891 KO=- )
> mean response time 86 (OK=86 KO=- )
> std deviation 345 (OK=345 KO=- )
> response time 50th percentile 30 (OK=30 KO=- )
> response time 75th percentile 40 (OK=40 KO=- )
> response time 95th percentile 88 (OK=88 KO=- )
> response time 99th percentile 1940 (OK=1940 KO=- )
> mean requests/sec 817.276 (OK=817.276 KO=- )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms 240565 ( 98%)
> 800 ms < t < 1200 ms 1110 ( 0%)
> t > 1200 ms 4325 ( 2%)
> failed 0 ( 0%)
================================================================================
External testing result
================================================================================
---- Global Information --------------------------------------------------------
> request count 33000 (OK=32999 KO=1 )
> min response time 477 (OK=477 KO=60001 )
> max response time 60001 (OK=41751 KO=60001 )
> mean response time 600 (OK=599 KO=60001 )
> std deviation 584 (OK=484 KO=0 )
> response time 50th percentile 497 (OK=497 KO=60001 )
> response time 75th percentile 506 (OK=506 KO=60001 )
> response time 95th percentile 1366 (OK=1366 KO=60001 )
> response time 99th percentile 2125 (OK=2122 KO=60001 )
> mean requests/sec 109.635 (OK=109.631 KO=0.003 )
---- Response Time Distribution ------------------------------------------------
> t < 800 ms 29826 ( 90%)
> 800 ms < t < 1200 ms 1166 ( 4%)
> t > 1200 ms 2007 ( 6%)
> failed 1 ( 0%)
---- Errors --------------------------------------------------------------------
> i.g.h.c.i.RequestTimeoutException: Request timeout after 60000 ms 1 (100.0%)
================================================================================
We throttle APIs with the Nginx limit annotations; we configured the ingress like this:
annotations:
  nginx.ingress.kubernetes.io/limit-connections: '30'
  nginx.ingress.kubernetes.io/limit-rps: '60'
And it will generate the Nginx configuration dynamically, like this:
limit_conn_zone $limit_ZGVsaXZlcnktY2RuYV9kc2QtYXBpLWNkbmEtZ2F0ZXdheQ zone=xxx_conn:5m;
limit_req_zone $limit_ZGVsaXZlcnktY2RuYV9kc2QtYXBpLWNkbmEtZ2F0ZXdheQ zone=xxx_rps:5m rate=60r/s;
server {
    server_name xxx.xxx;
    listen 80;
    location ~* "^/xxx/?(?<baseuri>.*)" {
        ...
        ...
        limit_conn xxx_conn 30;
        limit_req zone=xxx_rps burst=300 nodelay;
        ...
        ...
    }
}
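A quick way to see the throttling in action is to fire a burst of parallel requests and count the status codes; once the limits are exceeded, ingress-nginx should start rejecting requests (503 by default, unless you changed the rejection status code). This is only a sketch; the URL is a placeholder:
# Send 200 requests with 50 in parallel and tally the HTTP status codes
seq 1 200 | xargs -n1 -P50 -I{} curl -s -o /dev/null -w "%{http_code}\n" "https://xxx.xxx/xxx/" | sort | uniq -c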
We use HPA in Kubernetes to ensure auto scaling (Auto Scaling in Kubernetes); you can check the HPA status on the server:
[xxx@xxx ~]$ kubectl get hpa -n test-ns
NAME       REFERENCE             TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
api-demo   Deployment/api-demo   39%/30%, 0%/30%   3         10        3          126d
[xxx@xxx ~]$ kubectl get pod -n test-ns
NAME                        READY   STATUS    RESTARTS   AGE
api-demo-76b9954f57-6hvzx   1/1     Running   0          126d
api-demo-76b9954f57-mllsx   1/1     Running   0          126d
api-demo-76b9954f57-s22k8   1/1     Running   0          126d
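The HPA above tracks two metrics, but a minimal CPU-only definition with autoscaling/v1 that matches the output (min 3, max 10, 30% target) would look roughly like this; treat it as a sketch rather than our exact manifest:
cat << 'EOF' > api-demo-hpa.yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: api-demo
  namespace: test-ns
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-demo
  minReplicas: 3
  maxReplicas: 10
  # Scale out when average CPU usage exceeds 30% of the requests
  targetCPUUtilizationPercentage: 30
EOF
kubectl apply -f api-demo-hpa.yaml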
We integrated Datadog for monitoring (Monitoring by Datadog), so we can check detailed API metrics on various dashboards.
We can also calculate throughput from users, requests, and request times.
For security we decided to enable LDAP in Presto. To deploy Presto into the Kubernetes cluster we built the Presto image ourselves, including the Kerberos authentication and LDAP configurations.
As you can see from the image structure, the configuration files under etc and catalog are very important, so pay attention to them.
krb5.conf and xxx.keytab are used to connect to Kerberos.
password-authenticator.properties and ldap_server.pem under etc, and hive.properties and hive-security.json under catalog, are used to connect to LDAP.
password-authenticator.properties
password-authenticator.name=ldap
ldap.url=ldaps://<IP>:<PORT>
ldap.user-bind-pattern=xxxxxx
ldap.user-base-dn=xxxxxx
hive.properties
connector.name=hive-hadoop2
hive.security=file
security.config-file=<hive-security.json>
hive.metastore.authentication.type=KERBEROS
hive.metastore.uri=thrift://<IP>:<PORT>
hive.metastore.service.principal=<SERVER-PRINCIPAL>
hive.metastore.client.principal=<CLIENT-PRINCIPAL>
hive.metastore.client.keytab=<KEYTAB>
hive.config.resources=core-site.xml, hdfs-site.xml
hive-security.json
{
  "schemas": [{
    "user": "user_1",
    "schema": "db_1",
    "owner": false
  }, {
    "user": " ",
    "schema": "db_1",
    "owner": false
  }, {
    "user": "user_2",
    "schema": "db_2",
    "owner": false
  }],
  "tables": [{
    "user": "user_1",
    "schema": "db_1",
    "table": "table_1",
    "privileges": ["SELECT"]
  }, {
    "user": "user_1",
    "schema": "db_1",
    "table": "table_2",
    "privileges": ["SELECT"]
  }, {
    "user": "user_2",
    "schema": "db_1",
    "table": ".*",
    "privileges": ["SELECT"]
  }, {
    "user": "user_2",
    "schema": "db_2",
    "table": "table_1",
    "privileges": ["SELECT"]
  }, {
    "user": "user_2",
    "schema": "db_2",
    "table": "table_2",
    "privileges": ["SELECT"]
  }],
  "sessionProperties": [{
    "allow": false
  }]
}
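To check that the LDAP setup works end to end, you can log in from the Presto CLI with a username and password over HTTPS. This is only a sketch: the coordinator address is a placeholder, and you may also need --truststore-path/--truststore-password if the server certificate is not in the default truststore.
# --password makes the CLI prompt for the LDAP password
./presto --server https://<COORDINATOR>:8443 \
         --catalog hive --schema db_1 \
         --user user_1 --password \
         --execute "SHOW TABLES;"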
We know X-Pack is an extension that bundles security, monitoring, reporting, and graph capabilities into one package.
Since Elastic Stack 6.3, X-Pack has been integrated into Elasticsearch; you can try it with a 30-day trial license. After the trial you can either buy a license or downgrade to the free basic license.
1.Download & Unzip & Enter
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.3.2.tar.gz
tar -zvxf elasticsearch-6.3.2.tar.gz && cd elasticsearch-6.3.2
2.Enable monitor & Start
echo "xpack.monitoring.enabled: true" >>config/elasticsearch.yml
echo "xpack.security.enabled: true" >>config/elasticsearch.yml
echo "xpack.watcher.enabled: true" >>config/elasticsearch.yml
echo "xpack.ml.enabled: true" >>config/elasticsearch.yml
echo "xpack.graph.enabled: true" >>config/elasticsearch.yml
echo "xpack.monitoring.collection.enabled: true" >>config/elasticsearch.yml
bin/elasticsearch
1.Download & Unzip & Enter
wget https://artifacts.elastic.co/downloads/kibana/kibana-6.3.2-darwin-x86_64.tar.gz
tar -zvxf kibana-6.3.2-darwin-x86_64.tar.gz && cd kibana-6.3.2-darwin-x86_64
2.Start
bin/kibana
3.Visit Kibana at http://localhost:5601 and you will see the dashboard; next we will enable the 30-day trial license.
3.1.Click Management on left menu
3.2.Click License Management
3.3.Click Start trial button
3.4.Click Start my trial button
3.5.Trial license enabled
1.Enter the elasticsearch directory and execute one of the commands below to generate passwords for the built-in users (choose one)
bin/elasticsearch-setup-passwords interactive (you enter a password for every user)
bin/elasticsearch-setup-passwords auto (passwords are generated automatically)
2.Enter the kibana directory, stop Kibana, set the username/password in kibana.yml, then start Kibana
echo "elasticsearch.username: kibana" >>config/kibana.yml
echo "elasticsearch.password: kibana123" >>config/kibana.yml
bin/kibana
3.After finishing all the settings, you will see the login page
4.Enter the username/password you set in kibana.yml and you can log in successfully
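You can also double-check from the command line that security and the trial license are active by querying Elasticsearch as the elastic user (a sketch; substitute the password generated earlier):
# Requires authentication once xpack.security is enabled
curl -u elastic:<password> "http://localhost:9200/_xpack/license?pretty"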
When I configured a Presto cluster to connect to Hive with Kerberos, I ran into some problems that cost me a lot of time to solve, so I summarized them here, hoping they can help others.
1.Append -Djava.security.krb5.conf="krb5.conf location" to etc/jvm.config
8) Error in custom provider, java.lang.NoClassDefFoundError: Could not initialize class com.facebook.presto.hive.authentication.KerberosHadoopAuthentication
at com.facebook.presto.hive.authentication.AuthenticationModules$1.createHadoopAuthentication(AuthenticationModules.java:59) (via modules: com.facebook.presto.hive.authentication.HiveAuthenticationModule -> io.airlift.configuration.ConditionalModule -> com.facebook.presto.hive.authentication.AuthenticationModules$1)
while locating com.facebook.presto.hive.authentication.HadoopAuthentication annotated with @com.facebook.presto.hive.ForHiveMetastore()
for the 2nd parameter of com.facebook.presto.hive.authentication.KerberosHiveMetastoreAuthentication.<init>(KerberosHiveMetastoreAuthentication.java:44)
...
...
2.Specify hdfs-site.xml/core-site.xml in hive.properties like hive.config.resources=xxx/core-site.xml,xxx/hdfs-site.xml
Query 20180504_150148_00018_v6ndf failed: java.net.UnknownHostException: xxx
3.Download hadoop-lzo jar into plugin/hive-hadoop2
Query 20180504_150959_00002_3f2qe failed: Unable to create input format org.apache.hadoop.mapred.TextInputFormat
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:139)
at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:180)
at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
... 19 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
... 21 more
4.Export KRB5_CONFIG & get kerberos tgt, use kinit command
Query 20180504_153940_00000_nrsgy failed: Failed to list directory: hdfs://xxx/user/hive/warehouse/xxx.db/xxx
5.More than one coordinator in the cluster
2018-05-04T18:10:56.410Z WARN http-worker-4560 com.facebook.presto.execution.SqlTaskManager Switching coordinator affinity from hhbts to qhnep
2018-05-04T18:10:56.500Z WARN http-worker-4560 com.facebook.presto.execution.SqlTaskManager Switching coordinator affinity from qhnep to c83wr
2018-05-04T18:10:56.578Z WARN http-worker-4395 com.facebook.presto.execution.SqlTaskManager Switching coordinator affinity from c83wr to ujj9n
2018-05-04T18:10:56.749Z WARN http-worker-4432 com.facebook.presto.execution.SqlTaskManager Switching coordinator affinity from ujj9n to wdsxf
2018-05-04T18:10:57.009Z WARN http-worker-4584 com.facebook.presto.execution.SqlTaskManager Switching coordinator affinity from wdsxf to hhbts
For data security, Hadoop clusters usually implement security mechanisms; the most commonly used one is Kerberos. Recently I tested how to connect to Hive with Kerberos in Presto.
1.Add krb5.conf/keytab/hdfs-site.xml/core-site.xml on every node.
2.Modify etc/jvm.config and append -Djava.security.krb5.conf="krb5.conf location"
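For example (a sketch; the krb5.conf path is a placeholder):
echo "-Djava.security.krb5.conf=/path/to/krb5.conf" >> etc/jvm.config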
3.Create hive.properties under etc/catalog
cat << 'EOF' > etc/catalog/hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://xxx:9083
hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=xxx@xxx.com
hive.metastore.client.principal=xxx@xxx.com
hive.metastore.client.keytab="keytab location"
hive.config.resources=<core-site.xml location>,<hdfs-site.xml location>
EOF
4.Download hadoop-lzo jar into plugin/hive-hadoop2
wget http://maven.twttr.com/com/hadoop/gplcompression/hadoop-lzo/0.4.16/hadoop-lzo-0.4.16.jar -P plugin/hive-hadoop2
5.Get principal tgt
export KRB5_CONFIG="krb5.conf location"
kinit -kt "keytab location" xxx@xxx.com
6.Restart presto
bin/launcher restart
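To verify the Kerberos connection, run a simple query through the Presto CLI (a sketch; the coordinator address and port are placeholders):
./presto --server http://<COORDINATOR>:8080 --catalog hive --execute "SHOW SCHEMAS;"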
After applying for a certificate from Let's Encrypt, I tested the certificate and generated the report.
not good..
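For anyone who wants to run a quick check on their own certificate, you can pull it with openssl and print the issuer and validity dates (a sketch against my domain; swap in your own):
# Fetch the served certificate and show who issued it and when it expires
echo | openssl s_client -connect wanghongmeng.com:443 -servername wanghongmeng.com 2>/dev/null | openssl x509 -noout -issuer -dates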
I registered my domain "wanghongmeng.com" on Aliyun, and applied for a free EC2 server for one year on AWS.
After building my blog on AWS, I pointed an A record to the AWS server's IP.
But yesterday I received an email from Aliyun saying that, after checking, they found my server was not hosted on Aliyun, which is not allowed; I had to migrate my blog server to Aliyun, otherwise they would revoke my ICP filing number.
After thinking it over, to save money (Aliyun is not free for a year), I solved it in the following way:
1.Point the A record to my friend's server IP, which he bought on Aliyun
2.Add a block of configuration to his nginx.conf:
server {
    listen 80;
    server_name wanghongmeng.com www.wanghongmeng.com;
    location / {
        rewrite ^/(.*)$ https://$server_name/$1 permanent;
    }
}
server {
    listen 443;
    server_name wanghongmeng.com www.wanghongmeng.com;
    ssl on;
    ssl_certificate "Location of Pem File";
    ssl_certificate_key "Location of Key File";
    ssl_session_timeout 5m;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers "Your Algorithm";
    ssl_session_cache shared:SSL:50m;
    ssl_prefer_server_ciphers on;
    location / {
        proxy_pass http://AWS's IP:443/;
    }
}
3.Expose port 443 on my AWS server, and only accept requests from my friend's server IP:
server {
    listen 443;
    set $flag 0;
    if ($host = 'www.wanghongmeng.com') {
        set $flag 1;
    }
    if ($host = 'wanghongmeng.com') {
        set $flag 1;
    }
    if ($flag = 0){
        return 403;
    }
    location / {
        allow "My Friend's Server IP";
        deny all;    # without this, the allow directive alone would not block other clients
        proxy_pass http://blog-ip;
    }
}
Things done! 😀😀
Something odd happened a while ago: when I was checking Nginx's logs, I found a weird hostname.
After checking, I believe our website had been mirrored.
I think they pointed their domain at ours with a CNAME record, and we did not do any host check at that time.
To prevent being mirrored again, I added a host-check configuration to nginx.conf:
set $flag 0;
if ($host = 'www.wanghongmeng.com') {
    set $flag 1;
}
if ($host = 'wanghongmeng.com') {
    set $flag 1;
}
if ($flag = 0){
    return 403;
}
With this in place, Nginx checks the Host header of every request; if it does not match our domain, it returns a 403 response code.
After this, our website was not mirrored again.
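You can verify the host check with a spoofed Host header; anything that does not match the domain should get a 403 (a sketch; the server IP and fake domain are placeholders):
curl -s -o /dev/null -w "%{http_code}\n" -H "Host: some-other-domain.com" http://<server-ip>/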
Nginx Version: 1.9.12