Tagged: Presto

  • Wang 22:36 on 2021-03-02
    Tags: Presto

    Emerging Architectures for Modern Data Infrastructure

     
  • Wang 22:21 on 2018-11-05
    Tags: Presto

    [Presto] Secure with LDAP 

    For security reasons we decided to enable LDAP in Presto. To deploy Presto into a Kubernetes cluster, we build the Presto image ourselves, and it includes the Kerberos authentication and LDAP configurations.

    In the image structure, the configuration files under etc and catalog (the Hive catalog in particular) are very important, so please pay attention to them.

    krb5.conf and xxx.keytab are used to connect to Kerberos.

    password-authenticator.properties and ldap_server.pem under etc are used to connect to LDAP; hive.properties and hive-security.json under catalog define which schemas and tables each user may access.

    password-authenticator.properties

    password-authenticator.name=ldap
    ldap.url=ldaps://<IP>:<PORT>
    ldap.user-bind-pattern=xxxxxx
    ldap.user-base-dn=xxxxxx
    

    hive.properties

    connector.name=hive-hadoop2
    hive.security=file
    security.config-file=<hive-security.json>
    hive.metastore.authentication.type=KERBEROS
    hive.metastore.uri=thrift://<IP>:<PORT>
    hive.metastore.service.principal=<SERVER-PRINCIPAL>
    hive.metastore.client.principal=<CLIENT-PRINCIPAL>
    hive.metastore.client.keytab=<KEYTAB>
    hive.config.resources=core-site.xml, hdfs-site.xml
    

    hive-security.json

    {
      "schemas": [{
        "user": "user_1",
        "schema": "db_1",
        "owner": false
      }, {
        "user": " ",
        "schema": "db_1",
        "owner": false
      }, {
        "user": "user_2",
        "schema": "db_2",
        "owner": false
      }],
      "tables": [{
        "user": "user_1",
        "schema": "db_1",
        "table": "table_1",
        "privileges": ["SELECT"]
      }, {
        "user": "user_1",
        "schema": "db_1",
        "table": "table_2",
        "privileges": ["SELECT"]
      }, {
        "user": "user_2",
        "schema": "db_1",
        "table": ".*",
        "privileges": ["SELECT"]
      }, {
        "user": "user_2",
        "schema": "db_2",
        "table": "table_1",
        "privileges": ["SELECT"]
      }, {
        "user": "user_2",
        "schema": "db_2",
        "table": "table_2",
        "privileges": ["SELECT"]
      }],
      "sessionProperties": [{
        "allow": false
      }]
    }
    
     
  • Wang 22:37 on 2018-05-07
    Tags: Presto

    [Presto] Kerberos troubleshooting

    When I configured a Presto cluster to connect to Hive with Kerberos, I ran into several problems that took a long time to solve. I have summarized them here in the hope that they help others. Each item names the fix or cause first, followed by the error it relates to.

    1.Append -Djava.security.krb5.conf=<krb5.conf location> to etc/jvm.config

    8) Error in custom provider, java.lang.NoClassDefFoundError: Could not initialize class com.facebook.presto.hive.authentication.KerberosHadoopAuthentication
      at com.facebook.presto.hive.authentication.AuthenticationModules$1.createHadoopAuthentication(AuthenticationModules.java:59) (via modules: com.facebook.presto.hive.authentication.HiveAuthenticationModule -> io.airlift.configuration.ConditionalModule -> com.facebook.presto.hive.authentication.AuthenticationModules$1)
      while locating com.facebook.presto.hive.authentication.HadoopAuthentication annotated with @com.facebook.presto.hive.ForHiveMetastore()
        for the 2nd parameter of com.facebook.presto.hive.authentication.KerberosHiveMetastoreAuthentication.<init>(KerberosHiveMetastoreAuthentication.java:44)
      ...
      ...
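
    A quick way to rule this one out before starting the server is to check the JVM options file. Below is a minimal sketch; the function name krb5_flag_ok is my own (hypothetical), and it assumes the flag sits on its own line, as in a stock etc/jvm.config.

```shell
# Hypothetical helper (not part of Presto): succeed only if the given JVM
# options file sets -Djava.security.krb5.conf to a file that exists.
krb5_flag_ok() {
  local jvm_file="$1" path
  # Use the value from the last matching line, if there is one.
  path=$(sed -n 's/^-Djava\.security\.krb5\.conf=//p' "$jvm_file" 2>/dev/null | tail -n 1)
  [ -n "$path" ] && [ -f "$path" ]
}

# Usage (the path is an example):
#   krb5_flag_ok etc/jvm.config || echo "krb5.conf flag missing or file not found"
```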
    

    2.Specify hdfs-site.xml and core-site.xml in hive.properties, e.g. hive.config.resources=xxx/core-site.xml,xxx/hdfs-site.xml

    Query 20180504_150148_00018_v6ndf failed: java.net.UnknownHostException: xxx
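
    In my experience this error usually means the Hadoop config files were not picked up, so it is worth checking that every path listed in hive.config.resources really exists on the node. A minimal sketch, assuming the property is written on a single line; check_config_resources is a hypothetical name.

```shell
# Hypothetical helper: print every file named in hive.config.resources that
# does not exist, and return non-zero if any is missing or the property is unset.
check_config_resources() {
  local props="$1" list missing
  list=$(sed -n 's/^hive\.config\.resources=//p' "$props" 2>/dev/null | tail -n 1)
  if [ -z "$list" ]; then
    echo "hive.config.resources is not set in $props"
    return 1
  fi
  # Split on commas, trim surrounding whitespace, report paths that are not files.
  missing=$(printf '%s\n' "$list" | tr ',' '\n' |
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
    while IFS= read -r entry; do
      if [ -n "$entry" ] && [ ! -f "$entry" ]; then
        printf 'missing: %s\n' "$entry"
      fi
    done)
  if [ -n "$missing" ]; then
    printf '%s\n' "$missing"
    return 1
  fi
}

# Usage (the path is an example):
#   check_config_resources etc/catalog/hive.properties
```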
    

    3.Download hadoop-lzo jar into plugin/hive-hadoop2

    Query 20180504_150959_00002_3f2qe failed: Unable to create input format org.apache.hadoop.mapred.TextInputFormat
    
    Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
            at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:139)
            at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:180)
            at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
            ... 19 more
    Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
            at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
            at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
            ... 21 more
    

    4.Export KRB5_CONFIG and get a Kerberos TGT with the kinit command

    Query 20180504_153940_00000_nrsgy failed: Failed to list directory: hdfs://xxx/user/hive/warehouse/xxx.db/xxx
    

    5.Make sure there is only one coordinator in the cluster; with more than one, the workers keep logging the warnings below

    2018-05-04T18:10:56.410Z   WARN    http-worker-4560    com.facebook.presto.execution.SqlTaskManager    Switching coordinator affinity from hhbts to qhnep
    2018-05-04T18:10:56.500Z    WARN    http-worker-4560    com.facebook.presto.execution.SqlTaskManager    Switching coordinator affinity from qhnep to c83wr
    2018-05-04T18:10:56.578Z    WARN    http-worker-4395    com.facebook.presto.execution.SqlTaskManager    Switching coordinator affinity from c83wr to ujj9n
    2018-05-04T18:10:56.749Z    WARN    http-worker-4432    com.facebook.presto.execution.SqlTaskManager    Switching coordinator affinity from ujj9n to wdsxf
    2018-05-04T18:10:57.009Z    WARN    http-worker-4584    com.facebook.presto.execution.SqlTaskManager    Switching coordinator affinity from wdsxf to hhbts
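
    When all the config files are reachable from one place (for example several instances on one machine), counting the coordinators is a one-liner. This is only a sketch: count_coordinators is a hypothetical name, and on a multi-host cluster the same check would have to look at the config.properties of every node.

```shell
# Hypothetical helper: print how many of the given config files contain the
# line coordinator=true.
count_coordinators() {
  # grep -l prints each matching file once; "|| true" keeps the pipeline
  # healthy when no file matches, so wc still reports 0.
  { grep -l '^coordinator=true[[:space:]]*$' "$@" 2>/dev/null || true; } | wc -l | tr -d ' '
}

# Usage (the glob is an example):
#   [ "$(count_coordinators etc/config*.properties)" -eq 1 ] || echo "expected exactly one coordinator"
```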
    
     
  • Wang 16:56 on 2018-05-02
    Tags: Presto

    [Presto] Connect hive by kerberos 

    For data security, Hadoop clusters usually implement some security mechanism, and the most commonly used one is Kerberos. Recently I tested how to connect to Hive with Kerberos from Presto.

    1.Add krb5.conf/keytab/hdfs-site.xml/core-site.xml on every node.

    2.Modify etc/jvm.config, append -Djava.security.krb5.conf=<krb5.conf location>

    3.Create hive.properties under etc/catalog

    cat << 'EOF' > etc/catalog/hive.properties
    connector.name=hive-hadoop2
    
    hive.metastore.uri=thrift://xxx:9083
    hive.metastore.authentication.type=KERBEROS
    hive.metastore.service.principal=xxx@xxx.com
    hive.metastore.client.principal=xxx@xxx.com
    hive.metastore.client.keytab=<keytab location>
    
    hive.config.resources=<core-site.xml location>,<hdfs-site.xml location>
    EOF
    

    4.Download hadoop-lzo jar into plugin/hive-hadoop2

    wget http://maven.twttr.com/com/hadoop/gplcompression/hadoop-lzo/0.4.16/hadoop-lzo-0.4.16.jar -O plugin/hive-hadoop2/hadoop-lzo-0.4.16.jar
    

    5.Get a TGT for the principal

    export KRB5_CONFIG="krb5.conf location"
    kinit -kt "keytab location" xxx@xxx.com
    

    6.Restart presto

    bin/launcher restart
    
     
  • Wang 20:12 on 2018-03-25
    Tags: Ambari, Presto

    [Presto] Integrate with Ambari 

    A few days ago I installed Presto and Ambari separately. Ambari does not officially support Presto, so if you want to manage Presto from Ambari you have to download ambari-presto-service and configure it yourself.

    So I tried this.

    1.download hdp yum repository

    wget -nv http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.6.3.0/hdp.repo -O /etc/yum.repos.d/HDP.repo
    

    2.download ambari-presto-service and configure

    version=`hdp-select status hadoop-client | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'`
    mkdir /var/lib/ambari-server/resources/stacks/HDP/$version/services/PRESTO
    wget https://github.com/prestodb/ambari-presto-service/releases/download/v1.2/ambari-presto-1.2.tar.gz
    tar -xvf ambari-presto-1.2.tar.gz -C /var/lib/ambari-server/resources/stacks/HDP/$version/services/PRESTO
    mv /var/lib/ambari-server/resources/stacks/HDP/$version/services/PRESTO/ambari-presto-1.2/* /var/lib/ambari-server/resources/stacks/HDP/$version/services/PRESTO
    rm -rf /var/lib/ambari-server/resources/stacks/HDP/$version/services/PRESTO/ambari-presto-1.2
    chmod -R +x /var/lib/ambari-server/resources/stacks/HDP/$version/services/PRESTO/*
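
    The version lookup in this step only works if sed keeps the escaped capture group, so it is worth trying the expression on a sample line first. A small sketch; the sample string is made up for illustration and assumes the output has the shape "hadoop-client - <version>-<build>", and hdp_major_minor is a hypothetical name.

```shell
# Hypothetical helper: reduce a "hadoop-client - <version>-<build>" line to the
# major.minor part that names the stack directory under stacks/HDP/.
hdp_major_minor() {
  printf '%s\n' "$1" | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'
}

# Sample input, made up for illustration.
hdp_major_minor 'hadoop-client - 2.6.3.0-235'   # prints 2.6
```

    If the output is the whole input line instead of something like 2.6, the backslashes in the sed expression were probably lost when copying.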
    

    3.restart ambari-server

    ambari-server restart
    

    4.add the Presto service in Ambari, and configure discovery.uri when you add it, e.g. discovery.uri: http://coordinator:8285

    After doing this, you can add catalogs and use Presto as a query engine.

    I did a simple query comparison between Tez and Presto. If you want an accurate benchmark result, I think this benchmark test could help. The query sums three counts over a Hive table.

    Presto: 4s

    presto:test> select sum(count) as sum from (
              -> select count(*) as count from t0004998 where month = '6.5'
              -> union
              -> select count(*) as count from t0004998 where typestatus in ('VL2216','VL2217','VL2218','VL2219','VL2220')
              -> union
              -> select count(*) as count from t0004998 where countrycode in ('FAMILY','FORM','GENUS','KINGDOM','ORDER','PHYLUM','SPECIES')
              -> ) t;
      sum   
    --------
     307374 
    (1 row)
    
    Query 20180317_102034_00040_sq83e, FINISHED, 1 node
    Splits: 29 total, 29 done (100.00%)
    0:04 [982K rows, 374MB] [231K rows/s, 87.8MB/s]
    

    Tez: 29.77s

    hive> select sum(count) from (
        > select count(*) as count from t0004998 where month = "6.5"
        > union
        > select count(*) as count from t0004998 where typestatus in ("VL2216","VL2217","VL2218","VL2219","VL2220")
        > union
        > select count(*) as count from t0004998 where countrycode in ("FAMILY","FORM","GENUS","KINGDOM","ORDER","PHYLUM","SPECIES")
        > ) t;
    Query ID = hdfs_20180317102109_5fd30986-f840-450e-aedd-b51c5e3a48f1
    Total jobs = 1
    Launching Job 1 out of 1
    Status: Running (Executing on YARN cluster with App id application_1521267007048_0012)
    
    --------------------------------------------------------------------------------
            VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
    --------------------------------------------------------------------------------
    Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
    Map 10 .........   SUCCEEDED      1          1        0        0       1       0
    Map 8 ..........   SUCCEEDED      1          1        0        0       0       0
    Reducer 11 .....   SUCCEEDED      1          1        0        0       0       0
    Reducer 2 ......   SUCCEEDED      1          1        0        0       0       1
    Reducer 4 ......   SUCCEEDED      1          1        0        0       0       0
    Reducer 6 ......   SUCCEEDED      1          1        0        0       0       0
    Reducer 7 ......   SUCCEEDED      1          1        0        0       0       0
    Reducer 9 ......   SUCCEEDED      1          1        0        0       0       0
    --------------------------------------------------------------------------------
    VERTICES: 09/09  [==========================>>] 100%  ELAPSED TIME: 29.77 s    
    --------------------------------------------------------------------------------
    OK
    307374
    Time taken: 30.732 seconds, Fetched: 1 row(s)
    
     
  • Wang 21:36 on 2018-03-20
    Tags: Presto

    [Presto] Build pseudo cluster 

    Presto is a distributed query engine developed by Facebook. For its concepts and advantages, please refer to the official documentation. Below are the steps I used to build a pseudo cluster on my Mac.

    1.download presto

    wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.196/presto-server-0.196.tar.gz
    tar -zvxf presto-server-0.196.tar.gz && cd presto-server-0.196
    

    2.create the configuration files

    mkdir etc
    
    cat << 'EOF' > etc/jvm.config
    -server
    -Xmx16G
    -Xms16G
    -XX:+UseG1GC
    -XX:G1HeapRegionSize=32M
    -XX:+UseGCOverheadLimit
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:+ExitOnOutOfMemoryError
    EOF
    
    cat << 'EOF' > etc/log.properties
    com.facebook.presto=INFO
    EOF
    
    cat << 'EOF' > etc/config1.properties
    coordinator=true
    node-scheduler.include-coordinator=true
    http-server.http.port=8001
    query.max-memory=24GB
    query.max-memory-per-node=8GB
    discovery-server.enabled=true
    discovery.uri=http://localhost:8001
    EOF
    
    cat << 'EOF' > etc/config2.properties
    coordinator=false
    node-scheduler.include-coordinator=true
    http-server.http.port=8002
    query.max-memory=24GB
    query.max-memory-per-node=8GB
    discovery-server.enabled=true
    discovery.uri=http://localhost:8001
    EOF
    
    cat << 'EOF' > etc/config3.properties
    coordinator=false
    node-scheduler.include-coordinator=true
    http-server.http.port=8003
    query.max-memory=24GB
    query.max-memory-per-node=8GB
    discovery-server.enabled=true
    discovery.uri=http://localhost:8001
    EOF
    
    cat << 'EOF' > etc/node1.properties
    node.environment=test
    node.id=671d18f9-dd0f-412d-b18c-fe6d7989b040
    node.data-dir=/usr/local/Cellar/presto/0.196/data/node1
    EOF
    
    cat << 'EOF' > etc/node2.properties
    node.environment=test
    node.id=e72fdd91-a135-4936-9a3e-f888c5106ed9
    node.data-dir=/usr/local/Cellar/presto/0.196/data/node2
    EOF
    
    cat << 'EOF' > etc/node3.properties
    node.environment=test
    node.id=6ab76715-1812-4093-95cf-1945f4cfefe3
    node.data-dir=/usr/local/Cellar/presto/0.196/data/node3
    EOF
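
    Writing the node files by hand gets tedious with more nodes, so they can also be generated in a loop. This is a minimal sketch, not what I actually ran: new_uuid and write_node_props are hypothetical names, the directories are parameters, and the fallback to /proc/sys/kernel/random/uuid only works on Linux.

```shell
# Hypothetical helper: print a fresh lowercase UUID.
new_uuid() {
  if command -v uuidgen >/dev/null 2>&1; then
    uuidgen | tr 'A-Z' 'a-z'
  else
    # Fallback for hosts without uuidgen (Linux only).
    cat /proc/sys/kernel/random/uuid
  fi
}

# Hypothetical helper: write node1.properties .. nodeN.properties into etc_dir,
# each with its own node.id and its own data directory under data_root.
write_node_props() {
  local etc_dir="$1" data_root="$2" count="$3" i=1
  mkdir -p "$etc_dir"
  while [ "$i" -le "$count" ]; do
    {
      echo "node.environment=test"
      echo "node.id=$(new_uuid)"
      echo "node.data-dir=$data_root/node$i"
    } > "$etc_dir/node$i.properties"
    i=$((i + 1))
  done
}

# Usage (the paths are examples):
#   write_node_props etc /usr/local/Cellar/presto/0.196/data 3
```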
    

    P.S. If you want to restrict operations, add access-control.properties as below; the read-only setting permits only read operations.

    cat << 'EOF' > etc/access-control.properties
    access-control.name=read-only
    EOF
    

    3.start presto server

    bin/launcher start --config=etc/config1.properties --node-config=etc/node1.properties
    bin/launcher start --config=etc/config2.properties --node-config=etc/node2.properties
    bin/launcher start --config=etc/config3.properties --node-config=etc/node3.properties
    

    4.download cli

    wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.196/presto-cli-0.196-executable.jar -O bin/presto-cli
    chmod +x bin/presto-cli
    

    5.create catalogs (restart the servers afterwards so the new catalogs are loaded)

    mkdir -p etc/catalog
    
    cat << 'EOF' > etc/catalog/mysql.properties
    connector.name=mysql
    connection-url=jdbc:mysql://localhost:3306?useSSL=false
    connection-user=presto
    connection-password=presto
    EOF
    
    cat << 'EOF' > etc/catalog/hive.properties
    connector.name=hive-hadoop2
    hive.metastore.uri=thrift://localhost:9083
    EOF
    

    6.connect

    bin/presto-cli --server localhost:8001 --catalog hive
    
    presto> show catalogs;
     Catalog 
    ---------
     hive    
     mysql   
     system  
    (3 rows)
    
    Query 20180318_045410_00013_sq83e, FINISHED, 1 node
    Splits: 1 total, 1 done (100.00%)
    0:00 [0 rows, 0B] [0 rows/s, 0B/s]
    


    P.S. When building a real cluster, pay attention to the following items:

    1.node.id in node.properties must be unique for every node in the cluster; you can generate it with uuid/uuidgen.

    2.query.max-memory-per-node in config.properties is best set to about half of -Xmx in jvm.config.
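
    The second rule of thumb can be derived from the JVM options file instead of being typed by hand. A minimal sketch, assuming -Xmx is given in whole gigabytes on its own line (e.g. -Xmx16G); half_of_xmx is a hypothetical name, and odd values are rounded down.

```shell
# Hypothetical helper: read -Xmx<N>G from a jvm.config-style file and print
# half of it in the form Presto expects for data sizes, e.g. 16G -> 8GB.
half_of_xmx() {
  local jvm_file="$1" gb
  gb=$(sed -n 's/^-Xmx\([0-9][0-9]*\)[Gg]$/\1/p' "$jvm_file" 2>/dev/null | tail -n 1)
  # Fail when no whole-gigabyte -Xmx line is present.
  [ -n "$gb" ] || return 1
  echo "$((gb / 2))GB"
}

# Usage (the path is an example):
#   echo "query.max-memory-per-node=$(half_of_xmx etc/jvm.config)"
```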

     