Tagged: Sqoop Toggle Comment Threads | Keyboard Shortcuts

  • Wang 20:24 on 2018-03-16 Permalink | Reply
    Tags: , , Sqoop   

    [Sqoop2] Notebook 

    Recently I tested sqoop2 which has many new features compared to sqoop1, about the comparision, I think you could check here and stackoverflow, I will introduce about the operation manual.

    1.install

    wget http://ftp.jaist.ac.jp/pub/apache/sqoop/1.99.7/sqoop-1.99.7-bin-hadoop200.tar.gz
    tar -vzxf sqoop-1.99.7-bin-hadoop200.tar.gz && cd sqoop-1.99.7-bin-hadoop200
    

    2.replace @LOGDIR@/@BASEDIR@ in sqoop.properties

    3.download mysql driver into server/lib

    4.configure proxy user in core-site.xml

    <property>
        <name>hadoop.proxyuser.sqoop2.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.sqoop2.groups</name>
        <value>*</value>
    </property>
    

    5.verify & start sqoop2 server

    bin/sqoop2-tool verify
    bin/sqoop2-server start
    

    6.start client & test

    bin/sqoop2-shell
    

    7.show the connectors

    sqoop:000> show connector
    +------------------------+---------+------------------------------------------------------------+----------------------+
    | Name | Version | Class | Supported Directions |
    +------------------------+---------+------------------------------------------------------------+----------------------+
    | generic-jdbc-connector | 1.99.7 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO |
    | kite-connector | 1.99.7 | org.apache.sqoop.connector.kite.KiteConnector | FROM/TO |
    | oracle-jdbc-connector | 1.99.7 | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO |
    | ftp-connector | 1.99.7 | org.apache.sqoop.connector.ftp.FtpConnector | TO |
    | hdfs-connector | 1.99.7 | org.apache.sqoop.connector.hdfs.HdfsConnector | FROM/TO |
    | kafka-connector | 1.99.7 | org.apache.sqoop.connector.kafka.KafkaConnector | TO |
    | sftp-connector | 1.99.7 | org.apache.sqoop.connector.sftp.SftpConnector | TO |
    +------------------------+---------+------------------------------------------------------------+----------------------+
    

    8.create links & show links

    sqoop:000> create link -connector generic-jdbc-connector
    sqoop:000> create link -connector hdfs-connector
    
    sqoop:000> show link
    +-------------+------------------------+---------+
    | Name | Connector Name | Enabled |
    +-------------+------------------------+---------+
    | mysql-local | generic-jdbc-connector | true |
    | hdfs-local | hdfs-connector | true |
    +-------------+------------------------+---------+
    
    sqoop:000> show link --all
    2 link(s) to show:
    link with name mysql-local (Enabled: true, Created by hongmeng.wang at 3/1/18 10:56 AM, Updated by hongmeng.wang at 3/1/18 12:51 PM)
    Using Connector generic-jdbc-connector with name {1}
    Database connection
    Driver class: com.mysql.jdbc.Driver
    Connection String: jdbc:mysql://localhost:3306
    Username: root
    Password:
    Fetch Size: 100
    Connection Properties:
    protocol = tcp
    useUnicode = true
    characterEncoding = utf-8
    autoReconnect = true
    SQL Dialect
    Identifier enclose: (blank, if use default, will get error)
    link with name hdfs-local (Enabled: true, Created by hongmeng.wang at 3/1/18 10:58 AM, Updated by hongmeng.wang at 3/1/18 12:54 PM)
    Using Connector hdfs-connector with name {1}
    HDFS cluster
    URI: hdfs://localhost:9000
    Conf directory: /usr/local/Cellar/hadoop/2.8.2/libexec/etc/hadoop
    Additional configs::
    

    9.create job & show job

    sqoop:000> create job -f "mysql-local" -t "hdfs-local"
    
    sqoop:000> show job
    +----+----------------------+--------------------------------------+-----------------------------+---------+
    | Id | Name | From Connector | To Connector | Enabled |
    +----+----------------------+--------------------------------------+-----------------------------+---------+
    | 1 | mysql-2-hdfs-t1 | mysql-local (generic-jdbc-connector) | hdfs-local (hdfs-connector) | true |
    +----+----------------------+--------------------------------------+-----------------------------+---------+
    
    sqoop:000> show job --all
    1 job(s) to show:
    Job with name mysql-2-hdfs-segment (Enabled: true, Created by hongmeng.wang at 3/1/18 11:06 AM, Updated by hongmeng.wang at 3/1/18 11:39 AM)
    Throttling resources
    Extractors:
    Loaders:
    Classpath configuration
    Extra mapper jars:
    From link: mysql-local
    Database source
    Schema name: test
    Table name: t1
    SQL statement:
    Column names:
    Partition column: id
    Partition column nullable:
    Boundary query:
    Incremental read
    Check column:
    Last value:
    To link: hdfs-local
    Target configuration
    Override null value: true
    Null value:
    File format: TEXT_FILE
    Compression codec: NONE
    Custom codec:
    Output directory: /sqoop/mysql/test
    Append mode:
    

    10.start job & check job’s status

    sqoop:000> start job -name mysql-2-hdfs-segment
    Submission details
    Job Name: mysql-2-hdfs-segment
    Server URL: http://localhost:12000/sqoop/
    Created by: sqoop2
    Creation date: 2018-03-01 13:53:37 JST
    Lastly updated by: sqoop2
    External ID: job_1519869491258_0001
    http://localhost:8088/proxy/application_1519869491258_0001/
    2018-03-01 13:53:37 JST: BOOTING - Progress is not available
    
    sqoop:000> status job -n mysql-2-hdfs-segment
    Submission details
    Job Name: mysql-2-hdfs-segment
    Server URL: http://localhost:12000/sqoop/
    Created by: sqoop2
    Creation date: 2018-03-01 14:01:54 JST
    Lastly updated by: sqoop2
    External ID: job_1519869491258_0002
    http://localhost:8088/proxy/application_1519869491258_0002/
    2018-03-01 14:03:31 JST: BOOTING - 0.00 %
    

    Issues

    1.modify “org.apache.sqoop.submission.engine.mapreduce.configuration.directory=”directory of hadoop configuration” in conf/sqoop.properties if you got below error when executing bin/sqoop2-tool verify

    Exception in thread "main" java.lang.RuntimeException: Failure in server initialization
    at org.apache.sqoop.core.SqoopServer.initialize(SqoopServer.java:68)
    at org.apache.sqoop.server.SqoopJettyServer.<init>(SqoopJettyServer.java:67)
    at org.apache.sqoop.server.SqoopJettyServer.main(SqoopJettyServer.java:177)
    Caused by: org.apache.sqoop.common.SqoopException: MAPREDUCE_0002:Failure on submission engine initialization - Invalid Hadoop configuration directory (not a directory or permission issues): /etc/hadoop/conf/
    at org.apache.sqoop.submission.mapreduce.MapreduceSubmissionEngine.initialize(MapreduceSubmissionEngine.java:97)
    at org.apache.sqoop.driver.JobManager.initialize(JobManager.java:257)
    at org.apache.sqoop.core.SqoopServer.initialize(SqoopServer.java:64)
    ... 2 more
    

    2.check $CLASSPATH and $HADOOP_CLASSPATH, maybe some jars conflict if got below error:

    Caused by: java.lang.SecurityException: sealing violation: package org.apache.derby.impl.services.locks is sealed
    at java.net.URLClassLoader.getAndVerifyPackage(URLClassLoader.java:399)
    at java.net.URLClassLoader.definePackageInternal(URLClassLoader.java:419)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:451)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at org.apache.derby.impl.services.monitor.BaseMonitor.getImplementations(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.getDefaultImplementations(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.runWithState(Unknown Source)
    at org.apache.derby.iampl.services.monitor.FileMonitor.<init>(Unknown Source)
    at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown Source)
    at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
    at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source)
    at org.apache.derby.jdbc.EmbeddedDriver.<clinit>(Unknown Source)
    ... 11 more
    
     
  • Wang 21:33 on 2018-03-11 Permalink | Reply
    Tags: , , , , , Sqoop   

    [Sqoop1] Interact MySQL with HDFS/Hive/HBase 

    install sqoop1 on mac

    brew install sqoop
    

    #if you have set env profiles, uncomment profiles in conf/sqoop-env.sh

    1.MySQL -> HDFS

    1.1.import table

    sqoop import --connect jdbc:mysql://localhost/test --direct --username root --P --table t1 --warehouse-dir /mysql/test --fields-terminated-by ','
    

    1.2.import schema

    sqoop import-all-tables --connect jdbc:mysql://localhost/test --direct --username root -P --warehouse-dir /mysql/test --fields-terminated-by ','
    

    2.MySQL -> Hive

    2.1.import definition

    sqoop create-hive-table --connect jdbc:mysql://localhost/test --table t1 --username root --P --hive-database test
    

    2.2.import table

    sqoop import --connect jdbc:mysql://localhost/test --username root --P --table t1 --hive-import --hive-database test --hive-table t1 --fields-terminated-by ','
    

    2.3.import schema

    sqoop import-all-tables --connect jdbc:mysql://localhost/test --username root --P --hive-import --hive-database test --fields-terminated-by ','
    

    3.MySQL -> HBase

    3.1.definition

    sqoop import --connect jdbc:mysql://localhost/test --username root --P --table t1
    

    3.2.import table, need create table in hbase first

    sqoop import --connect jdbc:mysql://localhost/test --username root --P --table t1 --hbase-bulkload --hbase-table test.t1 --column-family basic --fields-terminated-by ','
    

    3.3.import table without creating table in hbase, but pay attention to hbase/sqoop version

    sqoop import --connect jdbc:mysql://localhost/test --username root --P --table t1 --hbase-bulkload --hbase-create-table --hbase-table test.t1 --column-family basic --fields-terminated-by ','
    

    4.HDFS/Hive/HBase -> MySQL

    sqoop export --connect jdbc:mysql://localhost/test --username root --P --table t1 --export-dir /user/hive/warehouse/test.db/t1 --fields-terminated-by ','
    
     
c
Compose new post
j
Next post/Next comment
k
Previous post/Previous comment
r
Reply
e
Edit
o
Show/Hide comments
t
Go to top
l
Go to login
h
Show/Hide help
shift + esc
Cancel
%d bloggers like this: