[Sqoop2] Notebook
Recently I tested sqoop2 which has many new features compared to sqoop1, about the comparision, I think you could check here and stackoverflow, I will introduce about the operation manual.
1.install
wget http://ftp.jaist.ac.jp/pub/apache/sqoop/1.99.7/sqoop-1.99.7-bin-hadoop200.tar.gz
tar -vzxf sqoop-1.99.7-bin-hadoop200.tar.gz && cd sqoop-1.99.7-bin-hadoop200
2.replace @LOGDIR@/@BASEDIR@ in sqoop.properties
3.download mysql driver into server/lib
4.configure proxy user in core-site.xml
<property>
<name>hadoop.proxyuser.sqoop2.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.sqoop2.groups</name>
<value>*</value>
</property>
5.verify & start sqoop2 server
bin/sqoop2-tool verify
bin/sqoop2-server start
6.start client & test
bin/sqoop2-shell
7.show the connectors
sqoop:000> show connector
+------------------------+---------+------------------------------------------------------------+----------------------+
| Name | Version | Class | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO |
| kite-connector | 1.99.7 | org.apache.sqoop.connector.kite.KiteConnector | FROM/TO |
| oracle-jdbc-connector | 1.99.7 | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO |
| ftp-connector | 1.99.7 | org.apache.sqoop.connector.ftp.FtpConnector | TO |
| hdfs-connector | 1.99.7 | org.apache.sqoop.connector.hdfs.HdfsConnector | FROM/TO |
| kafka-connector | 1.99.7 | org.apache.sqoop.connector.kafka.KafkaConnector | TO |
| sftp-connector | 1.99.7 | org.apache.sqoop.connector.sftp.SftpConnector | TO |
+------------------------+---------+------------------------------------------------------------+----------------------+
8.create links & show links
sqoop:000> create link -connector generic-jdbc-connector
sqoop:000> create link -connector hdfs-connector
sqoop:000> show link
+-------------+------------------------+---------+
| Name | Connector Name | Enabled |
+-------------+------------------------+---------+
| mysql-local | generic-jdbc-connector | true |
| hdfs-local | hdfs-connector | true |
+-------------+------------------------+---------+
sqoop:000> show link --all
2 link(s) to show:
link with name mysql-local (Enabled: true, Created by hongmeng.wang at 3/1/18 10:56 AM, Updated by hongmeng.wang at 3/1/18 12:51 PM)
Using Connector generic-jdbc-connector with name {1}
Database connection
Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://localhost:3306
Username: root
Password:
Fetch Size: 100
Connection Properties:
protocol = tcp
useUnicode = true
characterEncoding = utf-8
autoReconnect = true
SQL Dialect
Identifier enclose: (blank, if use default, will get error)
link with name hdfs-local (Enabled: true, Created by hongmeng.wang at 3/1/18 10:58 AM, Updated by hongmeng.wang at 3/1/18 12:54 PM)
Using Connector hdfs-connector with name {1}
HDFS cluster
URI: hdfs://localhost:9000
Conf directory: /usr/local/Cellar/hadoop/2.8.2/libexec/etc/hadoop
Additional configs::
9.create job & show job
sqoop:000> create job -f "mysql-local" -t "hdfs-local"
sqoop:000> show job
+----+----------------------+--------------------------------------+-----------------------------+---------+
| Id | Name | From Connector | To Connector | Enabled |
+----+----------------------+--------------------------------------+-----------------------------+---------+
| 1 | mysql-2-hdfs-t1 | mysql-local (generic-jdbc-connector) | hdfs-local (hdfs-connector) | true |
+----+----------------------+--------------------------------------+-----------------------------+---------+
sqoop:000> show job --all
1 job(s) to show:
Job with name mysql-2-hdfs-segment (Enabled: true, Created by hongmeng.wang at 3/1/18 11:06 AM, Updated by hongmeng.wang at 3/1/18 11:39 AM)
Throttling resources
Extractors:
Loaders:
Classpath configuration
Extra mapper jars:
From link: mysql-local
Database source
Schema name: test
Table name: t1
SQL statement:
Column names:
Partition column: id
Partition column nullable:
Boundary query:
Incremental read
Check column:
Last value:
To link: hdfs-local
Target configuration
Override null value: true
Null value:
File format: TEXT_FILE
Compression codec: NONE
Custom codec:
Output directory: /sqoop/mysql/test
Append mode:
10.start job & check job’s status
sqoop:000> start job -name mysql-2-hdfs-segment
Submission details
Job Name: mysql-2-hdfs-segment
Server URL: http://localhost:12000/sqoop/
Created by: sqoop2
Creation date: 2018-03-01 13:53:37 JST
Lastly updated by: sqoop2
External ID: job_1519869491258_0001
http://localhost:8088/proxy/application_1519869491258_0001/
2018-03-01 13:53:37 JST: BOOTING - Progress is not available
sqoop:000> status job -n mysql-2-hdfs-segment
Submission details
Job Name: mysql-2-hdfs-segment
Server URL: http://localhost:12000/sqoop/
Created by: sqoop2
Creation date: 2018-03-01 14:01:54 JST
Lastly updated by: sqoop2
External ID: job_1519869491258_0002
http://localhost:8088/proxy/application_1519869491258_0002/
2018-03-01 14:03:31 JST: BOOTING - 0.00 %
Issues
1.modify “org.apache.sqoop.submission.engine.mapreduce.configuration.directory=”directory of hadoop configuration” in conf/sqoop.properties if you got below error when executing bin/sqoop2-tool verify
Exception in thread "main" java.lang.RuntimeException: Failure in server initialization
at org.apache.sqoop.core.SqoopServer.initialize(SqoopServer.java:68)
at org.apache.sqoop.server.SqoopJettyServer.<init>(SqoopJettyServer.java:67)
at org.apache.sqoop.server.SqoopJettyServer.main(SqoopJettyServer.java:177)
Caused by: org.apache.sqoop.common.SqoopException: MAPREDUCE_0002:Failure on submission engine initialization - Invalid Hadoop configuration directory (not a directory or permission issues): /etc/hadoop/conf/
at org.apache.sqoop.submission.mapreduce.MapreduceSubmissionEngine.initialize(MapreduceSubmissionEngine.java:97)
at org.apache.sqoop.driver.JobManager.initialize(JobManager.java:257)
at org.apache.sqoop.core.SqoopServer.initialize(SqoopServer.java:64)
... 2 more
2.check $CLASSPATH and $HADOOP_CLASSPATH, maybe some jars conflict if got below error:
Caused by: java.lang.SecurityException: sealing violation: package org.apache.derby.impl.services.locks is sealed
at java.net.URLClassLoader.getAndVerifyPackage(URLClassLoader.java:399)
at java.net.URLClassLoader.definePackageInternal(URLClassLoader.java:419)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:451)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.derby.impl.services.monitor.BaseMonitor.getImplementations(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.getDefaultImplementations(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.runWithState(Unknown Source)
at org.apache.derby.iampl.services.monitor.FileMonitor.<init>(Unknown Source)
at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown Source)
at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source)
at org.apache.derby.jdbc.EmbeddedDriver.<clinit>(Unknown Source)
... 11 more
Reply