Tagged: Hadoop

  • Wang 20:34 on 2018-02-24
    Tags: Hadoop

    Conflicting jars of Hadoop and Tez 

    After I installed Tez, Hive jobs ran fine on the Tez engine, but when I switched the engine back to MR, I got the error below:

    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    Query ID = wanghongmeng_20180224185414_623cf20b-77d4-4a09-a17d-41c72ed76ac3
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
    set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
    set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
    set mapreduce.job.reduces=<number>
    FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. DEFAULT_MR_AM_ADMIN_USER_ENV
    

    I couldn’t find any useful information in the logs. After a long investigation, I found that hadoop-mapreduce-client-common-2.7.0.jar and hadoop-mapreduce-client-core-2.7.0.jar under the Tez library conflicted with my installed Hadoop version, which was 2.8.2, so I removed the two jars.

    After doing this, I could run Hive on MR successfully. 😀
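    The mismatch can be spotted from the jar file names alone. A minimal shell sketch (the jar names are the two from this case; checking against the installed Hadoop version is the idea, the hard-coded version string is an assumption you would replace with the output of `hadoop version`):

```shell
#!/bin/sh
# Sketch: compare the version embedded in bundled Hadoop client jar names
# against the installed Hadoop version (2.8.2 in this case).
HADOOP_VERSION="2.8.2"
for jar in hadoop-mapreduce-client-common-2.7.0.jar \
           hadoop-mapreduce-client-core-2.7.0.jar; do
  # strip everything up to the last '-' and the trailing '.jar' to get the version
  ver=$(echo "$jar" | sed 's/.*-\([0-9][0-9.]*\)\.jar$/\1/')
  if [ "$ver" != "$HADOOP_VERSION" ]; then
    echo "mismatch: $jar (bundled $ver, installed $HADOOP_VERSION)"
  fi
done
```

    Pointing the loop at the real Tez lib directory instead of a hard-coded list would flag any bundled Hadoop jar that drifts from the cluster version.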

     
  • Wang 19:51 on 2018-02-24
    Tags: Hadoop, Tomcat

    Replace MR with Tez on Hive 2 

    As of Hive 2, Hive-on-MR is deprecated; you can see the warning when running the Hive CLI:

    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    

    So I installed Tez to replace MR for running jobs. Below are the installation steps.

    1.install Tez

    1.1.download Tez and unpack it

    wget http://ftp.jaist.ac.jp/pub/apache/tez/0.9.0/apache-tez-0.9.0-src.tar.gz
    tar -zvxf apache-tez-0.9.0-src.tar.gz && cd apache-tez-0.9.0-src
    

    1.2.compile and build the Tez jars; you need to install protobuf and maven before compiling

    mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true
    

    1.3.upload Tez to hdfs

    hadoop fs -mkdir /apps
    hadoop fs -copyFromLocal tez-dist/target/tez-0.9.0.tar.gz /apps/
    

    1.4.create tez-site.xml under hadoop conf directory

    cat <<'EOF' > $HADOOP_CONF_DIR/tez-site.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>tez.lib.uris</name>
            <value>${fs.defaultFS}/apps/tez-0.9.0.tar.gz</value>
        </property>
        <property>
            <name>tez.history.logging.service.class</name>
            <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
        </property>
        <property>
            <name>tez.tez-ui.history-url.base</name>
            <value>http://localhost:8080/tez-ui/</value>
        </property>
    </configuration>
    EOF
    

    1.5.append configurations to yarn-site.xml

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.timeline-service.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.timeline-service.generic-application-history.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.timeline-service.http-cross-origin.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.timeline-service.hostname</name>
        <value>localhost</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.cross-origin.enabled</name>
        <value>true</value>
    </property>
    <property>  
        <name>yarn.resourcemanager.address</name>  
        <value>localhost:8032</value>  
    </property>  
    <property>  
        <name>yarn.resourcemanager.scheduler.address</name>  
        <value>localhost:8030</value>  
    </property>  
    <property>  
        <name>yarn.resourcemanager.resource-tracker.address</name>  
        <value>localhost:8031</value>  
    </property>
    

    1.6.append configuration to core-site.xml

    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>  
        <value>/data/hadoop/hdfs/tmp</value>
    </property>
    <property>
        <name>hadoop.http.filter.initializers</name>
        <value>org.apache.hadoop.security.HttpCrossOriginFilterInitializer</value>
    </property>
    

    1.7.unpack tez-dist/target/tez-0.9.0-minimal.tar.gz

    1.8.append environment variables to /etc/profile

    export TEZ_CONF_DIR="location of tez-site.xml"
    export TEZ_JARS="location of unpackaged tez-0.9.0-minimal.tar.gz"
    export HADOOP_CLASSPATH=${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*
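    The three variables combine into a single colon-separated classpath; the `/*` and `/lib/*` wildcards are jar globs expanded by the JVM, not the shell, so they must stay unexpanded in the variable. A sketch with placeholder paths (the real locations depend on where you unpacked Tez):

```shell
#!/bin/sh
# Placeholder paths -- substitute your actual locations.
TEZ_CONF_DIR="/etc/hadoop/conf"          # directory holding tez-site.xml
TEZ_JARS="/opt/apache-tez-0.9.0-minimal" # unpacked tez-0.9.0-minimal.tar.gz
HADOOP_CLASSPATH="${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*"
echo "$HADOOP_CLASSPATH"   # quoted, so the globs are not expanded by the shell
```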
    

    1.9.start the timeline server

    yarn-daemon.sh start timelineserver
    

    1.10.configure the Tez UI: install Tomcat, unpack tez-ui/target/tez-ui-0.9.0.war into webapps, and rename the unpacked directory to tez-ui

    1.11.start Tomcat and visit http://localhost:8080/tez-ui to test

    2.test Tez

    2.1.change the execution engine to Tez

    hive> set hive.execution.engine=tez;
    

    2.2.run a job to test

    hive> select count(*) from gbif_0004998;
    Query ID = wanghongmeng_20180224180801_e5ddcf23-1e1a-4724-8156-1393807c2ac0
    Total jobs = 1
    Launching Job 1 out of 1
    Status: Running (Executing on YARN cluster with App id application_1519462946874_0003)
    
    ----------------------------------------------------------------------------------------------
    VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED 
    ----------------------------------------------------------------------------------------------
    Map 1 .......... container SUCCEEDED 1 1 0 0 0 0 
    Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0 
    ----------------------------------------------------------------------------------------------
    VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 9.87 s 
    ----------------------------------------------------------------------------------------------
    OK
    327316
    Time taken: 23.876 seconds, Fetched: 1 row(s)
    

    2.3.check the result on the Tez UI
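    To make Tez the default engine instead of setting it per session, the standard hive.execution.engine property can go into hive-site.xml (a config sketch; the property is standard Hive, the file location depends on your install):

```xml
<property>
    <name>hive.execution.engine</name>
    <value>tez</value>
</property>
```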

     
  • Wang 22:15 on 2018-02-07
    Tags: Hadoop

    Use GBIF’s dataset to do analysis 

    GBIF, the Global Biodiversity Information Facility, hosts a huge amount of data; I think it’s a good source of sample data for analysis.

    Please follow the website’s instructions to download the sample dataset.

    After doing this, I imported the dataset into Hive. Below are the steps.

    1.create hdfs path

    hdfs dfs -mkdir -p /user/hive/gbif/0004998
    

    2.upload the dataset into the HDFS directory created in step 1

    hdfs dfs -copyFromLocal /Users/wanghongmeng/Desktop/0004998-180131172636756.csv /user/hive/gbif/0004998
    

    3.create hive table and load dataset

    CREATE EXTERNAL TABLE gbif_0004998_ori (
    gbifid string,
    datasetkey string,
    occurrenceid string,
    kingdom string,
    ...
    ...
    establishmentmeans string,
    lastinterpreted string,
    mediatype string,
    issue string)
    ROW FORMAT DELIMITED 
    FIELDS TERMINATED BY '\t'
    STORED as TEXTFILE
    LOCATION '/user/hive/gbif/0004998'
    tblproperties ('skip.header.line.count'='1');
    

    4.create a new Hive table with Snappy compression, then drop the original table

    CREATE TABLE gbif.gbif_0004998
    STORED AS ORC
    TBLPROPERTIES("orc.compress"="snappy")
    AS SELECT * FROM gbif.gbif_0004998_ori;
    
    drop table gbif.gbif_0004998_ori;
    

    5.check the Hive table’s information

    hive> desc formatted gbif_0004998;
    OK
    # col_name data_type comment 
    
    gbifid string 
    datasetkey string 
    occurrenceid string 
    kingdom string 
    phylum string 
    ...
    ...
    # Detailed Table Information 
    Database: gbif 
    Owner: wanghongmeng 
    CreateTime: Wed Feb 7 21:28:25 JST 2018 
    LastAccessTime: UNKNOWN 
    Retention: 0 
    Location: hdfs://localhost:9000/user/hive/warehouse/gbif.db/gbif_0004998 
    Table Type: MANAGED_TABLE 
    Table Parameters: 
    COLUMN_STATS_ACCURATE {"BASIC_STATS":"true"}
    numFiles 1 
    numRows 327316 
    orc.compress snappy 
    rawDataSize 1319738112 
    totalSize 13510344 
    transient_lastDdlTime 1519457306 
    
    # Storage Information 
    SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde 
    InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat 
    OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat 
    Compressed: No 
    Num Buckets: -1 
    Bucket Columns: [] 
    Sort Columns: [] 
    Storage Desc Params: 
    serialization.format 1 
    Time taken: 0.078 seconds, Fetched: 74 row(s)
    

    6.check data

    hive> select * from gbif.gbif_0004998 limit 5;
    OK
    1633594438 8130e5c6-f762-11e1-a439-00145eb45e9a KINGDOM incertae sedis EE Põhja-Kiviõli opencast mine 70488160-b003-11d8-a8af-b8a03c50a862 59.366475 26.8873 1000.0 2010-04-30T02:00Z 30 4 2010 0 FOSSIL_SPECIMEN Institute of Geology at TUT GIT 343-200 Toom CC_BY_NC_4_0 Toom 2018-02-02T20:24Z STILLIMAGE GEODETIC_DATUM_ASSUMED_WGS84;TAXON_MATCH_NONE
    1633594440 8130e5c6-f762-11e1-a439-00145eb45e9a KINGDOM incertae sedis EE Neitla Quarry 70488160-b003-11d8-a8af-b8a03c50a862 59.102247 25.762486 10.0 2012-09-12T02:00Z 12 9 2012 0 FOSSIL_SPECIMEN Institute of Geology at TUT GIT 362-272 CC_BY_NC_4_0 Toom 2018-02-02T20:24Z STILLIMAGE GEODETIC_DATUM_ASSUMED_WGS84;TAXON_MATCH_NONE
    1633594442 8130e5c6-f762-11e1-a439-00145eb45e9a KINGDOM incertae sedis EE Päri quarry 70488160-b003-11d8-a8af-b8a03c50a862 58.840459 24.042791 10.0 2014-05-23T02:00Z 23 5 2014 0 FOSSIL_SPECIMEN Institute of Geology at TUT GIT 340-303 Toom CC_BY_NC_4_0 Hints, O. 2018-02-02T20:24Z STILLIMAGE GEODETIC_DATUM_ASSUMED_WGS84;TAXON_MATCH_NONE
    1633594445 8130e5c6-f762-11e1-a439-00145eb45e9a KINGDOM incertae sedis EE Saxby shore 70488160-b003-11d8-a8af-b8a03c50a862 59.027778 23.117222 10.0 2017-06-17T02:00Z 17 6 2017 0 FOSSIL_SPECIMEN Institute of Geology at TUT GIT 362-544 Toom CC_BY_NC_4_0 Toom 2018-02-02T20:24Z STILLIMAGE GEODETIC_DATUM_ASSUMED_WGS84;TAXON_MATCH_NONE
    1633594446 8130e5c6-f762-11e1-a439-00145eb45e9a KINGDOM incertae sedis EE Saxby shore 70488160-b003-11d8-a8af-b8a03c50a862 59.027778 23.117222 10.0 2017-06-17T02:00Z 17 6 2017 0 FOSSIL_SPECIMEN Institute of Geology at TUT GIT 362-570 CC_BY_NC_4_0 Baranov 2018-02-02T20:24Z GEODETIC_DATUM_ASSUMED_WGS84;TAXON_MATCH_NONE
    Time taken: 0.172 seconds, Fetched: 5 row(s)
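    One detail worth checking before the load: the GBIF export is tab-separated with a single header line, which is why the DDL uses the tab character as the field delimiter and sets skip.header.line.count to 1. A self-contained sketch with made-up rows showing the header/data split:

```shell
#!/bin/sh
# Sketch with invented rows: the real GBIF download is tab-separated with
# one header line, so data rows = total lines - 1.
cat > /tmp/gbif_sample.tsv <<'EOF'
gbifid	datasetkey	kingdom
1633594438	8130e5c6-f762-11e1-a439-00145eb45e9a	incertae sedis
1633594440	8130e5c6-f762-11e1-a439-00145eb45e9a	incertae sedis
EOF
# count data rows, skipping the header (mirrors skip.header.line.count='1')
tail -n +2 /tmp/gbif_sample.tsv | wc -l
```

    Running the same `tail -n +2 … | wc -l` against the real download is a cheap cross-check for the numRows value Hive later reports.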
    
     
  • Wang 20:53 on 2018-01-31
    Tags: Hadoop, MacOS

    Hive on macOS 

    When I ran Hive, I got the error below:

    Exception in thread "main" java.lang.ClassCastException: java.base/jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to java.base/java.net.URLClassLoader
    at org.apache.hadoop.hive.ql.session.SessionState.(SessionState.java:394)
    at org.apache.hadoop.hive.ql.session.SessionState.(SessionState.java:370)
    at org.apache.hadoop.hive.cli.CliSessionState.(CliSessionState.java:60)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:708)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
    
    

    I was puzzled by this class-cast error across JDK versions: I had set JAVA_HOME in my profile, so why was I still getting it?

    I checked the Java version; it was JDK 1.8:

    wanghongmeng:2.3.1 gizmo$ java -version
    java version "1.8.0_151"
    Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
    Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
    

    But when I checked the JDK install directories, I found /Library/Java/Home was linked to JDK 9’s home. I never used JDK 9, so I uninstalled it and linked /Library/Java/Home to JDK 1.8’s home.

    After this, the problem was solved. 😀
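    On macOS, `/usr/libexec/java_home -V` lists all installed JDKs, which would have surfaced the stray JDK 9 faster than following symlinks by hand. The misleading part is that `java -version` and the JDK a link like /Library/Java/Home resolves to can differ. A small sketch (sample version strings assumed) of pulling the major version out of `java -version`-style output:

```shell
#!/bin/sh
# Sketch: extract the major version from a `java -version`-style string.
# JDK 8 reports "1.8.x" while JDK 9 reports "9.x", so the prefix alone
# distinguishes the problem case.
parse_major() {
  echo "$1" | sed 's/.*"\([0-9]*\.[0-9]*\).*/\1/'
}
parse_major 'java version "1.8.0_151"'   # 1.8
parse_major 'java version "9.0.4"'       # 9.0
```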

     