[Performance Test] MR vs Tez
I compared the performance of Hive on MapReduce (MR) and Hive on Tez on my laptop. Because it is a single-node setup, the results are only a rough indication.
For the test I used two tables built from datasets downloaded from GBIF:
gbif_0004998: 327,316 rows
gbif_0004991: 6,914,665 rows
1. Test gbif_0004998
Create via MR
hive> set hive.execution.engine=mr;
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> CREATE TABLE gbif.gbif_0004998
> STORED AS ORC
> TBLPROPERTIES("orc.compress"="snappy")
> AS SELECT * FROM gbif.gbif_0004998_ori;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = wanghongmeng_20180224190744_f3fb257a-829e-40c2-974b-5abeb3d88693
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1519462946874_0007, Tracking URL = http://localhost:8088/proxy/application_1519462946874_0007/
Kill Command = /usr/local/Cellar/hadoop/2.8.2/bin/hadoop job -kill job_1519462946874_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-02-24 19:07:53,043 Stage-1 map = 0%, reduce = 0%
2018-02-24 19:08:10,204 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1519462946874_0007
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://localhost:9000/user/hive/warehouse/gbif.db/.hive-staging_hive_2018-02-24_19-07-44_762_5371659277950436672-1/-ext-10002
Moving data to directory hdfs://localhost:9000/user/hive/warehouse/gbif.db/gbif_0004998
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 HDFS Read: 130582415 HDFS Write: 13510429 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 28.28 seconds
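All the timings compared in this post are the `Time taken:` values the Hive CLI prints after each statement. If a session is saved to a text file, a small script can collect them instead of copying them by hand. This is a minimal sketch, assuming the line format seen in these transcripts (`Time taken: <seconds> seconds`, optionally followed by `, Fetched: ...`); the function name `extract_times` is hypothetical.

```python
import re

# Matches lines such as "Time taken: 28.28 seconds" or
# "Time taken: 6.438 seconds, Fetched: 1 row(s)" (format assumed from the logs above).
TIME_TAKEN = re.compile(r"^Time taken: ([0-9]+(?:\.[0-9]+)?) seconds")


def extract_times(log_text: str) -> list[float]:
    """Return every 'Time taken' value (in seconds) found in a Hive CLI log, in order."""
    times = []
    for line in log_text.splitlines():
        match = TIME_TAKEN.match(line.strip())
        if match:
            times.append(float(match.group(1)))
    return times


if __name__ == "__main__":
    sample = """OK
Time taken: 28.28 seconds
OK
total
28918
Time taken: 28.529 seconds, Fetched: 1 row(s)
"""
    print(extract_times(sample))  # [28.28, 28.529]
```

Anchoring the pattern at the start of the line keeps it from matching the word "taken" elsewhere in job output.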
Create via Tez
hive> set hive.execution.engine=tez;
hive> CREATE TABLE gbif.gbif_0004998
> STORED AS ORC
> TBLPROPERTIES("orc.compress"="snappy")
> AS SELECT * FROM gbif.gbif_0004998_ori;
Query ID = wanghongmeng_20180224193755_bd7fda12-bfd7-4abf-9c3e-0f90b9b58607
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1519462946874_0013)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 15.65 s
----------------------------------------------------------------------------------------------
Moving data to directory hdfs://localhost:9000/user/hive/warehouse/gbif.db/gbif_0004998
OK
gbif_0004998_ori.gbifid gbif_0004998_ori.datasetkey gbif_0004998_ori.occurrenceid gbif_0004998_ori.kingdom gbif_0004998_ori.phylum gbif_0004998_ori.class gbif_0004998_ori.orders gbif_0004998_ori.family gbif_0004998_ori.genus gbif_0004998_ori.species gbif_0004998_ori.infraspecificepithet gbif_0004998_ori.taxonrank gbif_0004998_ori.scientificname gbif_0004998_ori.countrycode gbif_0004998_ori.locality gbif_0004998_ori.publishingorgkey gbif_0004998_ori.decimallatitude gbif_0004998_ori.decimallongitude gbif_0004998_ori.coordinateuncertaintyinmeters gbif_0004998_ori.coordinateprecision gbif_0004998_ori.elevation gbif_0004998_ori.elevationaccuracy gbif_0004998_ori.depth gbif_0004998_ori.depthaccuracy gbif_0004998_ori.eventdate gbif_0004998_ori.day gbif_0004998_ori.month gbif_0004998_ori.year gbif_0004998_ori.taxonkey gbif_0004998_ori.specieskey gbif_0004998_ori.basisofrecord gbif_0004998_ori.institutioncode gbif_0004998_ori.collectioncode gbif_0004998_ori.catalognumber gbif_0004998_ori.recordnumber gbif_0004998_ori.identifiedby gbif_0004998_ori.license gbif_0004998_ori.rightsholder gbif_0004998_ori.recordedby gbif_0004998_ori.typestatus gbif_0004998_ori.establishmentmeans gbif_0004998_ori.lastinterpreted gbif_0004998_ori.mediatype gbif_0004998_ori.issue
Time taken: 16.631 seconds
Query via MR
hive> set hive.execution.engine=mr;
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> select count(*) as total from gbif_0004998 where mediatype = 'STILLIMAGE';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = wanghongmeng_20180224194412_0c9a74e1-b01e-4b92-8db4-f31522d44bd9
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1519462946874_0016, Tracking URL = http://localhost:8088/proxy/application_1519462946874_0016/
Kill Command = /usr/local/Cellar/hadoop/2.8.2/bin/hadoop job -kill job_1519462946874_0016
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-02-24 19:44:24,034 Stage-1 map = 0%, reduce = 0%
2018-02-24 19:44:33,661 Stage-1 map = 100%, reduce = 0%
2018-02-24 19:44:40,063 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1519462946874_0016
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 30539 HDFS Write: 105 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
total
28918
Time taken: 28.529 seconds, Fetched: 1 row(s)
Query via Tez
hive> set hive.execution.engine=tez;
hive> select count(*) from gbif_0004998 where mediatype = 'STILLIMAGE';
Query ID = wanghongmeng_20180224193902_f03b627e-e091-4632-87e5-0d8af6484032
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1519462946874_0013)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 5.97 s
----------------------------------------------------------------------------------------------
OK
total
28918
Time taken: 6.438 seconds, Fetched: 1 row(s)
2. Test gbif_0004991
Create via MR
hive> set hive.execution.engine=mr;
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> CREATE TABLE gbif.gbif_0004991
> STORED AS ORC
> TBLPROPERTIES("orc.compress"="snappy")
> AS SELECT * FROM gbif.gbif_0004991_ori;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = wanghongmeng_20180224191238_19301476-a77f-45fa-a405-05a8732a45e9
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1519462946874_0010, Tracking URL = http://localhost:8088/proxy/application_1519462946874_0010/
Kill Command = /usr/local/Cellar/hadoop/2.8.2/bin/hadoop job -kill job_1519462946874_0010
Hadoop job information for Stage-1: number of mappers: 14; number of reducers: 0
2018-02-24 19:16:32,473 Stage-1 map = 0%, reduce = 0%
2018-02-24 19:17:32,678 Stage-1 map = 0%, reduce = 0%
2018-02-24 19:17:40,248 Stage-1 map = 7%, reduce = 0%
2018-02-24 19:17:51,119 Stage-1 map = 11%, reduce = 0%
2018-02-24 19:17:52,207 Stage-1 map = 18%, reduce = 0%
2018-02-24 19:17:58,625 Stage-1 map = 21%, reduce = 0%
2018-02-24 19:18:13,859 Stage-1 map = 25%, reduce = 0%
2018-02-24 19:18:15,999 Stage-1 map = 32%, reduce = 0%
2018-02-24 19:18:30,537 Stage-1 map = 36%, reduce = 0%
2018-02-24 19:18:31,625 Stage-1 map = 39%, reduce = 0%
2018-02-24 19:18:32,759 Stage-1 map = 43%, reduce = 0%
2018-02-24 19:19:17,117 Stage-1 map = 46%, reduce = 0%
2018-02-24 19:19:19,250 Stage-1 map = 50%, reduce = 0%
2018-02-24 19:19:25,639 Stage-1 map = 54%, reduce = 0%
2018-02-24 19:19:28,825 Stage-1 map = 57%, reduce = 0%
2018-02-24 19:19:32,031 Stage-1 map = 61%, reduce = 0%
2018-02-24 19:19:33,101 Stage-1 map = 64%, reduce = 0%
2018-02-24 19:19:39,470 Stage-1 map = 68%, reduce = 0%
2018-02-24 19:19:42,677 Stage-1 map = 71%, reduce = 0%
2018-02-24 19:19:54,459 Stage-1 map = 75%, reduce = 0%
2018-02-24 19:19:58,723 Stage-1 map = 79%, reduce = 0%
2018-02-24 19:20:04,147 Stage-1 map = 82%, reduce = 0%
2018-02-24 19:20:06,277 Stage-1 map = 86%, reduce = 0%
2018-02-24 19:20:15,977 Stage-1 map = 93%, reduce = 0%
2018-02-24 19:20:20,269 Stage-1 map = 96%, reduce = 0%
2018-02-24 19:20:36,398 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1519462946874_0010
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://localhost:9000/user/hive/warehouse/gbif.db/.hive-staging_hive_2018-02-24_19-12-38_616_5758586722663198282-1/-ext-10002
Moving data to directory hdfs://localhost:9000/user/hive/warehouse/gbif.db/gbif_0004991
MapReduce Jobs Launched:
Stage-Stage-1: Map: 14 HDFS Read: 3539512736 HDFS Write: 342789525 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 481.311 seconds
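The MR job counters for the two CTAS runs also hint at how much the ORC + Snappy conversion shrinks the data: `HDFS Read` roughly corresponds to the size of the text source table and `HDFS Write` to the size of the new ORC table. Treating these counters as input and output sizes is an approximation (they can include other HDFS I/O), so the sketch below only gives a rough ratio; `compression_ratio` is an illustrative name.

```python
def compression_ratio(hdfs_read_bytes: int, hdfs_write_bytes: int) -> float:
    """Approximate source-to-ORC size ratio from an MR job's HDFS counters."""
    if hdfs_write_bytes <= 0:
        raise ValueError("HDFS Write must be positive")
    return hdfs_read_bytes / hdfs_write_bytes


# (HDFS Read, HDFS Write) copied from the two CTAS jobs run on MR in this post.
counters = {
    "gbif_0004998": (130_582_415, 13_510_429),
    "gbif_0004991": (3_539_512_736, 342_789_525),
}

for table, (read_bytes, write_bytes) in counters.items():
    ratio = compression_ratio(read_bytes, write_bytes)
    print(f"{table}: {read_bytes / 1e6:.1f} MB read, "
          f"{write_bytes / 1e6:.1f} MB written, ~{ratio:.1f}x smaller")
```

On these counters both tables come out roughly 10x smaller after conversion (about 9.7x and 10.3x).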
Create via Tez
hive> set hive.execution.engine=tez;
hive> CREATE TABLE gbif.gbif_0004991
> STORED AS ORC
> TBLPROPERTIES("orc.compress"="snappy")
> AS SELECT * FROM gbif.gbif_0004991_ori;
Query ID = wanghongmeng_20180224192800_111872d9-059b-4a8a-9fd7-e3ea02af8898
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1519462946874_0013)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 241.12 s
----------------------------------------------------------------------------------------------
Moving data to directory hdfs://localhost:9000/user/hive/warehouse/gbif.db/gbif_0004991
OK
gbif_0004991_ori.gbifid gbif_0004991_ori.datasetkey gbif_0004991_ori.occurrenceid gbif_0004991_ori.kingdom gbif_0004991_ori.phylum gbif_0004991_ori.class gbif_0004991_ori.orders gbif_0004991_ori.family gbif_0004991_ori.genus gbif_0004991_ori.species gbif_0004991_ori.infraspecificepithet gbif_0004991_ori.taxonrank gbif_0004991_ori.scientificname gbif_0004991_ori.countrycode gbif_0004991_ori.locality gbif_0004991_ori.publishingorgkey gbif_0004991_ori.decimallatitude gbif_0004991_ori.decimallongitude gbif_0004991_ori.coordinateuncertaintyinmeters gbif_0004991_ori.coordinateprecision gbif_0004991_ori.elevation gbif_0004991_ori.elevationaccuracy gbif_0004991_ori.depth gbif_0004991_ori.depthaccuracy gbif_0004991_ori.eventdate gbif_0004991_ori.day gbif_0004991_ori.month gbif_0004991_ori.year gbif_0004991_ori.taxonkey gbif_0004991_ori.specieskey gbif_0004991_ori.basisofrecord gbif_0004991_ori.institutioncode gbif_0004991_ori.collectioncode gbif_0004991_ori.catalognumber gbif_0004991_ori.recordnumber gbif_0004991_ori.identifiedby gbif_0004991_ori.license gbif_0004991_ori.rightsholder gbif_0004991_ori.recordedby gbif_0004991_ori.typestatus gbif_0004991_ori.establishmentmeans gbif_0004991_ori.lastinterpreted gbif_0004991_ori.mediatype gbif_0004991_ori.issue
Time taken: 252.548 seconds
Query via MR
hive> set hive.execution.engine=mr;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> select count(*) from gbif_0004991 where mediatype = 'STILLIMAGE';
Query ID = wanghongmeng_20180224192630_b2934027-2423-4945-864b-6ce663e676fa
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1519462946874_0012, Tracking URL = http://localhost:8088/proxy/application_1519462946874_0012/
Kill Command = /usr/local/Cellar/hadoop/2.8.2/bin/hadoop job -kill job_1519462946874_0012
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2018-02-24 19:26:43,988 Stage-1 map = 0%, reduce = 0%
2018-02-24 19:27:00,086 Stage-1 map = 50%, reduce = 0%
2018-02-24 19:27:03,287 Stage-1 map = 74%, reduce = 0%
2018-02-24 19:27:05,422 Stage-1 map = 100%, reduce = 0%
2018-02-24 19:27:08,595 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1519462946874_0012
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2 Reduce: 1 HDFS Read: 602777 HDFS Write: 106 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
total
374998
Time taken: 38.903 seconds, Fetched: 1 row(s)
Query via Tez
hive> set hive.execution.engine=tez;
hive> select count(*) from gbif_0004991 where mediatype = 'STILLIMAGE';
Query ID = wanghongmeng_20180224193241_f4edd363-fdb8-4461-b687-4b775e8719c0
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1519462946874_0013)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      2          2        0        0       0       0
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 16.54 s
----------------------------------------------------------------------------------------------
OK
total
374998
Time taken: 17.258 seconds, Fetched: 1 row(s)
3. Summary
Table        | Total Count | Engine | Create Table | Query
-------------|-------------|--------|--------------|---------
gbif_0004998 | 327,316     | MR     | 28.28s       | 28.529s
             |             | Tez    | 16.631s      | 6.438s
gbif_0004991 | 6,914,665   | MR     | 481.311s     | 38.903s
             |             | Tez    | 252.548s     | 17.258s
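To express the summary in relative terms, the speedup of Tez over MR is the MR time divided by the Tez time for the same statement. A small sketch computing it from the table values (the `timings` dictionary and `speedup` function are illustrative names, not part of Hive):

```python
# "Time taken" values in seconds from the summary table, as (MR, Tez) pairs.
timings = {
    ("gbif_0004998", "create table"): (28.28, 16.631),
    ("gbif_0004998", "query"): (28.529, 6.438),
    ("gbif_0004991", "create table"): (481.311, 252.548),
    ("gbif_0004991", "query"): (38.903, 17.258),
}


def speedup(mr_seconds: float, tez_seconds: float) -> float:
    """How many times faster Tez was than MR for the same statement."""
    return mr_seconds / tez_seconds


for (table, step), (mr, tez) in timings.items():
    print(f"{table:<13} {step:<13} MR {mr:>8.3f}s  Tez {tez:>8.3f}s  "
          f"speedup {speedup(mr, tez):.2f}x")
```

With these runs Tez comes out roughly 1.7x to 1.9x faster for the CTAS statements and about 2.3x to 4.4x faster for the count queries; given the single-machine setup, treat the ratios as indicative only.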