Deploying the Test Environment
Hive Environment Deployment
Hive is deployed locally in Docker:

sudo docker run -d --name hadoop-master -P -h hadoop-master teradatalabs/cdh5-hive

Connect to Hive with Beeline for testing
Beeline connects successfully:
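Since the container publishes its exposed ports with `-P` (random host ports), the mapped ports and a shell inside the container can be found with standard Docker commands. A minimal sketch, assuming the container name from the run command above:

```shell
# List the host ports Docker assigned to the container's exposed ports
# (HiveServer2 listens on 10000 inside the container).
sudo docker port hadoop-master

# Open a shell inside the container to run hive / beeline directly.
sudo docker exec -it hadoop-master bash
```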
# hive --service beeline
Beeline version 1.1.0-cdh5.9.1 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000
scan complete in 3ms
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000:
Enter password for jdbc:hive2://localhost:10000:
Connected to: Apache Hive (version 1.1.0-cdh5.9.1)
Driver: Hive JDBC (version 1.1.0-cdh5.9.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000>

After connecting successfully, a driver JAR must be added before Hive can operate on ES data.
Download the driver: elasticsearch-hadoop-6.3.2.jar
0: jdbc:hive2://localhost:10000> ADD JAR /home/iteblog/elasticsearch-hadoop-6.3.2.jar;
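Note that the path passed to ADD JAR must exist inside the Hive container's filesystem, not on the host. A minimal sketch for copying the downloaded JAR in, assuming the container name from the run command above:

```shell
# Copy the connector JAR from the host into the running Hive container,
# so that ADD JAR /home/iteblog/elasticsearch-hadoop-6.3.2.jar can find it.
sudo docker cp elasticsearch-hadoop-6.3.2.jar hadoop-master:/home/iteblog/
```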
ES Environment Deployment
ES is deployed on an Alibaba Cloud host, 47.106.151.207.

# ES
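The original ES startup command survives only as a comment. A minimal sketch of a single-node Docker deployment, assuming the official 6.3.2 image and mounting the elasticsearch.yml shown in the next section (image tag and mount path are assumptions):

```shell
# Run a single-node Elasticsearch 6.3.2 in Docker, mounting the custom
# config file over the image's default elasticsearch.yml.
sudo docker run -d --name elasticsearch \
  -p 9200:9200 -p 9300:9300 \
  -v "$PWD/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml" \
  docker.elastic.co/elasticsearch/elasticsearch:6.3.2
```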
ES Deployment Pitfalls
Single-node deployment configuration file elasticsearch.yml:
cluster.name: "docker-cluster"
network.host: 0.0.0.0
http.cors.enabled: true
http.cors.allow-origin: "*"
# minimum_master_nodes need to be explicitly set when bound on a public IP
# set to 1 to allow single node clusters
# Details: https://github.com/elastic/elasticsearch/pull/17288
discovery.zen.minimum_master_nodes: 1

ES-Head cannot create an index
Inspecting the browser's network requests revealed the error: {"error":"Content-Type header [application/x-www-form-urlencoded] is not supported","status":406}
Detailed fix
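The 406 occurs because Elasticsearch 6.x enforces strict Content-Type checking, while older ES-Head builds send application/x-www-form-urlencoded. As a quick check that the cluster itself is fine, the same kind of request succeeds once the JSON header is set explicitly (a sketch; the index name is illustrative):

```shell
# Creating an index works when an explicit JSON Content-Type is sent,
# confirming the 406 comes from the missing header, not the cluster.
curl -X PUT -H 'Content-Type: application/json' \
     http://47.106.151.207:9200/test-index
```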
HIVE Reads Data from ES
Store data in ES
First, refer to the ES Quick Start guide.
1. Create an index
Create an Elasticsearch index named iteblog with type user; user has four fields: regtime, uid, mobile, and username.
PUT http://47.106.151.207:9200/iteblog
{ ... }
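The body of this PUT request survives only as an opening brace. A hypothetical mapping consistent with the regtime/uid/mobile/username schema described above (the field types are assumptions):

```shell
# Hypothetical mapping body for the iteblog index; the original body was
# lost in extraction, so types are assumed from the schema description.
MAPPING='{
  "mappings": {
    "user": {
      "properties": {
        "regtime":  { "type": "keyword" },
        "uid":      { "type": "long" },
        "mobile":   { "type": "keyword" },
        "username": { "type": "keyword" }
      }
    }
  }
}'

# Sanity-check that the body is valid JSON before sending it.
echo "$MAPPING" | python3 -m json.tool > /dev/null && echo "body ok"

# Send it to the cluster (uncomment when the host is reachable):
# curl -s -X PUT -H 'Content-Type: application/json' \
#      -d "$MAPPING" http://47.106.151.207:9200/iteblog
```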
2. Insert test data into ES
POST http://47.106.151.207:9200/iteblog/user
{ ... }
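The body of this POST also survives only as an opening brace, but its contents can be reconstructed from the row the Hive query later reads back (regtime 20190412, uid 1, mobile 13310667877, username vin):

```shell
# Test document reconstructed from the row Hive later returns.
DOC='{
  "regtime": "20190412",
  "uid": 1,
  "mobile": "13310667877",
  "username": "vin"
}'

# Validate the body locally before sending.
echo "$DOC" | python3 -m json.tool > /dev/null && echo "body ok"

# Send it (uncomment when the host is reachable):
# curl -s -X POST -H 'Content-Type: application/json' \
#      -d "$DOC" http://47.106.151.207:9200/iteblog/user
```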
3. Create the Hive table
The CREATE TABLE statement is as follows:
CREATE EXTERNAL TABLE `user`(
register_time string,
user_id int,
mobile string,
username string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.resource' = 'iteblog/user',
'es.nodes' = '47.106.151.207',
'es.port' = '9200',
'es.nodes.wan.only' = 'true',
'es.mapping.names' = 'register_time:regtime,user_id:uid'
);

The es.mapping.names property maps the Hive columns register_time and user_id to the ES fields regtime and uid. Test whether the ES data can be read:
0: jdbc:hive2://localhost:10000> select * from user;
INFO : Compiling command(queryId=hive_20190412034747_e895d44d-ee7c-4363-bbeb-41b287734595): select * from user
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:user.register_time, type:string, comment:null), FieldSchema(name:user.user_id, type:int, comment:null), FieldSchema(name:user.mobile, type:string, comment:null), FieldSchema(name:user.username, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20190412034747_e895d44d-ee7c-4363-bbeb-41b287734595); Time taken: 0.602 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hive_20190412034747_e895d44d-ee7c-4363-bbeb-41b287734595): select * from user
INFO : Completed executing command(queryId=hive_20190412034747_e895d44d-ee7c-4363-bbeb-41b287734595); Time taken: 0.003 seconds
INFO : OK
+---------------------+---------------+--------------+----------------+--+
| user.register_time | user.user_id | user.mobile | user.username |
+---------------------+---------------+--------------+----------------+--+
| 20190412 | 1 | 13310667877 | vin |
+---------------------+---------------+--------------+----------------+--+
1 row selected (1.987 seconds)
The ES data is read successfully.
HIVE Writes Data to ES
Next, construct a row in Hive and insert it into the external table user created above, to see whether the data shows up in ES.
Writing to the ES-backed external table:
0: jdbc:hive2://localhost:10000> insert into table user values('20190411', 2, '13356583817', 'xiaoming')
INFO : Compiling command(queryId=hive_20190412045959_9a8ef2eb-da0c-42d8-9a64-72ebe7dd9701): insert into table user values('20190411', 2, '13356583817', 'xiaoming')
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_col0, type:string, comment:null), FieldSchema(name:_col1, type:int, comment:null), FieldSchema(name:_col2, type:string, comment:null), FieldSchema(name:_col3, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20190412045959_9a8ef2eb-da0c-42d8-9a64-72ebe7dd9701); Time taken: 0.207 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hive_20190412045959_9a8ef2eb-da0c-42d8-9a64-72ebe7dd9701): insert into table user values('20190411', 2, '13356583817', 'xiaoming')
INFO : Query ID = hive_20190412045959_9a8ef2eb-da0c-42d8-9a64-72ebe7dd9701
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-0:MAPRED] in serial mode
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : number of splits:1
INFO : Submitting tokens for job: job_1554983306126_0002
INFO : The url to track the job: http://hadoop-master:8088/proxy/application_1554983306126_0002/
INFO : Starting Job = job_1554983306126_0002, Tracking URL = http://hadoop-master:8088/proxy/application_1554983306126_0002/
INFO : Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1554983306126_0002
INFO : Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
INFO : 2019-04-12 04:59:26,997 Stage-0 map = 0%, reduce = 0%
INFO : 2019-04-12 04:59:35,953 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 2.38 sec
INFO : MapReduce Total cumulative CPU time: 2 seconds 380 msec
INFO : Ended Job = job_1554983306126_0002
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-0: Map: 1 Cumulative CPU: 2.38 sec HDFS Read: 3950 HDFS Write: 0 SUCCESS
INFO : Total MapReduce CPU Time Spent: 2 seconds 380 msec
INFO : Completed executing command(queryId=hive_20190412045959_9a8ef2eb-da0c-42d8-9a64-72ebe7dd9701); Time taken: 21.019 seconds
INFO : OK
No rows affected (21.239 seconds)

Confirm that the new document made it into ES:
GET http://47.106.151.207:9200/_search
The response shows the new data was written successfully.
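Instead of scanning the whole cluster with _search, the newly inserted row can also be checked directly with a URI search (a sketch using the same host and index):

```shell
# Look up the freshly inserted document by username.
curl -s 'http://47.106.151.207:9200/iteblog/user/_search?q=username:xiaoming'
```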
Further Testing
Insert write test
Update write test
Delete write test
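For the update test, elasticsearch-hadoop can be pointed at a document-id column so that repeated Hive inserts upsert the same ES document instead of appending duplicates. A sketch of the extra table properties, run through beeline (`es.write.operation` and `es.mapping.id` are standard elasticsearch-hadoop options; the table name is illustrative):

```shell
# Create an upsert-capable external table: rows with the same uid
# overwrite the existing ES document instead of creating a new one.
beeline -u jdbc:hive2://localhost:10000 -e "
CREATE EXTERNAL TABLE user_upsert(
  register_time string,
  user_id int,
  mobile string,
  username string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.resource' = 'iteblog/user',
  'es.nodes' = '47.106.151.207',
  'es.port' = '9200',
  'es.nodes.wan.only' = 'true',
  'es.write.operation' = 'upsert',
  'es.mapping.id' = 'user_id',
  'es.mapping.names' = 'register_time:regtime,user_id:uid'
);"
```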
References
[1] All-in Docker! Connecting to Hive with Hue
[2] Simple Docker deployment of ElasticSearch
[3] Simple Docker-based ElasticSearch cluster setup
[4] Writing data to ElasticSearch via Hive
[5] ES Quick Start
[6] How to add third-party JARs to Hive