hive on spark 测试
准备工作: 建库:
create database db2;
建表:
sql语句:
create table tb1(id int , name string);
测试1:插入单条数据
insert into tb1(id, name) values(10, 'linjb');
执行结果:1秒
Status: Running (Hive on Spark job[1])
Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
--------------------------------------------------------------------------------------
STAGES ATTEMPT STATUS TOTAL COMPLETED RUNNING PENDING FAILED
--------------------------------------------------------------------------------------
Stage-1 ........ 0 FINISHED 1 1 0 0 0
--------------------------------------------------------------------------------------
STAGES: 01/01 [==========================>>] 100% ELAPSED TIME: 1.00 s
--------------------------------------------------------------------------------------
Status: Finished successfully in 1.00 seconds
Loading data to table db2.tb1
测试2:修改表结构
alter table tb1 change id userid int ;
不经过spark
测试3:从物理机文件导入数据
data.txt
201,dd,3,1,guangzhou
123,aa,2,0,hangzhou
245,bb,3,1,beijing
789,cc,2,0,shanghai
201,dd,3,1,guangzhou
123,aa,2,0,hangzhou
245,bb,3,1,beijing
789,cc,2,0,shanghai
201,dd,3,1,guangzhou
---建表
create table tb4(id int, name string, age int, tel string, city string) row format delimited fields terminated by ',' stored as textfile;
---导入数据
load data local inpath '/root/data.txt' into table tb2;
---从hdfs导入
load data inpath '/user/hive/warehouse/db2.db/data.txt' into table tb2;
文件大小:1.9G 执行耗时:11秒
测试4:统计查询
select count(*) from tb4;
使用mr引擎:无法执行出结果,等待507s后手动取消任务, 使用spark引擎:35秒执行结束