hive on spark 测试

准备工作: 建库:

create database db2;

建表:

sql语句:

create table tb1(id int , name string);

测试1:插入单条数据

insert into tb1(id, name) values(10, 'linjb');

执行结果:1秒

Status: Running (Hive on Spark job[1])
Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED
--------------------------------------------------------------------------------------
Stage-1 ........         0      FINISHED      1          1        0        0       0
--------------------------------------------------------------------------------------
STAGES: 01/01    [==========================>>] 100%  ELAPSED TIME: 1.00 s
--------------------------------------------------------------------------------------
Status: Finished successfully in 1.00 seconds
Loading data to table db2.tb1

测试2:修改表结构

alter table tb1 change id userid int ;

不经过spark

测试3:从物理机文件导入数据

data.txt

201,dd,3,1,guangzhou
123,aa,2,0,hangzhou
245,bb,3,1,beijing
789,cc,2,0,shanghai
201,dd,3,1,guangzhou
123,aa,2,0,hangzhou
245,bb,3,1,beijing
789,cc,2,0,shanghai
201,dd,3,1,guangzhou
---建表
create table tb4(id int, name string, age int, tel string, city string) row format delimited fields terminated by ',' stored as textfile;
---导入数据
load data local inpath '/root/data.txt' into table tb2;
---从hdfs导入
load data inpath '/user/hive/warehouse/db2.db/data.txt' into table tb2;

文件大小:1.9G 执行耗时:11秒

测试4:统计查询

select count(*) from tb4;

使用mr引擎:无法执行出结果,等待507s后手动取消任务, 使用spark引擎:35秒执行结束

总结:使用spark引擎替换mr引擎可以很大提升sql语句执行效率