Hive on Spark pitfalls
Set up the Hadoop environment
Skipped.
Set up the Hive environment
Skipped.
Set up Spark
- Watch out for version compatibility: the Spark version a given Hive release supports is pinned in Hive's pom.xml (see the grep sketch after this list).
- Download spark-2.0.0-bin-hadoop2.4-without-hive. A without-hive build is required, because a Spark distribution that bundles its own Hive classes conflicts with the jars Hive itself ships (a build sketch follows this list).
  My versions: Hive 2.3.6, Spark spark-2.0.0-bin-hadoop2.4-without-hive, Hadoop 2.7.
- Copy jars into Hive's lib directory (scripted as a loop after this list):
- cp scala-library-*.jar /hive_home/lib/
- cp spark-core_*.jar /hive_home/lib/
- cp spark-network-common_*.jar /hive_home/lib/
    - also copy: chill-java, chill, jackson-module-paranamer, jackson-module-scala, jersey-container-servlet-core, jersey-server, json4s-ast, kryo-shaded, minlog, scala-xml, spark-launcher, spark-network-shuffle, spark-unsafe, xbean-asm5-shaded
Copy all of the above from Spark's jars folder.
- Configure hive-site.xml (a quick smoke test for this setup follows below):

```xml
<property>
  <name>hive.enable.spark.execution.engine</name>
  <value>true</value>
</property>
<property>
  <name>spark.master</name>
  <!-- <value>spark://localhost:7077</value> -->
  <value>local</value>
</property>
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
  <description>
    Expects one of [mr, tez, spark].
    Chooses execution engine. Options are: mr (Map reduce, default), tez, spark.
    While MR remains the default engine for historical reasons, it is itself a
    historical engine and is deprecated in Hive 2 line. It may be removed
    without further warning.
  </description>
</property>
```
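To see which Spark version a Hive release was built against (first bullet above), grep the version property in the Hive source pom. A minimal sketch, assuming the Hive 2.3.6 source tarball is unpacked at ~/apache-hive-2.3.6-src (the path is an assumption):

```sh
# Hive's root pom.xml pins the Spark version that release supports;
# for Hive 2.3.x this prints 2.0.0.
grep -m1 "<spark.version>" ~/apache-hive-2.3.6-src/pom.xml
```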
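If no prebuilt without-hive tarball is available for your versions, the Hive on Spark documentation recommends building one from the Spark source: omit the hive profile so no Hive classes are bundled, and use hadoop-provided so the cluster's own Hadoop jars are used. A sketch from a Spark 2.0.0 source checkout; the --name value is arbitrary, and -Phadoop-2.4 matches the artifact named above:

```sh
# Build a Spark distribution with no Hive classes (no -Phive) and with
# Hadoop marked as provided, yielding a *-without-hive style tarball.
./dev/make-distribution.sh --name hadoop2.4-without-hive --tgz \
  -Pyarn -Phadoop-2.4 -Phadoop-provided
```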
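The jar-copy step above can be scripted instead of run one cp at a time. A sketch, assuming SPARK_HOME and HIVE_HOME are set to your install paths (both variables are assumptions):

```sh
# Copy the Spark jars Hive needs onto Hive's classpath.
# The trailing wildcard absorbs Scala/version suffixes such as _2.11-2.0.0;
# the chill pattern also re-matches chill-java, which is harmless.
for j in scala-library spark-core spark-network-common chill-java chill \
         jackson-module-paranamer jackson-module-scala \
         jersey-container-servlet-core jersey-server json4s-ast kryo-shaded \
         minlog scala-xml spark-launcher spark-network-shuffle spark-unsafe \
         xbean-asm5-shaded; do
  cp "$SPARK_HOME"/jars/"$j"*.jar "$HIVE_HOME"/lib/
done
```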
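With hive-site.xml configured as above, a quick smoke test confirms Hive really runs queries on Spark; some_table is a hypothetical table name, and the console output should show Spark stages rather than MapReduce jobs:

```sh
hive -e "set hive.execution.engine=spark; select count(*) from some_table;"
```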
Hadoop pitfalls
If fewer than one DataNode is alive (that is, none), jobs cannot be submitted. The cause is an inconsistent HDFS file system, typically from formatting the NameNode more than once, an unclean shutdown such as a power loss, or a change in the network setup.
Fix:
In hdfs-site.xml, locate the DataNode data directory:
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/hadoop/data/dfs/datanode</value>
</property>
Delete everything under /hadoop/data, then rerun hadoop namenode -format to re-initialize the file system; this clears the stale namespace/cluster IDs that repeated formats leave behind (full command sequence sketched below).
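A recovery sketch for the steps above, assuming a single-node setup where both NameNode and DataNode storage live under /hadoop/data (the /hadoop/data/dfs/namenode path for dfs.namenode.name.dir is an assumption; check your hdfs-site.xml). Warning: this wipes all HDFS data:

```sh
# Stop HDFS before touching its storage directories.
stop-dfs.sh

# Wipe DataNode storage (path from dfs.datanode.data.dir above) and
# NameNode storage (assumed path; verify dfs.namenode.name.dir first).
rm -rf /hadoop/data/dfs/datanode/* /hadoop/data/dfs/namenode/*

# Re-initialize the file system; the NameNode gets a fresh clusterID.
hadoop namenode -format

# Restart and confirm at least one live DataNode before submitting jobs.
start-dfs.sh
hdfs dfsadmin -report | grep -i "live datanodes"
```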