spark on hive pitfalls

Setting up the Hadoop environment

Skipped.

Setting up the Hive environment

Skipped.

Setting up Spark

  1. Mind version compatibility

Check which Spark version your Hive release was built against in Hive's pom.xml (a quick check is sketched below).
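A minimal way to check, assuming you have the source tarball for your Hive release (e.g. apache-hive-2.3.6-src) unpacked; the spark.version property in the root pom.xml names the Spark version that release targets:

    # Run inside the unpacked Hive source tree.
    # The root pom.xml declares the Spark version this Hive release was built against.
    grep -m1 '<spark.version>' pom.xml
    # For Hive 2.3.6 this prints: <spark.version>2.0.0</spark.version>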

  2. Download the spark-2.0.0-bin-hadoop2.4-without-hive build; the without-hive variant is mandatory, because a Spark build that bundles its own Hive jars will conflict with the Hive installation it is supposed to serve.

My versions: Hive 2.3.6, Spark spark-2.0.0-bin-hadoop2.4-without-hive, Hadoop 2.7.
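A download sketch, assuming the tarball is still hosted on the Apache archive (the URL below follows the usual archive.apache.org layout, so verify it before relying on it):

    # Fetch the without-hive build and unpack it.
    wget https://archive.apache.org/dist/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.4-without-hive.tgz
    tar -xzf spark-2.0.0-bin-hadoop2.4-without-hive.tgz
    export SPARK_HOME="$PWD/spark-2.0.0-bin-hadoop2.4-without-hive"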

  3. Copy jars into Hive's lib directory

    • cp scala-library-*.jar /hive_home/lib/
    • cp spark-core_*.jar /hive_home/lib/
    • cp spark-network-common_*.jar /hive_home/lib/
    • likewise: chill-java, chill, jackson-module-paranamer, jackson-module-scala, jersey-container-servlet-core,
    • jersey-server, json4s-ast, kryo-shaded, minlog, scala-xml, spark-launcher,
    • spark-network-shuffle, spark-unsafe, xbean-asm5-shaded

    All of these come from Spark's jars folder (a loop that copies them in one go is sketched below).
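A sketch that does the whole copy in one pass; SPARK_HOME and HIVE_HOME are assumed to point at your Spark and Hive installs, and the trailing underscores keep the globs from matching more than the intended artifacts:

    # Copy the Spark runtime jars Hive needs into Hive's lib directory.
    # The wildcards absorb Scala/version suffixes (e.g. spark-core_2.11-2.0.0.jar).
    for j in scala-library spark-core_ spark-network-common_ chill-java chill_ \
             jackson-module-paranamer jackson-module-scala_ jersey-container-servlet-core \
             jersey-server json4s-ast_ kryo-shaded minlog scala-xml_ spark-launcher_ \
             spark-network-shuffle_ spark-unsafe_ xbean-asm5-shaded; do
      cp "$SPARK_HOME"/jars/${j}*.jar "$HIVE_HOME/lib/"
    done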

  4. Configure hive-site.xml

    <property>
        <name>hive.enable.spark.execution.engine</name>
        <value>true</value>
    </property>
    <property>
        <name>spark.master</name>
        <!-- <value>spark://localhost:7077</value> -->
        <value>local</value>
        <description/>
    </property>
    <property>
        <name>hive.execution.engine</name>
        <value>spark</value>
        <description>
          Expects one of [mr, tez, spark].
          Chooses execution engine. Options are: mr (Map reduce, default), tez, spark. While MR
          remains the default engine for historical reasons, it is itself a historical engine
          and is deprecated in Hive 2 line. It may be removed without further warning.
        </description>
    </property>
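With the config in place, a quick smoke test from the shell shows whether the engine switch took effect; "src" is a hypothetical table name here, so substitute any small table you actually have:

    # Print the active engine and force a real job.
    hive -e 'set hive.execution.engine; select count(*) from src;'
    # The first line should print hive.execution.engine=spark, and the query
    # should report Spark job progress instead of MapReduce stages.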

Hadoop pitfalls

If fewer than one DataNode is alive, jobs cannot be submitted. The cause is an inconsistent HDFS filesystem, typically from formatting the NameNode more than once, an unclean shutdown, or a network change; reformatting in particular gives the NameNode a new clusterID while the DataNodes keep the old one in their data directories, so they can no longer register.
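A quick way to confirm the live count first (hdfs dfsadmin is part of the stock Hadoop CLI):

    # Cluster health report; the "Live datanodes" line gives the count.
    hdfs dfsadmin -report | grep -i 'live datanodes'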

Fix:

Find the DataNode data directory configured in hdfs-site.xml:

<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/hadoop/data/dfs/datanode</value>
</property>

Wipe everything under /hadoop/data (on every affected node), then rerun hadoop namenode -format to reinitialize the filesystem; the full sequence is sketched below.
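A sketch of the recovery sequence; /hadoop/data is the data root from the property above, and this destroys all HDFS data, so only do it on a cluster you can afford to re-create:

    # Stop HDFS, wipe the data dirs, reformat, restart, verify registration.
    stop-dfs.sh
    rm -rf /hadoop/data/*        # repeat on every node that hosts this directory
    hadoop namenode -format
    start-dfs.sh
    hdfs dfsadmin -report | grep -i 'live datanodes'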