Hive installation on Windows 10





Hive Introduction


My previous posts outlined Hadoop and HBase and walked through installing them on Windows. We saw that Hadoop performs only batch processing and accesses data sequentially, which means the entire dataset has to be scanned even for the simplest job.

In such a scenario, processing a huge dataset produces another huge dataset, which must also be processed sequentially. At this point, a new solution is needed to access any point of data in a single unit of time (random access). HBase can store massive amounts of data, from terabytes to petabytes, and allows fast random reads and writes that Hadoop by itself cannot handle.

HBase is an open source, non-relational (NoSQL), distributed, column-oriented database that runs on top of HDFS and provides real-time read/write access to those large datasets. It is modeled after Google's Bigtable, is written primarily in Java, and is designed to provide quick random access to huge amounts of data.

Next in this series, we will walk through Apache Hive. Hive is a data warehouse infrastructure that works on top of the Hadoop Distributed File System (HDFS) and MapReduce to encapsulate Big Data, and it makes querying and analysis much easier. It is often used as an ETL tool in the Hadoop ecosystem and lets developers write Hive Query Language (HQL) statements that are very similar to SQL statements.

Hive Installation


In brief, Hive is a data warehouse software project built on top of Hadoop that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Before moving ahead, it is essential to install Hadoop first. I am assuming Hadoop is already installed; if not, see my previous post on how to install Hadoop on Windows.

I installed Hive 2.1.0 on top of a Derby metastore (10.12.1.1), though you can use any stable version.

Download Hive 2.1.0
  • https://archive.apache.org/dist/hive/hive-2.1.0/


Download Derby Metastore 10.12.1.1
  • https://archive.apache.org/dist/db/derby/db-derby-10.12.1.1/


Download hive-site.xml
  • https://mindtreeonline-my.sharepoint.com/:u:/g/personal/m1045767_mindtree_com1/EbsE-U5qCIhIpo8AmwuOzwUBstJ9odc6_QA733OId5qWOg?e=2X9cfX
  • https://drive.google.com/file/d/1qqAo7RQfr5Q6O-GTom6Rji3TdufP81zd/view?usp=sharing


STEP - 1: Extract the Hive file


Extract the file apache-hive-2.1.0-bin.tar.gz and place it under "D:\Hive" (you can use any preferred location).

[1] The first extraction produces a tar file.

[2] Go into the apache-hive-2.1.0-bin.tar folder and extract again.

[3] Move the innermost folder “apache-hive-2.1.0-bin” to the root folder "D:\Hive" and remove all other files and folders.
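If you prefer to script the extraction instead of using an archive tool, the sketch below shows one way to do it with Python's standard tarfile module. It reads the gzip and tar layers in a single pass, so there is no intermediate tar file to extract a second time. The function name extract_release is my own, and the paths in the usage note are the ones used in this post; adjust them to your setup.

```python
import tarfile
from pathlib import Path


def extract_release(archive: Path, dest_root: Path) -> Path:
    """Unpack a .tar.gz release under dest_root and return its top-level folder."""
    dest_root.mkdir(parents=True, exist_ok=True)
    # "r:gz" reads the gzip layer and the tar layer in one pass,
    # so no intermediate .tar file is left behind.
    with tarfile.open(archive, "r:gz") as tar:
        top_level = {
            Path(name).parts[0] for name in tar.getnames() if Path(name).parts
        }
        tar.extractall(dest_root)
    if len(top_level) != 1:
        raise ValueError(f"expected one top-level folder, found {sorted(top_level)}")
    return dest_root / top_level.pop()
```

For example, extract_release(Path(r"D:\Downloads\apache-hive-2.1.0-bin.tar.gz"), Path(r"D:\Hive")) should leave the release in D:\Hive\apache-hive-2.1.0-bin. The download folder here is only an assumption.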

STEP - 2: Extract the Derby file


As with Hive, extract the file db-derby-10.12.1.1-bin.tar.gz and place it under "D:\Derby" (you can use any preferred location).

STEP - 3: Moving hive-site.xml file


Copy the downloaded file “hive-site.xml” into the Hive configuration folder “D:\Hive\apache-hive-2.1.0-bin\conf”.

STEP - 4: Moving Derby libraries


Next, copy all the Derby libraries into the Hive library folder:
[1] Go to the library folder under the Derby location, D:\Derby\db-derby-10.12.1.1-bin\lib.

[2] Select all the libraries and copy them.

[3] Go to the library folder under the Hive location, D:\Hive\apache-hive-2.1.0-bin\lib.

[4] Paste the copied libraries here.
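The copy in this step can also be scripted. The following is a minimal sketch, assuming you only need the .jar files from the Derby lib folder; the helper name copy_jars is hypothetical. It skips any file that already exists in the target, which is a cautious choice of mine rather than something the manual copy does.

```python
import shutil
from pathlib import Path


def copy_jars(src_lib: Path, dest_lib: Path) -> list[str]:
    """Copy every .jar in src_lib into dest_lib and return the names copied."""
    copied = []
    for jar in sorted(src_lib.glob("*.jar")):
        target = dest_lib / jar.name
        if target.exists():
            # Leave existing files alone so nothing already in the
            # Hive lib folder is overwritten.
            continue
        shutil.copy2(jar, target)
        copied.append(jar.name)
    return copied
```

With the folders from this post, the call would be copy_jars(Path(r"D:\Derby\db-derby-10.12.1.1-bin\lib"), Path(r"D:\Hive\apache-hive-2.1.0-bin\lib")).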



STEP - 5: Configure Environment variables


Set the following environment variables (User variables) on Windows 10. To reach the dialog, go to This PC -> Right Click -> Properties -> Advanced System Settings -> Advanced -> Environment Variables.
  • HIVE_HOME - D:\Hive\apache-hive-2.1.0-bin
  • HIVE_BIN - D:\Hive\apache-hive-2.1.0-bin\bin
  • HIVE_LIB - D:\Hive\apache-hive-2.1.0-bin\lib
  • DERBY_HOME - D:\Derby\db-derby-10.12.1.1-bin
  • HADOOP_USER_CLASSPATH_FIRST - true
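A quick way to confirm that the variables were picked up is to check them from a new command prompt (windows opened before the change will not see the new values). The sketch below is only an illustration: it takes the variable names from the list above, and the function name check_environment is my own.

```python
import os
from pathlib import Path

# Names come from the list above. HADOOP_USER_CLASSPATH_FIRST is a flag, not a path.
PATH_VARS = ("HIVE_HOME", "HIVE_BIN", "HIVE_LIB", "DERBY_HOME")


def check_environment(env) -> dict[str, str]:
    """Return {variable: problem} for every variable that looks wrong."""
    problems = {}
    for name in PATH_VARS:
        value = env.get(name)
        if not value:
            problems[name] = "not set"
        elif not Path(value).is_dir():
            problems[name] = f"folder does not exist: {value}"
    if env.get("HADOOP_USER_CLASSPATH_FIRST", "").lower() != "true":
        problems["HADOOP_USER_CLASSPATH_FIRST"] = "expected the value 'true'"
    return problems


if __name__ == "__main__":
    for name, problem in check_environment(os.environ).items():
        print(f"{name}: {problem}")
```

If the script prints nothing, all five variables are set and the four folders exist.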

STEP - 6: Configure System variables


Next, set the System variables and add the Hive and Derby bin directories to Path:

HADOOP_USER_CLASSPATH_FIRST - true
Variable: Path
Value:
  1. D:\Hive\apache-hive-2.1.0-bin\bin
  2. D:\Derby\db-derby-10.12.1.1-bin\bin
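To confirm that the Path entries work, you can check whether the commands used later in this post resolve from a new command prompt. This is a small sketch using shutil.which from the Python standard library; the helper name missing_commands is hypothetical, and the command names are the ones typed in the later steps.

```python
import shutil


def missing_commands(commands) -> list[str]:
    """Return the commands that cannot be found on the current PATH."""
    return [name for name in commands if shutil.which(name) is None]


if __name__ == "__main__":
    # "hive" and "startNetworkServer" are the commands typed later in this post.
    for name in missing_commands(["hive", "startNetworkServer"]):
        print(f"{name} was not found on PATH")
```

On Windows, shutil.which also tries the extensions listed in PATHEXT, so script files such as .cmd and .bat are matched without typing the extension.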

STEP - 7: Working with hive-site.xml


Now cross-check the Hive configuration file, hive-site.xml, for the Derby details.

[1] Edit the file D:/Hive/apache-hive-2.1.0-bin/conf/hive-site.xml, paste the XML below, and save the file.

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.ClientDriver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>hive.server2.enable.impersonation</name>
    <description>Enable user impersonation for HiveServer2</description>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
    <description>Client authentication types. NONE: no authentication check LDAP: LDAP/AD based authentication KERBEROS: Kerberos/GSSAPI authentication CUSTOM: Custom authentication provider (Use with property hive.server2.custom.authentication.class)</description>
  </property>
  <property>
    <name>datanucleus.autoCreateTables</name>
    <value>True</value>
  </property>
</configuration>
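After saving the file, it is worth checking that it is still well-formed XML and that the Derby host and port are what you expect, since a typo here is easy to miss. The sketch below parses a configuration of this shape with Python's standard library. The function names are my own, and the fallback port 1527 is Derby's default network server port.

```python
import xml.etree.ElementTree as ET


def read_hive_properties(xml_text: str) -> dict[str, str]:
    """Map each <name> to its <value> in a Hadoop-style configuration document."""
    root = ET.fromstring(xml_text)  # raises ParseError if the XML is malformed
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def derby_endpoint(jdbc_url: str) -> tuple[str, int]:
    """Extract (host, port) from a URL like jdbc:derby://host:port/db;attr=value."""
    prefix = "jdbc:derby://"
    if not jdbc_url.startswith(prefix):
        raise ValueError("not a Derby network-client URL")
    authority = jdbc_url[len(prefix):].split("/", 1)[0]
    host, _, port = authority.partition(":")
    # Fall back to Derby's default network server port when none is given.
    return host, int(port) if port else 1527
```

Reading the file with Path(...).read_text() and passing the result to read_hive_properties gives a dictionary you can inspect. For the configuration above, derby_endpoint applied to the javax.jdo.option.ConnectionURL value yields the host localhost and the port 1527.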

STEP - 8: Start the Hadoop


Hadoop must be started first.

Open a command prompt, change directory to “D:\Hadoop\hadoop-2.8.0\sbin”, and type "start-all.cmd" to start Hadoop.

It will open four cmd instances for the following tasks:
  • Hadoop DataNode
  • Hadoop NameNode
  • YARN NodeManager
  • YARN ResourceManager



It can also be verified in a browser:
  • NameNode (HDFS) - http://localhost:50070
  • DataNode - http://localhost:50075
  • All Applications (cluster) - http://localhost:8088
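The same check can be done without a browser by testing whether each web UI port accepts a TCP connection. This is a minimal sketch: the ports are taken from the URLs above (they are the Hadoop 2.x defaults), and the function name port_is_open is my own.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports taken from the URLs listed above (Hadoop 2.x defaults).
HADOOP_WEB_UIS = {"NameNode": 50070, "DataNode": 50075, "ResourceManager": 8088}

if __name__ == "__main__":
    for service, port in HADOOP_WEB_UIS.items():
        state = "up" if port_is_open("localhost", port) else "not reachable"
        print(f"{service} (port {port}): {state}")
```

An open port only shows that something is listening there, so treat it as a quick sanity check rather than proof that the daemon is healthy.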

Because the ‘start-all.cmd’ command is deprecated, you can instead run the following commands in order:
  • “start-dfs.cmd” and
  • “start-yarn.cmd”


STEP - 9: Start Derby server


Once Hadoop is running, change directory to “D:\Derby\db-derby-10.12.1.1-bin\bin” and type “startNetworkServer -h 0.0.0.0” to start the Derby server.
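Derby can take a moment to start listening, and launching Hive too early is one possible cause of metastore connection errors. The sketch below polls a port until it accepts connections. The function name wait_for_port and the retry values are assumptions of mine; 1527 is the port used in the hive-site.xml connection URL in this post.

```python
import socket
import time


def wait_for_port(host: str, port: int, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll host:port until a TCP connection succeeds or the attempts run out."""
    for attempt in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=delay):
                return True
        except OSError:
            # Sleep between attempts, but not after the final one.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```

For example, wait_for_port("localhost", 1527) returns True as soon as the Derby network server is reachable, and False if it is still unreachable after ten attempts.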

STEP - 10: Start the Hive


With the Derby server started and ready to accept connections, open a new command prompt with administrator privileges and move to the Hive directory “D:\Hive\apache-hive-2.1.0-bin\bin”.

[1] Type “jps -m” to check that NetworkServerControl is running.

[2] Type “hive” to start the Hive shell.

Congratulations, Hive is installed! 😊

STEP - 11: Some hands-on activities


[1] Create a database in Hive:
CREATE DATABASE IF NOT EXISTS TRAINING;

[2] Show databases:
SHOW DATABASES;

[3] Create a Hive table:
CREATE TABLE IF NOT EXISTS testhive(col1 char(10), col2 char(20));

[4] Describe a table (here a table named students, which must already exist):
DESCRIBE students;

[5] Use the LOAD command to insert data into Hive tables. Create a sample text file using the ‘|’ delimiter, then load it into the table.

[6] Select data from a table:
SELECT * FROM STUDENTS;
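The statements for steps [4] to [6] depend on a students table and a sample file that are not reproduced as text in this post. As a rough illustration, the sketch below writes a small ‘|’-delimited file and builds HiveQL statements that would create a matching table, load the file, and query it. The column names, sample rows, and function names are all hypothetical; only the table name and the delimiter come from the steps above.

```python
from pathlib import Path

# Hypothetical sample rows; the original file contents are not shown in this post.
ROWS = [(1, "Asha", "Physics"), (2, "Ravi", "Chemistry")]


def write_pipe_delimited(path: Path, rows) -> None:
    """Write one row per line, separating fields with '|'."""
    lines = ["|".join(str(field) for field in row) for row in rows]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def build_statements(table: str, data_file: Path) -> list[str]:
    """Return HiveQL that creates a '|'-delimited text table and loads the file."""
    return [
        # Column names are assumptions chosen to match ROWS above.
        f"CREATE TABLE IF NOT EXISTS {table} (id INT, name STRING, subject STRING) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE;",
        # as_posix() gives forward slashes, avoiding backslashes in the HiveQL string.
        f"LOAD DATA LOCAL INPATH '{data_file.as_posix()}' INTO TABLE {table};",
        f"SELECT * FROM {table};",
    ]
```

Printing the result of build_statements("students", Path("D:/data/students.txt")) gives statements you can paste into the Hive shell one at a time. The D:/data location is only an example.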


Stay in touch for more posts

13 comments:

  1. Thanks! I'll try to come up with more articles. Meanwhile, I went through your training website www.fitaacademy.com.
    It covers almost all courses and looks great!
  2. I recently came across your blog and have been reading along. I thought I would leave my first comment. I don’t know what to say except that I have enjoyed reading.
  3. I followed the process as described. Whenever I type hive in cmd, the output is "The syntax of the command is incorrect. File Not Found. Not a Valid JAR: C:/Users/Dell/org.apache.hive.beeline.cli.HiveCli"
    My Hadoop and Hive homes are on the E: drive and I have double-checked every variable.
  4. Hi Shwetabh,
    Sorry for the delayed response.

    I trust you went through the same steps as I stated in the article. Please cross-check and follow them again, and don't forget that a Hadoop installation is required before installing Hive.

    If you missed the Hadoop installation, please visit that post and let me know if you still face difficulties.
  5. Hi,
    I followed the exact steps given above,
    but I get this error: FAILED: HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    while executing "create database training"
  6. I am also getting the same exception: FAILED: HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
  7. Hi, I am getting below issue once I run the hive on the cmd:

    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/C:/Hive/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/C:/Users/dream/Downloads/hadoop-3.1.0/hadoop-3.1.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
    Connecting to jdbc:hive2://
    Error applying authorization policy on hive configuration: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    Beeline version 2.1.0 by Apache Hive
    Error applying authorization policy on hive configuration: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    Connection is already closed.

    Not sure what is causing this.

    Replies
    1. Me too! Did you resolve the problem?
    2. I have the same problem. Did you find a solution?
    3. I have solved the problem. If you are using Hive 2.1.0 and your Hadoop version is 3.0 or above, it will not run on Windows. Install Hadoop 2.8.0, then run Hive, and it will work.
  8. This is a really useful blog. By following it, I installed Hive on Windows 10 successfully, but you need to follow the steps carefully as he describes them. Thanks a lot for your contribution, @Rajendra kumar.