Hadoop installation on Windows 10

Hadoop - Introduction


This post focuses on how to install Hadoop in a Windows 10 environment rather than on the details of the framework, but it starts with a brief definition.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. 

Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

I used Hadoop 2.8.0 for this walkthrough, though you can use any stable version.

Download Hadoop 2.8.0
Download Java JDK 1.8.0
Check whether Java 1.8.0 is already installed on your system, using the command prompt:
java -version


If Java is not installed on your system, first install it under "D:\JAVA" or on your preferred drive.
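
If you prefer to script the Java check, the sketch below shows one way to do it. The helper names (parse_java_version, installed_java_version, is_java_8) are made up for this illustration; the only thing taken from the steps above is the "java -version" command itself.

```python
import re
import subprocess


def parse_java_version(output):
    """Extract the version string from `java -version` output, or None."""
    match = re.search(r'version "([^"]+)"', output)
    return match.group(1) if match else None


def installed_java_version():
    """Run `java -version` and return the reported version, or None if Java is missing."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None  # java is not on the Path
    # The JVM prints its version banner to stderr, so look at both streams.
    return parse_java_version(result.stderr + result.stdout)


def is_java_8(version):
    """True for version strings such as 1.8.0_171."""
    return version is not None and version.startswith("1.8.")
```

Calling is_java_8(installed_java_version()) then tells you whether the JDK install step can be skipped.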

STEP - 1: Extract the Hadoop file


Extract the file hadoop-2.8.0.tar.gz or Hadoop-2.8.0.zip and place it under "D:\Hadoop"; you can use any preferred location.

[1] After extraction you will again get a tar file.


[2] Go inside the Hadoop-2.8.0.tar folder and extract again.


[3] Move the leaf folder "hadoop-2.8.0" to the root folder "D:\Hadoop" and remove all other files and folders.



STEP - 2: Configure Environment variables


Set the HADOOP_HOME and JAVA_HOME environment variables (User variables) on Windows 10:

This PC -> Right Click -> Properties -> Advanced System Settings -> Advanced -> Environment Variables
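
After saving the variables, open a new command prompt so the change is picked up. A minimal sketch for checking them from Python follows; check_home_vars is a hypothetical helper written for this post, and it only verifies that each variable is set and points to an existing folder.

```python
import os


def check_home_vars(env=None, names=("HADOOP_HOME", "JAVA_HOME")):
    """Return a dict mapping each problematic variable to a short reason."""
    env = os.environ if env is None else env
    problems = {}
    for name in names:
        value = env.get(name)
        if not value:
            problems[name] = "not set"
        elif not os.path.isdir(value):
            problems[name] = "not a directory: " + value
    return problems
```

An empty result from check_home_vars() means both variables look usable.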



STEP - 3: Configure System variables



Next, edit the Path system variable and add the following entries, including the Hadoop bin directory and the Java bin directory:

Variable: Path
Value:
  • D:\Hadoop\hadoop-2.8.0\bin
  • D:\Hadoop\hadoop-2.8.0\sbin
  • D:\Hadoop\hadoop-2.8.0\share\hadoop\common\*
  • D:\Hadoop\hadoop-2.8.0\share\hadoop\hdfs
  • D:\Hadoop\hadoop-2.8.0\share\hadoop\hdfs\lib\*
  • D:\Hadoop\hadoop-2.8.0\share\hadoop\hdfs\*
  • D:\Hadoop\hadoop-2.8.0\share\hadoop\yarn\lib\*
  • D:\Hadoop\hadoop-2.8.0\share\hadoop\yarn\*
  • D:\Hadoop\hadoop-2.8.0\share\hadoop\mapreduce\lib\*
  • D:\Hadoop\hadoop-2.8.0\share\hadoop\mapreduce\*
  • D:\Hadoop\hadoop-2.8.0\share\hadoop\common\lib\*
  • D:\Java\jdk1.8.0_171\bin


STEP - 4: Create required folders


Create some dedicated folders:
  1. Create folder "data" under "D:\Hadoop\hadoop-2.8.0".
  2. Create folder "datanode" under "D:\Hadoop\hadoop-2.8.0\data".
  3. Create folder "namenode" under "D:\Hadoop\hadoop-2.8.0\data".
  4. Create a folder to store temporary data during execution of a project, such as "D:\Hadoop\hadoop-2.8.0\temp".
  5. Create a log folder, such as "D:\Hadoop\hadoop-2.8.0\userlog".
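
The same folders can be created in one go with a short script instead of by hand. This is only a sketch: create_hadoop_dirs and SUBFOLDERS are names invented here, and the folder names are the ones from the list above.

```python
from pathlib import Path

# Sub-folders from the list above, relative to the Hadoop home directory.
SUBFOLDERS = ["data/datanode", "data/namenode", "temp", "userlog"]


def create_hadoop_dirs(hadoop_home):
    """Create the working folders under hadoop_home and return their paths."""
    created = []
    for sub in SUBFOLDERS:
        folder = Path(hadoop_home) / sub
        # parents=True also creates "data"; exist_ok=True makes reruns harmless.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

For the layout used in this post the call would be create_hadoop_dirs(r"D:\Hadoop\hadoop-2.8.0").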


STEP - 5: Configure required XML files


Now configure four key files with the minimal required details:
  • core-site.xml
  • hdfs-site.xml
  • mapred-site.xml
  • yarn-site.xml

[1] Edit the file D:/Hadoop/hadoop-2.8.0/etc/hadoop/core-site.xml, paste the XML below, and save the file.

<configuration>

   <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
   </property>
</configuration>

[2] Rename "mapred-site.xml.template" to "mapred-site.xml", edit the file D:/Hadoop/hadoop-2.8.0/etc/hadoop/mapred-site.xml, paste the XML below, and save the file.

<configuration>

   <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
   </property>
</configuration>

[3] Edit the file D:/Hadoop/hadoop-2.8.0/etc/hadoop/hdfs-site.xml, paste the XML below, and save the file.

<configuration>

   <property>
       <name>dfs.replication</name>
       <value>1</value>
   </property>
   <property>
       <name>dfs.namenode.name.dir</name>
       <value>/D:/Hadoop/hadoop-2.8.0/data/namenode</value>
   </property>
   <property>
       <name>dfs.datanode.data.dir</name>
       <value>/D:/Hadoop/hadoop-2.8.0/data/datanode</value>
   </property>
</configuration>

[4] Edit the file D:/Hadoop/hadoop-2.8.0/etc/hadoop/yarn-site.xml, paste the XML below, and save the file.

<configuration>

   <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
   </property>
   <property>
       <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
       <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>
   <property>
       <name>yarn.nodemanager.log-dirs</name>
       <value>/D:/Hadoop/hadoop-2.8.0/userlog</value>
       <final>true</final>
   </property>
   <property>
       <name>yarn.nodemanager.local-dirs</name>
       <value>/D:/Hadoop/hadoop-2.8.0/temp/nm-localdir</value>
   </property>
</configuration>

[5] Edit the file D:/Hadoop/hadoop-2.8.0/etc/hadoop/hadoop-env.cmd and make sure the line "set JAVA_HOME=%JAVA_HOME%" is present, so that Hadoop picks up the JAVA_HOME environment variable set in STEP - 2.
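
A stray character pasted into one of the XML files usually only shows up later as a startup error. As a rough sanity check, the sketch below parses the content of a *-site.xml file into name/value pairs using Python's standard library; read_site_properties is a hypothetical helper, not part of Hadoop.

```python
import xml.etree.ElementTree as ET


def read_site_properties(xml_text):
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)  # raises ParseError on malformed XML
    if root.tag != "configuration":
        raise ValueError("root element must be <configuration>")
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props
```

Passing in the contents of core-site.xml from above should give a dict with the single key fs.defaultFS; a parse error or a missing key points to a copy-and-paste problem.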


STEP - 6: Manage Hadoop configuration



Time to manage the Hadoop configuration. Download the file Hadoop Configuration.zip.
  
4. Execute the namenode format command: at the command prompt, go to "D:\Hadoop\hadoop-2.8.0\bin" and then run "hdfs namenode -format".



STEP - 7: Start Hadoop


Time to start Hadoop. Open a command prompt, change directory to "D:\Hadoop\hadoop-2.8.0\sbin", and type "start-all.cmd" to start Hadoop.


It will open four cmd windows for the following tasks:
  • Hadoop Datanode
  • Hadoop Namenode
  • Yarn Nodemanager
  • Yarn Resourcemanager


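It can also be verified in a browser. With the default Hadoop 2.x ports, the NameNode web UI is at http://localhost:50070 and the YARN ResourceManager web UI is at http://localhost:8088.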
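
If no browser is at hand, a quick port probe gives a similar answer. The sketch below assumes the ports were left at their defaults: 9000 comes from fs.defaultFS in core-site.xml above, while 50070 (NameNode web UI) and 8088 (ResourceManager web UI) are the Hadoop 2.x defaults. The names port_open, HADOOP_PORTS and hadoop_status are made up for this example.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports assumed for a default single-node Hadoop 2.x setup.
HADOOP_PORTS = {
    "NameNode RPC (fs.defaultFS)": 9000,
    "NameNode web UI": 50070,
    "ResourceManager web UI": 8088,
}


def hadoop_status(host="localhost"):
    """Map each service label to True/False depending on whether its port answers."""
    return {label: port_open(host, port) for label, port in HADOOP_PORTS.items()}
```

A False entry in hadoop_status() suggests the matching cmd window did not start cleanly, so that window is the first place to look for an error message.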





Since the "start-all.cmd" command has been deprecated, you can use the commands below instead, in this order:
  • "start-dfs.cmd" and
  • "start-yarn.cmd"

STEP - 8: Stop Hadoop

To stop the services, execute a stop command such as:
  • stop-all.cmd
  • stop-dfs.cmd
  • stop-yarn.cmd
Congratulations, Hadoop is installed! 😊

STEP - 9: Some hands-on activity



For example, copy a file from the local disk to HDFS:
  • hadoop fs -mkdir -p /raj/data
  • hadoop fs -ls /raj
  • hadoop fs -copyFromLocal D:/testhdfs.xlsx /raj/data/
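
The same upload can be driven from a script, which is handy once you have more than one file. This is a minimal sketch, assuming the hadoop command is on the Path as set up in STEP - 3; upload_commands and run_all are hypothetical helpers. The -p flag on -mkdir is used so that the parent folder /raj is created if it does not exist yet.

```python
import os
import subprocess


def upload_commands(local_file, hdfs_dir):
    """Build the `hadoop fs` argument lists: make the directory, copy, then list."""
    return [
        ["hadoop", "fs", "-mkdir", "-p", hdfs_dir],
        ["hadoop", "fs", "-copyFromLocal", local_file, hdfs_dir],
        ["hadoop", "fs", "-ls", hdfs_dir],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        # On Windows the launcher is hadoop.cmd, so the call goes through the shell.
        subprocess.run(argv, shell=(os.name == "nt"), check=True)
```

With Hadoop running, run_all(upload_commands("D:/testhdfs.xlsx", "/raj/data")) repeats the steps above for the sample file.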

  
It can be verified through the web browser.


  
