Hadoop installation on Windows 10

Hadoop - Introduction


This post focuses on how to install Hadoop in a Windows 10 environment rather than on the details of the framework, but it starts with a brief definition.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. 

Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

I used Hadoop 2.8.0 for this walkthrough, though you can use any stable version.

Download Hadoop 2.8.0
Download Java JDK 1.8.0
Check whether Java 1.8.0 is already installed on your system, using the command prompt:
java -version


If Java is not installed on your system, first install Java under "D:\JAVA" or on your preferred drive.
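The Java check can also be scripted. The following is a minimal Python sketch, not part of the original walkthrough: the function names and the sample banner are illustrative assumptions. It relies on the fact that `java -version` writes its banner to stderr.

```python
import re
import subprocess
from typing import Optional


def parse_java_version(banner: str) -> Optional[str]:
    """Return the quoted version string from a `java -version` banner, or None."""
    match = re.search(r'version "([^"]+)"', banner)
    return match.group(1) if match else None


def installed_java_version() -> Optional[str]:
    """Run `java -version` and parse its banner (the JVM prints it to stderr)."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:  # java is not on the Path
        return None
    return parse_java_version(result.stderr + result.stdout)


# Sample banner, included only to illustrate the parsing step.
sample = 'java version "1.8.0_171"\nJava(TM) SE Runtime Environment (build 1.8.0_171-b11)'
print(parse_java_version(sample))  # prints 1.8.0_171
```

Calling `installed_java_version()` returns `None` when Java is missing from the Path, which is the case where the installation step above applies.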

STEP - 1: Extract the Hadoop file


Extract the file hadoop-2.8.0.tar.gz or hadoop-2.8.0.zip and place it under "D:\Hadoop"; you can use any preferred location.

[1] After extraction you will get a tar file again.


[2] Go inside the hadoop-2.8.0.tar folder and extract again.


[3] Copy the leaf folder "hadoop-2.8.0" to the root folder "D:\Hadoop" and remove all other files and folders.
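The extract-then-flatten steps above can be sketched in Python as follows. This is an illustrative alternative to doing it by hand, not the method used in the post; the function name and the `_unpack_tmp` scratch folder are assumptions. `shutil.unpack_archive` unpacks a .tar.gz in a single pass, so the second manual extraction is not needed here.

```python
import shutil
from pathlib import Path


def unpack_hadoop(archive, root, leaf="hadoop-2.8.0"):
    """Unpack `archive` and leave only the `leaf` folder directly under `root`."""
    root_dir = Path(root)
    target = root_dir / leaf
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    staging = root_dir / "_unpack_tmp"  # hypothetical scratch folder name
    staging.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive), str(staging))  # handles .tar.gz and .zip
    # Locate the leaf folder wherever the archive nested it (shallowest match).
    candidates = [p for p in staging.rglob(leaf) if p.is_dir()]
    if not candidates:
        raise FileNotFoundError(f"{leaf} not found inside {archive}")
    shallowest = min(candidates, key=lambda p: len(p.parts))
    shutil.move(str(shallowest), str(target))
    shutil.rmtree(staging)  # remove all other extracted files and folders
    return target


# Usage (paths taken from this walkthrough):
# unpack_hadoop(r"D:\Downloads\hadoop-2.8.0.tar.gz", r"D:\Hadoop")
```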



STEP - 2: Configure Environment variables


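Set the HADOOP_HOME and JAVA_HOME environment variables (User variables) on Windows 10. With the locations used in this post, HADOOP_HOME is D:\Hadoop\hadoop-2.8.0 and JAVA_HOME is D:\Java\jdk1.8.0_171.

This PC -> Right Click -> Properties -> Advanced System Settings -> Advanced -> Environment Variables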
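Once the variables are saved, a quick check can confirm that they are visible and point at real folders. The sketch below is an illustration only; the function name is an assumption. Open a new command prompt first, because already-open prompts do not see the change.

```python
import os
from pathlib import Path


def check_home_variables(env, names=("HADOOP_HOME", "JAVA_HOME")):
    """Return a list of human-readable problems; an empty list means all is well."""
    problems = []
    for name in names:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{name} points to a missing folder: {value}")
    return problems


# Check the variables of the current process.
for problem in check_home_variables(os.environ):
    print(problem)
```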



STEP - 3: Configure System variables



Next, set some System variables, adding the Hadoop bin directory and the Java bin directory to the Path:

Variable: Path
Value:
  • D:\Hadoop\hadoop-2.8.0\bin
  • D:\Hadoop\hadoop-2.8.0\sbin
  • D:\Hadoop\hadoop-2.8.0\share\hadoop\common\*
  • D:\Hadoop\hadoop-2.8.0\share\hadoop\hdfs
  • D:\Hadoop\hadoop-2.8.0\share\hadoop\hdfs\lib\*
  • D:\Hadoop\hadoop-2.8.0\share\hadoop\hdfs\*
  • D:\Hadoop\hadoop-2.8.0\share\hadoop\yarn\lib\*
  • D:\Hadoop\hadoop-2.8.0\share\hadoop\yarn\*
  • D:\Hadoop\hadoop-2.8.0\share\hadoop\mapreduce\lib\*
  • D:\Hadoop\hadoop-2.8.0\share\hadoop\mapreduce\*
  • D:\Hadoop\hadoop-2.8.0\share\hadoop\common\lib\*
  • D:\Java\jdk1.8.0_171\bin
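To confirm that the entries actually made it into the Path, the value can be inspected programmatically. This is a small sketch with an assumed helper name. It uses `ntpath` so that the comparison follows Windows rules (case-insensitive, backslashes) even when run elsewhere, and it checks only the three executable folders from the list above.

```python
import ntpath


def missing_from_path(path_value, required):
    """Return the `required` folders that do not appear in a Windows Path string."""

    def norm(entry):
        # Windows paths are case-insensitive; also ignore quotes and a trailing backslash.
        return ntpath.normcase(entry.strip().strip('"')).rstrip("\\")

    present = {norm(e) for e in path_value.split(";") if e.strip()}
    return [r for r in required if norm(r) not in present]


required = [
    r"D:\Hadoop\hadoop-2.8.0\bin",
    r"D:\Hadoop\hadoop-2.8.0\sbin",
    r"D:\Java\jdk1.8.0_171\bin",
]
# Sample value for illustration; on a real machine pass os.environ["PATH"] instead.
sample_path = r"C:\Windows\system32;d:\hadoop\hadoop-2.8.0\BIN\;D:\Java\jdk1.8.0_171\bin"
print(missing_from_path(sample_path, required))
```

With the sample value, only the sbin folder is reported as missing.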


STEP - 4: Create required folders


Create some dedicated folders:
  1. Create folder "data" under "D:\Hadoop\hadoop-2.8.0".
  2. Create folder "datanode" under "D:\Hadoop\hadoop-2.8.0\data".
  3. Create folder "namenode" under "D:\Hadoop\hadoop-2.8.0\data".
  4. Create a folder to store temporary data during execution of a project, such as "D:\Hadoop\hadoop-2.8.0\temp".
  5. Create a log folder, such as "D:\Hadoop\hadoop-2.8.0\userlog".
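The same folders can be created in one go with a few lines of Python. This is a convenience sketch (the function name is an assumption), equivalent to creating them in Explorer.

```python
from pathlib import Path


def create_hadoop_dirs(hadoop_home):
    """Create the data, temp, and log folders listed above; return their paths."""
    home = Path(hadoop_home)
    folders = [
        home / "data" / "datanode",
        home / "data" / "namenode",
        home / "temp",
        home / "userlog",
    ]
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)  # safe to run more than once
    return folders


# Usage (path taken from this walkthrough):
# create_hadoop_dirs(r"D:\Hadoop\hadoop-2.8.0")
```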


STEP - 5: Configure required XML files


Now configure four key files with the minimal required details:
  • core-site.xml
  • hdfs-site.xml
  • mapred-site.xml
  • yarn-site.xml

[1] Edit the file D:/Hadoop/hadoop-2.8.0/etc/hadoop/core-site.xml, paste the XML below, and save the file.

<configuration>

   <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
   </property>
</configuration>
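All four files share the same `<configuration>` / `<property>` layout, so they can also be generated rather than pasted. The sketch below uses Python's standard `xml.etree.ElementTree`; the function name is an assumption, and `ET.indent` needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def build_configuration(properties):
    """Build a Hadoop <configuration> document from a name -> value mapping."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.indent(root, space="   ")  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


# Reproduces the core-site.xml content shown above.
print(build_configuration({"fs.defaultFS": "hdfs://localhost:9000"}))
```

Writing the returned string to the matching file under etc/hadoop gives the same result as editing it by hand.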

[2] Rename "mapred-site.xml.template" to "mapred-site.xml", edit the file D:/Hadoop/hadoop-2.8.0/etc/hadoop/mapred-site.xml, paste the XML below, and save the file.

<configuration>

   <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
   </property>
</configuration>

[3] Edit the file D:/Hadoop/hadoop-2.8.0/etc/hadoop/hdfs-site.xml, paste the XML below, and save the file.

<configuration>

   <property>
       <name>dfs.replication</name>
       <value>1</value>
   </property>
   <property>
       <name>dfs.namenode.name.dir</name>
       <value>/D:/Hadoop/hadoop-2.8.0/data/namenode</value>
   </property>
   <property>
       <name>dfs.datanode.data.dir</name>
       <value>/D:/Hadoop/hadoop-2.8.0/data/datanode</value>
   </property>
</configuration>

[4] Edit the file D:/Hadoop/hadoop-2.8.0/etc/hadoop/yarn-site.xml, paste the XML below, and save the file.

<configuration>

   <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
   </property>
   <property>
       <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
       <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>
   <property>
       <name>yarn.nodemanager.log-dirs</name>
       <value>/D:/Hadoop/hadoop-2.8.0/userlog</value>
       <final>true</final>
   </property>
   <property>
       <name>yarn.nodemanager.local-dirs</name>
       <value>/D:/Hadoop/hadoop-2.8.0/temp/nm-localdir</value>
   </property>
</configuration>
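As far as I know, Hadoop does not complain about a misspelled property name; it simply ignores it, so it is worth reading the files back and checking that the expected keys are present. The following is a hedged sketch with assumed function names; the sample string is a shortened stand-in for a real yarn-site.xml.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Parse a Hadoop *-site.xml string into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    if root.tag != "configuration":
        raise ValueError("root element must be <configuration>")
    props = {}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        props[name] = value
    return props


def missing_keys(xml_text, expected):
    """Return the expected property names that the file does not define."""
    found = read_properties(xml_text)
    return [key for key in expected if key not in found]


sample = """<configuration>
   <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
   </property>
</configuration>"""
print(missing_keys(sample, ["yarn.nodemanager.aux-services", "yarn.nodemanager.local-dirs"]))
```

For a real file, pass the result of `open(path).read()` instead of the sample string.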

[5] Edit the file D:/Hadoop/hadoop-2.8.0/etc/hadoop/hadoop-env.cmd and check the line "set JAVA_HOME=%JAVA_HOME%", so that Hadoop picks up the JAVA_HOME variable set in STEP - 2.


STEP - 6: Manage Hadoop configuration



Time to manage the Hadoop configuration: download the file Hadoop Configuration.zip.

4. Run the namenode format command: in the command prompt, go to "D:\Hadoop\hadoop-2.8.0\bin" and then run "hdfs namenode -format".
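One way to confirm that the format step worked is to look for the `current\VERSION` file that a formatted NameNode keeps in its name directory. The check below is a minimal sketch; the function name is an assumption, and the directory is the one configured as dfs.namenode.name.dir in hdfs-site.xml.

```python
from pathlib import Path


def namenode_is_formatted(name_dir):
    """Return True if `name_dir` contains the current/VERSION file written by a format."""
    return (Path(name_dir) / "current" / "VERSION").is_file()


# Usage (path taken from hdfs-site.xml in this walkthrough):
# namenode_is_formatted(r"D:\Hadoop\hadoop-2.8.0\data\namenode")
```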



STEP - 7: Start Hadoop


Time to start Hadoop: open a command prompt, change directory to "D:\Hadoop\hadoop-2.8.0\sbin", and type "start-all.cmd" to start Hadoop.


It will open four cmd windows, one for each of the following daemons:
  • Hadoop DataNode
  • Hadoop NameNode
  • YARN NodeManager
  • YARN ResourceManager


It can also be verified via a browser.
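A quick scripted alternative to opening the browser is to test whether the relevant ports accept connections. This is only a sketch under stated assumptions: port 9000 comes from fs.defaultFS in core-site.xml above, while 50070 (NameNode web UI) and 8088 (ResourceManager web UI) are the Hadoop 2.x defaults and will differ if you override them.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Assumed ports: 9000 from core-site.xml; 50070 and 8088 are Hadoop 2.x defaults.
for name, port in [("HDFS", 9000), ("NameNode UI", 50070), ("ResourceManager UI", 8088)]:
    print(name, port, "open" if port_open("localhost", port) else "closed")
```

If a port shows as closed, the cmd window of the corresponding daemon usually contains the reason.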





Since the "start-all.cmd" command has been deprecated, you can instead run the commands below, in order:
  • "start-dfs.cmd" and
  • "start-yarn.cmd"

STEP - 8: Stop Hadoop

To stop the services, execute a stop command such as:
  • stop-all.cmd
  • stop-dfs.cmd
  • stop-yarn.cmd
Congratulations, Hadoop is installed! 😊

STEP - 9: Some Hands on activity



For example, copy a file from the local disk to HDFS:
  • hadoop fs -mkdir -p /raj/data
  • hadoop fs -ls /raj
  • hadoop fs -copyFromLocal D:/testhdfs.xlsx /raj/data/
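The same sequence can be driven from a script. The sketch below is illustrative (function names are assumptions): it builds the three `hadoop fs` commands and runs them only when asked. It passes `-p` to `-mkdir` so that the parent folder /raj is created as well.

```python
import shutil
import subprocess


def copy_to_hdfs_commands(local_file, hdfs_dir):
    """Return the argv lists for: create folder, list its parent, copy the file."""
    hdfs_dir = hdfs_dir.rstrip("/")
    parent = hdfs_dir.rsplit("/", 1)[0] or "/"
    return [
        ["hadoop", "fs", "-mkdir", "-p", hdfs_dir],  # -p also creates the parent
        ["hadoop", "fs", "-ls", parent],
        ["hadoop", "fs", "-copyFromLocal", local_file, hdfs_dir + "/"],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    launcher = shutil.which("hadoop")  # on Windows this resolves to hadoop.cmd
    if launcher is None:
        raise FileNotFoundError("hadoop is not on the Path")
    for argv in commands:
        subprocess.run([launcher] + argv[1:], check=True)


commands = copy_to_hdfs_commands("D:/testhdfs.xlsx", "/raj/data")
for argv in commands:
    print(" ".join(argv))
# run_all(commands)  # uncomment once Hadoop is running
```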

  
It can be verified through a web browser.


  

10 comments:

  1. 19/10/30 16:53:53 INFO service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService failed in state INITED; cause: java.lang.NullPointerException
    java.lang.NullPointerException
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:636)
    at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:86)
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:630)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
    at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1071)
    at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:177)
    at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:206)
    at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:738)
    at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:734)
    at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
    at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:734)
    at org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.createDir(DirectoryCollection.java:478)
    at org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.createDir(DirectoryCollection.java:476)
    at org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.createDir(DirectoryCollection.java:476)
    at org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.createDir(DirectoryCollection.java:476)
    at org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.createDir(DirectoryCollection.java:476)
    at org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.createNonExistentDirs(DirectoryCollection.java:275)
    at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.serviceInit(LocalDirsHandlerService.java:206)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.nodemanager.NodeHealthCheckerService.serviceInit(NodeHealthCheckerService.java:50)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:357)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:636)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:684)

    Replies
    1. Hey Jaseer, I am unable to access this link: https://mindtreeonline-my.sharepoint.com/:u:/g/personal/m1045767_mindtree_com1/EaT3z5mOMyFAkdba7ywoPqoBCpZ920qGbajvMTXQvYGYkA?e=iOiP6y, which is used to get hadoop configuration file. If you have already downloaded that zip file, can you please send it to bilalshafqat0336@gmail.com? So i can also use it. Thanks.

  2. I am getting an issue while accessing this link:
    https://mindtreeonline-my.sharepoint.com/:u:/g/personal/m1045767_mindtree_com1/EaT3z5mOMyFAkdba7ywoPqoBCpZ920qGbajvMTXQvYGYkA?e=iOiP6y
    to download Hadoop configuration file.
    It says that user bilalshafqat0336@gmail.com is not listed in users directory, seems like access issue, please resolve it.

    Replies
    1. Hi!

      https://drive.google.com/file/d/1AMqV4F5ybPF4ab4CeK8B3AsjdGtQCdvy/view

      You can see:

      https://github.com/TuanHenry/Step-by-step-install-Hadoop-on-Windows-10/wiki

  3. Hi,

    I am not able to run a SQL count query in Hive; I get the error below:

    Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2).

    Please let me know how to solve this.

  4. I am getting an issue while accessing this link:
    https://mindtreeonline-my.sharepoint.com/:u:/g/personal/m1045767_mindtree_com1/EaT3z5mOMyFAkdba7ywoPqoBCpZ920qGbajvMTXQvYGYkA?e=iOiP6y
    to download Hadoop configuration file.
    It says that user abhiismeonly@gmail.com is not listed in users directory, seems like access issue, please resolve it.

  5. How do I access this link?

    https://mindtreeonline-my.sharepoint.com/:u:/g/personal/m1045767_mindtree_com1/EaT3z5mOMyFAkdba7ywoPqoBCpZ920qGbajvMTXQvYGYkA?e=iOiP6y

    If anyone has this file, please email it to mansih.spit551@gmail.com!! It's very urgent. Thanks.
