How to Install and Configure Apache Hadoop on a Single Node in CentOS 7 - Part 2
Abstract: In this part, we configure Hadoop for single-node, pseudo-distributed operation by editing its XML configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml) and setting the Java home variable in hadoop-env.sh.
10. Now it’s time to set up the Hadoop cluster on a single node in pseudo-distributed mode by editing its configuration files.
The Hadoop configuration files are located in $HADOOP_HOME/etc/hadoop/, where $HADOOP_HOME in this tutorial is the hadoop account’s home directory (/opt/hadoop/).
Once you’re logged in as user hadoop, you can start editing the following configuration files.
The first file to edit is core-site.xml. It holds information such as the port number used by the Hadoop instance, the memory allocated for the file system, the data store memory limit and the size of the Read/Write buffers.
$ vi etc/hadoop/core-site.xml
Add the following properties between the <configuration> ... </configuration> tags. Use localhost or your machine’s FQDN for the Hadoop instance.
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master.hadoop.lan:9000/</value>
</property>
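If you want to double-check the edit without starting any daemons, a quick sed extraction works. The /tmp path below is only a stand-in for illustration; on the real system you would point it at /opt/hadoop/etc/hadoop/core-site.xml.

```shell
# Write the snippet to a scratch copy and pull out the NameNode URI.
# /tmp/core-site.xml is a stand-in for /opt/hadoop/etc/hadoop/core-site.xml.
cat > /tmp/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master.hadoop.lan:9000/</value>
  </property>
</configuration>
EOF
# Print the value of the only <value> element in the scratch file.
sed -n 's:.*<value>\(.*\)</value>.*:\1:p' /tmp/core-site.xml
```

This should print hdfs://master.hadoop.lan:9000/, confirming the URI was saved as intended.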
11. Next, open and edit the hdfs-site.xml file. It contains the replication factor and the namenode and datanode paths on the local file system.
$ vi etc/hadoop/hdfs-site.xml
Here, add the following properties between the <configuration> ... </configuration> tags. In this guide we’ll use the /opt/volume/ directory to store our Hadoop file system.
Replace the dfs.data.dir and dfs.name.dir values accordingly.
<property>
  <name>dfs.data.dir</name>
  <value>file:///opt/volume/datanode</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>file:///opt/volume/namenode</value>
</property>
12. Because we’ve specified /opt/volume/ as our Hadoop file system storage, we need to create the two directories (datanode and namenode) from the root account and grant full permissions to the hadoop account by executing the commands below.
$ su root
# mkdir -p /opt/volume/namenode
# mkdir -p /opt/volume/datanode
# chown -R hadoop:hadoop /opt/volume/
# ls -al /opt/   # Verify permissions
# exit           # Exit root account to return to hadoop user
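Before formatting HDFS, it can save a troubleshooting round-trip to confirm the two directories exist and are writable. The sketch below demonstrates the check against a throwaway /tmp stand-in; on the real system you would run the same bracketed tests against /opt/volume/namenode and /opt/volume/datanode as the hadoop user.

```shell
# Stand-in layout under /tmp (the real prefix is /opt/volume).
mkdir -p /tmp/volume/namenode /tmp/volume/datanode
# A directory that exists and is writable by the current user passes both tests.
for d in /tmp/volume/namenode /tmp/volume/datanode; do
  [ -d "$d" ] && [ -w "$d" ] && echo "OK: $d"
done
```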
13. Next, create the mapred-site.xml file to specify that we are using the YARN MapReduce framework.
$ vi etc/hadoop/mapred-site.xml
Add the following excerpt to mapred-site.xml file:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
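Depending on your Hadoop 2.x release, the configuration directory may ship only a mapred-site.xml.template rather than the file itself, in which case copying the template first gives you a valid starting point. The /tmp directory below is a stand-in used only to demonstrate the copy-if-missing pattern; on the real system the path would be /opt/hadoop/etc/hadoop/.

```shell
# Simulate the shipped state under /tmp: template present, real file absent.
mkdir -p /tmp/hadoop-conf
printf '<configuration>\n</configuration>\n' > /tmp/hadoop-conf/mapred-site.xml.template
# Copy the template only if mapred-site.xml does not exist yet.
[ -f /tmp/hadoop-conf/mapred-site.xml ] || \
  cp /tmp/hadoop-conf/mapred-site.xml.template /tmp/hadoop-conf/mapred-site.xml
ls /tmp/hadoop-conf
```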
14. Now, edit the yarn-site.xml file and add the statements below between the <configuration> ... </configuration> tags:
$ vi etc/hadoop/yarn-site.xml
Add the following excerpt to yarn-site.xml file:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
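A single stray character in any of these XML files will keep the daemons from starting, so it can help to check each file for well-formedness after editing. This sketch assumes python3 is available and runs its stdlib XML parser against a scratch copy; on the real system you would loop over /opt/hadoop/etc/hadoop/*-site.xml instead.

```shell
# Scratch copy of yarn-site.xml (stand-in for /opt/hadoop/etc/hadoop/yarn-site.xml).
cat > /tmp/yarn-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
# ET.parse raises (and the command exits non-zero) if the XML is malformed.
for f in /tmp/yarn-site.xml; do
  python3 -c 'import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1]); print(sys.argv[1], "well-formed")' "$f"
done
```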
15. Finally, set the Java home variable for the Hadoop environment by editing the line below in the hadoop-env.sh file.
$ vi etc/hadoop/hadoop-env.sh
Edit the following line to point to your Java system path.
export JAVA_HOME=/usr/java/default/
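A cheap sanity check before starting the daemons is to confirm that $JAVA_HOME/bin/java actually exists and is executable; an invalid JAVA_HOME is a common reason the start scripts fail. The /tmp/fake-jdk path below is a throwaway stand-in built just to demonstrate the test; substitute your real JDK path (e.g. /usr/java/default) when checking a live system.

```shell
# Build a throwaway JDK-shaped directory to demonstrate the check.
mkdir -p /tmp/fake-jdk/bin
touch /tmp/fake-jdk/bin/java
chmod +x /tmp/fake-jdk/bin/java
# The same one-liner, pointed at your real JAVA_HOME, verifies hadoop-env.sh's setting.
JAVA_HOME=/tmp/fake-jdk
[ -x "$JAVA_HOME/bin/java" ] && echo "JAVA_HOME looks valid"
```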
16. Also, replace the localhost value in the slaves file with your machine’s hostname, which was set at the beginning of this tutorial.
$ vi etc/hadoop/slaves