How to Install and Configure Apache Hadoop on a Single Node in CentOS 7 - Part 2

Step 3: Configure Hadoop in CentOS 7

10. Now it’s time to set up the Hadoop cluster on a single node in pseudo-distributed mode by editing its configuration files.

The Hadoop configuration files are located in $HADOOP_HOME/etc/hadoop/; in this tutorial $HADOOP_HOME is the hadoop account’s home directory, /opt/hadoop/.
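To confirm the location (assuming the /opt/hadoop/ installation path used throughout this series), list the directory; the files edited below live there (mapred-site.xml is created by hand in a later step):

$ ls /opt/hadoop/etc/hadoop/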

Once you’re logged in as user hadoop, you can start editing the following configuration files.

The first file to edit is core-site.xml. This file holds information about the port number used by the Hadoop instance, the memory allocated for the file system, the memory limit for the data store, and the size of the read/write buffers.

$ vi etc/hadoop/core-site.xml

Add the following properties between the <configuration> ... </configuration> tags. Use localhost or your machine’s FQDN for the Hadoop instance.

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://master.hadoop.lan:9000/</value>
</property>
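The host name in the fs.defaultFS value must resolve on this machine. A quick sanity check, assuming the master.hadoop.lan FQDN was mapped in /etc/hosts at the beginning of this series:

$ hostname -f                        # should print your FQDN
$ grep master.hadoop.lan /etc/hosts  # confirm local name resolution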

11. Next, open and edit the hdfs-site.xml file. This file defines the data replication value and the namenode and datanode paths on the local file system.

$ vi etc/hadoop/hdfs-site.xml

Here, add the following properties between the <configuration> ... </configuration> tags. In this guide we’ll use the /opt/volume/ directory to store our Hadoop file system.

Replace the dfs.data.dir and dfs.name.dir values accordingly.

<property>
    <name>dfs.data.dir</name>
    <value>file:///opt/volume/datanode</value>
</property>

<property>
    <name>dfs.name.dir</name>
    <value>file:///opt/volume/namenode</value>
</property>
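Because a single node can hold only one copy of each block, it is also common in pseudo-distributed mode to lower the replication factor from its default of 3. This property is not part of the listing above, but may be added between the same <configuration> tags:

<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>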

12. Because we’ve specified /opt/volume/ as our Hadoop file system storage, we need to create those two directories (datanode and namenode) from the root account and grant full ownership to the hadoop account by executing the below commands.

$ su root
# mkdir -p /opt/volume/namenode
# mkdir -p /opt/volume/datanode
# chown -R hadoop:hadoop /opt/volume/
# ls -al /opt/  # Verify ownership and permissions
# exit          # Exit the root account and return to the hadoop user

13. Next, create the mapred-site.xml file to specify that we are using the YARN MapReduce framework.
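Note that Hadoop 2.x releases ship only a template for this file. Assuming your tarball includes it, you can seed mapred-site.xml from the template before editing:

$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml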

$ vi etc/hadoop/mapred-site.xml

Add the following excerpt to the mapred-site.xml file:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

14. Now, edit the yarn-site.xml file and add the below statements between the <configuration> ... </configuration> tags:

$ vi etc/hadoop/yarn-site.xml

Add the following excerpt to the yarn-site.xml file:

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
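A malformed XML file will prevent the daemons from starting, so it is worth validating the files edited so far. One way, assuming xmllint (provided by the libxml2 package on CentOS 7) is installed:

$ xmllint --noout etc/hadoop/core-site.xml etc/hadoop/hdfs-site.xml \
    etc/hadoop/mapred-site.xml etc/hadoop/yarn-site.xml  # silent if all files are well-formed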

15. Finally, set the Java home variable for the Hadoop environment by editing the below line in the hadoop-env.sh file.

$ vi etc/hadoop/hadoop-env.sh

Edit the following line to point to your Java system path.

export JAVA_HOME=/usr/java/default/
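If you are unsure of your Java installation path, one way to discover it (assuming the java binary is on your PATH) is to resolve the symlink chain and strip the trailing /bin/java from the result:

$ readlink -f $(which java)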

16. Also, replace the localhost value in the slaves file with your machine’s hostname, which was set up at the beginning of this tutorial.

$ vi etc/hadoop/slaves
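After the edit, the file should contain a single line with your hostname. With the master.hadoop.lan FQDN assumed throughout this guide, slaves would read:

master.hadoop.lan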