Set Up Hadoop Cluster

Introduction:
Hadoop is designed to run in a distributed environment, where the Namenode and Datanodes may run on different machines.

In this post I am going to demonstrate how to set up a Hadoop cluster.

Scenario:
Node 1 (Master): Namenode
Node 2 (Slave): Datanode 1
Node 3 (Slave): Datanode 2

Steps:
A) Set Hostnames of Namenode and Datanodes
You can use any hostnames; for clarity, I am setting the following:

For Namenode :- hadoop-namenode.org.com
For Datanode1:- hadoop-datanode1.org.com
For Datanode2:- hadoop-datanode2.org.com

You can set the hostname in the file /etc/sysconfig/network (on RHEL/CentOS).
For example:

NETWORKING=yes
HOSTNAME=hadoop-namenode.org.com

Set the hostnames on the datanode machines as well, using the same process explained above.
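
The change in /etc/sysconfig/network takes effect on reboot; to apply and verify the new name immediately (a small sketch, assuming a RHEL/CentOS-style machine), you can also run:

# hostname hadoop-namenode.org.com
# hostname

The second command should echo back the name you just set.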

B) Set Up the Namenode:
1) Put the IPs and hostnames of the namenode and datanodes in /etc/hosts.
For example:

172.30.30.61 hadoop-namenode.org.com
172.30.30.62 hadoop-datanode1.org.com
172.30.30.54 hadoop-datanode2.org.com

Here, “hadoop-namenode.org.com” is the master machine with the Namenode installed; the others are slaves with a Datanode installed on each.

Ping each machine by hostname to check whether it resolves to the correct IP.
For example: ping hadoop-datanode1.org.com

2) Download Apache Hadoop from http://hadoop.apache.org/
Unpack it, then set the HADOOP_HOME and JAVA_HOME environment variables.
For help, follow the steps at https://pravinchavan.wordpress.com/2013/04/02/installing-hadoop-on-cent-os/
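
A minimal sketch of the unpack-and-export steps (the version number and paths below are assumptions; adjust them to the tarball and JDK you actually have):

# tar -xzf hadoop-2.2.0.tar.gz -C /opt
# export HADOOP_HOME=/opt/hadoop-2.2.0
# export JAVA_HOME=/usr/java/default
# export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Put the same export lines in ~/.bashrc (or /etc/profile.d/) so they survive a new login.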

3) Under the configuration directory, you have to modify the slaves file.
Older Version – $HADOOP_HOME/conf/slaves
Newer Version – $HADOOP_HOME/etc/hadoop/slaves

Put the IPs or hostnames of the Datanodes in the slaves file.
For example-

hadoop-datanode1.org.com
hadoop-datanode2.org.com

4) In the same configuration folder, you have to modify

core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml*

*You need to change yarn-site.xml only if you are using the YARN framework (the next-generation Apache Hadoop framework).

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-namenode.org.com:50000</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.
    </description>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
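
Note: on newer Hadoop 2.x releases fs.default.name is deprecated in favour of fs.defaultFS. If your release warns about the old key, the equivalent property (same value, newer name) looks like this:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop-namenode.org.com:50000</value>
</property>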

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/opt/hdfs/name</value>
    <description>Determines where on the local filesystem the DFS name
    node should store the name table (fsimage). If this is a comma-delimited
    list of directories, then the name table is replicated in all of the
    directories, for redundancy.
    </description>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>/opt/hdfs/data</value>
    <description>Determines where on the local filesystem a DFS data
    node should store its blocks. If this is a comma-delimited list of
    directories, then data will be stored in all named directories,
    typically on different devices. Directories that do not exist are
    ignored.
    </description>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is
    created. The default is used if replication is not specified at
    create time.
    </description>
  </property>

  <!--
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>53687090000</value>
    <description>This is the reserved space for non-DFS use.</description>
  </property>
  -->
</configuration>
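
The local paths used above, together with hadoop.tmp.dir from core-site.xml, should exist and be writable by the user running Hadoop. A quick sketch, assuming the /opt/hdfs layout from these examples:

# mkdir -p /opt/hdfs/tmp /opt/hdfs/name /opt/hdfs/data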

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop-namenode.org.com:8021</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
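
Some Hadoop 2.x tarballs ship only a template for this file. If mapred-site.xml is not already present in the configuration directory, create it from the template first:

# cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml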

yarn-site.xml

<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>

  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/root/logs/nodemanager</value>
    <description>The directories used by NodeManagers as log directories.</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
    <description>Long-running service which executes on the NodeManager(s) and provides the MapReduce sort and shuffle functionality.</description>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
    <description>Enable log aggregation so application logs are moved onto HDFS and are viewable via the web UI after the application has completed. The default location on HDFS is '/log' and can be changed via the yarn.nodemanager.remote-app-log-dir property.</description>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop-namenode.org.com:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop-namenode.org.com:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop-namenode.org.com:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoop-namenode.org.com:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop-namenode.org.com:8088</value>
  </property>

</configuration>
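
Because yarn.nodemanager.log-dirs points at a custom path above, it does no harm to create that directory up front on every node that will run a NodeManager (the path is simply the example value used here):

# mkdir -p /root/logs/nodemanager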

5) Configuring passwordless SSH access

1. From the Namenode, run the following commands, which will generate an RSA key pair and authorize the public key locally:

# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

This creates the public key ~/.ssh/id_rsa.pub.

2. Copy id_rsa.pub to the datanode machines.

# scp ~/.ssh/id_rsa.pub hadoop-datanode1.org.com:~/.ssh/
# scp ~/.ssh/id_rsa.pub hadoop-datanode2.org.com:~/.ssh/

3. Log in to the datanode1 and datanode2 machines and execute the following command to authorize the key:

# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys 

——

Alternative to these steps:
Create the key using the following command:

# ssh-keygen -t rsa

Copy the RSA public key to the other machines:

# ssh-copy-id machine-name
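
Either way, verify from the Namenode that key-based login works; each command should print the remote hostname without asking for a password:

# ssh hadoop-datanode1.org.com hostname
# ssh hadoop-datanode2.org.com hostname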

C) Configuring the Slave (Datanode) Machines

1) Follow the same steps that we followed to configure the master node (Namenode).
2) Just don't set the slaves file ($HADOOP_HOME/conf/slaves or $HADOOP_HOME/etc/hadoop/slaves) on the slaves.
3) Slave nodes use the same Hadoop distribution and the same configuration for core-site.xml, hdfs-site.xml, etc.; one way to copy it over is sketched below.
4) Don't forget to enable passwordless SSH login by generating RSA keys and adding them to the other nodes' authorized keys. Repeat the SSH steps for every slave node you have.
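
One simple way to keep the slaves identical to the master is to copy the already-configured Hadoop directory over (a rough sketch; it assumes HADOOP_HOME sits under /opt and that the passwordless SSH from step B.5 is in place):

# scp -r $HADOOP_HOME hadoop-datanode1.org.com:/opt/
# scp -r $HADOOP_HOME hadoop-datanode2.org.com:/opt/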

D) Start and Stop Hadoop Daemons:

1. Format the Namenode

$HADOOP_HOME/bin/hadoop namenode -format

2. You need to start/stop the daemons only on the master (namenode) machine; it will start/stop the daemons on all the slave (datanode) machines.

To start all the daemons:

$HADOOP_HOME/sbin/start-all.sh

To stop all the daemons:

$HADOOP_HOME/sbin/stop-all.sh
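
On recent Hadoop 2.x releases start-all.sh and stop-all.sh are marked deprecated; starting HDFS and YARN separately does the same job:

$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh

and, to shut down:

$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/stop-dfs.sh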

E) Check the Java processes running on the Namenode and Datanodes

On the Namenode:

[root@hadoop-namenode sbin]# jps
12854 NameNode
19513 Jps
13200 ResourceManager
13047 SecondaryNameNode

On the Datanodes:

[root@inteltxt1 hadoop]# jps
5555 NodeManager
5450 DataNode
12633 Jps
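
Once the daemons are up, you can also confirm from the Namenode that both datanodes have registered with HDFS; the report lists every live datanode with its capacity:

# $HADOOP_HOME/bin/hdfs dfsadmin -report

The NameNode web UI (port 50070 by default) and the ResourceManager web UI (port 8088, as configured in yarn-site.xml above) show the same information in a browser.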