Removing a Node from a Hadoop Cluster

1) The easy but not recommended way

Step 1: Stop all cluster services: $HADOOP_HOME/sbin/stop-all.sh
Step 2: Edit the $HADOOP_HOME/etc/hadoop/slaves file and delete the entry for the node to be removed.
Step 3: Start all services again: $HADOOP_HOME/sbin/start-all.sh

This may cause data loss if replicas of the blocks stored on the removed DataNode cannot be found on the remaining live DataNodes.
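The quick approach above can be sketched as follows. The slaves-file edit is shown against a local stand-in file (the real file lives under $HADOOP_HOME/etc/hadoop/slaves), and the stop/start commands are commented out since they only work on a live cluster:

```shell
# Step 1: stop every daemon in the cluster (requires a running cluster):
# "$HADOOP_HOME"/sbin/stop-all.sh

# Step 2: drop the node's entry from the slaves file.
# A local stand-in file and made-up hostnames are used here for illustration.
printf 'datanode1\ndatanode2\ndatanode3\n' > slaves
sed -i '/^datanode2$/d' slaves   # remove the host being taken out

# Step 3: bring the cluster back up:
# "$HADOOP_HOME"/sbin/start-all.sh
```

Note that `sed -i` as written assumes GNU sed; on BSD/macOS the flag needs an explicit suffix argument (`sed -i ''`).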

2) Safe way

Add the dfs.hosts.exclude property to hdfs-site.xml.
Add the mapred.hosts.exclude property to mapred-site.xml.

Both properties point to a file on the NameNode; this file lists the hostnames/IP addresses of the DataNodes to be removed.
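A minimal sketch of the two properties (the exclude-file path /etc/hadoop/conf/excludes is an assumption; any path readable by the cluster's master daemons works):

```xml
<!-- hdfs-site.xml -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/excludes</value>  <!-- assumed path -->
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapred.hosts.exclude</name>
  <value>/etc/hadoop/conf/excludes</value>  <!-- assumed path -->
</property>
```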

Step 1:
Add IP address of the nodes to be decommissioned to the exclude file.
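For example, appending the node's address to the exclude file (a local file name and the address 192.168.1.51 are made up for illustration; in practice, write to whatever path dfs.hosts.exclude points at):

```shell
EXCLUDE_FILE=excludes            # stand-in for e.g. /etc/hadoop/conf/excludes
echo '192.168.1.51' >> "$EXCLUDE_FILE"
cat "$EXCLUDE_FILE"
```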

Step 2:
Restart all Hadoop services so that the NameNode picks up the newly added exclude properties.

Step 3: Tell the NameNode to re-read the exclude file:
# hdfs dfsadmin -refreshNodes

Step 4: Check the cluster status and wait for decommissioning to finish:
# hdfs dfsadmin -report
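Decommissioning is not instantaneous: the NameNode first re-replicates the node's blocks elsewhere. A sketch of waiting for the node's status to flip to "Decommissioned" (the report() stub stands in for `hdfs dfsadmin -report`, and the exact report wording may vary by Hadoop version):

```shell
# Stub standing in for: hdfs dfsadmin -report
report() {
  printf 'Name: 192.168.1.51:50010\nDecommission Status : Decommissioned\n'
}

# Poll until the target node reports "Decommissioned"
until report | grep -q 'Decommission Status : Decommissioned'; do
  sleep 30   # re-check every 30 seconds
done
echo 'node fully decommissioned'
```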

Step 5:
Once decommissioning is complete, remove the node's entry from the slaves file.
