Removing a Node from a Hadoop Cluster

1) Easy but not recommended way

Step 1: Stop all cluster services. - $HADOOP_HOME/sbin/
Step 2: Edit the $HADOOP_HOME/etc/hadoop/slaves file and delete the entry for the node you want to remove from the cluster.
Step 3: Start all services. - $HADOOP_HOME/sbin/

This may cause data loss if blocks stored on the removed DataNode have no replicas on the remaining live DataNodes.

2) Safe way

Add the dfs.hosts.exclude property to hdfs-site.xml.
Add the mapred.hosts.exclude property to mapred-site.xml.

Both properties point to a file path. This file lists the IP addresses of the hosts/DataNodes to be removed.
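As a rough illustration, the two properties could look like the fragment below. The path /etc/hadoop/conf/dfs.exclude is only an assumed example location and does not come from this post. Use whatever path suits your installation. Each entry belongs inside the <configuration> element of its file.

```xml
<!-- hdfs-site.xml (the exclude file path is an assumed example) -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>

<!-- mapred-site.xml (same assumed example path) -->
<property>
  <name>mapred.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>
```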

Step 1:
Add the IP addresses of the nodes to be decommissioned to the exclude file, one per line.
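A minimal sketch of this step, assuming the exclude file holds one address per line. The helper name add_to_exclude and the default path in EXCLUDE_FILE are hypothetical. Point EXCLUDE_FILE at the file your dfs.hosts.exclude property refers to. The helper only appends an address that is not already listed, so running it twice is harmless.

```shell
# Hypothetical helper: append a host to the exclude file unless it is already listed.
# EXCLUDE_FILE is an assumed location; set it to the path used in dfs.hosts.exclude.
EXCLUDE_FILE="${EXCLUDE_FILE:-./dfs.exclude}"

add_to_exclude() {
    host="$1"
    # Make sure the file exists before searching it
    touch "$EXCLUDE_FILE"
    # -x matches the whole line, -F treats the address as a literal string
    if ! grep -qxF "$host" "$EXCLUDE_FILE"; then
        printf '%s\n' "$host" >> "$EXCLUDE_FILE"
    fi
}

# Example call (the address is made up):
#   add_to_exclude 192.168.1.50
```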

Step 2:
Restart all Hadoop services so that the exclude properties take effect.

Step 3: Make the NameNode re-read the exclude file
# hdfs dfsadmin -refreshNodes

Step 4: Check cluster status and wait until the excluded nodes are reported as Decommissioned
# hdfs dfsadmin -report
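The report can be long on a large cluster, so it may help to pull out the status of a single node. The sketch below assumes the report prints a "Name: <ip>:<port>" line followed later by a "Decommission Status : ..." line for each DataNode, as Hadoop 2.x does. The exact layout can differ between versions, so check it against your own output. The function name decommission_status is hypothetical.

```shell
# Hypothetical helper: print the decommission status of one DataNode.
# $1: IP address of the DataNode; the report text is read from stdin.
decommission_status() {
    awk -v ip="$1" '
        # Remember which node the following lines belong to
        /^Name:/ { split($2, a, ":"); current = a[1] }
        # Print the status text for the requested node and stop reading
        /^Decommission Status/ && current == ip {
            sub(/^Decommission Status *: */, "")
            print
            exit
        }
    '
}

# Example call (the address is made up):
#   hdfs dfsadmin -report | decommission_status 192.168.1.50
```

A node normally moves from "Decommission in progress" to "Decommissioned" once its blocks have been re-replicated elsewhere. Nothing is printed if the address does not appear in the report.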

Step 5:
Remove the nodes from the slaves file.
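For this last step, a small sketch of removing one entry from the slaves file without touching similar-looking entries. The function name remove_from_slaves is hypothetical, and the slaves file path is passed in rather than assumed. Only run it after the node has finished decommissioning.

```shell
# Hypothetical helper: drop one host entry from a slaves file.
# $1: host or IP exactly as it appears in the file, $2: path to the slaves file
remove_from_slaves() {
    host="$1"
    file="$2"
    [ -f "$file" ] || return 1
    tmp="$(mktemp)" || return 1
    # Keep every line that is not an exact match. grep exits non-zero when
    # no lines remain, which is acceptable here, so its status is ignored.
    grep -vxF "$host" "$file" > "$tmp" || true
    # Write back through the original file so its owner and permissions are kept
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example call (address and path are made up):
#   remove_from_slaves 192.168.1.50 "$HADOOP_HOME/etc/hadoop/slaves"
```

Matching whole lines means that removing 10.0.0.5 leaves an entry such as 10.0.0.50 in place.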
