When we build a MapReduce application in Java and run the generated jar on a Hadoop platform, we may need to debug that application remotely at runtime.
Hadoop runs in 3 modes:
1) Standalone (local)
2) Pseudo distributed
3) Fully distributed
It is possible to debug a Hadoop MapReduce app in all three modes.
It is also possible to debug the Mapper and Reducer tasks, which run in containers launched when the job is submitted. The JVM in each task container is started from YarnChild.java (part of the Hadoop framework). For these, refer to the section on debugging child processes in Hadoop at the end of this post.
Suppose Hadoop is installed on a virtual machine and you want to debug a MapReduce app from Eclipse on Windows, either on the same machine or on another one.
To achieve this:
1) Modify conf/hadoop-env.sh file in Hadoop installation directory.
# cd /root/hadoop/hadoop-1.0.4/conf
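Open hadoop-env.sh and add the following line:
export HADOOP_OPTS="$HADOOP_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5000"
jdwp is the Java Debug Wire Protocol.
server=y makes the JVM listen for a debugger to attach.
suspend=y makes the JVM pause at startup until a debugger attaches.
address=<PORT> is the port on which Hadoop will listen for the debugger (5000 here).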
2) Now run the job on Hadoop:
# hadoop jar /root/hadoop/app/WordCount.jar /root/hadoop/app/input/file1 /root/hadoop/app/output/file1
3) Now switch to the Windows machine or other VM where Eclipse is installed. Because of suspend=y, the job's JVM waits until the debugger attaches.
a) The same MapReduce project should be in your Eclipse workspace.
b) Right-click the project -> Debug As -> Debug Configurations -> Remote Java Application
i) Browse to the project in the workspace.
ii) In the Host field, specify the IP of the VM where Hadoop is running.
iii) Set the Port field to the port number given in the HADOOP_OPTS value in hadoop-env.sh.
We had set address=5000 in HADOOP_OPTS, so set the port to 5000 in the debug configuration.
Click Debug to start debugging.
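If Eclipse cannot connect, it can help to confirm that the JVM really received the debug options. The following is a minimal sketch, not part of Hadoop: the class name JdwpCheck and its helper methods are hypothetical, and it only uses the standard java.lang.management API to inspect the arguments of the JVM it runs in.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper (not part of Hadoop): reports whether the current JVM
// was started with a JDWP agent option such as the one put into HADOOP_OPTS.
public class JdwpCheck {

    /** Returns the first JVM argument that configures the JDWP agent, or null if there is none. */
    public static String findJdwpOption(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-agentlib:jdwp") || arg.startsWith("-Xrunjdwp")) {
                return arg;
            }
        }
        return null;
    }

    /** Extracts the value of the address= sub-option, or null if it is absent. */
    public static String debugAddress(String jdwpOption) {
        if (jdwpOption == null) {
            return null;
        }
        // Sub-options are comma separated, e.g. transport=dt_socket,server=y,...
        for (String token : jdwpOption.split(",")) {
            int idx = token.indexOf("address=");
            if (idx >= 0) {
                return token.substring(idx + "address=".length());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM, excluding the arguments to main().
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String option = findJdwpOption(jvmArgs);
        if (option == null) {
            System.out.println("No JDWP agent option found in JVM arguments");
        } else {
            System.out.println("JDWP agent configured: " + option
                    + " (address=" + debugAddress(option) + ")");
        }
    }
}
```

One possible use is to call the same logic at the start of the job's driver. Note that with suspend=y the JVM does not reach main() until a debugger attaches, so this check is mainly useful for spotting the case where the job runs straight through because the option was never applied.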
* Debugging Hadoop core components
1) Modify the file $HADOOP_HOME/etc/hadoop/yarn-env.sh. Add the following line.
YARN_OPTS="$YARN_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=51234"
Add the following lines to the file $HADOOP_HOME/etc/hadoop/mapred-site.xml inside the <configuration> block. This enables the YARN framework, so the job runs on YARN.
<property> <name>mapreduce.framework.name</name> <value>yarn</value> </property>
2) Execute the following commands
Then follow the same steps as for debugging a MapReduce job, using port 51234 in the Eclipse debug configuration.
* Debugging child processes in Hadoop
1. Set the following property in mapred-site.xml:
<property> <name>mapred.child.java.opts</name> <value>-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5432</value> </property>
2. Follow the steps above to debug, using port 5432 in the Eclipse debug configuration.
3. You can also debug child processes (Mapper/Reducer) on a fully distributed Hadoop cluster. However, you do not know in advance which datanode the current Mapper or Reducer task is running on, so you have to find it by trying the IPs of the datanodes with the configured port (5432) in Eclipse remote debugging.
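A misspelled or misplaced property in mapred-site.xml has no visible effect until the debugger fails to connect, so it can be worth checking the edited file before submitting a job. The sketch below is one way to do that with the JDK's built-in XML parser. The class name SiteConfigCheck and the inline sample are assumptions for illustration; it does not use Hadoop's own Configuration class and would need to be pointed at the contents of your real file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of Hadoop): looks up a property value in
// XML shaped like mapred-site.xml, using only the JDK's DOM parser.
public class SiteConfigCheck {

    /** Returns the trimmed <value> of the named property, or null if it is not defined. */
    public static String propertyValue(String xml, String wanted) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                if (wanted.equals(childText(prop, "name"))) {
                    return childText(prop, "value");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    /** Returns the trimmed text of the first descendant element with the given tag, or null. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Sample content mirroring the properties used in this post.
        String sample =
                "<configuration>"
              + "<property><name>mapreduce.framework.name</name><value>yarn</value></property>"
              + "<property><name>mapred.child.java.opts</name>"
              + "<value>-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5432</value></property>"
              + "</configuration>";

        String childOpts = propertyValue(sample, "mapred.child.java.opts");
        boolean debugEnabled = childOpts != null && childOpts.contains("-agentlib:jdwp");
        System.out.println("framework = " + propertyValue(sample, "mapreduce.framework.name"));
        System.out.println("child JVM debug enabled = " + debugEnabled);
    }
}
```

To check a real installation, the sample string could be replaced with the contents of the mapred-site.xml file read from disk.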