We may have scenario where we have to submit Hadoop job from client machine and client machine is not part of existing Hadoop cluster. It is expected that job to be get executed on Hadoop cluster.
Consider a Scenario:
Here Namenode and Datanode forms Hadoop Cluster, Client submits job to Namenode.
To achieve this, Client should have same copy of Hadoop Distribution and configuration which is present at Namenode.
Then Only Client will come to know on which node Job tracker is running, and IP of Namenode to access HDFS data.
Go through configuration on Namenode,
core-site.xml will have this property-
<property> <name>fs.default.name</name> <value>192.168.0.1:9000</value> </property>
mapred-site.xml will have this property-
<property> <name>mapred.job.tracker</name> <value>192.168.0.1:8021</value> </property>
These are two important properties must be copied to client machine’s Hadoop configuration.
And you need to set one addtinal property in mapred-site.xml file, to overcome from Privileged Action Exception.
<property> <name>mapreduce.jobtracker.staging.root.dir</name> <value>/user</value> </property>
Also you need to update /ets/hosts of client machine with IP addresses and hostnames of namenode and datanode.
Now you can submit job from client machine with hadoop jar command, and job will be executed on Hadoop Cluster. Note that, you shouldn’t start any hadoop service on client machine.