Submitting Hadoop job from client machine

You may need to submit a Hadoop job from a client machine that is not part of the existing Hadoop cluster, while the job itself is expected to run on the cluster.

Consider this scenario:

[Figure: client_submit_job]

Here the Namenode and Datanode form the Hadoop cluster, and the client submits the job to the Namenode.
To achieve this, the client should have the same Hadoop distribution and configuration as the Namenode.
Only then will the client know on which node the Job Tracker is running, and the IP address of the Namenode needed to access HDFS data.

Look at the configuration on the Namenode.

core-site.xml will have this property:

  <property>
    <name>fs.default.name</name>
    <value>192.168.0.1:9000</value>
  </property>

mapred-site.xml will have this property:

  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.0.1:8021</value>
  </property>
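
To confirm which values the Namenode is actually using before copying them, a small script can pull the properties out of the site files. This is only a sketch: the function name read_properties is made up for illustration, and the sample strings simply mirror the two properties shown above, wrapped in the <configuration> root element that Hadoop site files use.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return a {name: value} dict for every <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


# Sample content mirroring the properties shown in this post.
CORE_SITE = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>192.168.0.1:9000</value>
  </property>
</configuration>"""

MAPRED_SITE = """<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.0.1:8021</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(read_properties(CORE_SITE)["fs.default.name"])
    print(read_properties(MAPRED_SITE)["mapred.job.tracker"])
```

In practice you would read the text from the Namenode's own core-site.xml and mapred-site.xml instead of the sample strings.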

These two important properties must be copied to the client machine's Hadoop configuration.
You also need to set one additional property in the mapred-site.xml file to avoid a Privileged Action Exception.

  <property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <value>/user</value>
  </property>
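
If you would rather generate the client-side files than edit them by hand, something like the following could work. It is a minimal sketch, not a definitive tool: the helper names and the scratch output directory are assumptions, and the values are the ones used in this post. It overwrites any file with the same name, so on a real client that already has other settings it is safer to merge the properties manually.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Values taken from this post; replace them with your own cluster's settings.
CLIENT_CONFIG = {
    "core-site.xml": {"fs.default.name": "192.168.0.1:9000"},
    "mapred-site.xml": {
        "mapred.job.tracker": "192.168.0.1:8021",
        "mapreduce.jobtracker.staging.root.dir": "/user",
    },
}


def write_site_file(path, properties):
    """Write a Hadoop-style <configuration> file containing the given properties."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def write_client_config(conf_dir):
    """Create conf_dir if needed and write both site files into it."""
    os.makedirs(conf_dir, exist_ok=True)
    for filename, properties in CLIENT_CONFIG.items():
        write_site_file(os.path.join(conf_dir, filename), properties)


if __name__ == "__main__":
    # Hypothetical target: a scratch directory, not a real Hadoop conf directory.
    demo_dir = tempfile.mkdtemp()
    write_client_config(demo_dir)
    print(sorted(os.listdir(demo_dir)))
```

Pointing write_client_config at the client's Hadoop conf directory would produce files holding exactly the three properties discussed above.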

You also need to update /etc/hosts on the client machine with the IP addresses and hostnames of the Namenode and Datanode.
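
A quick way to check the hosts file is to list which cluster hostnames still have no entry. The sketch below assumes hypothetical hostnames (namenode, datanode1), since the post does not name the actual machines, and it works on sample text rather than the real file.

```python
def missing_hosts(hosts_text, required):
    """Return the required hostnames that have no entry in the given hosts-file text."""
    known = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        fields = line.split()
        known.update(fields[1:])  # first field is the IP address, the rest are names
    return [host for host in required if host not in known]


# Sample hosts content; "namenode" and "datanode1" are hypothetical hostnames.
SAMPLE_HOSTS = """127.0.0.1 localhost
192.168.0.1 namenode   # Namenode IP as used in this post
"""

if __name__ == "__main__":
    print(missing_hosts(SAMPLE_HOSTS, ["namenode", "datanode1"]))
```

To run it against the real file, pass the contents of /etc/hosts as hosts_text together with your cluster's hostnames.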
Now you can submit the job from the client machine with the hadoop jar command, and the job will be executed on the Hadoop cluster. Note that you shouldn't start any Hadoop service on the client machine.
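
The submission step can also be scripted. The following is a rough sketch under stated assumptions: the jar name, main class, and HDFS paths are placeholders, and the function names are invented for illustration. It only relies on the hadoop jar command itself and checks that the hadoop script is on the PATH before calling it.

```python
import shutil
import subprocess


def build_submit_command(jar_path, main_class, *job_args):
    """Build the argument list for: hadoop jar <jar> <main class> <args...>"""
    return ["hadoop", "jar", jar_path, main_class, *job_args]


def submit(jar_path, main_class, *job_args):
    """Run the job through the hadoop script and return its exit code."""
    if shutil.which("hadoop") is None:
        raise FileNotFoundError("hadoop script not found on PATH")
    command = build_submit_command(jar_path, main_class, *job_args)
    return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    # Placeholder jar, class, and HDFS paths; only the command is printed here.
    print(" ".join(build_submit_command("wordcount.jar", "WordCount", "/input", "/output")))
```

Calling submit with your own jar and arguments runs the job on the cluster, provided the client configuration described above is in place.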


2 responses to “Submitting Hadoop job from client machine”

  1. My client does not have link to hadoop script, and I cannot execute the hadoop jar command. After setting all the properties mentioned above, I still get Privileged Action Exception. Any suggestions? Should I set some environment variables?

  2. Thanks for this interesting and useful tip

    When I tried it with Hadoop 1.0.4, I get the Privileged Action Exception!!

    Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.AccessControlException: Permission denied: user=test1sing, access=WRITE, inode="staging":test1:supergroup:rwxr-xr-x

    Any idea how to fix this?

    Many thanks!
