Building Apache Hadoop From Source

To build Apache Hadoop from source, first install the required software, then check out the latest Apache Hadoop code from trunk and build it.

Step A: Install Required Software

Requirements:
* Unix System
* JDK 1.6
* Maven 3.1.1
* ProtocolBuffer 2.4.1+ (for MapReduce and HDFS)
* CMake 2.6 or newer (if compiling native code)
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)

Installation steps for this software:

1) Install Maven 3.1.1

# wget http://www.dsgnwrld.com/am/maven/maven-3/3.1.1/binaries/apache-maven-3.1.1-bin.tar.gz
# tar xzf apache-maven-3.1.1-bin.tar.gz -C /usr/local
# cd /usr/local
# ln -s apache-maven-3.1.1 maven
# export M2_HOME=/usr/local/maven
# export PATH=$M2_HOME/bin:$PATH
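
To confirm Maven is installed and on the PATH, check the reported version:

# mvn -version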

2) Install Java

Download JDK 1.6 from http://www.oracle.com/technetwork/java/javase/downloads/jdk6downloads-1902814.html

Select jdk-6u43-linux-x64.bin.

Go to the directory where it was downloaded:

# chmod +x jdk-6u43-linux-x64.bin
# ./jdk-6u43-linux-x64.bin

Set JAVA_HOME to the directory where the JDK was extracted:

# export JAVA_HOME=/usr/java/jdk1.6.0_43
# export PATH=$JAVA_HOME/bin:$PATH
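
To confirm the JDK is active, check the version (it should report 1.6.0_43):

# java -version
# javac -version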

3) Install OpenSSL

This step is optional; OpenSSL is used by several Hadoop security features.

Download the source of openssl-1.0.1c from
http://www.openssl.org/source/

# tar -xvf openssl-1.0.1c.tar.gz
# cd openssl-1.0.1c
# ./config shared --prefix=/usr --openssldir=/usr/local/openssl
# make && make test
# make install
# openssl version

4) Install ProtocolBuffer 2.4.1

# yum install zlib
# yum install zlib-devel
# wget http://protobuf.googlecode.com/files/protobuf-2.4.1.tar.bz2
# tar xjf protobuf-2.4.1.tar.bz2
# cd protobuf-2.4.1
# ./configure
# make
# make install
# ldconfig
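
To verify that protoc is visible to the build, check its version (it should report libprotoc 2.4.1):

# protoc --version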

5) Install CMake and Kernel Headers

# yum install cmake
# yum install kernel-headers gcc-c++
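
You can confirm the CMake installation the same way:

# cmake --version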

Step B: Download Apache Hadoop Source with an SVN Client

# svn co http://svn.apache.org/repos/asf/hadoop/common/trunk/ hadoop-trunk

1) Extract the source

An SVN checkout gives you a working directory directly. If you downloaded a release archive instead, change to the download directory and extract it.

If it is a zip:

# unzip hadoop-*.zip

If it is a tar:

# tar -xvf hadoop-*.tar.gz

2) Build it.

Change to the top-level directory of the extracted source, where you will find pom.xml, the Maven build script.

# mvn package -Pdist,docs,src -Dtar -DskipTests

It builds the package and stores the resulting distribution in the hadoop-dist/target/ directory.

You can now install this distribution and run jobs on it.
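
As a quick smoke test, you can unpack the generated tarball and run one of the bundled MapReduce examples in standalone (local) mode. The exact version string in the file names depends on the branch you built, so treat the globs below as illustrative:

# tar xzf hadoop-dist/target/hadoop-*.tar.gz -C /opt
# cd /opt/hadoop-*
# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10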

Some alternative ways of building distributions:

Create binary distribution without native code and without documentation:

# mvn package -Pdist -DskipTests -Dtar

Create binary distribution with native code and with documentation:

# mvn package -Pdist,native,docs -DskipTests -Dtar

Create source distribution:

# mvn package -Psrc -DskipTests

Create source and binary distributions with native code and documentation:

# mvn package -Pdist,native,docs,src -DskipTests -Dtar

Create a local staging version of the website (in /tmp/hadoop-site):

# mvn clean site; mvn site:stage -DstagingDirectory=/tmp/hadoop-site

15 responses to “Building Apache Hadoop From Source”

  1. What about the configuration folder? After building with Maven I could not see a conf folder, which causes an error when I try sbin/start-dfs.sh.

  2. After building with Maven, look in the hadoop-dist/target/ directory, where you will find a hadoop snapshot tar file. Extract it; it contains the configuration folder you are asking about at the path etc/hadoop, and you will also find sbin/start-dfs.sh.

    • Thanks Pravin. I understand that the MapReduce architecture changed after v2.0, which is why the conf files are located in different folders. Even after building the older version 1.2.1 via Ant, it does not create some folders from the original structure; e.g., the dfs folder disappears in the snapshot version.

    • One more question: after building I get a new version. Is it necessary to run the same version across the whole cluster? I just want to update the JobTracker code, but after running the new build I get an error about incompatible versions. Thanks.

  3. The build process fails with Hadoop 2.2.0 in the hadoop-auth module with the following error.

    [ERROR] COMPILATION ERROR :
    [INFO] ————————————————————-
    [ERROR] /cids/hadoop/hadoop-2.2.0-src/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/client/AuthenticatorTestCase.java:[88,11] error: cannot access AbstractLifeCycle
    [ERROR] class file for org.mortbay.component.AbstractLifeCycle not found
    /cids/hadoop/hadoop-2.2.0-src/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/client/AuthenticatorTestCase.java:[96,29] error: cannot access LifeCycle
    [ERROR] class file for org.mortbay.component.LifeCycle not found
    /cids/hadoop/hadoop-2.2.0-src/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/client/AuthenticatorTestCase.java:[98,10] error: cannot find symbol
    [ERROR] symbol: method start()

    How can I solve this?

  4. I ran into the same issue today and solved it using the link below:

    http://dougchang333.blogspot.com/2013/11/building-hadoop-cannot-access.html?showComment=1385499779667#c4618040668228639947

    To add, however:
    1) There are two Hadoop downloads: the src, which is 19 MB, and a binary of about 104 MB.
    2) When you download the patch, you would need to apply it to the 104 MB package and then rebuild; I was not able to do that.
    3) The pom file entry applies to the 19 MB src package. Realize, however, that each of the sub-projects has its own POM file.
    4) So I was adding the entry to the root pom.xml and nothing was happening.
    5) What finally helped was navigating to
    http://svn.apache.org/viewvc?view=revision&revision=1543190
    which is where the original changes were added.
    6) It shows that the entry needs to be made in “hadoop-common-project/hadoop-auth/pom.xml”.
    7) With the entry made in the correct pom.xml, the error goes away.
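
    For reference, a sketch of the dependency entry that revision adds (to the best of my reading it is a test-scoped jetty-util artifact; double-check the exact elements against the linked diff):

    <dependency>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>jetty-util</artifactId>
      <scope>test</scope>
    </dependency>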

  5. Hi all,

    I am new to Hadoop. I installed CDH3u3 on my CentOS 6.3 machine. What do I have to do if I want to edit some files in the Hadoop source code and build my own Hadoop jar? Please reply.

  6. Hi, I am facing a problem while executing my first WordCount job on Hadoop 2.2.

    I am getting the following errors:
    java.lang.NoClassDefFoundError: org/apache/hadoop/service/CompositeService
    Could not find the main class: org.apache.hadoop.mapreduce.v2.app.MRAppMaster

    Can someone tell me which classpath and Hadoop variables need to be set for this, and where do I make the changes?

    hadoop classpath shows:
    c:\hadoop\etc\hadoop;c:\hadoop\share\hadoop\common\lib\*;c:\hadoop\share\hadoop\common\*;c:\hadoop\share\hadoop\hdfs;c:\hadoop\share\hadoop\hdfs\lib\*;c:\hadoop\share\hadoop\hdfs\*;c:\hadoop\share\hadoop\yarn\lib\*;c:\hadoop\share\hadoop\yarn\*;c:\hadoop\share\hadoop\mapreduce\lib\*;c:\hadoop\share\hadoop\mapreduce\*
