When writing MapReduce applications, you may want certain files to be shared across all nodes in the Hadoop cluster. These can be simple properties files or even executable JAR files.
The Hadoop MapReduce project provides this facility through a feature called DistributedCache.
The DistributedCache is configured through the job configuration and makes read-only data available to every machine in the cluster.
Step 1: Put the file into HDFS
# hdfs dfs -put /tmp/file1 /cachefile1
Step 2: Add the cache file to the job configuration
Configuration conf = new Configuration();
Job job = new Job(conf, "wordcount");
DistributedCache.addCacheFile(new URI("/cachefile1"), job.getConfiguration());
Step 3: Access the cached file
// getLocalCacheFiles() returns an array of local paths, one per cached file
Path[] cacheFiles = context.getLocalCacheFiles();
FileInputStream fileStream = new FileInputStream(cacheFiles[0].toString());
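Once you have the local path, the cached file is just an ordinary file on the task node, so standard Java I/O applies. As a sketch of what a mapper's setup() might do with it, the hypothetical helper below (the class name, method name, and stop-word format are assumptions, not part of the Hadoop API) parses a cached stop-word list, one word per line, into a Set; in a real mapper you would pass it a FileReader opened on cacheFiles[0]:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: loads one stop word per line from a cached file.
// Taking a Reader (rather than a Path) keeps it easy to test without HDFS.
public class CacheFileParser {
    public static Set<String> loadStopWords(Reader source) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    words.add(trimmed.toLowerCase());
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Simulate the cached file's contents with a StringReader;
        // in a mapper this would be new FileReader(cacheFiles[0].toString()).
        Set<String> stopWords = loadStopWords(new StringReader("the\nand\nof\n"));
        System.out.println(stopWords.contains("the")); // prints "true"
        System.out.println(stopWords.size());          // prints "3"
    }
}
```

Loading the file once in setup() rather than in map() matters here: map() runs once per input record, while the cached file only needs to be read once per task.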