Code examples of MapReduce
Let's take the same sample.txt example and turn it into a program that performs MapReduce. For this example the code is written in Java, but MapReduce programs can be written in other languages as well.
The entire MapReduce program can be fundamentally divided into three parts:
- Mapper Code The Map class extends the Mapper class that is already defined in the MapReduce framework, and we declare the data types of its input and output key/value pairs. For every word in a line of input, the mapper emits the pair (word, 1).
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line into words and emit (word, 1) for each one
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            value.set(tokenizer.nextToken());
            context.write(value, new IntWritable(1));
        }
    }
}
- Reducer Code The Reduce class extends the Reducer class, just as the Map class extends Mapper. We declare the data types of its input and output key/value pairs; the framework groups the mapper output by key, so the reducer simply sums the list of values associated with each key and emits the final count.
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the counts received for this word and emit the total
        int sum = 0;
        for (IntWritable x : values) {
            sum += x.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
- Driver Code In the driver class we configure the MapReduce job to run in Hadoop: we name the job, point it at the Mapper and Reducer classes, set the output key/value types and the input/output formats, and specify the input and output paths. This snippet is specific to the Hadoop Java API; to run similar MapReduce jobs in other languages, you would typically use a bridge such as Hadoop Streaming or a language-specific library that integrates with Hadoop (or another distributed computing system).
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "My Word Count Program");
job.setJarByClass(WordCount.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
Path outputPath = new Path(args[1]);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, outputPath);
System.exit(job.waitForCompletion(true) ? 0 : 1);
The command for running a MapReduce job from the Hadoop command line is:
_hadoop jar hadoop-mapreduce-example.jar WordCount /sample/input /sample/output_
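Hadoop refuses to run if the output directory already exists, so remove it or pick a fresh path between runs. When the job completes, each reducer writes its results to a part file in the output directory; with the default single reducer, a command along these lines (the file name is an assumption, it depends on the number of reducers) lets you inspect the word counts:
_hdfs dfs -cat /sample/output/part-r-00000_
Putting the three parts together, the complete WordCount program looks like this: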
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    // Mapper Code
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                value.set(tokenizer.nextToken());
                context.write(value, new IntWritable(1));
            }
        }
    }

    // Reducer Code
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable x : values) {
                sum += x.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    // Driver Code
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "My Word Count Program");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        Path outputPath = new Path(args[1]);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, outputPath);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
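A common refinement for word count, not shown in the listing above, is to reuse the reducer as a combiner so that counts are partially summed on the map side before the shuffle. This works here because addition is associative and commutative; the sketch below shows the single extra line that would go in the driver, alongside the other job.set* calls:

job.setCombinerClass(Reduce.class); // run Reduce on map output to pre-aggregate counts

With the combiner in place, the reducers receive far fewer (word, 1) pairs over the network, which usually shortens the shuffle phase noticeably.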