Thursday, May 31, 2012

Things to remember in MapReduce


Q 1. What is IdentityMapper?
A - An empty Mapper which directly writes key/value to the output.
         Mapper<K,V,K,V>
Q 2. What is InverseMapper?
A - A Mapper which swaps the <Key,Value> to <Value,Key>.
         Mapper<K,V,V,K>
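The difference between the two Mappers above can be sketched in plain Java. This is only an illustration of what they do to a record, not Hadoop code: the class InverseMapSketch and its methods are hypothetical names, and Map.Entry stands in for an emitted key/value pair.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

class InverseMapSketch {
    // Identity: emit the pair unchanged, as IdentityMapper does.
    static <K, V> Map.Entry<K, V> identity(K key, V value) {
        return new SimpleEntry<>(key, value);
    }

    // Inverse: emit (value, key), as InverseMapper does.
    static <K, V> Map.Entry<V, K> inverse(K key, V value) {
        return new SimpleEntry<>(value, key);
    }

    public static void main(String[] args) {
        Map.Entry<String, Integer> same = identity("hadoop", 3);
        Map.Entry<Integer, String> swapped = inverse("hadoop", 3);
        System.out.println(same.getKey() + " " + same.getValue());
        System.out.println(swapped.getKey() + " " + swapped.getValue());
    }
}
```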
Q 3. What is IdentityReducer?
A - It performs no reduction, directly writes key/value to the output.
         Reducer<K,V,K,V>
Q 4. What is Partitioner?
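A - It runs on the map side, on each key/value pair the Mapper emits, and decides which Reducer that pair is sent to. HashPartitioner is the default; a custom Partitioner can be implemented to change the assignment.
In the MapReduce model, every value for a given key 'K' must go to the same Reducer, where the values are presented together as one Iterable<V>.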
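As a rough sketch of how a key is assigned to a Reducer, the arithmetic below mirrors what Hadoop's default HashPartitioner does: clear the sign bit of the key's hash code and take the remainder by the number of reduce tasks. The class name HashPartitionSketch and the method partitionFor are hypothetical; a real Partitioner would implement Hadoop's interface instead.

```java
class HashPartitionSketch {
    // Clear the sign bit so the result is never negative,
    // then map the hash onto the range [0, numReduceTasks).
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        for (String key : new String[] {"apple", "banana", "apple"}) {
            System.out.println(key + " -> reducer " + partitionFor(key, reducers));
        }
    }
}
```

Because the result depends only on the key and the number of reducers, both occurrences of "apple" land on the same Reducer, which is the guarantee described above.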
Q 5. What are the uses of Combiner?
A - It performs local aggregation on the map output, on the map side, to reduce the amount of data sent to the Reducers.
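A minimal sketch of that local aggregation, assuming a word-count style job in which the Mapper emits one record per word. CombinerSketch and combine are hypothetical names, and a plain List stands in for one map task's output.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CombinerSketch {
    // Sum the counts per word within one map task's output, so each
    // distinct word is shipped once instead of once per occurrence.
    static Map<String, Integer> combine(List<String> mapOutputWords) {
        Map<String, Integer> local = new LinkedHashMap<>();
        for (String word : mapOutputWords) {
            local.merge(word, 1, Integer::sum);
        }
        return local;
    }

    public static void main(String[] args) {
        List<String> emitted = Arrays.asList("a", "b", "a", "a");
        // Four emitted records shrink to two combined records.
        System.out.println(combine(emitted));
    }
}
```

Hadoop may run a Combiner zero, one, or several times on a map task's output, so this only works for operations such as sum, where combining partial results in any grouping gives the same final answer.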
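Q 6. Where are Map outputs stored?
A - Intermediate map output is written to the local disk of the node that runs the map task, not to HDFS. It can be compressed, and the Reducers fetch it from there during the shuffle. Only the final job output is written to HDFS.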
Q 7. How do you set the number of mappers and reducers?
A - A JobConf object is used to set the number of map and reduce tasks.
JobConf is in the package org.apache.hadoop.mapred and extends org.apache.hadoop.conf.Configuration.
public void setNumMapTasks(int n); // only a hint: the actual number of map tasks depends on the input splits
public void setNumReduceTasks(int n); // sets the number of reduce tasks
Q 8. What is ChainMapper?
A - It allows multiple Mapper classes to be used within a single Map task.
The Mappers run as a chain: the output of the first Mapper becomes the input of the second, and so on until the last Mapper, whose output is written as the task's output.
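The chaining idea can be sketched in plain Java as a list of stages applied in order. This is a simplification: each stage here turns one record into exactly one record, whereas a real Mapper may emit zero or more. ChainSketch and runChain are hypothetical names, not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class ChainSketch {
    // Apply each stage in order; the output of one stage
    // is the input of the next.
    static String runChain(List<Function<String, String>> stages, String record) {
        String current = record;
        for (Function<String, String> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<Function<String, String>> stages = new ArrayList<>();
        stages.add(s -> s.trim());        // first "mapper": strip whitespace
        stages.add(s -> s.toLowerCase()); // second "mapper": normalise case
        System.out.println(runChain(stages, "  Hadoop  "));
    }
}
```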
Q 9. What is RegexMapper?
A - A Mapper that extracts the text matching a regular expression from each input line and emits each match as a key with a count of 1.
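The extraction step can be sketched with java.util.regex alone. RegexExtractSketch and extract are hypothetical names, and the sketch always takes the whole match, which is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexExtractSketch {
    // Return every substring of the line that matches the pattern, in order.
    // A RegexMapper-style job would emit each of these as a key with count 1.
    static List<String> extract(Pattern pattern, String line) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(line);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("[0-9]+");
        for (String match : extract(digits, "a12b345")) {
            System.out.println(match + "\t1");
        }
    }
}
```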
