Thursday, May 31, 2012

Things to remember : In Map Reduce


Q 1. What is IdentityMapper?
A - An empty Mapper which directly writes key/value to the output.
         Mapper<K,V,K,V>
Q 2. What is InverseMapper?
A - A Mapper which swaps the <Key,Value> to <Value,Key>.
         Mapper<K,V,V,K>
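The swap can be illustrated without any Hadoop types. This is only a plain-Java sketch of the idea; `InverseSketch` and `invert` are hypothetical names, not part of the Hadoop API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class InverseSketch {
    // Turns every <key, value> record into <value, key>,
    // which is what InverseMapper does for each input record.
    static <K, V> List<Map.Entry<V, K>> invert(List<Map.Entry<K, V>> records) {
        List<Map.Entry<V, K>> out = new ArrayList<>();
        for (Map.Entry<K, V> record : records) {
            out.add(new SimpleEntry<>(record.getValue(), record.getKey()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> in = new ArrayList<>();
        in.add(new SimpleEntry<>("hadoop", 3));
        System.out.println(invert(in)); // [3=hadoop]
    }
}
```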
Q 3. What is IdentityReducer?
A - It performs no reduction, directly writes key/value to the output.
         Reducer<K,V,K,V>
Q 4. What is Partitioner?
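A - It decides which Reducer each intermediate key/value pair is sent to. It is applied to the output of each Map task as that output is written, not after all Map tasks have finished. A custom Partitioner can be implemented to control which key/value goes to which Reducer; the default is HashPartitioner, which partitions by the hash code of the key.
In the Map-Reduce model, a given key 'K' with all of its values (the Iterable<V>) must go to the same Reducer.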
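A minimal sketch of hash partitioning in plain Java, with no Hadoop dependency. The arithmetic matches what Hadoop's default HashPartitioner does; the class name `HashPartitionSketch` is hypothetical.

```java
class HashPartitionSketch {
    // Mask off the sign bit so the result is never negative,
    // then take the remainder to get a reducer index in [0, numReduceTasks).
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"apple", "banana", "apple"};
        for (String key : keys) {
            // Equal keys always print the same reducer index.
            System.out.println(key + " -> reducer " + getPartition(key, 4));
        }
    }
}
```

Because the index depends only on the key, every occurrence of the same key lands on the same Reducer.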
Q 5. What are the uses of Combiner?
A - It performs local aggregation on the output of each Map task, reducing the amount of data sent to the Reducers.
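As an illustration, here is a plain-Java sketch of local aggregation for a word count. It does not use the Hadoop Reducer interface; `CombinerSketch` and `combine` are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CombinerSketch {
    // Aggregates one map task's output locally: instead of one (word, 1)
    // record per occurrence, emit one (word, count) record per distinct word.
    static Map<String, Integer> combine(List<String> emittedWords) {
        Map<String, Integer> partial = new LinkedHashMap<>();
        for (String word : emittedWords) {
            partial.merge(word, 1, Integer::sum);
        }
        return partial;
    }

    public static void main(String[] args) {
        List<String> emitted = Arrays.asList("a", "b", "a", "a");
        System.out.println(combine(emitted)); // {a=3, b=1}
    }
}
```

Four records shrink to two before anything crosses the network. This is only safe when the reduce operation is commutative and associative, as a sum is.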
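Q 6. Where are Map outputs stored?
A - Intermediate Map output is written to the local disk of the node that ran the Map task, not to HDFS. It can optionally be compressed (for example with gzip). Reducers fetch it during the shuffle; only the final Reducer output is written to HDFS.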
Q 7. How to set the number of mappers & reducers?
A - A JobConf object is used to set the number of mappers and reducers.
JobConf is in the package org.apache.hadoop.mapred and extends org.apache.hadoop.conf.Configuration.
public void setNumMapTasks(int n); // only a hint: the actual number of map tasks depends on the input splits
public void setNumReduceTasks(int n); // sets the number of reduce tasks for the job
Q 8. What is ChainMapper?
A - It allows multiple Mapper classes to be used within a single Map task.
The output of one Mapper is passed to the next Mapper, and so on.
The Mappers are executed as a chain.
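The chaining idea can be sketched in plain Java, assuming for simplicity that each stage maps one record to exactly one record (real Mappers may emit zero or many). `ChainSketch` and `runChain` are hypothetical names, and this is not the ChainMapper API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class ChainSketch {
    // Feeds the record through each stage in order;
    // the output of one stage is the input of the next.
    static String runChain(List<Function<String, String>> stages, String record) {
        String current = record;
        for (Function<String, String> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<Function<String, String>> stages = new ArrayList<>();
        stages.add(String::trim);        // first "mapper": strip whitespace
        stages.add(String::toLowerCase); // second "mapper": normalize case
        System.out.println(runChain(stages, "  Hadoop  ")); // hadoop
    }
}
```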
Q 9. What is RegexMapper?
A - A Mapper that extracts text matching a regular expression.
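The extraction step can be sketched with java.util.regex alone. This is not the RegexMapper source; `RegexSketch` and `extract` are hypothetical names, and the sketch only collects the matches found in one input line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSketch {
    // Returns every substring of the line that matches the regular expression,
    // in the order the matches occur.
    static List<String> extract(String regex, String line) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(line);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(extract("[0-9]+", "port 8020 and port 50070")); // [8020, 50070]
    }
}
```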

Wednesday, May 23, 2012

Things to remember : In Core JAVA

Q 1. Which algorithm is used by HashMap/Hashtable?
A - HashMap internally uses buckets to store key-value pairs. A key passed to HashMap is not used as the 'key' as it is: its hashCode() is used to compute the index of a bucket. When multiple keys map to the same bucket (i.e. a collision in HashMap/Hashtable), the additional key-value pair is stored in the same bucket as the next item (each bucket is a linked list and can contain multiple key-value pairs). On lookup, equals() is used to find the matching key within the bucket.
HashMap can take an 'initial capacity' and a 'load factor' in its constructor.
initial capacity : the number of buckets created at the time of initialization.
load factor : how full the table may get before it grows. When the number of entries exceeds capacity x load factor, the number of buckets is increased and the entries are rehashed.
Hashtable is a synchronized counterpart of HashMap. HashMap is not synchronized, so it performs better when the object is not accessed by multiple threads.
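A small sketch of the constructor parameters and of a collision. "Aa" and "BB" are two strings known to have the same hashCode(), so they end up in the same bucket yet remain separate entries. The class name `HashMapSketch` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapSketch {
    static Map<String, Integer> build() {
        // 16 buckets to start with; the table grows once it holds
        // more than 16 * 0.75 = 12 entries.
        Map<String, Integer> map = new HashMap<>(16, 0.75f);
        // Same hash code, different keys: both live in one bucket,
        // and equals() tells them apart on lookup.
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = build();
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println(map.get("Aa") + " " + map.get("BB")); // 1 2
    }
}
```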
Q 2. Name some ways of Inter Process Communication (IPC).
A - These are :
  1. Socket
  2. Message Queue
  3. Pipe
  4. Signal
  5. File
  6. Remote Method Invocation (RMI)
  7. Shared Memory
  8. Web services / RPC frameworks (SOAP, REST, Thrift), usually exchanging XML or JSON
Q 3. What is Mutual Exclusion?
A - Mutual exclusion in an OS (often implemented with a mutex) is a collection of techniques/algorithms for sharing resources so that concurrent uses do not conflict and cause unwanted interactions: only one thread or process at a time may be inside the critical section. One of the most commonly used techniques for mutual exclusion is the semaphore.
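A minimal sketch of mutual exclusion with a binary semaphore, using java.util.concurrent.Semaphore. `MutexSketch` and `runWorkers` are hypothetical names chosen for this example.

```java
import java.util.concurrent.Semaphore;

class MutexSketch {
    // One permit means at most one thread is in the critical section.
    private static final Semaphore lock = new Semaphore(1);
    private static int counter = 0;

    static int runWorkers(int threads, int incrementsPerThread) {
        counter = 0;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    lock.acquireUninterruptibly();
                    try {
                        counter++; // critical section
                    } finally {
                        lock.release();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4, 10000)); // 40000
    }
}
```

Without the semaphore, the unsynchronized counter++ could lose updates and the total would often come out lower.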
Q 4. What are Abstraction and Encapsulation?
A - Abstraction : Hiding the unimportant details of an object and focusing on its outside view (what it does rather than how it does it).
      Encapsulation : Wrapping the data members and the member functions that operate on them together into a single unit (a class), usually with the data kept private.
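Both ideas can be shown in one short sketch. The names (`EncapsulationSketch`, `Account`, `SimpleAccount`) are hypothetical and exist only for this example.

```java
class EncapsulationSketch {
    // Abstraction: callers see only what an account can do,
    // not how the balance is stored.
    interface Account {
        void deposit(long cents);
        long balance();
    }

    // Encapsulation: the data (balanceCents) and the methods that act on it
    // are bundled in one class, and the field is private.
    static class SimpleAccount implements Account {
        private long balanceCents;

        @Override
        public void deposit(long cents) {
            if (cents <= 0) {
                throw new IllegalArgumentException("deposit must be positive");
            }
            balanceCents += cents;
        }

        @Override
        public long balance() {
            return balanceCents;
        }
    }

    public static void main(String[] args) {
        Account account = new SimpleAccount();
        account.deposit(500);
        System.out.println(account.balance()); // 500
    }
}
```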