Wednesday 14 May 2014

Some high-level Hadoop information!

1. It is a good idea to minimize the amount of data transferred between the Mapper and the Reducer, because network bandwidth in the cluster is limited. A combiner function can therefore be run on the mapper side to aggregate the mapper's output locally before it is passed on to the reducer.


2. Job profiling and tuning: measure where a Hadoop job spends its time, then adjust the job so that it runs faster.
3. JobTracker and TaskTracker: the JobTracker assigns tasks to one or more TaskTrackers, and each TaskTracker actually runs its task on its own input split.


4. Hadoop Streaming: An interface provided by the Hadoop framework that enables programmers to write MapReduce code in any language that can read from standard input and write to standard output.


5. What is Pig? A high-level scripting language for writing Hadoop programs faster and more easily. Pig automatically translates a script into MapReduce jobs, saving developer effort.
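
To make points 1 and 4 concrete, here is a minimal sketch in Python of a word count written in the style of Hadoop Streaming, with an optional combiner. It is only an in-process simulation: a real streaming job would read lines from standard input and write tab-separated key/value lines to standard output, and Hadoop itself would do the sort and shuffle. The names mapper, combiner, reducer and run_job are hypothetical helpers chosen for illustration, not part of any Hadoop API.

```python
from collections import Counter
from itertools import groupby


def mapper(lines):
    """Emit a (word, 1) pair for every word, like a streaming mapper printing 'word\\t1'."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def combiner(pairs):
    """Sum the counts per word locally, on the mapper side, before anything is shuffled."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


def reducer(sorted_pairs):
    """Sum the counts per word; the input must already be sorted by key."""
    for word, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def run_job(splits, use_combiner=True):
    """Simulate one job (assumption: each split is handled by its own mapper).

    Returns the final word counts and the number of pairs that would
    have been sent from the mappers to the reducer.
    """
    shuffled = []
    for split in splits:
        output = list(mapper(split))
        if use_combiner:
            output = combiner(output)
        shuffled.extend(output)
    transferred = len(shuffled)
    # Sorting stands in for Hadoop's sort-and-shuffle phase.
    return dict(reducer(sorted(shuffled))), transferred


if __name__ == "__main__":
    splits = [["a b a", "a"], ["b b a"]]
    print(run_job(splits, use_combiner=True))
    print(run_job(splits, use_combiner=False))
```

Both runs give the same counts, but the run with the combiner sends fewer pairs to the reducer. This only works because addition is associative and commutative; Hadoop may apply a combiner zero, one or several times, so the combiner must not change the final result.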
