Hadoop is a software framework inspired by Google's MapReduce and the Google File System, and it is now considered one of the best solutions for dealing with Big Data.
When we talk about Big Data, it can be anything in the form of pictures, movies and so on, and it consumes a huge amount of space.
In Hadoop, storage is provided by HDFS, which protects against data loss in case of failure, and analysis is provided by MapReduce (data processing), which can run an ad hoc query against a huge dataset and return the result in a reasonable amount of time.
HDFS and MapReduce are the key components of Hadoop.
MapReduce works well primarily on unstructured and semi-structured data, for example web log files. Such data is not organized into relational tables the way Oracle tables are, yet MapReduce finds these data sets easy to process. Pig and Hive are examples of higher-level languages built on top of MapReduce.
MapReduce consists mainly of two functions: a map function and a reduce function. It works on huge datasets and returns the desired results. A query that looks complicated can be expressed as a MapReduce job.
The first step is passing in the input data. As mentioned, MapReduce has two phases: the map phase (map function) and the reduce phase (reduce function). The input data is passed to the map phase. Take unstructured data as an example: the map function processes the input, takes only the required fields from it, and passes them on to the reduce phase. This removes a lot of unwanted records.
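As a rough illustration of the map phase, here is a minimal Mapper sketch. The log format assumed here (tab-separated fields, with a URL in the first field and a response time in the fourth) is an assumption made for this example, not something from the post itself.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch of a map function over an assumed tab-separated web log:
// field 0 = URL, field 3 = response time in milliseconds.
public class LogMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t");
        // Keep only the fields the job needs; skip malformed records,
        // which is how unwanted records get dropped in the map phase.
        if (fields.length < 4) {
            return;
        }
        String url = fields[0];
        long responseTime;
        try {
            responseTime = Long.parseLong(fields[3]);
        } catch (NumberFormatException e) {
            return;
        }
        context.write(new Text(url), new LongWritable(responseTime));
    }
}
```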
The output of the map function is passed on to the reduce phase. The reduce function then further processes the data and extracts the output from the mapped data based on the logic of the job.
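Continuing the same sketch, a reduce function might aggregate the values emitted by the map phase for each key. The averaging logic below is just one assumed example of "the logic of the job".

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch of a reduce function: for each URL (key), average the response
// times (values) that the map phase emitted for it.
public class LogReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        long count = 0;
        for (LongWritable v : values) {
            sum += v.get();
            count++;
        }
        if (count > 0) {
            context.write(key, new LongWritable(sum / count));
        }
    }
}
```

A driver class would then wire the two together with Job.setMapperClass and Job.setReducerClass and submit the job.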
Looking more closely at how a MapReduce job works, Hadoop runs the job as a set of map tasks. The input is divided into pieces called splits, and one map task is assigned to each split. The split size matters for how long the job takes to produce its output; ideally, the split size should equal the size of an HDFS block.
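As a hedged illustration, the driver-side sketch below bounds the split size using the standard FileInputFormat properties; the 64 MB and 128 MB values are only example numbers, and the default of roughly one split per HDFS block is usually what you want.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SplitSizeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // By default FileInputFormat creates roughly one split per HDFS block.
        // The properties below only bound the split size; values are examples.
        conf.setLong("mapreduce.input.fileinputformat.split.minsize", 64L * 1024 * 1024);
        conf.setLong("mapreduce.input.fileinputformat.split.maxsize", 128L * 1024 * 1024);
        Job job = Job.getInstance(conf, "log-analysis");
        System.out.println("Configured job: " + job.getJobName());
    }
}
```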
Below are some other key terms:
- Data locality optimization
- Combiner function (a brief usage sketch follows below)
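As a brief, non-authoritative sketch of the second term: a combiner runs reducer-style logic on the map side, cutting down the data shuffled to the reduce phase. The word-count style IntSumReducer is used here because its summing logic is safe to apply as a combiner; that choice is an assumption for this example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class CombinerExample {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word-count-with-combiner");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Running the reducer's summing logic as a combiner reduces the data
        // transferred between the map and reduce phases.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
    }
}
```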