Hadoop is an open source project from Apache that has grown into a major
technology movement. It has emerged as a leading option for handling big data,
including not only structured data but also complex, unstructured data. It has
become popular because of its ability to store and process large amounts of
data quickly and cost-effectively across clusters of commodity hardware.
With Hadoop, organizations are discovering
and putting into practice new data analysis and mining techniques that were
previously impractical for performance, cost, and technological reasons.
The main advantages of Hadoop are cost-effective scalability on commodity
hardware, support for processing all data types (structured, semi-structured
or unstructured), and open extensibility.
Apache Hadoop is actually a collection of several components, including the
following:
- MapReduce
- Hadoop Distributed File System (HDFS)
- Hive
- Pig
- HBase
- ZooKeeper
- Ambari
- HCatalog
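
To give a feel for the MapReduce component listed above, here is a minimal sketch of the map, shuffle and reduce steps as a word count in plain Python. It runs in a single process and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration. On a real cluster the framework would distribute these steps across many machines.

```python
from collections import defaultdict


def map_phase(line):
    # Map step: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle step: group all values by key, which a MapReduce framework
    # does between the map and reduce steps.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce step: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines):
    # Run the three steps in order on an in-memory list of lines.
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big clusters", "data on clusters"])
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both steps can in principle be spread over many nodes, which is the idea behind processing data across a cluster.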