Wednesday, 17 October 2012

An Introduction to Hadoop!

Hadoop is an open source project from Apache that has become into a major technology movement. It has emerged as the best option to handle the Big data, including not only structured data but also complex, unstructured data as well. It has become popular because of its ability to store and process large amounts of data, quickly and cost effectively across clusters of commodity hardware

With Hadoop, organizations are discovering and putting into practice new data analysis and mining techniques that were previously impractical for performance, cost, and technological reasons. 

Advanatage of Hadoop is cost-effective scalability to control hardware. It provides support for the processing of all data types – whether structured, semi-structured or unstructured – and the open extensibility of Hadoop

Apache Hadoop  is actually a collection of several components including the following:

  • MapReduce
  • Hadoop Distributed File System (HDFS)
  • Hive
  • Pig
  • HBase
  • ZooKeeper
  • Ambari
  • HCatalog

No comments:

Post a Comment

Related Posts Plugin for WordPress, Blogger...

ShareThis