Saturday 10 August 2013

An Introduction to Pig in Hadoop

Pig was developed at Yahoo around 2006 to reduce the burden of writing complex mapper and reducer programs in Hadoop. For background on Hadoop, please check "An Introduction to Hadoop".
Pig is to Hadoop roughly what SQL is to Oracle. Most operations in Pig are designed to transform an entire data set in one shot; this includes transformations such as filtering and joining two or more data sets. Pig's language layer currently consists of a textual language called Pig Latin.

Why the name Pig?
Like the animal, Pig can eat anything, i.e. it can handle any kind of data set. Hence the name Pig.

Pig mainly consists of two components:
  • Pig Latin: the language used to express data flows on this platform.
  • Runtime environment: the infrastructure on which Pig Latin programs are executed as MapReduce jobs.
There are mainly three steps in a Pig Latin script:
Load, Transform & Dump
  • Load: loads the data stored in HDFS (Hadoop Distributed File System).
  • Transform: transforms the data using a set of operations such as filtering, joining, and grouping.
  • Dump: dumps the result directly to the screen, or stores it in a file.
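The three steps above can be sketched in a short Pig Latin script. The file name, schema, and filter condition are made up for illustration:

```pig
-- Load: read a tab-separated file from HDFS (path and schema are assumptions)
students = LOAD 'students.txt' USING PigStorage('\t')
           AS (name:chararray, age:int, score:double);

-- Transform: keep only passing students, then project the fields we want
passed = FILTER students BY score >= 50.0;
result = FOREACH passed GENERATE name, score;

-- Dump: print to the screen (use STORE result INTO 'out'; to write to HDFS)
DUMP result;
```

Such a script can be run with `pig -x local script.pig` against the local file system, or with `pig -x mapreduce` to execute it as MapReduce jobs on a cluster.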
Pig Latin can be extended using UDFs (User-Defined Functions), which the user can write in Java, Python, or JavaScript and then call directly from the language.
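As a minimal sketch, a Python UDF might look like the following; the file name, function name, and output schema are hypothetical. The `@outputSchema` decorator is supplied by Pig's Jython runtime, so a no-op fallback is defined here to let the file also run under plain Python:

```python
# to_upper.py -- a hypothetical Python UDF for Pig (executed by Pig under Jython).
# @outputSchema tells Pig the return type of the function; it is provided by
# Pig's Jython runtime, so we define a no-op stand-in when it is absent.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap

@outputSchema("upper_name:chararray")
def to_upper(name):
    # Upper-case the input, passing nulls through unchanged.
    if name is None:
        return None
    return name.upper()
```

From a Pig Latin script the UDF would then be registered and called with `REGISTER 'to_upper.py' USING jython AS myudfs;` followed by something like `FOREACH students GENERATE myudfs.to_upper(name);`.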
