Pig was developed at Yahoo around 2006 to reduce the burden of writing complex mapper and reducer programs in Hadoop. For an introduction to Hadoop, please check "An Introduction to Hadoop!"
Pig is to Hadoop roughly what SQL queries are to Oracle. Most operations in Pig are designed to transform an entire data set in one shot, with transformations such as filtering and joining two or more data sets (see the join sketch below). Pig's language layer currently consists of a textual language called Pig Latin.
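As a rough illustration of this style, here is a minimal Pig Latin sketch that joins two data sets; the file paths, delimiters, and schemas are assumptions made up for the example, not part of this post.

```
-- Hypothetical inputs: comma-separated order and customer files (paths and schemas are assumptions)
orders    = LOAD '/data/orders.csv'    USING PigStorage(',') AS (order_id:int, cust_id:int, amount:double);
customers = LOAD '/data/customers.csv' USING PigStorage(',') AS (cust_id:int, name:chararray);

-- Join the two data sets on the customer id in a single statement
joined = JOIN orders BY cust_id, customers BY cust_id;

-- Print the joined records to the screen
DUMP joined;
```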
Why the name Pig?
Like the animal, Pig can eat anything, i.e. it can handle any kind of data set; hence the name Pig.
Pig mainly contains two components:
- Pig Latin: The language used to write programs for this platform.
- Runtime environment: The infrastructure where Pig Latin programs are executed as MapReduce jobs.
A Pig Latin script mainly has three steps: Load, Transform & Dump (see the sketch after this list).
- Load: Loads the Hadoop data that is stored in HDFS (Hadoop Distributed File System).
- Transform: Transforms the data using a set of transformations.
- Dump: Dumps the data directly to the screen, or stores it in a file.
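A minimal sketch of these three steps, assuming a tab-separated file of users with a name and an age (the path, schema, and filter condition are illustrative):

```
-- Load: read data from HDFS (path and schema are assumptions for this example)
users = LOAD '/data/users.txt' USING PigStorage('\t') AS (name:chararray, age:int);

-- Transform: keep only the adult users
adults = FILTER users BY age >= 18;

-- Dump: print the result to the screen ...
DUMP adults;

-- ... or store it in a file on HDFS instead
STORE adults INTO '/output/adults';
```

When such a script runs, the runtime environment compiles these steps into MapReduce jobs behind the scenes.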
Pig Latin can be extended using UDFs (User Defined Functions), which the user can write in Java, Python or JavaScript and then call directly from the language.
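For example, a Java UDF packaged in a jar can be registered and then called like a built-in function; the jar name myudfs.jar and the UPPER function below are illustrative assumptions, not something defined in this post.

```
-- Register a jar containing a hypothetical Java UDF (jar and function names are assumptions)
REGISTER myudfs.jar;

users = LOAD '/data/users.txt' USING PigStorage('\t') AS (name:chararray, age:int);

-- Call the UDF directly from Pig Latin, just like a built-in function
upper_names = FOREACH users GENERATE myudfs.UPPER(name);

DUMP upper_names;
```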