Introduction to Pig in Hadoop

Saturday, 10 August 2013

Pig was developed at Yahoo around 2006 to reduce the burden of writing complex mapper and reducer programs in Hadoop. For background on Hadoop, see “An Introduction to Hadoop!”

Pig plays a role for Hadoop similar to the one SQL plays for a database such as Oracle. Most Pig operations transform a whole data set at once, for example by filtering it or by joining two or more data sets. Pig's language layer currently consists of a textual language called Pig Latin.

Why the name Pig?

Like the animal, Pig can eat anything, i.e. it can handle any kind of data set. Hence the name Pig.

Pig has two main components:

Pig Latin: the language used on this platform.
Runtime environment: the infrastructure where Pig Latin programs are executed as MapReduce jobs.

A Pig Latin script has three main steps:

Load, Transform & Dump

Load: read the data stored in HDFS (Hadoop Distributed File System).
Transform: apply a set of transformations to the data.
Dump: write the result directly to the screen or store it in a file.
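
The three steps above can be mimicked in plain Python to show the shape of the data flow. This is only an analogy and not how Pig executes a script: the sample records, the field names, and the age threshold are invented for illustration, and the comments show the Pig Latin statement that each step roughly corresponds to.

```python
import csv
import io

# Hypothetical sample input; in Pig this would be a file stored in HDFS.
RAW = "alice,34\nbob,15\ncarol,52\n"


def load(text):
    # Roughly: users = LOAD 'users.csv' USING PigStorage(',')
    #                  AS (name:chararray, age:int);
    return [(name, int(age)) for name, age in csv.reader(io.StringIO(text))]


def transform(rows):
    # Roughly: adults = FILTER users BY age >= 18;
    return [row for row in rows if row[1] >= 18]


def dump(rows):
    # Roughly: DUMP adults;
    # (STORE adults INTO 'some_output_dir'; would write to a file instead.)
    for row in rows:
        print(row)
    return rows


adults = dump(transform(load(RAW)))
```

In a real Pig Latin script each step is a single statement, and Pig turns the whole chain into MapReduce jobs when it reaches the DUMP or STORE statement.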

Pig Latin can be extended with UDFs (User Defined Functions), which the user can write in Java, Python, or JavaScript and then call directly from the language.
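
As a rough sketch of what a Python UDF might look like, the function below upper-cases a string. The file name `udfs.py`, the alias `myfuncs`, and the function name `to_upper` are hypothetical. The sketch assumes the script is registered with Pig's Jython support, which is expected to make the `outputSchema` decorator available; a small stand-in is defined so the file also runs under ordinary Python.

```python
# udfs.py (hypothetical file name)

try:
    outputSchema  # assumed to be provided by Pig when registered USING jython
except NameError:
    # Stand-in so the file can be run and tested outside Pig.
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("word:chararray")
def to_upper(word):
    # Pig passes null fields as None, so pass them through unchanged.
    if word is None:
        return None
    return word.upper()


# Assumed usage from a Pig Latin script:
#   REGISTER 'udfs.py' USING jython AS myfuncs;
#   upper_names = FOREACH users GENERATE myfuncs.to_upper(name);
```

Running `to_upper("pig")` in a Python shell is a quick way to check the logic before registering the file with Pig.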
