Apache Hive is an open-source data warehouse system built on Hadoop. It is used
for ad-hoc querying, data summarization, and analysis of large datasets stored
in Hadoop files.
Hive was initially developed by Facebook to analyze its petabytes of data and
to let its SQL developers work with the Hadoop platform by writing Hive Query
Language (HQL) statements. Apache Hive is now used and developed by other
companies as well.
HiveQL is a simple language similar to SQL. Hive converts these SQL-like
queries into MapReduce jobs that are executed on Hadoop, and HiveQL also
supports custom MapReduce scripts.
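To make the query-to-MapReduce idea concrete, here is a minimal Python sketch of the map, shuffle, and reduce phases that a query such as SELECT page, COUNT(*) FROM weblogs GROUP BY page could be broken into. It is a toy in-memory illustration, not Hive's actual execution plan; the weblogs rows, the column names, and the function names are all assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for a `weblogs` table.
weblogs = [
    {"page": "/home", "user": "a"},
    {"page": "/cart", "user": "b"},
    {"page": "/home", "user": "c"},
]

def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed on the GROUP BY column."""
    for row in rows:
        yield row["page"], 1

def shuffle_phase(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Aggregate each key's values; summing the 1s plays the role of COUNT(*)."""
    return {key: sum(values) for key, values in grouped.items()}

hits_per_page = reduce_phase(shuffle_phase(map_phase(weblogs)))
print(hits_per_page)  # {'/home': 2, '/cart': 1}
```

On a real cluster the map and reduce steps run in parallel on many machines, which is why this style of processing suits very large inputs.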
Hive copes well with queries over huge datasets, because the work is
distributed across the Hadoop cluster as MapReduce jobs. It can be run from a
command line interface or from a Java Database Connectivity (JDBC) or Open
Database Connectivity (ODBC) application.
Note: Hive is not designed for online transaction processing (OLTP); it is
meant for batch jobs over large datasets (e.g. web logs). Hive is also not
suited to applications that need very fast response times.
More About Apache Hive
Hive stores its metadata (the definitions of tables and of the partitions they
are made up of) in an RDBMS; by default this is an embedded Apache Derby
database. Hive supports four main file formats:
· TEXTFILE (flat text files)
· SEQUENCEFILE (flat files consisting of binary key/value pairs)
· ORC (Optimized Row Columnar)
· RCFILE (Record Columnar File, which stores the columns of a table in a
columnar layout)
Note: Using ORC files improves performance when Hive is reading, writing, and
processing data.
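The file format is chosen per table with the STORED AS clause of CREATE TABLE. As a rough sketch, the Python helper below renders such a statement as text for any of the four formats listed above. The helper name create_table_ddl, the weblogs table, and its columns are hypothetical and exist only for this illustration; the generated statement would still have to be run through Hive itself.

```python
# The four STORED AS keywords that correspond to the formats listed above.
HIVE_FORMATS = ("TEXTFILE", "SEQUENCEFILE", "ORC", "RCFILE")

def create_table_ddl(table, columns, file_format="TEXTFILE"):
    """Render a HiveQL CREATE TABLE statement for the given storage format."""
    fmt = file_format.upper()
    if fmt not in HIVE_FORMATS:
        raise ValueError(f"unsupported file format: {file_format}")
    # One "name TYPE" entry per column, each on its own indented line.
    column_list = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return f"CREATE TABLE {table} (\n  {column_list}\n)\nSTORED AS {fmt};"

# Hypothetical table: page names with a hit counter, stored as ORC.
ddl = create_table_ddl("weblogs", [("page", "STRING"), ("hits", "BIGINT")], "orc")
print(ddl)
```

Switching the last argument to "TEXTFILE", "SEQUENCEFILE", or "RCFILE" changes only the final STORED AS line of the statement.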
Hive supports the following primitive data types:
Integers
· TINYINT
· SMALLINT
· INT
· BIGINT
Boolean type
· BOOLEAN
Floating point numbers
· FLOAT
· DOUBLE
String type
· STRING
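The four integer types differ in storage width: TINYINT is a 1-byte signed integer, SMALLINT 2 bytes, INT 4 bytes, and BIGINT 8 bytes. The short Python sketch below derives the value range of each width and picks the narrowest type that can hold a given value. The function names are made up for this example and are not part of Hive.

```python
# Storage width in bytes of each signed Hive integer type, narrowest first.
INTEGER_TYPES = [("TINYINT", 1), ("SMALLINT", 2), ("INT", 4), ("BIGINT", 8)]

def signed_range(num_bytes):
    """Return the (min, max) values of a signed integer of the given width."""
    bits = num_bytes * 8
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def smallest_integer_type(value):
    """Return the narrowest Hive integer type whose range contains value."""
    for name, num_bytes in INTEGER_TYPES:
        low, high = signed_range(num_bytes)
        if low <= value <= high:
            return name
    raise OverflowError(f"{value} does not fit in BIGINT")

print(signed_range(1))               # (-128, 127)
print(smallest_integer_type(128))    # SMALLINT
print(smallest_integer_type(40000))  # INT
```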
In addition, primitive data types can be combined to form complex data types
such as:
· STRUCT
· MAP
· ARRAY
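As a loose analogy, the three complex types behave much like nested Python values: a STRUCT is a record with named fields, a MAP is a set of key/value pairs, and an ARRAY is an ordered list. The sketch below models one row of a hypothetical table in Python and notes, in comments, the HiveQL expression that would read the same value. The table layout and all of the data are invented for this example.

```python
# One hypothetical row of a table declared with columns such as:
#   name    STRING
#   address STRUCT<city:STRING, zip:STRING>
#   scores  MAP<STRING, INT>
#   tags    ARRAY<STRING>
row = {
    "name": "alice",
    "address": {"city": "Pune", "zip": "411001"},  # STRUCT: fixed, named fields
    "scores": {"math": 90, "art": 75},             # MAP: arbitrary keys, one value type
    "tags": ["new", "premium"],                    # ARRAY: ordered, indexed from 0
}

city = row["address"]["city"]        # HiveQL: address.city
math_score = row["scores"]["math"]   # HiveQL: scores['math']
first_tag = row["tags"][0]           # HiveQL: tags[0]

print(city, math_score, first_tag)  # Pune 90 new
```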