Apache Hive & Hive Query Language

Saturday, 16 November 2013


Apache Hive is an open-source data warehouse system built on top of Hadoop. It is used for ad-hoc querying, data summarization and analysis of large datasets stored in Hadoop's distributed file system (HDFS).

Hive was initially developed at Facebook to analyze its petabytes of data and to let its SQL developers work with the Hadoop platform by writing Hive Query Language (HQL) statements. It is now an Apache project that is used and developed by other companies as well.

HQL is a simple language similar to SQL. Hive converts HQL queries into MapReduce jobs that are executed on Hadoop, and HQL also supports plugging in custom MapReduce scripts.
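To make the translation concrete, consider a hypothetical table `web_logs(page, user)` and the query `SELECT page, COUNT(*) FROM web_logs GROUP BY page;`. Conceptually, a query like this can run as a single MapReduce job: mappers emit the grouping key, Hadoop shuffles the pairs so that equal keys meet, and reducers aggregate each group. The Python sketch below simulates those three phases in memory. The table, column and function names are assumptions for illustration, and Hive's real query planner (combiners, multi-stage plans and so on) is considerably more involved.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input: each row of a web_logs table is (page, user).
Row = Tuple[str, str]


def map_phase(rows: Iterable[Row]) -> Iterator[Tuple[str, int]]:
    """Emit (group-by key, 1) for every input row, like a mapper."""
    for page, _user in rows:
        yield page, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values by key, as Hadoop does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values for each key, producing COUNT(*) per page."""
    return {key: sum(values) for key, values in groups.items()}


def run_group_by_count(rows: Iterable[Row]) -> Dict[str, int]:
    """Simulate SELECT page, COUNT(*) FROM web_logs GROUP BY page."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    web_logs = [
        ("/home", "alice"),
        ("/about", "bob"),
        ("/home", "carol"),
    ]
    print(run_group_by_count(web_logs))  # {'/home': 2, '/about': 1}
```

Keeping each phase as a separate function mirrors how the work is split across mapper and reducer tasks on a real cluster, where each phase runs in parallel on different machines.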

Hive is built to process queries over huge datasets by distributing the work across a Hadoop cluster. Queries can be run from the Hive command-line interface (CLI) or from applications that connect through Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC).

Note:

Hive is not designed for online transaction processing (OLTP); it is meant for batch jobs over large datasets (e.g. web logs). Hive is also not suitable for applications that need very fast response times.

More About Apache Hive

Hive stores its metadata (the definitions of tables and of the partitions that tables are divided into) in a relational database known as the metastore, which by default is an embedded Apache Derby database. Hive supports four main file formats:

· TEXTFILE (plain-text flat files)

· SEQUENCEFILE (flat files consisting of binary key/value pairs)

· ORC (Optimized Row Columnar)

· RCFILE (Record Columnar File, which stores the columns of a table the way a columnar database does)

Note: Using ORC files can improve performance when Hive reads, writes and processes data.
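Why columnar formats such as RCFILE and ORC help can be sketched in a few lines of Python. This is a conceptual model only, not the on-disk layout of either format (the real formats add compression and other features not modeled here). It keeps the same hypothetical rows once row by row and once column by column, then counts how many values must be read to answer a query that touches a single column, such as a hypothetical `SELECT SUM(duration_seconds) FROM visits;`. The table and column names are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical table rows: (user_id, page, duration_seconds)
Row = Tuple[int, str, int]

COLUMNS = ("user_id", "page", "duration_seconds")


def to_columnar(rows: List[Row]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def sum_durations_row_layout(rows: List[Row]) -> Tuple[int, int]:
    """Sum durations from row storage; every field of every row is read.

    Returns (total, number_of_values_read).
    """
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # the whole row is scanned to reach one field
        total += row[2]
    return total, values_read


def sum_durations_columnar(columns: Dict[str, list]) -> Tuple[int, int]:
    """Sum durations from columnar storage; only one column is read.

    Returns (total, number_of_values_read).
    """
    durations = columns["duration_seconds"]
    return sum(durations), len(durations)


if __name__ == "__main__":
    visits = [(1, "/home", 30), (2, "/about", 5), (1, "/home", 12)]
    print(sum_durations_row_layout(visits))             # (47, 9)
    print(sum_durations_columnar(to_columnar(visits)))  # (47, 3)
```

Both layouts return the same total, but the columnar one reads a third of the values here; on wide tables with many columns the saving grows accordingly.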

Hive supports the following primitive data types

Integers

· TINYINT

· SMALLINT

· INT

· BIGINT

Boolean type

· BOOLEAN

Floating-point numbers

· FLOAT

· DOUBLE

String type

· STRING
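The integer types differ only in storage size and range: per the Hive documentation, TINYINT, SMALLINT, INT and BIGINT are signed 1-, 2-, 4- and 8-byte integers. The Python helper below is a hypothetical sketch (the function name is not part of Hive) that picks the narrowest of these types able to hold a given value, which can be handy when choosing a column type for a table definition.

```python
from typing import Dict, Tuple

# Signed value ranges of Hive's integer types (1, 2, 4 and 8 bytes).
# Note: -2**7 parses as -(2**7), i.e. -128.
HIVE_INTEGER_RANGES: Dict[str, Tuple[int, int]] = {
    "TINYINT": (-2**7, 2**7 - 1),
    "SMALLINT": (-2**15, 2**15 - 1),
    "INT": (-2**31, 2**31 - 1),
    "BIGINT": (-2**63, 2**63 - 1),
}


def smallest_integer_type(value: int) -> str:
    """Return the narrowest Hive integer type that can hold value.

    The dict is checked in insertion order, from narrowest to widest.
    """
    for type_name, (low, high) in HIVE_INTEGER_RANGES.items():
        if low <= value <= high:
            return type_name
    raise ValueError(f"{value} does not fit in a Hive BIGINT")
```

For example, `smallest_integer_type(40000)` returns `'INT'`, because 40000 exceeds the SMALLINT maximum of 32767.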

In addition, primitive data types can be combined to form complex data types like

