
Thursday, 26 April 2018

HDFS: Useful Hadoop Admin Commands

HDFS is designed for batch processing rather than interactive use. The emphasis is on high throughput of data access rather than low latency.

Applications that run on HDFS have large data sets. A typical file in HDFS is gigabytes to terabytes in size. Thus, HDFS is tuned to support large files. 

HDFS organises its data into files and directories. It provides a command-line interface called the FS shell that lets you interact with the data in HDFS and manage the files on your Hadoop cluster.
This article is a quick reference to the most commonly used hadoop fs commands for managing files on a Hadoop cluster. The syntax of the commands is similar to that of Unix shells such as bash and csh.



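Copy a file from the local file system to HDFS
hadoop fs -put <local_file> <hdfs_directory>

Copy a file from HDFS to the local file system
hadoop fs -get <hdfs_file> <local_directory>

Copy a file within HDFS
hadoop fs -cp <source_path> <destination_path>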

List the files in an HDFS directory
hadoop fs -ls <directory_name>

Display the contents of a file in HDFS
hadoop fs -cat <File_name>

Remove a file from HDFS
hadoop fs -rm <File_name>

Remove a directory and its contents from HDFS
hadoop fs -rm -r <directory_name>
(the older form, hadoop fs -rmr, is deprecated)

Display the file size
hadoop fs -du <File_name>

Display the total size of a directory
hadoop fs -du -s <directory_name>
(the older form, hadoop fs -dus, is deprecated)

Create a new directory in HDFS
hadoop fs -mkdir <directory_name>

List a directory and all of its contents recursively
hadoop fs -ls -R <directory_name>
(the older form, hadoop fs -lsr, is deprecated)

Display the last kilobyte of a file
hadoop fs -tail <File_name>

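Check the health of the HDFS file system and display a summary report
hadoop fsck <path>
(for example, hadoop fsck / checks the whole file system; in Hadoop 2 and later the preferred form is hdfs fsck <path>)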

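Copy a file from the local file system to HDFS (like -put, but the source must be a local file)
hadoop fs -copyFromLocal <local_file> <hdfs_directory>

Copy a file from HDFS to the local file system (like -get, but the destination must be a local path)
hadoop fs -copyToLocal <hdfs_file> <local_directory>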
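
As a rough illustration of how these commands fit together, here is a minimal sketch of a round trip: create a directory, upload a file, inspect it, copy it, download it, and clean up. All paths and file names (/user/demo, sales.csv and so on) are hypothetical placeholders, and the function name hdfs_round_trip is made up for this example. The script is a dry run by default and only prints each command; it assumes you will set HADOOP_CMD=hadoop on a machine with a configured Hadoop client when you want to run it for real.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Assumption: on a machine with a Hadoop client, run with HADOOP_CMD=hadoop.
HADOOP_CMD="${HADOOP_CMD:-echo hadoop}"

# All paths and file names below are hypothetical placeholders.
hdfs_round_trip() {
    $HADOOP_CMD fs -mkdir -p /user/demo/input                            # create the target directory
    $HADOOP_CMD fs -put sales.csv /user/demo/input/                      # local -> HDFS
    $HADOOP_CMD fs -ls /user/demo/input                                  # confirm the upload
    $HADOOP_CMD fs -tail /user/demo/input/sales.csv                      # peek at the last kilobyte
    $HADOOP_CMD fs -cp /user/demo/input/sales.csv /user/demo/backup.csv  # copy within HDFS
    $HADOOP_CMD fs -get /user/demo/backup.csv ./sales_copy.csv           # HDFS -> local
    $HADOOP_CMD fs -rm -r /user/demo/input                               # remove the directory and its contents
}

hdfs_round_trip
```

Printing the commands first is a cheap way to check paths before anything is deleted with -rm -r.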

Sunday, 18 March 2018

What are the types of PI (Primary Index) in Teradata?

The Teradata primary index is not an index in the traditional sense, as it is not a lookup table. Instead, it is the mechanism that determines where each data row is physically stored on the Teradata system: the primary index value of a row is hashed, and the result decides which AMP (Access Module Processor) holds that row. The primary index of a table may be defined on a single column or on multiple columns, and the values in those columns may be unique or non-unique.

The primary index of a table should not be confused with the primary key of a table. The primary index is part of the physical database model and affects the storage and retrieval of data rows. The primary key is part of the logical database model and uniquely identifies each record in the table. Often, the primary key of a table is a good candidate for its primary index, particularly for smaller “dimension” or “lookup” tables, but this is not always the case for other tables.

There are two types of primary index: Unique Primary Index (UPI) and Non-Unique Primary Index (NUPI). A PRIMARY INDEX clause without the UNIQUE keyword creates a NUPI; to create a UPI, the UNIQUE keyword has to be given explicitly.
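
To make the difference concrete, here is a minimal sketch of the two declarations. The table and column names (customer, orders, customer_id and so on) and the function name run_pi_ddl are hypothetical, chosen only for illustration. The shell wrapper is just a convenience: by default it prints the SQL with cat. It assumes you would point TD_CLIENT at a real client such as bteq and add your own logon lines (site-specific, not shown) to actually run it.

```shell
#!/bin/sh
# Dry run by default: the SQL is only printed.
# Assumption: set TD_CLIENT to a real Teradata client (e.g. bteq) and add logon lines to execute it.
TD_CLIENT="${TD_CLIENT:-cat}"

# Table and column names are hypothetical examples.
run_pi_ddl() {
    $TD_CLIENT <<'SQL'
-- UPI: customer_id values must be unique, so rows spread evenly across AMPs.
CREATE TABLE customer (
    customer_id   INTEGER NOT NULL,
    customer_name VARCHAR(100)
)
UNIQUE PRIMARY INDEX (customer_id);

-- NUPI: many orders can share one customer_id, and all of them land on the same AMP.
CREATE TABLE orders (
    order_id    INTEGER NOT NULL,
    customer_id INTEGER NOT NULL,
    order_total DECIMAL(10,2)
)
PRIMARY INDEX (customer_id);
SQL
}

run_pi_ddl
```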

A UPI can sometimes slow down inserts and updates, because the uniqueness of the index value has to be checked for every row, which is additional overhead for the system. In return, the data is distributed evenly.

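We should be careful when choosing a NUPI so that the distribution of data is almost even: a column with many distinct values spreads rows well, while a column dominated by a few values piles rows onto a few AMPs. The UPI/NUPI decision should be based on the data and how it is used.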
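
One way to check how evenly a candidate index column would distribute rows is to count rows per AMP using Teradata's hash functions. The sketch below assumes a hypothetical table orders with a candidate column customer_id; the function name run_skew_check is also made up. As before, the shell wrapper only prints the query by default, and TD_CLIENT is an assumed hook for a real client such as bteq (logon lines not shown).

```shell
#!/bin/sh
# Dry run by default: the query is only printed.
# Assumption: set TD_CLIENT to a real Teradata client (e.g. bteq) to execute it.
TD_CLIENT="${TD_CLIENT:-cat}"

# orders and customer_id are hypothetical names for the table and candidate index column.
run_skew_check() {
    $TD_CLIENT <<'SQL'
-- Rows per AMP if customer_id were the primary index; a few very large counts indicate skew.
SELECT HASHAMP(HASHBUCKET(HASHROW(customer_id))) AS amp_no,
       COUNT(*) AS row_count
FROM orders
GROUP BY 1
ORDER BY 2 DESC;
SQL
}

run_skew_check
```

If the largest row_count is far above the average, the column is probably a poor NUPI choice for that table.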