Pages

Thursday, 12 December 2013

AUTOCOMMIT in SQL- Simple and Useful

DML Statements or Data Manipulation Language statements will not be committed unless they are committed explicitly, But if you use any data definition language (DDL) statement before or after , then Oracle Database issues an implicit COMMIT.

There is a provision in Oracle to commit every DML Transaction automatically once its run. This is called AUTOCOMMIT. We can also turn AUTOCOMMIT on and off based on our requirement.

Below are commands to check for AUTOCOMMIT.

Sunday, 8 December 2013

How to do Sorting in UNIX???

UNIX sort allows us to sort data in numeric, alphabetic and reverse order whether it’s of column or multiple columns with delimiters. Sort order can be restricted to individual columns and then merged to one single file. Sorting becomes handy in many situations.

Syntax
sort [options] [files...]

Sort Order Commands
Sort -o
Filename to write sorted output to file,
if not given then write to standard output.
Sort –d
Sort in dictionary order
Sort –n
Sort in numeric order(smallest first)
Sort -r
Sort in reverse order (z on top and a on bottom)
Sort –nr
Sort in  reverse numeric order
Sort +1
Sort starting with field 1 (starting from 0)
Sort +0.50
Sort starting with 50th character
Sort +1.5
Sort starting with 5th character of field 1
Sort -c
To see if a file is sorted (error message returned if not)
Sort -u
Suppress duplicate lines and output Identical input lines once.
Sort -M
Sort by month (JAN.FEB, MAR) Note that non-months are
sorted first and uppercase letters precede lowercase
letters for identical months.
Sort -k 1
Sort starting with the first character of the first field
and continuing through the end of the line.

Below we have explained some examples which will make it easy to understand and perform sorting.

1. Simple Sort  .i.e. sort in alphabetical order starting from the first character

Example 1
$ cat sort_number.txt
100
232
3
1
44444
99

$ sort sort_number.txt
1
100
232
3
44444
99

Note: The above records are sorted alphabetically, not mathematically sorted.Same with below

Example 2
$ cat myfile
Saina
Eesto
Jaffer
Michael

$ sort myfile
Eesto
Jaffer
Michael
Saina

Tuesday, 26 November 2013

Zookeeper (Apache Hadoop project)

The Apache Hadoop project is a collection of many sub projects and ZooKeeper(now a top-level project in its own) is one among them and is noticeable for its wide applicability for building distributed systems.

ZooKeeper is a distributed, open-source coordination service for distributed applications. Very large Hadoop clusters can be maintained by multiple ZooKeeper servers as it ensures the availability by never-ending services.In Hadoop project it is used to manage master election and store other process metadata.

For example
In Hadoop we do have many types’ nodes, master and multiple worker nodes. If by any chance the master node fails then role of master node has to be transferred to different node. This is done by zookeeper as it takes care of clusters by assigning tasks to new master node


Saturday, 16 November 2013

Apache Hive & Hive Query Language

Apache Hive is an open-source data warehouse system based on Hadoop and is used for ad-hoc querying, data summarization and analyzing large datasets stored in Hadoop files.

While initially developed by Facebook to analyze their petabytes of data at Internet, Apache Hive is now used and developed by other companies .Hive was developed by Facebook to allow their SQL developers to control the Hadoop platform by writing Hive Query Language (HQL) statements.

Hive QL is a simple language similar to SQL .Hive QL which converts SQL-like queries into MapReduce jobs executed on Hadoop, also supports custom MapReduce scripts.

Hive is faster when compared to other queries running on huge datasets. It can be run from a command line interface or from a Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC) application.

All about Informatica Session Properties

This post is just an Introduction for beginners in Informatica to know more about Informatica sessions and its properties.

A session is an instance of Mapping Program or in other words a running instance of a mapping is referred as Session. For one Mapping Program we can create one or more Sessions. Generally we require one session for one mapping but for Parallel data loading we may create multiple sessions.
Now we will go through the following tabs in the session.

·        General Tab
·        Properties Tab
·        Mapping Tab & Config Object

Sunday, 10 November 2013

Informatica Session Components Tab: Pre-session and Post-session Command

Pre-Session Command:
We can define Operating System Commands or  programs to be executed before the data loading process (Session Process) starts. Operating System commands or programs can be defined as Reusable Commands (defined in the form of   Command Tasks) or Non-reusable commands (defined with this property directly). Some of the uses of commands or programs can be:

  • To enable/disable database users before data loading
  • Make backup/copy of target tables so that old data can be restored in case of data loading fails
  • To intimate users via Emails about the Data Loading success so that they can start analysis

Informatica Session: Sources and Targets Properties and Connection


Source Connection
Define source data connection for each source qualifiers. If the base tables for source qualifiers are from different databases then we should define any of the databases as connection database for all source qualifiers. We need to grant SELECT privilege on all the tables from different databases to the Connection Database. This task is done at Database level.

Target Connection
Select Relational Writer if the target is a Table else select File Writer if the target is a Text  File i.e. Flat File .If File Writer is selected then use 'Set file properties' button to define file structure like: Delimiter, Text Qualifier, etc and to give extra character instead of nulls in flat files.

Properties Page
If Relational Writer is selected then set the following properties:
  • Target load type: Normal or Bulk
  • Insert, if this option is Unchecked then target table can not receive New Records.
  • Delete, if this option is Unchecked then target table does not allow deletion of records.

Informatica Session:Properties Tab

Below are the properties defined under "Informatica Session Properties tab".

  • Enter session log file name (any name) 
  • Define session log file directory using $PMSessionLogDir system variable OR define the complete folder path (/runbatch/session/test_session.log). 

Informatica Session Properties : General Tab

Below are some of the session properties listed under General tab

  • Rename to define the session name
  • Enable option 'Fail parent if this task fails' if you want to fail the parent task also in case this session fails. For example:First session is to load data for Employee table  and second session is to load for Department  table. Employee session is Parent and Department  session  is Child (Sequential Link), if Department  session fails to load data then we should fail Employee session also since an Employee without Department  is not possible.

Thursday, 31 October 2013

What is HADOOP ( HDFS and MapReduce)

HADOOP is a software framework that was inspired by Google's Map Reduce and Google File System and now is considered as best solution which can deal with BigData.

When we talk about Big data, it can be anything in the form of picture, movie etc ...and consumes huge amount of space

In Hadoop the storage is provided by HDFS-it provides good way of storage to prevent loss of data in case of failure, and analysis by Map Reduce(data processing) using its own adhoc analysis and runs the query against a huge data and shows the result in a reasonable amount of time.

Saturday, 19 October 2013

ADVANTAGES OF CLOUD COMPUTING

Cloud computing offers numerous Advantages .Few of them are given below:

  • Easy access to the information with lower initial investment  since we don’t have to spend much on the hardware, software or licensing fees  and more throughput since everyone(multiple staff can access, share folders and files) can access the cloud through internet  on standard web browsers anytime and can login from anywhere provided they have the required bandwidth.

  • Instead of installing software on our computer directly cloud allows to access files and soft wares through internet. Both small and big companies can benefit from the cloud approach.
 

Tuesday, 15 October 2013

Features of Mongo DB

Mongo DB is one of newest database introduced by 10gen .Mongo DB is an open source, on the whole a document oriented Database system and is a part of NoSQL family of database. Despite the fact that it’s not a relational database it has some of the imperative features of RDBMS and has got implausible speed. This DB is used in Projects like Unique Identification Authority of India (UIDAI) , MTV networks and many others

Instead of storing data in tables here rows are replaced by Documents (basic unit of Data in Mongo DB just like a ROW in RDMS) and Collections (collection is a group of documents.) which allow representing complex relationships. It can manage huge amount of data and can load data across a cluster. Mongo DB can perform some features which relational database cannot do.

Below are some of the Features of Mongo DB:
  • Mongo DB supports Map reduce and Aggregation Tools
  • Java Scripts are used instead of Procedures
  • Mongo DB is a schema less Database
  • Most Importantly Mongo DB supports secondary indexes and geospatial indexes.
  • Simple to Administer the Mongo DB in cases of failures
  • Mongo DB designed to provide High Performance
  • MongoDB stores files of any size without complicating your stack.

If you like this post, please share it on google by clicking on the Google +1 button.


Please go through our latest post TOP 6 BIG DATA TRENDS IN THE NEAR FUTURE

Thursday, 10 October 2013

Informatica Scenarios:Pivoting of records(Pivoting of Employees by Department)

In this scenario we will discuss about how to pivot a set of records based on column:
Source Records: 
Dept_id
Emp_name
10
CLARK
10
KING
10
MILLER
20
JONES
20
FORD
20
ADAMS
20
SMITH
20
SCOTT
30
WARD
30
TURNER
30
ALLEN
30
BLAKE
30
MARTIN
30
JAMES

Expected Output
DEPT_ID
EMP_NAME
10
CLARK|KING|MILLER
20
JONES|FORD|ADAMS|SMITH|SCOTT
30
WARD|TURNER|ALLEN|JAMES|BLAKE|MARTIN

Thursday, 26 September 2013

Persistent And Transient Objects( SQL/PLSQL )

There are two different types of objects namely
1. Persistent and    2. Transient.

Persistent Objects: are those that are stored in the database [Objects created using abstract data types varrays, nested tables etc.].   These can be used both with SQL commands and also in PL/SQL blocks.  These reside in the data dictionary.  Persistent objects are available to the user until they are deleted explicitly.  They can be implemented as tables, columns or attributes.  

Transient object exists only within the scope of the PL/SQL block. These get automatically de-allocated once they go out of the scope of the PL/SQL block.  Examples of transient objects are PL/SQL variables. 


IN PL/SQL:
create or replace type Type_adds as object
(
Country varchar2(40),
phase_no varchar2(100),
postal varchar2(50),
city varchar2(200),
state varchar2(150),
pin number(10)
);
Type created.

Wednesday, 18 September 2013

USAGE OF PARAMETERIZED CURSOR (Oracle SQL/PLSQL)

Parameter is passed in cursor in the same way as how a procedure is passed a parameter except that the parameter can only be IN mode
  


The general form of a cursor with parameters is,

cursor cursor_name [(parameter_name datatype, ……..)]
is select_statement;
Pass parameter values to a cursor when the cursor is opened and the query is executed.
Open an explicit cursor several times with a different active set each time.
open cursor_name(parameter_value,……..);


  • PASSING PARAMETERS USING DYNAMIC VARIABLES

The following example displays the usage of parameterized cursor, which asks for the value from the user at runtime and passes the parameter value. 
SQL>
declare
cursor c1 (dd number) is select * from emp where deptno =dd;
y number;
begin
    for x in c1(&y)
    loop
    insert into PROD values(x.empno,x.ename);
    end loop;
    end;
/
Enter value for y: 10
old   5: for x in c1(&y)
new   5: for x in c1(10)

PL/SQL procedure successfully completed.
SQL> select * from PROD;
   EMPNO ENAME
--------- ----------
     7782 CLARK
     7839 KING
     7934 MILLER

SQL> truncate table PROD;
Table truncated.



  • USE THE CURSOR ALSO FOR A “COMPUTED FIELDS” (ORACLE SQL/PLSQL)

The following example displays the use of computed fields to define a cursor. 

SQL> declare
    y number :=100;
    cursor c1 is select sal+y salary from emp;
    begin
    for x in c1
    loop
    insert into PROD values(x.salary,'in for');
    end loop;
    end;
   /
PL/SQL procedure successfully completed.

SQL> select * from PROD;
   EMPNO ENAME
--------- ----------
      900 in for
     1700 in for
     1350 in for
     3075 in for
     1350 in for
     2950 in for
     2550 in for