Thursday 12 December 2013

AUTOCOMMIT in SQL - Simple and Useful

Data Manipulation Language (DML) statements are not committed unless you commit them explicitly. However, if you run any Data Definition Language (DDL) statement before or after a DML statement, Oracle Database issues an implicit COMMIT.

Oracle provides a way, through the SQL*Plus AUTOCOMMIT setting, to commit every DML transaction automatically as soon as it runs. We can turn AUTOCOMMIT on and off based on our requirement.

Below are the commands to check and control AUTOCOMMIT.
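
In SQL*Plus, a minimal sketch looks like this (AUTOCOMMIT here is a client-side SQL*Plus setting; the output lines are indicative):

SQL> SHOW AUTOCOMMIT
autocommit OFF
SQL> SET AUTOCOMMIT ON
SQL> SHOW AUTOCOMMIT
autocommit IMMEDIATE
SQL> SET AUTOCOMMIT OFF

With AUTOCOMMIT ON, every INSERT, UPDATE or DELETE is committed as soon as it completes, so it can no longer be rolled back.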

Sunday 8 December 2013

How to do Sorting in UNIX?

The UNIX sort command lets us sort data in numeric, alphabetic or reverse order, on a single column or on multiple delimited columns. Sorting can be restricted to individual columns, and several sorted files can be merged into one file. Sorting comes in handy in many situations.

Syntax
sort [options] [files...]
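
For example, to sort a comma-delimited file numerically on its second column (the file name here is just an assumption for illustration):

$ sort -t ',' -k 2,2n employees.csv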

Sort Order Commands
sort -o filename : Write the sorted output to filename; if not given, write to standard output.
sort -d          : Sort in dictionary order.
sort -n          : Sort in numeric order (smallest first).
sort -r          : Sort in reverse order (z on top, a at the bottom).
sort -nr         : Sort in reverse numeric order.
sort +1          : Sort starting with field 1 (fields are numbered from 0).
sort +0.50       : Sort starting with the 50th character.
sort +1.5        : Sort starting with the 5th character of field 1.
sort -c          : Check whether the file is already sorted (an error message is returned if not).
sort -u          : Suppress duplicate lines; identical input lines are output only once.
sort -M          : Sort by month (JAN, FEB, MAR). Non-months are sorted first, and uppercase letters precede lowercase letters for identical months.
sort -k 1        : Sort starting with the first character of the first field and continuing through the end of the line.

Below are some examples that make it easy to understand and perform sorting.

1. Simple sort, i.e. sort in alphabetical order starting from the first character

Example 1
$ cat sort_number.txt
100
232
3
1
44444
99

$ sort sort_number.txt
1
100
232
3
44444
99

Note: The records above are sorted as text (character by character), not numerically; the -n option, shown below, gives a true numeric sort. The same default text sort applies to Example 2.
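
A quick sketch of the numeric sort on the same file:

$ sort -n sort_number.txt
1
3
99
100
232
44444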

Example 2
$ cat myfile
Saina
Eesto
Jaffer
Michael

$ sort myfile
Eesto
Jaffer
Michael
Saina

Tuesday 26 November 2013

Zookeeper (Apache Hadoop project)

The Apache Hadoop project is a collection of many subprojects, and ZooKeeper (now a top-level Apache project in its own right) is one of them; it is notable for its wide applicability in building distributed systems.

ZooKeeper is a distributed, open-source coordination service for distributed applications. Very large Hadoop clusters can be maintained by an ensemble of multiple ZooKeeper servers, which keeps the coordination service highly available. In the Hadoop project it is used to manage master election and to store other process metadata.

For example: a Hadoop cluster has different types of nodes, typically one master node and multiple worker nodes. If the master node fails, the master role has to be transferred to a different node. ZooKeeper takes care of this by coordinating the election of a new master for the cluster, as sketched below.
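
A common ZooKeeper recipe for this is leader election with ephemeral sequential znodes: each candidate creates such a znode, the one with the lowest sequence number acts as master, and when it dies its ephemeral znode disappears so the next candidate takes over. A rough sketch with the ZooKeeper command-line client (the /election path and the data are made-up illustrations, not anything Hadoop configures for you):

$ zkCli.sh -server localhost:2181
[zk: localhost:2181(CONNECTED) 0] create /election ""
Created /election
[zk: localhost:2181(CONNECTED) 1] create -s -e /election/candidate_ "worker-1"
Created /election/candidate_0000000000
[zk: localhost:2181(CONNECTED) 2] ls /election
[candidate_0000000000]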


Saturday 16 November 2013

Apache Hive & Hive Query Language

Apache Hive is an open-source data warehouse system built on top of Hadoop and is used for ad-hoc querying, data summarization and analysis of large datasets stored in Hadoop files.

Initially developed by Facebook to analyze its petabytes of data at Internet scale, Apache Hive is now used and developed by many other companies. Facebook built Hive to allow its SQL developers to work with the Hadoop platform by writing Hive Query Language (HQL) statements.

HiveQL is a simple language similar to SQL. Hive converts these SQL-like queries into MapReduce jobs executed on Hadoop, and it also supports custom MapReduce scripts.

Compared with writing MapReduce jobs by hand, Hive makes querying huge datasets much quicker to develop. It can be run from a command-line interface or from a Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC) application.
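
As a minimal sketch of HiveQL run from the Hive command line (the table, columns and file path below are made-up examples, not part of any standard setup):

CREATE TABLE page_views (user_id INT, url STRING, view_time STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

LOAD DATA INPATH '/data/page_views.tsv' INTO TABLE page_views;

SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url;

Behind the scenes, Hive compiles the SELECT ... GROUP BY into one or more MapReduce jobs.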

All about Informatica Session Properties

This post is an introduction for Informatica beginners to sessions and their properties.

A session is an instance of a mapping; in other words, a running instance of a mapping is referred to as a session. For one mapping we can create one or more sessions. Generally we need one session per mapping, but for parallel data loading we may create multiple sessions.
Now we will go through the following tabs in the session.

  • General Tab
  • Properties Tab
  • Mapping Tab & Config Object

Sunday 10 November 2013

Informatica Session Components Tab: Pre-session and Post-session Command

Pre-Session Command:
We can define operating system commands or programs to be executed before the data loading (session) process starts. They can be defined as reusable commands (in the form of Command Tasks) or non-reusable commands (defined directly in this property). Some uses of such commands or programs are:

  • Enable or disable database users before data loading.
  • Make a backup or copy of the target tables so that the old data can be restored if data loading fails (see the sketch after this list).
  • Notify users via email about data loading success so that they can start their analysis.
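
For instance, a non-reusable pre-session command could be a one-line shell command like the sketch below, which takes a dated copy of a target flat file before the session overwrites it (the paths and file name are assumptions, not Informatica defaults):

$ cp /data/target/customers.dat /data/backup/customers_$(date +%Y%m%d).dat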

Informatica Session: Sources and Targets Properties and Connection


Source Connection
Define a source data connection for each source qualifier. If the base tables for the source qualifiers come from different databases, then we should define one of those databases as the connection database for all source qualifiers, and grant SELECT privilege on the tables in the other databases to the connection database user. This is done at the database level, as sketched below.
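
For example, if one source qualifier reads a table owned by another schema, the grant might look like this (the schema, table and user names are assumptions for illustration):

GRANT SELECT ON sales.orders TO etl_conn_user;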

Target Connection
Select Relational Writer if the target is a table; select File Writer if the target is a text file, i.e. a flat file. If File Writer is selected, use the 'Set file properties' button to define the file structure (delimiter, text qualifier, etc.) and to specify a substitute character for nulls in the flat file.

Properties Page
If Relational Writer is selected then set the following properties:
  • Target load type: Normal or Bulk
  • Insert: if this option is unchecked, the target table cannot receive new records.
  • Delete: if this option is unchecked, the target table does not allow deletion of records.

Informatica Session: Properties Tab

Below are the properties defined under "Informatica Session Properties tab".

  • Enter the session log file name (any name).
  • Define the session log file directory using the $PMSessionLogDir system variable, or give the complete path (e.g. /runbatch/session/test_session.log).

Informatica Session Properties : General Tab

Below are some of the session properties listed under the General tab:

  • Rename: used to define the session name.
  • Fail parent if this task fails: enable this option if you want the parent to fail as well when this session fails. For example, the first session loads the Employee table and the second session loads the Department table. The Employee session is the parent and the Department session is the child (sequential link); if the Department session fails to load data, then we should fail the Employee session too, since an Employee without a Department is not possible.

Thursday 31 October 2013

What is HADOOP (HDFS and MapReduce)

Hadoop is a software framework inspired by Google's MapReduce and Google File System papers, and it is now considered one of the best solutions for dealing with Big Data.

When we talk about Big Data, it can be anything: pictures, movies and so on, and it consumes a huge amount of storage space.

In Hadoop, storage is provided by HDFS, which stores data reliably so that it is not lost in case of failure, and analysis is provided by MapReduce (data processing), which runs ad-hoc queries against huge volumes of data and returns the results in a reasonable amount of time.
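
As a small illustration of the storage side (the file and directory names are assumptions), data is copied into HDFS with the Hadoop file system shell and is then available to MapReduce jobs:

$ hadoop fs -mkdir /user/demo
$ hadoop fs -put weblogs.txt /user/demo/
$ hadoop fs -ls /user/demo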

Saturday 19 October 2013

ADVANTAGES OF CLOUD COMPUTING

Cloud computing offers numerous advantages. A few of them are given below:

  • Easy access to information with a lower initial investment, since we do not have to spend much on hardware, software or licensing fees, and higher throughput, since multiple staff can access and share folders and files over the internet. Users can log in from anywhere, at any time, through a standard web browser, provided they have the required bandwidth.

  • Instead of installing software directly on our computers, the cloud allows us to access files and software over the internet. Both small and big companies can benefit from the cloud approach.
 

Tuesday 15 October 2013

Features of Mongo DB

MongoDB is one of the newest databases, introduced by 10gen. It is an open-source, document-oriented database system and is part of the NoSQL family of databases. Although it is not a relational database, it has some of the important features of an RDBMS and offers impressive speed. MongoDB is used in projects such as the Unique Identification Authority of India (UIDAI), MTV Networks and many others.

Instead of storing data in tables, rows are replaced by documents (the basic unit of data in MongoDB, just like a row in an RDBMS) and tables by collections (a collection is a group of documents), which make it possible to represent complex relationships. MongoDB can manage huge amounts of data and can distribute data across a cluster. It also provides some features that a relational database cannot.

Below are some of the features of MongoDB:
  • MongoDB supports MapReduce and aggregation tools.
  • JavaScript is used instead of stored procedures.
  • MongoDB is a schema-less database.
  • Most importantly, MongoDB supports secondary indexes and geospatial indexes.
  • MongoDB is simple to administer in case of failures.
  • MongoDB is designed to provide high performance.
  • MongoDB stores files of any size without complicating your stack.


Thursday 10 October 2013

Informatica Scenarios: Pivoting of Records (Pivoting of Employees by Department)

In this scenario we will discuss how to pivot a set of records based on a column:
Source Records: 
Dept_id   Emp_name
10        CLARK
10        KING
10        MILLER
20        JONES
20        FORD
20        ADAMS
20        SMITH
20        SCOTT
30        WARD
30        TURNER
30        ALLEN
30        BLAKE
30        MARTIN
30        JAMES

Expected Output
DEPT_ID   EMP_NAME
10        CLARK|KING|MILLER
20        JONES|FORD|ADAMS|SMITH|SCOTT
30        WARD|TURNER|ALLEN|JAMES|BLAKE|MARTIN
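
The Informatica mapping itself is not covered here; as a point of reference, an equivalent result can be produced in Oracle SQL (11g Release 2 or later) with LISTAGG. This sketch assumes the standard EMP table, and the order of names within each group may differ from the sample above:

SELECT deptno AS dept_id,
       LISTAGG(ename, '|') WITHIN GROUP (ORDER BY ename) AS emp_name
FROM   emp
GROUP  BY deptno;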

Thursday 26 September 2013

Persistent and Transient Objects (SQL/PLSQL)

There are two different types of objects namely
1. Persistent and    2. Transient.

Persistent objects are those that are stored in the database (objects created using abstract data types, varrays, nested tables, etc.). They can be used both in SQL commands and in PL/SQL blocks, and they reside in the data dictionary. Persistent objects are available to the user until they are deleted explicitly, and they can be implemented as tables, columns or attributes.

Transient objects exist only within the scope of a PL/SQL block and are automatically de-allocated once they go out of scope. PL/SQL variables are examples of transient objects.


IN PL/SQL:
create or replace type Type_adds as object
(
Country varchar2(40),
phase_no varchar2(100),
postal varchar2(50),
city varchar2(200),
state varchar2(150),
pin number(10)
);
/
Type created.
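
A minimal sketch of using this type both ways (the table name, column names and values are assumptions):

-- Persistent use: the type as a column of a table
create table customer_t
(
cust_id   number,
cust_addr Type_adds
);

insert into customer_t values
(1, Type_adds('India', '040-1234567', 'GPO', 'Hyderabad', 'Telangana', 500001));

-- Transient use: a PL/SQL variable of the same type,
-- de-allocated when the block goes out of scope
declare
  v_addr Type_adds := Type_adds('India', null, 'GPO', 'Hyderabad', 'Telangana', 500001);
begin
  dbms_output.put_line(v_addr.city);
end;
/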

Wednesday 18 September 2013

USAGE OF PARAMETERIZED CURSOR (Oracle SQL/PLSQL)

A parameter is passed to a cursor in the same way that a parameter is passed to a procedure, except that a cursor parameter can only be of IN mode.
  


The general form of a cursor with parameters is,

cursor cursor_name [(parameter_name datatype, ...)]
is select_statement;

Parameter values are passed to the cursor when it is opened and its query is executed, so an explicit cursor can be opened several times with a different active set each time:

open cursor_name(parameter_value, ...);


  • PASSING PARAMETERS USING DYNAMIC VARIABLES

The following example shows the usage of a parameterized cursor that asks the user for a value at runtime and passes it in as the parameter value.
SQL>
declare
cursor c1 (dd number) is select * from emp where deptno =dd;
y number;
begin
    for x in c1(&y)
    loop
    insert into PROD values(x.empno,x.ename);
    end loop;
    end;
/
Enter value for y: 10
old   5: for x in c1(&y)
new   5: for x in c1(10)

PL/SQL procedure successfully completed.
SQL> select * from PROD;
   EMPNO ENAME
--------- ----------
     7782 CLARK
     7839 KING
     7934 MILLER

SQL> truncate table PROD;
Table truncated.



  • USING THE CURSOR FOR COMPUTED FIELDS (ORACLE SQL/PLSQL)

The following example shows the use of a computed field in a cursor definition.

SQL> declare
    y number :=100;
    cursor c1 is select sal+y salary from emp;
    begin
    for x in c1
    loop
    insert into PROD values(x.salary,'in for');
    end loop;
    end;
   /
PL/SQL procedure successfully completed.

SQL> select * from PROD;
   EMPNO ENAME
--------- ----------
      900 in for
     1700 in for
     1350 in for
     3075 in for
     1350 in for
     2950 in for
     2550 in for