Pages

Thursday, 28 February 2013

Difference In Handling of Query by Informatica and Database

Below are the few points to be noted while handling data through Informatica and Database.

Treatment of Nulls:
Oracle will treat null values has highest value while in sort order. In Informatica server at session level we can give nulls as the lowest value in the sort order
  
Case sensitivity:
Integration service uses case sensitive queries, which won’t give exact results if the case is different. Instead database like MSQL server is not case sensitive and results exact match irrespective of the case.

Datatype conversion:
Both Informatica and database converts the datatypes into different formats. For example in case of numeric to character conversion Informatica will convert the numeric 123 as character '123', but in database if you take a long numeric value its stored in different format.

Precision:
Informatica will have datatypes precision that can vary compared to the precision in data types in Database.

Pipeline Partitioning in Informatica

Pipeline Partitioning a mapping in Informatica actually means to Partition the pipeline inside the mapping. Hope you all are aware of the various Partitions available in Oracle, if not please check about Partitions in ORACLE.

A pipeline will contain the source, transformations and targets. Mapping will be subdivided into many pipelines, these are called stage pipeline .After enabling partitioning in stage pipeline when we run the Integration service it runs pretty faster and we can expect more performance. The Integration service will run the partition threads concurrently.

Now it’s about how to enable partition and where to enable partition
  • Set the partition points in the mapping
  • Set the number of partitions
  • Set the partition types
Partition points marks the boundaries that divide the pipeline into stages. By default Integration Service will keep partition at various transformations. Partition points mark the points in the pipeline where the Integration Service can redistribute data across partitions.

When we give Partition points it actually creates various stage pipelines inside the mapping. Now we need to decide how many partitions we need to have inside one stage pipeline. When we add more partitions it increases the performance of Informatica. You can define up to 64 partitions at any partition point in a pipeline.Also note that if we define two Partitions at any partition point, then the remaining partition points will also have two partitions.
 
By default Integration service will create a default Partition type for each partition point. Also we have the option of explicitly giving the Partitioning Type. The type of partition decides how Integration service will distribute the data across components. Below are the various Partitions available:

Monday, 25 February 2013

Sequence Generator Transformation

As you all know Sequence Generator Transformation used to generate unique keys and the missing values in a sequence. To generate the Primary key we need to connect the NEXT VAL column of the Sequence Generator to the required target columns.

If Cycle option is not enabled then the session will get failed once the sequence reaches the maximum value.
Hope you all are aware of the columns like
  • Start Value: Default value is zero
  • Increment By: Default value is 1
  • Current Value: First value that is used in the sequence
  • End Value: This is the maximum value that transformation generates
Also there is column Cache value in Sequence Generator Transformation
Cache Value
Use this when while using Reusable Sequence Generator, this value will decide how many sequence value Informatica must cache at one time. This comes handy when we use the same sequence in multiple sessions

Sunday, 24 February 2013

INFORMATICA SCENARIOS BASED QUESTIONS

In most of the Informatica Interview the panel will come with Scenario based questions which is bit tricky and confusing. Here I have listed few scenario based questions:





      ·        How to removeduplicate rows in Informatica

Friday, 22 February 2013

What Actually is a Persistent Cache?

Normally a lookup will be in cached form by default. Which means that when we do lookup on table then Informatica will be go into the lookup table and store the data in Cache file, which avoids re lookup into the table when we need the data again.Informatica will make use of the cache file and this makes the lookup more faster.

But now the question comes why we need Persistent Cache in Lookup. To use Persistent Cache we need to check the option of Using Persistent Cache in the lookup. When we do that what Informatica does is that it will store the cache file and won’t delete it after run of the session or workflow.

This becomes handy for situations where we use the same lookup in many mappings. Suppose that we use the Lookup LKP_GET_VALUE with same lookup condition and return and output ports in 10 different mappings. In this case if you don’t use Persistent Cache then we have to lookup the table 10 times, and if the table is a huge value then it will take some time to build the cache. This can be avoided by using Persistent Cache.

Datawarehouse Partitioning

In Datawarehouse Partitioning is important .As you all know Partitioning allows tables, indexes, and index-organized tables to be subdivided into smaller pieces, enabling these database objects to be managed and accessed at a finer level of granularity.

When it comes to level of Fact tables and Dimension tables having billions of rows, partitioning becomes important. It will be perfect to partition the fact data in terms of Month. Depending on the growth of data we can partition the data either on months or on Quarter.

Partitioning tables in Datawarehouse would ensure that:
  • More flexiblity to access data
  • Easy to Recover the data
  • Easy to Monitor the data
  • Easy Retreival of data since query needs to check only in one partition
  • Easy to port data from one media to other
  • Easy to Maintain the Partitions