Friday, 12 December 2014

Informatica PowerCenter Session Partitioning-Type of Informatica Partitions


Informatica provides you the option of enhancing the performance of the Informatica session by the The PowerCenter® Partitioning Option. After tuning all the performance bottlenecks we can further improve the performance by addition partitions.




We can either go for Dynamic partitioning (number of partition passed as parameter) or Non-dynamic partition (number of partition are fixed while coding). Apart from used for optimizing the session, Informatica partition become useful in situations where we need to load huge volume of data or when we are using Informatica source which already has partitions defined, and using those partitions will allow to improve the session performance.
The partition attributes include setting the partition point, the number of partitions, and the partition types.

Partition Point:
There can be one or more pipelines inside a mapping. Adding a partition point will divide this pipeline into many pipeline stages. Informatica will create one partition by default for every pipeline stage. As we increase the partition points it increases the number of threads. Informatica has mainly three types of threads –Reader, Writer and Transformation Thread.

The number of partitions can be set at any partition point. We can define up to 64 partitions at any partition point in a pipeline. When you increase the number of partitions, you increase the number of processing threads, which can improve session performance. However, if you create a large number of partitions or partition points in a session that processes large amounts of data, you can overload the system.



You cannot create partition points for the following transformations:
• Source definition
• Sequence Generator
XMLParser
• XML target
• Unconnected transformations

The partition type controls how the Integration Service distributes data among partitions at partition points. The Integration Service creates a default partition type at each partition point. 

Type of partitions are :
1. Database partitioning,
2. Hash auto-keys
3. Hash user keys
4. Key range
5. Pass-through
6. Round-robin.

Database Partitioning
For Source Database Partitioning, Informatica will check the database system for the partition information if any and fetches data from corresponding node in the database into the session partitions. When you use Target database partitioning, the Integration Service loads data into corresponding database partition nodes.
Use database partitioning for Oracle and IBM DB2 sources and IBM DB2 targets. 

Pass through
Using Pass through partition will not affect the distribution of data across partitions instead it will run in single pipeline.which is by default for all your sessions. The Integration Service processes data without redistributing rows among partitions. Hence all rows in a single partition stay in the partition after crossing a pass-through partition point.



Key range
Used when we want to partition the data based on upper and lower limit. The Integration Service will distribute the rows of data based on a port or set of ports that we define as the partition key. For each port, we define a range of values.Based on the range that we define the rows are send to different partitions



Round robin partition is used to when we want to distributes rows of data evenly to all partitions

Hash auto-keys: The Integration Service uses a hash function to group rows of data among partitions. The Integration Service groups the data based on a partition key.

Hash user keys: The Integration Service uses a hash function to group rows of data among partitions. We define the number of ports to generate the partition key.

3 comments:

  1. Data warhouse concept blog excellent blog rf services

    ReplyDelete
  2. Hi All,

    Please help me to resolve the below issue while Applying partitioning concept to my Session.

    This is a very simple mapping with Source, Lookup , router, and target.

    I need to Lookup on the target and compare with the source data, if any piece of data is new then Insert, and If any thing change in the existed data then Update.

    while Inserting the new records to the target table I'm generating sequence numbers with Unconnected lookup, by calling the maximum PK ID from the target table.

    The above flow is working fine from last one year.

    Now I wish to apply the Partitioning concept to the above floe(session)

    At source I used 4 pass through partitions.(For Each partition different filter conditions to pull the data from source)

    at Target I used 4 passthrough Partitions.

    it is working fine for some data, but for some rows for Insert Operation , it is throwing Unique key errors, because while Inserting the data it is generating the same sequence key twice.


    In detail : 1st row is coming from 1st partition and generated the sequence number 1 for that row.

    2nd row is coming from 1st partition and generated the sequence number 2 for that row

    3rd row is coming from the 2nd partition generated the sequence number 2 again for that row. (it must generate 3 for this row)

    the issue is becuase of generating the same sequence numbers twice for different partitions.

    Can any one Please help me to resolve this issue.

    While Applying partitions how can I generate a Unique Sequence numbers from Unconnected lookup for Each partitioned data.

    Regrads,
    N Kiran.

    ReplyDelete

Related Posts Plugin for WordPress, Blogger...

ShareThis