Informatica lets you improve session performance with the PowerCenter® Partitioning Option. After resolving all performance bottlenecks through tuning, we can further improve performance by adding partitions.
We can use either dynamic partitioning (the number of partitions is passed as a parameter) or non-dynamic partitioning (the number of partitions is fixed at design time). Apart from optimizing the session, Informatica partitioning is useful when we need to load a huge volume of data, or when the Informatica source already has partitions defined and reusing those partitions improves session performance.
The partition attributes include setting the partition point, the number of partitions, and the partition types.
Partition Point:
There can be one or more pipelines inside a mapping. Adding a partition point divides a pipeline into pipeline stages. Informatica creates one partition by default for every pipeline stage. As we add partition points, the number of threads increases. Informatica has three main types of threads: reader, transformation, and writer.
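As a rough analogy (a Python illustration, not Informatica internals), each pipeline stage can be pictured as a thread, with queues carrying rows between stages: a reader thread produces rows, a transformation thread processes them, and a writer thread loads them.

```python
import queue
import threading

def reader(out_q, rows):
    # Reader thread: fetches source rows and feeds the pipeline.
    for row in rows:
        out_q.put(row)
    out_q.put(None)  # sentinel: end of data

def transformer(in_q, out_q):
    # Transformation thread: applies row-level logic (here, a toy transform).
    while (row := in_q.get()) is not None:
        out_q.put(row * 10)
    out_q.put(None)

def writer(in_q, target):
    # Writer thread: loads rows into the target.
    while (row := in_q.get()) is not None:
        target.append(row)

q1, q2, target = queue.Queue(), queue.Queue(), []
threads = [
    threading.Thread(target=reader, args=(q1, [1, 2, 3])),
    threading.Thread(target=transformer, args=(q1, q2)),
    threading.Thread(target=writer, args=(q2, target)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(target)  # [10, 20, 30]
```

Adding a partition at a stage corresponds to running another set of these threads side by side, which is why more partitions mean more processing threads.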
The number of partitions can be set at any partition point. We can define up to 64 partitions at any partition point in a pipeline. When you increase the number of partitions, you increase the number of processing threads, which can improve session performance. However, if you create a large number of partitions or partition points in a session that processes large amounts of data, you can overload the system.
You cannot create partition points for the following transformations:
• Source definition
• Sequence Generator
• XML Parser
• XML target
• Unconnected transformations
The partition type controls how the Integration Service distributes data among partitions at partition points. The Integration Service creates a default partition type at each partition point. The partition types are:
1. Database partitioning
2. Hash auto-keys
3. Hash user keys
4. Key range
5. Pass-through
6. Round-robin
Database Partitioning
For source database partitioning, Informatica checks the database system for partition information, if any, and fetches data from the corresponding database nodes into the session partitions. When you use target database partitioning, the Integration Service loads data into the corresponding database partition nodes.
Use database partitioning for
Oracle and IBM DB2 sources and IBM DB2 targets.
Pass-through
A pass-through partition does not affect the distribution of data across partitions; the data runs in a single pipeline stage. This is the default for all sessions. The Integration Service processes data without redistributing rows among partitions, so all rows in a single partition stay in that partition after crossing a pass-through partition point.
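Round-robin, listed above but the opposite extreme, distributes rows evenly across partitions regardless of their values. A minimal sketch (illustrative only; the partition count is a made-up example):

```python
from itertools import cycle

NUM_PARTITIONS = 3  # hypothetical partition count for illustration

def round_robin(rows):
    """Deal rows to partitions in turn, like dealing cards."""
    partitions = [[] for _ in range(NUM_PARTITIONS)]
    targets = cycle(partitions)
    for row in rows:
        next(targets).append(row)
    return partitions

print(round_robin(list(range(7))))  # [[0, 3, 6], [1, 4], [2, 5]]
```

Use round-robin when you want an even workload per partition and rows do not need to be grouped by key.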
Key range
Used when we want to partition the data based on upper and lower bounds. The Integration Service distributes rows of data based on a port, or set of ports, that we define as the partition key. For each port, we define a range of values; based on these ranges, the rows are sent to the different partitions.
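The routing logic can be sketched as follows (the ranges are hypothetical examples, not anything Informatica defines for you):

```python
# Key-range routing sketch. Each partition owns a [lower, upper)
# range of values on the partition-key port.
RANGES = [(0, 1000), (1000, 5000), (5000, 10000)]  # assumed example ranges

def key_range_partition(key):
    """Return the index of the partition whose range contains the key."""
    for i, (lo, hi) in enumerate(RANGES):
        if lo <= key < hi:
            return i
    raise ValueError(f"key {key} falls outside all defined ranges")

print(key_range_partition(250))   # 0
print(key_range_partition(7500))  # 2
```

Rows whose key falls outside every range would be dropped or rejected; in practice you size the ranges to cover the full key domain.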
Hash auto-keys: The Integration Service uses a hash function to group rows of data among partitions. The Integration Service groups the data based on a partition key.
Hash user keys: The Integration Service uses a hash function to group rows of data among partitions. We define the ports that make up the partition key.
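The key property of hash partitioning is that rows with the same key values always land in the same partition. A sketch of the idea (the hash function and port names are stand-ins, not Informatica's actual implementation):

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical partition count

def hash_partition(row, key_ports):
    # Build the partition key from the chosen ports, then hash it.
    key = "|".join(str(row[p]) for p in key_ports)
    # zlib.crc32 stands in for the engine's internal hash function.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

row_a = {"cust_id": 42, "region": "EU"}
row_b = {"cust_id": 42, "region": "US"}
# Same key values -> same partition, regardless of the other ports.
assert hash_partition(row_a, ["cust_id"]) == hash_partition(row_b, ["cust_id"])
```

This grouping guarantee is why hash partitioning suits transformations such as Rank, Sorter, and unsorted Aggregator, which need all rows sharing a key in one partition.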
Hi All,
Please help me to resolve the below issue while applying partitioning to my session.
This is a very simple mapping with Source, Lookup, Router, and Target.
I need to look up on the target and compare with the source data: if any piece of data is new, then insert; if anything has changed in the existing data, then update.
While inserting new records into the target table, I generate sequence numbers with an unconnected Lookup by fetching the maximum PK ID from the target table.
The above flow has been working fine for the last year.
Now I wish to apply partitioning to the above flow (session).
At the source I used 4 pass-through partitions (each partition has a different filter condition to pull data from the source).
At the target I used 4 pass-through partitions.
It works fine for some data, but for some rows the insert operation throws unique key errors, because while inserting the data it generates the same sequence key twice.
In detail: the 1st row comes from the 1st partition and gets sequence number 1.
The 2nd row comes from the 1st partition and gets sequence number 2.
The 3rd row comes from the 2nd partition and gets sequence number 2 again (it should get 3).
The issue is that the same sequence numbers are generated for different partitions.
Can anyone please help me resolve this issue?
While applying partitions, how can I generate unique sequence numbers from an unconnected Lookup for each partition's data?
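One common workaround for this class of problem (a sketch of the idea, not an Informatica configuration) is to give each partition a disjoint number space: start each partition at max_id + partition_index + 1 and increment by the number of partitions, so no two partitions can ever emit the same value.

```python
NUM_PARTITIONS = 4  # matches the 4 pass-through partitions in the question

def make_sequence(max_target_id, partition_index):
    """Yield IDs from a number space no other partition can touch."""
    next_id = max_target_id + partition_index + 1
    while True:
        yield next_id
        next_id += NUM_PARTITIONS  # stride past the other partitions' values

# Suppose the target's current max PK is 100.
seqs = [make_sequence(100, p) for p in range(NUM_PARTITIONS)]
print([next(seqs[0]) for _ in range(3)])  # [101, 105, 109]
print([next(seqs[1]) for _ in range(3)])  # [102, 106, 110]
```

The same effect can usually be had without custom logic by using a shared, cached Sequence Generator instead of deriving the key from the target's max ID per partition.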
Regards,
N Kiran.
Use update-else-insert logic: check the "Update else Insert" option in the session/mapping/target properties.
We can't create a partition point for a Sequence Generator, right?
You can create the sequence using an Expression transformation (e.g. v = v + 1) for insert records and remove the Sequence Generator transformation.