Sunday 12 August 2012

How to remove duplicate rows in Informatica

Scenario:Route unique rows to one table and duplicate to another using Informatica transformations. 

SOURCE:
COL1
COL2
COL3
100
101
102
10
11
12
100
101
102
999
998
997
100
101
102
1
2
3
1
2
3

Target Table 1: Table containing all the unique rows

COL1
COL2
COL3
100
101
102
10
11
12
999
998
997
1
2
3

Target Table 2: Table containing all the duplicate rows

COL1
COL2
COL3
100
101
102
100
101
102
1
2
3

We need to use transformations in such a way that Source table duplicate values are filter into separate table. As you guessed we need to use both Router and Aggregator
  • Bring the source table and connect to an Aggregator Transformation
  • Add a new column in Aggregator (group by key column) for eg. Count_rec having the count of the key column
  • Connect to Router with two groups
                1.ORGINAL: write count_rec=1
                2.DUPLICATE: write count_rec>1.
  •  Connect two groups to corresponding target table.

11 comments:

  1. In case of flat file:
    SRC-->SQ-->Sorter-->Aggr-->Router-->Tgt1 and Tgt2
    In Aggregator,create the variable ports
    1) PRES_ID = ID
    2) V_COUNT => IF(PRES_ID=PREV_ID,V_COUNT+1,1)
    3) PREV_ID = ID
    Create the output port
    4) O_COUNT = V_COUNT
    Pass the output to Router and create a group filter condition with O_COUNT=1.
    The default group will contain the duplicate records.

    ReplyDelete
    Replies
    1. I Think this logic will not work in Aggregator becasue:

      We haven't defined the group by ,so each row will be processed and passed to the next transformation. we are not performing count on group.

      The same can be achieved with expression also.


      Please correct me if i am wrong.

      Delete
  2. Can't we write the same logic in an expression transformation instead of aggr ??

    ReplyDelete
  3. we can also write same exp in expression t/r for performanece

    ReplyDelete
  4. Count_rec having the count of the key column means how can i perform
    this if anybody

    ReplyDelete
  5. Duplicate row in target were unique like expecting 100 101 102 two times,but in target getting one time. please provide alternative.

    ReplyDelete
  6. Sort the source data.

    1 2 3
    1 2 3
    10 11 12
    100 101 102
    100 101 102
    100 101 102
    999 998 999

    In the expression transformation, create a variable port and increment it by 1.
    Then assign the variable port to an output port and add one dummy column with value 1

    V_c1=c1
    V_c2=c2
    V_c3=c3
    V_count=IIF(V_c1=V_prev_c1 &&V_c2=V_prev_c2 &&V_c3=V_prev_c3,V_count+1,1)
    V_prev_c1=c1
    V_prev_c2=c2
    V_prev_c3=c3
    O_count=V_count
    O_dummy=1

    C1 c2 c3 count dummy
    1 2 3 1 1
    1 2 3 2 1
    10 11 12 1 1
    100 101 102 1 1
    100 101 102 2 1
    100 101 102 3 1
    999 998 999 1 1

    Then use Router transformation with 2 groups

    1) count=dummy-----unique rows
    2) count!=dummy-----duplicate rows

    ReplyDelete
  7. If we give --> write count_rec=1,write count_rec>1 in Router T/F then output table 1 (containig all unique rows)
    will have 2 records

    10,11,12
    1,2,3

    ReplyDelete
  8. Without using any transaction how to find whether there are duplicates in the target.(I don t want to use any sql and filter or select distinct option) in informatica

    ReplyDelete

Related Posts Plugin for WordPress, Blogger...

ShareThis