Informatica Developer: Remove Duplicate Records
Posted by: Sriraman Premkumar
Informatica Developer Client mapping example showing how to remove duplicate rows from the source data.
Overview
Duplicate records are occasionally found in source data. Due to primary key constraints on a target database, only one version of a duplicate source record should be loaded into the target. This mapping illustrates one alternative for removing duplicate records when the source has a primary key that can be used for grouping.The mapping illustrates the concept of using the functionality within an Aggregator transformation to remove duplicate records from a source and load this data into a target table of the same structure. Implementation Guidelines :
- Selects all rows from NIELSEN table which contains duplicate rows. The source rows are ordered by the primary key. e.g. STATE_TAX_ID.
- The aggregator after the source qualifier is configured for ?Sorted Input? and the STATE_TAX_ID port is selected as the "Group By" port in the transformation. In general, the number of "Group By" ports must correspond to the "Number of Sorted Ports" indicated in the Source Qualifier.
- The Informatica server, by default, returns the last row in a group if no aggregate function is specified. If two records with the same value for STATE_TAX_ID enter the Aggregator, only one record will be returned by the Aggregator. As a result, duplicate source records are eliminated.
Features
- Informatica Developer Client 9.5.1
- Oracle 10g and above.