Sunday, June 19, 2011

Informatica Partitioning



Performance tuning at the session level helps remove bottlenecks in an ETL data load. Session partitioning means splitting the ETL data load into multiple parallel pipeline threads. It helps most with an RDBMS such as Oracle and is less effective with Teradata or Netezza, whose architectures already parallelize work on their own and can conflict with session-level partitioning. Informatica supports the following partition types:
1. Pass-Through (Default)
2. Round-robin
3. Database partitioning
4. Hash auto-keys
5. Hash user keys
6. Key range

Open Workflow Manager, go to the session properties, open the Mapping tab, and select the Partition hyperlink. Here you can add, delete, or view partitions:
set the partition point, add the number of partitions, then choose the partition type.

Pass-Through (Default) : Rows pass through the partition point without being redistributed among partitions. Use it to add a pipeline stage for better performance without changing the data distribution.
Round-Robin : Distributes rows evenly among all partitions using a round-robin algorithm, so each partition has almost the same number of rows.
Database partitioning : The Integration Service queries the database for table partition information and reads the partitioned data from the corresponding database partitions.
Hash auto-keys : The Integration Service generates the partition key from the grouped ports at the transformation. For each row it hashes those key values and puts the row in the matching partition, so rows that belong to the same group land together. Commonly used at Rank, Sorter, and unsorted Aggregator transformations.
Hash user keys : You define the group of ports that forms the partition key. The Integration Service applies a hash function to the key values and puts each row in a partition based on the hash value.
Key range : You assign a range of values to each partition for the port(s) that form the partition key. A row's key value and the ranges decide which partition holds it. Commonly used at sources and targets.
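
To make the row distribution rules above concrete, here is a minimal Python sketch of round-robin, hash user keys, and key range assignment. It is only an illustration, not Informatica's implementation: the function names, the use of crc32 as the hash, and the "start inclusive, end exclusive" range rule are all assumptions made for this example.

```python
import zlib
from bisect import bisect_right


def round_robin(rows, n):
    """Deal rows to n partitions in turn; sizes differ by at most one."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_user_keys(rows, n, key_ports):
    """Rows with equal values in key_ports always share a partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        key = "|".join(str(row[p]) for p in key_ports)
        # crc32 is a stand-in hash (assumption); it is stable across runs.
        parts[zlib.crc32(key.encode("utf-8")) % n].append(row)
    return parts


def key_range(rows, port, boundaries):
    """boundaries: ascending start values of partition 2, 3, ... (assumed
    start-inclusive, end-exclusive ranges)."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect_right(boundaries, row[port])].append(row)
    return parts


# Hypothetical sample rows with an "id" port and a "region" port.
regions = ["N", "S", "E", "W", "N", "S", "E", "W", "N", "S"]
rows = [{"id": i * 50, "region": r} for i, r in enumerate(regions)]

rr_sizes = [len(p) for p in round_robin(rows, 3)]
by_region = hash_user_keys(rows, 3, ["region"])
by_id = key_range(rows, "id", [100, 300])
```

With these sample rows, round-robin gives partitions of 4, 3, and 3 rows, every region ends up in exactly one hash partition, and ids below 100, from 100 up to 299, and from 300 upward fall into the three key range partitions.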
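You do not define a partition key for hash auto-keys, round-robin, or pass-through partitioning; the system handles row distribution for these types. Only hash user keys and key range need a user-defined key.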
Session partitioning enables parallel processing of the ETL load. It improves performance by running the load on multiprocessor or grid environments.
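
The idea of one pipeline thread per partition can be sketched in Python as follows. This is a rough analogy under stated assumptions, not how the Integration Service is built: `load_partition` and `run_partitioned` are hypothetical names, and the "load" step only counts rows instead of writing to a target.

```python
from concurrent.futures import ThreadPoolExecutor


def load_partition(rows):
    """Hypothetical stand-in for one pipeline: 'loads' rows, returns the count."""
    loaded = 0
    for _ in rows:
        loaded += 1  # a real pipeline would transform and write the row here
    return loaded


def run_partitioned(rows, n):
    """Split rows round-robin into n partitions and load them in parallel."""
    parts = [rows[i::n] for i in range(n)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(load_partition, parts))


counts = run_partitioned(list(range(1000)), 4)
print(counts)  # [250, 250, 250, 250]
```

Threads in Python mainly help when each pipeline waits on I/O such as database reads and writes, which is the usual case for an ETL load.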
