Sunday, June 19, 2011

Informatica Partitioning



Performance tuning at the session level helps remove bottlenecks in the ETL data load. Session partitioning means splitting the ETL data load into multiple parallel pipeline threads. It is helpful on an RDBMS like Oracle, but not so effective on Teradata or Netezza, whose architectures already parallelize work automatically. Informatica supports the following partition types:
1. Pass-Through (Default)
2. Round-robin
3. Database partitioning
4. Hash auto-keys
5. Hash user keys
6. Key range

Open the Workflow Manager, go to the session properties, select the Mapping tab, and click the Partitions link. Here we can add, delete, and view partitions,
set partition points, set the number of partitions, and choose the partition type.

Pass-Through (Default) : All rows stay in a single partition; no data distribution. Adding partition points creates extra pipeline stages, which can itself improve performance.
Round-Robin : Data is distributed evenly across all partitions using a round-robin algorithm, so each partition holds almost the same number of rows.
Hash auto-keys : The Integration Service generates the partition key from the grouped ports at the transformation level. When a new set of key values appears, it generates a hash for them and routes the row to the appropriate partition. Popularly used at Rank, Sorter, and unsorted Aggregator transformations.
Hash user keys : The user defines the group of ports that make up the partition key. The Integration Service hashes the key value and routes each row to a partition based on the hash.
Key range : Each port in the partition key must be assigned a range of values; the key value and the ranges decide which partition holds the current row. Popularly used at the source and target.
For hash auto-keys, round-robin, and pass-through partitioning, the partitioning key (where one is needed) is generated at the system level.
Session partitioning enables parallel processing in the ETL load and enhances performance through multiprocessing or grid processing.
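As a rough illustration outside Informatica, round-robin and hash-key distribution can be sketched in plain shell: round-robin deals rows out cyclically so partitions stay balanced, while hash partitioning sends every row with the same key to the same partition. The file names, the 3-way split, and the toy hash function below are all arbitrary choices for this sketch.

```shell
#!/bin/sh
# Sketch only: simulate round-robin vs hash-key partitioning with awk.
# Input: one row per line, key in column 1 (comma-separated).
printf 'A,1\nB,2\nA,3\nC,4\nB,5\nA,6\n' > rows.csv

# Round-robin: row N goes to partition N mod 3 -> near-equal row counts.
awk -F, '{ print > ("rr_part" (NR % 3) ".txt") }' rows.csv

# Hash user keys (simplified): the partition is derived from the key value,
# so every row with key "A" lands in the same partition file.
awk -F, '{
    h = 0
    for (i = 1; i <= length($1); i++)
        h += index("ABCDEFGHIJKLMNOPQRSTUVWXYZ", substr($1, i, 1))
    print > ("hash_part" (h % 3) ".txt")
}' rows.csv

wc -l rr_part*.txt hash_part*.txt
```

Note the trade-off this makes visible: round-robin guarantees balance but scatters a key across partitions, while hash keying keeps a key together (which grouped transformations need) at the cost of possible skew.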

Sunday, June 12, 2011

PushDown Optimization

Informatica version 8 and later support the pushdown technique of ETL execution. When the pushdown feature is used, transformation logic is executed directly on the RDBMS used for the source/target, and the Informatica server does not stage intermediate data for the session. Let's start with a very basic pushdown mapping:
Source and target tables must use the same database connection at the Workflow Manager level.
Workflow Manager changes:
go to session -> Properties tab -> set Pushdown Optimization = Full/To Source/To Target
Source, target, and lookup transformations must all use the same relational connection.
Enable the checkboxes for temporary views and sequences if required.
Set a datetime format compatible with the RDBMS used. It should be in sync at the mapping and session levels.
At execution time, the Informatica engine creates views/nested views to buffer data from the source. Mapping transformation functions (e.g. TO_CHAR, TO_DATE, SUBSTR) are applied against these views using compatible SQL functions.
To check whether the current settings enable pushdown for the session correctly, go to the Mapping tab and click the Pushdown link in the left tree-view pane.
It shows the internal pushdown execution and view creation along with the load plan.
Errors are displayed as red messages.
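For intuition, here is the kind of single statement full pushdown aims for. Assume a hypothetical SRC_ORDERS -> Expression -> TGT_ORDERS mapping (all table and column names are invented for illustration): the whole mapping collapses into one INSERT ... SELECT run inside the database, with the Expression logic translated into compatible SQL functions. Real generated SQL wraps the source in internally named views, which are omitted here.

```shell
#!/bin/sh
# Illustration only: the statement below is the *kind* of SQL full pushdown
# generates for a trivial mapping; actual generated SQL uses internal views.
cat > pushdown_example.sql <<'SQL'
INSERT INTO TGT_ORDERS (ORDER_ID, ORDER_MONTH)
SELECT ORDER_ID, TO_CHAR(ORDER_DT, 'YYYY-MM')
  FROM SRC_ORDERS
SQL
cat pushdown_example.sql
```

Because the data never leaves the database, the usual read-transform-write round trip through the Integration Service buffers disappears, which is where the performance gain comes from.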
Best of luck. We will learn more in the next post.

Sunday, May 24, 2009

pmcmd : Unix command to control Informatica Workflow

The pmcmd command is used to control Informatica repository events through Unix, when the Informatica server runs on Unix. Unix offers a powerful command-line environment for interacting with multiple data sources (flat file, Oracle, SQL Server, XML, etc.). For example, we can check for an indicator file and then start/stop/abort a session or workflow. In this way, we can reduce manual tasks and commands, and the whole data flow becomes automated to an extent.
Session specific,
pmcmd stoptask -s $server_name -u $user_name -p $pass -f $folder -w $wkf $sess
Keys :
stoptask : To stop a task
gettaskdetails : To retrieve task details
starttask : To Execute a task
aborttask : To abort a task
waittask : To wait for a task to complete
workflow specific,
pmcmd startworkflow -s $server_name -u $user_name -p $pass -f $folder $wkf
Keys :
stopworkflow : To stop a workflow
getworkflowdetails : To retrieve workflow details
startworkflow : To Execute a workflow
abortworkflow : To abort a workflow
waitworkflow : To wait for a workflow to complete

pmcmd command with parameter file
pmcmd startworkflow -s $server_name -u $user_name -p $pass -f $folder -paramfile paramfile_name -wait $wkf

Apart from these pmcmd command options, various other utility options are available
(check the Informatica help file for details).
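The indicator-file pattern described above can be sketched as a small wrapper script. Since pmcmd is only available on an Informatica server, this sketch just builds and echoes the command line instead of executing it; the server, user, folder, and workflow names are all placeholders.

```shell
#!/bin/sh
# Sketch: start a workflow only when an upstream indicator file has arrived.
# All names below are placeholders; pmcmd must be on PATH on a real server.
server_name=infa_svc; user_name=etl_user; pass=secret
folder=DW_LOADS; wkf=wkf_daily_load
indicator=/tmp/src_ready.ind

run_pmcmd() {
    # Echo instead of executing, so the logic can be tried without pmcmd.
    echo "pmcmd $*"
}

touch "$indicator"    # pretend the upstream feed has signalled readiness

if [ -f "$indicator" ]; then
    run_pmcmd startworkflow -s "$server_name" -u "$user_name" -p "$pass" \
        -f "$folder" -wait "$wkf" > pmcmd_call.log
    rm -f "$indicator"            # consume the indicator file
else
    echo "indicator not found; skipping load" > pmcmd_call.log
fi
cat pmcmd_call.log
```

On a real server you would drop the `run_pmcmd` wrapper and call pmcmd directly; `-wait` makes the script block until the workflow finishes, so its exit status can drive the next step in a cron or scheduler chain.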

Sunday, May 17, 2009

Mapping Variables Store Values in the Informatica Repository

We use mapping variables to store and manipulate expression values during the current session run. Here we will see an intelligent use of mapping variables; they are very simple to create.
In mapping designer,
Goto Mapping Menu
Select the Parameters and Variables option and create a date/time variable $$URF_CA_FROM.
Initialize it with '04/05/2009' (any valid date value for the first run).

Use this variable in any transformation column.
Use function SETVARIABLE($$URF_CA_FROM,SYSDATE) to store current date.
Link it to another transformation column of the same datatype (date).
Now, when we run this mapping, the variable will store the current session's system date and persist it in the repository. On the next run, we can use this value in the Source Qualifier's WHERE condition.
We can use a variable of any datatype to store the previous run's value and use it in the next execution.
i.e.
If the session runs on 02/06/2009, the $$x variable stores this value.
Now, when we run the session on 12/06/2009, we can retrieve the value of $$x in any transformation in the same mapping. We can also see this variable's persisted value in the Workflow Manager:
right-click the session at the workflow level and select the View Persistent Values option to view and reset the values.
***This logic eliminates a lot of DB dependencies and makes incremental logic easy.
Limitations:
  • The variable must be initialized.
  • We need to link the SETVARIABLE() function column to a column of the same datatype.
  • SETVARIABLE() must execute at least once on an insert/update/delete row for the value to be saved.
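Outside Informatica, the persisted-variable pattern boils down to "remember the last run's timestamp and filter on it next time". A rough shell equivalent is sketched below; the state file name, the UPDATE_TS column, and the date format are arbitrary assumptions, and Informatica keeps the value in the repository rather than in a file.

```shell
#!/bin/sh
# Sketch: simulate $$URF_CA_FROM persistence with a local state file.
state=last_run_date.txt

# First run: fall back to the variable's initial value.
[ -f "$state" ] || echo '04/05/2009' > "$state"
from_date=$(cat "$state")

# The Source Qualifier filter this run would use (column name is made up):
echo "WHERE UPDATE_TS >= TO_DATE('$from_date', 'DD/MM/YYYY')" > filter.sql

# End of run: the SETVARIABLE($$URF_CA_FROM, SYSDATE) equivalent.
date '+%d/%m/%Y' > "$state"
cat filter.sql
```

Run it twice and the second run's filter picks up the first run's date, which is exactly the incremental-load behaviour the mapping variable gives you for free.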

Saturday, May 16, 2009

Use of Update Override Properties In Target

Mapping : Target Properties
Update Override
Informatica provides the Update Strategy transformation for update/insert/delete operations on a target. Suppose we need to populate an insert timestamp in the insert scenario and an update timestamp in the update scenario. Generally, we would use 2 UPDATE STRATEGY transformations to achieve this. Now think of a complex mapping with more than 10 targets: we would need 20 UPDATE STRATEGY transformations.
That increases mapping complexity and execution time, and creates a mess in the mapping.

Let's do it in a cleaner way; we can make this mapping simple and fast.
Take SYSDATE as a temporary column in a transformation and link it to both the insert timestamp and update timestamp columns. Now open the TARGET properties in the mapping and select the UPDATE OVERRIDE property. Generate the default update statement, keep the update timestamp column, and remove the insert timestamp column from it. Now, when the update scenario occurs, only the update timestamp will be updated.
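To make the override concrete, assume a hypothetical target DWH_CUSTOMER with INSERT_TS and UPDATE_TS columns (all names invented for illustration). The default generated statement updates both timestamps; the edited override drops INSERT_TS so an updated row keeps its original insert time:

```shell
#!/bin/sh
# Illustration only: default vs edited UPDATE override for a made-up target.
# :TU.<column> is Informatica's placeholder for the incoming update value.
cat > update_override.sql <<'SQL'
-- Default generated statement (would overwrite INSERT_TS on every update):
UPDATE DWH_CUSTOMER SET NAME = :TU.NAME, INSERT_TS = :TU.INSERT_TS,
       UPDATE_TS = :TU.UPDATE_TS WHERE CUST_ID = :TU.CUST_ID

-- Edited override (INSERT_TS removed; only UPDATE_TS changes on updates):
UPDATE DWH_CUSTOMER SET NAME = :TU.NAME,
       UPDATE_TS = :TU.UPDATE_TS WHERE CUST_ID = :TU.CUST_ID
SQL
cat update_override.sql
```

With this single SYSDATE column feeding both timestamp ports, one target handles both scenarios and the extra Update Strategy transformations disappear.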


This technique is mainly used in data warehousing to implement incremental logic, when a table stores both an insert timestamp and an update timestamp.
i.e.