Pages

Thursday, 2 August 2012

Difference between Informatica and Datastage

Both Datastage and Informatica are powerful ETL tools . Both tools do almost exactly the same thing in almost exactly the same way. Performance, maintainability, learning curve are all similar and comparable. Below are the few things which I would like highlight regarding both these tools.

Multiple Partitions
Informatica offers partitioning as dynamic partitioning which defaults a workflow not at every Stage/Object level in a mapping/job. Informatica offers other partitioning choices as well at the workflow level.
DataStage's pipeline partitioning uses multiple partitions, processed and then re-collected with DataStage. DataStage lets control a job design based on the logic of the processing instead of defaulting the whole pipeline flow to one partition type.  DataStage offers 7 different types of multi-processing partitions.

User Interface
Informatica offers access to the development and monitoring effort through its 4 GUIs - offered as Informatica PowerDesigner, Repository Manager, Worflow Designer, Workflow Manager.  

DataStage caters to development and monitoring its jobs through 3 GUIs - IBM DataStage Designer(for development), Job Sequence Designer(workflow design) and Director(for monitoring).

Version Control
Informatica offers instant version control through its repository server managed with “Repository Manager” GUI console. A mapping with work-in-progress cannot be opened until saved and checked back into the repository. Version control is done by using checkin and check out.
Version Control was offered as a component until version Ascential DataStage7.5.x. Ascential was acquired by IBM and when DataStage was integrated into IBM Information Server with DataStage at version 8.0.1, the support of version control as a component was discontinued.

Repository based flow
Informatica, offers a step-by-step effort of creating a data integration solution. Each object created while mapping a source with a target gets saved into the repository project folder categorized by - Sources, Targets, Transformations, Mappings, Mapplets, User-defined functions, Business Components, Cubes and Dimensions. Each object created can be shared, dropped into a mapping across cross-functional development teams. Thus increasing re-usability. Projects are folder based and inter-viewable. 

DataStage offers a project based integration solution, projects are not interviewable. Every project needs a role based access. The step-by-step effort in mapping a source to a target lineages into a job. For sharing objects within a job, separate objects need to be created called containers that are local/shared.

Data Encryption
Informatica has an offering within PowerCenter Designer as a separate transformation called “Data Masking Transformation”.
Data Masking or encryption needs to be done before reaching DataStage Server.

Variety of Transformations
Informatica offers about 30 general transformations for processing incoming data. 
Datastage offers about 40 data transforming stages/objects. Datastage is more powerful transformation engine by using functions (Oconv and IConv) and routines. We can do almost any transformation.

Source_- Target flow
Within Informatica’s PowerCenter Designer, first a source definition needs to be created using “Source Analyzer” that imports the metadata, then a target definition is created using “Target Designer”, then a transformation using “Transformation Developer” is created, and finally maps a source-transformation-target using “Mapping Designer”.

Datastage lets drag and drop a functionality i.e a stage within in one canvas area for a pipeline source-target job. With  DataStage within the “DataStage Designer” import of both source and target metadata is needed, proceeding with variety of stages offered as database stages, transformation stages, etc.

The biggest difference between both the vendor offerings in this area is Informatica forces you to be organized through a step-by-step design process, while DataStage leaves the organization as a choice and gives you flexibility in dragging and dropping objects based on the logic flow.

Checking Dependencies
Informatica offers a separate edition – Advanced edition that helps with data lineage and impact analysis. We can go to separate targets and source and check all the dependencies on that.
DataStage offers through Designer by right clicking on a job to perform dependencies or impact analysis.

Components Used
The Informatica ETL transformations are very specific purpose, so you tend to need more boxes on the page to do the same thing. eg. A simple transform in Informatica would have a Source Table, Source Qualifier, Lookup, Router, 2 Update Strategies, and 2 Target Tables (9 boxes).
In DataStage, you would have a Table and Hashed File for the lookup, plus a Source Relational Stage, Transformation Stage, and 2 links to a target Relational Stage (5 boxes). This visual clutter in Informatica is a bit annoying.

Type of link
To link two components in Informatica, you have to link at the column level.We have to connect each and every column bw the two componenents
In DataStage, you link at the component level, and then map individual columns. This allows you to have coding templates that are all linked up - just add columns. I find this a big advantage in DS.

Reusability
Informatica offers ease of re-usability through Mapplets and Worklets for re-using mappings and workflows.This really improves the performance
DataStage offers re-usability of a job through containers(local&shared). To re-use a Job Sequence(workflow), you will need to make a copy, compile and run.

Code Generation and Compilation
Informatica’s thrust is the auto-generated code. A mapping gets created by dropping a source-transformation-target that doesn’t need to be compiled.
DataStage requires to compile a job in order to run it successfully.

Heterogeneous Sources
In Informatica we can use both heterogenous source and homogenous source.
Datastage does not perform very well with heterogeneous sources. You might end up extracting data from all the sources and putting them into a hash and start your transformation

Slowly Changing Dimension
Informatica supports Full History, Recent Values, Current & Previous Values using SCD wizards.
DataStage supports only through Custom scripts and does not have a wizard to do this

Dynamic Lookup Cache

Informatica's marvellous Dynamic Cache Lookup has no equivalent in DS Server Edition. The same saves some effort and is very easily maintainable.

Also read what FACTLESS FACT TABLE is?

5 comments:

  1. Vincent,
    The dynamic Lookup Cache does not mean just caching, but performing DB like operations (insert, update, update else insert, delete) in the cache memory. Beside this the ease of use of Informatica is years ahead DS.
    Costin

    ReplyDelete
    Replies
    1. Hi Costin, just one remark: with the dyn. lookup cache there are NO DELETES, only INSERTs and UPDATEs.
      - Karin

      Delete
  2. It would have been great if versions were mentioned for this comparision. As it stands today April 2012, this post is more or less outdated.
    Bottonline - Till 2003 to decide on DataStage or Informatica was like flip of a coin. Ascential did a game changer acqusition of Torrant
    which gave DataStage PX capability and Informatica lost the race. IBM acqusition took it to next level.

    Only thing which I don't like about DataStage is that it has no control on Metadata and I have see no improvement over the years.

    ReplyDelete
  3. Hi i would like to know that which is having good opportunities among ETL tools(mainly btn Datastage & informatica) for a fresher as well as experience..will u pls suggest to my mail tejareddy5051@gmail.com...soon i want to learn those tools..which is optimized

    ReplyDelete

  4. Although Informatica and DataStage perform well in ETL processes, Informatica has an advantage in intricate transformations due to its versatility.
    Informatica Datastage

    ReplyDelete