Thursday, 2 August 2012

Difference between Informatica and DataStage

Both DataStage and Informatica are powerful ETL tools. Both do almost exactly the same thing in almost exactly the same way: performance, maintainability, and learning curve are all similar and comparable. Below are a few points I would like to highlight about both tools.

Multiple Partitions
Informatica offers dynamic partitioning, configured as a default at the session/workflow level rather than at every stage/object level in a mapping. Informatica also offers other partitioning choices at that level.
DataStage's pipeline partitioning splits data into multiple partitions that are processed in parallel and then re-collected. DataStage lets you control partitioning within the job design, based on the logic of the processing, instead of defaulting the whole pipeline to one partition type. DataStage offers seven different partitioning methods.
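
To make the idea concrete, here is a minimal Python sketch of hash partitioning, the principle both tools rely on: rows are routed to parallel partitions by hashing a key column, so related rows stay together. The function and column names are illustrative, not taken from either product.

    from collections import defaultdict

    def hash_partition(rows, key, num_partitions):
        # Route each row to a partition by hashing its key column, so rows
        # sharing a key always land in the same partition (safe for joins
        # and aggregations under parallelism).
        partitions = defaultdict(list)
        for row in rows:
            partitions[hash(row[key]) % num_partitions].append(row)
        return partitions

    rows = [{"cust_id": 1, "amt": 10}, {"cust_id": 2, "amt": 5},
            {"cust_id": 1, "amt": 7}]
    for pid, part in sorted(hash_partition(rows, "cust_id", 2).items()):
        print(pid, part)   # both cust_id=1 rows land in the same partition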

User Interface
Informatica provides development and monitoring through four GUIs: PowerCenter Designer, Repository Manager, Workflow Manager, and Workflow Monitor.

DataStage handles development and monitoring of its jobs through three GUIs: IBM DataStage Designer (development), Job Sequence Designer (workflow design), and Director (monitoring).

Version Control
Informatica offers built-in version control through its repository server, managed with the "Repository Manager" GUI console. A work-in-progress mapping cannot be opened by others until it is saved and checked back into the repository; versioning is handled through check-in and check-out.
DataStage offered version control as a component up to Ascential DataStage 7.5.x. After IBM acquired Ascential and DataStage was integrated into IBM Information Server (DataStage version 8.0.1), version control as a component was discontinued.

Repository based flow
Informatica offers a step-by-step approach to creating a data integration solution. Each object created while mapping a source to a target is saved into the repository project folder, categorized as Sources, Targets, Transformations, Mappings, Mapplets, User-defined functions, Business Components, Cubes, or Dimensions. Each object can be shared and dropped into mappings across cross-functional development teams, which increases reusability. Projects are folder-based, and folders can be viewed across the repository.

DataStage offers a project-based integration solution; projects are not viewable across one another, and every project requires role-based access. The step-by-step effort of mapping a source to a target results in a job. To share objects across jobs, separate objects called containers (local or shared) must be created.

Data Encryption
Informatica offers a separate transformation within PowerCenter Designer called the "Data Masking Transformation".
In DataStage, data masking or encryption must be done before the data reaches the DataStage server.
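
For illustration, here is a minimal Python sketch of the kind of deterministic masking such a transformation performs; it is not Informatica's actual Data Masking Transformation, and the salt and field names are made up.

    import hashlib

    def mask_value(value, salt="demo-salt"):
        # Deterministic masking: the same input always produces the same
        # token, so joins and lookups on the masked column still line up.
        return hashlib.sha256((salt + str(value)).encode()).hexdigest()[:12]

    row = {"name": "Alice", "ssn": "123-45-6789"}
    row["ssn"] = mask_value(row["ssn"])
    print(row)   # ssn replaced by a stable 12-character token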

Variety of Transformations
Informatica offers about 30 general transformations for processing incoming data.
DataStage offers about 40 data-transformation stages/objects. DataStage has the more powerful transformation engine thanks to its functions (Oconv and Iconv) and routines; almost any transformation can be written.
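
For readers who have not used them: Iconv converts an external value to DataStage's internal representation, and Oconv converts back. A rough Python sketch of the date case, assuming ISO-formatted external dates for simplicity (the real functions accept many format codes); the internal date format counts days from 31 December 1967.

    from datetime import date, timedelta

    EPOCH = date(1967, 12, 31)   # day 0 of the internal date format

    def iconv_date(external):    # external string -> internal day number
        return (date.fromisoformat(external) - EPOCH).days

    def oconv_date(internal):    # internal day number -> external string
        return (EPOCH + timedelta(days=internal)).isoformat()

    print(iconv_date("2012-08-02"))   # 16286
    print(oconv_date(16286))          # '2012-08-02'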

Source-to-Target Flow
Within Informatica’s PowerCenter Designer, a source definition is first created using the “Source Analyzer” (which imports the metadata), then a target definition using the “Target Designer”, then a transformation using the “Transformation Developer”, and finally the source, transformation, and target are wired together in the “Mapping Designer”.

DataStage lets you drag and drop functionality (i.e., stages) onto a single canvas to build a pipelined source-to-target job. Within the “DataStage Designer”, the metadata of both source and target must be imported first, after which you proceed with the variety of stages offered: database stages, transformation stages, etc.

The biggest difference between the two vendor offerings in this area is that Informatica forces you to be organized through a step-by-step design process, while DataStage leaves the organization to you and gives you the flexibility to drag and drop objects based on the logic flow.

Checking Dependencies
Informatica offers a separate Advanced Edition that helps with data lineage and impact analysis; you can navigate to any source or target and check all of its dependencies.
DataStage offers this through the Designer: right-click a job to perform dependency or impact analysis.

Components Used
Informatica's ETL transformations are very special-purpose, so you tend to need more boxes on the page to do the same thing. For example, a simple transform in Informatica would have a Source Table, Source Qualifier, Lookup, Router, 2 Update Strategies, and 2 Target Tables (9 boxes).
In DataStage, you would have a Table and a Hashed File for the lookup, plus a Source Relational Stage, a Transformation Stage, and 2 links to a target Relational Stage (5 boxes). The visual clutter in Informatica is a bit annoying.

Type of Link
To link two components in Informatica, you link at the column level: each and every column must be connected between the two components.
In DataStage, you link at the component level and then map individual columns. This allows you to have coding templates that are already linked up; you just add columns. I find this a big advantage in DataStage.

Reusability
Informatica offers easy reusability through Mapplets and Worklets for reusing mappings and workflows, which really improves development productivity.
DataStage offers reusability through containers (local and shared). To reuse a Job Sequence (workflow), you need to make a copy, compile, and run it.

Code Generation and Compilation
Informatica’s strength is auto-generated code: a mapping is created by dropping in a source, transformation, and target, and it does not need to be compiled.
DataStage requires a job to be compiled before it can run.

Heterogeneous Sources
Informatica handles both heterogeneous and homogeneous sources well.
DataStage does not perform very well with heterogeneous sources; you may end up extracting data from all the sources into a hashed file before starting your transformations, as in the sketch below.
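
A rough Python sketch of that pattern: rows from two unlike sources (a CSV extract and a relational table, both made up here) are landed into a single hash keyed on the join column before any transformation starts.

    import csv, io, sqlite3

    # Two heterogeneous "sources": a CSV extract and a relational table.
    csv_extract = io.StringIO("cust_id,city\n1,Pune\n2,Delhi\n")
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customers (cust_id TEXT, city TEXT)")
    con.execute("INSERT INTO customers VALUES ('3', 'Mumbai')")
    con.row_factory = sqlite3.Row

    # Land both into one hash keyed on the join column, then transform.
    staged = {row["cust_id"]: row for row in csv.DictReader(csv_extract)}
    for row in con.execute("SELECT * FROM customers"):
        staged[row["cust_id"]] = dict(row)

    print(staged)   # one unified lookup structure for the transformation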

Slowly Changing Dimension
Informatica supports Full History, Recent Values, and Current & Previous Values through its SCD wizards.
DataStage supports SCDs only through custom jobs and has no wizard for this.
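
For readers new to SCDs, here is a minimal Python sketch of Type 2 ("full history") handling: when a tracked attribute changes, the current dimension row is expired and a new current row is inserted. Field names and table layout are illustrative, not either tool's implementation.

    def scd_type2_merge(dimension, incoming, key, today):
        # Expire the current row when an attribute changes, then insert a
        # new current row, so the full history is preserved.
        current = {r[key]: r for r in dimension if r["is_current"]}
        for row in incoming:
            old = current.get(row[key])
            if old is None:                                # new key: insert
                dimension.append({**row, "valid_from": today,
                                  "valid_to": None, "is_current": True})
            elif any(old[k] != v for k, v in row.items()): # changed: expire
                old["valid_to"], old["is_current"] = today, False
                dimension.append({**row, "valid_from": today,
                                  "valid_to": None, "is_current": True})
        return dimension

    dim = [{"cust_id": 1, "city": "Pune", "valid_from": "2010-01-01",
            "valid_to": None, "is_current": True}]
    for r in scd_type2_merge(dim, [{"cust_id": 1, "city": "Mumbai"}],
                             "cust_id", today="2012-08-02"):
        print(r)   # expired Pune row, then a new current Mumbai row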

Dynamic Lookup Cache

Informatica's marvellous Dynamic Lookup Cache has no equivalent in the DataStage Server Edition. It saves significant effort and is very easy to maintain.
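
The idea, sketched minimally in Python: the lookup cache mirrors the target table in memory and is modified as rows stream through, so later rows in the same run see earlier inserts and updates (as the comments below point out, inserts and updates only, no deletes). Class and flag names here are illustrative, not Informatica's API.

    class DynamicLookupCache:
        # Mirrors the target table in memory and is modified row by row:
        # insert if the key is new, update if the row changed.
        def __init__(self, target_rows, key):
            self.key = key
            self.cache = {r[key]: r for r in target_rows}

        def process(self, row):
            k = row[self.key]
            if k not in self.cache:        # new key -> flag as insert
                self.cache[k] = row
                return "insert", row
            if self.cache[k] != row:       # changed row -> flag as update
                self.cache[k] = row
                return "update", row
            return "ignore", row           # unchanged -> no action

    cache = DynamicLookupCache([{"id": 1, "name": "A"}], key="id")
    for r in [{"id": 2, "name": "B"}, {"id": 2, "name": "B2"},
              {"id": 1, "name": "A"}]:
        print(cache.process(r))   # insert, then update, then ignore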

Also read: What is a FACTLESS FACT TABLE?

6 comments:

  1. I am a DataStage blogger at ITToolbox and just wanted to make some corrections to your post based on the latest versions of DataStage (8.5, released 2010, and 8.7, released 2011) parallel jobs. Most new DataStage customers use parallel jobs, not the old Server Jobs, so this changes many of the evaluation answers.

    User Interface
    DataStage has two additional user interfaces as of version 8.7:
    - Operations Console: web based console for monitoring running and completed jobs, server resources, errors and warnings etc.
    - Information Server Manager: a release and deployment tool for creating release packages and checking them in and out of version control repositories.

    Version Control:
    - DataStage, via the Information Server Manager, lets you put releases into source control repositories and can be configured to work with API-compliant source control tools such as Subversion.

    Data Encryption
    - There is now a data masking pack for DataStage available under an additional license that puts masking capabilities into DataStage jobs.

    Variety of Transformations
    - Now up to 60 pre-built transformation stages plus over 100 transformation functions.

    Source to Target Flow
    There are two options for DataStage: the drag-and-drop palette for developers to build flows, or the add-on product FastTrack, which lets business users build source-to-target mappings in a spreadsheet-style interface that can then generate DataStage jobs.

    Checking Dependencies
    DataStage has very good dependency checking built into the Designer. For an additional small license you can get Metadata Workbench and Metadata Asset Manager that can import and check dependencies across products - including database and BI assets.

    Reusability
    In 8.7 DataStage introduced re-usable data quality rules, the ability to re-use a rule in the new Rule Stage across jobs and databases. There is also the ability to create re-usable C++ transform routines in addition to the containers you mentioned. These data quality rules are re-usable across the Glossary, mapping, data quality monitoring and ETL tools.

    Heterogeneous Sources
    DataStage Hash files were a big part of Server Jobs but are not needed in Parallel jobs. In DataStage 8 parallel jobs anything can be added as a source for a lookup - tables, XML files, CSV files, java sources, .Net sources, stored procedures. DataStage parallel jobs are heterogeneous and support most common database, file, SOA and XML formats with no additional license fees.

    Slowly Changing Dimension
    DataStage since 8.5 has a very good Slowly Changing Dimension stage that takes the developer through a wizard of steps and caters for Type 1, 2, 3 and hybrid with optional surrogate key generation.

    Dynamic Lookup Cache
    DataStage parallel jobs have lookup caching by default and transformer looping for performing lookups across a group of records.

    regards
    Vincent
    http://it.toolbox.com/blogs/infosphere

    ReplyDelete
  2. Vincent,
    The Dynamic Lookup Cache does not mean just caching, but performing DB-like operations (insert, update, update-else-insert, delete) in cache memory. Besides this, the ease of use of Informatica is years ahead of DS.
    Costin

    ReplyDelete
    Replies
    1. Hi Costin, just one remark: with the dyn. lookup cache there are NO DELETES, only INSERTs and UPDATEs.
      - Karin

      Delete
  3. It would have been great if versions were mentioned for this comparison. As it stands today, April 2012, this post is more or less outdated.
    Bottom line: until 2003, deciding between DataStage and Informatica was like a coin flip. Ascential then made a game-changing acquisition of Torrent,
    which gave DataStage its PX (parallel) capability, and Informatica lost the race. The IBM acquisition took it to the next level.

    The only thing I don't like about DataStage is that it has no control over metadata, and I have seen no improvement over the years.

    ReplyDelete
  4. Hi, I would like to know which of the ETL tools (mainly between DataStage and Informatica) offers good opportunities for a fresher as well as for experienced people. Will you please send suggestions to my mail tejareddy5051@gmail.com? I want to learn these tools soon. Which is the better choice?

    ReplyDelete
  5. Hi


    Here is an addendum to the DataStage points covered in your post.

    1. There is an SCD stage in DS, among a huge number of transformation assets. I'd include the wrap stage in the study and comparison.

    2. I think the approach to metadata shouldn't be narrowed to only the ETL tool (in this case DS). If you are looking to manage metadata, try the platform approach that Information Server offers.

    3. Interfaces should include the Web Console and the Operations Console.

    4. There are many features to support version control (and integration capabilities with IBM tools and other vendors), as well as for promoting code to other environments.

    5. There is a big new player, Blueprint Director, with asset linking.

    Again, the metadata approach is more platform-oriented, so every single component has access to it.

    Just to clarify, many of these features are included with DS or QS licenses; some others, such as the Metadata Workbench, require an authorized user.


    Here is a series of Redbooks that cover all of this:

    http://www.redbooks.ibm.com/abstracts/sg247576.html

    http://www.redbooks.ibm.com/abstracts/sg247830.html

    http://www.redbooks.ibm.com/abstracts/sg247939.html?Open

    ReplyDelete

