Pages

Friday, 31 August 2012

Extracting and Loading Mechanism in DAC

DAC supports different kind of data extraction and loading combinations to give more flexibility over the kind of data transformation that happens.
Full Extract and Full Load: Loading of data warehouse first time happens in this way
Full Extract and Incremental Load: This comes in a situation like when you have to pull data from a new source and load it into the existing table in warehouse. It will have an additional logic whether to insert/update the record in cases of handling data integrity.
Incremental Extract and Incremental Load: This would be the regular process where the data loading happens every night which captures only the new or changed records.

DAC Process Cycle

DAC is used to design, execute, monitor and diagnose execution plans
Setup: Database connections, ETL setup in Informatica
Design: Define Tasks, Groups and Execution plans
Execute: Define parameters, schedule and run the execution plans
Monitor: Monitoring the run time executions
Diagnose: In case of task failure identifying the route cause and rerunning the task.

Components of DAC

DAC Client: User interface through which user performs the activities (configuration, administration, loading, monitoring) to manage the analytics schema.
DAC Server: Executes the instructions given from the DAC client for running the ETL routines.
DAC Repository: Stores the metadata used by DAC to manage this whole data warehouse process.

Why to use DAC?

Following are the reasons why we go for DAC
  • Scheduling Tasks: It helps in scheduling an execution plan run. An execution plan can be scheduled for different frequencies or recurrences by defining multiple schedules. (An execution plan is defined based on business requirements for when the data warehouse needs to be loaded. Execution plans are scheduled in sequence, not in parallel)
  • Automate Execution: It can be quite difficult and error prone to run different workflows manually and DAC helps us to automate execution of workflows according to our needs.
  • Easy Flow Management: It helps easy management of workflow execution. Take a simple example I have created 5 different workflows 3rd should only be run after 4th and 5th workflow. Running them manually can always leave a possibility of error and result in wrong data load. DAC can help us define dependency and order of the workflows in which they should be run.
Also DAC manage the performance by dropping indexes, truncating stage tables, rebuilding the indexes, and analyzing the tables during the process
If you do not use DAC, you have to write your own custom change capture process and need to redesign from scratch an ETL method that allow the restart of the ETL process from point of failure at record level. The biggest saving is that DAC can survive during the upgrade , while your custom processes cannot.

HTTP Transformation In Informatica

HTTP Transformation in Informatica allows us to connect to an HTTP server.After connecting to the server we can either retrieve data from HTTP server or else we can write to the server.
  • Reading from the HTTP server:When the Integration Service reads data from an HTTP server, it takes the data from the HTTP server and passes the data to the target tables.Once data is taken from the HTTP server you can perform the required calculation and pass it to the target tables.
  • Writing into the HTTP server:When the Integration Service writes to an HTTP server, it  writes data to the HTTP server and passes HTTP server responses to the target tables.
How to Connect to the HTTP Server
While configuring  an HTTP transformation, you can configure the URL for the connection or create an HTTP connection object in the Workflow Manager.

Null Values in Aggregate Functions



When you configure the Integration Service, you can choose how you want the Integration Service to handle null values in aggregate functions(NULL or zero). By default the Integration services treats  null values as Null in Aggregate functions, If you don’t want it to be like that then you can configure to treat null values as Zero.