Here,we
will see how the tool INFORMATICA is used to extract data from
Source(s), transform it and then load it into the Target. Data transformation
is done to eliminate any erroneous or redundant data. This ensures that only
the correct data is loaded into the Target (OLAP), which will be used for
analysis / reporting.
Informatica
PowerCenter architecture is used to achieve the extract, transform and load of
data. PowerCenter provides an environment that allows you
to load data into a centralized location, such as a datamart, data warehouse,
or operational data store (ODS). You can extract data from multiple sources,
transform the data according to business
logic
you build in the client application, and load the transformed data into file
and relational targets. PowerCenter provides the following integrated
components
- PowerCenter repository. The PowerCenter repository is at the center of the PowerCenter suite. You create a set of metadata tables within the repository database that the PowerCenter applications and tools access. The PowerCenter Client and Server access the repository to save and retrieve metadata.
- PowerCenter Repository Server. The PowerCenter Repository Server manages connections to the repository from client applications. It inserts, updates, and fetches objects from the repository database tables. It also maintains object consistency.
- PowerCenter Client. Use the PowerCenter Client to manage users, define sources and targets, build mappings and mapplets with the transformation logic, and create workflows to run the mapping logic. The PowerCenter Client has the following client applications: Repository Manager, Repository Server Administration Console, Designer, Workflow Manager, and Workflow Monitor.
- PowerCenter Server. The PowerCenter Server extracts the source data, performs the data transformation, and loads the transformed data into the targets.
- Relational. Oracle, Sybase, Informix, IBM DB2, Microsoft SQL Server, and Teradata.
- File. Fixed and delimited flat file, COBOL file, and XML.
- Application. You can purchase additional PowerCenter Connect products to access business sources, such as PeopleSoft, SAP R/3, Siebel, IBM MQSeries, and TIBCO.
- Mainframe. You can purchase PowerExchange for faster access to IBM DB2 on MVS.
- Other. Microsoft Excel and Access.
- Relational. Oracle, Sybase, Sybase IQ, Informix, IBM DB2, Microsoft SQL Server, and Teradata.
- File. Fixed and delimited flat file and XML.
- Application. You can purchase additional PowerCenter Connect products to load data into SAP BW. You can also load data into IBM MQSeries message queues and TIBCO.
- Other. Microsoft Access.
The
PowerCenter repository resides on a relational database. The repository
database tables contain the instructions required to extract, transform, and
load data. PowerCenter Client applications access the repository database
tables through the Repository Server.
You
add metadata to the repository tables when you perform tasks in the PowerCenter
Client application, such as creating users, analyzing sources, developing
mappings or mapplets, or creating workflows. The PowerCenter Server reads
metadata created in the Client application when you run a workflow. The
PowerCenter Server also creates metadata, such as start and finish times of a
session or session status.
- Global repository. The global repository is the hub of the domain. Use the global repository to store common objects that multiple developers can use through shortcuts. These objects may include operational or Application source definitions, reusable transformations, mapplets, and mappings.
- Local repositories. A local repository is within a domain that is not the global repository. Use local repositories for development. From a local repository, you can create shortcuts to objects in shared folders in the global repository. These objects typically include source definitions, common dimensions and lookups, and enterprise standard transformations. You can also create copies of objects in non-shared folders.
- Version control. A versioned repository can store multiple copies, or versions, of an object. Each version is a separate object with unique properties. PowerCenter version control features allow you to efficiently develop, test, and deploy metadata into production.
You
can connect to a repository, back up, delete, or restore repositories using pmrep,
a command line program.
You
can view much of the metadata in the Repository Manager. The Informatica
Metadata Exchange (MX) provides a set of relational views that allow easy SQL
access to the Informatica metadata repository.
The
Repository Server manages repository connection requests from client
applications. For each repository database registered with the Repository
Server, it configures and manages a Repository Agent process. The Repository
Server also monitors the status of running Repository Agents, and sends
repository object notification messages to client applications.
The
Repository Agent is a separate, multi-threaded process that retrieves, inserts,
and updates metadata in the repository database tables. The Repository Agent
ensures the consistency of metadata in the repository by employing object
locking.
The
PowerCenter Client consists of the following applications that you use to
manage the repository, design mappings, mapplets, and create sessions to load
the data:
- Repository Server Administration Console. Use the Repository Server Administration console to administer the Repository Servers and repositories.
- Repository Manager. Use the Repository Manager to administer the metadata repository. You can create repository users and groups, assign privileges and permissions, and manage folders and locks.
- Designer. Use the Designer to create mappings that contain transformation instructions for the PowerCenter Server. Before you can create mappings, you must add source and target definitions to the repository. The Designer has five tools that you use to analyze sources, design target schemas, and build source-to-target mappings:
- Source Analyzer. Import or create source definitions.
- Warehouse Designer. Import or create target definitions.
- Transformation Developer. Develop reusable transformations to use in mappings.
- Mapplet Designer. Create sets of transformations to use in mappings.
- Mapping Designer. Create mappings that the PowerCenter Server uses to extract, transform, and load data.
- Workflow Manager. Use the Workflow Manager to create, schedule, and run workflows. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. The PowerCenter Server runs workflow tasks according to the links connecting the tasks. You can run a task by placing it in a workflow.
- Workflow Monitor. Use the Workflow Monitor to monitor scheduled and running workflows for each PowerCenter Server. You can choose a Gantt chart or Task view. You can also access details about those workflow runs.
The
PowerCenter Server reads mapping and session information from the repository.
It extracts data from the mapping sources and stores the data in memory while
it applies the transformation rules that you configure in the mapping. The
PowerCenter Server loads the transformed data into the mapping targets.
The
PowerCenter Server can achieve high performance using symmetric
multi-processing systems. The PowerCenter Server can start and run multiple
workflows concurrently. It can also concurrently process partitions within a
single session. When you create multiple partitions within a session, the
PowerCenter Server creates multiple database connections to a single source and
extracts a separate range of data for each connection, according to the
properties you configure.
The
PowerCenter Client uses ODBC and native drivers to connect to source and target
databases. It uses TCP/IP to connect to the Repository Server. The Repository
Server uses native drivers to connect to the repository database. The Workflow
Manager and the PowerCenter Server use TCP/IP to communicate with each other.
The
PowerCenter Server uses native drivers to connect to the databases to move
data. You can optionally use ODBC to connect the PowerCenter Server to the
source and target databases. It uses TCP/IP to connect to the PowerCenter
Client.
The
Repository Server maintains a pool of reusable database connections for serving
client applications. The server generates a Repository Agent process for each
database. The Repository Agent creates new database connections only if all the
current connections are in use.
For
example, if 10 clients send requests to the Repository Agent one at a time, the
agent requires only one connection. It reuses the same database connection for
all the requests. If the 10 clients send requests simultaneously, the
Repository Agent opens 10 connections. You can set the maximum number of open
connections using the DatabasePoolSize parameter in the repository
configuration file.
For
a session, a reader object holds the connection for as long as it needs to read
the data from the source tables. A writer object holds a connection for as long
as it needs to write data to the target tables.
The
PowerCenter Server maintains a database connection pool for stored procedure or
lookup databases in a workflow. You can optionally set the
MaxLookupSPDBConnections parameter to limit connections when you configure the
PowerCenter service. The PowerCenter Server allows an unlimited number of
connections to lookup or stored procedure databases. If a database user does
not have permission for the number of connections a session requires, the
session fails.
For
pre-session, post-session, and load-stored procedures, consecutive stored
procedures reuse a connection if they have identical connection attributes.
Otherwise, the connection for one stored procedure closes and a new connection
begins for the next stored procedure.
- Navigator. Connect to repositories, and open folders within the Navigator. You can also copy objects and create shortcuts within the Navigator.
- Workspace. Open different tools in this window to create and edit repository objects, such as sources, targets, mapplets, transformations, and mappings.
- Output. View details about tasks you perform, such as saving your work or validating a mapping.
- Status bar. Displays the status of the operation you perform.
- Overview. An optional window to simplify viewing a workspace that contains a large mapping or multiple objects. Outlines the visible area in the workspace and highlights selected objects in color.
- Instance data. View transformation data while you run the Debugger to debug a mapping.
- Target data. View target data while you run the Debugger to debug a mapping.
thanks for this
ReplyDeleteVery good article. Thanks.
ReplyDelete