Pages

Tuesday, 27 August 2013

Types of Dimensions

Dimension
A dimension table typically has two types of columns, primary keys to fact tables and textual\descriptive data.
Eg: Time, Customer

Types of Dimensions
  1. Slowly Changing Dimensions
  2. Rapidly Changing Dimensions
  3. Junk Dimensions
  4. Inferred Dimensions
  5. Conformed Dimensions
  6. Degenerate Dimensions
  7. Role Playing Dimensions
  8. Shrunken Dimensions
  9. Static Dimensions 
Slowly Changing Dimensions
Attributes of a dimension that would undergo changes over time. It depends on the business requirement whether particular attribute history of changes should be preserved in the data warehouse. This is called a slowly changing attribute and a dimension containing such an attribute is called a slowly changing dimension.

Types of Fact Table

The Types of Fact Table are
  1. Snapshot
  2. Cumulative
  3. Factless Fact Table
Snapshot
This type of fact table describes the state of things in a particular instance of time, and usually includes more semi-additive and non-additive facts. The second example presented here is a snapshot fact table.
Eg: Daily balances fact can be summed up through the customers dimension but not through the time dimension.

Types of Facts

Fact

A fact table typically has two types of columns, foreign keys to dimension tables and measures those that contain numeric facts. A fact table can contain fact’s data on detail or aggregated level.
Eg: Sales, Cost, and Profit


Types of Facts
  1. Non-Additive
  2. Semi-Additive
  3. Additive

Sunday, 25 August 2013

REGEXP_COUNT Function


This function is used to find the number of time a pattern had occurred in a string.

It takes input as string (characters) and returns output as integer .If no records found matching then the function returns 0

REGEXP_COUNT('source_char', 'pattern', position, 'match_param')

  • Source_char is the input that we give. It can be CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB.

  • Pattern is a text field which can be CHAR, VARCHAR2, NCHAR, or NVARCHAR2.

  • Position: Tells the position from where searching needs to start. It is a positive integer

  • Match_param :It can be any of these
'i' specifies case-insensitive matching
'c' specifies case-sensitive matching
'n' allows the period (.)
'm' treats the source string as multiple lines
'x' ignores white space characters.

For example

SELECT REGEXP_COUNT('123441441441', '441', 3, 'i') COUNT FROM DUAL;

     COUNT
----------
         3

Wednesday, 21 August 2013

Different Types of Logging Levels in OBIEE



Logging Levels
Logging Level Information That Is Logged
Level 0
No logging
Level 1
Logs the SQL statement issued from the client application.
Logs the elapsed times for query compilation, query execution, query cache processing, and back-end database processing.
Logs the query status like whether it is a success, failure, termination, or timeout.
Logs the user ID, session ID, and request ID for each query.
Level 2
Logs everything logged in Level 1.
Additionally, for each query,
Logs the repository name, business model name, presentation catalog name which is called as Subject Area in Answers.
Logs the SQL for the queries issued against physical databases.
Logs the queries issued against the cache.
Logs the number of rows returned from each query against a physical database and from queries issued against the cache, and the number of rows returned to the client application
Level 3
Logs everything logged in Level 2
Additionally,
Adds a log entry for the logical query plan, when a query that was supposed to seed the cache was not inserted into the cache, when existing cache entries are purged to make room for the current query, and when the attempt to update the exact match hit detector fails
Level 4
Logs everything logged in Level 3
Additionally, logs the query execution plan.
Level 5
Logs everything logged in Level 4
Additionally,
logs intermediate row counts at various points in the execution plan.
Level 6 & 7
For future purpose

Sunday, 18 August 2013

What is Informatica Repository?

A Repository is container of information about all the objects and activities done within the tools. it keeps references of remote objects like: Source Data Objects, Target Data Objects, Data Connections, etc. and some of the objects that are created by Informatica Tools are physically kept within Repository like: Mapping Programs and other reusable objects.

Informatica Repository is center point of all the tools and services. Interaction among the tools and services happens through the Repository.

Informatica Repository is physically maintained under a Database System like: Oracle, SQL Server, DB2, etc. while logical parts of the repository are for the Infomatica users and tools.

Repository is also known by the following other names:
1.  Data Dictionary
2.  Registry
3.  Catalog
4.  Meta Data

Saturday, 17 August 2013

Storage Tools Typically used in Big- Data

What is Big data!! It’s nothing but a collection of complex data which is difficult to process with the existing tools. The size of data ranges forms few dozen terabytes to many petabytes of data in a single data set. This data can be posts to social media sites, digital pictures and videos or any other information

Apache Hadoop is a very popular solution Big Data .For the storage of Big data we use different kind of storage like S3, Hadoop Distributed File System (HDFS)

  • Amazon S3 filesystem.

What is S3??
Amazon S3 (Simple Storage Service) is an online file storage web service (Internet hosting service specifically designed to host user files.) offered by Amazon Web Services. Apache Hadoop file systems can be hosted on S3, also Tumblr, Formspring, Pinterest, and Posterous images are hosted on the S3 servers.

S3 stores arbitrary objects (computer files,) up to 5 terabytes in size. This are stored in the form of buckets. It can store data from web applications to media files and we can retrieve from anywhere in Web.

  • Hadoop Distributed File System


The Hadoop Distributed File System (HDFS) is a portable file system built for the Hadoop framework.
HDFS is to store very large amount of data by sharing the storage and computation across many servers. HDFS stores large files with ideal file size is a multiple of 64 MB
Below are some of the organizations that are using Hadoop.

  • Facebook
  • Yahoo
  • Amazon.com
  • LinkedIn
  • StumbleUpon
  • Twitter
  • Google and many more companies…

Thursday, 15 August 2013

All about Flat File Data or Text File Data in Informatica

Text file data can be in any of the two formats :
1.  Fixed Record Length or Fixed Width Files
2.  Variable Record Length or Delimited Files



Fixed Record Length or Fixed Width Files

In case of Fixed Record Length:
1.  Size of each field is fixed and values have to be for the entire size. If the value is smaller than the size then some filler character(s) are used like: Zero, Spaces, etc. For example:
            1   INDIA***********1000
            2   CANADA*********2000

2.  No field separator (like comma) is used and no text Qualifier (like Single or Double Quotes) is used.

3.  Data from such file are extracted based on the starting position of the Field and Size of the field. For example:

COLUMN_NAME
POSITION
EMP_ID
Starting position 1 and size 3
EMP_NAME
Starting position 4 and size 15
SALARY
Starting position 19 and size 4

Variable Record Length or Delimited Files

In case of Variable Record Length
1.  Size of the fields is variable and they depend upon the actual size of the value.

2.  Fields values are separated by a Delimiter (like Comma, Pipe Delimiter) and values are enclosed within Text qualifiers (like Single or Double Quotes). For example:
            "1"|" INDIA "|"1000"
            "2"|" CANADA "|"2000"


3.  Values are extracted based on the position of Delimiters (like Comma).

####Also read about "How to use Flat Files"

Saturday, 10 August 2013

Introduction about Pig In Hadoop

Pig was developed at Yahoo around 2006 for the purpose of reducing the burden of complex mapper and reducer programs in Hadoop.To get idea on Hadoop,please check  “An Introduction to Hadoop!
Pig is just like how we use SQL query for Oracle. In Pig most of the operations are designed to transform the data at one shot. This includes transformations like filtering and joining two or more data sets. Pig's language layer currently consists of a textual language called Pig Latin

Why the name Pig?
Like the animal pig this can eat anything i.e. it can handle any kind of data sets. Hence the name Pig

Pig contains mainly two components
  • Pig Latin: This is the language used for this platform
  • Runtime environment: Infrastructure where Pig Latin programs are executed as MapReduce jobs.
 There are mainly three steps in Pig Latin script
Load, Transform & Dump
  • Load: This step is to load the Hadoop data that is stored in form of HDFS(Hadoop Distributed File System)
  • Transform: To transform the data using set of transformations
  • Dump: To dump the data to the screen directly or store somewhere in a file.
 Pig Latin can be extended using UDF (User Defined Functions) ,using which the user can write in Java, Python and JavaScript and then call directly from the language. 

Location of Back-End Programs (Procedures, Function and Packages) in Informatica

There are four databases where we may keep Back-end Programs used as Stored Procedures within Informatica:
1.  Source Database
2.  Target Database
3.  Repository Database
4.  External Database

Generally Source Database is opened by the Client only for Query purpose since any type of mistake during Data Manipulation or Data Definition may damage their database, so client never takes risk for Source Database. So Source Database is not the place for Stored Procedures.

Repository Database is generally for Informatica's internal use and any type of mistake may damage the entire Repository. So we should avoid using Repository Database for our use. So this database also is not for stored procedure.

Target Database is available to Data Warehouse Team for all kind of process including Query, Manipulation and Structure Definition. So this can be one location for Stored Procedures. If numbers of stored procedures are very high and they are used frequently then we should not keep under Target Database since data loading will be affected. In such cases it is better to keep Stored Procedures under External Databases.