Monday, 17 September 2012

Impact of Deleting all Rows from a Dimension Table


What will happen when we delete all rows from a dimension table? We will see this with help of an example.
Consider the Item Master source table.

Item_type
Order id
Wire
100
Web
258
Cisco
789

As you can see there are two columns Item type and Order id in the source. When we make a dimension from the source table it will have surrogate key.

Surrogate key
Item_type
Order id
1
Wire
100
2
Web
258
3
Cisco
789

The surrogate keys are in order now.
Now suppose a user wants to delete all the rows in the Item master dimension. In that case he will truncate the table

Truncate table Item_master_DIM

When we truncate and load again it can be updating of existing record or insertion of new record.
Suppose that the Item type HSE is updated to ELEC and there is new item FURN, then the dimension will look like

Surrogate key
Item_type
Order id
4
ELEC
100
5
Web
258
6
Cisco
789
7
FURN
144

Here as you can see the surrogate key gets changed from the previous dimension table. All the columns are having new surrogate keys. The reason why it is changed is that the Fact table row for the Item_Master_Dim is still referring to surrogate key 1,2,3. And hence when Fact tries to join with the dimension it throws error since the surrogate keys are new and not matching

Hence it’s better to avoid truncating dimensions and instead update the existing rows and insert new record so that surrogate keys don’t change.
Please share your comments too.

1 comment:

  1. I have a different opinion. Type 2 dimensions are always updates of the existing dim (change end date) and insert of new related record with the same dim id (which is not unique). Your unique clustered index can be on the id and end date columns combined. The facts never change the relation to the dim id. The fact will be related to both the old value and new value in the dim table, based on the non-unique dim id. This is a relationship we would not enforce with a constraint, so a foreign key would not be defined, but a ligcial model would indicate the relationship, as would the data dictionary.

    ReplyDelete

Related Posts Plugin for WordPress, Blogger...

ShareThis