What will happen when we
delete all rows from a dimension table? We will see this with help of an example.
Consider the Item Master source
table.
Item_type
|
Order id
|
Wire
|
100
|
Web
|
258
|
Cisco
|
789
|
As you can see there are
two columns Item type and Order id in the source. When we make a dimension from
the source table it will have surrogate key.
Surrogate key
|
Item_type
|
Order id
|
1
|
Wire
|
100
|
2
|
Web
|
258
|
3
|
Cisco
|
789
|
The surrogate keys are in
order now.
Now suppose a user wants to
delete all the rows in the Item master dimension. In that case he will truncate
the table
Truncate table Item_master_DIM
When we truncate and load
again it can be updating of existing record or insertion of new record.
Suppose that the Item type
HSE is updated to ELEC and there is new item FURN, then the dimension will look
like
Surrogate key
|
Item_type
|
Order id
|
4
|
ELEC
|
100
|
5
|
Web
|
258
|
6
|
Cisco
|
789
|
7
|
FURN
|
144
|
Here as you can see the
surrogate key gets changed from the previous dimension table. All the columns
are having new surrogate keys. The reason why it is changed is that the Fact
table row for the Item_Master_Dim is
still referring to surrogate key 1,2,3. And hence when Fact tries to
join with the dimension it throws error since the surrogate keys are new and
not matching
Hence it’s better to avoid
truncating dimensions and instead update the existing rows and insert new record
so that surrogate keys don’t change.
Please share your comments
too.
I have a different opinion. Type 2 dimensions are always updates of the existing dim (change end date) and insert of new related record with the same dim id (which is not unique). Your unique clustered index can be on the id and end date columns combined. The facts never change the relation to the dim id. The fact will be related to both the old value and new value in the dim table, based on the non-unique dim id. This is a relationship we would not enforce with a constraint, so a foreign key would not be defined, but a ligcial model would indicate the relationship, as would the data dictionary.
ReplyDelete