Saturday, 6 October 2012

Managing Data Growth in a Data Warehouse

How do we manage a data warehouse? That is what I would like to discuss in this post!

A data warehouse is used to store historical data. Large data warehouses grow rapidly, and annual growth rates of over 50% are now commonplace. This historical data, though huge, is very important from a business perspective and therefore has to be managed properly. Infrastructure cost is another important factor that we need to minimize. We also need to ensure that the data is maintained in compliance with retention regulations.
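To get a feel for what 50% annual growth means, here is a small sketch of the compounding. The 10 TB starting size is only an assumed figure for illustration, and the function name is mine.

```python
def projected_size_tb(current_tb: float, annual_growth: float, years: int) -> float:
    """Size after `years` of compound growth at a fixed annual rate."""
    return current_tb * (1 + annual_growth) ** years


# Assumed starting point: a 10 TB warehouse growing 50% per year.
for year in range(6):
    print(year, projected_size_tb(10, 0.50, year))
```

At this rate the warehouse is more than seven times its original size after five years, which is why the growth cannot simply be ignored.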

Now we will talk about how to manage the data growth in data warehouses!

To manage the data inside a data warehouse, we first need to check which data is actually used and which is not. This is a tough and tedious task. In OLTP systems we can use various methods to check the usefulness of data, but in a data warehouse it is difficult because business users want all of the data to stay in the warehouse.

The only way to counter this is constant monitoring of the data warehouse. Reports generated by BI tools (such as OBIEE or BO) give us a good idea of how the data is being used, including which customers and products it is being used for. We can also deploy a monitoring tool to track data usage. Using these methods, we can decide how important each part of the historical data is.
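As a rough illustration of what such monitoring boils down to, the sketch below summarizes an access log into hit counts and last-used dates per table. The log format, the table names and the function name are all assumptions of mine; a real monitoring tool or BI usage report would supply this information in its own format.

```python
from datetime import date


def summarize_usage(access_log):
    """Summarize (table_name, access_date) pairs.

    Returns {table_name: {"hits": int, "last_used": date}}.
    """
    summary = {}
    for table, day in access_log:
        entry = summary.setdefault(table, {"hits": 0, "last_used": day})
        entry["hits"] += 1
        entry["last_used"] = max(entry["last_used"], day)
    return summary


# Hypothetical sample log; in practice this would come from the monitoring tool.
log = [
    ("sales_2012", date(2012, 9, 30)),
    ("sales_2012", date(2012, 10, 1)),
    ("sales_2005", date(2009, 3, 14)),
]
usage = summarize_usage(log)
for table, info in usage.items():
    print(table, info["hits"], info["last_used"])
```

Tables with few hits and an old last-used date are the first candidates to look at.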

Once we have prioritized the data in the data warehouse, the next step is to decide how to manage it. Some data will no longer be used at all, while other data will still be important to the business. Data that is never used can be purged. Data that is no longer used but needs to be kept for compliance purposes can be archived. We can deploy an archiving solution to move this data to an archive and maintain it there.
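The purge-or-archive decision described above can be sketched as a simple rule. This is only a minimal sketch: the one-year idle threshold, the `retention_required` flag and the function name are assumptions for illustration, and real retention rules will be more detailed.

```python
from datetime import date


def decide_action(last_used, today, retention_required, idle_days=365):
    """Return 'keep', 'archive' or 'purge' for one data set.

    last_used: date of the most recent access, or None if never used.
    idle_days: assumed threshold after which data counts as unused.
    """
    if last_used is not None and (today - last_used).days <= idle_days:
        return "keep"  # still in use by the business
    # Unused data: archive it if compliance requires it, otherwise purge.
    return "archive" if retention_required else "purge"


today = date(2012, 10, 6)
print(decide_action(date(2012, 9, 30), today, retention_required=False))
print(decide_action(date(2009, 3, 14), today, retention_required=True))
print(decide_action(None, today, retention_required=False))
```

Keeping the threshold as a parameter makes it easy to use different idle periods for different subject areas.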

Overall, monitoring the data warehouse and prioritizing its data for archiving helps keep the data warehouse manageable. Apart from that, technologies like MapReduce can help when the data is too unstructured.

Please share your thoughts as well.

