Nightly data preparation is a common information processing procedure for many businesses. This is particularly true for companies rooted firmly in the data warehouse stage of data maturity. In the beginning, your data preparation runs quickly, but your nightly processes slowly take longer and longer. Eventually their completion time creeps toward the middle of the next day. Whether it’s due to slow performance, information overgrowth, or problematic pipelines, there are steps you can take to alleviate the pain.
The first approach to increase processing velocity is to improve your code logic and data organization. You can create partitions, indexes for lookups and helper tables to organize your information better. These improvements can significantly decrease run time and apply to almost all data architectures. However, there are limits to how much optimization you can do within the scope of certain technologies.
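As a rough illustration of the indexing point, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the index name `idx_orders_customer` are hypothetical, and the query planner in your own warehouse engine will behave differently. The idea is only to show a lookup changing from a full table scan to an index search.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[3] for row in rows)


# Hypothetical table standing in for a large warehouse fact table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(50_000)],
)

lookup = "SELECT SUM(amount) FROM orders WHERE customer_id = ?"

# Without an index, the lookup has to scan every row.
plan_before = query_plan(conn, lookup, (42,))

# With an index on the lookup column, it can jump straight to matching rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, lookup, (42,))

total = conn.execute(lookup, (42,)).fetchone()[0]

print("before:", plan_before)
print("after: ", plan_after)
```

The query returns the same total either way. Only the amount of data the engine has to read changes, which is where the run-time savings come from.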
Throw Hardware at the Problem
When you can’t squeeze any more juice out of your system, sometimes you just need a bigger system. Many data warehouses are built as monolithic systems designed to run on a single machine. In these cases, there are hard limits on how large your system can grow. If you are running parallel processing database software, then you can throw more machines at the problem. However, those additional machines bring added licensing and maintenance costs.
If your data sources can handle it, moving toward a continuous processing architecture can greatly reduce the amount of data your warehouse has to chew through at night. Streaming systems and advanced batch systems can do wonders to spread the processing load throughout the day. However, always ensure that your systems have enough capacity for all of your consumers to run queries while new data is being processed.
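One common way to spread the load is to pull only the rows that arrived since the last run and to repeat that small load many times a day. The sketch below is a simplified, in-memory illustration of that idea, not a description of any particular product. The `Event` type, the `load_increment` function, and the use of an ever-increasing `id` as the watermark are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Event:
    id: int        # assumed to increase monotonically in the source system
    amount: float


def load_increment(source, warehouse, watermark):
    """Copy only events newer than the watermark; return the new watermark."""
    new_rows = [e for e in source if e.id > watermark]
    for e in new_rows:
        warehouse[e.id] = e.amount
    # If nothing new arrived, the watermark stays where it was.
    return max((e.id for e in new_rows), default=watermark)


source = [Event(1, 10.0), Event(2, 20.0)]
warehouse = {}

# First small run during the day picks up the two existing events.
watermark = load_increment(source, warehouse, watermark=0)
first_watermark = watermark

# More events arrive later; the next run processes only those.
source.extend([Event(3, 30.0), Event(4, 40.0)])
watermark = load_increment(source, warehouse, watermark)

print(first_watermark, watermark, len(warehouse))
```

Each run touches only the new rows, so the work that used to pile up for the nightly job is paid off in small increments. A real pipeline would also need to handle late-arriving or updated records, which this sketch ignores.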
Migrate to Something New
At some point, you may not like the path that you’re on and may want to migrate to a different architecture. Massively parallel processing (MPP) systems, cloud data warehouses and open-source solutions are all tools that can be used to build a better architecture.
When your business doesn’t have reliable and timely data, the cost isn’t limited to the time spent fixing and optimizing your warehouse. Revenue lost to incomplete information and time lost waiting on data are very costly. Your company needs to have digestible data.