Friday, February 3, 2017

Why most databases have so many useless and confusing objects in them

Recently I was working with a database system, trying to validate some output from an OLAP cube. I ran into discrepancies, and (with a lot of work and troubleshooting) narrowed it down to the fact that in one query, two tables were joined with the Product_ID field, and in another query, the two tables were joined via the dim_Product_ID field. These two fields, Product_ID and dim_Product_ID were identical most of the time, but not always! They were different enough to cause major problems trying to match numbers.

Here's another example of useless and confusing objects - the database system I'm working with now is littered with the remains of what look like three abortive attempts to create a data warehouse, all of them currently unused. But the fact that there's all these separate tables and databases out there, replicating very similar data in different formats - it's enormously confusing to anyone trying to do new work on the system, and trying to figure out what's being used and what's not.

Why is it? Why do databases become cluttered with unused objects, that cause massive headaches for anyone trying to use the data?

Here's what I think. It's a heck of a lot of work to try to figure out what's used and what's not, and archive off/delete the unused objects. Not only is it a a lot of work, but it's unrewarding and unappreciated work. You need to check with lots of people, try to figure out what might actually be used, compare lots of code and data.

And most managers don't really have a solid understanding of how much more work basic database tasks can be if the database contains a lot of leftover tables and other database objects. So the task of cleanup is never prioritized. Even though, over the long run, a clean, well-ordered database makes life so much easier for everyone that uses it.

The work that gets the glory and the praise is building new things - a brand new shiny data warehouse, for instance - even though it may just be adding to the general confusion in the database, especially (as happens so often) if it ends up not being used.


No comments:

Post a Comment