I ran into a situation recently that required lots of troubleshooting, inconvenience and painstaking research. I was merging and normalizing some data, and using a particular data set (let's call it the Employee table) that should have been standardized. In other words, there should have been one set of data that everyone at the company used (ignoring for now issues like sensitive fields, such as Salary).
Instead, different groups in the company were each using their own version of the Employee table: some with different primary keys (when there were primary keys at all), some categorized differently, some with the latest data, and some with data only up to the end of last year. Overall, the differences between the data sets weren't huge (probably 15% or less), but when you're creating a dataset that ultimately needs to be joined with a dataset from another group, you want the records to match up without too much pain. Otherwise the numbers won't match, nobody will trust your data, and reporting and cross-division communication can grind to a halt.
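If you're stuck reconciling two versions of a table like this, it helps to put a number on the mismatch before digging into individual records. Here's a minimal Python sketch, assuming both versions have already been mapped to a shared key (a hypothetical `emp_id` field; with different primary keys you'd need to build that mapping first) and loaded as lists of dicts. The function and field names are illustrative, not taken from any particular tool.

```python
from typing import Dict, List

# A hypothetical row: field name -> value, e.g. {"emp_id": "1", "dept": "Sales"}.
Record = Dict[str, str]


def index_by_key(rows: List[Record], key: str) -> Dict[str, Record]:
    """Build a lookup from key value to row (assumes keys are unique per table)."""
    return {row[key]: row for row in rows}


def compare_tables(a: List[Record], b: List[Record], key: str = "emp_id") -> Dict[str, object]:
    """Report rows missing from either table, rows that differ, and the overall mismatch rate."""
    ia, ib = index_by_key(a, key), index_by_key(b, key)
    only_a = sorted(ia.keys() - ib.keys())  # present in table A only
    only_b = sorted(ib.keys() - ia.keys())  # present in table B only
    # Keys in both tables whose rows disagree on at least one field
    # (a field missing from one side counts as a difference).
    changed = sorted(
        k for k in ia.keys() & ib.keys()
        if any(ia[k].get(f) != ib[k].get(f) for f in ia[k].keys() | ib[k].keys())
    )
    all_keys = ia.keys() | ib.keys()
    mismatched = len(only_a) + len(only_b) + len(changed)
    rate = mismatched / len(all_keys) if all_keys else 0.0
    return {"only_a": only_a, "only_b": only_b, "changed": changed, "mismatch_rate": rate}


if __name__ == "__main__":
    hr = [{"emp_id": "1", "dept": "Sales"}, {"emp_id": "2", "dept": "IT"}, {"emp_id": "3", "dept": "HR"}]
    finance = [{"emp_id": "1", "dept": "Sales"}, {"emp_id": "2", "dept": "Engineering"}, {"emp_id": "4", "dept": "Ops"}]
    print(compare_tables(hr, finance))
```

In this toy example, employee 3 exists only in the first table, employee 4 only in the second, and employee 2 is categorized differently, so three of the four distinct keys disagree. Even a rough figure like this makes the "hidden" cost much easier to explain to management.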
So, what can we do about this? Here are some potential fixes to try:
- Be very clear with upper management about the costs of multiple data sources for domain data that should only have one master. It's tricky because the costs can be hidden: cross-departmental work simply takes everyone much longer, and people have to research discrepancies between datasets that should match but don't.
- Have a company-wide "Data" Q&A forum or email alias (a forum is preferable, because it's more easily searchable) where people can post data-related questions. Transparency is always good, and the more people know about the various domain data sources, the better decisions they can make.
- Get the group responsible for domain data management to be as responsive as possible to user requests. People usually create their own datasets because they can't get whoever manages the original data to respond to their needs.
The underlying problem, which I believe is related to the lack of choice and competition internally within companies, is a more difficult one. I'm hoping to write up another post on it soon.