Data Validation – Is It Worth the Costs?
This article discusses how poor or inaccurate data can negatively impact a business's bottom line. Please keep in mind that the figures here are cost estimates and will vary from case to case. The goal is to understand the general process and the potential impact on your business. Any data validation project should start with clear goals and expectations, and with a data quality audit to establish the initial quality level of your data and the likely scope of the project. Most data validation consulting companies can provide this audit.
Data validation can be a relatively expensive process. Given that, is there a valid business case for moving forward with a data validation program? Data validation is a process in which a database is analyzed and updated. Some of the more common activities are as follows:
Identify inaccurate records or data
Remove spelling errors
Identify and remove obsolete records
Remove duplicates (also called deduplication)
Update and correct incomplete records
Imagine a small database with 100,000 records containing critical customer information. A thorough data validation project for this database might take between 1,000 and 8,000 man-hours to check and confirm each individual record. This would include checking each contact name, title, address, telephone number, and any other information critical to your business.
For the sake of this article, let's estimate on the low side. If a project takes 1,000 man-hours, what would your internal costs be? Who would be qualified to check this information? Obviously, you will not use senior staff. At the same time, would you trust your critical information to a temp? In general, temps are not a viable solution. Often, a marketing or sales team would be involved. Internal hourly costs might range from $20 to $35 per hour, not including lost opportunity costs. So a 1,000-hour project at $20 per hour would cost approximately $20,000 on the low end. At the high end, an 8,000-hour project at $35 per hour could cost as much as $280,000. WOW!
Why is there such a huge disparity in prices? Partly because the data itself can vary widely. If the database contains a high number of duplicates, costs go down, because fewer unique records need checking. If all 100,000 records are unique, each one must be validated individually. Another variable is the number of details per record that need to be validated. Finally, the type of information matters: different resources are used to validate different types of data.
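The cost range above is simple arithmetic: hours multiplied by an hourly rate. Here is a minimal Python sketch using only the hour and rate figures quoted in this article. The function name `internal_cost` is just an illustrative label, and lost opportunity costs are left out, as they are in the estimate itself.

```python
def internal_cost(hours: float, hourly_rate: float) -> float:
    """Labor cost of an in-house validation project (opportunity costs excluded)."""
    return hours * hourly_rate


# Low end: 1,000 hours at $20 per hour.
low = internal_cost(1000, 20)
# High end: 8,000 hours at $35 per hour.
high = internal_cost(8000, 35)

print(f"Low estimate:  ${low:,.0f}")
print(f"High estimate: ${high:,.0f}")
```

Plugging in your own audit results (estimated hours, actual staff rates) gives a rough budget range before you commit to a project.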
How can you justify a project that could cost between $20,000 and $280,000? Aside from the costs, can a company afford to tie up internal resources for hundreds or thousands of man-hours? The immediate answer is no. At the same time, can a company afford not to ensure that its information is accurate and up to date? What are the costs of erroneous data? Here are a few interesting facts to consider:
20% of all information may be entered incorrectly or inconsistently at the time of entry. The point of entry is the most cost-effective time to validate information.
Batch validation and correction after the fact can cost up to 100 times more than validation at entry when you factor in data storage costs, manpower, and the various internal business expenses associated with erroneous data.
According to the US Post Office, 1.2% of all addresses in bulk mailings will change every month. How does this affect your database over one year? As much as 15% of your data may be incorrect, assuming that it was correct to begin with.
Almost 25% of all mailing addresses have something wrong.
Employee turnover is a major factor. Conservative estimates exceed 25% for North America.
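The address-change figure in the list above is easy to check. The sketch below projects a 1.2% monthly change rate over twelve months in two ways: simple addition, and compounding. The compounded version assumes every address has the same independent chance of changing each month. That assumption and the function names are illustrative only, since the quoted figure does not say how changes are distributed.

```python
MONTHLY_CHANGE_RATE = 0.012  # 1.2% per month, the figure quoted in the article


def stale_fraction_simple(rate: float, months: int) -> float:
    """Add the monthly rate up month by month (ignores repeat movers)."""
    return rate * months


def stale_fraction_compounded(rate: float, months: int) -> float:
    """Share of addresses that changed at least once, assuming independent months."""
    return 1 - (1 - rate) ** months


simple = stale_fraction_simple(MONTHLY_CHANGE_RATE, 12)
compounded = stale_fraction_compounded(MONTHLY_CHANGE_RATE, 12)

print(f"Simple estimate after 12 months:     {simple:.1%}")
print(f"Compounded estimate after 12 months: {compounded:.1%}")
```

Both methods land a little under the 15% figure, which is consistent with "as much as 15%" being an upper-end round number.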
What does this all mean and what is the cost impact? For a worst case scenario, let's assume the following:
100,000 records with 15% duplicates
15% of the information is incorrect due to entry errors
25% of the information is incorrect due to employee turnover
Regular Direct Marketing Campaigns and Analysis
4 quarterly mail campaigns
$0.22 postage and $0.25 mailer materials per piece, mailed to all 100,000 records
40 hours prep time at $20 per hour
Total = $47,800 x 4 = $191,200
12 monthly e-mail newsletter campaigns (not including IT costs)
40 hours prep time at $20 per hour
Total = $800 x 12 = $9,600
2 semi-annual market/business analysis projects
80 hours analysis at $40 per hour
Total = $3,200 x 2 = $6,400
Grand total of direct marketing: $207,200
Losses due to duplicates: $207,200 x 15% = $31,080
Losses due to erroneous entry: $207,200 x 15% = $31,080
Losses due to incorrect information from employee turnover: $207,200 x 25% = $51,800
Total potential wasted costs: $113,960
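The worst-case arithmetic above can be reproduced in a few lines, which also makes it easy to substitute your own numbers. This is a sketch under the scenario's stated assumptions: every mail campaign goes to all 100,000 records, and the three error rates are applied separately to the full annual spend. The variable names are illustrative.

```python
RECORDS = 100_000

# Per-piece mail cost in cents: $0.22 postage + $0.25 materials.
PER_PIECE_CENTS = 22 + 25

# One mail campaign: mailing cost plus 40 hours of prep at $20 per hour.
mail_campaign = RECORDS * PER_PIECE_CENTS / 100 + 40 * 20
# One e-mail campaign: 40 hours of prep at $20 per hour (IT costs excluded).
email_campaign = 40 * 20
# One analysis project: 80 hours at $40 per hour.
analysis_project = 80 * 40

total_spend = 4 * mail_campaign + 12 * email_campaign + 2 * analysis_project

# Error rates from the worst-case scenario, each applied to the full spend.
error_rates = {
    "duplicates": 0.15,
    "entry errors": 0.15,
    "employee turnover": 0.25,
}
losses = {cause: total_spend * rate for cause, rate in error_rates.items()}
total_waste = sum(losses.values())

print(f"Annual direct marketing spend: ${total_spend:,.0f}")
for cause, amount in losses.items():
    print(f"Loss due to {cause}: ${amount:,.0f}")
print(f"Total potential waste: ${total_waste:,.0f}")
```

Adding the three rates together treats the error types as non-overlapping, which is why this figure is a worst case.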
If you assume that all of the potential data errors are mutually exclusive, you could be looking at 55% (15% + 15% + 25%) or more of your information and efforts being wasted. A more conservative estimate, where the errors overlap, might be 25% to 35%. This information is driving your business and costing you money at the same time. At 35% errors, your data is only 65% accurate. Is that acceptable? Back in school, that would not have been a passing grade! And is it acceptable to waste 25% to 35% of your hard-earned revenue dollars? NO!
So what is the solution? One excellent option is to farm out the data validation. Data validation firms provide data quality audits, which can give you a ballpark estimate of the number of errors in your data. This will help you estimate the scope of your project. They can also provide the actual data validation at substantially reduced rates. Typical data validation costs range from $2 to $5 per record, and the project can be completed in a fraction of the time without impacting internal resources.
By: Jody Williquette