Managing The Quality Of Information
Information processing is a business process that resembles a normal production process, with the familiar demands of managing both the quantity processed and the quality of the output.
For many business processes there is continuous pressure to increase output. There is also a constant demand for quality, which acts as a brake on this main process. In information processing, this tension is addressed by using two different types of process: batch and online. The quality indicator is the mechanism that determines how much output is given up in order to increase the quality of the information.
As an example of how this is done in practice, consider the Yellow Pages. The books contain a variety of information about companies, and each month a book is published with a selection of companies in a certain region. This cycle continues until the last region of the country has been handled, after which the publishing process starts all over again with a series for the next year. Publishing these books requires considerable organization, and above all the information must be correct. Yet companies (and company information) change frequently. To keep the information up to date, the company information in the database needs to be checked against information from third parties (for example, the Chamber of Commerce).
Efficiency is important when organizing these activities, and information systems can help by separating batch from online activities. Batch is a fully automated process; online is where human interaction is required.
For example, in the batch process the company information in the database can be compared with the third-party data. The batch produces a selection of companies for which the match between the two data sources is less than 100 percent. For these companies, human interaction is required to check (visually) whether the third-party information differs significantly from the base data. "Microsoft Corp" (the base data) will show a difference from "Microsoft Inc." (from the Chamber of Commerce), but a human eye will notice that both refer to the same company.
Because company information can differ between the two sources in many other fields as well, the batch process can assign each company record a value that represents the level of matching, where a perfect match is represented by 100%.
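A minimal sketch of such a per-record match level, assuming each record is a dictionary of text fields and using the Python standard library's `difflib.SequenceMatcher` ratio as the field similarity. The field names, the equal weighting of fields, and the sample records are hypothetical illustrations, not taken from any real matching system.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Similarity of two field values in [0, 1], ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_level(base: dict, third_party: dict, fields: list[str]) -> float:
    """Equal-weight average of the field similarities, as a percentage (100.0 = perfect match)."""
    scores = [field_similarity(base.get(f, ""), third_party.get(f, "")) for f in fields]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical fields and records, for illustration only
FIELDS = ["name", "street", "city"]
base = {"name": "Microsoft Corp", "street": "One Microsoft Way", "city": "Redmond"}
chamber = {"name": "Microsoft Inc.", "street": "One Microsoft Way", "city": "Redmond"}

print(round(match_level(base, chamber, FIELDS), 1))  # → 92.9
print(match_level(base, base, FIELDS))               # → 100.0
```

Here only the name field differs ("Corp" versus "Inc."), so the record scores below 100% even though a human would accept it as the same company. A production matcher would probably normalize legal suffixes before comparing, which is exactly the kind of judgment the online step covers.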
Search engine output works in the same way: results are sorted according to this "match indicator level," and the lower the indicator, the lower the quality of the match.
This type of indicator can be used to decide which records stay in the batch and which go to online processing, for instance with a rule that all matches with a level below 60% must be checked by an agent.
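The 60% rule can be sketched as a simple routing step. This assumes the batch has already produced a match level between 0 and 100 for each record; the `Candidate` type, the `route` function, and the choice to sort the agent queue worst-first are hypothetical design choices, not part of the article.

```python
from dataclasses import dataclass

THRESHOLD = 60.0  # the example rule: matches below 60% are checked by an agent


@dataclass
class Candidate:
    company_id: str
    match_level: float  # 0-100, as computed by the batch process


def route(candidates, threshold=THRESHOLD):
    """Split batch output into records accepted automatically and records queued for an agent."""
    auto, agent_queue = [], []
    for c in candidates:
        if c.match_level < threshold:
            agent_queue.append(c)
        else:
            auto.append(c)
    # Lowest match levels first, so agents see the most doubtful records at the top
    agent_queue.sort(key=lambda c: c.match_level)
    return auto, agent_queue


batch = [Candidate("A", 100.0), Candidate("B", 92.9), Candidate("C", 45.0), Candidate("D", 59.9)]
auto, queue = route(batch)
print([c.company_id for c in auto])   # → ['A', 'B']
print([c.company_id for c in queue])  # → ['C', 'D']
```

Sorting the queue by match level mirrors how search results are ordered by the same kind of indicator, only in the opposite direction, because here the weakest matches need attention first.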
In this way the quality of the output is managed: the batch process increases the volume of output, while the online part, with human intervention, checks and improves its quality.
The same technique could be (and perhaps is) used to manage article websites. When an article is submitted to the site, a series of checks is required, and these could also be done by a batch process. These checks also contain a matching mechanism in which (parts of) the article are compared with existing content on the Web. The resulting match level can indicate how likely it is that the article is authentic.
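One common way to compute such a content-overlap level is to compare sets of overlapping word sequences (shingles) with the Jaccard index. The sketch below assumes three-word shingles and a small in-memory list standing in for "existing content on the Web"; the function names and sample texts are hypothetical, and a real service would need an index rather than comparing against every document.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences in the text (lowercased, punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def overlap_level(article: str, existing: str, k: int = 3) -> float:
    """Jaccard similarity of the two shingle sets, as a percentage."""
    a, b = shingles(article, k), shingles(existing, k)
    if not a and not b:
        return 0.0  # nothing to compare (both texts shorter than k words)
    return 100.0 * len(a & b) / len(a | b)


def highest_overlap(article: str, corpus: list[str], k: int = 3) -> float:
    """Largest overlap between the article and any document in the corpus."""
    return max((overlap_level(article, doc, k) for doc in corpus), default=0.0)


submitted = "Information processing is a business process that resembles a normal production process"
corpus = [
    "Information processing is a business process like any other",
    "Yellow Pages books are published every month",
]
print(round(highest_overlap(submitted, corpus), 1))  # → 30.8
```

A high overlap suggests copied content, so one possible authenticity indicator (an assumption, not something the article specifies) is 100 minus the highest overlap found.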
For yet another example, think of the IRS: all taxpayers are assigned a credibility indicator that is calculated in a batch. The indicator is derived by matching the return against various other sources (banks and other financial institutions) and from the way in which the tax form is filled in.
What remains to be done in these environments is to define the quality level: how much you dedicate to batch processing and how much to online processing. This allocation largely determines the quality of the overall output. It comes down to one question: "At what level (of the indicator) are you going to check?" That is up to you.
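The allocation question can be made concrete by counting how many records each candidate threshold would send to agents. The match levels below are invented for illustration and `workload_split` is a hypothetical helper; with real batch output, the same count shows the agent workload that each quality level implies.

```python
def workload_split(levels, threshold):
    """Return (records handled by batch alone, records sent to online review) for a threshold."""
    online = sum(1 for level in levels if level < threshold)
    return len(levels) - online, online


# Hypothetical match levels from one batch run (not data from the article)
levels = [100, 100, 98, 95, 91, 87, 80, 74, 66, 59, 52, 40]

for threshold in (50, 60, 70, 80, 90, 100):
    batch_only, online = workload_split(levels, threshold)
    print(f"check below {threshold}%: {batch_only} batch-only, {online} to agents")
```

In this sample, raising the threshold from 60% to 90% more than doubles the agent workload (from 3 to 7 records), while a threshold of 100% sends every imperfect match to an agent.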