subject: OCR Software - Optical Character Recognition or Optical Crud Recognition?

If you are not familiar with OCR, it stands for Optical Character Recognition, a type of software that converts images of printed text into searchable, machine-readable text.
Used correctly, OCR lets a user easily find words in a document or on a page. An OCR user can also search for keywords across an entire set of documents and retrieve every page that contains them, provided the text was recognized accurately. And unlike searching through books, which could take quite a while, OCR searches take only a matter of seconds.
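To make the keyword search concrete, here is a minimal sketch of a word index built over OCR output. The page texts, file names, and the function names build_index and search are made up for illustration; a real document system would use a proper search engine rather than an in-memory dictionary.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of (document, page) pairs containing it."""
    index = defaultdict(set)
    for (doc, page_no), text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add((doc, page_no))
    return index

def search(index, query):
    """Return the sorted pages that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

# Hypothetical OCR output: (document name, page number) -> recognized text
pages = {
    ("ledger.pdf", 1): "Invoice for paper supplies",
    ("ledger.pdf", 2): "Payment received for invoice 1042",
    ("memo.pdf", 1): "Staff meeting moved to Friday",
}
index = build_index(pages)
print(search(index, "invoice"))
```

Because the index is built once, each later lookup is a dictionary access rather than a scan of every page, which is why searches over large collections can return in seconds.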
However, this technology has traditionally not worked well on older or poor-quality documents that contain mixed fonts or combinations of text and graphics. Until now! Thanks to several recent technological advances, it is now possible to obtain six-sigma-level character accuracy (conventionally about 3.4 errors per million characters) from these types of document collections.
Although the quality and condition of the paper documents are still key factors in a successful OCR conversion, dramatically improved results can be obtained by enhancing the quality of the scanned image prior to processing. Noise removal, a feature found on some of the more advanced systems, is used to eliminate specks of "dirt" from freshly scanned images.
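One common way to remove specks is a median filter, sketched below on a tiny grayscale image. This is an illustration of the general idea, not the method any particular OCR product uses; the image representation (rows of 0-255 integers) and the function name median_filter are assumptions made for the example.

```python
from statistics import median

def median_filter(image):
    """Apply a 3x3 median filter to a grayscale image (list of rows of 0-255 ints)."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            # Collect the pixel and its neighbors, clipping at the image border
            neighbors = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            result[y][x] = int(median(neighbors))
    return result

# A white page (255) with one isolated dark speck (0) in the middle
page = [[255] * 5 for _ in range(5)]
page[2][2] = 0
cleaned = median_filter(page)
print(cleaned[2][2])
```

The isolated speck is outvoted by its white neighbors and disappears. The same filter can also erase very small marks that belong to the text, such as periods, so the window size has to be chosen with the scan resolution in mind.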
Another exciting advancement is improved color filtering, which can reduce undesired background colors. In addition, multi-light image capture technology can eliminate shadows caused by page creases.
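As a rough sketch of background color filtering, the example below turns a tinted page into pure black and white by thresholding each pixel's brightness. It assumes dark ink on a lighter background; the function name drop_background and the threshold of 128 are illustrative choices, and commercial systems use more sophisticated, adaptive methods.

```python
def drop_background(pixels, threshold=128):
    """Map each RGB pixel to 0 (ink) if its brightness is below `threshold`, else 255 (paper)."""
    cleaned = []
    for row in pixels:
        new_row = []
        for r, g, b in row:
            # ITU-R BT.601 luma weights approximate perceived brightness
            luma = 0.299 * r + 0.587 * g + 0.114 * b
            new_row.append(0 if luma < threshold else 255)
        cleaned.append(new_row)
    return cleaned

# One row of a yellowed page: paper tint, dark ink, paper tint
row = [(240, 225, 160), (30, 30, 40), (235, 220, 150)]
print(drop_background([row]))
```

The yellow tint of the paper is replaced by clean white while the dark ink pixel is kept, which gives the OCR engine a higher-contrast image to work with.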
Once document scanning and processing are complete, an OCR text layer can be added and hidden behind each page image, so that the page still looks like the original scan but can be searched. An orientation filter can also be used to ensure that the best image is presented to the OCR engines.
To achieve the highest conversion accuracy possible, the characters in the image can be processed using multi-engine OCR voting technologies, which rank the candidates proposed for each character to determine the best recognition fit. The resulting words are then checked against a comprehensive dictionary to make sure they are valid and to ensure quality output.
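The voting and dictionary steps can be sketched as follows. This is a simplified illustration, not a description of any vendor's engine: it assumes the engines' outputs are already aligned to the same length, uses a plain majority vote instead of confidence scores, and the names vote_characters and is_valid_word and the tiny dictionary are hypothetical.

```python
from collections import Counter

def vote_characters(readings):
    """Pick the most common character at each position across engine outputs."""
    voted = []
    for candidates in zip(*readings):
        # On a tie, most_common keeps the character seen first (the earliest engine)
        char, _count = Counter(candidates).most_common(1)[0]
        voted.append(char)
    return "".join(voted)

def is_valid_word(word, dictionary):
    """Check a recognized word against a set of known lowercase words."""
    return word.lower() in dictionary

# Hypothetical outputs of three OCR engines for the same word image
readings = ["m0dern", "modern", "modenn"]
dictionary = {"modern", "document", "scanner"}

word = vote_characters(readings)
print(word, is_valid_word(word, dictionary))
```

Each engine makes a different mistake here (a zero for "o", an "n" for "r"), but no two engines make the same one, so the majority recovers the correct word and the dictionary check confirms it.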
Once the text has been processed through OCR, it is ready for use: you can search through OCR documents and libraries and be confident of high-quality search results and high-quality documents. Optical Character Recognition has indeed come a long way from Optical Crud Recognition.