Google sponsors Open Source OCR package

As reported on Slashdot, Google is sponsoring development of OCRopus, an "open source document analysis and OCR system" for Linux that will be made available under the Apache License 2.0. OCRopus will incorporate the company's Tesseract OCR engine, but it is meant to handle irregular page layouts better and to provide dictionary capabilities such as those found in other OCR packages.