| What is OCR? |
Optical Character Recognition (OCR) is the process of scanning printed
pages as images on a flatbed scanner and then using OCR software to
recognize the letters as ASCII text. The OCR software has tools for both
acquiring the image from a scanner and recognizing the text. |
| Ideal Source Material for OCR |
OCR works best with originals or very clear copies and mono-spaced fonts
like Courier. If you have a choice, use source material with the
following characteristics:
- 12 point or greater font size.
- Black text on a white background.
- A clean copy, not a fuzzy multi-generation copy from a copy machine.
- Standard typeface (Times New Roman, etc.). Fancy fonts may not be
recognized.
- Single column layout.
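
The checklist above can be expressed as a small pre-scan check. The sketch below is illustrative only: the `SourcePage` fields and the `ocr_warnings` function are hypothetical names, and the values are assumed to be filled in by a person looking at the page, not measured by any OCR software.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourcePage:
    """Hypothetical description of a page, entered by hand before scanning."""
    font_size_pt: float
    black_on_white: bool
    clean_copy: bool
    standard_font: bool
    columns: int


def ocr_warnings(page: SourcePage) -> List[str]:
    """Return one warning per checklist item the page fails."""
    warnings = []
    if page.font_size_pt < 12:
        warnings.append("font smaller than 12 point")
    if not page.black_on_white:
        warnings.append("not black text on a white background")
    if not page.clean_copy:
        warnings.append("fuzzy or multi-generation copy")
    if not page.standard_font:
        warnings.append("fancy or unusual font")
    if page.columns != 1:
        warnings.append("multi-column layout")
    return warnings
```

An empty list means the page matches every item on the checklist; each warning points to a likely source of recognition errors.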
|
| OCR Limitations |
- Using text from a source with font size less than 12 points or from
a fuzzy copy will result in more errors.
- Except for tab stops and paragraph marks, most document formatting
is lost during text scanning (bold, italic, and underline are sometimes
recognized).
- The output from a finished text scan is a single-column editable
text file. This file will always require spellchecking and proofreading,
as well as reformatting to the desired final layout.
- Scanning printouts of plain text files or spreadsheets usually works,
but the recognized text must be imported back into a spreadsheet and
reformatted to match the original.
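
As a rough illustration of the spreadsheet re-import step, the sketch below turns scanned text into CSV that a spreadsheet program can open. It assumes the column gaps survived scanning as tabs or as runs of two or more spaces; real OCR output may not be that regular. The function names `ocr_text_to_rows` and `rows_to_csv` are made up for this example, and the result still needs proofreading against the original.

```python
import csv
import io
import re

# Assumed column separator: a tab (with optional surrounding spaces)
# or a run of two or more whitespace characters.
COLUMN_GAP = re.compile(r"\s*\t\s*|\s{2,}")


def ocr_text_to_rows(text):
    """Split each non-empty line of scanned text into a list of cells."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines left by the scan
        rows.append([cell.strip() for cell in COLUMN_GAP.split(line)])
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text for import into a spreadsheet."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

Single spaces are kept inside a cell, so an entry such as "Paper clips" stays in one column. Rows that come out with the wrong number of cells are a good place to start proofreading.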
|
| What Source Material Doesn't Work Well for OCR? |
- Forms (especially with boxes and check boxes)
- Very small text
- Multi-generation fuzzy or blurry copies from a copy machine
- Mathematical formulas
- Draft copies of documents with hand-written revisions
- Fancy text and unusual fonts
- Handwritten text
|