[Ham-Computers] RE: Best Way To Scan Text
WA5CAB at cs.com
Wed Aug 10 18:15:50 EDT 2005
Duane,
I mostly agree with Aaron, except that I probably wouldn't have mentioned
.JPG. And if the long-term storage is going to be .TIF (not the OCR software
output), I would use multi-page, at least for the final storage file, and for
the scan itself too if your OCR software can read it. But see further comments
below.
I'm not clear from what you wrote whether your OCR software reads a
previously scanned file or reads the scanner output directly. If the former,
then scan to .TIF, multi-page if your software will handle it (less file
handling to worry about) and single page if not. If the latter, then it's a moot
point. But in either case, 300 dpi is adequate for text down to at least 4 point.
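As a rough check on the 300 dpi figure, the arithmetic can be sketched in a few lines of Python. The function names are made up for illustration, and the only facts assumed are that a typographic point is 1/72 inch and that the page is 8-1/2" x 11".

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 inch


def page_pixels(width_in, height_in, dpi):
    """Pixel dimensions (width, height) of a page scanned at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def type_height_pixels(point_size, dpi):
    """Height in pixels of the full body of a typeface of the given point size."""
    return point_size / POINTS_PER_INCH * dpi


if __name__ == "__main__":
    # Letter-size page at 300 dpi
    print(page_pixels(8.5, 11, 300))
    # Body height of 4 point type at 300 dpi, to one decimal place
    print(round(type_height_pixels(4, 300), 1))
```

By this arithmetic, 4 point type gets somewhere around 17 pixels for the full body height at 300 dpi. Individual lowercase letters are smaller than that, so whether this is enough in practice still depends on the OCR software.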
However, if your final output (for long-term storage) is going to be the
output of your OCR software, then use whatever scan dpi your OCR software is
happiest with (i.e., produces the lowest error rate). Once the OCR conversion
takes place, dpi has no relevance.
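One way to find the dpi that the OCR software is "happiest with" would be to hand-type one test page as a reference, scan and OCR that page at each candidate dpi, and compare. The following is only a sketch of that comparison, not anything the OCR software itself provides. The function names are hypothetical, and the error rate here is simply character edit distance divided by the length of the reference text.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-character inserts, deletes,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        prev = cur
    return prev[-1]


def error_rate(reference, ocr_text):
    """Character errors per character of the hand-typed reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_text) / len(reference)


def best_dpi(reference, ocr_by_dpi):
    """Return the dpi whose OCR output has the lowest error rate.

    ocr_by_dpi maps a scan resolution to the OCR text obtained at it.
    """
    return min(ocr_by_dpi, key=lambda dpi: error_rate(reference, ocr_by_dpi[dpi]))
```

Calling `best_dpi(reference_text, {200: text_at_200, 300: text_at_300})` would return whichever resolution came out cleaner on that one page. A single page is a small sample, so the result is a hint rather than a measurement.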
You said that you have a B/W scanner. If that's literally true, that's fine.
However, if it can also be set to do gray scale (typically either 16 or
256 levels), just be sure that it isn't set that way. If it has Edge Erase
capability, be sure that's turned on and set to around 0.15" if the source
documents are a true 8-1/2" x 11" (or other physical size that your scanner
recognizes). If your scanner also has Border Erase capability (a variant of
Edge Erase where you can set the amount to erase independently on all four
sides), use that only if your source documents have asymmetrical physical
defects like staple or punch holes, or if the originals aren't a physical
size your scanner is programmed to recognize.
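To put numbers on the erase settings, here is a small Python sketch of the geometry only. The scanner does the real work in its own firmware or driver. The function names are hypothetical, and so is the 0.5" left margin used in the comment as a punch-hole example.

```python
def erase_box(width_in, height_in, dpi, left, top, right, bottom):
    """Pixel box (left, top, right, bottom) that survives the erase.

    Page size and the four erase margins are given in inches. This models
    Border Erase, where each side can be set independently.
    """
    w, h = round(width_in * dpi), round(height_in * dpi)
    box = (round(left * dpi), round(top * dpi),
           w - round(right * dpi), h - round(bottom * dpi))
    if box[0] >= box[2] or box[1] >= box[3]:
        raise ValueError("margins erase the whole page")
    return box


def edge_erase_box(width_in, height_in, dpi, margin=0.15):
    """Edge Erase: the same margin on all four sides."""
    return erase_box(width_in, height_in, dpi, margin, margin, margin, margin)


if __name__ == "__main__":
    # 0.15" all round on a letter-size page at 300 dpi
    print(edge_erase_box(8.5, 11, 300))
    # Hypothetical Border Erase: 0.5" on the left to clear punch holes
    print(erase_box(8.5, 11, 300, 0.5, 0.15, 0.15, 0.15))
```

At 300 dpi a 0.15" margin works out to 45 pixels per side, which is why a setting that small removes edge shadows without touching normally positioned text.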
In a message dated 8/10/2005 4:28:23 PM Central Daylight Time,
dfischer at usol.com writes:
> I have special OCR software that will scan a page of text and convert it
> into normal edit capable and readable text that can be saved into different
> word processing programs. Such as Professional Write 2.22 my DOS program or
> MS Word 4.0 etc. It allows me to simply insert page after page into the B/W
> scanner and go.
>
> This is a very expensive program for the blind and print handicapped
> originally by Arkenstone, known as OpenBook Unbound. I paid $995 for this
> program, the scanner was a HP 3P and was $395 more in 1997.
>
> I think using it would be perfect Aaron, as long as no images are involved.
>
>
> The scan accuracy is better than 98% if the text is of decent contrast.
>
> So let's focus on me having to scan text pages that contain images with
> text, as that one none of this will work with. Text yes, image, no!
>
> Thank you very much sir!
>
> Duane W8DBF
>
>
> ----------
> From: Hsu, Aaron (NBC Universal) <aaron.hsu at nbcuni.com>
> To: 'I>Ham-Computers' <Ham-Computers at mailman.qth.net>
> Cc: 'Duane Fischer, W8DBF' <dfischer at usol.com>
> Subject: [Ham-Computers] RE: Best Way To Scan Text
> Date: Wednesday, August 10, 2005 5:11 PM
>
> Scan at 300dpi, saved as a .TIF. You can also save as a .JPG, but use the
> least amount of compression possible. Save each page individually with a
> page number as part of the filename. If you're doing multiple documents,
> create a folder with the name of the document and then save each page as a
> page number. This is the easiest way to do it and will allow printable
> copies - caveat is that scans saved as graphics take up more disk space.
>
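The folder-per-document, file-per-page scheme that Aaron describes in the quoted message could be sketched in Python like this. The function name, the "scans" root folder, and the zero-padded page numbers are my assumptions. The padding is there only so that the files sort in page order in a directory listing.

```python
from pathlib import Path


def page_path(root, document_name, page_number, ext=".tif", digits=3):
    """Path for one scanned page: <root>/<document_name>/<page number><ext>.

    The page number is zero-padded to `digits` places (an assumption, not
    part of the quoted advice) so that 2 sorts before 10.
    """
    if page_number < 1:
        raise ValueError("page numbers start at 1")
    folder = Path(root) / document_name
    return folder / f"{page_number:0{digits}d}{ext}"


if __name__ == "__main__":
    # Page 7 of a hypothetical document called "Receiver Manual"
    print(page_path("scans", "Receiver Manual", 7))
```

If the final storage is a multi-page .TIF instead, the same folder name can serve as the file name and the per-page numbering is not needed.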
Robert Downs - Houston
<http://www.wa5cab.com> (Web Store)
MVPA 9480
<wa5cab at cs.com> (Primary email)
<wa5cab at houston.rr.com> (Backup email)