[GreenKeys] OCR on PDF files [ was Quoth the RTTY...]

Sat May 20 16:52:37 EDT 2017

Adobe Acrobat includes an OCR function that takes the page images in your
document, runs OCR on them, and embeds the recognized text within the PDF
file. It also reduces the size of the file along the way.
TOOLS>RECOGNIZE TEXT>IN THIS FILE>SETTINGS>CLEARSCAN OUTPUT STYLE
There are several advantages to this:
1 - the file now has text embedded, so you can easily search it for a text
string.
2 - if you are posting the PDF on the web, search engines can index it and
other people can search the content easily.
3 - the file size may be 1/4 to 1/2 of the original image file size.
4 - the image quality does not seem to suffer if the original image is good.
5 - you can use the "select text" tool within Acrobat or Reader to grab
some text off a page and copy it to a text file, web page, or whatever.
6 - you can also use "select text" to copy text from a scanned book or
manual into your favorite TTY program!
https://www.youtube.com/watch?v=heVKTnZhaHE

The OCR isn't perfect, but it does a pretty good job, depending on the
clarity of the input, etc.
YMMV, but I have scanned, processed this way, and posted many thousands of
Navy and Teletype manual pages with good results. If you are scanning, you
don't have to run a separate OCR step: select CUSTOM SCAN and enable "Make
Searchable (Run OCR)" under Document Settings.

FWIW, for standalone OCR I have used OmniPage with excellent results; a
version was included with the flatbed scanner I bought. It even did a very
respectable job on a bunch of handwritten documents (hand printing, not
script).

Good luck,
Nick England K4NYW
www.navy-radio.com

On Fri, May 19, 2017 at 6:41 PM, Sam Hallas <s.hallas at ntlworld.com> wrote:

> Jim Pruitt wrote:
>
>> Hello Sam. Can I ask you what you use for OCR?  Also how do you feed
>> the OCR software the data (scan a picture)?  I looked at the RTTY
>> Journal file and was wondering how to convert it to usable text (like
>> you did).
>>
>
> Fairly straightforward, Jim. The full version of Acrobat allows you to
> 'export all images' so I extracted the single page with the text and
> exported it to a JPG image. (You can buy second-hand copies of Acrobat
> legally from various sources).
>
> The OCR software I'm using is TextBridge Pro 11, which my work bought for me
> some time ago. It's obsolete now but I also have a lite version of OmniPage
> which came free with my scanner.
>
> Both programs will open image files and you only need to drag a marquee
> round the required text and press 'Automatic' to start the OCR process.
>
> Hope that helps
>
> Sam
>
>