Wednesday, July 30, 2008

Read, computer, read!

You could say I’m lazy. I could tell you that I like to work smart, not hard. We could debate the point while my computer does my work for me. Like recently, when I needed to create a 16-page legal agreement and had only a printed hard copy to work from. I searched for a sneaky way out of typing up all 16 pages again and discovered that my computer can read! I was pleasantly surprised and set it straight to work. Using an ordinary scanner, I scanned all the pages into my computer on the black-and-white setting at 300dpi or higher. Then I told the computer to go forth and recognise. I now know that the term scientists use is OCR, which stands for optical character recognition, and it is “the mechanical or electronic translation of images of handwritten, typewritten or printed text (usually captured by a scanner) into machine-editable text” (source: Wikipedia).

Using the easy OCR wizard that came as part of the OmniPage software, which in turn shipped free with my Canon scanner, I was walked through the four steps of text recognition. These steps are: Start Processing (open program), Get Pages (scan document), Perform OCR (read, computer, read) and Export Results (dump text into MS Word). My printed template was recognised reasonably well; I would say it had 96% recognition and turned a scanned page of photocopied text into a page of text that I was then able to bring into my word processor and manipulate. The text layout was a bit all over the show, as bullet points and indents were wonky in places, but unless you’re a formatting fanatic, you could get by. As George and Fred Weasley would say: “Mischief Managed!”

Ever inquisitive, I wondered how well the OCR software would read my handwriting. Bad move, or shall we say: bad handwriting. My handwriting is normally totally illegible, so I put on my Sunday best and tested whether the PC could make head or tail of it. My attempts failed miserably. When I wrote “Witness Geek”, the computer insisted I had said “Wkne.’s Gee”. My husband’s attempt at “THE CAT IS BLACK” came back slightly better, with the computer quoting “THE CAT IS LAC-K”. And that was written in bold, painstakingly clear lettering that took both of us longer than it would have taken to type a paragraph. So as for using this as a solution to dump your lecture notes: X-Nay, I’m afraid.
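For anyone who wants to put a rough number on how badly (or how well) the computer read something, here is a minimal Python sketch. It is not how OmniPage scores itself, and it is not how I arrived at my 96% guess above; it simply compares what was written with what the computer thought it saw, using the `SequenceMatcher` class from Python's standard `difflib` module. The function name `recognition_rate` is my own invention for illustration.

```python
from difflib import SequenceMatcher


def recognition_rate(expected: str, recognised: str) -> float:
    """Return a rough similarity score between 0.0 (nothing alike) and 1.0 (identical).

    This is difflib's general-purpose similarity ratio, used here only as a
    stand-in for a proper OCR accuracy measure.
    """
    return SequenceMatcher(None, expected, recognised).ratio()


# The two handwriting experiments: (what was written, what the OCR returned).
samples = [
    ("Witness Geek", "Wkne.’s Gee"),
    ("THE CAT IS BLACK", "THE CAT IS LAC-K"),
]

for expected, recognised in samples:
    score = recognition_rate(expected, recognised)
    print(f"{expected!r} -> {recognised!r}: {score:.0%} similar")
```

The score treats every character equally, so a wrong letter in a name counts the same as a stray hyphen. It is good enough for comparing two attempts against each other, which is all I would use it for.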

I got playful and took a photograph of my computer screen with this article open. I turned the photograph into a TIFF file using my favourite photo manipulation program, FastStone, and told the computer to read, boy! I was pleasantly surprised that it came back with a very good rendition: wherever the photo was clear, the OCR recognised the text 100%. So if there are any James Bond type spies out there, take note.

Doing a bit more reading on the internet to see what other people have discovered, I read that Microsoft Office ships with its own OCR facility inside Microsoft Office Document Imaging. Check whether you have the program too, by clicking on Start, All Programs, Microsoft Office, Microsoft Office Tools, Microsoft Office Document Imaging. This program can view, manage, read and recognise text in image documents and faxes, and it can also read documents straight off the scanner. I found that it actually works better than OmniPage.
