Daily Archives: 25 June 2004

Nota Bene

I’m here at work but I don’t know if I’ll make it through
the day.

I had an interesting experience yesterday. It started with
problems editing a PDF file and ended with hours of formatting
a word processing document.

Many of you know that PDF files are not intended to be
edited. That is, you create the original in another program
such as a word processor and when done, and only when done, you
convert the document into a PDF. Once it is in PDF form, you
really don’t want to be making substantial changes. This is
because, in essence, a PDF is a picture of a document just as a
JPG file is a picture of something you photographed or created
in a paint program. Hence, trying to edit a picture as if it
were a word processing document is not going to get you very
far.

Indeed, if you do need to make substantial changes, you go
back to your original word processing document and make the
changes there – then convert again to PDF.

A problem occurs when you have to make substantial changes
and have only the PDF file, not the original word processing
file. If the PDF came from a word processing document and you
embedded the fonts in the PDF, you may be able to make
substantial changes to the PDF. But if the PDF came from a scan
of the hard copy, you’re pretty much toast because all you can
do is rescan the document and run it through your Optical
Character Recognition (OCR) software.

This is where things get hairy. OCR software is far from
perfect, even though it is getting better. You would think
that, with systems available to recognize handwriting, software
would be able to read printed documents. But you would be
wrong, because much of how we recognize written ideas is
through context.

For example, a numbered list gives order and is intended to
be seen as a whole. To OCR software, the numbers are just
characters and have no attachment to the words that follow.
Hence, even if the OCR correctly reads the characters, your
word processing software will not recognize the output as a
numbered list, and you spend much time formatting the document
to re-create the context.
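
For the curious, here is a rough sketch in Python of what putting
that context back might look like. It is not what my OCR software
or word processor does. It only assumes the OCR output is plain
text with one item per line, and the function name, the pattern,
and the sample lines are all made up for illustration.

```python
import re

# Assumed item formats: "1. Scope", "2) Term", or "(3) Parties".
ITEM = re.compile(r"^\s*\(?(\d{1,3})[.)]\s+(\S.*)$")

def find_numbered_lists(lines):
    """Group consecutive lines numbered 1, 2, 3, ... into lists."""
    lists, current = [], []
    for line in lines:
        m = ITEM.match(line)
        # The next expected number continues the current list.
        if m and int(m.group(1)) == len(current) + 1:
            current.append(m.group(2).strip())
            continue
        # Anything else ends the run; keep it only if it had 2+ items.
        if len(current) > 1:
            lists.append(current)
        current = []
        # A line numbered 1 starts a new candidate list.
        if m and int(m.group(1)) == 1:
            current = [m.group(2).strip()]
    if len(current) > 1:
        lists.append(current)
    return lists

# Hypothetical OCR output, not taken from the actual memorandum.
sample = [
    "MEMORANDUM",
    "1. Scope",
    "2. Term",
    "3) Parties",
    "Signed in 2004.",
    "1. Only one",
]
print(find_numbered_lists(sample))  # [['Scope', 'Term', 'Parties']]
```

A lone line that happens to start with "1." is ignored on purpose,
since a single numbered line is more likely a stray than a list.
Real OCR output is messier than this (a "1" read as an "l", for
instance), so treat it as a starting point at best.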

I don’t know if anyone has done a study as to the point at
which it becomes more efficient to type in a document than to
correct and format an OCR-read document. But with the 33-page
document (a memorandum of agreement) in question, all I can do
is cut and paste parts of the OCR output into a clean word
processing document rather than waste time making
corrections.

Mail Call

Date: Thu, 24 Jun 2004 21:36:32 -0700
From: JHR
Subject: Duck Story

Dan –

Since you’re going in for a Cat scan tomorrow, thought you might appreciate this (from my #1 son in Seoul):

Regards,

JHR

A woman brought a very limp duck in to a veterinary surgeon. As she laid her pet on the table, the vet pulled out his stethoscope and listened to the bird’s chest. After a moment or two, the vet shook his head sadly and said, “I’m so sorry, your pet has passed away.”

The distressed owner wailed, “Are you sure?” “Yes, I’m sure. The duck is dead,” he replied. “How can you be so sure?” she protested. “I mean, you haven’t done any testing on him or anything. He might just be in a coma or something.”

The vet rolled his eyes, turned around and left the room. He returned a few moments later with a black Labrador Retriever. As the duck’s owner looked on in amazement, the dog stood on his hind legs, put his front paws on the examination table and sniffed the duck from top to bottom. He then looked at the vet with sad eyes and shook his head. The vet patted the dog and took it out and returned a few moments later with a beautiful cat. The cat jumped up on the table and also sniffed the bird from its beak to its tail and back again. The cat sat back on its haunches, shook its head, meowed softly, jumped down and strolled out of the room.

The vet looked at the woman and said, “I’m sorry, but as I said, this is most definitely, 100% certifiably, a dead duck.” Then the vet turned to his computer terminal, hit a few keys, and produced a bill, which he handed to the woman. The duck’s owner, still in shock, took the bill. “$150!” she cried. “$150 just to tell me my duck is dead?!!”

The vet shrugged. “I’m sorry. If you’d taken my word for it, the bill would have been $20. But what with the Lab Report and the Cat Scan, it all adds up.”

Have a Great Weekend Everyone – Aloha!