Many people have written to ask me how
long it takes to create a searchable PDF (PDF+Text) document from a book.
This weekend, I decided to time the process outlined here.
Book scanning is the most time-consuming method of digitizing paper. Unlike
loose pages, which can be scanned using a sheet-fed scanner, book pages
must be manually turned for each scan. A specialized book scanner can help
to greatly reduce the time it takes to make a quality scan. A traditional scanner
is impractical for scanning more than a few pages.
I scanned nine chapters totalling 154 pages of text, including illustrations
and diagrams, for an average of 4.7 minutes of total time (manual scanning
+ conversion to PDF + OCR) per chapter. The average per-page processing
time is approximately seventeen seconds (rounded up).
Here's the breakdown:
Scan page to TIFF (Manual): 12 seconds
Create PDF from TIFF: 1 second
OCR to create PDF+TEXT: 4 seconds
Average processing per page: 17 seconds
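For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The step times, page count, and chapter count come from my measurements above; the names (`STEP_SECONDS`, `seconds_per_page`, `minutes_per_chapter`) are just illustrative and not part of any tool I use.

```python
# Per-page step times in seconds, taken from the breakdown above.
STEP_SECONDS = {
    "scan page to TIFF (manual)": 12,
    "create PDF from TIFF": 1,
    "OCR to create PDF+Text": 4,
}


def seconds_per_page(steps=STEP_SECONDS):
    """Total processing time for one page: the sum of all steps."""
    return sum(steps.values())


def minutes_per_chapter(pages, chapters, per_page_seconds):
    """Average minutes per chapter for a given page and chapter count."""
    return pages * per_page_seconds / chapters / 60


per_page = seconds_per_page()
print(per_page)                                          # 17
print(round(minutes_per_chapter(154, 9, per_page), 1))   # 4.8
```

Because the 17-second figure is rounded up, the computed chapter average comes out slightly above the 4.7 minutes I measured.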
Is the productivity gain worth the time investment?
Using the above method, my 1024-page legal text would take almost 5 hours
to scan. Is the productivity gain worth that much time? Possibly.
For important materials that are accessed
frequently, randomly, and quickly, throughout their use, definitely yes.
Ultimately, it depends on how often the material will be accessed and in
what context. In some cases, simply having a digital copy of the table
of contents, index, and glossary may be sufficient. In this case, I've
found the ability to create summaries and study guides simply by highlighting
with a pen quite valuable. Since
I'm using this book in digital form on a Tablet PC, the ability to annotate
and search for text during the lectures has been a definite advantage -
one that offsets the time invested in scanning. I can easily scan while
listening to music or a podcast.
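The "almost 5 hours" estimate is simple to reproduce for a book of any length. This is only a rough sketch: `scan_hours` is a hypothetical helper, and it assumes the 17-seconds-per-page average holds for the entire book, which may not be true for every scanner or every book.

```python
def scan_hours(pages, seconds_per_page=17):
    """Estimated hours to scan, convert, and OCR a book of `pages` pages.

    Assumes a constant per-page time (17 seconds by default).
    """
    return pages * seconds_per_page / 3600


print(round(scan_hours(1024), 2))  # 4.84
```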
Though I'd certainly prefer to purchase the text in digital form in the
first place and would gladly pay extra to have both the book and a digital
copy on CD, I think that, in some circumstances, the benefits gained from
a digital copy can easily outweigh those of the paper equivalent, even when one
considers the time it takes to scan. This value proposition will certainly
improve once I am able to outsource the book scanning process to low-cost
child labor, provided by the Mack children. :-)
Unexpected benefits:
Unexpectedly, I've found that the process of scanning
helps acquaint me with the material. The page preview stays on
the screen approximately 7-10 seconds between each page scan. This is just
long enough to identify several key pieces of information from the page
and create a neural association between page, section, and topic. While
I would not argue that this is a major benefit of book scanning, it is
a noticeable one.
Paperless challenge update:
My paperless challenge project is going
very well; in fact, it's going much better than I would have imagined.
I love the fact that each day, for the past three weeks, I've ended the
day with less paper than I started with and I can now find information
in digital form quickly and easily.
I don't expect that everything I'm doing as part of this challenge will
prove to be the most productive, nor do I think that a totally paperless
existence is best. I am enjoying the process of discovering what
does and does not work for me and I know that this experience will ultimately
help me recommend specific technologies to my clients.
Meanwhile, I'll continue to collect my observations in a mind map, possibly
for a future blog or podcast.
Discussion/Comments (5):
Well, we can quickly find out how much a digital version of this book should be worth to you: just multiply 5 hours times your hourly rate and add the original cost of the book.
Publishers, are you listening?
Posted at 11/14/2005 11:27:58 by Scott
Do you still find the paper copy useful now that you have a scanned copy?
If not, then chopping up the book and sending it through an ADF should cut off a few hours of labour. Although that would destroy the original, so is the price of the book worth saving page flipping time?
I recently scanned in a few textbooks with a cheap HP multi-function. While a thousand pages still takes 6-7 hours of computer time, I was only involved for about 10 minutes of slicing pages out. Then another two or three minutes per hour refilling the feeder and fixing the occasional paper jam. Generally most scans were jam free, except when the paper was too thin.
Posted at 11/15/2005 16:15:45 by Bryan
Hey eric...
I have a Tecra M4 and my laptop doesnt go on standby or even hibernate!!
Do u have any idea why this happens??
What do u think might be the solution??
Posted at 11/18/2005 7:46:16 by Ahmed
Does the quality diminish significantly from digitalizing a textbook?? Are your pdf's or whatever format you use clear and readable??
Posted at 12/14/2005 0:14:48 by Anonymous
I wouldn't pay extra for the electronic version to be included. Book production requires that it be created anyway so the publisher should just include it or make it available to download.
Posted at 04/02/2006 5:35:00 by Ian