How hard is it to digitalize a book?

Monday, November 14th, 2005
Many people have written to ask me how long it takes to create a searchable PDF (PDF+Text) document from a book. This weekend, I decided to time the process outlined here.

Book scanning is the most time-consuming way to digitize paper. Unlike loose pages, which can be fed through a sheet-fed scanner, book pages must be turned by hand for each scan. A specialized book scanner can greatly reduce the time it takes to make a quality scan; a traditional scanner is impractical for scanning more than a few pages.

I scanned nine chapters, totaling 154 pages of text, illustrations, and diagrams, at an average of 4.7 minutes of total time per chapter (manual scanning + conversion to PDF + OCR). That works out to about 16.5 seconds per page, or approximately seventeen seconds rounded up.
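As a quick check on those figures, this short Python sketch converts the per-chapter average into a per-page time. It uses only the numbers reported above (nine chapters, 154 pages, 4.7 minutes per chapter); the constant and function names are illustrative.

```python
import math

# Figures reported in the post: nine chapters, 154 pages,
# and an average of 4.7 minutes of total time per chapter.
CHAPTERS = 9
PAGES = 154
MINUTES_PER_CHAPTER = 4.7

def seconds_per_page(chapters: int, pages: int, minutes_per_chapter: float) -> float:
    """Convert a per-chapter average into an average per-page time in seconds."""
    total_seconds = chapters * minutes_per_chapter * 60
    return total_seconds / pages

per_page = seconds_per_page(CHAPTERS, PAGES, MINUTES_PER_CHAPTER)
print(f"{per_page:.2f} s/page, rounded up to {math.ceil(per_page)} s")
```

Nine chapters at 4.7 minutes each is 2,538 seconds; divided by 154 pages that is about 16.48 seconds per page, which rounds up to 17.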

Here's the breakdown:


    Scan page to TIFF (manual)    12 seconds
    Create PDF from TIFF           1 second
    OCR to create PDF+Text         4 seconds

    Average processing per page:  17 seconds
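The three steps simply add up to the per-page figure. A minimal Python sketch that reproduces the table, with the step names and times taken directly from the breakdown (the dictionary and function names are illustrative):

```python
# Per-page step times from the breakdown, in seconds.
STEP_SECONDS = {
    "Scan page to TIFF (manual)": 12,
    "Create PDF from TIFF": 1,
    "OCR to create PDF+Text": 4,
}

def per_page_total(steps: dict[str, int]) -> int:
    """Sum the individual step times to get the processing time per page."""
    return sum(steps.values())

# Print each step, then the total, in aligned columns.
for step, secs in STEP_SECONDS.items():
    print(f"{step:<30}{secs:>3} s")
print(f"{'Average processing per page:':<30}{per_page_total(STEP_SECONDS):>3} s")
```

Manual scanning accounts for 12 of the 17 seconds, roughly 70% of the per-page time, so it is the step most worth speeding up.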

Is the productivity gain worth the time investment?

Using the above method, my 1,024-page legal text would take almost five hours to scan. Is the productivity gain worth that time? Possibly. For important material that is accessed frequently, randomly, and quickly throughout its use, definitely yes.
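The five-hour estimate is just page count times per-page time. A hypothetical helper, assuming the 17-second average from the breakdown holds for the whole book:

```python
def scan_hours(pages: int, seconds_per_page: float = 17) -> float:
    """Estimate total hours to scan, convert, and OCR a book of `pages` pages.

    Assumes the per-page average measured on the sample chapters applies
    uniformly to every page of the book.
    """
    return pages * seconds_per_page / 3600

hours = scan_hours(1024)
print(f"1024 pages x 17 s = {hours:.2f} hours")
```

1,024 × 17 s is 17,408 seconds, about 4.84 hours, which matches "almost five hours". Passing a different `seconds_per_page` re-estimates the job for a faster scanner or OCR setup.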

Ultimately, it depends on how often the material will be accessed and in what context. In some cases, simply having a digital copy of the table of contents, index, and glossary may be sufficient. In this case, I've found it quite valuable to be able to create summaries and study guides simply by highlighting with a pen. Since I'm using this book in digital form on a Tablet PC, being able to annotate and search the text during lectures has been a definite advantage, one that offsets the time invested in scanning. I can easily scan while listening to music or a podcast.

Though I'd certainly prefer to buy the text in digital form in the first place, and would gladly pay extra to get both the book and a digital copy on CD, I think that in some circumstances the benefits of a digital copy can easily outweigh those of the paper equivalent, even when one considers the time it takes to scan. This value proposition will certainly improve once I am able to outsource the book scanning process to low-cost child labor, provided by the Mack children. :-)

Unexpected benefits:

Unexpectedly, I've found that the process of scanning actually helps acquaint me with the material. The page preview stays on the screen for approximately 7-10 seconds between page scans, which is just long enough to identify several key pieces of information on the page and create a neural association between page, section, and topic. While I would not argue that this is a major benefit of book scanning, it is a noticeable one.

Paperless challenge update:

My paperless challenge project is going very well; in fact, it's going much better than I imagined. I love the fact that for the past three weeks I've ended each day with less paper than I started with, and I can now find information in digital form quickly and easily.

I don't expect everything I'm doing as part of this challenge to prove the most productive, nor do I think that a totally paperless existence is best. I am enjoying the process of discovering what does and does not work for me, and I know that this experience will ultimately help me recommend specific technologies to my clients.

Meanwhile, I'll continue to collect my observations in a mind map, possibly for a future blog post or podcast.

Discussion/Comments (5):

Well, we can quickly find out how much a digital version of this book should be worth to you: just multiply 5 hours by your hourly rate and add the original cost of the book.

Publishers, are you listening?

Posted at 11/14/2005 11:27:58 by Scott
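Scott's rule of thumb can be written as a one-line formula. In this sketch the $50/hour rate and $120 book price are hypothetical inputs, not figures from the post:

```python
def digital_copy_value(scan_hours: float, hourly_rate: float, book_price: float) -> float:
    """Scott's rule of thumb: labor cost of scanning plus the price of the book."""
    return scan_hours * hourly_rate + book_price

# Hypothetical inputs: 5 hours of scanning, a $50/hour rate, a $120 textbook.
print(digital_copy_value(5, 50.0, 120.0))  # 370.0
```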


Do you still find the paper copy useful now that you have a scanned copy?

If not, then chopping up the book and sending it through an ADF should cut off a few hours of labour. That would destroy the original, though, so is saving the page-flipping time worth the price of the book?

I recently scanned in a few textbooks with a cheap HP multi-function. While a thousand pages still takes 6-7 hours of computer time, I was involved for only about 10 minutes slicing pages out, plus another two or three minutes per hour refilling the feeder and clearing the occasional paper jam. Most scans were jam-free, except when the paper was too thin.

Posted at 11/15/2005 16:15:45 by Bryan
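Bryan's numbers show why the ADF approach saves hands-on time: the machine runs for hours, but the operator's time is a one-off slicing step plus a few minutes per hour. A rough sketch of that arithmetic, using his figures for about 1,000 pages; the comparison with the post's 12-second manual scanning step is an assumption, since it counts page turning only:

```python
def adf_hands_on_minutes(slicing_min: float, run_hours: float, refill_min_per_hour: float) -> float:
    """Operator time for the ADF approach: one-off slicing plus periodic refills."""
    return slicing_min + run_hours * refill_min_per_hour

# Bryan's figures for ~1000 pages: 10 min slicing, 6-7 h run, 2-3 min/hour attention.
low = adf_hands_on_minutes(10, 6, 2)
high = adf_hands_on_minutes(10, 7, 3)

# Assumed comparison: the post's manual scanning step alone, 12 s/page.
manual_minutes = 1000 * 12 / 60
print(f"ADF hands-on: {low:.0f}-{high:.0f} min; manual page turning: {manual_minutes:.0f} min")
```

That is roughly 22-31 minutes of attention for the ADF versus about 200 minutes of page turning, at the cost of destroying the book.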


Hey Eric,

I have a Tecra M4, and my laptop doesn't go into standby or even hibernate!

Do you have any idea why this happens?

What do you think the solution might be?

Posted at 11/18/2005 7:46:16 by Ahmed


Does quality diminish significantly when you digitize a textbook? Are your PDFs (or whatever format you use) clear and readable?

Posted at 12/14/2005 0:14:48 by Anonymous


I wouldn't pay extra for the electronic version to be included. Book production requires that it be created anyway, so the publisher should just include it or make it available for download.

Posted at 04/02/2006 5:35:00 by Ian



Discussion for this entry is now closed.