
Scanning books

7 messages · 2008-03-05 → 2008-03-07 · Yahoo Group era

Participants: Adam Trionfo, Robert "Exile In Paradise" Murphey, zxbruno

Preserved from the Timex/Sinclair 2068 Yahoo Group (2001–2019), which is no longer online. Text reproduced from the archive.org archive; email addresses masked.

Messages

1. Scanning Books

Adam Trionfo · Wed, 5 Mar 2008 09:18:

I exchanged a few emails with Rory about scanning and asked his permission to post them to the group. His suggestions would benefit anyone who is planning to do some scanning.


Adam,

I've started doing this with all my Atari books and manuals. I've done about 50-75 manuals.

I'm using a Fujitsu SnapScan (or was it ScanSnap?) and converting them to editable PDF using a WinXP box. My scanner came with Adobe Acrobat Standard (full version). Editable PDF means you can grab the text from the PDF and insert it into whatever you want through the clipboard, or whatever...

Here's what I do to prep a softcover book:

1.) Remove the binding with a T-square and utility knife (very dangerous). The Kinko's $5 charge seems fair. If you are not extremely handy with a utility knife, Kinko's is the way to go. If you decide to go the knife/T-square route, place the T-square across the front of the book at the approximate 'glue line' on the spine. Drag the knife along the T-square edge using moderate pressure, being VERY CAREFUL. Do not remove any pages until you have gone completely through the book.

2.) The Fujitsu scanner does duplex (both sides at once), but you can only feed about 25-40 pages at a time. If you time it correctly, you can add another small stack while it's scanning the last page. 9 times out of 10 the scanner thinks it has more scanning to do.

3.) I then quality control the scan, fixing the page orientation if needed, and compress the PDF for Acrobat 7 compatibility (this usually shrinks it to 1/4-1/2 of the original file size).

4.) Depending on the rarity and overall shape of the book, rebind it or recycle the paper. Most books I've done are fairly common, so they get recycled.

Overall, I've had good success doing this. It takes some work, but it's worth it.

FWIW, the first manual I did was the Mark Williams C Lexicon (800 pages - when compressed only 48 megabytes).

Thanks,
Rory McMahon
[email]
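
The file sizes Rory quotes can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the function names are made up for this example, 1 MB is taken as 1024 KB, and the 1/4 to 1/2 compression range is applied to the 48 MB figure on the assumption that it describes that particular file.

```python
def per_page_kb(total_mb: float, pages: int) -> float:
    """Average size per page in kilobytes, taking 1 MB = 1024 KB."""
    return total_mb * 1024 / pages


def uncompressed_range_mb(compressed_mb: float,
                          low_ratio: float = 0.25,
                          high_ratio: float = 0.5) -> tuple:
    """Original size range implied if compression leaves 1/4 to 1/2 of the file."""
    return compressed_mb / high_ratio, compressed_mb / low_ratio


# 800-page manual, 48 MB after compression (figures from the message above).
print(round(per_page_kb(48, 800), 1))   # 61.4
print(uncompressed_range_mb(48))        # (96.0, 192.0)
```

So the compressed manual averages roughly 61 KB per page, and the uncompressed scan would have been somewhere around 96 to 192 MB if the usual ratio held.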


2. RE: Scanning books

Adam Trionfo · Wed, 5 Mar 2008 09:21:

Thanks for the advice, Rory. This summer I have TONS of Bally Astrocade scanning that I'll be doing. I was going to be using the scanner that I have, which is just a flatbed scanner (I scan one page at a time). Perhaps I should get a sheet-fed scanner. For MOST of the Astrocade work I can't use a sheet-feed scanner, as I'm sure the scanner will jam.

I just checked some reviews for this Fujitsu snapscan and they say the scanner is good, but you can't use TWAIN drivers, which is bad. Could you point me to a couple of examples of the books that you have scanned?

More comments follow:

>>
I'm using a Fujitsu SnapScan (or was it ScanSnap?) and converting them
to editable PDF using a WinXP box.
>>

I've used Omnipage for this before and it works pretty well, but it spits up when it encounters code.

>>
Remove binding with T-square and Utility knife (very dangerous).
>>

Huh. I might try this and see how it goes. If I chop off a finger I'll let you know. ;-)

>>
The Fujitsu scanner does duplex (both sides at once), but you can only feed about 25-40 pages at a time.
>>

Online it says that you can feed 50 pages. I hate it when docs lie. It also says that it scans 18 pages a minute, which seems damn near impossible. What's the truth? How does the orientation of the pages look when they are scanned; does the scanner keep the pages straight (I know that you said you have to "quality control" the page orientation-- do you have to do this often?). Do you scan in grayscale or B&W?

>>
Most books I've done are fairly common, so they get recycled.
>>

Does this mean you basically toss them?

Finally, do you think that I (or you) can post the email that you sent (and this answer) to the group-- as it's useful knowledge?

Adam

3. RE: Scanning Books

Adam Trionfo · Wed, 5 Mar 2008 09:27:

Here is Rory's reply:


Adam Trionfo wrote:
>>
I just checked some reviews for this Fujitsu snapscan and they say the scanner is good, but you can't use TWAIN drivers, which is bad. Could you point me to a couple of examples of the books that you have scanned?
>>

Sure, one is attached... The scanner is direct-to-PDF, so yes, no TWAIN support. However, most of the time the text is selectable from the PDF, so text or code snippets can be easily copied out. The text can also be searched, so finding certain words or phrases is quite easy.

>>
I've used Omnipage for this before and it works pretty well, but it spits up when it encounters code.
>>

The PDF works pretty well except when trying to decipher non-standard characters like Atari's ATASCII.

>>
Huh. I might try this and see how it goes. If I chop off a finger I'll let you know. ;-)
>>

Yes, very dangerous. I can't stress how dangerous. I have nearly injured myself 2-3 times. Kinko's is the way to go if you have ANY doubt. I should probably make a YouTube video or something to show how I do it...

>>
Online it says that you can feed 50 pages. I hate it when docs lie. It also says that it scans 18 pages a minute, which seems damn near impossible. What's the truth? How does the orientation of the pages look when they are scanned; does the scanner keep the pages straight (I know that you said you have to "quality control" the page orientation-- do you have to do this often?). Do you scan in grayscale or B&W?
>>

B&W unless the manual requires color for its illustrations. A color picture converted to B&W will leave a block of black. If the text appears to refer to certain pictures, it's best to use color, but color means a HUGE file. If the graphics are just carefully placed 'clip art', I just scan in B&W. I think for duplex scanning, it does about a page per minute. Sometimes Adobe 'rotates' the page, so you will have to do a quick check and rotate the image 90 or 180 degrees. You can preview about 20-30 thumbnails, so QC'ing a doc doesn't take that long. 5-6 mins usually for large manuals...

As for speed, I put a stack in and come back in about 10 minutes to feed another stack in...

>>
Does this mean you basically toss [the books after you scan] them?
>>

Yes. A lot of the books that I scan are in bad shape (falling apart, some water damage, or marked up by a previous owner), or I already have a copy in better condition. The scans are backed up on several machines here at home, in both an uncompressed and a compressed version. When those drives fill up, they get backed up to DVDs, disconnected, and put in storage.

>> 
Finally, do you think that I (or you) can post the email that you sent (and this answer) to the group-- as it's useful knowledge.
>>

I don't have any problem with having it posted to the group, added to a FAQ, or whatever. Having to search a stack of PDFs instead of a pile of books makes this worthwhile to me. I can fit a lot more DVDs on the shelf than books... :)

4. RE: Scanning Books

Adam Trionfo · Wed, 5 Mar 2008 09:42:

Using Google I checked the web for FAQs on scanning books and 
found an interesting one.  If you're interested, you can read it here:

http://www.gutenberg.org/wiki/Gutenberg:Scanning_FAQ

This FAQ is on scanning books for OCR, but many of the topics hold
true even for scanning books as images.

Adam

5. Re: [ts2068] RE: Scanning books

Robert "Exile In Paradise" Murphey · Wed, 05 Mar 2008 13:08

On Wed, 2008-03-05 at 09:21 -0800, Adam Trionfo wrote:
> For MOST of the Astrocade work I can't use a
> sheet-feed scanner, as I'm sure the scanner will
> jam.

The older the paper, the more likely.
Plus, the paper content... too smooth, no good.
Too rough, no good either. Must be expensive
special paper sold (coincidentally) by the
scanner manufacturer "to ensure best results."

> I've used Omnipage for this before and it works
> pretty well, but it spits up when it encounters
> code.

I wish I could find the combination of character
conditions in a row that caused this. It's the
main thing that aggravates me with that program.
Because of it, scan straight to PDF is practically
unusable for anything beyond 10 or so pages, unless
you have tools that can stack multiple PDFs into
one.

And, it's not just code. Tabular data seems to be
a big one too. Whatever it is causes pietro.exe
to die and good bye scan.

> Remove binding with T-square and Utility knife
> (very dangerous).

Indeed. Many blood sacrifices to the spirit
of EGGZACKTOE can happen.

Of course, the alternative could be someone trying
a jigsaw or some other Tim Allen/Home Improvement-style
"just needs more power" solution. Keep it safe.
Let the people with the machine and liability insurance
do the cutting, if you can.

> Huh. I might try this and see how it goes. If I chop
>  off a finger I'll let you know. ;-)

Typing could be harder.

> The Fujitsu scanner does duplex (both sides at once),
> but you can only feed about 25-40 pages at a time.
> 
> Online it says that you can feed 50 pages.

You can feed them. Doesn't necessarily mean they
feed, scan, and stack correctly.
And 50 pages means "25 sheets of paper, with two
sides each". This avoids confusion for people
who don't use single-sided paper.

Sort of like how hard drive manufacturers make
sure to use K=1000 in all literature, so they
don't confuse us poor computer people who grew
up misinformed about the value of a kilobyte.
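
Both counting conventions above can be put into numbers. This is a minimal sketch with made-up helper names. The 500 GB drive is a hypothetical example, not a figure from this thread, and the "50 pages = 25 sheets" reading is the interpretation given above rather than a confirmed scanner specification.

```python
def sheets_for_pages(pages: int, duplex: bool = True) -> int:
    """Sheets of paper needed for a given number of printed page sides."""
    # Ceiling division: an odd page count still needs a final sheet.
    return -(-pages // 2) if duplex else pages


def advertised_gb_to_gib(gb: float) -> float:
    """Convert a decimal (K=1000) gigabyte figure to binary (K=1024) gibibytes."""
    return gb * 1000**3 / 1024**3


print(sheets_for_pages(50))                  # 25
print(round(advertised_gb_to_gib(500), 1))   # 465.7
```

In both cases the advertised number is accurate under its own definition and simply looks smaller once it is restated in the unit the buyer had in mind.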

>  I hate it when docs lie. It also says that it scans
> 18 pages a minute, which seems damn near impossible.

True story:
More than a decade ago now (ouch), I was tasked to
outfit a project to scan 2.5 million documents on
a 3 year contract. The documents dated back to the
1800s which means they would disintegrate if put
through your ordinary scanner. We ended up with a
scanner that scanned 100 pages per minute, both
sides, using a vacuum and belt arrangement. The
vacuum sucked the paper down to the belt (gently)
which then whisked the page past a top and bottom
scanner and dropped it in a tray at the other end.
You could stack roughly 1000 pages in at once.
The hardest part of the whole project was keeping
the scanner fed, requiring a team of people to
prep the documents to go in.

So, the point of the rambling is that 100ppm was
possible long ago ... in a $60K scanner. I have no
idea what you're looking at, but roller-fed could
do 18ppm pretty well, nowadays. The biggest delays
aren't mechanical, they're in the bus and computer
receiving the stream. Back then it required a
dedicated CPU with dedicated SCSI controller, and
a trunked network bus into a high-speed hardware
RAID array to get those documents done.

Today's USB and multicore systems could probably
handle it, if the mechanicals can.
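
For a feel of what those rated speeds mean in practice, here is a rough sketch. It counts pure scanning time only and ignores feeding, jams and prep work. The function name is invented for this example, and the one-page-per-document figure is an assumption made purely for illustration, since the thread does not say how long the documents were.

```python
def minutes_to_scan(pages: int, pages_per_minute: float) -> float:
    """Minutes of pure scanning time at a steady rated speed (no feeding delays)."""
    return pages / pages_per_minute


# An 800-page manual at the advertised 18 ppm versus the 100 ppm machine.
print(round(minutes_to_scan(800, 18), 1))   # 44.4
print(minutes_to_scan(800, 100))            # 8.0

# ASSUMPTION: one page per document, purely for illustration; the thread
# does not say how many pages the 2.5 million documents held.
hours = minutes_to_scan(2_500_000, 100) / 60
print(round(hours))                         # 417
```

Under that assumption the scanner itself would need only a few hundred hours of running time over a three-year contract, which fits the remark that the hard part was keeping it fed.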

>  What's the truth? How does the orientation of the
>  pages look when they are scanned; does the scanner
>  keep the pages straight (I know that you said you
>  have to "quality control" the page orientation--
>  do you have to do this often?). Do you scan in
>  grayscale or B&W?

And does the thing pull evenly on the pages, removing
the <censored> wriggles?

I'll spare you the other story about the Sun math
cluster used to straighten out thousand-plus foot
long log scans...especially when the logs have
grids on them that have to be re-aligned.

-- 
Robert "Exile In Paradise" Murphey
James Joyce -- an essentially private man who
wished his total indifference to public notice
to be universally recognized. -- Tom Stoppard

6. Re: Scanning books

zxbruno · Fri, 07 Mar 2008 05:43

Hmmm, you live and learn. Thanks for the advice on color, resolution,
etc. Today I also learned that Adobe Acrobat Pro can be used with a
plugin that adjusts all pages so that they all have the same
dimensions. Nice. Maybe I'll fix the robots book and upload it again.
Adam, if you go to www.worldofspectrum.org and click on the news link,
you'll find a link to the book. That's the zip file with jpgs (yes, I
know, I know...). The one in pdf should be up tomorrow, on that same page.

At work I asked about the binding removal. At Kinko's it costs $5, at
Staples it costs $2.

7. Re: Scanning books

Adam Trionfo · Fri, 7 Mar 2008 07:20:

>>
At work I asked about the binding removal. At Kinko's it costs $5, at
Staples it costs $2.
>>

I will have to go over there to have some bindings removed.  I plan to scan 
a couple of my favorite T/S 2068 books (not sure which ones yet).  I'm also
getting some 6800 CPU Assembly books (for use with the APF game console).
I plan to scan a few of those too.  There are also a few Z80 books that might
be interesting to scan.  Many of my Z80 books are hardcover-- I wonder if I
can even have those bindings removed?  I guess that I'll find out!

Adam

Indexed under

Tape & library archiving (TAP/TZX) · Books & manuals