synthetikal.com Forum Index


Advice on making ebooks with Adobe Acrobat 7.0.1
Lief


Sat Jun 18, 2005 9:17 pm

isaacbh

Total Messages 3

Subject: Tips for converting a book to PDF?

--------------------------------------------------------------------------------

Hi, I'm converting two very old books to PDF. I'm a home user, using a consumer-level scanner. I'm using Acrobat 7.0.1. Here's what I did:

- Scanned the pages at 600dpi monochrome (300 is too choppy for my eyes), saved as 1-bit uncompressed TIFF.

- Ran the files through a deskew and cleaning program (GTX ScanCleanOffice).

- Additionally cleaned each page manually.

- Created the PDF in Acrobat through Create PDF -> From Multiple Files. (JBIG2 compression).

- Performed OCR in Acrobat. Output: Searchable Image (Exact). (It's important to keep the original look of the pages intact).

- Added links, bookmarks, metadata.

- Final saving using "Save As".

To give an example, the first book has 174 pages of black text, and the resulting PDF is 5.69MB (Compared to 2.74MB in DjVu).
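For a sense of scale, here is a rough sketch of how much raw 1-bit data a scan job like this represents. The 600 dpi, 174 pages, and 5.69MB figures come from the post; the 6 x 9 inch page size is purely an assumption for illustration, since the book's dimensions are not given.

```python
# Rough estimate of uncompressed 1-bit scan data versus the final PDF size.
# ASSUMPTION: a 6 x 9 inch page; the post does not state the page size.

DPI = 600                          # from the post
PAGES = 174                        # from the post
PDF_MB = 5.69                      # from the post
PAGE_W_IN, PAGE_H_IN = 6.0, 9.0    # hypothetical page dimensions


def raw_bitonal_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Bytes of 1-bit pixel data for one page, rows padded to whole bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = (width_px + 7) // 8
    return bytes_per_row * height_px


per_page = raw_bitonal_bytes(PAGE_W_IN, PAGE_H_IN, DPI)
total_mb = per_page * PAGES / 1_000_000
ratio = total_mb / PDF_MB

print(f"raw data per page: {per_page / 1_000_000:.2f} MB")
print(f"raw data for the book: {total_mb:.1f} MB")
print(f"overall compression ratio: about {ratio:.0f}:1")
```

With a different page size the totals scale in proportion, so treat the ratio as an order-of-magnitude figure only.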

I'm looking for tips on the process. Is there something I could do better? Is it OK to use "Reduce File Size" or "PDF Optimizer"? I heard it degrades the quality. I'll appreciate any comments. Thx.



--------------------------------------------------------------------------------
Posted: 21 Apr 2005 01:31 AM

Duff_Johnson

Total Messages 58

Subject: RE: Tips for converting a book to PDF?

--------------------------------------------------------------------------------

> - Performed OCR in Acrobat. Output: Searchable Image (Exact).
> (It's important to keep the original look of the pages intact).

Don't go with "Exact" - there's no need (unless you can tell the
difference, which I bet you cannot), and you are "crippling" the JBIG2
compression by doing so.

> To give an example, the first book has 174 pages of black
> text, and the resulting PDF is 5.69MB (Compared to 2.74MB in DjVu).

That's like comparing an 18-wheeler with a wheelbarrow...

> I'm looking for tips on the process. Is there something I
> could do better? Is it OK to use "Reduce File Size" or "PDF
> Optimizer"? I heard it degrades the quality.

All other things being equal (and it sounds like you did a good job with
the imaging part), the only way to get smaller sizes would be to go with
"lossy" Searchable Image.

You already have a size of 32k per page, which isn't bad. You can
likely do significantly better with conventional Searchable Image output
(not "exact").
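The per-page figure can be checked with a few lines of arithmetic. This is only a sketch using the sizes quoted in the thread; the helper name `kb_per_page` is made up for the example, and since "MB" is ambiguous both the decimal and binary readings are shown.

```python
# Average size per page for the 174-page book, from the sizes quoted in the thread.

PAGES = 174
PDF_MB = 5.69    # Acrobat output, Searchable Image (Exact)
DJVU_MB = 2.74   # DjVu version of the same book


def kb_per_page(size_mb: float, pages: int, kb_per_mb: int = 1000) -> float:
    """Average kilobytes per page; pass kb_per_mb=1024 for binary units."""
    return size_mb * kb_per_mb / pages


print(f"PDF:  {kb_per_page(PDF_MB, PAGES):.1f} KB/page (decimal)")
print(f"PDF:  {kb_per_page(PDF_MB, PAGES, 1024):.1f} KB/page (binary)")
print(f"DjVu: {kb_per_page(DJVU_MB, PAGES):.1f} KB/page (decimal)")
```

Either reading lands in the low 30s of KB per page for the PDF, which matches the figure above.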

Another thing to note... if there are any graphics at all in the
document, JBIG2 gets a lot less efficient with those images...
especially if they are photographs or similar. Consider replacing the
black-and-white image of the graphic with a JPEG image. Do NOT simply
"mask" the image... delete it from the black-and-white original first.

Duff Johnson
Document Solutions, Inc.
http://www.document-solutions.com




--------------------------------------------------------------------------------
Posted: 21 Apr 2005 02:09 AM

LeonardR

Total Messages 7524

Subject: Re: Tips for converting a book to PDF?

--------------------------------------------------------------------------------

At 09:34 PM 4/20/2005, p-pdf-paper-to-pdf Listmanager wrote:
>To give an example, the first book has 174 pages of black text, and the
>resulting PDF is 5.69MB (Compared to 2.74MB in DjVu).
>
>I'm looking for tips on the process. Is there something I could do better?
>Is it OK to use "Reduce File Size" or "PDF Optimizer"? I heard it degrades
>the quality. I'll appreciate any comments. Thx.
>

Do NOT use "Reduce File Size".

DO use PDF Optimizer (or 3rd party tools such as PDF Enhancer or
pdfCompressor).


Leonard

---------------------------------------------------------------------------
Leonard Rosenthol
Chief Technical Officer
PDF Sages, Inc. 215-938-7080 (voice)
215-938-0880 (fax)





--------------------------------------------------------------------------------
Posted: 21 Apr 2005 02:09 AM


Jim

Total Messages 34

Subject: RE: Tips for converting a book to PDF?

--------------------------------------------------------------------------------

>Scanned the pages at 600dpi monochrome (300 is too choppy for my eyes), saved as 1-bit uncompressed TIFF.

This is good. 600 dpi gives you the best character definition.

>Ran the files through a deskew and cleaning program (GTX ScanCleanOffice).

This is good. A deskewed page OCRs better and looks better.

>Additionally cleaned each page manually.

This is good. There can be artifacts in margin areas that cannot be cleaned automatically - that can only be done manually. Manual cleaning improves appearance and reduces file size.

>Created the PDF in Acrobat through Create PDF -> From Multiple Files. (JBIG2 compression).

This is good. Be sure to use JBIG2-lossy compression. Lossy is OK - one company refers to "lossy" as "perceptually lossless".

>Performed OCR in Acrobat. Output: Searchable Image (Exact). (It's important to keep the original look of the pages intact).

This is acceptable; however, be aware that other OCR programs like ABBYY will produce searchable text that is 40% more accurate than that produced by Acrobat.

>Added links, bookmarks, metadata.

Right - this finishes off the book presentation.

>Final saving using "Save As".

Right - this removes any redundant or unused data in the file for optimal size.

>To give an example, the first book has 174 pages of black text, and the resulting PDF is 5.69MB (Compared to 2.74MB in DjVu).

If you use JBIG2-lossy, your PDF file will probably be 3.x MB, or about 20% larger than the DjVu equivalent (which uses a variant compression method called JB2).
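As a quick sanity check on that estimate, here is the arithmetic as a small sketch. The 20% overhead is the rule of thumb stated above, not a measured value, and the variable names are just for illustration.

```python
# Projected JBIG2-lossy PDF size, using the "about 20% larger than DjVu" rule of thumb.

DJVU_MB = 2.74          # DjVu size quoted in the thread
CURRENT_PDF_MB = 5.69   # current PDF size quoted in the thread
OVERHEAD = 0.20         # rule-of-thumb overhead relative to DjVu (an estimate)

estimated_pdf_mb = DJVU_MB * (1 + OVERHEAD)
saving = 1 - estimated_pdf_mb / CURRENT_PDF_MB

print(f"estimated JBIG2-lossy PDF: {estimated_pdf_mb:.2f} MB")
print(f"reduction versus the current PDF: {saving:.0%}")
```

That comes out at roughly 3.3MB, in line with the "3.x MB" figure.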

>I'm looking for tips on the process. Is there something I could do better? Is it OK to use "Reduce File Size" or "PDF Optimizer"? I heard it degrades the quality. I'll appreciate any comments. Thx.

These methods will only reduce the DPI resolution, so don't use them. Keep your 600 dpi images.

You can make the file a little smaller by using a third-party JBIG2 compression engine, such as JRAPublish or cVision, that is more efficient than the one in Acrobat.

If you use a third-party OCR engine, then your filesize will increase slightly because there is more recognized text in the file.

Consider setting the opening view of the book to "2-page viewing". Make sure that you have scanned the blank pages of the book. Then your pages will display in correct recto-verso order throughout the book.
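To illustrate why the blank pages matter, here is a minimal sketch of how a facing-pages view pairs pages. It assumes the viewer shows the first page alone on the right, as in a printed book; the function name `spreads` and the page labels are made up for the example.

```python
# Pair pages the way a two-page (facing pages) view would.
# ASSUMPTION: the first page is shown alone on the right-hand side.

def spreads(page_labels):
    """Return (left, right) pairs; None marks an empty side of a spread."""
    if not page_labels:
        return []
    result = [(None, page_labels[0])]
    rest = page_labels[1:]
    for i in range(0, len(rest), 2):
        left = rest[i]
        right = rest[i + 1] if i + 1 < len(rest) else None
        result.append((left, right))
    return result


complete = ["title", "blank", "1", "2", "3", "4"]
missing_blank = ["title", "1", "2", "3", "4"]

print(spreads(complete))       # page "1" lands on the right (recto)
print(spreads(missing_blank))  # page "1" lands on the left, and every later page flips sides
```

Dropping a single blank page shifts every following page to the wrong side, which is why the blanks need to be scanned too.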

It is good that you used Searchable Image - Exact, for the closest fidelity to the original.

If you publish on the web, make sure that Fast Web View is enabled (it enables by default when you do Save As). If you publish that DjVu file on the web also, store it in INDIRECT format on your server for faster web display.

Jim Rile



--------------------------------------------------------------------------------
Posted: 22 Apr 2005 11:57 PM

isaacbh

Total Messages 3

Subject: RE: Tips for converting a book to PDF?

--------------------------------------------------------------------------------

Thank you all for the help. You have given me some good pointers.

I've tried "Searchable Image (Compact)", but I don't like the results. It's significantly worse. I'll stick with "Exact".

I'll give JBIG2-Lossy a try.

Jim, you've mentioned Abbyy. I thought you can't use a third-party program for OCR with Acrobat. Can you? Or do you mean using Abbyy to create the PDF? I don't think it has JBIG2 compression. The ideal scenario is to use Abbyy for the OCR only, fix errors, merge the text in Acrobat with the images and create the PDF. Right now I'm stuck with the OCR of Acrobat, plus I can't fix errors (you only can in "Formatted Text" output according to the Help).

-- Isaac


--------------------------------------------------------------------------------
Posted: 26 Apr 2005 08:15 PM

Duff_Johnson

Total Messages 58

Subject: RE: Tips for converting a book to PDF?

--------------------------------------------------------------------------------

> I'll give JBIG2-Lossy a try.

Excellent.

> Jim, you've mentioned Abbyy. I thought you can't use a
> third-party program for OCR with Acrobat. Can you? Or do you
> mean using Abbyy to create the PDF?

He meant using ABBYY to create the PDF, yes.

> I don't think it has
> JBIG2 compression.

True, but Acrobat does. Make the PDF with FineReader, then open it in
Acrobat to finish your work. Acrobat's Optimizer will give you the
JBIG2 option.

> The ideal scenario is to use Abbyy for the
> OCR only, fix errors, merge the text in Acrobat with the
> images and create the PDF.

Well... simply have FineReader make your PDF for you, then finish it in
Acrobat....

> Right now I'm stuck with the OCR
> of Acrobat, plus I can't fix errors (you only can in
> "Formatted Text" output according to the Help).

Yes... you need FineReader, Capture, OmniPage, or any one of several
OCR-to-PDF packages that include a text-correction module.

Duff Johnson
Document Solutions, Inc.
http://www.document-solutions.com



--------------------------------------------------------------------------------
Posted: 26 Apr 2005 10:22 PM

isaacbh

Total Messages 3

Subject: RE: Tips for converting a book to PDF?

--------------------------------------------------------------------------------

>True, but Acrobat does. Make
>the PDF with FineReader, then
>open it in
>Acrobat to finish your work.

I've looked at the PDF Format settings of FineReader. There's a "JPEG Quality" setting there. Does that mean it saves the images in JPEG? What I'm really asking is how FineReader saves the page images. Obviously I don't want to work in Acrobat on a PDF that already went through a lossy process.

>Acrobat's Optimizer will give
>you the
>JBIG2 option.

So I need to use PDF Optimizer to get JBIG2 on an existing PDF? I guess I'll need to turn off downsampling there to keep the original DPI. That's the only thing I see there that can affect quality. The rest just removes redundancies.

-- Isaac


--------------------------------------------------------------------------------
Posted: 26 Apr 2005 11:00 PM

Duff_Johnson

Total Messages 58

Subject: RE: Tips for converting a book to PDF?

--------------------------------------------------------------------------------

> I've looked at the PDF Format settings of FineReader. There's
> a "JPEG Quality" setting there. Does that mean it saves the
> images in JPEG? What I'm really asking is how FineReader
> saves the page images. Obviously I don't want to work in
> Acrobat on a PDF that already went through a lossy process.

I would not be recommending it to you if it wasn't going to work.

If you are feeding (as I suspect) FineReader with bitonal (black and
white) G4 compressed (lossless) TIFFs, then FineReader will give you PDF
files with the same lossless images "inside". OK, you can likely mess
with FineReader's settings to get it NOT to do this, but I've described
the default behavior.

FineReader CAN handle color images, and the JPEG settings would apply
there.
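One crude way to see for yourself how the page images were stored is to count the stream filter names in the PDF file. The filter names below are the standard PDF ones; the scan itself is a heuristic sketch, not a real PDF parser. It can be fooled by encrypted or unusual files, and FlateDecode is also used for non-image streams. The function name `count_filters` is made up for the example, and the sample bytes stand in for a real file.

```python
import re
from collections import Counter

# Standard PDF stream filters and what they suggest about the image data.
FILTERS = {
    b"CCITTFaxDecode": "CCITT Group 3/4 fax (lossless, bitonal)",
    b"JBIG2Decode": "JBIG2 (bitonal, lossless or lossy)",
    b"DCTDecode": "JPEG (lossy)",
    b"JPXDecode": "JPEG 2000",
    b"FlateDecode": "Flate/zip (lossless, also used for non-image streams)",
}


def count_filters(pdf_bytes: bytes) -> Counter:
    """Count occurrences of each filter name in the raw bytes of a PDF."""
    counts = Counter()
    for name in FILTERS:
        # Match "/Name" only when it is not followed by another letter or digit.
        pattern = rb"/" + name + rb"(?![A-Za-z0-9])"
        counts[name.decode()] = len(re.findall(pattern, pdf_bytes))
    return counts


# Stand-in for open("book.pdf", "rb").read(); "book.pdf" would be your own file.
sample = (
    b"<< /Type /XObject /Subtype /Image /Filter /CCITTFaxDecode >>\n"
    b"<< /Type /XObject /Subtype /Image /Filter /CCITTFaxDecode >>\n"
    b"<< /Type /XObject /Subtype /Image /Filter /DCTDecode >>\n"
)

for name, n in count_filters(sample).items():
    if n:
        print(f"{name}: {n}  ({FILTERS[name.encode()]})")
```

If DCTDecode shows up for pages that should be bitonal, the images went through JPEG somewhere along the way.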

> >Acrobat's Optimizer will give
> >you the
> >JBIG2 option.
>
> So I need to use PDF Optimizer to get JBIG2 on an existing
> PDF?

Correct.

> I guess I'll need to turn off downsampling there to keep
> the original DPI. That's the only thing I see there that can
> affect quality. The rest just removes redundancies.

Correct. Note that JBIG2 compression will mean that users with
Acrobat/Reader versions earlier than 5.x will NOT be able to open your
files.
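The compatibility point follows from the file format: the JBIG2Decode filter was introduced in PDF 1.4, the version that shipped with Acrobat 5. As a minimal sketch, the version a file declares can be read from its header. Note the assumptions: a document can also declare a later version in its catalog, which this does not check, and the header alone does not prove JBIG2 is actually used. The function name `pdf_header_version` is made up for the example.

```python
import re


def pdf_header_version(pdf_bytes: bytes):
    """Return (major, minor) from the %PDF-x.y header, or None if not found."""
    # Viewers tolerate some junk before the header, so search the first 1024 bytes.
    match = re.search(rb"%PDF-(\d)\.(\d)", pdf_bytes[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Stand-in for the first bytes of a real file.
header = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"
version = pdf_header_version(header)

if version is not None and version >= (1, 4):
    print(f"PDF {version[0]}.{version[1]}: may use JBIG2, needs Acrobat/Reader 5 or later")
else:
    print("Declared version predates JBIG2 support, or no header was found")
```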

Duff Johnson
Document Solutions, Inc.
http://www.document-solutions.com



--------------------------------------------------------------------------------
Posted: 26 Apr 2005 11:57 PM

Dickie

Total Messages 4

Subject: RE: Tips for converting a book to PDF?

--------------------------------------------------------------------------------

Hi

Please excuse my ignorance as a newbie. I am also trying to use a home setup to turn an old book into a searchable PDF that still looks like the original pages, and am looking for some help.

I have scanned several test pages (anywhere between 300 - 600 dpi) on a trusty HP5370C and saved as both .jpg and .tif files. I then opened them with Acrobat Pro 7, ran the OCR but found it very inflexible and difficult to use. It failed to even let me amend most inaccuracies and then words started overwriting each other.

My second attempt was to use Omnipage Pro 14, by importing the 600dpi .tif, running the OCR (ran like a dream!) and then exporting the result as 'Save to file', choosing the 'PDF with image on text' option. The results were horrible!! The quality of the image had degraded quite considerably, to the point that it was difficult to read when the PDF was opened with Acrobat.

Would be grateful for any help, but would prefer to find out how to improve output image quality from Omnipage. My very best wishes, Dickie


--------------------------------------------------------------------------------
Posted: 08 Jun 2005 08:33 AM



http://forum.planetpdf.com/wb/default.asp?action=9&read=48454&fid=18
All times are GMT + 5.5 Hours