Re: [NTLK] PDFNewt

From: Vladimir Alexiev (Vladimir_at_worklogic.com)
Date: Tue Apr 09 2002 - 15:02:49 EDT


Having a PDF-to-book converter would be very cool. It'd also be fairly hard
to implement. Consider:
- PDFs are hard-formatted for a certain page size.
-- Newt books are also hard-formatted but for a different size.
-- E.g. a Newt book has to split a paragraph at the end of a page and restart
it at the beginning of the next page. (Get NewtsCape and play with it; this
is one of the Appearance options.)
-- So you'll need to collect the text and relayout it.
- PDF has less "structural" information than HTML.
-- E.g. unless it's listed as a "bookmark", you can't tell that a certain line
is the title of a page.
-- There's not even a guarantee that visually close (and semantically
related) texts will appear close to each other in the file stream.
-- The reason is that PDF is based on compressed PostScript, which is a page
layout and specialized programming language, not a document representation
language.
-- That's also the reason why PDF-to-HTML converters don't work very well.
- The PKG format isn't well documented (IMHO). E.g. I'm trying to do
MP3->PKG directly by diffing/patching some PKGs made with Padilla's
MP3Builder, but with little success.
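
To illustrate the point about text not being stored in reading order, here
is a rough Python sketch of how extracted fragments could be put back into
visual order. It assumes some extractor has already produced (x, y, text)
tuples in PDF coordinates (y grows upward); that tuple shape, the function
name and the tolerance value are all made up for illustration, and it only
handles a single-column page.

```python
from typing import List, Tuple

# Hypothetical fragment shape: (x, y, text), with y growing upward
# as in PDF user space. A real extractor may report something different.
Fragment = Tuple[float, float, str]


def reading_order(fragments: List[Fragment],
                  line_tolerance: float = 2.0) -> List[str]:
    """Group fragments into visual lines and return them top to bottom."""
    # Top of the page first (largest y), then left to right.
    ordered = sorted(fragments, key=lambda f: (-f[1], f[0]))

    lines: List[List[Fragment]] = []
    for frag in ordered:
        # Treat a fragment as part of the current line when its baseline
        # is within line_tolerance of that line's first fragment.
        if lines and abs(lines[-1][0][1] - frag[1]) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])

    # Within each line, order fragments by x and join them with spaces.
    return [" ".join(text for _, _, text in sorted(line, key=lambda f: f[0]))
            for line in lines]


if __name__ == "__main__":
    sample = [(200.0, 700.0, "world"),
              (72.0, 680.0, "second line"),
              (72.0, 700.5, "Hello")]
    for line in reading_order(sample):
        print(line)
```

Multi-column pages, rotated text and tables would need more than this, which
is part of why the conversion is hard.
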
Because of these difficulties I think you should first work on a
PDF-to-bookmaker converter. If you succeed in this, you can then work on
skipping the bookmaker and NTK steps.
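
As a starting point for the "collect the text and relayout it" step, here is
a small Python sketch. It assumes the text has already been pulled out of the
PDF as a list of pages, each a list of paragraph strings. The function names,
the punctuation heuristic for spotting a paragraph that continues across a
page break, and the page dimensions are my own assumptions, not anything
taken from the PDF or Newton book formats.

```python
import textwrap
from typing import List


def merge_split_paragraphs(pages: List[List[str]]) -> List[str]:
    """Flatten pages into paragraphs, rejoining ones cut by a page break."""
    paragraphs: List[str] = []
    for page in pages:
        for i, para in enumerate(page):
            # Heuristic (an assumption): if the previous page's last
            # paragraph has no closing punctuation, the first paragraph
            # of this page is treated as its continuation.
            continues = (
                i == 0
                and paragraphs
                and not paragraphs[-1].rstrip().endswith((".", "!", "?", ":"))
            )
            if continues:
                paragraphs[-1] = paragraphs[-1].rstrip() + " " + para.lstrip()
            else:
                paragraphs.append(para)
    return paragraphs


def repaginate(paragraphs: List[str], width: int = 40,
               lines_per_page: int = 20) -> List[List[str]]:
    """Re-wrap paragraphs to a new line width and cut them into pages."""
    lines: List[str] = []
    for para in paragraphs:
        lines.extend(textwrap.wrap(para, width=width))
        lines.append("")  # blank line between paragraphs
    if lines:
        lines.pop()  # drop the trailing blank line
    return [lines[i:i + lines_per_page]
            for i in range(0, len(lines), lines_per_page)]


if __name__ == "__main__":
    pdf_pages = [["Intro.", "This sentence is split"],
                 ["across two pages.", "Last one."]]
    book = repaginate(merge_split_paragraphs(pdf_pages),
                      width=20, lines_per_page=3)
    for number, page in enumerate(book, start=1):
        print("--- page", number, "---")
        print("\n".join(page))
```

The output of something like this could then be written out in whatever
input format the bookmaker step expects.
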

> The problem of the native PDF format is its big size. The
> NewtonBook format economizes space

If you don't include fonts in the PDF, this is probably false.

> and the support is native (no more tools in memory, cool speed).

Agreed, the best way is with a PDF->PKG converter.

I may be available to help you some with your endeavor. I'm good with Perl,
which can be useful at least for prototyping.



