Re: [NTLK] Newton Press and Project Gutenberg

From: Peter H. Coffin (hellsop_at_ninehells.com)
Date: Sat Apr 03 2004 - 13:00:51 PST


On Sat, Apr 03, 2004 at 10:47:25AM +0200, DJ Vollkasko wrote:
> >> I'm sure some people here have turned
> >> Project Gutenberg etexts into newtonbooks
> >> before - is there an easy way to clean up
> >> the line breaks without going through the
> >> entire file in a text editor? Anything that
> >> runs on linux or mac system 7.5 will work
> >> for me
> >> :-)
> >
> >I'd clean up the text a bit first, then find
> >out how many 'n's fit on one newt screen
> >line. Then give that to par(1). That will
> >preserve any fun stuff that transcriber did
> >with the left margin (indents, etc).
>
> Yes, but the Gutenberg etexts usually have hard
> linebreaks at the end of each line. If I rewrap
> the line, the zigzagging stays, doesn't it?

Plain text emails also have hard line breaks at the end of each line.
par(1) handles that for you. Just for giggles, I'll wrap the quotes
above at 50 characters instead of my normal 72. You end up with a ragged
right by default, but that's not a huge problem, since one of the first
changes is that

> That's why I'm so slow in making books - I put a lot of handcraft
> into those babies: Saving the text as .rtf, manually removing the
> linebreaks and fixing any obvious typos/scanpos/OCRpos ;=} and then
> loading it to Press. After I created a package for *every* Newton
> format there is, I usually read it myself and fix typos (sometimes
> only a while after release, sometimes never, e.g. when my only Newt
> is dead and has the corrections for the Guide to the SI and Books of
> Scoundrels on it... If I lose these, they's gone...). Any way to
> speed up the line-break removing would be appreciated.

It'd be possible where the etext has some indication of which line
breaks are necessary and which aren't. Not all that I've seen do.

For example, most of my email compositions have blank lines between
paragraphs. That's a detectable way of knowing that the line break just
before a blank line is important. Similarly, a tab character or a fixed
number of spaces can also indicate that the prior linebreak should stay.
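
As a rough illustration of those two cues (a sketch only, not what par(1)
does internally), something like the following Python would join
hard-wrapped lines while keeping the breaks that look intentional. The
function name unwrap_paragraphs and the "tab or two-plus spaces" indent
threshold are my own assumptions; tune the pattern to whatever a given
etext actually uses.

```python
import re

# Assumption: a leading tab or at least two spaces marks a deliberate indent.
INDENT = re.compile(r"^(\t| {2,})")

def unwrap_paragraphs(text, indent=INDENT):
    """Return a list of paragraphs with soft line breaks joined by spaces.

    A line break is treated as significant when it comes just before a
    blank line, or just before a line that starts with an indent.
    """
    paragraphs = []
    current = []
    for line in text.splitlines():
        if not line.strip():
            # Blank line: the break before it ends the paragraph.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        if indent.match(line) and current:
            # Indented line: the previous line break should stay.
            paragraphs.append(" ".join(current))
            current = []
        # Keep the leading indent on the first line of a paragraph only.
        current.append(line.rstrip() if not current else line.strip())
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

Each returned string is one unwrapped paragraph, so it can be written out
with a blank line between paragraphs or handed to a rewrapper at whatever
width suits the Newton screen. Note that text where every line is indented
(verse, for instance) comes back one line per entry, which is usually what
you want there.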

-- 
For every subject you can think of there are at least 3 web sites.
The owners of these web sites know each other and at least one of
them hates at least one of the others.
                -- mnlooney's view of Skif's Internet Theorem
-- 
This is the NewtonTalk list - http://www.newtontalk.net/ for all inquiries
Official Newton FAQ: http://www.chuma.org/newton/faq/
WikiWikiNewt for all kinds of articles: http://tools.unna.org/wikiwikinewt/

