Re: [NTLK] Newton Press and Project Gutenberg

From: DJ Vollkasko (DJ_Vollkasko_at_gmx.net)
Date: Sat Apr 03 2004 - 00:47:25 PST


>> I'm sure some people here have turned Project Gutenberg etexts into
>> newtonbooks before - is there an easy way to clean up the line breaks
>> without going through the entire file in a text editor? Anything that
>> runs on linux or mac system 7.5 will work for me :-)
>
>I'd clean up the text a bit first, then find out how many 'n's fit on
>one newt screen line. Then give that to par(1). That will preserve any
>fun stuff that transcriber did with the left margin (indents, etc).

Yes, but the Gutenberg etexts usually have hard linebreaks at the end of
each line. If I rewrap the line, the zigzagging stays, doesn't it?

That's why I'm so slow in making books - I put a lot of handcraft into
those babies: Saving the text as .rtf, manually removing the linebreaks and
fixing any obvious typos/scanpos/OCRpos ;=} and then loading it to Press.
After I created a package for *every* Newton format there is, I usually
read it myself and fix typos (sometimes only a while after release,
sometimes never, e.g. when my only Newt is dead and has the corrections for
the Guide to the SI and Books of Scoundrels on it... If I lose these,
they're gone...). Any way to speed up the line-break removal would be
appreciated.
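
A minimal sketch of how the line-break removal could be automated, assuming
the etext separates paragraphs with blank lines and that Python is available
on the machine doing the cleanup. The function name unwrap is made up for
illustration; it is not part of Newton Press or any Gutenberg tool.

```python
import re


def unwrap(text: str) -> str:
    """Join hard-wrapped lines so each paragraph becomes one long line."""
    # Normalise line endings first (etexts often arrive with CRLF).
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Inside a paragraph, collapse every run of whitespace (including the
    # hard line breaks) into a single space.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)
```

Reading the etext into a string, passing it through unwrap() and saving the
result gives a file with one line per paragraph, which can then be rewrapped
without the zigzag. Note that this also flattens deliberate indentation and
verse, so poetry or tables would still need a manual pass.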

DJV.

P.S.: Rhonda, there is one workaround: Don't use etexts with linebreaks.
Oh yeah, this really sounds like quite the moronic idiocyism ;=} , but to
my great joy I've found Gutenberg now has texts also as html pages. And
some etext sites offer texts as html (or rtf), too. These are nice to
process - cut & paste to .rtf, then load into Press using "File/Add" (or
paste into Press directly, if you don't want to proofread in a text editor).

-- 
This is the NewtonTalk list - http://www.newtontalk.net/ for all inquiries
Official Newton FAQ: http://www.chuma.org/newton/faq/
WikiWikiNewt for all kinds of articles: http://tools.unna.org/wikiwikinewt/

