Re: [NTLK] Newton Book file sizes

From: Will Hartung (willh_at_msoft.com)
Date: Fri Apr 19 2002 - 13:44:00 EDT


From: "Paul Guyot" <pguyot_at_kallisys.net>
>
> Exactly. This is called Unicode UCS-2.
>
> >As soon as the book is sent to the Newton, the extra chars are
> >removed and your file size becomes half the size of the PKG when on your
> >hard drive.
>
> Well, not exactly. The package is compressed on the Newton. However,
> it seems that even if compression can reach 50% for text, it is not
> simply skipping every other byte. The text is still encoded in UCS-2,
> fortunately for every Newton user who doesn't think in 7 or 8 bits :)

It would be nice if they used the stream format for Unicode (UTF-8), which
leaves off the extra byte for 7-bit characters (i.e. ASCII) and encodes
everything else in two or more bytes as needed.

For most English text this can be a pretty dramatic savings in space. I
don't know about more exotic languages, however, since for some characters
the encoding needs a third byte, which is more than UCS-2 uses.
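A quick way to see the trade-off is to encode the same text both ways and
count the bytes. This is a minimal Python sketch, assuming the standard
"utf-16-be" codec as a stand-in for UCS-2 (the two agree for every character
in the Basic Multilingual Plane, which is all UCS-2 can represent). The
function name and sample strings are made up for illustration.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UCS-2 byte count, UTF-8 byte count) for text.

    UCS-2 is approximated with UTF-16 big-endian, which matches UCS-2
    for any character in the Basic Multilingual Plane (U+0000..U+FFFF).
    """
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError("UCS-2 cannot represent characters above U+FFFF")
    ucs2 = len(text.encode("utf-16-be"))  # always 2 bytes per BMP character
    utf8 = len(text.encode("utf-8"))      # 1, 2 or 3 bytes per BMP character
    return ucs2, utf8


if __name__ == "__main__":
    samples = {
        "ASCII": "Newton",          # 1 byte per character in UTF-8
        "Latin accents": "café",    # the accented e takes 2 bytes in UTF-8
        "Japanese": "日本語",        # 3 bytes per character in UTF-8
    }
    for label, text in samples.items():
        ucs2, utf8 = encoded_sizes(text)
        print(f"{label:14} UCS-2: {ucs2:2} bytes  UTF-8: {utf8:2} bytes")
```

For "Newton" this gives 12 bytes in UCS-2 against 6 in UTF-8, while the
three Japanese characters take 6 bytes in UCS-2 but 9 in UTF-8: the
third-byte case.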

Now whether all of this effort actually results in a dramatic enough savings
to matter after compression, well, who knows?
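One way to get a feel for it is to measure. The Python sketch below
compresses the same text in both encodings with the standard zlib module.
zlib is only an assumption standing in for whatever compressor the Newton
package format really uses, so the numbers it prints depend on the input and
say nothing definitive about actual Newton book sizes. The function name is
hypothetical.

```python
import zlib


def compressed_sizes(text: str, level: int = 9) -> dict[str, int]:
    """Compare raw and zlib-compressed sizes of text as UCS-2 and UTF-8.

    zlib (DEFLATE) is a stand-in here; it is not the Newton's own
    package compression scheme.
    """
    ucs2 = text.encode("utf-16-be")  # matches UCS-2 for BMP-only text
    utf8 = text.encode("utf-8")
    return {
        "ucs2_raw": len(ucs2),
        "utf8_raw": len(utf8),
        "ucs2_zlib": len(zlib.compress(ucs2, level)),
        "utf8_zlib": len(zlib.compress(utf8, level)),
    }


if __name__ == "__main__":
    # Arbitrary English sample, repeated so the compressor has something to find.
    sample = "It was a dark and stormy night; the rain fell in torrents. " * 50
    for name, size in compressed_sizes(sample).items():
        print(f"{name:10} {size:6} bytes")
```

Running it on real book text, rather than this repetitive sample, would give
a fairer idea of whether the UTF-8 advantage survives compression.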

Will Hartung
(willh_at_msoft.com)

-- 
This is the NewtonTalk mailing list - http://www.newtontalk.net



This archive was generated by hypermail 2.1.2 : Sun May 05 2002 - 14:04:31 EDT