On Aug 23, 2006, at 4:48 PM, Frank Gruendel wrote:
>> Is 512 bytes 512 characters?
>
> Nope. The Newton's using the Unicode format, which is 2 bytes per
> character.
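Just to be pedantic, that's not always strictly true. "Unicode"
basically just defines a set of characters and assigns each one a
numeric ID (a "code point"). It's how you encode those IDs that
determines the byte length per character.
UTF-16 uses 2 bytes for most characters, but 4 bytes (a "surrogate
pair") for anything beyond the first 65,536 code points. The strictly
2-bytes-per-character form, UCS-2, is what I _think_ the Newton uses.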
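To make the byte counts concrete, here's a small Python sketch (Python
is chosen purely for illustration; nothing here is Newton code). The
sample characters are arbitrary picks, and the "utf-16-le" codec is
used so Python doesn't prepend a byte-order mark. Characters in the
first 65,536 code points come out at 2 bytes each, so 512 bytes hold
256 of them; U+1F600 needs a 4-byte surrogate pair.

```python
# Sketch: how many bytes a character occupies in UTF-16.
# "utf-16-le" is used so Python does not prepend a 2-byte byte-order mark.

def utf16_len(ch: str) -> int:
    """Return the number of bytes `ch` takes when encoded as UTF-16."""
    return len(ch.encode("utf-16-le"))

# Arbitrary sample characters, chosen only for illustration.
samples = ["A", "\u00e9", "\u2019", "\U0001F600"]

for ch in samples:
    print(f"U+{ord(ch):04X}: {utf16_len(ch)} bytes")

# U+1F600 lies outside the first 65,536 code points, so UTF-16 stores it
# as a surrogate pair: two 16-bit units, D83D followed by DE00.
print("\U0001F600".encode("utf-16-be").hex())

# With 2 bytes per character, 256 characters fill exactly 512 bytes.
print(len(("x" * 256).encode("utf-16-le")))
```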
UTF-8, on the other hand, can use anywhere from 1 to 4 bytes per
character, even in the same string. (Ever seen three weird symbols on
a web page where there should have been a single apostrophe? That's
UTF-8 being misinterpreted as some other encoding!)
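The same kind of Python sketch for UTF-8 shows the variable width,
plus the apostrophe effect. The "other encoding" here is assumed to
be Windows-1252 (Python's "cp1252" codec), a common culprit on web
pages; other legacy encodings garble the bytes differently.

```python
# Sketch: UTF-8 byte lengths vary from character to character.

def utf8_len(ch: str) -> int:
    """Return the number of bytes `ch` takes when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

text = "A\u00e9\u2019\U0001F600"  # 'A', e-acute, curly apostrophe, emoji
for ch in text:
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s)")

# Mojibake: take the UTF-8 bytes of a curly apostrophe (E2 80 99) and
# wrongly decode them as Windows-1252 (the assumed "other encoding").
apostrophe = "\u2019"
garbled = apostrophe.encode("utf-8").decode("cp1252")

# Three characters where one was intended: U+00E2 U+20AC U+2122.
print(" ".join(f"U+{ord(c):04X}" for c in garbled))
```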
There is a fantastic article about all the history and madness of
string encoding here, for anyone who wants to learn more about this:
<http://www.joelonsoftware.com/articles/Unicode.html>
Computers: 60 years later and we still can't even get text figured
out. :)
Steven
http://panic.com/
http://stevenf.com/
-- 
This is the NewtonTalk list - http://www.newtontalk.net/ for all inquiries
Official Newton FAQ: http://www.chuma.org/newton/faq/
WikiWikiNewt for all kinds of articles: http://tools.unna.org/wikiwikinewt/
Received on Thu Aug 24 14:09:21 2006