Re: [NTLK] [ANN] OPML support for IC/VC

From: Steven Frank <stevenf_at_panic.com>
Date: Thu Aug 24 2006 - 14:06:55 EDT

On Aug 23, 2006, at 4:48 PM, Frank Gruendel wrote:

>> Is 512 bytes 512 characters?
>
> Nope. The Newton's using the Unicode format, which is 2 bytes per
> character.

Just to be pedantic, that's not always strictly true. "Unicode"
basically just defines a set of characters and assigns each one a
numeric ID (a "code point"). It's how you encode those IDs that
determines the byte length per character.

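UTF-16, if I recall, uses 2 bytes for most characters, but anything
outside the Basic Multilingual Plane takes 4 (a "surrogate pair").
I _think_ the Newton uses a 2-byte-per-character encoding along
those lines.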
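
A rough sketch in Python of how the byte count works out, in case it
helps. This is nothing Newton-specific, and the function name
utf16_byte_length is made up for the example. It assumes UTF-16
without a byte-order mark:

```python
def utf16_byte_length(text: str) -> int:
    """Return how many bytes `text` occupies as UTF-16 (no byte-order mark)."""
    # "utf-16-be" is big-endian UTF-16 and does not prepend a BOM.
    return len(text.encode("utf-16-be"))


# Six ordinary characters, 2 bytes each.
print(utf16_byte_length("Newton"))

# One character outside the Basic Multilingual Plane: a 4-byte surrogate pair.
print(utf16_byte_length("\U0001F600"))

# At a flat 2 bytes per character, a 512-byte field holds this many characters.
print(512 // 2)
```

So under a flat 2-bytes-per-character assumption, 512 bytes works out
to 256 characters.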

UTF-8, on the other hand, can have characters anywhere from 1 to 4
bytes long, even in the same string. (Ever seen three weird symbols
on a web page where there should have been a single apostrophe?
That's a 3-byte UTF-8 curly apostrophe being misinterpreted as some
other, single-byte encoding!)
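
Here's a small Python sketch of both points, for the curious. The
helper names (utf8_lengths, misread_as_cp1252) are invented for
illustration, and Windows-1252 is just one plausible guess at the
"some other encoding" in the apostrophe case:

```python
def utf8_lengths(chars):
    """Return the UTF-8 byte length of each character in `chars`."""
    return [len(c.encode("utf-8")) for c in chars]


def misread_as_cp1252(text: str) -> str:
    """Encode `text` as UTF-8, then wrongly decode the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


# "A", e-acute, a curly apostrophe, and an emoji: 1, 2, 3 and 4 bytes.
samples = ["A", "\u00e9", "\u2019", "\U0001F600"]
print(utf8_lengths(samples))

# The curly apostrophe's three bytes show up as three separate symbols.
print(misread_as_cp1252("\u2019"))
```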

There is a fantastic article about all the history and madness of
string encoding here, for anyone who wants to learn more about this:

   <http://www.joelonsoftware.com/articles/Unicode.html>

Computers: 60 years later and we still can't even get text figured
out. :)

Steven
http://panic.com/
http://stevenf.com/

-- 
This is the NewtonTalk list - http://www.newtontalk.net/ for all inquiries
Official Newton FAQ: http://www.chuma.org/newton/faq/
WikiWikiNewt for all kinds of articles: http://tools.unna.org/wikiwikinewt/
Received on Thu Aug 24 14:09:21 2006
