Re: [NTLK] Newtontalk strangeness

From: Timothy Larkin (tsl1_at_cornell.edu)
Date: Wed Aug 03 2005 - 19:18:39 PDT


On Aug 3, 2005, at 8:38 PM, bill bennett wrote:

> I "think" (but can't remember for sure) this is
> because they come from Macintoshes with non-US
> character settings. Or could they be coming from
> Newts?

My brief investigation suggests the problem is not related to Macs.
It looks like some agent in the chain that takes a member's email and
distributes it to the list is decoding MIME incorrectly. See
<http://www.helpdesk.umd.edu/documents/0/315/rfc2045.txt>, which
reproduces RFC 2045, the document that defines the MIME
quoted-printable encoding. In particular, Rule 1 states that any
character (octet) may be represented by an equal sign followed by its
two-digit hexadecimal value. Thus a space can be represented as "=20".
Rule 3 states that a space coming at the end of a line MUST be
represented as "=20", to avoid the problems that arise when the mail
transport system either pads out or trims a line.
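
As a rough illustration of Rules 1 and 3 on the encoding side, here is
a minimal Python sketch. The function names qp_escape and
protect_trailing_whitespace are made up for this example and are not
taken from the RFC or from any mail software.

```python
def qp_escape(octet: int) -> str:
    """Rule 1: an octet may be written as '=' plus two uppercase hex digits."""
    if not 0 <= octet <= 255:
        raise ValueError("an octet must be in the range 0..255")
    return "={:02X}".format(octet)


def protect_trailing_whitespace(line: str) -> str:
    """Rule 3: a SPACE or TAB must not be the last character of an encoded line.

    Only the final character needs escaping; any whitespace before it is
    then followed by the printable '=' of the escape sequence.
    """
    if line and line[-1] in " \t":
        return line[:-1] + qp_escape(ord(line[-1]))
    return line


print(qp_escape(ord(" ")))                           # =20
print(protect_trailing_whitespace("download the "))  # download the=20
```

A decoder that follows the RFC turns the "=20" back into a space, so
the reader should never see it.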

The "=20", I think, represents a space at the end of a line created
by some text handling system that does a poor job of line breaking.
(At least that's the way I understand the RFC.) Since we all see this
"=20", it is probably not produced by our mail readers, but by some
agent higher in the chain. Since I see this only on mailing list
distributions, it may be a bug in some commonly used piece of mailing
list software.

As an example,

> Can anyone tell me where to find a live links to download the =20
> 2100=92s manual?
> For my Pismo, or to put on the Newton, or both; I don=92t mind.

The "=92" probably represents a curly single quote (octet 0x92 is the
right single quotation mark in Windows-1252) in a character set that
the MIME decoder does not recognize.
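
To see how the choice of character set changes the result, here is a
small Python sketch using the standard quopri module. That the sender
used Windows-1252 is an assumption on my part; the message itself does
not say which character set was in use.

```python
import quopri

# One of the quoted lines, still in its quoted-printable form.
encoded = b"For my Pismo, or to put on the Newton, or both; I don=92t mind."

# Quoted-printable decoding turns "=92" into the single octet 0x92.
raw = quopri.decodestring(encoded)

# Assumption: the text was written in Windows-1252, where 0x92 is U+2019.
as_cp1252 = raw.decode("cp1252")

# A decoder that only knows US-ASCII has no mapping for 0x92.
as_ascii = raw.decode("ascii", errors="replace")

print(as_cp1252)
print(as_ascii)
```

With the assumed character set the apostrophe comes back as a curly
quote; with plain ASCII the octet has no meaning, which may be why a
confused agent leaves the "=92" in the text.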

Tim Larkin

The relevant rules from RFC 2045 are these:

   (1) (General 8bit representation) Any octet, except a CR or
           LF that is part of a CRLF line break of the canonical
           (standard) form of the data being encoded, may be
           represented by an "=" followed by a two digit
           hexadecimal representation of the octet's value. The
           digits of the hexadecimal alphabet, for this purpose,
           are "0123456789ABCDEF". Uppercase letters must be
           used; lowercase letters are not allowed. Thus, for
           example, the decimal value 12 (US-ASCII form feed) can
           be represented by "=0C", and the decimal value 61 (US-
           ASCII EQUAL SIGN) can be represented by "=3D". This
           rule must be followed except when the following rules
           allow an alternative encoding.

   (3) (White Space) Octets with values of 9 and 32 MAY be
           represented as US-ASCII TAB (HT) and SPACE characters,
           respectively, but MUST NOT be so represented at the end
           of an encoded line. Any TAB (HT) or SPACE characters
           on an encoded line MUST thus be followed on that line
           by a printable character. In particular, an "=" at the
           end of an encoded line, indicating a soft line break
           (see rule #5) may follow one or more TAB (HT) or SPACE
           characters. It follows that an octet with decimal
           value 9 or 32 appearing at the end of an encoded line
           must be represented according to Rule #1. This rule is
           necessary because some MTAs (Message Transport Agents,
           programs which transport messages from one user to
           another, or perform a portion of such transfers) are
           known to pad lines of text with SPACEs, and others are
           known to remove "white space" characters from the end
           of a line. Therefore, when decoding a Quoted-Printable
           body, any trailing white space on a line must be
           deleted, as it will necessarily have been added by
           intermediate transport agents.
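
The decoding half of Rule 3 can be sketched the same way. This is only
an illustration: decode_qp_line is a hypothetical helper name, and the
padded sample line is invented for the example.

```python
import quopri


def decode_qp_line(line: bytes) -> bytes:
    """Decode one quoted-printable line, applying the decoding half of Rule 3."""
    # Literal trailing SPACE/TAB can only have been added in transit,
    # so it is removed before the "=XX" sequences are decoded.
    stripped = line.rstrip(b" \t")
    return quopri.decodestring(stripped)


# The sender's trailing space travels as "=20"; the three literal spaces
# after it stand in for padding added by a transport agent.
padded = b"download the=20   "
print(decode_qp_line(padded))  # b'download the '
```

The space the sender intended survives because it was sent as "=20",
while the padding added along the way is discarded.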

-- 
This is the NewtonTalk list - http://www.newtontalk.net/ for all inquiries
Official Newton FAQ: http://www.chuma.org/newton/faq/
WikiWikiNewt for all kinds of articles: http://tools.unna.org/wikiwikinewt/

