From: Timothy Larkin (tsl1_at_cornell.edu)
Date: Wed Aug 03 2005 - 19:18:39 PDT
On Aug 3, 2005, at 8:38 PM, bill bennett wrote:
> I "think" (but can't remember for sure) this is
> because they come from Macintoshes with non-US
> character settings. Or could they be coming from
> Newts?
My brief investigation of the matter suggests the problem is not
related to Macs, but occurs because of incorrect decoding of MIME by
some agent in the process that takes a member's email and distributes
it to the list. See <http://www.helpdesk.umd.edu/documents/0/315/rfc2045.txt>,
which reproduces RFC 2045, the specification that defines MIME's
quoted-printable encoding. In particular, Rule 1 states that any octet may
be represented by a sequence consisting of an equal sign and the
two-digit hexadecimal value of that octet. Thus a space can be
represented as "=20". Rule 3 states that a space coming at the end of
a line MUST be represented as "=20" to avoid problems that would
arise when the mail transport system either pads out or trims a line.
The "=20", I think, represents a space at the end of a line created
by some text handling system that does a poor job of line breaking.
(At least that's the way I understand the RFC.) Since we all see this
"=20", it is probably not produced by our mail readers, but by some
agent higher in the chain. Since I see this only on mailing list
distributions, it may be a bug in some commonly used piece of mailing
list software.
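To make the suspected failure concrete, here is a minimal sketch using Python's standard email module. The message below is made up for illustration (its headers and body are assumptions, not real list traffic). It compares what an agent sees when it ignores the Content-Transfer-Encoding header with what it sees when it decodes the body properly.

```python
import email

# Hypothetical message for illustration; headers and body are assumptions,
# not taken from the actual list traffic.
raw = (
    "Content-Type: text/plain; charset=us-ascii\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "where to find a link to download the =20\n"
    "manual?\n"
)

msg = email.message_from_string(raw)

# An agent that ignores the transfer encoding passes the text through as is,
# so the reader sees the literal "=20".
undecoded = msg.get_payload()

# An agent that honours the header turns "=20" back into a space.
decoded = msg.get_payload(decode=True).decode("ascii")

print(repr(undecoded))
print(repr(decoded))
```

If the list software behaves like the first case, every subscriber would see the same "=20" regardless of mail reader, which fits what we observe.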
As an example,
> Can anyone tell me where to find a live links to download the =20
> 2100=92s manual?
> For my Pismo, or to put on the Newton, or both; I don=92t mind.
The "=92" probably represents a curly single quote in some character
set which is unknown to the MIME decoder.
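As a rough check of that guess, the sketch below decodes the "=92" sequence with Python's quopri module and then interprets the resulting byte under two character sets. That the sender's software used Windows-1252 is an assumption on my part; the message itself does not say.

```python
import quopri

# Fragment of the quoted example, as raw quoted-printable bytes.
encoded = b"I don=92t mind."

# Step 1: undo the quoted-printable layer. This yields the single byte 0x92.
raw_bytes = quopri.decodestring(encoded)

# Step 2: interpret that byte in a character set.
# Assumption: the sender's software used Windows-1252, where 0x92 is the
# right single quotation mark (U+2019).
as_cp1252 = raw_bytes.decode("cp1252")

# Under ISO-8859-1 the same byte is an unprintable C1 control character,
# which is one way a decoder could fail to show anything sensible.
as_latin1 = raw_bytes.decode("latin-1")

print(repr(as_cp1252))
print(repr(as_latin1))
```

Either step can go wrong independently: an agent may skip the quoted-printable decoding, or decode it and then pick the wrong character set.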
Tim Larkin
The relevant rules from RFC 2045 are these:
(1) (General 8bit representation) Any octet, except a CR or
LF that is part of a CRLF line break of the canonical
(standard) form of the data being encoded, may be
represented by an "=" followed by a two digit
hexadecimal representation of the octet's value. The
digits of the hexadecimal alphabet, for this purpose,
are "0123456789ABCDEF". Uppercase letters must be
used; lowercase letters are not allowed. Thus, for
example, the decimal value 12 (US-ASCII form feed) can
be represented by "=0C", and the decimal value 61 (US-
ASCII EQUAL SIGN) can be represented by "=3D". This
rule must be followed except when the following rules
allow an alternative encoding.
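Rule 1 is small enough to sketch directly. The function name encode_octet is my own, chosen for illustration, and is not part of any library.

```python
def encode_octet(octet: int) -> str:
    """Represent one octet as "=" plus two uppercase hex digits (Rule 1)."""
    if not 0 <= octet <= 255:
        raise ValueError("an octet must be in the range 0..255")
    # %02X pads to two digits and uses uppercase letters, as the rule requires.
    return "=%02X" % octet


print(encode_octet(12))  # form feed, the RFC's first example
print(encode_octet(61))  # equal sign, the RFC's second example
print(encode_octet(32))  # space
```

The first two calls reproduce the "=0C" and "=3D" examples given in the rule, and the third gives the "=20" we keep seeing on the list.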
(3) (White Space) Octets with values of 9 and 32 MAY be
represented as US-ASCII TAB (HT) and SPACE characters,
respectively, but MUST NOT be so represented at the end
of an encoded line. Any TAB (HT) or SPACE characters
on an encoded line MUST thus be followed on that line
by a printable character. In particular, an "=" at the
end of an encoded line, indicating a soft line break
(see rule #5) may follow one or more TAB (HT) or SPACE
characters. It follows that an octet with decimal
value 9 or 32 appearing at the end of an encoded line
must be represented according to Rule #1. This rule is
necessary because some MTAs (Message Transport Agents,
programs which transport messages from one user to
another, or perform a portion of such transfers) are
known to pad lines of text with SPACEs, and others are
known to remove "white space" characters from the end
of a line. Therefore, when decoding a Quoted-Printable
body, any trailing white space on a line must be
deleted, as it will necessarily have been added by
intermediate transport agents.
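The two halves of Rule 3 can be sketched as a pair of helpers. Both function names are hypothetical, and the sketch handles only the trailing white space case, not a complete quoted-printable codec.

```python
def protect_trailing_whitespace(line: str) -> str:
    """Encoder side: a final SPACE or TAB must be written as =20 or =09."""
    if line and line[-1] in " \t":
        return line[:-1] + "=%02X" % ord(line[-1])
    return line


def strip_transport_padding(encoded_line: str) -> str:
    """Decoder side: raw trailing white space can only be transport padding."""
    return encoded_line.rstrip(" \t")


# A line that ends in two spaces, as a poor line breaker might produce.
print(repr(protect_trailing_whitespace("download the  ")))

# Padding added in transit is dropped before decoding; "=20" survives.
print(repr(strip_transport_padding("download the =20   ")))
```

The first call yields exactly the "download the =20" seen in the quoted example, which is consistent with the idea that the sender's software left a stray space at the end of a wrapped line.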
--
This is the NewtonTalk list - http://www.newtontalk.net/ for all inquiries
Official Newton FAQ: http://www.chuma.org/newton/faq/
WikiWikiNewt for all kinds of articles: http://tools.unna.org/wikiwikinewt/
This archive was generated by hypermail 2.1.5 : Thu Aug 04 2005 - 12:30:03 PDT