Re: [NTLK] Newtontalk digest stopped by spam filter

From: Paul Guyot (pguyot_at_kallisys.net)
Date: Tue Jan 20 2004 - 23:08:22 PST


On Tue, 20 Jan 2004, Martin Joseph wrote:

> > Here's the header that the spam filter put in:
> >
> > X-Pmx-Spam: Gauge=XXXXXXXXXI, Probability=91%,
> > Report='KNOWN_ADVERT_DOMAIN 5, TONER 2.488, WEIRD_PORT 2.099,
> > NO_REAL_NAME 0.000, EMAIL_ATTRIBUTION 0, QUOTED_EMAIL_TEXT 0'
> >
> Sort of looks like it's not liking the domain newtontalk is coming from?
>
> My messages seem to come from mail.continuity.cx?
>
> Just a guess really.

Indeed, messages on the NTLK mailing list come from mail.continuity.cx
because NTLK is hosted by Bill Shamam, who runs continuity.cx, and this is
his own mail server. Nothing wrong here.

No spam filtering tool is perfect. Rule-based scoring tools like the one
that flagged the digest as spam are probably the best ones for distributed
spam filtering (i.e. spam is filtered on a server for everyone, using the
same settings). However, they do produce false positives. The only way to
avoid false positives is to accept a high rate of false negatives (messages
not filtered as spam that are spam nevertheless).

We use SpamAssassin at continuity (Bill Shamam also happens to host
Victor's mail and mine, as well as some of our domains), and my technique
is to not read anything that SpamAssassin scores above 6.3 and to file
e-mails scoring between 5 and 6.3 into a special mailbox. That mailbox is
mostly spam, but I do get real messages there. So far this works for the
50-100 spams I get daily (15% of my incoming mail on average, more on
weekends, less on weekdays). The only false positives above 6.3 I have had
are messages from the EIMS-L mailing list, where people discuss spam and
even post their own spam. I mean the only case I noticed: with that amount
of spam, I don't check it all manually.
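The three-tier scheme above could look like this in a filter script. This is
a minimal sketch, assuming the score is read from SpamAssassin's
X-Spam-Status header (written as score= in 3.x and as hits= in older 2.x
releases); the thresholds are the ones from this message, but the mailbox
names are hypothetical.

```python
import re

# Thresholds from the message; mailbox names below are hypothetical.
UNREAD_ABOVE = 6.3   # scores above this are filed away and never read
REVIEW_FROM = 5.0    # scores from 5 up to 6.3 go to a review mailbox

# X-Spam-Status carries "score=N" (SpamAssassin 3.x) or "hits=N" (2.x).
SCORE_RE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")


def spam_score(x_spam_status: str) -> float:
    """Extract the numeric score from an X-Spam-Status header value."""
    match = SCORE_RE.search(x_spam_status)
    if match is None:
        raise ValueError("no score in X-Spam-Status header")
    return float(match.group(1))


def choose_mailbox(score: float) -> str:
    """Map a SpamAssassin score onto one of three hypothetical mailboxes."""
    if score > UNREAD_ABOVE:
        return "spam"          # filed, not read
    if score >= REVIEW_FROM:
        return "spam-review"   # mostly spam, but skim it for real messages
    return "INBOX"
```

For example, a header value of `Yes, score=5.8 required=5.0 tests=...` lands
in spam-review. Filing the top tier instead of deleting it keeps the odd
false positive (such as an EIMS-L message) recoverable.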

One solution would be to bounce all these e-mails. We could filter
messages on their bounceability, i.e. by testing that the sender exists
(this isn't very hard, and some commercial software actually does it). If
this worked, spammers would use more and more of their victims' e-mail
addresses as the sender, and we would receive more bounces than spam. Look
at the number of bounces we already get from stupidly configured anti-virus
software. No serious system admin bounces viruses nowadays. (I still get
e-mails from my mother asking me if I think she has a virus on her iBook.)
Spammers evolve rather quickly, or at least a growing proportion of them
do. They reacted to so-called Bayesian filtering, and no doubt they are
starting to score victories over this technique.
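For reference, testing that the sender exists is usually done with a
callback: connect to the sender domain's mail server and ask whether it
accepts the address as a recipient, without sending any message. Below is a
minimal sketch using Python's smtplib, assuming the MX host has already been
looked up (the DNS lookup is not shown); the function names and the HELO
name are hypothetical. As argued above, spammers who forge real addresses
pass this check anyway.

```python
import smtplib


def classify_rcpt_reply(code: int) -> str:
    """Map the SMTP reply code to RCPT TO onto a verdict about the address."""
    if 200 <= code < 300:
        return "exists"      # e.g. 250: the server would accept mail for it
    if 500 <= code < 600:
        return "rejected"    # e.g. 550: no such user (permanent failure)
    return "unknown"         # 4xx temporary failures, greylisting, etc.


def sender_exists(mx_host: str, sender: str,
                  helo_name: str = "verifier.example") -> str:
    """Ask the sender's mail exchanger whether `sender` is deliverable.

    Uses the null reverse-path (MAIL FROM:<>), as bounces do, and never
    issues DATA, so no message is actually delivered.
    """
    try:
        with smtplib.SMTP(mx_host, 25, timeout=15) as smtp:
            smtp.helo(helo_name)
            smtp.mail("")                  # smtplib sends MAIL FROM:<>
            code, _ = smtp.rcpt(sender)
            return classify_rcpt_reply(code)
    except OSError:                        # smtplib errors subclass OSError
        return "unknown"                   # unreachable or misbehaving server
```

Note that a "rejected" verdict is the only useful one: a forged but real
address comes back as "exists", which is exactly the weakness described
above.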

In your case, you can do the following:
- create a rule that takes precedence over the spam rule to file NTLK
digests (this is known as whitelisting)
- create an intermediate mailbox like mine by filing everything with
Gauge=XXXX..XX to a spam mailbox and everything with Gauge=XXX..XX to a
spam? mailbox; the number of Xs is effectively the threshold for each
category
- use Bayesian (non-distributed) techniques, which get better results (but
are also subject to false positives, like everything else).
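The first two suggestions might look like this as a filter script. This is a
minimal sketch, assuming the digests can be recognized by the [NTLK] tag in
the Subject line (match whatever your digests actually carry, e.g. the list
address) and using hypothetical thresholds of 9 and 7 Xs; the mailbox names
are hypothetical too.

```python
from email import message_from_string

# Hypothetical thresholds: number of X characters in the gauge.
SPAM_XS = 9    # Gauge=XXXXXXXXX... or longer -> "spam"
MAYBE_XS = 7   # Gauge=XXXXXXX... or longer   -> "spam?"


def gauge_xs(header_value: str) -> int:
    """Count the X characters in the Gauge= field of an X-Pmx-Spam header."""
    for field in header_value.split(","):
        name, _, value = field.strip().partition("=")
        if name == "Gauge":
            return value.count("X")
    return 0


def file_message(raw_message: str) -> str:
    """Pick a (hypothetical) mailbox for a raw RFC 822 message."""
    msg = message_from_string(raw_message)
    # Rule 1, the whitelist: checked first, so it takes precedence and
    # NTLK mail is never filed as spam whatever its gauge says.
    if "[NTLK]" in (msg.get("Subject") or ""):
        return "NTLK"
    # Rules 2 and 3: threshold on the length of the gauge.
    xs = gauge_xs(msg.get("X-Pmx-Spam") or "")
    if xs >= SPAM_XS:
        return "spam"
    if xs >= MAYBE_XS:
        return "spam?"
    return "INBOX"
```

Ordering is the whole point of the design: because the whitelist test runs
before the gauge tests, it wins even when the gauge is full.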

Paul

-- 
This is the NewtonTalk list - http://www.newtontalk.net/ for all inquiries
List FAQ/Etiquette/Terms: http://www.newtontalk.net/faq.html
Official Newton FAQ: http://www.chuma.org/newton/faq/

