[NTLK] Removing mail addresses (Was: I want to publish NewtonTalk since 2000)

Ed Kummel tech_ed at yahoo.com
Mon Jan 9 18:19:32 EST 2012


I love redirection!
Back when I was running my Newton package server on my Windows 2000 server, I modified my HTTP GET response to return headers for Scientific Linux. You should have seen my HTTP logs for hack attacks against Linux! But rarely a Windows attack!
I was experimenting with setting up Monowall to handle my routing and WiFi, so for testing I set my SSID to NSA_Open_Test and left it open just to see who would try to access it...Unfortunately, that didn't work so well...lots of curious people...so I changed it to TrojanVirus_WiFi. Now I rarely get anybody trying to access my WiFi! For a while there, I did Captive Portal and redirected connections to a webpage that performed a mock virus install!
Yeah...that was fun!

Ed
web/gadget guru

 
------------------------------------------------------------------------
"Oh Yeah...That's the stuff!"
Stewie Griffin, Family Guy


________________________________
 From: Frank Gruendel <newtontalk at pda-soft.de>
To: 'newtontalk digest users' <newtontalk at newtontalk.net> 
Sent: Monday, January 9, 2012 2:32 PM
Subject: [NTLK] Removing mail addresses (Was: I want to publish NewtonTalk since 2000)
 
> just run a find-replace command on the text files replacing the @ sign
> with something meaningless and machine-unreadable.

I've briefly contemplated this, but decided against it for two reasons:

1) There are a lot of @s in our digests that aren't part of an E-Mail
address. It wouldn't exactly improve comprehensibility if those were simply
removed or replaced with something else.

2) I'm pretty sure that modern spam-harvesting software would have a fit of
the giggles if it encountered this kind of obfuscation.

For example, in our digests each post has a line that looks like this:

    From: "Enrico Caruso" <enricocaruso at sopranoharem.com>

If we just removed the @, we'd get

    From: "Enrico Caruso" <enricocarusosopranoharem.com>

If yours truly were to write E-Mail address harvesting software, it would
be coded like this:

a) Find a line that starts with "From:"
b) Find the character "<"
c) Remove this character and all that's to the left of it.
d) Check that the last character of what remains is ">". If it is, remove
it.

OK, that's the address that our stupid author obfuscated. Let's see what we
can do to resurrect it...

e) Find the rightmost occurrence of the character ".".
f) Everything to the right of this position is the domain suffix. Put it
away for later.
g) Remove the suffix including the ".". The remainder is the concatenated
mail and domain name.

For performance reasons the following two steps are restricted to the
namespace designated by our suffix.

h) Feed what we have left into a reasonably current domain list.
i) Search for a domain whose name is a right-aligned substring of what we
have left.

j) If there is one, remove the domain name from the right. What's left now
is our E-Mail name.
k) Add a "@" between the E-Mail name and the domain name. Add a "." to the
end, then add the domain suffix.
l) Check if this is a valid address.
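
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: KNOWN_DOMAINS is a tiny hypothetical stand-in for the "reasonably current domain list", the function name resurrect is made up, trying the longest domain first is my own choice, and the final validity check is left out.

```python
from typing import Optional

# Hypothetical stand-in for a real domain list, keyed by suffix so the
# search stays inside the namespace given by that suffix.
KNOWN_DOMAINS = {
    "com": {"sopranoharem", "harem"},
    "de": {"pda-soft"},
}


def resurrect(line: str) -> Optional[str]:
    """Try to rebuild an address whose "@" was simply removed."""
    # Only look at lines that start with "From:".
    if not line.lstrip().startswith("From:"):
        return None
    # Find "<" and drop it together with everything to its left.
    _, bracket, rest = line.partition("<")
    if not bracket:
        return None
    rest = rest.strip()
    # Drop a trailing ">" if there is one.
    if rest.endswith(">"):
        rest = rest[:-1]
    # Split at the rightmost "." into joined name+domain and the suffix.
    joined, dot, suffix = rest.rpartition(".")
    if not dot:
        return None
    # Look for a known domain that is a right-aligned substring.
    # Assumption: longer domains are tried first.
    candidates = sorted(KNOWN_DOMAINS.get(suffix, ()), key=len, reverse=True)
    for domain in candidates:
        if joined.endswith(domain) and len(joined) > len(domain):
            name = joined[: -len(domain)]
            # Put the "@", the "." and the suffix back.
            return f"{name}@{domain}.{suffix}"
    return None


print(resurrect('From: "Enrico Caruso" <enricocarusosopranoharem.com>'))
```

With the sample line from above, this prints enricocaruso@sopranoharem.com. A real harvester would still have to check whether the result is a working address.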

Of course we could put more effort into obfuscating the address. But if we
did so, more of what should remain untouched would change, too. Apart from
that, people who write E-Mail address harvesting software are paid for
putting more effort into outsmarting us, and more often than not they
succeed.

In my opinion the safest way is finding EVERY E-Mail address and simply
replacing it with a fixed address that is... well... not what spammers
really want. For example 

    death_to_mail_address_harvesting_software_programmers at fbi.gov

Provided, of course, the American government doesn't have a problem with it
and is willing to establish this E-Mail account. Alternatively one could use
something like

    what_a_pity_you_did_not_find_an_address_here
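
As a rough sketch of this "replace every address" idea, the following Python snippet swaps anything that looks like an address for the fixed placeholder. The pattern is a deliberately simple assumption, not a full RFC 5322 matcher, and the names ADDRESS and scrub are made up for illustration. It also assumes the digests contain a literal "@" in addresses.

```python
import re

# Fixed replacement text, taken from the suggestion above.
PLACEHOLDER = "what_a_pity_you_did_not_find_an_address_here"

# Simplified address pattern (an assumption): a local part, "@", and a
# domain containing at least one dot.
ADDRESS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def scrub(text: str) -> str:
    """Replace every address-like token with the fixed placeholder."""
    return ADDRESS.sub(PLACEHOLDER, text)


print(scrub('From: "Enrico Caruso" <enricocaruso@sopranoharem.com>'))
print(scrub("Two MessagePads @ 50 EUR each"))
```

Because the pattern needs text on both sides of the "@", a stand-alone "@" such as the one in the second line is left untouched, which addresses the first objection above.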

Frank

-- Newton software and hardware at http://www.pda-soft.de



==================================================================== 
The NewtonTalk Mailing List - http://newtontalk.net/
The Official Newton FAQ     - http://splorp.com/newton/faq/
The Newton Glossary         - http://splorp.com/newton/glossary/
WikiWikiNewt                - http://tools.unna.org/wikiwikinewt/
====================================================================

