How To Fight Spam
Spam, or unsolicited e-mail, has become a tremendous problem that won’t go away. Most resources on the net focus on a particular technique to deal with spam. However, an all-embracing approach gives you much better protection. In this article, I cover the main aspects of such an approach which works very well for me.
The three frontiers
Spam can be fought at roughly three frontiers: the social, the technical, and the legal level. This order also reflects the effectiveness of these approaches. For example, I expect anti-spam legislation to be only a mild deterrent for serious spammers so all you can do is try to sue a spammer after the damage has been done.
The social frontier, or: How not to be seen
The single most important and effective trick to avoid spam is to make yourself invisible to spammers. Of course you have to be reachable, so people need your e-mail address. But pass it to them over private channels: send them an e-mail, call them, write it on a scrap of paper for them. As soon as your e-mail address leaks into the public, you have lost your most valuable line of defense: privacy.
It is important to understand what public means in this context. Obviously, web pages are publicly accessible. So don’t put up your e-mail address on your home page (see below for how to do so safely if you have to). But even if you do not have your own home page you are not safe. For example, a friend of yours could add your e-mail address to her home page for the next BBQ without any bad intentions. You might want to kindly raise her awareness of the sources of spam or refer her to this article.
Many websites ask you to enter personal information, including an e-mail address, although this information is completely irrelevant for them to provide their service (e.g. to download software or read news articles online). As long as you don’t really want them to send you e-mail, don’t tell them your address. For example, I supply my e-mail address to my insurance company but not to the New York Times. Just make up a random e-mail address where you are forced to supply one.
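Making up an address can be automated. The following is only a minimal sketch: the function name throwaway_address is hypothetical, and the use of example.com is my assumption (that domain is reserved for documentation, so the made-up address cannot belong to a real person). Some forms may reject it.

```python
import secrets
import string

def throwaway_address(domain: str = "example.com", length: int = 12) -> str:
    """Return a random address on a reserved documentation domain.

    Hypothetical helper: the default domain is an assumption, chosen so
    that the made-up address never lands in a real person's inbox.
    """
    alphabet = string.ascii_lowercase + string.digits
    local_part = "".join(secrets.choice(alphabet) for _ in range(length))
    return f"{local_part}@{domain}"

print(throwaway_address())
```

Only use this where you do not need a confirmation mail, since nothing sent to such an address will ever reach you.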
The bad thing about any public access to your e-mail address is that it is - by definition - accessible to anyone, even people like spammers. The problem is that your e-mail address is not safe among the herd. A spammer will write a program that crawls through the web to find e-mail addresses, or break into databases to obtain them. Spammers would even scan or type them in if there were something like a phone directory for e-mail addresses.
So keep your address private. As long as you can trust the people and companies you give your address to and make sure that it is not published elsewhere you will never have to delete a single spam mail.
The technical frontier
On the technical frontier the weapons of choice include obfuscation and millions of filtering techniques but it is equally important to have a rough idea how some of your communication gear works.
How to be hard to hit
The strategy of staying invisible to spammers extends into the technical domain, of course. Here is a bit of advice on keeping a low profile.
The amount of information archived on the web can be worrying. For example, most mailing lists have publicly accessible web archives and unfortunately, most of them do not obscure the e-mail addresses of the senders. The same is true for some Usenet newsgroups. Other places where addresses get harvested are any sites supporting user comments (news sites, blogs, etc.), to which you obviously should not supply your e-mail address.
How to confuse the enemy
Unfortunately, you sometimes want your e-mail address to be publicly available, so people who don’t know you personally can still reach you by e-mail - except spammers of course.
Spammers have two properties that we can use to our advantage: they are lazy bums and bad programmers. Spammers don’t trawl through millions of websites and note down e-mail addresses they find. Due to property one, they write programs to do that for them. Thanks to property two, these programs are pretty brain-dead and can be easily fooled.
To do that, you basically want to provide your e-mail address in a form that is hard to recognize for a piece of software but easy to recognize for a human being. Popular approaches are: mangling your address into something like bill [at] msn [dot] com, using special HTML codes for the @ character, or using an image rather than textual form.
I don’t really trust these techniques because eventually they might be overcome by spammers. Furthermore, most of them are awkward to use and only applicable where you have control over how your address is represented, i.e. on your own home page.
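To make the idea concrete, here is a small sketch of a brain-dead harvester and two of the obfuscations mentioned above. Everything in it is an assumption for illustration: the regular expression is my guess at what a simple harvesting program might look for, and the function names harvest, mangle and entity_encode are hypothetical. The entity variant encodes every character, not only the @ sign.

```python
import re

# A deliberately naive pattern, standing in for a simple harvesting program.
NAIVE_ADDRESS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def harvest(page: str) -> list:
    """Return every plain-text address the naive pattern finds in a page."""
    return NAIVE_ADDRESS.findall(page)

def mangle(address: str) -> str:
    """Rewrite user@host.tld as 'user [at] host [dot] tld'."""
    local_part, domain = address.split("@", 1)
    return f"{local_part} [at] {domain.replace('.', ' [dot] ')}"

def entity_encode(address: str) -> str:
    """Encode every character as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in address)

address = "bill@msn.com"
print(harvest(f"Contact: {address}"))   # the plain address is found
print(mangle(address))                  # human-readable, no @ sign left
print(harvest(mangle(address)))         # the naive pattern finds nothing
print(harvest(entity_encode(address)))  # nothing here either
```

The weakness is easy to see as well: a harvester that first decodes the page, for example with Python's html.unescape, recovers the entity-encoded address in one step.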
How to dodge elegantly
So you have to be reachable by e-mail and must make your address public - who says it has to be your real address? Having a public and a private address is good practice. But still one has to skim through heaps of spam to find the non-spam mails sent to the public address. This is where filtering can help.
Filters basically try to distinguish between spam and regular e-mail based on a set of rules. Unfortunately, filters sometimes classify spam as good e-mail or vice versa so they are not 100% reliable. There are gazillions of ways in which filtering can be used. I’ll describe those that work best for me personally.
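As a rough sketch of what rule-based filtering means, and of how it goes wrong in both directions, consider the toy filter below. The Message type, the blocked sender and the blocked words are all invented for illustration and are not taken from any real filter.

```python
from dataclasses import dataclass

@dataclass
class Message:
    sender: str
    subject: str

# Hypothetical static rules, invented for this example only.
BLOCKED_SENDERS = {"offers@bulk-mailer.example"}
BLOCKED_WORDS = {"lottery", "viagra"}

def is_spam(message: Message) -> bool:
    """Apply the static rules: a blocked sender or a blocked subject word."""
    if message.sender.lower() in BLOCKED_SENDERS:
        return True
    subject_words = set(message.subject.lower().split())
    return bool(subject_words & BLOCKED_WORDS)

# Good mail classified as spam: a colleague mentions the office lottery pool.
colleague = Message("anna@work.example", "Office lottery pool for Friday")
# Spam classified as good mail: a trivial misspelling slips past the word rule.
spammer = Message("unknown@elsewhere.example", "Cheap v1agra today")

print(is_spam(colleague))
print(is_spam(spammer))
```

The first message is flagged although it is legitimate, and the second passes although it is spam, which is exactly why no rule set is 100% reliable.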
How to use a wall
Server-side filtering takes out spam before it reaches your inbox. Such a service provides you with a public e-mail address, filters out spam, and forwards non-spam to your private address. You have to trust the service not to make your private address public and to get its filtering right, but usually they do.
An exception is mailing lists, whose mails a filtering service sometimes classifies as spam. In that case I subscribe both my public and my private address to the mailing list, disable mail delivery from the mailing list to the public address, and post only from the public address. Thus, I get all mails from the mailing list at my private address, replies can reach me, and I don’t have to worry about web archives, which will only display my protected public address.
How to use armour
Client-side filtering deals with spam after it has reached your inbox. You cannot benefit from it if you use your e-mail provider’s web interface to read your e-mail, because in that case there is no such thing as a client. The advantage of client-side filtering is that it is under your control and can thus be more precise. The disadvantage is that you have to download both regular e-mail and spam before your filter can take action.
Most filters today use static rules such as “all mails from a certain sender are spam”. Such rules can be overzealous or become obsolete at a depressing rate, and spammers are always one step ahead of them. A recent trend in filtering is based on learning what a specific user regards as spam or not. It achieves very good results and is easy to use. This Bayesian filtering classifies mail as spam, non-spam, or unsure. Even if it gets it wrong you can train it not to make the same mistake again. It keeps on learning to deal with new spammer tricks.
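The learning idea can be sketched in a few lines. What follows is a toy naive Bayes classifier with add-one smoothing, written for illustration only. It is not how SpamBayes or any other real filter computes its scores, and the class name TinyBayesFilter, the cutoffs 0.9 and 0.1, and the training texts are all my assumptions.

```python
import math
from collections import Counter

class TinyBayesFilter:
    """Toy learning filter that answers 'spam', 'ham' or 'unsure'."""

    def __init__(self, spam_cutoff: float = 0.9, ham_cutoff: float = 0.1):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}
        self.spam_cutoff = spam_cutoff
        self.ham_cutoff = ham_cutoff

    def train(self, text: str, label: str) -> None:
        """Learn from one message the user has labelled 'spam' or 'ham'."""
        self.words[label].update(text.lower().split())
        self.messages[label] += 1

    def spam_probability(self, text: str) -> float:
        vocabulary = set(self.words["spam"]) | set(self.words["ham"])
        total_messages = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "ham"):
            total_words = sum(self.words[label].values())
            # Smoothed prior: how common this kind of mail has been so far.
            score = math.log((self.messages[label] + 1) / (total_messages + 2))
            for word in text.lower().split():
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((self.words[label][word] + 1)
                                  / (total_words + len(vocabulary) + 1))
            log_score[label] = score
        # Turn the two log scores into a probability of spam.
        diff = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1.0 / (1.0 + math.exp(diff))

    def classify(self, text: str) -> str:
        probability = self.spam_probability(text)
        if probability >= self.spam_cutoff:
            return "spam"
        if probability <= self.ham_cutoff:
            return "ham"
        return "unsure"

mail_filter = TinyBayesFilter()
mail_filter.train("cheap pills buy now", "spam")
mail_filter.train("buy cheap watches now", "spam")
mail_filter.train("meeting agenda for monday", "ham")
mail_filter.train("lunch on monday", "ham")

for text in ("buy cheap pills", "monday meeting agenda", "cheap lunch"):
    print(text, "->", mail_filter.classify(text))
```

With this toy training data the three messages come out as spam, ham and unsure, in that order. Correcting a mistake is just another call to train with the right label, which is what makes the filter keep up with new tricks.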
How to conclude
Keeping my private address private proves very effective for me. I receive about one spam mail per week on that address after it unfortunately slipped to the web for a little while.
My public addresses hosted at despammed.com are basically bulletproof - maybe two spams a year, although a very few legitimate mails do not make it through either.
My unprotected addresses at work receive a few spam mails per day. I attribute this low number mainly to rudimentary obfuscation on my web pages and not using them on dubious web forms or forums.
My client-side filter SpamBayes takes care of what slips through, which is not such an awful lot anyway. Its main use for me is to filter out mailing list mails on topics that I am not interested in.
So for the moment I can declare my personal war against spam as won. I hope others can benefit from the steps described.
Comments
This is a great article. I especially like how you describe ways for people to avoid getting spam at all, rather than just saying what to do when the damage has been done and they drown in spam.
As to your solution: I use a similar approach. I have public and private addresses and it works quite nicely for me. The only problem is that I put the public address on my website in bot-readable form. So now it is no doubt listed in every spammer's address book. On average I get 50 spam mails per day through that address.
On brain-dump.com we use Hiveware's Email Enkoder ( http://www.hiveware.com/enkoder_form.php ) to protect our email addresses from being harvested. I wish I had known about this when I created my website.
To filter out spam I use two lines of defense. I use server-side filtering provided by my mail provider. The 10% or so of spam this doesn't catch I filter with SpamPal ( http://www.spampal.org/ ) plus Bayesian plugin ( http://spampalbayes.sourceforge.net/ ). This works pretty well for me. Only occasionally spam gets through to my inbox.
The only problem I have with filtering is that I have to check the mails that have been marked as spam for false positives.
Life would be so much easier without spam. I'm sick of getting gazillions of emails claiming they can help me elongate my member.
Hi,
Here is a very interesting site about filtering tons of spam:
http://www.acme.com/mail_filtering/
It gives a lot of insight into the mechanics of blocking with minimal resources. A must for e-mail server administrators and ISPs.
JPL
Filtering spam and hiding your identity is nice, but shouldn't we hit back at those who inflict spam upon us in the first place: the senders, or those on whose behalf spam is sent?