I used over 1000 different email addresses with different companies to track where spam originates

For a long time, I've given each company a unique email address so that I can track where spam comes from. I have a domain that is separate from my regular mail. On that domain, I set up a mail server to catch all the mail regardless of the user name. When I give out my email address, I use the company's name as the user name. For example, Facebook emails me at facebook@crazydomain.com. This allows me to track where a sender got my address based on the To: address. Gmail has a similar feature where you can add a keyword to your address with a plus symbol after your username (for example, you+facebook@gmail.com).
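As a rough sketch of how the To: address can be mapped back to a source, the function below handles both the catch-all style and the Gmail plus style. The function name `source_tag` is hypothetical, and the matching rules are my assumptions about a reasonable setup, not a description of the actual mail server.

```python
from email.utils import parseaddr
from typing import Optional


def source_tag(to_header: str, catch_all_domain: str = "crazydomain.com") -> Optional[str]:
    """Return the company tag encoded in a recipient address, or None.

    Two conventions are recognized:
      * catch-all domain:  facebook@crazydomain.com -> "facebook"
      * plus addressing:   you+facebook@gmail.com   -> "facebook"
    """
    # parseaddr strips any display name, e.g. "Me <a@b.com>" -> "a@b.com"
    _, addr = parseaddr(to_header)
    local, sep, domain = addr.rpartition("@")
    if not sep or not local:
        return None  # not a usable address

    local = local.lower()
    domain = domain.lower()

    if domain == catch_all_domain:
        # On the catch-all domain the whole user name is the tag.
        return local
    if "+" in local:
        # Plus addressing: the tag is everything after the first "+".
        return local.split("+", 1)[1] or None
    return None
```

For example, `source_tag("facebook@crazydomain.com")` and `source_tag("You <you+facebook@gmail.com>")` both give `"facebook"`, while an untagged address such as `you@gmail.com` gives `None`.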

I've been doing this since 2001, but in 2014 I started saving all the mail in a database so I can mine it. Since then, I've accumulated 650,000 emails. This excludes my day-to-day email with people that I know personally.

Over those three and a half years, I received mail on 1382 usernames.

A notable statistic is that Facebook sent me 7000 emails - a good deal more than the 4400 emails LinkedIn sent me.
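Counts like these fall out of a simple tally per username. Here is a minimal sketch using SQLite, assuming a hypothetical table `mail` with a `rcpt` column holding the recipient address. The real database schema is not described here, so treat the table and column names as placeholders.

```python
import sqlite3
from collections import Counter


def count_by_username(conn: sqlite3.Connection) -> Counter:
    """Tally stored messages by the user name part of the recipient address.

    Assumes a hypothetical table `mail(rcpt TEXT)`.
    """
    counts = Counter()
    for (rcpt,) in conn.execute("SELECT rcpt FROM mail"):
        local, sep, _domain = rcpt.rpartition("@")
        if sep:  # skip rows that do not look like an address
            counts[local.lower()] += 1
    return counts
```

Calling `count_by_username(conn).most_common(10)` would list the ten busiest usernames, and `len(count_by_username(conn))` gives the number of distinct usernames that received mail.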

Where my spam comes from

The vast majority of the spam I receive is on email addresses used in domain registrations. I have had hundreds of domains with a dozen registrars for my own projects or for those of clients. (For various reasons, we cannot always use WHOIS privacy.) Interestingly, this kind of spam is on the decline: more recent domains get far fewer emails than domains registered a decade ago. Addresses in a few other public listings, like the Federal contractor database and the business license database, are also hit heavily.

The next largest category is the spammers who guess usernames. Many guess common names like support@ or sales@. A fair number are a bunch of random characters. Interestingly, the same random characters are used over and over for years... as though a spammer tried it once, didn't get a bounce, and added it to a list somewhere.
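One way to spot guessed usernames is to look for mail on names that were never handed out, and to note the first and last year each one shows up. The sketch below assumes the records are available as (username, year) pairs. The helper name `guessed_usernames` is hypothetical, and apart from support and sales the entries in `ROLE_NAMES` are my own additions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# "support" and "sales" come from the text above; the rest are assumed examples.
ROLE_NAMES = {"support", "sales", "info", "admin"}


def guessed_usernames(
    records: Iterable[Tuple[str, int]], issued: Set[str]
) -> Dict[str, Tuple[str, int, int]]:
    """Map each never-issued username to (kind, first_year, last_year).

    kind is "role" for common guesses such as support@, "other" otherwise.
    """
    years = defaultdict(set)
    for username, year in records:
        name = username.lower()
        if name not in issued:
            years[name].add(year)

    result = {}
    for name, seen in years.items():
        kind = "role" if name in ROLE_NAMES else "other"
        result[name] = (kind, min(seen), max(seen))
    return result
```

A random-looking name whose first and last years are far apart is the pattern described above: a guess that never bounced and ended up on a list.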

Finally, there is a large and growing list of specific merchants and organizations that have had a breach. The first was a backup tape from the Ameritrade brokerage that was lost in transit and fell into the hands of spammers. There are well-known breaches like LinkedIn and MtGox, but also some at smaller companies. When I contact smaller merchants, they usually cannot comprehend that they have suffered a breach.

Most of what I knew was wrong

This data surprises me. It appears that the vast majority of companies I have given an email address to do not share email addresses. Companies are largely honest!

It is annoying that most companies will put you on a mailing list even if you don't ask for it, but my experience is that unsubscribing from these lists does actually unsubscribe you. This makes sense because spam reports hurt their newsletter deliverability... and most newsletters are paying to go through commercial providers like MailChimp or Constant Contact.

The advice not to click Unsubscribe on random spam still holds, since doing so may confirm that your address is valid. But companies you recognize certainly have your email address already, and it seems to be safe to unsubscribe from their lists to reduce your unwanted email.

Filtering spam

Out-of-the-box SpamAssassin is the most effective thing I have tried. It has so many tests that it catches more spam than any other filtering method I have tried. Adding tricks like greylisting may reduce spam by a few percentage points. All of the tricks I have tried filter less than 10% of the remaining spam, cumulatively.

One trick that does seem to help is to hold mail that scores at least 80% of the spam threshold and run it through SpamAssassin a second time a few hours later. This works because blacklist databases take a little bit of time to learn about new spammers, new spams, and new relay IP addresses.
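The hold-and-rescan rule can be sketched as a small decision function. This is only an illustration of the logic, not the actual setup: the names `Action` and `triage` are hypothetical, the default threshold of 5.0 matches SpamAssassin's default `required_score`, and treating mail over the threshold as rejected is my assumption.

```python
from enum import Enum


class Action(Enum):
    DELIVER = "deliver"
    HOLD = "hold"      # keep aside and score again a few hours later
    REJECT = "reject"  # treated as spam


def triage(
    score: float,
    threshold: float = 5.0,       # SpamAssassin's default required_score
    hold_fraction: float = 0.8,   # the 80% mark described above
    already_rescanned: bool = False,
) -> Action:
    """Decide what to do with a message given its spam score."""
    if score >= threshold:
        return Action.REJECT
    # Borderline mail gets held once; after the second scan it is
    # delivered unless the new score has crossed the threshold.
    if score >= hold_fraction * threshold and not already_rescanned:
        return Action.HOLD
    return Action.DELIVER
```

With the defaults, a message scoring 4.2 is held on the first pass. If the second scan still scores it below 5.0 it is delivered, and if the blacklists have caught up in the meantime and pushed it to 5.0 or more, it is rejected.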

Between SpamAssassin and the ability to completely trash mail on compromised addresses, I have very little spam these days. What spam does get through I use to further train SpamAssassin.