Full Version: Journalist searching for historic spam data
jimgiles
Hi All,

I'm a reporter with New Scientist magazine. I'm putting together a piece on the history of spam and am searching for data on historic spam levels. The most comprehensive dataset that I'm aware of is maintained by MAAWG, but it only goes back to 2005 (http://www.maawg.org/about/EMR). Does anyone know of older datasets? I've seen a few that are based on individual email accounts, but I need something a little more rigorous than that.

Any tips would be much appreciated!

Thanks,

Jim
rconner
I occasionally search for the same thing myself, but haven't had much luck. You might find something at the Spam Links website (http://spamlinks.net/). You might also Google "spam corpus", a term for the collections of spam used to "train" Bayesian filters etc.

-- rick

Farelf
The question arises from time to time but unfortunately no-one really knows what they're looking at, in terms of collection methodologies and uncertainties about ISP filtering of various kinds. I assume you saw the topic http://forum.spamcop.net/forums/index.php?showtopic=10332 with its various links? Despite its title and initial direction it goes on to talk about general spam stats. There are other discussions/topics in these forums too. I'm not sure they will help, but they may be worth the effort of finding if you're not getting much joy elsewhere.
michaelanglo
QUOTE(jimgiles @ Oct 13 2009, 11:28 AM) *
I'm a reporter with New Scientist magazine. I'm putting together a piece on the history of spam and am searching for data on historic spam levels. The most comprehensive dataset that I'm aware of is maintained by MAAWG, but it only goes back to 2005 (http://www.maawg.org/about/EMR).

One bit of lateral thinking that

(a) may help

(b) may interest your readers anyway

would be to look for historic statistics on total email volumes.

I assume that if the current assertion that 80% of email is spam is indeed true, the proportion was lower in the past,
so total email volume gives us an upper bound on the spam level.

HTH
StevenUnderwood
QUOTE(jimgiles @ Oct 13 2009, 06:28 AM) *
I'm a reporter with New Scientist magazine. I'm putting together a piece on the history of spam and am searching for data on historic spam levels. The most comprehensive dataset that I'm aware of is maintained by MAAWG, but it only goes back to 2005 (http://www.maawg.org/about/EMR). Does anyone know of older datasets? I've seen a few that are based on individual email accounts, but I need something a little more rigorous than that.

Any tips would be much appreciated!

You may want to contact a few of the spam blocking services. I have used Postini at my last 2 positions, and while they do not have historical information available publicly, they may have the numbers if you ask.
dra007
Even services like McAfee and Norton antivirus keep track of some spam data since some botnets and phishers are directly linked to the spam flow...

Good luck in your research, and do come back with what you find. I used to be an avid reader of your magazine in my younger years.
jimgiles
Thanks all for the suggestions. I'll check out the links. The spam filtering companies are telling me that they only have data going back to 04/05, so I'm still on the lookout for older stats. I'll post a link to the story when it appears.

Cheers

Jim
kmolloy
QUOTE(jimgiles @ Oct 26 2009, 08:55 AM) *
Thanks all for the suggestions. I'll check out the links. The spam filtering companies are telling me that they only have data going back to 04/05, so I'm still on the lookout for older stats. I'll post a link to the story when it appears.

Would news.admin.net-abuse.sightings be of any help? That goes back to the late '90s.