Lking

Reporting Statistics


With only the graphs to look at and no long term numbers to digest, it seems to me that there has been a change in the relationship between the number of spam submitted and the number of reports sent.

It seems to me that in the past the number of reports sent was always larger than the number submitted. Now, after last November, the relationship seems reversed.
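One way to check this impression without relying on the eye is to compute, for each period, the ratio of reports sent to spam submitted and list the periods where it falls below 1. A minimal Python sketch, assuming the two series could be obtained as equal-length lists of counts (SpamCop only publishes the graphs, so the helper name and its inputs are hypothetical):

```python
def ratio_below_one(submitted, sent):
    """Return (index, ratio) for each period where reports sent < spam submitted.

    Hypothetical helper: `submitted` and `sent` are equal-length lists of
    per-period counts, e.g. read off the SpamCop graphs.
    """
    if len(submitted) != len(sent):
        raise ValueError("series must be the same length")
    flagged = []
    for i, (sub, snt) in enumerate(zip(submitted, sent)):
        if sub == 0:
            continue  # no submissions: the ratio is undefined, skip the period
        ratio = snt / sub
        if ratio < 1.0:
            flagged.append((i, ratio))
    return flagged
```

For example, `ratio_below_one([100, 120, 90], [150, 100, 90])` flags only period 1 (100 reports for 120 submissions); a ratio of exactly 1 is not flagged.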

I don't see how the closing down of the botnet in November would cause the change. I don't remember rumors about a change in the parser. Do I need new glasses?


Odd...I would have thought that the ratio would pretty much always be greater than 1:1. I do see in the 1yr graph that this also seems to have happened in May 2008.

Of course we don't really know exactly how either of the numbers in the equation is collected; that may be an issue.

-- rick

Of course we don't really know exactly how either of the numbers in the equation is collected; that may be an issue.

I agree, I would have thought there would have been at least a 1:1 :: submit:sent.

I would have thought the numbers were just a counting issue, but you may have a point. Step (1) when you see a result you don't understand is: check and understand the data.

Looking at the daily-weekly charts makes me wonder if it is a processing restriction problem. Is a processor coming offline at night?

If incoming submissions and building the SCBL are given priority over outgoing reports, that would go some way towards explaining the drop in the sent graph when there isn't enough processor power or bandwidth. When incoming volume drops, the spooled outgoing reports can be sent and then counted.

But the same daily cycle in the sent line is there even with the lower input levels earlier this week. If it were a processor or bandwidth constraint, I wouldn't expect the swing to be the same. Then again, maybe lower submission levels trigger more processing per input, and thus more output.
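The priority hypothesis can be made concrete with a toy model: each hour the system has a fixed processing capacity, queued submissions are handled first, and whatever capacity is left drains a spool of outgoing reports. This is a hypothetical sketch; nothing is known about how SpamCop actually schedules its work, and the function name, the one-unit-per-item cost and the capacity figures are assumptions made for illustration.

```python
def simulate_reports(incoming, capacity, reports_per_submission=1):
    """Toy model: incoming submissions take priority over sending reports.

    incoming: per-hour submission counts (hypothetical numbers)
    capacity: work units available per hour; parsing one submission or
              sending one report is assumed to cost one unit each
    Returns the number of reports actually sent in each hour.
    """
    backlog = 0  # submissions waiting to be parsed
    spool = 0    # reports generated but not yet sent
    sent_per_hour = []
    for n in incoming:
        backlog += n
        parsed = min(backlog, capacity)       # submissions get capacity first
        backlog -= parsed
        spool += parsed * reports_per_submission
        sent = min(spool, capacity - parsed)  # leftover capacity drains the spool
        spool -= sent
        sent_per_hour.append(sent)
    return sent_per_hour
```

With a capacity of 10, `simulate_reports([10, 10, 2, 2], 10)` returns `[0, 0, 8, 8]`: nothing goes out while submissions saturate the system, and the spool drains once input drops, which is the kind of swing described above. In such a model the swing would shrink at lower input levels, which is the objection raised in the post.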

Rick, I guess you are correct: we don't know all the numbers in the equation.


The relative positions of the blue and the green on the stats charts and their frequent flip-flops have been an abiding mystery since the start. No public explanation has ever been offered - the general supposition being such might somehow provide aid and comfort to the enemy. Wazoo was told once (I seem to recall an ancient admission somewhere in these very pages). He can probably tell you but maybe has to shoot you immediately after. "The quest for knowledge was never easy," as my Auntie Betty would say. She was the one known to generations of her devoted pupils as 'Madam Lash'.


The radical change in the relationship between the green and blue elements on the graph is indeed bizarre, and this is the kind of topic that would generally have gained traction in the SC newsgroups. <_<

DT

The radical change in the relationship between the green and blue elements on the graph is indeed bizarre, and this is the kind of topic that would generally have gained traction in the SC newsgroups. <_<
No reason not to explore/wonder some more about it here. The fundamental limitation is stated on the statistics page itself:
Total spam report volume

These graphs show the number of messages submitted as spam along with the number of reports consummated regarding those messages. This data reflects more about SpamCop's usage patterns than it does about the spam. These numbers now reflect only a small fraction of total spam being processed by SpamCop, but they are still representative of the total.

There seems to be a certain periodicity in the charts at every scale (and the Internet Archive/Wayback Machine - http://web.archive.org/web/*/http://www.sp....shtml?spamyear - has charts for earlier times). Loading alone doesn't seem to explain it, especially WRT the November reductions, as already mentioned, and given the above statement that the numbers are merely 'representative'.

The 'green' and the 'blue', taken independently, look tantalizingly like the periodicities seen in 'textbook' time series, though that is probably coincidental (and I confess I'm not quite up for Fourier analysis anyway - though Rick probably is :D). While loading seems to be deprecated as a 'cause' (if you can call it that), I wonder what the line for the sum of 'spam' plus 'reports sent' would look like. Certainly not a straight line, but deviations from the average (or trend) line(s) might be instructive. The numbers which are plotted would be useful for any such analysis - though good enough estimates could be made 'by eye' from the charts.
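The sum-and-deviation idea can be sketched as follows, assuming the plotted daily values for spam submitted and reports sent were available as two equal-length lists (they are not published; readings 'by eye' from the charts would have to do). The function names are hypothetical; the trend is an ordinary least-squares straight line against day number.

```python
def linear_fit(ys):
    """Least-squares line through (0, y0), (1, y1), ...; returns (slope, intercept)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def deviations_of_sum(spam, reports):
    """Residuals of (spam + reports) from its least-squares trend line."""
    total = [s + r for s, r in zip(spam, reports)]
    slope, intercept = linear_fit(total)
    return [y - (intercept + slope * x) for x, y in enumerate(total)]
```

Residuals that alternate in a regular pattern would point to the periodicity mentioned above; residuals that look like noise would suggest the sum simply follows its trend.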

All the exciting stuff - the actual 'whys and wherefores' might be unknown (and perhaps unknowable apart from some special cases like the November downturn and gradual recovery) but some empirical relationships could be found. Personally, I wouldn't think that would be sufficient 'payback' for the effort. There are other aspects/conjectures that could be considered (though having nothing specific in mind at this time).


Although I commented on it, I'm not really all that interested in why it appears so illogical at the moment, and idle speculation from anyone but the actual admins is probably a waste of time and electrons. :)

DT

I agree, I would have thought there would have been at least a 1:1 :: submit:sent.

I believe I can get away with stating that this would not be true. For instance, other Forum sections/FAQ entries contain information about certain things that do not create an outgoing Report.


Sorry, I didn't mean to plant this question and run off. I worked a local election today, in part spending the last 14 hours in small talk that would bore a 3-year-old. When my neurons walk up I will look at the last this again.

...When my neurons walk up I will look at the last this again.
Ah, another case of "Mommy brain". Did it come about from talking to the candidates, or the voters/electors?


Has anyone taken a look at the graph over the last few days? Big change...for the last three days, the green and the blue are in sync and it seems to make sense. Something got changed somewhere....

DT

Ah, another case of "Mommy brain". Did it come about from talking to the candidates, or the voters/electors?

It was an uncontested election for all town seats, so the candidates seemed to have taken the Hippocratic oath, i.e. do no harm (translation: say nothing).

Explaining "vote for one" did seem to be a difficult concept for some to grasp. This is a small town and we were using paper ballots. As my mind sat idle, this occurred to me: "Vote by placing an 'X' in the box by the name you want to vote for. If you can't spell that, you may place a check mark in the box."

What really atrophied my mind was all that time trapped with 7 other adults with nothing in common to discuss except TV shows (American Idle, dance with the "stars", reality shows), Hollywood types, ...

David T is right (of course): the blue and green lines seem to be in sync this week. This is really odd, but without any hope of real information this seems to be just an idle exercise.

...David T is right (of course) the blue and green lines seem to be in sync this week. This is really odd but without any hope of real information this seems to be just an idle exercise.
Very likely. Recalling that saying ... "If it seems to make sense then you're hopelessly confused."
...Explaining "vote for one" did seem to be a difficult concept for some to grasp. This is a small town and we were using paper ballots. As my mind sat idle, this occurred to me: "Vote by placing an 'X' in the box by the name you want to vote for. If you can't spell that, you may place a check mark in the box."...
Ah, complexity was needed. You might have added the numeral "1" to the options and argued it out later with the tally people. I have mentioned elsewhere our forthcoming referendum. It seems we are required to write "No" or "Yes" to the (single) proposition but, for those who may find that a challenge, a check (tick) will suffice for "Yes", whereas a cross (X) does not signify "No"; that is an invalid vote because it is deemed ambiguous in the circumstances. Quite right too :D http://www.google.com.au/search?hl=en&...p;oq=check+mark


Well whatever, but both the blue and green lines seem to be about a third lower this last month.

I have also noticed that since about the 9th September my personal spam volume which has averaged 110/day since January has dropped to 30/day.

That's nice, but why?

Are the actual numbers from SpamCop's statistics available so they can be smoothed?
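Smoothing of this kind is usually done with a moving average. A minimal Python sketch of a trailing moving average, assuming the daily counts were available as a list (they are not published, and the function name is hypothetical); a window of 7 days is an assumption chosen so that any weekly pattern averages out.

```python
def moving_average(values, window=7):
    """Trailing moving average; the first window-1 points have no full window and are omitted."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        if i >= window - 1:
            out.append(running / window)
    return out
```

For example, `moving_average([1, 2, 3, 4, 5], window=3)` gives `[2.0, 3.0, 4.0]`; the smoothed series is `window - 1` points shorter than the input.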

...Are the actual numbers from spamcop statistics available so they can be smoothed ?
I guess we need Wazoo's input on that. The graphs are resident on JT's cesmail server but I can't recall it being ever spelled out where they are 'drawn' and where the data are resident.

Yes, since the beginning of August [pre-dating the implementation of the new version of SpamCop (v4.6) by several weeks] the trends are apparently downwards, and the relationship of the blue 'reports sent' line to the green 'spam sent' bars has the reports consistently less than the spam - which is a new thing, a totally different pattern, in the context of the past year. I may be kidding myself, but half a lifetime spent analyzing such stuff makes me believe that, even by eyeball, both factors are the proverbial 'statistically significant with a high degree of confidence'. Reasons? Yes, that is the question. Plenty of room for conjecture, but I have no useful explanation off-hand.


The monthly (daily moving number) IronPort Threat Operations Center figures for "worldwide spam volume" are available at:

http://www.senderbase.org/home/detail_spam...een=&order= (chart and table). This does not show a reducing trend over the period. It is 'seeing' about 178 billion spam daily or roughly 2,000 times the volume that SC does over the same period. Presumably IronPort sees the spam which doesn't evade its filters/blocks whereas SC sees the stuff which

  • does evade a variety of blocks (including IronPort and others)
  • or is filter-bypassed (including IronPort and others)
  • or is picked up after filtering (including IronPort and others)

or is picked up after filtering (including IronPort and others)

I myself report a lot that was filtered by SC, by Postini, or by both, so that sounds the most plausible to me. Interestingly, I got some bizarre feedback from reported providers lately asking me to block entire IP ranges...

... Interestingly, I got some bizarre feedback from reported providers lately asking me to block entire IP ranges...
Yes, that is interesting - they want you to do that, as opposed to doing their own (outwards) spam/port 25 blocking, or even (heaven forfend) regaining control of their own networks? Bizarre indeed - but I guess it suits their economic model. I imagine it confirms SC continues to have an effect despite there being 'better' BLs for dynamically-allocated IP addresses, especially when reporters such as yourself report stuff caught by your spam filters. Heck, the wasted bandwidth when providers don't do their job (as we imagine it). That's the problem with 'economics'.


The monthly (daily moving number) IronPort Threat Operations Center figures for "worldwide spam volume" are available at:

http://www.senderbase.org/home/detail_spam...een=&order= (chart and table). This does not show a reducing trend over the period. It is 'seeing' about 178 billion spam daily or roughly 2,000 times the volume that SC does over the same period. Presumably IronPort sees the spam which doesn't evade its filters/blocks whereas SC sees the stuff which

  • does evade a variety of blocks (including IronPort and others)
  • or is filter-bypassed (including IronPort and others)
  • or is picked up after filtering (including IronPort and others)

Hm. Another topic (like the SC statistics) where we have no idea what is being measured or even what definition of spam is being used. Of course, users of greylisting will never see or count a lot of spam, and this includes me as a user of SpamCop Mail for direct-to-SC mail (10%). However, apart from this I think none of my mail is filtered, though it is tagged by some mail handlers (and Tiscali seems to be using Brightmail, which for my profile inserts a header line "X-Brightmail: Message is probably unwanted (spam)" but doesn't tag).

I wanted yearly data and found Ironport had some.

http://www.senderbase.org/home/spam_watch

As did MessageLabs:

http://www.messagelabs.co.uk/intelligence.aspx

which don't correlate of course.

Postini should have some but I can't find anything.

With only the graphs to look at and no long term numbers to digest, it seems to me that there has been a change in the relationship between the number of spam submitted and the number of reports sent.

This is speculation only, but perhaps the process has been changed, and priority is given to receipt of reports over sending out reports to ISPs, or additional checking is taking place prior to sending of messages to ISPs causing a lag. Either of these could depend on the workload of the sending/receiving processes.

The monthly (daily moving number) IronPort Threat Operations Center figures for "worldwide spam volume" are available at:

http://www.senderbase.org/home/detail_spam...een=&order= (chart and table). This does not show a reducing trend over the period. It is 'seeing' about 178 billion spam daily or roughly 2,000 times the volume that SC does over the same period. Presumably IronPort sees the spam which doesn't evade its filters/blocks whereas SC sees the stuff which

  • does evade a variety of blocks (including IronPort and others)
  • or is filter-bypassed (including IronPort and others)
  • or is picked up after filtering (including IronPort and others)

Strangely, my own experience reflects SpamCop's statistics, i.e. a decreasing trend. My own spam received is skewed, since I actively block traffic from APNIC registered blocks, South America, the Middle East, Africa, and the Former Soviet Union. Due to limited rule capacity on my mail server imposed by my service provider, I have to periodically edit the 130 rules, removing inactive blocks and inserting recently active blocks.

Prior to the active blocking, my daily spam received was usually a little over 100/day. Following implementation of the rules, that dropped immediately to around 30/day, increasing to about 40/day over 12 months or so, but over the last 4 months or so it has gradually dropped to around 15/day.

It looks to me as though there is a genuine fall-off in spam from my unfiltered sources (mainly Western Europe and North America). This doesn't appear to me to be the usual seasonal variation.

Cheers,

Joe

..Strangely, my own experience reflects Spamcop's statistics, ie a decreasing trend. ...
Thanks Joe - I note the monthly IronPort numbers (for 28 Oct - 27 Nov 2009) now also show a reducing spam tendency - which is statistically significant at the 1% level (which is significant, right enough - IronPort Trend November 2009). This holds up even when the (possibly atypical) high 28 October number is omitted. Something, indeed, seems to have changed in our world. The IronPort numbers may contain some unknowns but, when the real-life experience of individual spam sufferers seems to confirm the decline, we can probably ignore those.

But, in accordance with the immutable laws of an unkind universe, I have a feeling in my waters that it is "costing us", somehow (probably financially through provider charges holding up though real volume costs decline). Well, who cares? As long as we get less spam and most of our 'goodmail' (as Mike Easter might call it). I can't remember a time before, other than those brief respites when a major botnet/zombie network might be closed down, when this has happened.

I do think I may open a bottle of dandelion wine.

[edit - added TL chart. {hic}

The numbers say that almost 28% of the daily change can be accounted for by a daily reduction of about 2 billion spam and that the odds of that happening entirely by chance are only about 1 in 400. We would be ready to suspect 'something' was happening once the odds got to as long as 1 in 20 so, with much longer odds, these figures are giving quite a strong hint indeed. It could (still) all be part of some majestic cycle, destined to reverse itself in good time. Periodicity in spam numbers has been seen before - but these reductions, apparently with no associated major incident, do seem a bit promising. As always, ICBW.]
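The figures quoted read like the output of a simple least-squares regression of daily spam volume on day number: presumably the slope is the daily change (about 2 billion), R² is the share of variation the line accounts for (almost 28%), and the p-value (about 1 in 400) tests whether the slope could plausibly be zero. A minimal standard-library Python sketch of those quantities, assuming the daily IronPort values have been copied into a list; it is not the spreadsheet method actually used, and the function name is hypothetical. The p-value itself needs Student's t distribution on n - 2 degrees of freedom, e.g. `2 * scipy.stats.t.sf(abs(t), n - 2)` if SciPy is available.

```python
import math


def trend_statistics(ys):
    """Least-squares trend of a daily series against day number 0, 1, 2, ...

    Returns (slope, r_squared, t_statistic). The two-sided p-value for the
    slope is P(|T| > |t|) for Student's t with len(ys) - 2 degrees of
    freedom; that needs a stats library or table and is not computed here.
    """
    n = len(ys)
    if n < 3:
        raise ValueError("need at least three points")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    syy = sum((y - mean_y) ** 2 for y in ys)
    if syy == 0:
        raise ValueError("series is constant; R squared is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    r_squared = sxy * sxy / (sxx * syy)
    if r_squared >= 1.0:
        # Perfect straight line: the t statistic is unbounded.
        return slope, 1.0, math.copysign(math.inf, slope)
    # t for the slope, written in terms of R squared: t^2 = R^2 (n - 2) / (1 - R^2)
    t_stat = math.sqrt(r_squared * (n - 2) / (1 - r_squared))
    return slope, r_squared, math.copysign(t_stat, slope)
```

A negative slope with a large |t| is what a "statistically significant decline" means here; as the post says, it does not rule out a longer cycle that has yet to turn.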

But, in accordance with the immutable laws of an unkind universe, I have a feeling in my waters that it is "costing us", somehow (probably financially through provider charges holding up though real volume costs decline). Well, who cares? As long as we get less spam and most of our 'goodmail' (as Mike Easter might call it). I can't remember a time before, other than those brief respites when a major botnet/zombie network might be closed down, when this has happened.

I do think I may open a bottle of dandelion wine.

I don't know about anyone else, but my spam was averaging about 15/day (following blocking) at the time of my last post. Over the last couple of weeks, it crept up to 24/day-ish, but I was still happy as my previous rate before the apparent drop-off was about 30/day. Today I got 35....

I think you can put the cork back on that dandelion wine.

Joe

...I think you can put the cork back on that dandelion wine.
Too late! (I need little encouragement to drink and no open bottle is ever going to last a fortnight). Interestingly, the SenderBase-IronPort numbers continue their decline (the 'evidence' being stronger than ever with additional data). Maybe that 'decline' is spurious, some artefact of the IronPort methodology or maybe the message base 'sea' is so huge it can contain contradictory currents. Perhaps both of those.

To demonstrate my dedication to the gratuitous consumption of ethanol products I confess

  • the documented trend is an unsatisfactory model - extrapolation implies a daily spam volume at Excel day zero (31/12/1899) of some 85 trillion and I'm sure we would have heard from our ancestors if that was the actual case
  • the IronPort tables incidentally imply a closely-linked decline in real e-mail/ham, which decline is counter-intuitive and which linkage is very suspect
  • the daily relative change (% change from previous day) is almost totally random/uncorrelated over time, implying some sort of parametric effect in the absolute numbers that I don't quite understand but which certainly doesn't give much confidence in that apparent trend
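The first and third points can be checked with a few lines of code. A minimal Python sketch, with hypothetical helper names and assuming the daily SenderBase figures have been copied into lists: `excel_serial` reproduces the spreadsheet's day numbering (serial 0 displays as 31/12/1899; the 30/12/1899 epoch used here is the usual workaround for Excel's fictitious 29 February 1900 and is valid only for dates from March 1900 on), `value_at_day_zero` extrapolates the least-squares line back to that day, and `lag1_autocorrelation` applied to `pct_changes` shows whether successive day-on-day changes are related.

```python
from datetime import date

# Valid for dates after 28 Feb 1900 (Excel counts a non-existent 29 Feb 1900).
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial(d):
    """Excel-style serial day number for a date after 28 Feb 1900."""
    return (d - EXCEL_EPOCH).days


def value_at_day_zero(days, volumes):
    """Fit volume = a + b * serial_day by least squares and return a (the value at day 0)."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(volumes) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, volumes))
    sxx = sum((x - mean_x) ** 2 for x in days)
    slope = sxy / sxx
    return mean_y - slope * mean_x


def pct_changes(values):
    """Day-on-day relative change, in percent."""
    return [100.0 * (b - a) / a for a, b in zip(values, values[1:])]


def lag1_autocorrelation(series):
    """Lag-1 autocorrelation; values near 0 mean successive changes look uncorrelated."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    return num / denom
```

Because serial numbers for late 2009 are around 40,000, even a modest negative slope extrapolated back to day 0 produces an absurdly large intercept, which is the point of the first bullet: a straight line is only a local description.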

So, if celebrations are premature, if our hopes are dashed once again, perhaps an angel weeps in heaven for us, but in any event we need to keep our defences in place and return to the fray. But hope is a good thing. Despite my own spam numbers increasing (6 in November, 4 so far in December), I note that the presumably malicious 'payload' websites in those I have received lately all appear to have been closed down before I even received the spam. I see this through DNS records; my ever-vigilant ISP (bless his little white cotton socks) has taken to black-holing those places, making direct investigation difficult.

I must go, I have that fourth December spam to report.

...Interestingly, the SenderBase-IronPort numbers continue their decline (the 'evidence' being stronger than ever with additional data). ...
Well, "interesting" for me because I don't understand the dynamic of it. Updated view of the spam data and trend, with similar tacked on below it for the ham/goodmail (illustrating the point about inferred ham:spam close linkage in the IronPort numbers). http://img23.imageshack.us/img23/4872/iron...oct8dec2009.jpg

The 24% increases 8-Dec/7-Dec are frankly incredible and (one might think) perhaps not even possible. But, anyway, FWIW, an indicated trend of 2 billion less spam globally per day, or 1% per day decline. Sadly, perhaps not borne out by SC reporter experience.
