Full Version: Reporting Statistics
Lking
With only the graphs to look at and no long-term numbers to digest, it seems to me that there has been a change in the relationship between the number of spam submitted and the number of reports sent.

It seems to me that in the past the number of reports sent was always larger than the number submitted. Now, after last November, the relationship seems reversed.

I don't see how the closing down of the botnet in November would cause the change. I don't remember rumors about a change in the parser. Do I need new glasses?
rconner
Odd...I would have thought that the ratio would pretty much always be greater than 1:1. I do see in the 1yr graph that this also seems to have happened in May 2008.

Of course we don't really know exactly how either of the numbers in the equation are collected, that may be an issue.

-- rick

Lking
QUOTE(rconner @ May 3 2009, 11:16 AM) *
Of course we don't really know exactly how either of the numbers in the equation are collected, that may be an issue.

I agree, I would have thought there would have been at least a 1:1 :: submit:sent.
I would have thought the numbers were just a counting issue, but you may have a point. Step (1) when you see a result you don't understand is: check/understand the data.

Looking at the daily-weekly charts makes me wonder if it is a processing restriction problem. Is a processor coming offline at night?

If you give incoming/submits and building the SCBL priority over outgoing reports, it kind of explains the drop in the sent graph if you don't have enough processor power or bandwidth. When incoming drops the spooled outgoing can be sent and then counted.

But the same daily cycle in the sent line is there even with lower input levels earlier this week. If it is a processor or bandwidth constriction I wouldn't think the swing would be the same. Then again maybe the lower submit levels trigger more processing per input thus more output.

Rick, I guess you are correct; we don't know all the numbers in the equation.
Farelf
The relative positions of the blue and the green on the stats charts and their frequent flip-flops have been an abiding mystery since the start. No public explanation has ever been offered - the general supposition being such might somehow provide aid and comfort to the enemy. Wazoo was told once (I seem to recall an ancient admission somewhere in these very pages). He can probably tell you but maybe has to shoot you immediately after. "The quest for knowledge was never easy," as my Auntie Betty would say. She was the one known to generations of her devoted pupils as 'Madam Lash'.
DavidT
The radical change in the relationship between the green and blue elements on the graph is indeed bizarre, and this is the kind of topic that would generally have gained traction in the SC newsgroups.

DT
Farelf
QUOTE(DavidT @ May 4 2009, 09:00 PM) *
The radical change in the relationship between the green and blue elements on the graph is indeed bizarre, and this is the kind of topic that would generally have gained traction in the SC newsgroups.
No reason not to explore/wonder some more about it here. Fundamental limitation is
QUOTE(http://www.spamcop.net/spamstats.shtml)
Total spam report volume

These graphs show the number of messages submitted as spam along with the number of reports consumated regarding those messages. This data reflects more about SpamCop's usage patterns than it does about the spam. These numbers now reflect only a small fraction of total spam being processed by SpamCop, but they are still representative of the total.
There seems to be a certain periodicity about the charts at every scale (and the internet archive/wayback machine - http://web.archive.org/web/*/http://www.sp....shtml?spamyear - has charts for earlier times). Loading alone doesn't seem to explain it (especially WRT the November reductions, as already mentioned, and given the above statement that the numbers are merely 'representative').

The 'green' and the 'blue', taken independently, look tantalizingly like the periodicities seen in 'textbook' time series, though that is probably coincidental (and I confess I'm not quite up for Fourier analysis anyway - though Rick probably is). While loading seems to be deprecated as a 'cause' (if you can call it that), I wonder what the line for the sum of 'spam' plus 'reports sent' would look like? Certainly not a straight line but deviations from the average (or trend) line(s) might be instructive. The numbers which are plotted would be useful for any such analysis - though good enough estimates could be made 'by eye' from the charts.

All the exciting stuff - the actual 'whys and wherefores' - might be unknown (and perhaps unknowable apart from some special cases like the November downturn and gradual recovery) but some empirical relationships could be found. Personally, I wouldn't think that would be sufficient 'payback' for the effort. There are other aspects/conjectures that could be considered (though I have nothing specific in mind at this time).
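
As a rough sketch of the "sum plus deviations from trend" idea, assuming the daily values have been estimated by eye from the charts: the numbers below are hypothetical placeholders rather than real SpamCop data, and `linear_trend` is just an illustrative helper name for an ordinary least-squares straight line.

```python
# Hypothetical daily counts read "by eye" from a chart; NOT real SpamCop data.
submitted = [100, 120, 90, 110, 130, 95, 105]
reports_sent = [80, 70, 100, 85, 60, 95, 90]


def linear_trend(values):
    """Fit a least-squares straight line through (0, v0), (1, v1), ...
    and return the fitted value at each position."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * x for x in range(n)]


# Sum of 'spam' plus 'reports sent' for each day.
combined = [s + r for s, r in zip(submitted, reports_sent)]

# Deviation of each day's combined total from the fitted trend line.
trend = linear_trend(combined)
deviations = [c - t for c, t in zip(combined, trend)]

for day, dev in enumerate(deviations):
    print(f"day {day}: deviation from trend = {dev:+.1f}")
```

Large, regularly spaced deviations would hint at a periodic effect, while deviations that look like noise would suggest there is little to find in the combined line.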
DavidT
Although I commented on it, I'm not really all that interested in why it appears so illogical at the moment, and idle speculation from anyone but the actual admins is probably a waste of time and electrons.

DT
Wazoo
QUOTE(Lking @ May 3 2009, 11:58 AM) *
I agree, I would have thought there would have been at least a 1:1 :: submit:sent.

I believe I can get away with stating that this would not be true ... one item for instance, pointing out that other Forum sections/FAQ entries contain data about certain things that do not create an outgoing Report.
Lking
Sorry, I didn't mean to plant this question and run off. I worked a local election today, in part spending the last 14 hours in small talk that would bore a 3-year-old. When my neurons walk up I will look at the last this again.
Farelf
QUOTE(Lking @ May 5 2009, 10:53 AM) *
...When my neurons walk up I will look at the last this again.
Ah, another case of "Mommy brain". Did it come about from talking to the candidates, or the voters/electors?
DavidT
Has anyone taken a look at the graph over the last few days? Big change...for the last three days, the green and the blue are in sync and it seems to make sense. Something got changed somewhere....

DT
Lking
QUOTE(Farelf @ May 4 2009, 11:38 PM) *
Ah, another case of "Mommy brain". Did it come about from talking to the candidates, or the voters/electors?

It was an uncontested election for all town seats, so the candidates seemed to have taken the Hippocratic oath, i.e. do no harm (translate as: say nothing).

Explaining "vote for one" did seem to be a difficult concept for some to grasp. This is a small town and we were using paper ballots. As my mind sat in idle this occurred to me: "Vote by placing an "X" in the box by the name you want to vote for. If you can't spell that you may place a check mark in the box."

What really atrophied my mind was all that time trapped with 7 other adults with nothing in common to discuss except TV shows (American Idle, dance with the "stars", reality shows), Hollywood types, ...

David T is right (of course) the blue and green lines seem to be in sync this week. This is really odd but without any hope of real information this seems to be just an idle exercise.
Farelf
QUOTE(Lking @ May 13 2009, 02:53 AM) *
...David T is right (of course) the blue and green lines seem to be in sync this week. This is really odd but without any hope of real information this seems to be just an idle exercise.
Very likely. Recalling that saying ... "If it seems to make sense then you're hopelessly confused."
QUOTE(Lking @ May 13 2009, 02:53 AM) *
...Explaining "vote for one" did seem to be a difficult concept for some to grasp. This is a small town and we were using paper ballots. As my mind sat in idle this occurred to me: "Vote by placing an "X" in the box by the name you want to vote for. If you can't spell that you may place a check mark in the box."...
Ah, complexity was needed. You might have added the numeral "1" to the options and argued it out later with the tally people. I have mentioned elsewhere our forthcoming referendum. Seems we are required to write "No" or "Yes" to the (single) proposition but, for those who may find that a challenge, a check (tick) will suffice for "Yes" but a cross (X) does not signify "No", that is an invalid vote because it is deemed ambiguous in the circumstance. Quite right too. http://www.google.com.au/search?hl=en&...p;oq=check+mark
michaelanglo
Well whatever, but both the blue and green lines seem to be about a third lower this last month.

I have also noticed that since about the 9th September my personal spam volume which has averaged 110/day since January has dropped to 30/day.

That's nice, but why?

Are the actual numbers from SpamCop statistics available so they can be smoothed?
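
If the underlying numbers were available (or estimated from the graphs), one simple way to smooth them would be a moving average. This is only a minimal sketch: `moving_average` is an illustrative helper, the window length is an arbitrary choice, and the daily counts are hypothetical values for illustration, not real statistics.

```python
def moving_average(values, window):
    """Simple moving average over `window` consecutive points.
    Returns len(values) - window + 1 smoothed values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    smoothed = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the new point, drop the oldest one.
        running += values[i] - values[i - window]
        smoothed.append(running / window)
    return smoothed


# Hypothetical daily spam counts, for illustration only.
daily = [110, 120, 100, 30, 35, 25, 30]
smoothed = moving_average(daily, 3)
print(smoothed)
```

A longer window gives a smoother line but reacts more slowly to a genuine step change such as a sudden drop in volume.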

Farelf
QUOTE(michaelanglo @ Sep 20 2009, 07:30 AM) *
...Are the actual numbers from SpamCop statistics available so they can be smoothed?
I guess we need Wazoo's input on that. The graphs are resident on JT's cesmail server but I can't recall it being ever spelled out where they are 'drawn' and where the data are resident.

Yes, since the beginning of August [pre-dating the implementation of the new version of SpamCop (v4.6) by several weeks] the trends are apparently downwards and the relationship of the blue 'reports sent' line to the green 'spam sent' bars has the reports consistently less than the spam - which is a new thing, a totally different pattern, in the context of the past year. I may be kidding myself but half a lifetime spent analyzing such stuff makes me believe that, even by eyeball, both factors are the proverbial 'statistically significant with a high degree of confidence'. Reasons? Yes, that is the question. Plenty of room for conjecture but I have no useful explanation off-hand.
Farelf
The monthly (daily moving number) IronPort Threat Operations Center figures for "worldwide spam volume" are available at:
http://www.senderbase.org/home/detail_spam...een=&order= (chart and table). This does not show a reducing trend over the period. It is 'seeing' about 178 billion spam daily or roughly 2,000 times the volume that SC does over the same period. Presumably IronPort sees the spam which doesn't evade its filters/blocks whereas SC sees the stuff which
  • does evade a variety of blocks (including IronPort and others)
  • or is filter-bypassed (including IronPort and others)
  • or is picked up after filtering (including IronPort and others)
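
For scale, the two figures quoted above imply a daily SpamCop volume that can be worked out directly. This is only back-of-envelope arithmetic on the rounded numbers in the post ("about 178 billion" and "roughly 2,000 times"), so the result is an order-of-magnitude estimate, not a reported statistic.

```python
# Rounded figures as quoted in the post above.
ironport_daily = 178_000_000_000  # "about 178 billion spam daily"
ratio = 2_000                     # "roughly 2,000 times the volume that SC does"

# Implied SpamCop daily volume under those two rounded figures.
spamcop_daily = ironport_daily // ratio
print(f"{spamcop_daily:,}")
```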
dra007
QUOTE(Farelf @ Sep 20 2009, 05:26 AM) *
  • or is picked up after filtering (including IronPort and others)

I myself report a lot that was filtered by SC or Postini, or both, so that sounds most plausible to me. Interestingly, I got some bizarre feedback from reported providers lately asking me to block entire IP ranges...
Farelf
QUOTE(dra007 @ Sep 20 2009, 11:43 PM) *
... Interestingly I got some bizarre feedback from reported providers lately asking me to block entire IP ranges...
Yes, that is interesting - they want you to do that, as opposed to doing their own (outwards) spam/port 25 blocking, or even (heaven forfend) regaining control of their own networks? Bizarre indeed - but I guess it suits their economic model. I imagine it confirms SC continues to have effect despite there being 'better' BLs for dynamically-allocated IP addresses, especially when reporters such as yourself report stuff caught by your spam filters. Heck, the wasted bandwidth when providers don't do their job (as we imagine it). That's the problem with 'economics'.
michaelanglo
QUOTE(Farelf @ Sep 20 2009, 10:26 AM) *

The monthly (daily moving number) IronPort Threat Operations Center figures for "worldwide spam volume" are available at:
http://www.senderbase.org/home/detail_spam...een=&order= (chart and table). This does not show a reducing trend over the period. It is 'seeing' about 178 billion spam daily or roughly 2,000 times the volume that SC does over the same period. Presumably IronPort sees the spam which doesn't evade its filters/blocks whereas SC sees the stuff which
  • does evade a variety of blocks (including IronPort and others)
  • or is filter-bypassed (including IronPort and others)
  • or is picked up after filtering (including IronPort and others)


Hm. Another topic (like the SC statistics) where we have no idea what is being measured or even what definition of spam is being used. Of course users of greylisting will never see or count a lot of spam, and this includes me as a user of SpamCop Mail for direct-to-SC mail (10%). However, except for this I think none of my mail is filtered, though it is tagged by some mail handlers (and Tiscali seems to be using Brightmail, which for my profile inserts a header line "X-Brightmail: Message is probably unwanted (spam)" but doesn't tag).

I wanted yearly data and found Ironport had some.

http://www.senderbase.org/home/spam_watch

As did MessageLabs

http://www.messagelabs.co.uk/intelligence.aspx

which don't correlate of course.

Postini should have some but I can't find anything.