FAQ Entry: SenderBase's "Magnitude" Explained


Jeff G.

SenderBase's "Magnitude" appears to be on a logarithmic scale using Base Ten, such that Estimated Daily Email Volume equals 1.34 x 10^Magnitude, as follows:

Magnitude 0 = 1.34 Estimated Daily Email Volume

Magnitude 1 = 13.4 Estimated Daily Email Volume

Magnitude 2 = 134 Estimated Daily Email Volume

Magnitude 3 = 1.34 Thousand Estimated Daily Email Volume

Magnitude 4 = 13.4 Thousand Estimated Daily Email Volume

Magnitude 5 = 134 Thousand Estimated Daily Email Volume

Magnitude 6 = 1.34 Million Estimated Daily Email Volume

Magnitude 7 = 13.4 Million Estimated Daily Email Volume

Magnitude 8 = 134 Million Estimated Daily Email Volume

Magnitude 9 = 1.34 Billion (Thousand Million in UK English) Estimated Daily Email Volume

The ratio between adjacent displayed Magnitudes (an increase in Magnitude of 0.1) is the tenth root of ten, or approximately 1.2589254117941672104239541063958 (as calculated by a Pentium).
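The relationship in this post can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the constant 1.34 is the poster's empirical fit rather than a figure published by SenderBase, and the function names are made up for the example.

```python
import math

# Empirical constant from the post above (an assumption, not a SenderBase figure).
K_OBSERVED = 1.34

def volume_from_magnitude(magnitude, k=K_OBSERVED):
    """Estimated daily email volume for a magnitude: k * 10^magnitude."""
    return k * 10 ** magnitude

def magnitude_from_volume(volume, k=K_OBSERVED):
    """Inverse mapping: magnitude = log10(volume / k)."""
    return math.log10(volume / k)

# Ratio between two volumes whose magnitudes differ by 0.1 (tenth root of ten).
STEP_RATIO = 10 ** 0.1

if __name__ == "__main__":
    for m in range(0, 10):
        print(f"Magnitude {m} = {volume_from_magnitude(m):,.2f}")
    print(f"Ratio for a 0.1 step in magnitude: {STEP_RATIO:.6f}")
```

Because the scale is logarithmic, the inverse function is just a base-ten logarithm, so a volume of 13.4 maps back to magnitude 1.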

Edit 6-5-09 - note: email volume has increased greatly over the last few years, with today's volume estimated closer to 200 billion - please read the following posts.

  • 3 weeks later...

Similar to the Richter scale used to measure earthquakes, SenderBase's magnitude is a measure of message volume calculated using a log scale with a base of 10. The maximum theoretical value of the scale is set to 10, which equates to 100% of the world's email message volume (approximately 10 billion messages/day). Using our log scale, a one point increase in magnitude equates to a 10x increase in actual volume. For example, a domain with a magnitude of 5 would have estimated volume of 100,000/day while a sender with a magnitude of 6 would have an estimated daily volume of 1,000,000/day. The following table illustrates the percentage of Internet email associated with each magnitude:

Magnitude   Percentage of Internet email
10.0        100%
9.0         10%
8.0         1%
7.0         0.1%
6.0         0.01%
5.0         0.001%
4.0         0.0001%
3.0         0.00001%
2.0         0.0000001%
1.0         0.00000001%

  • Daily magnitude is a measure of how many messages a domain has sent over the last 24 hours. Similar to the Richter scale used to measure earthquakes, SenderBase's magnitude is a measure of message volume calculated using a log scale with a base of 10. The maximum theoretical value of the scale is set to 10, which equates to 100% of the world's email message volume (approximately 10 billion messages/day).
  • Monthly magnitude is calculated using the same approach as daily magnitude, except the percentages are calculated based on the volume of email sent over the last 30 days.
  • Estimated volume is SenderBase's "guess" of how many messages a domain is sending in a day, using a sampling of messages received by over 28,000 network owners.

Taken from http://www.senderbase.org/?page=help and http://www.senderbase.org/search?page=help_magnitude
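The scale described in the quoted help text can be sketched as follows. This is a minimal illustration, assuming only what the quote states (magnitude 10 equals 100% of volume, roughly 10 billion messages/day); the function names are hypothetical.

```python
SCALE_MAX = 10.0  # magnitude that corresponds to 100% of email volume

def share_percent(magnitude):
    """Percentage of total email volume implied by a magnitude."""
    return 100.0 * 10 ** (magnitude - SCALE_MAX)

def volume_for_total(magnitude, total_per_day=10e9):
    """Messages/day for a magnitude, given an assumed total daily volume."""
    return total_per_day * 10 ** (magnitude - SCALE_MAX)

if __name__ == "__main__":
    for m in range(10, 0, -1):
        print(f"{m:>4.1f}  {share_percent(m):.8f}%")
    print(f"Magnitude 5 at 10 billion/day: {volume_for_total(5):,.0f}")
    print(f"Magnitude 6 at 10 billion/day: {volume_for_total(6):,.0f}")
```

Worked this way, magnitude 2 comes out at 0.000001% and magnitude 1 at 0.0000001%, which is the discrepancy with the help-page table that a later post in this thread points out.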

The SenderBase volume statistics are potentially a useful tool to remind doubting domain admins of the objective evidence of unusual activity on their servers. The notation of the exponents of email volume (as "magnitude") is an elegant and consistent way to express differences over a huge range which has a somewhat variable base over time (the total "emails sent" estimate), but most people will find it a little difficult to get a "feel" for what the magnitude numbers and their differences mean. The percentage change figures help but the actual change(s) in email message numbers represented will be even more readily appreciated.

To get that, the total volume of email (magnitude 10) needs to be known - k*10^10=total. The general SenderBase stats on http://www.senderbase.org/ provide sufficient information, giving both magnitude numbers and the matching "Estimated Daily Volume (m)" (in millions) for the top 20 domains and the top 20 IP addresses.

The simplest estimate is found from the top domain - if (for example) magnitude 7.9 relates to 118.9 million messages, then k*10^7.9=118,900,000 or k=118,900,000/10^7.9 which is k=118,900,000/79,432,823 or k=1.496862 ...

A better estimate is found by calculating k for all 40 results displayed and averaging - which in an actual case (early on August 8) came to k=1.507289 ... Another estimate using all 40 pairs of data can be found from the slope coefficient of the linear regression (correlation) between these variables. Using Excel's LINEST spreadsheet function with the intercept forced to zero gave a result of k=1.535364 ... for this (same) data. This, though more complicated, is the estimate with the least associated error of estimate.
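The three ways of estimating k can be sketched in Python as below. Only the first pair in the sample data (magnitude 7.9, 118.9 million) comes from this post; the other pairs are invented for illustration, and the zero-intercept slope formula (sum of x*y divided by sum of x*x) is the standard least-squares result rather than a reproduction of the spreadsheet used here.

```python
def k_from_pair(magnitude, volume):
    """Single-pair estimate: k = volume / 10^magnitude."""
    return volume / 10 ** magnitude

def k_mean(pairs):
    """Average of the single-pair estimates over all (magnitude, volume) pairs."""
    estimates = [k_from_pair(m, v) for m, v in pairs]
    return sum(estimates) / len(estimates)

def k_regression_through_origin(pairs):
    """Least-squares slope of volume against 10^magnitude, intercept forced to zero."""
    xs = [10 ** m for m, _ in pairs]
    ys = [v for _, v in pairs]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

if __name__ == "__main__":
    # First pair is from the post; the remaining pairs are hypothetical.
    sample = [(7.9, 118.9e6), (7.8, 95.0e6), (7.5, 48.0e6)]
    print(f"Top pair:   k = {k_from_pair(*sample[0]):.6f}")
    print(f"Mean:       k = {k_mean(sample):.6f}")
    print(f"Regression: k = {k_regression_through_origin(sample):.6f}")
```

The regression weights the largest senders most heavily because their 10^magnitude values dominate the sums, which is one reason it can differ from the plain average.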

For instance, a reported 1890% difference for an IP address whose current daily magnitude is 4.9, against an average magnitude of 3.6, corresponds to a change from k*10^3.6 to k*10^4.9 messages per day. The following figures are for each of the estimates of k above:

1. Top Domain - increase from 5,959 to 118,900 (msg/d)

2. Average 40 - increase from 6,001 to 119,728

3. Regression - increase from 6,112 to 121,958

It is seen that in this instance the first and simplest estimate is close enough; the elaboration of "better" estimates of k adds little to the picture. (The simplest estimate is actually well within the probable error range of the "best" estimate.) The precision of the magnitude figures (1 decimal place) is insufficient to exactly replicate the percentage difference given in SenderBase - in other words, rounding errors are appreciable (though minor).

Note the estimated total daily volume in this example is variously 15.0 (U.S.) billion, 15.1 billion and 15.4 billion - k*10^10. This number is the basis of the day's magnitude calculations and is, as said, somewhat variable during the day and between days.
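The figures in this post can be reproduced with a short Python sketch. The three k values are the ones quoted above; the function and variable names are hypothetical.

```python
# k values quoted in the post above.
K_ESTIMATES = {
    "Top Domain": 1.496862,
    "Average 40": 1.507289,
    "Regression": 1.535364,
}

def messages_per_day(magnitude, k):
    """Estimated messages/day: k * 10^magnitude."""
    return k * 10 ** magnitude

def percent_change(magnitude_from, magnitude_to):
    """Percentage increase implied by two magnitudes (independent of k)."""
    return (10 ** (magnitude_to - magnitude_from) - 1) * 100

if __name__ == "__main__":
    for label, k in K_ESTIMATES.items():
        low = messages_per_day(3.6, k)
        high = messages_per_day(4.9, k)
        total = messages_per_day(10, k)
        print(f"{label}: {low:,.0f} -> {high:,.0f} msg/d, total {total / 1e9:.1f} billion")
    print(f"Change from 3.6 to 4.9: {percent_change(3.6, 4.9):.0f}%")
```

Computed from the one-decimal magnitudes alone, the change comes to about 1895% rather than the 1890% SenderBase reported, which is the rounding effect mentioned above.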

However, it is rather inconsistent with "the world's email message volume (approximately 10 billion messages/day)" as stated on both http://www.senderbase.org/?page=help and http://www.senderbase.org/search?page=help_magnitude.

You need to remember that the 10 billion number comes from a static FAQ which is out of date and has not been updated (probably not since it was first created); even then it was most likely a very rough average.

A note to Farelf: thank you for an extremely well written explanation. It is probably a bit over many of our heads, but it definitely helps to clarify the issues and puts them into perspective, i.e. the exact numbers are unimportant; it is the trend in change of volume that is important, which is provided in a simplified format.

An increase of 100,000 messages a day may be a very important indicator of a spam problem for an IP whose average is only 1,000, but totally meaningless for a high volume IP.

Thanks for comments guys. Yes it is a bit involved - maybe an abstract for the "final" FAQ, if it is worth including at all.

Certainly the total volume is dynamic (note %change figures are independent of actual total volume, which is assumed to be the same throughout the period considered in this approach). I have dealt with the daily volumes only - weekly could be calculated using a similar approach but the supposed constancy of total volume might be an issue.

Variation in total volume by the 3 approximation methods as above for a couple of later snapshots to illustrate dynamism:

17:30 GMT 08-Aug-2005

1. 14.6 (b m/d)

2. 15.0

3. 14.7

00:30 GMT 09-Aug-2005

1. 13.9

2. 14.8

3. 14.4

This is about a 4% difference in the most extreme case.

To further clarify - the 10 billion m/d is impossible for the current magnitudes quoted together with matching message counts on the SenderBase entry page. It was, I am sure, simply a convenient number for explanatory purposes (or maybe - Lord help us because the difference would be mostly spam - it was a good approximation a few months ago).

I did say the rounding error on the magnitude figures is "appreciable though minor". I was forgetting these are exponentiated. 7.9 can actually be anything between 7.85 and 7.94999 .... Consequently the maximum error from this source, on the matching number of messages, is (very nearly) 10^0.1-1 or 25.89% (which applies to all magnitude numbers) - this is a bit more than minor! As a result, the previous figures quoted could all (just barely) be attributed to a "real" total message count of 14.55 billion. Over 1½ days the range of values for magnitude 7.9, based on that total count and allowing for maximum rounding error on both the magnitude number and the matching message count is 11.78%, well within the scope. However, at the bottom end of the scale, magnitude 6.1 (the most consistent minimum in the period), the reported values vary by a maximum 30.3% with a median of 23.5%.
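The rounding argument can be checked with a small Python sketch. It assumes magnitudes are rounded to one decimal place, as displayed, and uses the 14.55 billion total mentioned above purely as an example input; the function names are hypothetical.

```python
def magnitude_bounds(displayed, decimals=1):
    """Range of true magnitudes that round to the displayed value."""
    half_step = 0.5 * 10 ** -decimals
    return displayed - half_step, displayed + half_step

def max_relative_error(decimals=1):
    """Largest possible volume ratio within one displayed magnitude, minus 1."""
    return 10 ** (10 ** -decimals) - 1

def volume_range(displayed, total_per_day, decimals=1):
    """Lowest and highest volumes consistent with a displayed magnitude."""
    low_m, high_m = magnitude_bounds(displayed, decimals)
    return total_per_day * 10 ** (low_m - 10), total_per_day * 10 ** (high_m - 10)

if __name__ == "__main__":
    print(f"Maximum rounding error: {max_relative_error() * 100:.2f}%")
    low, high = volume_range(7.9, 14.55e9)
    print(f"Magnitude 7.9 at 14.55 billion/day: {low / 1e6:.1f} to {high / 1e6:.1f} million")
```

With these inputs the 118.9 million figure quoted earlier for magnitude 7.9 falls inside the computed range.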

After just 1½ days, it is looking very unlikely that *any* static number is used and certainly not 10 billion. It was looking to me like the count is dynamic (like the real world) and further analysis is not persuading me to the contrary view. The actual volatility may be a little less than is indicated by the available "deconstruction" methods (because of the rounding errors) but I remain of the view that the treatment is useful.

[update] Incidental - won't bother with a new post. Further analysis appears to confirm the SenderBase volumes are indeed dynamic, even in the short run.

SENDERBASE - DECONSTRUCTION TO TOTAL EMAIL VOLUME

CASE  DATE TIME               ESTIMATE (LR)     PROB. ERROR
1     07-Aug-2005 Late GMT    15,501,324,250    ± 88,361,842
2     08-Aug-2005 Early GMT   15,353,642,311    ± 95,602,102
3     08-Aug-2005 20:30 GMT   14,736,680,458    ± 110,571,352
4     09-Aug-2005 00:30 GMT   14,428,388,820    ± 86,177,135
5     09-Aug-2005 07:30 GMT   14,762,024,348    ± 124,900,490
6     10-Aug-2005 03:10 GMT   14,757,423,578    ± 86,217,550

PAIRED DIFFERENCES (AS STANDARD ERRORS)

CASE  1             2             3             4             5             6
1     0             -2.290258953  -10.25276948  -18.45891555  -8.775687532  -12.79216103
2     2.477919946   0             -8.272567858  -15.91817697  -7.022663427  -10.25261346
3     12.82977538   9.567875519   0             -5.303887968  0.300838747   0.356699937
4     18.00252708   14.34887779   4.13374585    0             3.960342935   5.658101628
5     12.40453631   9.1748412     -0.339824976  -5.73990742   0             -0.079115125
6     12.48173154   9.246190178   -0.278135287  -5.660755191  0.054612374   0

SenderBase data (a sample of a population) is used to estimate SenderBase total email volume (the population) by the correlation method mentioned previously. Probable errors give the range where 50% of the estimates are expected to fall. Probable error is a fraction (0.67449 ...) of the Standard error. The paired differences are like Case 2, column "1": (Case 1 estimate - Case 2 estimate)/(Case 1 Standard Error). This is a shortcut, not totally rigorous, but it should be useful/close enough since really fine discrimination is not required. Where the difference in standard errors is less than -3 or more than +3 it is unlikely that the "real" (population) volumes represented by these estimates could be the same - the odds are about 1 in 370 at that point and rise rapidly thereafter. Accordingly, it seems the actual volumes behind the SenderBase data are changing (fluctuating) rapidly, if not continuously.

There are a number of unknowns (particularly how well the SenderBase total volume mirrors what is actually happening in the world) but, as supposed, the (very accessible) SenderBase figures most probably can be converted to give a useful indication of email numbers from specific IP addresses - and the short-term changes to that traffic.
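The two statistical constants used here (the 0.67449 probable-error factor and the "1 in 370" odds at three standard errors) can be checked with Python's standard library, assuming normally distributed errors. The helper names and the example inputs are hypothetical and are not taken from the table.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def probable_error_factor():
    """Multiple of the standard error that brackets the central 50% of estimates."""
    return STD_NORMAL.inv_cdf(0.75)

def odds_beyond(z):
    """Two-sided 'one in N' odds of a difference at least z standard errors from zero."""
    tail_probability = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return 1 / tail_probability

def difference_in_standard_errors(estimate_a, estimate_b, standard_error):
    """Shortcut comparison of two estimates, scaled by one standard error."""
    return (estimate_a - estimate_b) / standard_error

if __name__ == "__main__":
    print(f"Probable error factor: {probable_error_factor():.5f}")
    print(f"Odds at 3 standard errors: 1 in {odds_beyond(3):.0f}")
    # Hypothetical inputs, for illustration only.
    print(difference_in_standard_errors(15.5e9, 15.2e9, 0.1e9))
```

The odds grow very quickly with the scaled difference, so values like 10 or 18 standard errors leave essentially no room for the two totals to be the same.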

It might even be useful to try to relate total volume estimates to peaks and troughs in SpamCop reporting.

  • 1 year later...

One problem though ... Senderbase is flawed. According to them, we have a magnitude of 3.5 which would mean we're sending between 7.37 Thousand Estimated Daily Email Volume -- yet the stats of our Exchange server from PerfMon say we've sent about 10,000 in the last 3 weeks. Our ISP's web traffic data supports this as we're not chewing up any additional bandwidth.

So, that would suggest that Senderbase is flawed. Especially, when I've written to them multiple times and they've refused to reply. That says even more about this company / product. <_<

Thanks for the data - your observations are of interest and, I think, similar to/supportive of the suspicions of a number of members. As to what the failure ('refusal' makes a judgement) to reply means I don't think we can say (beyond, obviously, they have no-one in the role of flackcatcher). Even Wazoo (our forum Admin) has commented on occasion that he has had no reply to inquiry when clearly it would have been in their interests to take the initiative in response and especially when raised in "these precincts".

Note: this post is still under construction.

Edit: basically gave up on the project and decided to make the post visible simply to get rid of the hidden post, due to the current situation - dbiel 9-20-07

The following does not directly address pgreenway's complaint, which is a separate issue, but is being posted to show that trying to calculate any specific volume for a single magnitude is impossible. Its primary value is to indicate changes in relative volume. Please note that what follows is simply additional proof of what Farelf previously presented, but relying solely on SenderBase's own calculations of volume and using a simple comparison between multiple days.

"One problem though ... Senderbase is flawed. According to them, we have a magnitude of 3.5 which would mean we're sending between 7.37 Thousand Estimated Daily Email Volume"

There is one major flaw in your logic: magnitude 3.5 does not equal any constant quantity. It represents a percentage of the estimated current daily email traffic. It would be helpful if they posted the daily total traffic used to calculate the magnitude number.

Additionally the rounding factor becomes quite large when magnitudes are listed only in tenths.

If you are into math, take the magnitudes and total volume of several of the largest, middle and smaller mail servers and calculate what the total traffic volume is. Do it for several different days and you will be very surprised by the wide spread in the answers you get for the total traffic. The following is a sample from today's listing, showing daily magnitude and daily volume. You will notice that the magnitude is a constant 7.8 but the volume ranges from a high of 102.3 million to a low of 86.3 million, which is over an 18.5% spread.

Note: I include the next higher and next lower magnitude to use as a limiter to allow for future comparisons.

Magnitude - Volume - date: March 22, 2006
---7.9------ 123.2 telecomitalia.it Netsiel S.p.A. network unknown 
---7.8-------102.3 charter.com CHARTER COMMUNICATIONS NSP 
---7.8-------100.5 proxad.net Proxad / Free SAS NSP 
---7.8--------98.2 ttnet.net.tr Turk Telekom unknown 
---7.8--------97.0 hinet.net CHTD, Chunghwa Telecom Co., Ltd. NSP 
---7.8--------95.8 163data.com.cn CHINANET-ZJ Hangzhou node network unknown 
---7.8--------87.2 telesp.net.br TELECOMUNICACOES DE SAO PAULO S.A. - TELESP ISP 
---7.8--------86.3 bezeqint.net ADSL-CUSTOMER-CONNECTION NSP 
---7.7--------78.6 veloxzone.com.br Telemar Norte Leste S.A. ISP 

An additional range indicating more than 20.4% difference in volume but the same magnitude

Magnitude - Volume - date: March 22, 2006
---7.1--------17.2 pppool.de freenet Cityline GmbH unknown 
---7.0--------16.5 touchtelindia.net Infrastructer unknown 
---7.0--------15.6 cox.net COX COMMUNICATIONS unknown 
---7.0--------15.3 layeredtech.com Cable & Wireless unknown 
---7.0--------15.0 sify.net Satyam Infoway Pvt.Ltd. unknown 
---7.0--------14.7 earthlink.net Earthlink Network ISP 
---7.0--------14.2 siol.net SiOL d.o.o NSP 
---7.0--------14.0 dialog.net.pl Dialog Internet Services Customer DSL unknown 
---7.0--------14.0 seed.net.tw Digital United Inc. NSP 
---7.0--------13.8 covad.net Covad Communications NSP 
---7.0--------13.7 etb.net.co ETB - Colombia unknown 
---7.0--------13.7 swbell.net Pac Bell Internet Services NSP 
---6.9--------13.0 ukrtel.net Ukrtelecom IP access network in Kremenchug unknown 
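The spreads quoted for the two listings above can be recomputed with a few lines of Python. The volumes (in millions of messages/day) are copied from the March 22, 2006 listings; the function name is hypothetical.

```python
def spread_percent(volumes):
    """Percentage by which the largest volume exceeds the smallest."""
    return (max(volumes) / min(volumes) - 1) * 100

# Daily volumes in millions, all displayed as magnitude 7.8.
MAGNITUDE_7_8 = [102.3, 100.5, 98.2, 97.0, 95.8, 87.2, 86.3]

# Daily volumes in millions, all displayed as magnitude 7.0.
MAGNITUDE_7_0 = [16.5, 15.6, 15.3, 15.0, 14.7, 14.2, 14.0, 14.0, 13.8, 13.7, 13.7]

# Upper bound on the spread within one displayed magnitude (one decimal place).
MAX_SPREAD_PERCENT = (10 ** 0.1 - 1) * 100

if __name__ == "__main__":
    print(f"Magnitude 7.8 spread: {spread_percent(MAGNITUDE_7_8):.2f}%")
    print(f"Magnitude 7.0 spread: {spread_percent(MAGNITUDE_7_0):.2f}%")
    print(f"Theoretical maximum:  {MAX_SPREAD_PERCENT:.2f}%")
```

Both observed spreads stay under the roughly 25.9% ceiling that one-decimal rounding allows, so they are consistent with rounding alone.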

Note: future entries will exclude entries that do not help to determine the point at which a specific magnitude number changes.

Magnitude - Volume - date: March 25, 2006
---7.1--------16.8 alltel.net Central Telephone Company unknown 
---7.0--------16.8 siteprotect.com Hostway Corporation unknown 
---7.0--------13.5 swbell.net Pac Bell Internet Services NSP 
---6.9--------13.0 netcabo.pt TVCABO-Portugal Cable Modem Network NSP 

Senderbase statistics are estimates (as there is no way for them to monitor your email sending habits directly) and as such are not an exact number. They should, in most cases, be accurate within one order of magnitude, which in this case they appear to be.

As far as contacting them, I have never had any luck getting them to fix errors in the ownership data they display on their site either. They do not appear to respond to or act on emails.

  • 2 years later...

Noting SenderBase

  • is now using 200 billion messages/day (2x10^11) as the basis of their exponential magnitude scale
  • persists with the error in their help page - http://www.senderbase.org/help/magnitude - (magnitudes 2 and 1 are each reduced one decimal place too many)

Anyway, conversion from magnitude (n) to number of messages is given by 2x10^(n+1) - thus:

2.0 = 2,000
2.5 = 6,325
3.0 = 20,000
3.5 = 63,246
4.0 = 200,000
4.5 = 632,456
5.0 = 2,000,000
5.1 = 2,517,851
5.2 = 3,169,786
5.3 = 3,990,525
5.4 = 5,023,773
5.5 = 6,324,555
5.6 = 7,962,143
5.7 = 10,023,745
5.8 = 12,619,147
5.9 = 15,886,565
6.0 = 20,000,000
Note the apparent precision is spurious - round-off errors would be considerable but more importantly, as Will points out in his previous post, these are only estimates/projections based on the 'sampling' represented by SB monitoring of an unknown ('guessed-at') total volume. <_<

But the table should be useful withal. I know I have been guilty of being out by half an order of magnitude or so in conversions I have stated recently. No more.

Full second decimal place magnitude values can be determined for the whole range by reference to the expanded 5.1 - 5.9 range (2.1 = 2,518, etc.)

[edit] Oh yes - being an exponential scale, technically speaking there is no volume 0 - as magnitude tends to zero, number tends to 20. Magnitude -1.3 would about equal 1 message, I suppose, but SB doesn't use negatives and 'part messages' below 1 are (anyway) nonsensical. To all practical intents, I'm sure zero magnitude is taken as/meant to signify zero messages. Just a technicality.
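The conversion in this post can be sketched in Python as below. It assumes the 200 billion messages/day total stated above and a scale maximum of 10; the function names are hypothetical, and the same caveat about spurious precision applies to the output.

```python
import math

TOTAL_PER_DAY = 2e11  # 200 billion messages/day, as stated in this post
SCALE_MAX = 10        # magnitude that corresponds to the total volume

def messages_from_magnitude(n, total=TOTAL_PER_DAY):
    """total * 10^(n - 10); with total = 2e11 this equals 2 * 10^(n + 1)."""
    return total * 10 ** (n - SCALE_MAX)

def magnitude_from_messages(count, total=TOTAL_PER_DAY):
    """Inverse conversion: magnitude = 10 + log10(count / total)."""
    return SCALE_MAX + math.log10(count / total)

if __name__ == "__main__":
    for tenths in range(20, 61, 5):
        n = tenths / 10
        print(f"{n:.1f} = {messages_from_magnitude(n):,.0f}")
    print(f"Magnitude for a single message: {magnitude_from_messages(1):.2f}")
```

The inverse function confirms the technicality noted above: magnitude 0 corresponds to 20 messages, and a single message sits at about magnitude -1.3.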

Archived

This topic is now archived and is closed to further replies.
