Everything posted by PeterJ

  1. PeterJ

    smtp server

    I have not used OE for a while, but with my current ISP (Ameritech DSL) and Thunderbird (v0.5) as my mail client, I have no issues sending via their SMTP server with my "spamcop address" specified. It works because my ISP requires SMTP authentication, and once I plug the appropriate account information into Thunderbird, sending works fine. I mention this as an example because it might be worthwhile to: 1) Confirm whether your ISP's SMTP server uses (or can use) authentication for SMTP access, and if it does, or they can turn it on for you, then: 2) Confirm that your mail client is configured properly. OE should allow one to enter SMTP authentication information along with the server name.
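The setup described in this post (authenticate to the ISP's SMTP server while putting a different address in the From header) can be sketched in Python. This is only an illustration under assumptions: the host name, port 587, the use of STARTTLS, and every address and credential shown are placeholders, not Ameritech's or SpamCop's actual settings.

```python
import smtplib
from email.message import EmailMessage


def build_message(from_addr, to_addr, subject, body):
    """Build a message whose From header is the address you want shown."""
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_isp(msg, host, port, username, password):
    """Send through the ISP's server, logging in with the ISP account."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()                  # assumption: the server offers STARTTLS
        server.login(username, password)   # SMTP AUTH with the ISP credentials
        server.send_message(msg)


# Hypothetical usage (all values are placeholders):
# msg = build_message("me@spamcop.example", "friend@example.org", "Hello", "Test")
# send_via_isp(msg, "smtp.example.net", 587, "isp-login", "isp-password")
```

The point of the sketch is that the login identity and the From address are independent: the server decides whether to relay based on the authenticated account, not on the address in the header.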
  2. PeterJ

    Bayesian Filtering?

    Excellent. Thanks for devoting some time towards this. I would not have even dreamed of it. Having read up a bit on SpamAssassin these days and not noticing it anywhere as a common implementation, I would not have brought it up.
  3. PeterJ

    Bayesian Filtering?

    I think it could be used safely, just follow her suggestions: I'm not stuck on the "chickenpox" rule set by any means, but I wanted to bring that one up because SpamCop's implementation of SpamAssassin clearly missed a spam containing this kind of content, as ob1db posted. JT was suggesting that it would be useful for people to post the percentage of spam slipping through, but it would also be useful (assuming JT is interested in, and has time for, tweaking SpamAssassin on our behalf) to know what kinds of spam are slipping past SpamAssassin. Then educated decisions could be made on which rule sets may need to be added to or modified in SpamAssassin. Either way, it would be cool to hear what rule sets are currently in place for our mail setup.
  4. PeterJ

    Bayesian Filtering?

    JT -- Did you have any comments on custom rule sets, specifically the ones I mentioned here: The reason I bring it back up is that if one wanted to stop a spam (using SpamAssassin) containing: V+a+lium , XA+n+ax \ V1[at]gRa < At|:v[at]n # Som+a+ . Pnt:e:rmin The "chickenpox" rule set is designed to do so. Here is the description from the "Jennifer's Set" link above: Can you elaborate on the rule sets currently being used (I know I have seen "big evil" somewhere in a message I received), so that as users report what percentage of spam is slipping through we can get a clearer picture? All -- Just to stick up for bayesian filtering a little, I do think it is effective. Many people are getting 99% filtering success by using it, and some are getting 99.9%. As I stated in a previous post, I do not necessarily expect to get SpamAssassin or "bayesian filtering" as a feature of the email service I pay for. It works especially well at the client level; however, it has no repercussions for the spammers themselves. Recently I enjoyed watching several hours' worth of discussion on bayesian filtering and other spam-fighting techniques at the MIT Spam Conference website (videos here); if anyone cares to check it out, it is pretty interesting. Several presenters at the conference can back up the fact that "bayesian filtering" is not currently being defeated by spammers to any great degree. Spammers' use of many extra words in spam (some call it "word salad") has had little effect to date on bayesian filters. John Graham-Cumming (primary author of POPFile) has an interesting presentation on bayesian filtering that can be found about 1 hour and 16 minutes into the "Afternoon 1" video at the above link. Regarding SpamAssassin's efficacy, many suggest (at least on the SpamAssassin general mailing list) that a combination of custom rule sets (many using Jennifer's Sets) and bayes is best. NOTE: Jennifer's site appears to be down at the moment
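To make the idea behind a rule set like "chickenpox" concrete, here is a minimal Python sketch of catching words that have filler punctuation sprinkled through them. It is not the actual chickenpox rules (those are SpamAssassin rules, not Python); the filler characters, the keyword list, and the function names are assumptions chosen for illustration, and it only handles the simpler examples from the post.

```python
import re

# Assumption: these punctuation marks are treated as "filler" inside a word.
FILLERS = re.compile(r"[+:|.*_\-]")

# Assumption: a tiny illustrative keyword list, not a real rule set.
KEYWORDS = {"valium", "xanax", "soma"}


def normalize(token):
    """Strip filler punctuation from a token and lowercase it."""
    return FILLERS.sub("", token).lower()


def obfuscated_hits(text):
    """Return (original, plain) pairs for keywords that were disguised."""
    hits = []
    for token in text.split():
        plain = normalize(token)
        # Flag only tokens that needed cleaning to become a keyword.
        if plain in KEYWORDS and token.lower() != plain:
            hits.append((token, plain))
    return hits


print(obfuscated_hits("V+a+lium , XA+n+ax \\ Som+a+ ."))
```

Letter substitutions such as "V1[at]gRa" are deliberately left out of this sketch; handling them would need a substitution table on top of the filler stripping.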
  5. PeterJ

    Bayesian Filtering?

    I agree that POPFile and K9 are great programs, but briefly, on the topic of K9 being better than POPFile, it is worth taking a look at what Robin Keir's own FAQ has to say on the matter: Robin Keir's - K9 - FAQ. For anyone interested in using either of the two programs, it is relevant that K9 is designed for use on Windows only.
  6. PeterJ

    Bayesian Filtering?

    Assuming that SpamCop's implementation of SpamAssassin is not optimal, what I was trying to suggest earlier (and probably did a poor job of) is that there are other ways SpamAssassin can be tweaked besides relying on "bayesian filtering" to be the holy grail. The first thing that comes to mind is that updated or new rule sets could be added by JT to the SpamAssassin configuration. These may already be in use by SpamCop, but some popular rule sets (from perusing the SpamAssassin NG) are blackhair, popcorn, chickenpox, and weeds. Recently the blackhair and popcorn rule sets were combined into one; more info on these here: Jennifer's Sets

    A quick look into SpamAssassin history shows that it has been around for about 3 years and has had bayesian capabilities since v2.50, about a year ago. With this in mind, I am merely suggesting that if SpamCop's SpamAssassin is "broken", then the best way to "fix" it is to improve all areas possible, not "just" by using bayesian-based filtering.

    Now, let's assume that JT has the time and interest to set up Bayesian filtering via SpamAssassin. I see 3 main methods available after reading a bit about SpamAssassin at their Wiki and other places:

    1) [easiest] Turn on bayesian filtering and enable autolearning. This would allow SA to autolearn ham and spam based on its own scoring. The tolerances used as the defaults are <0.1=ham and >6.0=spam. As I understand the documentation, this "setup" is on by default, so perhaps JT decided against it at some point in the past (JT?). These defaults of 0.1 and 6 can be modified by the administrator if desired, but given that it is "autolearning", this method is not as effective as "training."

    2) [harder] Turn on bayesian filtering and disable autolearning; instead, rely on training of a "shared corpus" by all SpamCop users. This has promise as being effective and also not excessively hard to implement. I liked the comments at the end of this one page of the SpamAssassin Wiki (referenced: SiteWideBayesFeedback). I like this idea because I can picture how it could mostly be automated from the administration side, and users would get most of the benefit of bayesian filtering even if they do not each have their own corpus and their own "ham" and "spam" tokens. Users would not have to spend too much time training, only on false positives and false negatives related to "SpamAssassin".

    3) [hardest] Turn on bayesian filtering and disable autolearning; instead, each user would have their own corpus, and any training they conduct would affect only their corpus. This should be the most effective, but also the hardest to implement. Perhaps each user could have IMAP folders named "SA MissedSPam" and "SA NotSpam", or name them without "SA" if one wanted them to be more universal. Seems complicated... well, it is over my head.

    Last thoughts: As a long-time SpamCop Email user, I expect my account to provide me with reporting and mail filtered by the SpamCop BL, but I have for the most part considered JT's implementation of SpamAssassin a bonus. I am not knocking it; I guess I am just saying thanks to JT for even bothering to set it up at all... he did not have to. Hopefully this thread can continue to bring in more opinions on bayesian filtering and SpamAssassin from the SpamCop mail user base, as we can only benefit. JT - is the SpamAssassin setup using any of the custom lists I mentioned above? If not, please consider at least the new "blackhair+popcorn" rule set. Thanks much.
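The three methods in this post differ in two decisions: whether messages are learned automatically from their score, and whose corpus a training event updates. A small Python sketch of those two decisions follows. It is not SpamAssassin code; the function names are hypothetical, and the thresholds 0.1 and 6.0 are simply the values quoted in the post, which have not been checked against the SpamAssassin documentation.

```python
def autolearn_label(score, ham_below=0.1, spam_above=6.0):
    """Method 1: decide from the rule score alone whether to learn a message.

    Thresholds are the values quoted in the post (assumed, not verified).
    Returns "ham", "spam", or None when the score is too ambiguous to learn.
    """
    if score < ham_below:
        return "ham"
    if score > spam_above:
        return "spam"
    return None


def corpus_key(user, per_user):
    """Methods 2 and 3: pick which corpus a manual training event updates.

    per_user=False models one site-wide shared corpus (method 2);
    per_user=True models a separate corpus for each user (method 3).
    """
    return user if per_user else "shared"


print(autolearn_label(-1.2), autolearn_label(3.0), autolearn_label(7.5))
print(corpus_key("alice", per_user=False), corpus_key("alice", per_user=True))
```

The middle band where `autolearn_label` returns None is why autolearning tends to be weaker than explicit training: the borderline messages, which are the most informative ones, are never learned.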
  7. PeterJ

    Bayesian Filtering?

    [mainly in response to ob1db] Being no expert on SpamAssassin by any means, I hope what I have to offer here is correct. I think you are misunderstanding the capabilities of SpamAssassin somewhat. Looking at their website, one can see that SpamAssassin can utilize many different methods of detection in a variety of combinations. It is up to a given system administrator to decide which combinations of these methods to implement. SpamAssassin's bayesian capabilities are only one of the methods that can be implemented. That being said, I still think it is worthwhile to consider adding bayesian methods to SpamCop's current implementation of SpamAssassin. It may not be as simple as just turning "something on", because one has to consider the storage and maintenance of the corpus that will be used. Also, would the corpus be shared among all SpamCop users, or would each user have an individual corpus (the latter would be more complicated)? SpamCop's current SpamAssassin implementation is effective for me, but that does not mean it cannot be improved. I think I keep my setting on "2" or "3"; is there any reason yours is set to "5"? Have you received too many false positives when you set it lower?
  8. That's cool. I wonder how effective the "bayesian" portion of the SpamCop SpamAssassin filtering is, since it does not receive any feedback...? Do you know if SpamCop's implementation of SpamAssassin utilizes Vipul's Razor?
  9. Perhaps related to this discussion is that many Bayesian (or machine learning) processes decode URLs, and probably most of them handle IPs, so that a particular IP address could end up with a high "spam value" and thus be filtered. I admit it is not blocking per se, but it has promise if one is not as concerned with bandwidth. It will be interesting to see how blocklists and machine learning concepts team up from here on to stop spam... As a spamcop email user via IMAP, I have not toyed with Bayesian-based filtering even though I am using Thunderbird (now 0.5). I follow the development of POPFile pretty closely, as I am waiting for IMAP support, and then I will probably give it a try. EDIT: I was just thinking that a relatively young software project named "TarProxy" may be of interest to "jeffc", coupled with a corpus containing decoded URL information...
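The idea that a URL host or IP address can pick up a high "spam value" can be sketched with a toy token counter in Python. This is a simplified illustration, not how POPFile, SpamAssassin, or TarProxy work internally: the class, the `host:` token prefix, and the smoothing are assumptions, and the training messages use documentation-only addresses.

```python
import re
from collections import Counter

# Captures the host part of http/https URLs (a name or an IP address).
URL_HOST = re.compile(r"https?://([^/\s]+)", re.IGNORECASE)


def tokens(text):
    """Plain words plus one 'host:<name>' token per URL host in the text."""
    hosts = ["host:" + h.lower() for h in URL_HOST.findall(text)]
    words = re.findall(r"[a-z']+", URL_HOST.sub(" ", text.lower()))
    return words + hosts


class TinyBayes:
    """Toy per-token spam estimate; illustrative only."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Count each token once per message.
        self.counts[label].update(set(tokens(text)))
        self.messages[label] += 1

    def token_spamminess(self, token):
        # Smoothed share of spam vs. ham messages that contain the token.
        s = (self.counts["spam"][token] + 1) / (self.messages["spam"] + 2)
        h = (self.counts["ham"][token] + 1) / (self.messages["ham"] + 2)
        return s / (s + h)


nb = TinyBayes()
nb.train("Cheap pills at http://192.0.2.7/buy", "spam")
nb.train("More cheap pills http://192.0.2.7/now", "spam")
nb.train("Lunch at noon? See http://example.org/menu", "ham")
print(nb.token_spamminess("host:192.0.2.7") > 0.5)
```

After training, the host token behaves like any other word: an address seen only in spam drifts toward a high value, which is the "not blocking per se" effect described in the post.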