
Attachment filtering


mrmaxx



I'm seeing a lot of spam which comes in the form of PDF attachments to otherwise empty or nonsensical messages. I have done some tweaking of my filters, but it's not working nearly well enough to catch even the majority of the PDF spam.

Is there any way that you guys (i.e. JT) could add an option to filter on the type of attachment, or just add a filter option to hold messages that have any attachment? The vast majority of email that I get has no need for attachments, and I don't mind going hunting on the rare occasion that I need an attachment. I also have alternate email addresses which do not go through SC that I can give out. So, please... add an option to automatically hold messages with PDF attachments or ANY attachment if that's too much trouble.


Link to comment
Share on other sites

Another approach to the query .. ask the Horde/IMP developers .... contact/FAQ data available in the SpamCop FAQ 'here'

Historically, lots of issues. Recall the days when ISPs took to rejecting anything with a number of file extensions, some still do .. then users took to complaining .... some ISPs still don't allow some file types at all .... some accept the e-mail but strip the attachments, some simply drop the e-mail, some really 'help' by killing the e-mail itself, but then generating a note to let you know that you once had an e-mail from somebody, but it's not available anymore ... on and on ... Typically, this wasn't the actual e-mail (server) application, but some filtering after the e-mail had been accepted, and this is where the variances came into play.

Remember, e-mail was originally designed around text only. This frigging HTML and MIME-type extension crap is what started all this mess ....

Link to comment
Share on other sites

Oh, yeah... And I'm having trouble now because I created a filter to filter out messages with the word "postcard" in the subject. Now that is causing me grief...filtering legit messages. *sigh* Sometimes you just can't win.

It would be really nice if there would be some sort of option in the filters to catch messages with a specific mime-type attachment. I tried to create a filter like that using the "user-defined headers" and it really doesn't seem to work. :(

Link to comment
Share on other sites

And I'm having trouble now because I created a filter to filter out messages with the word "postcard" in the subject. Now that is causing me grief...filtering legit messages. *sigh* Sometimes you just can't win.

Yep, that little problem has been around since day one .. those great-big-companies that provided software filters to protect the kids in school .. who then couldn't finish their homework because so many web-sites were locked-out ....

It would be really nice if there would be some sort of option in the filters to catch messages with a specific mime-type attachment. I tried to create a filter like that using the "user-defined headers" and it really doesn't seem to work. :(

The catch is that a .PDF file isn't actually defined in the headers .... it's just one of the many possible attachment/file types available under that all-inclusive line Content-Type: multipart/alternative; ... the file itself is 'MIME Encapsulated' within the body .... and to all (normal?) e-mail server software, there is no .PDF file at all, it's just seen as (really weird) text within the body construct of the e-mail itself.

Link to comment
Share on other sites

I wonder... Somewhere in the BODY of the message should be the boundary for that MIME part, which DOES contain some information about the file type, I believe. Can you create a body filter in the spamcop webmail? (I don't use it myself, so I don't know what options are available.) If so, does anyone know if said filters on the body text are compared to the original body source, or just to the rendered "readable" version of the body that actually gets shown?

Guess I'll have to do a little more research on MIME encoding and just what actually exists in the unencoded body of the message when there are files attached....

Ok, some followup... Here is what I see in files with a pdf attachment:

------=_NextPart_[boundary code]
Content-Type: application/octet-stream;
	   name="filename.pdf"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
	   filename="filename.pdf"

So, if you can do regular expressions in your filters, a filter for 'filename=".*\.pdf"' SHOULD catch any emails with a pdf attachment. Unfortunately, I'm willing to bet this is beyond the capability of the filters available in most mail applications. Also, it would require that the mail application do its search on the ENTIRE MIME encoded body, rather than just the text or html parts.
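For anyone who wants to test the idea outside of webmail, here is a minimal sketch in Python. It is not how SpamCop or any mail client does it; the function names and the sample message are made up for illustration. The first check runs a regular expression over the raw, undecoded source (so it sees the part headers shown above), and the second walks the MIME parts with the standard library's `email` package.

```python
import re
from email import message_from_string

# A regular expression: any quoted filename ending in .pdf, case-insensitive.
PDF_NAME = re.compile(r'filename="[^"]*\.pdf"', re.IGNORECASE)


def raw_source_has_pdf(raw_message: str) -> bool:
    """Search the whole undecoded MIME source, part headers included."""
    return PDF_NAME.search(raw_message) is not None


def parsed_message_has_pdf(raw_message: str) -> bool:
    """Walk every MIME part and check its filename and content type."""
    msg = message_from_string(raw_message)
    for part in msg.walk():
        name = (part.get_filename() or "").lower()
        if name.endswith(".pdf") or part.get_content_type() == "application/pdf":
            return True
    return False


# Made-up sample message modelled on the part headers quoted above.
SAMPLE = (
    "From: sender@example.com\n"
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--XYZ\n"
    "Content-Type: application/octet-stream;\n"
    '\tname="filename.pdf"\n'
    "Content-Transfer-Encoding: base64\n"
    "Content-Disposition: attachment;\n"
    '\tfilename="filename.pdf"\n'
    "\n"
    "JVBERi0xLjQK\n"
    "--XYZ--\n"
)

if __name__ == "__main__":
    print(raw_source_has_pdf(SAMPLE), parsed_message_has_pdf(SAMPLE))
```

The raw-source search only works if the filter really sees the full source. The parsed version avoids that question, but it needs a real MIME parser rather than a plain text filter.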

Link to comment
Share on other sites

Another approach to the query .. ask the Horde/IMP developers .... contact/FAQ data available in the SpamCop FAQ 'here'

Historically, lots of issues. Recall the days when ISPs took to rejecting anything with a number of file extensions, some still do .. then users took to complaining .... some ISPs still don't allow some file types at all .... some accept the e-mail but strip the attachments, some simply drop the e-mail, some really 'help' by killing the e-mail itself, but then generating a note to let you know that you once had an e-mail from somebody, but it's not available anymore ... on and on ... Typically, this wasn't the actual e-mail (server) application, but some filtering after the e-mail had been accepted, and this is where the variances came into play.

Remember, e-mail was originally designed around text only. This frigging HTML and MIME-type extension crap is what started all this mess ....

I took your advice and posted an "enhancement" bug on the bugs.horde.org website. Ticket #5643. Chuck Hagenbuch had a question to which I don't know the answer. Can you please tell me the answer so that I can post the response, or maybe you (or someone else affiliated with the SC webmail service) can follow up? Here's Chuck's question:

What filtering backend (client side filters, procmail, etc.) do they use? (If they don't know, you need to make this feature request through them and have them contact us if they need to). If they use IMAP filtering or something else that supports searching the whole message body and headers, you can probably do this. Otherwise it may not be possible unless the filter backend exposes a specific hook.

Anyway, here's the URL for the Ticket

Link to comment
Share on other sites

I took your advice and posted an "enhancement" bug on the bugs.horde.org website. Ticket #5643. Chuck Hagenbuch had a question to which I don't know the answer. Can you please tell me the answer so that I can post the response, or maybe you (or someone else affiliated with the SC webmail service) can follow up? Here's Chuck's question:

Anyway, here's the URL for the Ticket

We use the "Ingo" filter plugin for Horde (http://www.horde.org/ingo/), with a standard IMAP backend. You may need to take this request to the Ingo developers instead of the Horde/IMP guys.

We looked into adding PDF spam detection earlier in the mail delivery chain, but the only known methods to detect it reliably are extremely CPU intensive, and would slow all mail delivery to a crawl. There is a lot of talk about this in the spam blocking channels, and hopefully a good solution will be available soon.

-Trevor

Link to comment
Share on other sites

We use the "Ingo" filter plugin for Horde (http://www.horde.org/ingo/), with a standard IMAP backend. You may need to take this request to the Ingo developers instead of the Horde/IMP guys.

We looked into adding PDF spam detection earlier in the mail delivery chain, but the only known methods to detect it reliably are extremely CPU intensive, and would slow all mail delivery to a crawl. There is a lot of talk about this in the spam blocking channels, and hopefully a good solution will be available soon.

Thanks. I have passed the info along to the Horde people. If they say it's something to take to the Ingo developers, I'll pass it along to them as well.

Link to comment
Share on other sites

Just got a response:

You can probably search for PDFs in the whole message, then. There is no particular facility to search for attachments with IMAP search commands, and while I'd be happy to hear from Spamcop and work with them on if some sort of special-case rule makes sense, my opinion is that such a rule is too specifically targeted to make sense for Ingo in general.

Your best option, of course, is to have a SpamAssassin (or other spam filter engine) rule that recognizes these PDFs.

Any chance of that happening?

Link to comment
Share on other sites

Just got a response:

Any chance of that happening?

The SpamAssassin rule is what I said would be too processor intensive. We can block *all* incoming attached PDFs at SpamAssassin without any trouble, but it would be unfair to people who actually use PDFs often. There is a method to scan them for spam, but it involves converting the PDF, extracting the image from it, running OCR software on the image, then passing the text from the image into SpamAssassin... but that's just too much to do on all the traffic we receive.

It's very likely that a more advanced SpamAssassin rule will come along soon that can filter a good chunk of the PDF spam out without having to interpret the embedded images, and when it does we'll certainly get it installed.

-Trevor

Link to comment
Share on other sites

Maybe I'm missing something, but I never said anything about the content of the PDF. I merely wanted to filter any emails with PDF attachments to the spam folder. Is that excessively resource intensive? Maybe I'm not a typical SC user, but I rarely get any email with attachments here, and if a few legitimate emails get filtered, it wouldn't be the end of the world as I can always go retrieve them from the held mail folder.

Link to comment
Share on other sites

I would guess the majority of users actually do receive attachments they want without having to go through their Held Mail. Out of 15 current messages in my inbox, I have 4 that have attachments I have been waiting for. I also rarely see the PDF spam in my inbox, maybe 2 per week.

I don't know how hard it would be to implement attachment blocking on a per user basis.

Link to comment
Share on other sites

Could SpamAssassin be configured to add an X-Header when a PDF is present? And then a filtering option added in the spamcop web interface that would simply drop anything containing that particular X-Header into Held Mail if the user so wanted? Would require a bit of coding I think, but it might be handy for other attachments types as well where some users want to filter them, and some don't.
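To make the idea concrete, here is a rough Python sketch of the two halves: a tagging step that adds a header when a PDF part is present, and a per-user filter that only looks at that header. Everything here is hypothetical, including the header name `X-Has-PDF-Attachment` and the function names; it is not SpamAssassin or SpamCop code, just an illustration of the flow.

```python
from email import message_from_string
from email.message import Message

# Hypothetical header name, invented for this sketch.
TAG_HEADER = "X-Has-PDF-Attachment"


def tag_pdf(msg: Message) -> Message:
    """Tagging step: mark the message if any MIME part looks like a PDF."""
    del msg[TAG_HEADER]  # drop any tag forged by the sender; no error if absent
    for part in msg.walk():
        name = (part.get_filename() or "").lower()
        if name.endswith(".pdf") or part.get_content_type() == "application/pdf":
            msg[TAG_HEADER] = "yes"
            break
    return msg


def destination_folder(msg: Message, hold_pdfs: bool) -> str:
    """Per-user filter step: only inspects the header, never the body."""
    if hold_pdfs and (msg.get(TAG_HEADER) or "").lower() == "yes":
        return "Held Mail"
    return "INBOX"
```

The point of splitting it this way is that the expensive part (looking through the MIME parts) happens once at delivery, while the per-user option is just a cheap header match that an ordinary header filter could do.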

Link to comment
Share on other sites

Could SpamAssassin be configured to add an X-Header when a PDF is present? And then a filtering option added in the spamcop web interface that would simply drop anything containing that particular X-Header into Held Mail if the user so wanted? Would require a bit of coding I think, but it might be handy for other attachments types as well where some users want to filter them, and some don't.

That sounds like an excellent idea... makes it easier to filter for those that want to block this type of spam.

Link to comment
Share on other sites

Could SpamAssassin be configured to add an X-Header when a PDF is present? And then a filtering option added in the spamcop web interface that would simply drop anything containing that particular X-Header into Held Mail if the user so wanted?

I'm not sure this would be ideal - except for those that don't mind either opening their webmail interface to trigger the filter or don't mind downloading the spam and having a filter in their mail program.

Perhaps a better SA option would be to have this as an optional setting in SA and add an SA score of, say, 4 or 5 for those that wanted it.

That said, personally I have no interest in blocking pdf attachments. Most that I receive are legitimate and the majority of those which are spam get caught because they come from sources already listed in the SCBL or one of the other rbls. Like Steven I get one or two of these per week through to my mailbox.

Andrew

Link to comment
Share on other sites

I'm not sure this would be ideal - except for those that don't mind either opening their webmail interface to trigger the filter or don't mind downloading the spam and having a filter in their mail program.

Perhaps a better SA option would be to have this as an optional setting in SA and add an SA score of, say, 4 or 5 for those that wanted it.

That said, personally I have no interest in blocking pdf attachments. Most that I receive are legitimate and the majority of those which are spam get caught because they come from sources already listed in the SCBL or one of the other rbls. Like Steven I get one or two of these per week through to my mailbox.

Be glad. I have about 5 or 6 email addresses that find their way back to my SpamCop account. I don't really know how many I get any more, because I can identify them pretty much off the bat, but I know I was getting a handful of them per day! I, for one, typically leave a browser window open 24/7 to SpamCop at home, and most of the time when I'm at work, I have a window open there as well. Unfortunately SC will log me off automagically after a certain amount of "idle time" so I have to log back in, but that's OK. As soon as I log in, the filters kick in and some of the spam gets moved. Unfortunately not all of it.

I already have my spam settings set to "1" and I'm still getting a ton of spam through to my inbox. I'd like it if we could make SA more aggressive in its tagging of things, because I'm getting a lot of stuff that's immediately identifiable as spam, but it's not being caught by the filters. :(

Link to comment
Share on other sites

I already have my spam settings set to "1" and I'm still getting a ton of spam through to my inbox. I'd like it if we could make SA more aggressive in its tagging of things, because I'm getting a lot of stuff that's immediately identifiable as spam, but it's not being caught by the filters. :(

Are you only using SA? I have my SA setting at 5 but also select all of the available BLs and, as stated, rarely see a PDF spam in the Inbox, without any client side filtering. Like you, I have about 6 accounts all forwarding into this account and I check my account at least every hour, though not non-stop.

Link to comment
Share on other sites

Are you only using SA? I have my SA setting at 5 but also select all of the available BLs and, as stated, rarely see a PDF spam in the Inbox, without any client side filtering. Like you, I have about 6 accounts all forwarding into this account and I check my account at least every hour, though not non-stop.

Nope. I have every option for filtering enabled, and I'm still getting tons of spam. Now, granted, the PDF spam may be gone, for now, I don't know... haven't actually read any spam lately, but I'm still getting tons of spam through the filters.

Link to comment
Share on other sites

Nope. I have every option for filtering enabled, and I'm still getting tons of spam. Now, granted, the PDF spam may be gone, for now, I don't know... haven't actually read any spam lately, but I'm still getting tons of spam through the filters.

Could you give some numbers please ?

My own (SA=3.0) for July are

4305 spams, 110 leakers (=2.6 %), 0 false positive(s),

The only 'cute trick' I use is to add bankofamerica.com, ru, tw, and such to my personal blacklist.

See the thread "how to use Spamcop Mail".

Link to comment
Share on other sites

Frankly, I haven't been keeping count. I'll see what I can do tomorrow to count how many I get. However, I do know that just in a couple hours, I got 70 more in my "held" folder, not to mention another 5 that got through the filters, including bounces and auto-replies to non-existent email addresses on one of my three domains. :(

Link to comment
Share on other sites

Could you give some numbers please ?

My own (SA=3.0) for July are

4305 spams, 110 leakers (=2.6 %), 0 false positive(s),

The only 'cute trick' I use is to add bankofamerica.com, ru, tw, and such to my personal blacklist.

See the thread "how to use Spamcop Mail".

Since last night, I've had 32 messages get through the filters. Granted about 2/3 of those were unsolicited bounce messages for emails *I* didn't send. In my held email folder I have almost 1200 messages overnight. In my inbox, the non-spam messages are 288.

Link to comment
Share on other sites

Sounds like you are using a catch-all address on your domain. If so, your numbers are going to be MUCH higher than typical, as you're going to pick up every piece of junk from every dictionary attack that goes on.

Yeah... I started out with my domains back before there was a huge problem with spam. Thus, I was using the "catchall" address to handle "targeted" email addresses. Doesn't work very well now. :(

Link to comment
Share on other sites

Yeah... I started out with my domains back before there was a huge problem with spam. Thus, I was using the "catchall" address to handle "targeted" email addresses. Doesn't work very well now. :(

Well that does explain the volume of junk you're receiving. :(

My brother's small company used a catchall mailbox since it allowed them to add users on their local system without having to configure the mail server. They still receive between 2,000 and 3,000 messages a day which are junk but they are now dropped at the SMTP stage by our mail server and only messages explicitly addressed to one of five members of staff are allowed into the 'system'.
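As a rough illustration of what that check amounts to (not our actual mail server configuration), here is a small Python sketch. The addresses and the function name are invented; the only real parts are the standard SMTP reply codes, 250 for an accepted recipient and 550 for a rejected one.

```python
from typing import FrozenSet, Tuple

# Hypothetical staff addresses; the real list is not given in this thread.
ALLOWED_RECIPIENTS: FrozenSet[str] = frozenset({
    "alice@example.com",
    "bob@example.com",
    "carol@example.com",
    "dave@example.com",
    "erin@example.com",
})


def rcpt_reply(address: str,
               allowed: FrozenSet[str] = ALLOWED_RECIPIENTS) -> Tuple[int, str]:
    """Decide the SMTP reply to a RCPT TO command for the given address."""
    if address.strip().lower() in allowed:
        return 250, "OK"
    # Anything a catch-all would have swallowed is refused before the
    # message body is ever transferred.
    return 550, "No such user here"
```

Because the rejection happens at the RCPT stage, the junk never reaches the filters at all, which is why it does not show up in anyone's held-mail count.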

It might be worth your while to make the switch.

Andrew

Link to comment
Share on other sites

Since last night, I've had 32 messages get through the filters. Granted about 2/3 of those were unsolicited bounce messages for emails *I* didn't send. In my held email folder I have almost 1200 messages overnight. In my inbox, the non-spam messages are 288.

1520 total messages

288 non-spam messages 18.9%

1200 spam messages 78.9%

32 false negative 2.1%

That sounds typical for any spam filtering system. Cut down on the total messages and your numbers of false negative will drop as well.
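For anyone who wants to run the same breakdown on their own counts, the arithmetic is just each category divided by the total. A small Python sketch, with category names of my own choosing; the last figure (leaks as a share of spam only) is an extra calculation that is not in the numbers above.

```python
def breakdown(counts):
    """Return the total and each category's share of it, in percent."""
    total = sum(counts.values())
    shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
    return total, shares


# Counts taken from the post quoted above.
counts = {"non-spam": 288, "held": 1200, "leaked": 32}
total, shares = breakdown(counts)

# Leaks measured against spam only (held + leaked), not against all mail.
leak_rate = round(100 * counts["leaked"] / (counts["held"] + counts["leaked"]), 1)

if __name__ == "__main__":
    print(total, shares, leak_rate)
```

Measured against spam only, the leak rate comes out at about 2.6%, since the 288 legitimate messages no longer dilute the figure.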

For the 14 domains I have access to, for the month of August - no stats available on false negatives, but our users complain about everything and we receive relatively few complaints about spam:

995,882 total messages

72,497 passed messages 7.3%

663,952 blocked messages 67.0%

255,087 quarantined messages 25.7%

Link to comment
Share on other sites

Archived

This topic is now archived and is closed to further replies.
