First off, I want to state that I realize that the main objective of SpamCop is to report the source of spam, and that identifying and reporting the spamvertized URLs contained within email is secondary. Nonetheless, because this feature is included in SpamCop, I believe that all attempts should be made to keep up with the methods employed by spammers to stop their criminal activities. SpamCop is an automated tool used to fight spam not only for its members but for the good of the worldwide internet community. In keeping with this goal, a simple modification to the SpamCop parsing engine should be made in order to allow it to detect URLs that are currently being missed. So, for the sake of this discussion, I would appreciate it if the argument of SpamCop's URL detection philosophy be left out of this topic.
For quite some time, spammers have been abusing the use of MIME (Multipurpose Internet Mail Extensions) email headers in an attempt to bypass detection and avoid anti-spam techniques. By adding a malformed MIME header line in an email, the spammer causes what essentially amounts to a broken encoding method for the email. The MIME type used by spammers for this purpose is known as the "Alternative Subtype". In the headers of the email, the spammer will add an incomplete Content-Type and boundary line that is commonly used for sending messages in both plain text and HTML format in a situation when it is unknown which format the email client supports. It is my understanding, based on this thread, that SpamCop cannot properly parse URLs contained in emails that include malformed and incomplete MIME headers. In this thread I will attempt to explain MIME to the best of my knowledge and put forth the argument that SpamCop should modify the parsing engine to allow it to detect and report URLs currently bypassed by exploiting this technique.
What is MIME?
Below is an example of the correct implementation of the MIME alternative subtype. I have included numbers prefixing the code to explain it's usage below.
1 MIME-Version: 1.0
2 Content-Type: multipart/alternative;
6 Content-Type: text/plain; charset="ISO-8859-1"
7 Content-Transfer-Encoding: 7bit
10 Content-Type: text/html; charset="ISO-8859-1"
11 Content-Transfer-Encoding: quoted-printable
In lines 1-3 are what is included in the headers of the email. Line 1 defines that the email includes a MIME section. Line 2 and 3 then set that the content is multipart and will include more than one encoding type. The boundary is a set of random characters and may include a timestamp or other information, it will tell the email client where to find and identify the MIME content type. In line 5 we see the boundary code again, prefixed by two hyphens. Lines 6-7 inform the email client that this section is made up of plain text, along with the character set and the encoding. After this is displayed in the body of the email, the message will be shown to the end user in plain text format. In line 9 we again see the boundary code and in 10-11 the content is now HTML. This would normally follow with the same message shown after the previous plain text version for HTML compliant email clients. As you can see, the purpose of the usage of this MIME encoding was to send the email to a client which the sender did not know if it would view (or prefer) the message in HTML or plain text.
How is MIME abused?
When a spammer incorrectly uses MIME, it is similar to using a broken or incomplete syntax. For example, when writing the code to create a link in HTML, the correct syntax would be to use <a href="http://www.website.xyz/">Click here</a>. However, when using a malformed MIME Content-type, it would be like leaving the trailing "</a>" off the end of the HTML a href code. When the email client first sees that the message is MIME encoded and then looks for its follow up boundary code to display the email message in it's preferred format for the reader. If it does not find this, it will do certain things depending upon how it's configured or setup. In most instances, it will simply display the email message without difficulties. An example of an invalid MIME alternative subtype simply looks like the following:
1 MIME-Version: 1.0
2 Content-Type: multipart/alternative; boundary="0-1466100096-1197442086=:47221"
3 Content-Transfer-Encoding: 8bit
Lines 1-3 are included in the headers of the spam email. As you can see, the implementation is correct. However, nowhere in the body of the email is there a boundary follow up code to let the email client know where to look for any content type that what's including in either plain text, HTML, or any other format. This could be caused by a poorly written email program or some other type of error, but in this case it is simply used in a malicious attempt to trick the email client from employing it's spam filters to check the body of the email or any URLs that may be included.
Where does SpamCop come in?
SpamCop trusts the MIME Content-Type/boundary and when the bogus lines are added in the headers it fouls up the parsing engine causing it to bypass or ignore any URLs, no matter how obvious they are to the reader. Why or how this happens, I do not know, as I am not familiar with the specific workings of the SpamCop parsing engine. When an email with bogus MIME Content-Type is passed through the parsing engine, the message
will show up, indicating that SpamCop has missed the URL(s) in the email.
Here are some examples:
The third and fourth links are both phishing emails, which is all the more reason that these URLs need to be reported.
Here is a previous discussion on this topic:
Parsing: Spamcop not finding links in email when there are links
Resources & References
MIME - Wikipedia
RFC 2387: The MIME Multipart/Related Content-type
RFC 2046: MIME Part Two: Media Types - 5.1.4. Alternative Subtype
Content boundary - Wikipedia