parse inconsistency


The parser consistently gives different results for a certain type of html formatting, depending on whether the spam is forwarded or copied into the window.

1. Forwarding spam that includes the 3D-mangled URL <a href=3D"http://www.mega-health.net/cia/?sash"> results in the link being resolved, with reporting addresses

mail-abuse[at]nic.br and "Internal spamcop handling: (spambr)" (apparently spambr[at]admin.spamcop.net).

2. When the spam is copied into the window, the url is parsed as unresolvable. Subsequently copying the domain name alone into another window resolves it to the correct IP address but with only the reporting address mail-abuse[at]nic.br.

This difference has a practical consequence because a lot of spam urls are 3d-mangled, and I often use the copy method when the delays in the alert notifications following the forwards are so long that I don't wait for them. The copy procedure is already more time-consuming without the additional manual url search.

In the specific piece of spam your example comes from .. Content-Type: is what? The "3D" is normally fallout from "Quoted-Printable" .. but your example is wrong, even for that. But to continue, in the specific spam, is there a line break showing on-screen that changes in the Forwarding / Copy/Paste operation as compared to the raw source of the spam content ??

In the email header:

Content-Type: multipart/alternative;

Below the boundary:

Content-Type: text/html;

Content-Transfer-Encoding: quoted-printable

And I should have mentioned that

- the forwards that work are directly from the server via procmail.

- the copy approach that fails to get resolved links is for the split window form for Eudora

Meanwhile I have systematically experimented some more with this. The inconsistency in resolving "3D" links seems to be not "forward vs. copy" but "split vs single window".

If I copy the body into the "body window" the parser can't resolve the links with "3D". Sometimes it doesn't find the links at all, but that may be a different problem. The failure to resolve is independent of whether the lines for the second instance of Content-Type, etc. are omitted from the body.

If I include the leading boundary code line along with the body, the parser says to use the single window form. When I do that the results are inconsistent: "No source IP address found, cannot proceed" or often it works and properly resolves the "3D" links. But depending on the spam it sometimes takes more work to get the single window to work at all -- see below.

The two cases above are without changing the line breaks. There is a blank line before the boundary and a line break after the boundary in all the cases I have examined.

Depending on the spam, getting the copy method to work can be tricky. It sometimes requires omitting extraneous "outer" html tags that appear to be associated with nested html "bodies": an outer one for the whole email (as seen in Eudora view source) and an inner one for the email's actual body.

If the boundary and 2nd set of Content-Type and Content-Transfer-Encoding lines are omitted entirely or included in the header part of the split window, and the extraneous closing trailing "outer" html tags are omitted, then what appears to be the entire original email can be parsed from the split window, but the "3D" links are not resolved.

If the boundary (and 2nd set of Content-Type and Content-Transfer-Encoding lines) are included in the body part of the split window, the parser wants the single window version of the form. As long as all the apparent "outer html" tags are omitted, that will normally work and the links are resolved, but there can be more html tags to take care of, not just the closing tags.

Those subtleties are not required for all spams, and when they are it is a lot easier to do in practice once the patterns become apparent than the description sounds, but the end result is always that when the single window form is used the links resolve and when the split window form is used they don't.

What is the "3D" for and how does it get into the html? It seems to be appearing more frequently and is now in all the .cn hosted links I see, which appear to be the same spamming operation. A variation on the problem is the occasional appearance of an extra pair of slashes "//" in addition to the "3D" in the URL.
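On the "3D" question: in the quoted-printable transfer encoding, "=" is the escape character, so a literal "=" in the body has to be written as "=3D" (3D is the hexadecimal ASCII code for "="). Any html attribute such as href= therefore shows up as href=3D in the raw source. A minimal Python sketch of the decoding, using the standard quopri module and the link from the first post; the wrapped variant is a made-up illustration of a soft line break, not taken from an actual spam:

```python
import quopri

# The link exactly as it appears in the raw quoted-printable body.
encoded = b'<a href=3D"http://www.mega-health.net/cia/?sash">'

# quopri.decodestring turns "=XX" escapes back into bytes and removes
# soft line breaks (a "=" at the very end of a line).
decoded = quopri.decodestring(encoded)
print(decoded.decode("ascii"))

# Hypothetical variant: the same link split by a soft line break.
wrapped = b'<a href=3D"http://www.mega-health.net/ci=\na/?sash">'
print(quopri.decodestring(wrapped) == decoded)
```

This only shows what the encoding does. It says nothing about how the SpamCop parser itself handles the body in either form.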

There's another Topic in here (sorry, right now not a clue which one, maybe the one started by Julian on the Beta?) but there's been some dialog on line-length and wrapping issues. Possible connection.

Then there's another topic a bit closer to your situation, dealing with the spammer moving header lines around, specifically moving the first boundary line back up into the header. Recall that the Outlook / Eudora "hack" was to account for the fact that both of these products made the boundary lines disappear, thus the two-part form to work around the "lack" of boundary lines in the actual body. That you're talking about having some boundary lines to move around suggests that you're not dealing with a "standard" e-mail issue as handled by Outlook/Eudora.

In the past, others have talked of moving e-mails from Eudora over to an Outlook Express folder and reporting them from there. Have you seen / tried this? If nothing else, perhaps try to do one of these "move" operations so you can get a better look at the spam structure ..??

I saw the posts (wherever they are) on line wrapping, etc. and they don't apply to these cases.

As for alternate views of the spam: No outhouse here -- all mail is piped in by indoor plumbing, with spam ultimately piped to dev/null aka cess pool aka spammer's mind.

I can easily see the original structure of the spam unprocessed by Eudora by looking at it in the archive file created on the server by procmail. The structure is the same as it is in the Eudora mailbox file (which only adds some extra x-headers). But Eudora's "View Source", the normal source for copying into spamcop, changes the structure when it encapsulates the entire email, including the headers, in its own html. It moves outer html opening and closing tags to the outside of the whole mail and creates its own html code for the headers. From what I have seen, the boundary is embedded in this html, not omitted.

The main point is the failure to parse links when copying into the split window.

A simple case of spam with html and "3D" links, but no html headers in the spam body, had this structure (as received in Eudora from SpamAssassin as plain text so Eudora's "View Source" is not required):

**mail headers (plain text)
Content-Type: multipart/alternative;
        boundary="[boundary code]"
**blank line
**blank line
--[boundary code]
Content-Type: text/html;
Content-Transfer-Encoding: quoted-printable
**blank line
**spam body html
**blank line
--[boundary code]
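For anyone who wants to check what the links look like once the transfer encoding is undone, here is a rough Python sketch built on the standard email package. It is not SpamCop's parser. The message is a hypothetical sample that mirrors the structure above: the boundary string BOUNDARY, the From and Subject headers, the bare "text/html" type, and the closing "--BOUNDARY--" line are placeholders I have assumed, and extract_links is an illustrative helper name.

```python
import email
import re

# Hypothetical sample mirroring the structure above (placeholder headers
# and boundary; only the link itself comes from the thread).
RAW = (
    "From: sender@example.com\r\n"
    "Subject: test\r\n"
    "MIME-Version: 1.0\r\n"
    "Content-Type: multipart/alternative;\r\n"
    '        boundary="BOUNDARY"\r\n'
    "\r\n"
    "\r\n"
    "--BOUNDARY\r\n"
    "Content-Type: text/html\r\n"
    "Content-Transfer-Encoding: quoted-printable\r\n"
    "\r\n"
    '<a href=3D"http://www.mega-health.net/cia/?sash">link</a>\r\n'
    "\r\n"
    "--BOUNDARY--\r\n"
)

HREF = re.compile(r'href="([^"]+)"')


def extract_links(raw):
    """Return href targets from every text/html part, after decoding."""
    msg = email.message_from_string(raw)
    links = []
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        # decode=True applies the part's Content-Transfer-Encoding,
        # which is only known if the part headers are still attached.
        html = part.get_payload(decode=True).decode("ascii", "replace")
        links.extend(HREF.findall(html))
    return links


print(extract_links(RAW))
# Searching the undecoded text finds nothing, because the raw attribute
# reads href=3D"..." rather than href="...".
print(HREF.findall(RAW))
```

The point of the sketch is that the decoder can only know the body is quoted-printable from the Content-Transfer-Encoding line under the boundary. That at least fits the observation that it matters where the boundary and the second set of Content-Type lines end up when pasting, though whether it is the actual cause inside the parser is a guess.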

Using the method of split window copy:

1. If the boundary code line is omitted, the parser cannot resolve the "3D" links.

2. If the boundary code line is included in the body window, spamcop insists on using a single window.

3. If the boundary code line is included in the header window, the parser resolves the "3D" links.

The more complex example of spam with "3D" links described previously (3/19) had html headers in the body of the spam, including a spurious leading </head>. The structure in the original compared with Eudora's "View Source" is:

Eudora mailbox=linux mailbox                        Eudora View Source
----------------------------                        ------------------

                                                    **styles for mail headers
**mail headers (plain text)                         **mail headers (html)
**blank line                                        **blank line
Content-Type: multipart/alternative;                Content-Type: multipart/alternative;
        boundary="--[boundary code]"                        boundary="--[boundary code]"
**xheaders & status (plain text)                    **xheaders & status (plain text)
**blank line                                        **blank line
----[boundary code]                                 <div>----[boundary code]
Content-Type: text/html;                            <div>Content-Type: text/html;
Content-Transfer-Encoding: quoted-printable         <div>Content-Transfer-Encoding: quoted-printable
**no line                                           <br>
**blank line                                        **blank line
<html>                                              **no line [html moved up to top]
</head>                                             </head>
<body>                                              <body>
**spam body html                                    **spam body html--
**blank line                                        **blank line
</body>                                             </body>
**blank line                                        **blank line
</html>                                             <br> [html moved down to bottom]
**blank line                                        **no line
----[boundary code]                                 ----[boundary code]
                                                    </body></html> [extra outer body close]

This complicated the copying, as described previously, but the main issue was that when copied into the split window in a way that would parse at all, the "3D" links could not be resolved, even though they are when using the single window (when it works at all) or forwarding directly from the server.


This topic is now archived and is closed to further replies.
