Manual form reporting with HTML


enigma1

If I manually post HTML code in the reporting form in SC, it seems to translate the HTML, presumably for security, but when it parses the body of the mail it doesn't convert it back to process the actual HTML. The end result can be invalid links, because the translated characters may be taken for spam links. For example:

ISP does not wish to receive reports regarding http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd - no date available

http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd has been appealed previously.

I tried it in different ways: copying it from a textarea directly into the SC form, and copying it to Notepad first and then to the SC form. So it's not the clipboard on my side during the paste operation.
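
To illustrate the kind of "translation" being described here, below is a minimal Python sketch of HTML escaping and the reverse step. It is only an assumption that SC does something like this internally; the thread does not confirm it. The `raw` string is a shortened, single-line version of the DOCTYPE discussed later in the thread, used purely for illustration.

```python
import html

# Shortened, single-line version of the DOCTYPE discussed in this thread.
raw = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN">'

# Escaping replaces markup characters with entities so the text can be
# displayed safely inside a web page (assumed behaviour, not confirmed for SC).
escaped = html.escape(raw)

# Unescaping is the step that would have to happen before the text is
# parsed as HTML again.
restored = html.unescape(escaped)

print(escaped)
print(restored == raw)
```

If a parser worked on the escaped text without the reverse step, it would no longer see any tags at all, only plain text that happens to contain URLs.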

If I manually post HTML code in the reporting form in SC it seems to translate the HTML presumably for security but when it parses the body of the mail it doesn't revert back to process the actual html. The end result may involve invalid links because of the translated characters that may be assumed as spam links. For example:

Sorry, but I don't see enough data to sort out what you are really doing, much less the results you are trying to describe. Your 'code' example doesn't explain anything to me; it just raises questions as to what you are actually "manually entering".

If I manually post HTML code in the reporting form in SC it seems to translate the HTML presumably for security but when it parses the body of the mail it doesn't revert back to process the actual html.

Normally, when I submit data via the form, I am submitting entire e-mail packets, of which the HTML would only be a part. Typically, the HTML would be "embedded" within the packet using proper MIME delimiters (and possibly encoded using some form of MIME encoding).
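
As a rough picture of what "embedded using proper MIME delimiters" looks like, here is a minimal Python sketch that builds such a message with the standard `email` package. The addresses, subject and body text are made-up placeholders, not taken from any real spam.

```python
from email.message import EmailMessage

# All addresses and text below are hypothetical placeholders.
msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg["Subject"] = "Example"

# A plain-text part plus an HTML alternative gives a multipart/alternative
# message, with each part delimited by a MIME boundary.
msg.set_content("Plain-text fallback.")
msg.add_alternative("<html><body><p>Hello</p></body></html>", subtype="html")

raw = msg.as_string()
print(raw)

# The HTML lives in its own part, labelled text/html.
part_types = [part.get_content_type() for part in msg.iter_parts()]
print(part_types)
```

The printed text shows a top-level Content-Type header carrying a `boundary=` parameter, and the HTML sitting in a part of its own with its own Content-Type line.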

As an experiment, I tried just pasting in the unencoded HTML part of a (nonspam) message into the VER form, but SpamCop rejected the entry because it could find no SMTP header (and it was correct -- there was none), so we didn't even get to the stage of deciding which links were good or bad. So, I'm puzzled as to what you submitted.

Just where are you getting the data that you are posting -- is it the complete raw text of a mail packet, or just a part? Did you alter the contents before pasting in? Can you post a tracking URL for such a message?

-- rick

Normally, when I submit data via the form, I am submitting entire e-mail packets, of which the HTML would only be a part.

Yes, I do the same: I submit the whole thing. So, say you have Outlook: you take the headers and put them in the form, then you click source on the mail, take the HTML and post it. With other mailers this may be one step: headers plus mail content are retrieved and posted together. The form will discard the submission unless the headers are present and valid.

So with some emails, after submitting the form, I get the notices I mentioned earlier. I cannot find the exact report that had the problem now; I posted several yesterday and deleted the original mails. I will update this thread with the tracking URL the next time I see it.

So say if you have outlook you take the headers you put them in the form then you click source on the mail you take the html and you post it. With other mailers this maybe in one step. Headers+Mail Content is retrieved and posted. The form will discard submission unless the headers are present and valid.

There has been so much traffic on the use of Outlook, for example:

Outlook received header problem, -need to cease E-mail spam forwarding

Outlook 'Foward as Attachment' no longer authorized !!

SpamCop FAQ : SpamCop Parsing and Reporting Service : How do I get my email program to reveal the full, unmodified email? : Microsoft products : Microsoft Outlook (all versions)

The major points at issue are the version of Outlook involved, followed by its configuration, the type and configuration of the e-mail server being connected to (i.e. is it an Exchange server or not?), and so on. More details are needed that feed into the problem and its possible resolution.

Thanks for the tracking link. I would expect to see the HTML portion contained within a MIME text/html part, but that isn't what your tracker shows.

Do you use the "two part" form, in which you paste the header in one part, and the body in another? This helps reconstitute the original message structure which Microsoft (and others) tend to clobber into uselessness. I know that I have to use this form when I report spam I get at work (using MS Outlook / Exchange). I don't know whether this is also required for other versions of Outlook (e.g., Outlook Express).

-- rick

Egad Rick, you seem to have hit the nail(s) on the head.

Putting that spam into the 2-part form for submission gives me

http://www.spamcop.net/sc?id=z4897477139zb...fb1cf219956aeaz

- SC has not been tempted into parsing the HTML declaration as a link.

Alternatively, dummying in a mime declaration and boundaries and submitting via the single box submission form gives me

http://www.spamcop.net/sc?id=z4897468225z2...939ad0460c9efez

-again, no spurious analysis of http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd

So, enigma1, it looks like you are using Outlook, and you need to hit the "Select outlook/eudora workaround form" link on the form and work thereafter in the 2-part form.

Actually, I did use the single form, not the Outlook one, and the mailer wasn't Outlook. I think the problem comes up when the HTML is perhaps not formatted the standard way at its origin. It is still valid though, at least according to the W3C validator.

<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

See the line break in the first line? That could be the reason for the problem, but I still think it is the SC form processing that does it. You would expect the SC form to check whether it's an <a> tag, since the format is HTML, not plain text. A browser will treat this first section (up to the dtd">) as a single tag for the doctype.
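
The difference between the two approaches can be sketched in Python. This is not SC's parser, just an illustration under stated assumptions: the `<body>` line, the `spam.example` link and the function and class names are all hypothetical additions, and the regular expression is a deliberately naive URL pattern. Only the first three lines of `BODY` come from the mail quoted above.

```python
import re
from html.parser import HTMLParser

# First three lines as quoted above; the <body> line is a made-up example link.
BODY = '''<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<body><a href="http://spam.example/buy">Buy now</a></body>
</html>'''


def plain_text_urls(text):
    """Treat the body as plain text: anything that looks like a URL counts."""
    return re.findall(r'https?://[^\s"<>]+', text)


class AnchorCollector(HTMLParser):
    """Treat the body as HTML: only href values of <a> tags count."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def html_anchor_urls(text):
    collector = AnchorCollector()
    collector.feed(text)
    collector.close()
    return collector.links


print(plain_text_urls(BODY))
print(html_anchor_urls(BODY))
```

The plain-text scan reports the w3.org DTD and namespace URLs along with the real link, while the HTML-aware scan reports only the link inside the `<a>` tag, with or without the line break in the doctype.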

Farelf, the 2 tracking URLs you posted do not show the line break right after the "doctype html". So did you somehow remove it during the copy/paste operation? What I posted is exactly the content of the mail as it came through. I did not convert or remove the line breaks.

Farelf, the 2 tracking urls you posted, do not show the line break right after the "doctype html". So somehow you removed it during the copy/paste operation? What I posted is exactly the content of the mail as it came through. I did not convert or remove the line breaks.

You're right - but you should still try the 2-part form. Here is the parse again, without my "correction" of the line break:

http://www.spamcop.net/sc?id=z4898591305z3...f7313e06b6df90z

Same result, it handles the parse correctly without getting caught up with the unexpected form of HTML declaration.

Not sure why using the 2-part form would be necessary if you're not using Outlook (or Eudora). But it seems to work. I do appreciate that the 2-part form is not as convenient to use as the single-part (not to mention that it is not yet apparent whether the 2-part form will correctly parse your "other" spam). But it is apparently a work-around for the HTML case, with the full text as it presents itself to your system. I am not specifically aware of others having the same difficulty, but I guess there might be some, which could explain the "does not wish to receive reports about w3.org/TR/xhtml1/DTD/xhtml1" seen in the handling of your parses. Those supposed others are just not noticing it, or just not mentioning it (anyway, it is not even a factor with "Quick/VER" reporting).

In any event, SC staff may care to note something funny is going on. I don't think there is much development work going on with the parser but that's something we fellow users can only guess about.

Looking at your last tracking this is the line which makes the difference.

Content-Type: text/html

The header entry seems to tell the parser to treat the body as HTML, rather than the parser working that out from the actual content of the mail. I just tried it with the original email I have, using the single form (not the 2-part one), and I did not get the spurious link message. I inserted the Content-Type line just after the last header line.

Still, the original email headers did not include a Content-Type header. So is that inserted automatically by the SC processing for the 2-part form?
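
For comparison, Python's standard `email` package behaves in a similar way, which may help picture what is going on. This is only an analogy and says nothing certain about SC's internals. The header and body strings are made-up placeholders.

```python
from email import message_from_string

# Hypothetical minimal headers and an HTML body.
HEADERS = "From: sender@example.com\nSubject: Example\n"
BODY = "<html><body><p>Hello</p></body></html>\n"

# No Content-Type header: the MIME default of text/plain applies,
# regardless of what the body looks like.
without_ct = message_from_string(HEADERS + "\n" + BODY)

# Same message with a Content-Type line added after the last header line.
with_ct = message_from_string(HEADERS + "Content-Type: text/html\n" + "\n" + BODY)

print(without_ct.get_content_type())
print(with_ct.get_content_type())
```

The first message is reported as text/plain and the second as text/html, even though the bodies are identical. A parser that trusts the declared type would scan the first body as plain text, DOCTYPE URL included.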

...Still the original email headers did not include a content-type header. So is that inserted for the 2 parts form automatically by the SC processing?

Yes it is. It is one of the things it does to make the 'bizarro headers' (as I seem to recall the SC founder called them) work. Of course we users are not allowed to make such changes to headers. I had never noticed that specific insertion before; it is otherwise pretty much a black box. You can verify any and all of this by trying it yourself. You always have the option of cancelling the report (if, for instance, you have already processed a report on the particular piece of spam). Still, I am learning too. :)
Archived

This topic is now archived and is closed to further replies.
