
Remove info that can identify sender


laxxe


Hi!

I know SpamCop removes my mail address when I'm sending a complaint. I would like to add other things that SpamCop should remove:

- My username (before [at])

- My domain (after [at])

- Other texts (that I can specify myself)

- Random text (this is a hard one to remove, but it's the random text that's often seen in spam mails, so the spammer can identify the person who complained).

My Delivered-To headers look like this:

domain-username[at]domain.tld

Right now only username[at]domain.tld is being "converted" into an x, so the line would look like this to the spammer:

Delivered-To: domain-x

Since I'm the only person who uses this domain, the spammer can pretty easily identify me. He can guess the .tld (top-level domain, like .com), or just search his list for the domain.

Please note: The domain name before the [at] is without ".tld".

Example:

Delivered-To: frfgrgxdomain-username[at]frfgrgxdomain.com

becomes:

Delivered-To: frfgrgxdomain-x
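The munging asked for here could be sketched roughly as follows. This is only an illustration in Python, not how SpamCop's parser works; the function name `munge_header`, its `extra` parameter, and the choice to treat the first label of the domain as the "bare" domain name are all assumptions made for this example. It writes the address with a real "@" rather than "[at]".

```python
import re

def munge_header(line, address, extra=()):
    """Replace the address, its parts, and any extra strings with 'x'.

    Hypothetical helper: only the header value (after the first colon)
    is rewritten, so the header name itself is never touched.
    """
    user, _, domain = address.partition("@")
    label = domain.split(".")[0]  # domain name without the ".tld" part
    # Longest strings first, so the full address wins over its pieces.
    targets = sorted({address, domain, label, user, *extra}, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in targets if t), re.IGNORECASE)
    name, sep, value = line.partition(":")
    return name + sep + pattern.sub("x", value)

print(munge_header(
    "Delivered-To: frfgrgxdomain-username@frfgrgxdomain.com",
    "username@frfgrgxdomain.com",
))
```

With this approach the example header comes out as "Delivered-To: x-x". A very short username or domain label would be replaced wherever it occurs in the value, so a real implementation would need some guard against over-munging.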

I hope this makes sense to you. Can this be fixed? I don't like the spammers to know who I am - when some stupid ISPs forward the complaint to them.

Keep up the good work!

Best regards,

Lasse

Link to comment
Share on other sites

I don't like the spammers to know who I am - when some stupid ISPs forward the complaint to them.

I stopped munging data on my home account after thinking and reading about all the ways spammers can hide your information in their spam messages. Even the seemingly random characters in a spam could be a link in a database to the address the message was sent to.

The only way to be completely safe is to not send any reports. There is also the mole status, but that currently seems to be pretty useless except for giving the interested ISPs a count of the number of spams that were reported. Last I heard, the mole reports are given a weight of 0 for inclusion in the bl.

I have never had a major attack against me and my spam level has not increased or decreased when compared to my work account which is munged through a free account. I'm sure some spammers have removed me from their lists (I get very little porn spam, which originally led me to spamcop) and probably have been added to other lists because of this information as well. I feel they have pretty much balanced themselves out.

Good luck.

Link to comment
Share on other sites

And only to add ... if you can figure out how to write a routine that works the miracles you ask, please share ....  moving to the "New Features / Suggestions" Forum ....

23292[/snapback]

What language do you want it in?

a) Remove any prefixes from wildcarded DNS names; they are almost always identifiers. That is, change the site to *.spammer.tld. This is trivial work in any language, with one potentially recursive DNS query to decide on.

b) Remove the "affiliate" postfix from sites' IRIs (e.g. ...tld/?randomString). These are often identifiers, and once a single one is known, the page can be tested to see whether any is required or whether random strings or null strings give the same response. This is the most complex issue: depending on how "smart" it must be, the total time is likely on the order of a week to create code that both works and is efficient.

c) Remove the recipient's domain name everywhere it appears (including, but not limited to, headers, msgids, and URIs), making an exception for any identifiable ISP (as opposed to a business or individual). This requires a database list of known ISPs, which is not as hard as it seems for 95% of the cases: whois and the presence of a whois server, or the method of IP allocation (i.e. direct allocation for a TL-Nic, swip, reallocation or assignment), *usually* will tell you the proper action. Again, fairly simple; without knowing the code structure of the parser, I'd still guess under two days, no matter how the current implementation is done.

d) Recognize the "hash buster code" and replace it with a marker like "Random words here" with a count of the lines, words and characters. Again, the "hash buster" has often had a unique checksum associated with it, which allows tracking back to the recipient. Possibly also note whether dictionary words, common words or random strings were used. Likely between half a day and a single long day, depending on the existing implementation and possible constraints (the need to minimize time and memory usage for large hashes).
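As a rough illustration of the marker described in point d), here is a small Python sketch. It assumes the hash-buster block has already been located by some other means (which is the hard part and is not shown); the function name `summarize_hash_buster` and the exact marker wording are made up for this example.

```python
def summarize_hash_buster(block):
    """Build a placeholder for an already-identified hash-buster block.

    Counts non-blank lines, whitespace-separated words, and
    non-whitespace characters, so no original text survives.
    """
    lines = [ln for ln in block.splitlines() if ln.strip()]
    words = block.split()
    chars = sum(len(w) for w in words)
    return (f"[Random words here: {len(lines)} lines, "
            f"{len(words)} words, {chars} characters]")

print(summarize_hash_buster("qzx vbn\nplk\n"))
```

Because only counts are reported, a checksum computed over the original block can no longer be recovered from the report.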

These four cases would potentially require a pair of whois lookups, two or three recursive DNS queries and some trivial string manipulations and arithmetic. Most of the results for users could be cached for a significant time (no spammer-style 120 sec. TTLs), and the whois data could be cached for at least a day and probably a week or more. The only "expensive" operation is the attempt to remove the fake or real "affiliate" codes. This might require two or three attempts to load the spammer's site's page and would have to be performed through a set of proxies to avoid detection of the particular machines used (and, of course, the counter-measure of blocking them once identified), though this information, too, can be effectively cached (likely for at least days).

If I guess, in 'C' it would take about one and a half weeks to both code and create the infrastructure needed. For me, add about half a week for fitting into someone else's C++ and likely a further week for typical Perl (though some "clean" Perl would be about the same as for 'C'). More obscure languages would depend on whether there is an FLI and how much experience I have in them; many I know are long obsolete. E.g. Fortran [89]x would add several weeks; Smalltalk or Lisp are unreasonable choices, though Algol68 with library support might be simple, Pascal would be workable, and Modula[23] would be similar.

This is not volunteering, but I might be persuaded to take this on over a period of time (one to two months total, to fit in with work and sleep).
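The caching mentioned above could look something like this minimal Python sketch. The class name `TTLCache`, the method `get_or_compute`, and the injectable `clock` are assumptions for illustration only; the lookup function passed in stands for whatever whois or DNS query would actually be used.

```python
import time

class TTLCache:
    """Tiny time-to-live cache for lookup results (hypothetical sketch)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._store = {}             # key -> (expiry time, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]            # still fresh, skip the lookup
        value = compute(key)         # the expensive whois/DNS call
        self._store[key] = (now + self._ttl, value)
        return value

# Whois data kept for a week, as suggested above.
whois_cache = TTLCache(7 * 24 * 3600)
```

A separate instance with a shorter lifetime could hold the DNS results, since those are expected to change more often than whois records.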

Link to comment
Share on other sites

There doesn't seem to be a lot of exactness in your proposals.

a)...they are almost always

b)...They are often

c)...make an exception...*usually* will tell you

d)..."hash buster" has often had

What do you do for the cases where the above assumptions are incorrect and you have now reported to the wrong location?

Only the spammer really knows where the identifiers are located. Perhaps if we pass a law that says the spammers need to identify all their identifiers in a particular way....sorry.

Link to comment
Share on other sites

There doesn't seem to be a lot of exactness in your proposals.

What do you do for the cases where the above assumptions are incorrect and you have now reported to the wrong location?

Only the spammer really knows where the identifiers are located.  Perhaps if we pass a law that says the spammers need to identify all their identifiers in a particular way....sorry.

23299[/snapback]

You are correct, I was overly vague, so I'll try to touch on each point individually.

a) - almost always - : If the DNS for xyz.abc.spammer.tld resolves through a wildcard to *.spammer.tld, there is no point in ever including the host or subdomain names, regardless of whether they contain identifiers or not. Merely noting that the hostname is wildcarded and changing the report to reflect that (with a *.spammer.tld or equivalent) should always be acceptable.

b) - They are often - : Again, assuming the same page is displayed with or without the "affiliate" code, no information is lost, and in many cases where an affiliate code is needed to access the URL, a random string of the same length usually works (there I go again with usually; I did say this is the toughest part of the problem). However, after a single report, successive reports where the URL differs only by the affiliate ID string can be changed to all use a common one (preferably from a user who has given permission or from a "floating" spam trap).

c) - make an exception...*usually* will tell you - : If the domain is registered at a residence (noted in many registrations) or is a "known" ISP (e.g. sbcglobal.com et al., juno, verizon, etc.), the answer is obvious. Users could also specify that the domain they use is personal, which would entirely avoid the problem.

d) - "hash buster" has often had - : Like the first case, no information is lost by munging the seemingly random strings, which are otherwise unconnected to the message. A more difficult problem is the recent trend of appending "jokes" or short quotes from literature to the text of the message, and one might argue that in at least some cases I've seen, the added text *is* germane to the original subject. Still, when I get two spams from the same source but to different accounts and this is the only difference, the matter becomes quite clear: nothing will be lost in the effectiveness of investigating the report, which should center on the advertised product, not the gibberish added to beat systems like Razor, Pyzor and DCC, even when it doesn't contain any identifiers.

I'm fairly certain that none of my proposals would ever change who the reports go to. I never proposed changing any headers *before* those that identify a "personal" domain, and I only proposed changing the URL when it can be demonstrated that the host and/or subdomains are irrelevant. It is VERY important that no changes occur before parsing, only in the actual reports sent out.
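To make point b) in the list above a little more concrete, the "use a common affiliate string" idea might be sketched like this in Python. The class `AffiliateCollapser` and its `seed` and `rewrite` methods are hypothetical names; the sketch assumes the affiliate code lives in the query string and that it has already been established (by testing the page, not shown) that the code does not change what is displayed.

```python
from urllib.parse import urlsplit

class AffiliateCollapser:
    """Map URLs that differ only in their query string to one seeded URL."""

    def __init__(self):
        self._canonical = {}   # (scheme, host, path) -> URL to report

    @staticmethod
    def _key(url):
        parts = urlsplit(url)
        return (parts.scheme, parts.netloc.lower(), parts.path)

    def seed(self, url):
        # URL from a spam trap or from a user who has given permission.
        self._canonical[self._key(url)] = url

    def rewrite(self, url):
        # Unknown pages are left untouched; known ones get the seeded URL.
        return self._canonical.get(self._key(url), url)

collapser = AffiliateCollapser()
collapser.seed("http://spammer.tld/?trapcode")
print(collapser.rewrite("http://spammer.tld/?abc123"))
```

Only the text placed in the outgoing report is rewritten here; the parser would still see the original URL, in keeping with the rule that nothing changes before parsing.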
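One way to "demonstrate that the host and/or subdomains are irrelevant" is to probe a random, almost certainly nonexistent label under the same domain and compare the answers. The Python sketch below is an illustration under assumptions: the function names `resolve` and `report_hostname` are invented, the caller is assumed to already know the registered domain (working that out is a separate problem), and only IPv4 address sets are compared.

```python
import secrets
import socket

def resolve(hostname):
    """Return the set of IPv4 addresses for hostname, empty on failure."""
    try:
        return set(socket.gethostbyname_ex(hostname)[2])
    except OSError:
        return set()

def report_hostname(hostname, registered_domain, resolver=resolve):
    """Return '*.domain' if the host appears to resolve via a wildcard."""
    if not hostname.endswith("." + registered_domain):
        return hostname
    # A random label should not exist unless the zone is wildcarded.
    probe = f"{secrets.token_hex(8)}.{registered_domain}"
    probe_ips = resolver(probe)
    if probe_ips and probe_ips == resolver(hostname):
        return "*." + registered_domain
    return hostname
```

Round-robin DNS or a resolver that rewrites failed lookups could fool a single comparison like this, so a real check would probably repeat the probe and query the domain's own name servers. The substitution would apply only to the text of the outgoing report, after parsing is complete.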

Link to comment
Share on other sites

Don't forget the instances where they disguise your e-mail address with rot13.

23308[/snapback]

That, and sha1 or md[45] hashes of your email, are *usually* covered by cases (b) and (d) that I gave. Clearly, this is a growing trend. Rot13 is old, and many tools will decode it; a transposed md5 encoding seems to be becoming popular, since it is harder to recognize and easy to compute. The more advanced spammers will use a database to match the message to the recipient rather than directly adding something that would be subject to cryptanalysis.
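For the simple, untransposed cases, a check along these lines would catch the address hiding in the message. This is a Python sketch under assumptions: the function names are invented, the address is lower-cased before encoding (a spammer may not do that), and md4 is left out because it is not reliably available in Python's hashlib. It does nothing against transposed hashes or database keys.

```python
import codecs
import hashlib

def encoded_forms(address):
    """Return some obvious disguises of an e-mail address."""
    addr = address.lower()
    data = addr.encode("utf-8")
    return {
        "rot13": codecs.encode(addr, "rot_13"),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }

def find_encoded_address(message, address):
    """List which disguises of the address occur in the message text."""
    haystack = message.lower()
    forms = encoded_forms(address)
    return sorted(name for name, token in forms.items() if token in haystack)

print(find_encoded_address(
    "click http://spammer.tld/?id=hfre@rknzcyr.pbz",
    "user@example.com",
))
```

Any token found this way could then be munged in the outgoing report in the same way as the plain address.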

Link to comment
Share on other sites
