
KOI8-R charset. Add Unicode conversion for URL


5 replies to this topic

#1 efa

    Advanced Member

  • 149 posts

Posted 27 May 2011 - 01:05 PM

The "Resolving link obfuscation" engine does not recognize the "KOI8-R" charset for this URL:
http://дешевокупи.рф/
and so decodes it as:
Decimal ampersand decode: http://45H52>:C?8.@D/
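
For what it's worth, that decoded string is exactly what comes out if each decimal character reference is decoded and only the low byte of its code point is kept. The sketch below is only a guess at the behaviour, not SpamCop's actual code: the function name low_byte_decode is made up, and the entity-encoded form of the link is an assumption (it uses the Cyrillic code points of the domain transliterated later in this thread as DeshevoKupi.rf).

```shell
# Hypothetical helper (not part of SpamCop or xComplaint): decode "&#NNNN;"
# references, but keep only the low 8 bits of each code point.
low_byte_decode() {
    printf '%s\n' "$1" | awk '{
        out = ""
        while (match($0, /&#[0-9]+;/)) {
            out = out substr($0, 1, RSTART - 1)            # text before the reference
            cp = substr($0, RSTART + 2, RLENGTH - 3) + 0   # decimal code point
            out = out sprintf("%c", cp % 256)              # low byte only (all ASCII here)
            $0 = substr($0, RSTART + RLENGTH)              # continue after the reference
        }
        print out $0
    }'
}

# Assumed input: the Cyrillic domain written as decimal references.
low_byte_decode 'http://&#1076;&#1077;&#1096;&#1077;&#1074;&#1086;&#1082;&#1091;&#1087;&#1080;.&#1088;&#1092;/'
# prints: http://45H52>:C?8.@D/
```

If that guess is right, the references need to be decoded to full Unicode code points before the host name is resolved.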

Here are some tracking URLs:
http://www.spamcop.n...6c7c44dde27becz
http://www.spamcop.n...78f7e772a77bd8z
http://www.spamcop.n...bbf188ddeef938z

Edited by efa, 27 May 2011 - 03:34 PM.


#2 Farelf

    What Life?

  • 6,674 posts

Posted 28 May 2011 - 02:48 AM

Not sure if conversion would help with those ones. Previous discussions on links in Cyrillic/KOI8-R characters noted a high incidence of "gaming" with DNS records. Looks like nothing has changed over all the years. What I see is:

"DeshevoKupi.rf moved to another domain OlaKupi.ru"

(which bizarrely came out of Google translate with no actual link specified). There is a deshevokupi.ru domain behind an active server too. Seems like they play by different rules.

Anyway, do you want to add a request for parser development in New Feature Requests? If so, I will move this topic there, but let's leave it where it is a little longer in case others can contribute more on the reporting-help aspect.
Plus ca change, plus c'est la meme chose

#3 efa

Posted 28 May 2011 - 07:36 AM

Not sure if conversion would help with those ones.

xComplaint V.0.12.26e (a bash script) implements the conversion to UTF-8, and with that (on Linux) everything worked well.
This is the code:
   # xComplaint released as GNU GPL v3
   # $s is the index for spam.txt. headers.txt is the mail head only.
   # Test is conducted for UTF-8 support
   # Take the first charset= line and extract the quoted value.
   charset=$(grep "charset=" headers.txt | head -n 1 | awk -F\" '{ print $2 }')
   if [ "$debug" = 1 ]; then echo "Original charset: $charset"; fi
   if [ "$UTF8" = 1 ]; then   # this OS is UTF-8 capable?
      if [ "$debug" = 1 ]; then echo "UTF-8 capable OS"; fi
      # test if already in UTF-8: iconv fails on bytes that are not valid UTF-8
      if ! iconv -f UTF-8 -t UTF-8 spam"$s".txt > /dev/null 2>&1; then
         # need a conversion to UTF-8 ...
         if [ "$debug" = 1 ]; then echo "Converting to UTF-8 ..."; fi
         # replace the file only if the conversion succeeded
         iconv -f "$charset" -t UTF-8 spam"$s".txt > temp.txt && mv temp.txt spam"$s".txt
      fi
   fi
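
One caveat: the awk -F\" step only yields a value when the header quotes the charset (charset="KOI8-R"). Some mail writes it unquoted (charset=koi8-r). A minimal sketch that accepts both forms follows. The function name extract_charset is hypothetical and is not part of xComplaint:

```shell
# Hypothetical helper: print the first charset= value found in a header file,
# whether or not the value is wrapped in double quotes.
extract_charset() {
    # The match is case-insensitive on "charset"; the value ends at a quote,
    # a semicolon, or whitespace (including a trailing CR from CRLF mail).
    sed -n 's/.*[Cc][Hh][Aa][Rr][Ss][Ee][Tt]="\{0,1\}\([^";[:space:]]\{1,\}\).*/\1/p' "$1" | head -n 1
}

# Usage, assuming headers.txt exists as in the script above:
#   charset=$(extract_charset headers.txt)
```

The result is empty when no charset is declared, so the caller still has to decide on a fallback before handing it to iconv -f.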


#4 Farelf

Posted 28 May 2011 - 09:01 AM

Thanks efa - your skill easily exceeds mine. But the point is the target (DeshevoKupi.rf) has moved or is somehow obscured. SpamCop would not have found an address for the host even if the URI had been properly converted. In other instances going back years that seems to be the same story. But maybe not all the time. A problem is that spamvertized sites are not (never have been) a priority for SC.

#5 efa

Posted 12 June 2011 - 06:25 PM

Is it time to move this topic out of this section?

#6 Farelf

Posted 12 June 2011 - 09:42 PM

Done.



