Those Crazy Belgians
* I believe it is important to point out that I see myself as a kindred spirit to the Great Lyle Zapato, and that I fully subscribe to his strongly held belief that Belgium doesn’t exist. Therefore, while I will - for convenience’s sake - describe the following attack as “having originated from Brussels Hoofdstedelijk Gewest, Belgium,” we all know that Belgium is, and has always been, a leftist ruse.
It all began with some Python code that wouldn’t run…
I have a bunch of Python code that I use to extract various information from my honeypots. One of those scripts dumps out a list of URIs being “advertised” by comment spammers on some of the fake comment pages in my web app honeypot. Generally, those URIs point to pages that have been added to unsuspecting websites (mostly those running WordPress, The WebApp Hacker’s BFF™). I try to notify as many of those folks as I can and, one day, I fully expect to be canonized as the Patron Saint of the Hacked Website.
This morning, my script didn’t work. More precisely, it just hung…
After doing a bit of digging, I discovered that one comment in particular was causing things to go awry:
POST /comments HTTP/1.1\r\n
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n
Accept-Encoding: gzip, deflate\r\n
Accept-Language: en-GB,en;q=0.5\r\n
Connection: keep-alive\r\n
Content-Length: 3100425\r\n
Content-Type: application/x-www-form-urlencoded\r\n
Dnt: 1\r\n
Host: <redacted>\r\n
Referer: http://<redacted>/index\r\n
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:46.0) Gecko/20100101 Firefox/46.0\r\n\r\n
comment=%C3%81%2F%C3%8C%C3%BA%7D%C3%8F%40%2C%C3%BD%C3%9D%C3%93_%C3%93%C3%89%C3%97_%C3%82%C3%8E%C2%BB%C2%A4
%C2%BD%C2%AA%C3%9C%C3%8F%C2%BA%C3%8B%C3%BE%C2%AC%C3%A5%3B%C2%A5%C2%A4%C3%BE%C3%B3%25%C3%A0%5C%C2%B2%C2%B5
%C2%B5%3E%C3%AA%C3%95%2B%C2%A1%C3%91%2B%C3%AF%C3%80%7B%C3%90%C3%AB%28%3D%C3%A6%C2%AB%C3%92_%C3%9A.%C3%87%C3
%A0%21%29%C3%B9%C3%8A%23%C3%8A%C3%9C%C3%BF%C3%A7%C2%B4%3F%C2%A9%7B%C3%99%C3%A7%C3%99%C2%B1%C2%B6%C3%96%C3%84
%C2%A7%C2%B8%C2%B1*%C2%B8%C3%B7%C3%92%C3%A4%C2%B6%C3%AB%C3%A1+%C2%AB%22%60%C3%94%C2%BD%60%5C%C3%AE%24%C3%BF
%C2%AF%21%C2%B1%C3%A3%C2%BD%C3%BF%24%C3%BB%C3%A8%C2%A8%C2%AC%3F%C2%B8%C2%AC%C2%B2%C2%B4%C2%A8%C3%94%C2%BD
%C2%A7*%C3%BB%60%C3%94%C3%9A%C3%86%C3%BD%3C%C3%A5%C3%B3%C3%8E%3F%C3%B6%C3%90%C3%8B%C3%8F%29%60%C2%BF%27%C3
%B1%C3%83%5C%C2%B8%C3%9D%40%C3%9D%C3%A7%C3%9C%C3%8A%C3%B8%21.%7E%60%C2%B2%C2%A4%7D%C2%BA%C3%A3%3D%C3%B0%C2
%BF%C2%AC%C2%B4%C3%A6%C3%88%7E%C3%9B%C2%B7%C2%A2%C3%A9%3D%C3%90%5E%C2%BB%C3%A6%C3%B0%5E%C3%A5%C3%9D%C2%AC%C3
.
.
.
%C2%BA%23%C2%A7%C3%AC%C3%B9%5C%C3%85%C2%A1%C3%B0%2C_%40%C3%A3%C3%92%3C%C3%B8%C3%AE%3A%C3%AF%C3%8E%C3%A7%C3
%B9%C3%B7%C3%80%C3%B0%C2%B1%C3%86%5C%3F%2B%C2%BC%60%C2%AA%C3%84%C2%B2%C2%BA%C3%B7%C2%A8%C2%A7%60%C2%BC%C2
%AB%C2%AF*%7D%C2%BE_%C3%96%C3%9A%5E%5D%C2%BD%C3%90%C3%85%C3%89%C3%B0*%C3%8E%C3%AE%C2%AF%21%C3%A0%C3%86%C3
%B0%C3%BA%28%C3%A8%C2%B8%C3%80%C3%92%7D%C3%83%C3%B1%C3%9A%C3%A4%C2%A5%C3%BD%C3%84%C3%B7%C3%99%C2%A6%29%28
%2B_%C3%9A%C3%95%26%C2%A1%C3%8F%C3%8D%C3%94&submit=Submit
Notice the “Content-Length” in there… Yep, that’s 3 MEGABYTES o’comments… somebody apparently has a lot of stuff to get off their chest. (Kinda like this: I got an Amazon Echo, and three days ago I asked, “Alexa, what does it take to make a woman happy?” and she hasn’t shut up since…)
So… what the heck is that? Well, at first glance, it looks to be a chunk of URL encoded data - the bulk of which represents non-ASCII values. (If you look closely, there are a few ‘+’ and ‘.’ characters in there…)
A little creative use of the Linux command line tools head and tail with negative parameters to the -c switch and I’d cut out only the URL encoded “comment” portion of the POST (waaaay easier than trying to deal with a 3MB file in a text editor…). I hacked together a little Perl code using URL::Encode, and turned all of those percent-encoded numbers back into a binary file in no time.
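For anyone who would rather stay in Python than juggle head, tail, and Perl, the same extract-and-decode step might look roughly like the sketch below. This is not the Perl that was actually used here: the function name extract_comment_bytes is made up for illustration, and the code assumes the entire raw request (headers and body) was saved to a single file.

```python
from urllib.parse import unquote_to_bytes


def extract_comment_bytes(raw_request: bytes, field: bytes = b"comment") -> bytes:
    """Return the percent-decoded bytes of one form field from a raw HTTP POST.

    Hypothetical helper: assumes raw_request holds the complete request,
    headers included, exactly as it came off the wire.
    """
    # The headers end at the first blank line; everything after it is the body.
    _, _, body = raw_request.partition(b"\r\n\r\n")
    for pair in body.split(b"&"):
        name, _, value = pair.partition(b"=")
        if name == field:
            # A saved capture may have been line-wrapped; real CR/LF bytes in
            # the payload would be percent-encoded, so raw ones are safe to drop.
            value = value.replace(b"\r", b"").replace(b"\n", b"")
            # unquote_to_bytes turns %XX into raw bytes and leaves '+' untouched.
            return unquote_to_bytes(value)
    raise KeyError(field)
```

Reading the saved request in binary mode, passing it to extract_comment_bytes, and writing the result out (to evilstuff.bin, say) would give a file ready for a hex editor. Whether a literal ‘+’ should become a space is a judgment call; this sketch leaves it alone.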
I opened up the binary file in a hex editor aaaaaand… nothing. It doesn’t look like any file type I’ve seen before.
I tossed it to the Linux file command, and it said: UTF-8 Unicode text, with very long lines, with CRLF line terminators
Seriously?!? CRLF line terminators pretty much always means it originated in Windows-land. Just to be sure that file wasn’t pulling my leg, I threw together some Python code and “histogrammed” the byte frequency of the file:
0x0A = 29
0x0D = 29
0x21 = 4830
0x22 = 4726
0x23 = 4800
0x24 = 4746
0x25 = 4772
0x26 = 4715
0x27 = 4832
0x28 = 4727
0x29 = 4816
0x2A = 4757
0x2B = 9509
0x2C = 4723
0x2E = 4728
0x2F = 4869
0x3A = 4801
0x3B = 4693
0x3C = 4827
0x3D = 4785
0x3E = 4814
0x3F = 4758
0x40 = 4712
0x5B = 4797
0x5C = 4773
0x5D = 4724
0x5E = 4799
0x5F = 4765
0x60 = 4789
0x7B = 4902
0x7C = 4790
0x7D = 4834
0x7E = 4722
0x80 = 4645
0x81 = 4845
0x82 = 4925
0x83 = 4712
0x84 = 4686
0x85 = 4719
0x86 = 4766
0x87 = 4855
0x88 = 4705
0x89 = 4718
0x8A = 4608
0x8B = 4829
0x8C = 4662
0x8D = 4805
0x8E = 4742
0x8F = 4681
0x90 = 4715
0x91 = 4710
0x92 = 4800
0x93 = 4775
0x94 = 4752
0x95 = 4804
0x96 = 4716
0x97 = 4641
0x98 = 4579
0x99 = 4666
0x9A = 4717
0x9B = 4688
0x9C = 4780
0x9D = 4729
0x9E = 4717
0x9F = 4755
0xA0 = 4693
0xA1 = 9572
0xA2 = 9423
0xA3 = 9610
0xA4 = 9605
0xA5 = 9555
0xA6 = 9452
0xA7 = 9695
0xA8 = 9481
0xA9 = 9300
0xAA = 9562
0xAB = 9653
0xAC = 9464
0xAD = 4702
0xAE = 9557
0xAF = 9500
0xB0 = 9631
0xB1 = 9324
0xB2 = 9501
0xB3 = 9559
0xB4 = 9453
0xB5 = 9411
0xB6 = 9647
0xB7 = 9506
0xB8 = 9584
0xB9 = 9470
0xBA = 9506
0xBB = 9542
0xBC = 9691
0xBD = 9483
0xBE = 9507
0xBF = 9535
0xC2 = 143203
0xC3 = 303418
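A histogram like the one above only takes a few lines of Python. The following is a minimal sketch rather than the original script; the function names byte_histogram and format_histogram are invented for this example.

```python
from collections import Counter


def byte_histogram(data: bytes) -> Counter:
    """Count how many times each byte value occurs in data."""
    # Iterating over a bytes object yields integers in the range 0-255.
    return Counter(data)


def format_histogram(counts: Counter) -> list:
    """Render the counts as '0xNN = count' lines, sorted by byte value."""
    return [f"0x{value:02X} = {count}" for value, count in sorted(counts.items())]
```

Printing each line of format_histogram(byte_histogram(data)) for the decoded file produces output in the same shape as the listing above. It is worth noting what that listing shows: the only bytes at or above 0xC0 are 0xC2 and 0xC3, which are the UTF-8 lead bytes for two-byte sequences encoding U+0080 through U+00FF, and every other high byte sits in the continuation range 0x80 to 0xBF.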
Hmmmm… So it looks like file is right about the CRLF stuff, but - not to disparage file too much - I’ve had file blow sunshine up my skirt a few too many times in the past to completely trust that this is really a well-formed UTF-8 file. And so, we need to “whip out” a somewhat obscure Linux command just to be sure…
Many of you may never have installed the Linux “moreutils” package (see here for “moreinfo” on “moreutils”). Based on the name, you can probably tell that it contains a whole bunch more Unix utilities… and among them is a little gem called isutf8.
isutf8 does pretty much what you would expect… it’ll tell you if a file is, indeed, well-formed UTF-8.
On most sane Linux distros, you can install the moreutils package using a simple sudo apt-get install moreutils.
Running isutf8 is amazingly complex:
localhost ~ » isutf8 evilstuff.bin
localhost ~ »
“What the heck is that?” I hear you cry. “It didn’t do anything!”
Welcome to Unix-land… around here, we tend to be a little terse. Deal with it… (i.e. unless isutf8 bitches about the file NOT being UTF-8, you can assume that it’s UTF-8).
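If moreutils isn’t handy, a strict decode in Python makes a reasonable second opinion. This is a sketch of the same idea, not a reimplementation of isutf8; the function name is_utf8 is an assumption made for this example.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8, False otherwise."""
    try:
        # A strict decode raises on truncated sequences, stray continuation
        # bytes, overlong encodings, and encoded surrogates.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

Unlike the silent success above, this one actually says something: calling is_utf8 on the contents of evilstuff.bin (read in binary mode) should return True if the file is what the file command claims it is.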
So! It’s UTF-8 text! I open it up in a UTF-8 capable editor aaaaaaand…
Gibberish… It’s frickin’ gibberish.
So I jumped through all of those hoops just to find out that some idiot from (the fictional country of) Belgium decided to POST frickin’ gibberish as comment spam.
If you have any other notions about what this might be, please tweet me @tliston.
-TL
Tom Liston
Owner, Principal Consultant
Bad Wolf Security, LLC
Mastodon: @tliston@infosec.exchange
Twitter (yes, I know… X): @tliston
May 12, 2016