Cryptojacking & The Wild Wild West of the Bitcoin Space
If you've landed on this page because you saw a strange message on a completely different website and then followed a link here, drop a note to the site owner and let them know what happened. If, on the other hand, you're on this page because you're interested in reading about the illicit use of cryptomining on compromised websites and how, through fortuitous circumstances, I now own coinhive.com and am doing something useful with it, read on.
You know how people don't like ads? Yeah, me neither (at least not the spammy, tracky ones that invade both your privacy and your bandwidth), but I also like free content on the web and therein lies the rub: how do content producers monetise their work if they can't put ads on pages? Well naturally, you "Monetize Your Business with Your Users' CPU Power", which was Coinhive's modus operandi. That's a link to the last snapshotted version on archive.org because if you go to coinhive.com today, you'll see nothing. The website is dead. However, it's now owned by me and it's just sitting there doing pretty much nothing other than serving a little bit of JavaScript. I'll come back to that shortly; let's return to the business model of Coinhive:
So, instead of serving ads, you put a JavaScript-based cryptominer in your victi... sorry - visitors' - browsers and then, whilst they're sitting there reading your content, you're harvesting Monero on their machine. They're paying for the CPU cycles to put money into your pocket - ingenious! But there were two massive problems with this and the first one is probably obvious: it's a sleazy business model that exploits people's electricity bills (usually without their knowledge) for the personal gain of the site operator. It might only be exploiting them a little bit (how much power can an in-browser JS cryptominer really draw?), but it still feels super shady. The second problem is that due to the anonymous nature of cryptocurrency, every hacker and their dog wanted to put Coinhive on any site they were able to run their own arbitrary JavaScript on.
I'll give you a perfect example of that last point: in Feb 2018 I wrote about The JavaScript Supply Chain Paradox: SRI, CSP and Trust in Third Party Libraries wherein someone had compromised a JS file on the Browsealoud service and injected the Coinhive script into it. In that blog post I included the code Scott Helme had de-obfuscated which showed a very simple bit of JavaScript, really just the inclusion of a .js file from coinhive.com and the setting of a 32-byte key. And that's all an attacker needed to do - include the Coinhive JS, add their key and if they wished, toggle a few configurations. That's it, job done, instant crypto!
And then Coinhive was gone. (Also - "the company was making in an estimated $250,000 per month" - crikey!) The site disappeared and the domain stopped resolving. Every site that had Coinhive running on it, either by the design of the site owner or at the whim of a cryptojacker, stopped mining Monero. However, those sites were still making requests to the domain, and with the name no longer resolving anywhere, the only signs of Coinhive being gone were errors in the browser's developer tools.
In May 2020, I obtained both the primary coinhive.com domain and a few other ancillary ones related to the service, for example cnhv.co which was used for their link shortener (which also caused browsers to mine Monero). I'm not sure how much the person who made these available to me wants to share so the only thing I'll say for now is that they were provided to me for free to do something useful with. 2020 got kinda busy and it was only very recently that I was finally able to come back to Coinhive. I stood up a website and just logged requests. Every request resulted in a 404, but every request also went into a standard Azure App Service log. And that's where things got a lot more interesting.
Firstly, the high-level stats: as I was routing traffic through Cloudflare, it was super easy to look at the volume of requests:
That's a substantial number of requests, peaking at 3.63M in a day for a service that doesn't even exist anymore. But the number that really impressed me (if "impressed" is the right word here...) was the number of unique visitors per day:
Daaaamn! More than 2 years after Coinhive was gone and the miner is still embedded in enough places to be serving more than 100k unique visitors per day. Whoa. I wonder where they're all coming from?
Just for context, Have I Been Pwned (which sees about 200k visitors per day) has a geographical distribution as follows:
I'm loath to draw stereotypical conclusions about the association of hackers with Russia and China, but it's a bit inescapable here. Later on, when I analysed the various URLs that were injecting Coinhive, there was (anecdotally) a strong presence of Russian and Chinese websites.
Moving on, here's a typical log entry captured once I stood up the empty website:
#Fields: date time s-sitename cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) cs(Cookie) cs(Referer) cs-host sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken
2021-03-27 02:59:32 COINHIVE GET /lib/coinhive.min.js X-ARR-LOG-ID=061e55e4-6380-4e88-a7f6-d4ea53071b71 443 - 172.69.166.8 Mozilla/5.0+(Linux;+Android+8.0.0;+ATU-LX3)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/88.0.4324.181+Mobile+Safari/537.36 ARRAffinitySameSite=0e2a05ba3fa8945356c52a5d6a03ef6078571b96359db7d489b1580040a9fdec https://lookedon.com/ coinhive.com 404 0 2 470 1424 15
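Log entries like the one above are in the W3C extended log format: space-delimited values whose order is given by the #Fields directive, with spaces inside values encoded as + and empty values written as -. As a minimal sketch (in Python, and not necessarily how these logs were actually processed), the field names can simply be zipped onto the values of each line:

```python
from __future__ import annotations

# Field names exactly as listed in the "#Fields:" directive above.
FIELDS_DIRECTIVE = (
    "#Fields: date time s-sitename cs-method cs-uri-stem cs-uri-query s-port "
    "cs-username c-ip cs(User-Agent) cs(Cookie) cs(Referer) cs-host sc-status "
    "sc-substatus sc-win32-status sc-bytes cs-bytes time-taken"
)


def parse_fields(directive: str) -> list[str]:
    """Turn a '#Fields:' directive into an ordered list of field names."""
    return directive.split(":", 1)[1].split()


def parse_entry(line: str, fields: list[str]) -> dict[str, str | None]:
    """Map one space-delimited log line onto the field names.

    Spaces inside values are already encoded as '+', so a plain split works;
    '-' means the value was empty and becomes None.
    """
    values = line.split()
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} values, got {len(values)}")
    return {name: (None if value == "-" else value) for name, value in zip(fields, values)}


# The log entry shown above, split across literals for readability.
SAMPLE = (
    "2021-03-27 02:59:32 COINHIVE GET /lib/coinhive.min.js "
    "X-ARR-LOG-ID=061e55e4-6380-4e88-a7f6-d4ea53071b71 443 - 172.69.166.8 "
    "Mozilla/5.0+(Linux;+Android+8.0.0;+ATU-LX3)+AppleWebKit/537.36+(KHTML,+like+Gecko)"
    "+Chrome/88.0.4324.181+Mobile+Safari/537.36 "
    "ARRAffinitySameSite=0e2a05ba3fa8945356c52a5d6a03ef6078571b96359db7d489b1580040a9fdec "
    "https://lookedon.com/ coinhive.com 404 0 2 470 1424 15"
)

entry = parse_entry(SAMPLE, parse_fields(FIELDS_DIRECTIVE))
print(entry["cs-uri-stem"], entry["cs(Referer)"], entry["sc-status"])
# /lib/coinhive.min.js https://lookedon.com/ 404
```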
The JS file being requested is how Coinhive was usually embedded in a site. The IP is Cloudflare's (remember, they're a reverse proxy so it's their IP the website receives) and the response code is 404 as there was no resource to return. The referrer is the interesting one because this tells us where the script was requested from, in this case a website at lookedon.com. A quick glance at that site at the time of writing and yeah, that's a cryptominer in the HTML source:
Before we go any further delving into the ins and outs of cryptominers, I strongly recommend watching this video by Hugo Bijmans and Christian Doerr from the Delft University of Technology presenting at the USENIX Security Symposium a couple of years ago. It's only 21 minutes long and it gets straight to the point:
There's also a much more comprehensive paper from Hugo and colleagues titled Inadvertently Making Cyber Criminals Rich: A Comprehensive Study of Cryptojacking Campaigns at Internet Scale. If you want to go much deeper, have a good read through this. (Incidentally, I've been in touch with Hugo and we're discussing how to best use the data I'm logging for both research and defensive purposes.)
I pulled down several days of logs beginning 2021-03-27 and imported them into a DB where I could analyse things more easily (8.9M rows in total). I looked firstly at the content that was being requested (all subsequent figures exclude the cnhv.co link shortener domain unless otherwise stated):
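A tally of requested content along those lines could look something like the following Python sketch. The field order is read from the #Fields directive in the log itself, and requests for the cnhv.co link shortener are excluded, mirroring the note above. The sample lines use a trimmed-down set of fields and are illustrative, not real log data (the /abc123 path in particular is made up):

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def count_requested_paths(
    lines: Iterable[str], exclude_hosts: frozenset[str] = frozenset({"cnhv.co"})
) -> Counter:
    """Tally cs-uri-stem values across W3C extended log lines.

    Field order comes from the most recent '#Fields:' directive; other
    '#' directives (such as '#Software' or '#Date') are skipped.
    """
    fields: list[str] = []
    counts: Counter = Counter()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("#Fields:"):
                fields = line.split(":", 1)[1].split()
            continue
        entry = dict(zip(fields, line.split()))
        if entry.get("cs-host") in exclude_hosts:
            continue  # e.g. drop the link shortener domain
        counts[entry.get("cs-uri-stem", "-")] += 1
    return counts


demo_log = [
    "#Software: illustrative sample, not real data",
    "#Fields: date time cs-method cs-uri-stem cs-host sc-status",
    "2021-03-27 02:59:32 GET /lib/coinhive.min.js coinhive.com 404",
    "2021-03-27 02:59:42 GET /lib/cryptonight.wasm coinhive.com 404",
    "2021-03-27 02:59:50 GET /lib/coinhive.min.js coinhive.com 404",
    "2021-03-27 03:00:01 GET /abc123 cnhv.co 404",
]
print(count_requested_paths(demo_log).most_common())
# [('/lib/coinhive.min.js', 2), ('/lib/cryptonight.wasm', 1)]
```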
The prevalence of the JavaScript miners is no surprise, and the Delft guys talk about the role of WebAssembly (.wasm) in their paper. There were references to WASM in the original Coinhive script, but of course nobody has been loading that for quite some time so I can only assume it's being embedded by other means. The logs don't have a referrer on any of the WASM entries either so it's not clear where the requests are originating from:
2021-03-27 02:59:42 COINHIVE GET /lib/cryptonight.wasm X-ARR-LOG-ID=3aebf86b-116e-4ce6-a8a5-2c3d9911a2c4 443 - 162.158.167.112 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+apollohobo/1.0.0+Chrome/58.0.3029.110+Electron/1.7.9+Safari/537.36 - - coinhive.com 404 0 2 470 1205 15
I put this to Hugo and he made the following comment:
When the coinhive.min.js script was downloaded from the original Coinhive server, it analyzed the target system, determined the number of CPU threads available and afterwards downloaded that WASM file to be distributed amongst a number of WebWorkers (often the same number of threads available). That WASM file is where the actual mining is taking place, as WebAssembly can be executed almost as fast as native machine code. The name of that particular WASM file changed a few times in the past, which explains why there is a variation of Wasm files requested. Some websites serve the coinhive.min.js script from their own server (to prevent detection), but the WebWorkers will still request the WASM file from the original Coinhive domain, as the JS script has barely changed.
Next up is the referrer; because many different paths on the same site were serving the miner, I grouped by host and reported on that:
I found more than 41k unique domains in the referrer header. That's many tens of thousands of websites still attempting to embed Coinhive. Taking a look at webtruyenonline.com (the root of the site that featured heavily in the referrers), the site is in Vietnamese, but upon visiting it, I couldn't find any sign whatsoever of Coinhive. Not in the page source, not requested from a downstream dependency and not on other deeper links in the referrers either. Looking closer at the log entries, a pattern emerged with the user agents, so I filtered those out and grouped them:
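Grouping by host rather than by full URL just means reducing each cs(Referer) value to its host name before counting. A minimal Python sketch, assuming referrers arrive as raw strings with - for an empty value (the sample paths below are made up):

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Optional
from urllib.parse import urlsplit


def referrer_host(referrer: Optional[str]) -> Optional[str]:
    """Reduce a cs(Referer) value to its host name, or None if it's empty ('-')."""
    if not referrer or referrer == "-":
        return None
    return urlsplit(referrer).hostname  # lower-cased, port removed


def count_referrer_hosts(referrers: Iterable[Optional[str]]) -> Counter:
    """Count requests per referring host rather than per full URL."""
    return Counter(host for host in map(referrer_host, referrers) if host)


# Illustrative values only; the paths are made up.
sample = [
    "https://lookedon.com/",
    "https://lookedon.com/some/page?id=1",
    "http://aahora.org/",
    "-",
]
hosts = count_referrer_hosts(sample)
print(len(hosts), hosts.most_common())
# 2 [('lookedon.com', 2), ('aahora.org', 1)]
```

The number of keys in the resulting Counter is the count of unique referring hosts.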
They're all mobile devices. I changed the UA string I was sending to the first one above and reissued the request, but there was still no Coinhive in the HTML response. Is it tailoring the response based on the location of the requesting IP? I connected NordVPN to Vietnam and tried again. Nothing. I even proxied my own iPhone through Fiddler then VPN'd to Vietnam and still nothing. That's a totally standard UA for Safari on iOS 14.4.1 too, so no clues there as to what circumstances are causing these requests. Hugo also had some thoughts on this one:
During our follow-up research on cryptojacking, we discovered that 1.4M MikroTik routers were serving cryptojacking scripts as they were routing Web traffic, geographically focussed on Brazil and Indonesia. It could be that a Vietnamese MikroTik router is still infected and somehow manages to inject the script into that particular (popular) website.
I'll take this opportunity to make a quick call back to Here's Why Your Static Website Needs HTTPS. Securing the transport layer isn't just about protecting sensitive information, it's also about protecting the integrity of the content and, assuming Hugo is right here, this is a beautiful demonstration of the necessity of HTTPS everywhere. More on this in their paper titled Just the Tip of the Iceberg: Internet-Scale Exploitation of Routers for Cryptojacking.
The next host name, aahora.org, however, was a totally different story and, sure enough, there's Coinhive:
I don't intend to keep going through the top hits, the point is that the presence of requests in the logs doesn't always map cleanly to the presence of Coinhive on the site in the referrer. I can think of a variety of reasons for this but suffice to say there were still a heap of sites attempting to embed a cryptominer in the browser.
There was just one more thing I was interested in: what can we tell about concerted cryptojacking campaigns based on the data? I mean, how many of these log entries were from sites running the same Coinhive key, which would indicate the same actor behind each site? So, I wrangled up a little crawler script and started scraping each unique site looking for the presence of Coinhive. It's a basic one (it merely looks for a Coinhive key in the HTML source so misses keys embedded in another JS file or an iframe), but it's sufficient for my requirements here. I crawled the 375k most prevalent URIs in the logs, pulled back the keys and recorded the number of unique URIs and host names each key appeared on. I found over 3k unique keys; here's the top 10:
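The core of a key-hunting crawler like this can be sketched in a few lines of Python. This is an illustration rather than the actual crawler: it assumes the miner is initialised as CoinHive.Anonymous('<key>') (or the .User / .Token variants) with a 32-character alphanumeric key, and, like the crawler described above, it only inspects the HTML it's given, so keys loaded from other JS files or iframes are missed. Fetching pages is left out; the pages and the key are made up:

```python
from __future__ import annotations

import re
from collections import defaultdict
from urllib.parse import urlsplit

# Assumption: the miner is started with CoinHive.Anonymous('<key>') or the
# .User / .Token variants, using a 32-character alphanumeric site key.
KEY_PATTERN = re.compile(r"""CoinHive\.(?:Anonymous|User|Token)\(\s*['"]([A-Za-z0-9]{32})['"]""")


def extract_keys(html: str) -> set[str]:
    """Return every Coinhive-style key found directly in the HTML source."""
    return set(KEY_PATTERN.findall(html))


def summarise_keys(pages: dict[str, str]) -> dict[str, dict[str, int]]:
    """Map each key to the number of unique URIs and host names it appeared on."""
    uris: defaultdict[str, set[str]] = defaultdict(set)
    hosts: defaultdict[str, set[str]] = defaultdict(set)
    for uri, html in pages.items():
        for key in extract_keys(html):
            uris[key].add(uri)
            hosts[key].add(urlsplit(uri).hostname or "")
    return {key: {"uris": len(uris[key]), "hosts": len(hosts[key])} for key in uris}


# Made-up pages and key, standing in for crawled HTML.
FAKE_KEY = "EXAMPLE" + "0" * 25  # 32 characters
pages = {
    "http://example.com/": f"<script>var miner = new CoinHive.Anonymous('{FAKE_KEY}');</script>",
    "http://example.com/about": f'<script>new CoinHive.Anonymous("{FAKE_KEY}", {{throttle: 0.3}});</script>',
    "http://example.org/": "<p>No miner here</p>",
}
print(summarise_keys(pages))
```

Counting hosts separately from URIs matters because one compromised site can embed the same key on many pages.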
Doing a bit of Googling for the keys, I found 2 interesting things and the first one relates to the second key ("FgW..."):
Well, that supports Hugo's earlier thesis. As prevalent as that key may be, having appeared on 103 unique hosts, it could just be a single infected router. Further supporting the MikroTik theory was that every single URL was served unencrypted over HTTP:
In this particular case, there was a heavy bias towards "sahara" domains. These relate to Subrata Roy, a name I only knew after recently watching the Netflix series Bad Boy Billionaires, of which he is apparently one. Is there someone within the corrupt billionaires org running an infected router that's cryptojacking all their non-secure requests? I suspect that's just one of many curious mysteries within the data set.
That third key - "w9W..." - has a Google hit that links directly back to the Dutch research paper:
It's hard to draw any conclusion other than that there remains a large number of compromised websites out there hosting Coinhive even now that Coinhive is dead. They're no longer mining crypto, of course, however these sites are still embedding JavaScript on them from a domain I control so...
The modal is embedded directly from script served by the site I stood up. The link goes through to this blog post and the message can be easily dismissed by folks who just want to browse the site. I thought carefully about this approach; did I really want to modify other people's websites? I want site owners to know there's a high likelihood they've been compromised, but how else do you tell them when we're talking about tens of thousands of sites? I've done enough disclosures over enough years to know that even doing this once is painful, but if I was to write just a little bit of JavaScript instead...
Oh, and while we're here, let's just let that sink in for a moment: I can now run whatever JavaScript I want on a huge number of websites. So, what could I do with JavaScript? I could change where forms post to, add a key logger, modify the DOM, make external requests, redirect to a malicious file and all sorts of other very nasty things. That's the power you hand over when you embed someone else's JS in your own site and that's precisely why we have subresource integrity. I linked earlier to my post on SRI as it related to the Browsealoud incident, and this situation right here is as good a demonstration as any of why verifying the integrity of external assets is so important.
So, what's the fix? Well, it depends, and to answer that we need to go back to the preso from the Delft uni guys, in particular this slide:
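Subresource integrity works by pinning a hash of the exact file you expect in the script tag's integrity attribute; if the file served by the third party changes, the hash no longer matches and the browser refuses to run it. A rough Python sketch of computing and checking such a value (the script contents and the example.com URL are hypothetical, and real browsers also accept multiple space-separated hashes, which this ignores):

```python
import base64
import hashlib

SUPPORTED = ("sha256", "sha384", "sha512")


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI integrity value such as 'sha384-<base64 digest>'."""
    if algorithm not in SUPPORTED:
        raise ValueError(f"SRI uses one of {SUPPORTED}, not {algorithm!r}")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def integrity_matches(content: bytes, integrity: str) -> bool:
    """Single-hash check: does the fetched content still match the pinned value?"""
    algorithm, _, _ = integrity.partition("-")
    return algorithm in SUPPORTED and sri_hash(content, algorithm) == integrity


# Hypothetical third-party script, pinned when it was known to be good...
original = b"console.log('accessibility toolbar loaded');"
pinned = sri_hash(original)
print(f'<script src="https://example.com/toolbar.js" integrity="{pinned}" crossorigin="anonymous"></script>')

# ...and the same file after someone appends a miner to it.
tampered = original + b"\n/* injected miner */"
print(integrity_matches(original, pinned), integrity_matches(tampered, pinned))
# True False
```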
If the miner is owner-initiated then firstly, shame on them, and secondly, just remove it. If it was their conscious decision to embed the miner in the first place, they can then remove it of their own free volition. But much more prevalent than this is malicious activity, in fact it accounts for the vast majority of instances once you consider both third-party software compromises and compromises of the primary website itself:
The answer here is twofold, and the first part is obviously to either remove the compromised code or third-party library and then, of course, fix the underlying vulnerability (e.g. change the weak admin credentials that allowed it to be placed on the site). But we also have the technology to ensure the crypto code could never have run on the site in the first place, and that brings me back to CSPs.
Let's take one of the most commonly occurring websites in my logs, lookedon.com. This service enables you to "find interesting pictures" and like most modern websites, it embeds all sorts of content types from different places by design. For example, there's a bunch of the following:
- Images loaded from pbs.twimg.com
- Script loaded from connect.facebook.net
- Fonts from fonts.gstatic.com
- Other content from its own domain, for example style sheets and more images
There are other content types loaded from other locations but for the sake of simplicity, let's just work with this list for now. Using a content security policy, I can define the content types and locations the browser is allowed to make requests to and anything that deviates from that known good state is blocked. Here's what a policy for the 4 points above would look like:
default-src 'none'; img-src 'self' pbs.twimg.com; script-src-elem connect.facebook.net; font-src fonts.gstatic.com; style-src 'self';
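When hand-writing policies it's easy to slip in a stray character. Each directive is a name followed by space-separated sources, and directives are separated by semicolons; a directive written as font-src: (with a colon) isn't recognised as font-src at all. A small, hypothetical Python helper that assembles the policy above from a dictionary and rejects malformed directive names:

```python
from __future__ import annotations


def build_csp(directives: dict[str, list[str]]) -> str:
    """Join {directive: [sources]} into a policy string.

    Each directive is its name followed by space-separated sources, and
    directives are separated by semicolons. Names containing ':' or
    whitespace (e.g. 'font-src:') are rejected rather than emitted.
    """
    parts = []
    for name, sources in directives.items():
        if not name or ":" in name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid directive name: {name!r}")
        parts.append(" ".join([name, *sources]))
    return "; ".join(parts) + ";"


policy = build_csp({
    "default-src": ["'none'"],
    "img-src": ["'self'", "pbs.twimg.com"],
    "script-src-elem": ["connect.facebook.net"],
    "font-src": ["fonts.gstatic.com"],
    "style-src": ["'self'"],
})
print(policy)
# default-src 'none'; img-src 'self' pbs.twimg.com; script-src-elem connect.facebook.net; font-src fonts.gstatic.com; style-src 'self';
```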
The CSP is then returned as a response header for any pages on the website. I can emulate this behaviour by injecting it into the site with FiddlerScript then inspecting the response in Chrome's dev tools:
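FiddlerScript is handy for emulating a header on a site you don't control. On a site you do control, it's simply a header on every response; as one hypothetical illustration, here's WSGI middleware (Python standard library only) that adds the policy to every page an app serves. The page function is a placeholder:

```python
from wsgiref.util import setup_testing_defaults

CSP = (
    "default-src 'none'; img-src 'self' pbs.twimg.com; "
    "script-src-elem connect.facebook.net; font-src fonts.gstatic.com; style-src 'self';"
)


def with_csp(app, policy=CSP):
    """WSGI middleware that adds a Content-Security-Policy header to every response."""
    def wrapped(environ, start_response):
        def start_with_csp(status, headers, exc_info=None):
            # Replace any policy the inner app set, then add ours.
            headers = [(k, v) for k, v in headers if k.lower() != "content-security-policy"]
            headers.append(("Content-Security-Policy", policy))
            return start_response(status, headers, exc_info)
        return app(environ, start_with_csp)
    return wrapped


def page(environ, start_response):
    """Placeholder app standing in for the real site."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<h1>Interesting pictures</h1>"]


app = with_csp(page)

# Quick check without starting a server: call the app directly.
environ: dict = {}
setup_testing_defaults(environ)
captured: dict = {}


def capture(status, headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = dict(headers)


body = b"".join(app(environ, capture))
print(captured["headers"]["Content-Security-Policy"])
```

To try it in a browser, wsgiref.simple_server.make_server("", 8000, app).serve_forever() will serve it locally; most web frameworks and servers have their own way of setting response headers.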
Nothing is allowed to load from anywhere (default-src 'none') then only the explicitly defined content types are allowed to load from the explicitly defined locations. As such, here's what happens when the browser is asked to embed the Coinhive script:
It's rejected. There's only one place this website can embed scripts from and that's connect.facebook.net. What we're doing here is creating an "allow list" of all the things that we as the website developer know to be good, and then allowing the browser to block everything else. Someone drops a crypto miner on your site via any of the methods mentioned above and wammo! Nothing happens 😊
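To make the allow-list idea concrete, here's a deliberately simplified Python model of how a browser decides whether a URL is permitted: script-src-elem falls back to script-src and then default-src, 'self' means the page's own origin, and host sources are compared by host name. It ignores schemes, ports and paths on host sources, a bare *, nonces and hashes, so treat it as a teaching aid rather than a CSP implementation:

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Simplified model: only the fallback chain needed for the policy above.
FALLBACKS = {
    "script-src-elem": ["script-src-elem", "script-src", "default-src"],
}


def parse_csp(header: str) -> dict[str, list[str]]:
    """Split a policy into {directive: [sources]}; the first occurrence of a directive wins."""
    directives: dict[str, list[str]] = {}
    for chunk in header.split(";"):
        tokens = chunk.split()
        if tokens:
            directives.setdefault(tokens[0].lower(), tokens[1:])
    return directives


def source_host(source: str) -> str:
    """Reduce a host-source like 'https://cdn.example.com:443/path' to its host (simplification)."""
    if "://" in source:
        source = source.split("://", 1)[1]
    return source.split("/", 1)[0].split(":", 1)[0].lower()


def is_allowed(policy: str, directive: str, url: str, page_url: str) -> bool:
    """Roughly decide whether a request to `url` would be permitted under `policy`."""
    directives = parse_csp(policy)
    chain = FALLBACKS.get(directive, [directive, "default-src"])
    sources = next((directives[name] for name in chain if name in directives), None)
    if sources is None:
        return True  # no applicable directive, so the request isn't restricted
    target, page = urlsplit(url), urlsplit(page_url)
    host = (target.hostname or "").lower()
    for source in sources:
        if source == "'self'":
            if (target.scheme, target.netloc) == (page.scheme, page.netloc):
                return True
        elif source.startswith("'"):
            continue  # 'none' and other keyword, nonce or hash sources never match a URL here
        else:
            allowed_host = source_host(source)
            if allowed_host.startswith("*."):
                if host.endswith(allowed_host[1:]):
                    return True
            elif host == allowed_host:
                return True
    return False


POLICY = (
    "default-src 'none'; img-src 'self' pbs.twimg.com; "
    "script-src-elem connect.facebook.net; font-src fonts.gstatic.com; style-src 'self';"
)
PAGE = "https://lookedon.com/"
print(is_allowed(POLICY, "script-src-elem", "https://coinhive.com/lib/coinhive.min.js", PAGE))  # False
print(is_allowed(POLICY, "script-src-elem", "https://connect.facebook.net/en_US/sdk.js", PAGE))  # True
```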
But wait, there's more: wouldn't it be great to know when this happens? In the Browsealoud situation from earlier, I'd love to know as soon as a third-party service or library I depend on starts doing something unexpected. And I'd especially like to know if someone drops malicious script onto my own site, so let's tweak the CSP a bit:
default-src 'none'; img-src 'self' pbs.twimg.com; script-src-elem connect.facebook.net; font-src fonts.gstatic.com; style-src 'self'; report-uri https://troyhunt.report-uri.com/r/d/csp/enforce;
All I've added here is the report-uri directive right at the end and this brings me to... Report URI!
Full disclosure: Report URI is a service Scott and I run together and there's a bunch of both free and commercial stuff. Having said that, what I'm about to delve into is equally applicable whether you use our service or stand up your own reporting endpoint; this is about the browser technology rather than a product pitch. (And yes, we know report-uri is deprecated, but it's extensively supported, unlike report-to, which is still patchy, and we own the domain name anyway 🙂)
When you use the report-uri feature in a content security policy, violations of the policy can be automatically sent to an endpoint of your choosing. In the example above I have my own subdomain on report-uri.com and I've configured that host as the location my visitors' browsers should send any violation reports to. Subsequently, when we test this on lookedon.com, we see the following report sent by the browser:
Imagine that you, as the person responsible for this site, received the violation report above; it tells you that the document-uri (the root of lookedon.com) tried to embed the blocked-uri (the Coinhive JS file) and that it violated the effective-directive, which is "script-src-elem". In other words, someone is trying to put Coinhive on your site, but the browser has blocked it and you've just had this report personally delivered to you. Neat! Put a CSP on your site and report violations; it's one of the best defences going for a whole bunch of typical web attacks.
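On the receiving side, a report-uri endpoint gets a POST (with the application/csp-report content type) whose JSON body wraps the violation in a csp-report object. A minimal Python sketch of pulling out the fields discussed above and flagging blocked miner hosts; the report body is constructed for illustration, the MINER_HOSTS list is hypothetical, and the HTTP endpoint itself is left out:

```python
import json
from dataclasses import dataclass
from urllib.parse import urlsplit

# Hypothetical list of hosts worth flagging; extend as needed.
MINER_HOSTS = frozenset({"coinhive.com", "cnhv.co"})


@dataclass
class Violation:
    document_uri: str
    blocked_uri: str
    effective_directive: str


def parse_report(body: bytes) -> Violation:
    """Parse a report-uri style body of the form {"csp-report": {...}}."""
    report = json.loads(body)["csp-report"]
    return Violation(
        document_uri=report.get("document-uri", ""),
        blocked_uri=report.get("blocked-uri", ""),
        # Fall back to violated-directive if effective-directive is absent.
        effective_directive=report.get("effective-directive") or report.get("violated-directive", ""),
    )


def is_miner(violation: Violation) -> bool:
    """Flag violations where the blocked resource came from a known miner host."""
    host = urlsplit(violation.blocked_uri).hostname or ""
    return host in MINER_HOSTS


# Illustrative report matching the scenario described above.
SAMPLE_REPORT = json.dumps({
    "csp-report": {
        "document-uri": "https://lookedon.com/",
        "blocked-uri": "https://coinhive.com/lib/coinhive.min.js",
        "effective-directive": "script-src-elem",
        "violated-directive": "script-src-elem",
        "disposition": "enforce",
    }
}).encode("utf-8")

violation = parse_report(SAMPLE_REPORT)
print(violation.effective_directive, violation.blocked_uri, is_miner(violation))
# script-src-elem https://coinhive.com/lib/coinhive.min.js True
```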
Last thing: the code that now runs on coinhive.com is available on GitHub and I'm happy to take pull requests. I'd love it if folks could work out a way to serve something useful in response to the WASM requests and I'm certainly open to any suggestions re cleaning up that JS or doing anything else useful to help both individuals and site owners alike. And no, don't tell me to just put my own cryptominer in there 🙂
Edit: Less than half a day after publishing this, I received a pull request with a full WASM implementation that will show the same message to any browser directly calling a .wasm file. Massive thanks to Chad Baxter for doing this!