It started innocently enough: a member of staff opened an email from their personal email account with the subject line “Confirmation of payment”. Unbeknownst to the unfortunate recipient, the attachment contained malware from the Dyre banking trojan family (specifically Troj/Dyreza-FP). One week, 500 hours of staff time and 128 infected systems later, we’ve learnt a lot of lessons the hard way. Learn from our experience (we have).
- 12:27 The attachment is opened and its contents executed
- 12:44 The first reports of a suspicious email start to come in
- 13:12 Major security incident declared
- 13:23 The first customer communication notifies of the nature of the threat
This was a virus our anti-virus solution (Sophos) didn’t detect (nor did 54 of the 57 vendors monitored by VirusTotal), so the first we knew about it was users reporting that they’d received a suspicious email. Our first challenge, then, was to answer the following questions:
- What does this malware do?
- How does it spread?
So, in the absence of helpful information like the virus definitions we get from Sophos, how do you find out what a virus does …?
… You have to run it.
But you can’t do that, it’ll infect your network!
Enter the sandbox
The clock is ticking, something is happening to systems and data on your network, and you need to know what. As we learnt (the hard way), this is a lot harder than it looks. We easily set up a physical host (this malware won’t run on a host with a single CPU, which is how many virtual machines are configured, so a VM might not have been effective). We ran the malware and used procmon.exe (Process Monitor) from the Microsoft Sysinternals Suite to record what it did. How did that go?
- We ended up with a crap load of data which we weren’t really equipped to understand.
- Some of the conclusions we drew were inaccurate and led to time wasted chasing red herrings (for example, that it writes to DLL files in C:\Windows\System32, which we later confirmed it did not).
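In hindsight, one way to cut a Process Monitor capture down to something readable is to export it to CSV and keep only successful, system-changing events raised by the sample’s own processes. The sketch below is not what we did at the time. It assumes the export contains Process Monitor’s “Process Name”, “Operation”, “Path” and “Result” columns, and the chosen set of operations is our own assumption. Keeping only `SUCCESS` results helps separate real writes from mere opens or denied attempts.

```python
import csv
import io
from collections import Counter

# Process Monitor operation names that indicate a change to the system or
# network activity. This selection is an assumption; widen it as needed.
INTERESTING_OPS = {
    "WriteFile",
    "RegSetValue",
    "RegCreateKey",
    "Process Create",
    "TCP Connect",
}


def summarise_procmon(csv_text, process_names):
    """Count successful, system-changing events raised by the given processes.

    Expects text from Process Monitor's CSV export with the "Process Name",
    "Operation", "Path" and "Result" columns. Returns a Counter keyed by
    (operation, path).
    """
    wanted = {name.lower() for name in process_names}
    counts = Counter()
    # The export may begin with a UTF-8 byte order mark; drop it if present.
    reader = csv.DictReader(io.StringIO(csv_text.lstrip("\ufeff")))
    for row in reader:
        if row["Process Name"].lower() not in wanted:
            continue
        if row["Operation"] not in INTERESTING_OPS:
            continue
        # An open or a denied write is not evidence that anything changed,
        # so keep only operations that succeeded.
        if row["Result"] != "SUCCESS":
            continue
        counts[(row["Operation"], row["Path"])] += 1
    return counts


def load_procmon_csv(path):
    """Read an exported log; 'utf-8-sig' strips any byte order mark."""
    with open(path, encoding="utf-8-sig", newline="") as handle:
        return handle.read()
```

For example, `summarise_procmon(load_procmon_csv("capture.csv"), ["sample.exe"]).most_common()` lists the most frequent changes first. Both file names are placeholders. Malware commonly spawns or injects into other processes, so the list of process names needs to grow as children show up under “Process Create”.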
We did have more luck with free online options:
- www.virustotal.com (as well as checking files against AV definitions, it also provides behavioural information)
- www.hybrid-analysis.com (hosted by Payload Security, it uses VxStream Sandbox and Hybrid Analysis to give a comprehensive report including packet capture and file and registry activity)
- www.malwr.com (based on Cuckoo Sandbox, this also gives network analysis and registry and file activity; however, the report isn’t quite as fully featured or as easy to read as the VxStream output)
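Before uploading a sample to one of these services, it can be worth searching for its SHA-256 hash first. VirusTotal and Hybrid Analysis both accept a hash as a search term, so if someone has already submitted the file, the report is available immediately and you share nothing. Here is a minimal standard-library sketch. The function name is our own, hypothetical choice.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF),
        # so even a large sample never has to fit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_of_file("attachment.bin")` (a placeholder name) gives a 64-character string that can be pasted into the search box of either site.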
- 15:53 We have analysis from VirusTotal, Hybrid Analysis and our own home-grown testing, and with that information we’re now identifying infected systems by tracking those connecting to 18.104.22.168 (a Dyre botnet controller).
- 17:04 It’s been just over 4.5 hours since we were infected. We now have a virus definition from our vendor, Sophos, which we start pushing out immediately. We’re also using PowerShell to delete the messages from Exchange mailboxes and preparing loan systems to deploy to affected users. The list of affected machines stands at 27 and is growing.
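Finding infected machines by their connections to a known controller comes down to filtering network logs for a destination address. The sketch below assumes a hypothetical CSV export with `timestamp`, `src_ip` and `dst_ip` columns, and ISO 8601 timestamps in a single time zone so that plain string comparison orders them. Real firewall or proxy exports will look different. For each internal address, it returns the first time that address was seen talking to the controller.

```python
import csv
import io

# Controller address from the analysis above; add others as they are found.
C2_ADDRESSES = frozenset({"18.104.22.168"})


def hosts_contacting(csv_text, c2_addresses=C2_ADDRESSES):
    """Map each source IP that contacted a C2 address to its earliest timestamp.

    Assumes hypothetical 'timestamp', 'src_ip' and 'dst_ip' columns, with
    ISO 8601 timestamps in one time zone so that strings sort chronologically.
    """
    first_seen = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["dst_ip"].strip() not in c2_addresses:
            continue
        src = row["src_ip"].strip()
        ts = row["timestamp"].strip()
        # Keep the earliest sighting of each source address.
        if src not in first_seen or ts < first_seen[src]:
            first_seen[src] = ts
    return first_seen
```

Sorting the result by value, for example `sorted(result.items(), key=lambda kv: kv[1])`, gives a rough infection order. Turning addresses into machine names and users would still need a join against DHCP or inventory records.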
Managing an outbreak
As we found out, a growing list of infected machines brings a number of logistical challenges, and our Service Desk software (LANDesk) wasn’t really capable of handling them. We ended up falling back on Excel to list the affected systems along with their status (infected, suspected infected, loan deployed, cleaned), location, user, assigned analyst, priority and so on.
Ideally you need all this information in an easily edited list viewable on a single page (not just ticket by ticket), easy to sort and filter, and able to handle multiple simultaneous editors. In the end we had 128 systems listed in an Excel spreadsheet, with data aggregated from multiple sources (LANDesk Service Desk, KACE and the Sophos Management Console).
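The spreadsheet problem is essentially merging several exports into one record per machine, keyed on hostname. Here is a minimal sketch of that merge. The statuses come from the list above, but their ordering, the field names and the “most advanced status wins” rule are all our own hypothetical choices. They are not the layout of our spreadsheet or the export formats of LANDesk, KACE or Sophos.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Status(IntEnum):
    # Ordered so that a higher value means further along the recovery path.
    SUSPECTED = 1
    INFECTED = 2
    LOAN_DEPLOYED = 3
    CLEANED = 4


@dataclass
class Machine:
    hostname: str
    status: Status
    user: str = ""
    location: str = ""
    analyst: str = ""
    sources: set = field(default_factory=set)


def merge_reports(reports):
    """Merge (source, row) pairs into one Machine per hostname.

    Each row is a dict with at least 'hostname' and 'status' (a Status name).
    Hostnames are normalised to upper case, the most advanced status wins,
    and empty fields are filled from the first source that supplies them.
    """
    tracker = {}
    for source, row in reports:
        name = row["hostname"].strip().upper()
        status = Status[row["status"]]
        machine = tracker.get(name)
        if machine is None:
            machine = tracker[name] = Machine(hostname=name, status=status)
        machine.status = max(machine.status, status)
        for attr in ("user", "location", "analyst"):
            if not getattr(machine, attr) and row.get(attr):
                setattr(machine, attr, row[attr])
        machine.sources.add(source)
    return tracker


def worklist(tracker):
    """Machines not yet cleaned, least advanced first, then by hostname."""
    pending = (m for m in tracker.values() if m.status < Status.CLEANED)
    return sorted(pending, key=lambda m: (m.status, m.hostname))
```

Normalising hostnames matters because different tools report the same machine in different case or with stray whitespace. The “most advanced status wins” rule is a simplification. In practice, a fresh detection on a cleaned machine should reopen its record rather than be ignored.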
The initial communication was sent within an hour of infection, and we sent two further updates as more information came to light, including advice for those who may have been affected. This was particularly pertinent in this case because the risk was not to data on the University network but to individuals’ personal internet banking credentials.
What we did learn from this is that any advice to users on mitigating information security threats should be carefully validated. If you give advice that you later have to change, or that becomes obsolete, it damages your credibility as a service provider. So make sure anything you say about the impact of malware, or the steps to take to deal with it, is checked before it is sent out.
The response from our customers was good: the majority were happy with the speed of our response and appreciated the extent to which we had to prioritise this incident over normal service requests and incidents.
We do know that our recovery approach in this case caused some disruption for users who lost application settings (e.g. wireless profiles and browser settings), and we’ll look at that in our ‘lessons learnt’ exercise.
We now have a lot of thinking to do about security incident response planning, damage limitation, gathering and analysing evidence and data, and system recovery. There are a number of sources we’ll be calling on (just Google “security incident response”), and some of the lessons learnt will feed into the Information Security Management System we’re developing under the framework of the ISO 27001:2013 standard.