Monday, November 18, 2013

Virus Bulletin 2013 @ Berlin, Germany: My First International Security Conference as a Speaker


Virus Bulletin 2013, one of the top computer security conferences, was held this year in Berlin, Germany, from 2nd to 4th October 2013, and I was invited to speak on research I did on a behavioural method to detect HTTP botnets. The detection method focuses on a few key areas:
  • How do we differentiate traffic generated by automated clients from human-initiated traffic?
  • How do we examine outbound HTTP traffic and differentiate legitimate sources, such as web browsers, from malicious botnet Command and Control traffic over the HTTP protocol?
  • Monitoring an idle host based on the volume of traffic it generates, and determining its suspiciousness on the basis of repetitive connections to a C&C server.
I also discussed the algorithmic approach and how to apply it at the network perimeter to detect botnet activity with a high degree of accuracy. The entire write-up can be found over here, and over here on the McAfee Labs blog as well. A pretty interesting read with a results-oriented approach.

This was one of the very few presentations the audience really found interesting. It was on the final day of the conference, though, and the audience was somewhat smaller than on the previous two days, which was a little disappointing. The evening of 3rd October featured a fantastic drinks party followed by a gala dinner and sensational German dance performances, so a smaller audience on the last day of the conference wasn't too surprising.
On the other side, several presentations in the technical stream were interesting. One that got my attention was a talk from the Microsoft guys, revealing a major problem the AV industry may already be facing today: attacks on the AV automation systems themselves. Today the industry relies heavily on telemetry data and sample sharing between vendors to respond quickly to 0-day threats. AV automation systems are primarily built to auto-classify the hundreds of thousands of malware samples received every day and to generate signatures automatically. Attackers have now started probing these automation systems to find loopholes in automatic signature generation and to exploit them by injecting specifically crafted clean files into the telemetry stream, poisoning it. Imagine the mess this could cause, given the significant volume of such crafted files received via telemetry.

The entire presentation from the Microsoft speakers can be viewed here: Working together to defeat attacks against AV automation

Another presentation I found interesting was from F-Secure, titled "Statistically effective protection against APT attacks". It covered their research on several available exploit mitigation methods and which ones are most effective at preventing exploits from executing shellcode. The research examines how effective mitigations like application sandboxing, client application hardening, memory handling mechanisms, and network hardening are against some in-the-wild exploits. A useful piece of research.

Overall, it was a fantastic conference, and I got the opportunity to socialize, meet a lot of people, share ideas, and talk about all sorts of things.

All the slides of the VB2013 presentations are available here.

Saturday, November 2, 2013

Periodic Command Pull From C&C Servers Paves a New Way to Detect Botnets


HTTP has been predominantly used by recent botnets and APTs as their primary channel of communication with Command and Control servers, and its share has increased significantly over the last few quarters. One piece of research shows that more than 60% of botnets use the HTTP protocol for C&C communication, and that number keeps growing. The distribution below shows the popularity and dominance of the HTTP protocol among the top botnet families.

There are a couple of apparent reasons to use HTTP as the primary C&C channel. First, it cannot simply be blocked on the network, since it carries a major chunk of internet traffic today. Second, it is not just hard but nearly impossible to differentiate legitimate HTTP traffic from malicious traffic at the network perimeter unless you have known signatures for it. This makes HTTP even more popular among malware authors.

The industry is well aware that traditional signature-based approaches are no longer a match for today's level of threat sophistication, and their limitations have driven a shift in focus from signatures to behaviour. But we need to answer the question: what suspicious behaviours should we look for on the network?

Before we answer that question, I'd like to shed some light on the typical lifecycle of botnet command and control over HTTP.

  • A bot will typically connect to a small number of C&C domains. It may try to resolve many domains over a short period of time when it does DGA-style resolution, but once it successfully resolves a domain and connects, it will stay connected to the same domain for its lifetime.
  • Once connected, it will send an HTTP GET or POST request to a specific resource (URI) on the C&C server as its registration / phone-home communication.
  • It will execute the command received from the C&C server, or sleep for a fixed interval of time before connecting back and pulling the next command.
  • Subsequently, it will connect to the server at fixed (or deliberately stealthy) intervals and keep pulling commands, or send keepalive messages to announce its existence periodically. A minimal simulation of this pull loop follows the list.
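
To make the pull pattern concrete, below is a minimal Python sketch of such a loop. The C&C URL, bot ID, poll interval, and iteration count are purely illustrative placeholders, not taken from any real sample:

import time
import urllib.request

# Hypothetical C&C endpoint -- a placeholder, not a real server.
CNC_URL = "http://cnc.example.com/gate.php"
PULL_INTERVAL = 6  # seconds, mirroring the Zeus capture shown below

def pull_command():
    """Phone home with a POST and fetch the next command (None on failure)."""
    try:
        req = urllib.request.Request(CNC_URL, data=b"id=bot-1234", method="POST")
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.read()
    except OSError:
        return None  # server unreachable; the bot simply retries next cycle

# A real bot would loop forever; five iterations suffice for the sketch.
for _ in range(5):
    command = pull_command()
    if command:
        pass  # a real bot would parse and execute the command here
    time.sleep(PULL_INTERVAL)  # the fixed sleep produces the periodic signature

The fixed sleep between requests is exactly what produces the regular, repetitive connections we can hunt for on the wire.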
What we learn from this behaviour is that botnets typically work in a "pull" fashion; they continuously fetch commands from the control server, either at fixed intervals or stealthily. A quick example demonstrating this behaviour is Zeus. Below is a traffic snapshot of Zeus communicating with its control server every 6 seconds.

Zeus C&C communication over the network

We can see that a machine infected with a bot, communicating with the control server periodically, generates automated traffic. Since this behaviour can also be exhibited by legitimate software and websites, another question comes up here: how do we differentiate browser- or human-initiated traffic from automated traffic? There are certain facts we can definitely rely on:
  • It is abnormal for most users to connect to a specific server resource repeatedly and at periodic intervals. There might be dynamic web pages that periodically refresh content, but these legitimate behaviours can be detected by looking at the server responses.
  • The first connection to any web server will almost always produce a response greater than 1 KB, because these are web pages. A response size of just 100 or 200 bytes is hard to imagine under usual conditions.
  • Legitimate web pages will almost always have embedded images, JavaScript, tags, links to several other domains, links to several file paths on the same domain, and so on. These mark the characteristics of normal web pages.
  • Browsers will send full HTTP headers in the request, unless the traffic is intercepted by MITM tools that can modify or delete headers.
All of the above facts allow us to focus on a specific behaviour to look for on the network: repetitive connections to the same server resource over the HTTP protocol.
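
These facts translate into a simple per-transaction filter. Below is a minimal Python sketch, assuming we already have the request headers and the response body of a single HTTP transaction; the expected-header list and the 1 KB cutoff mirror the facts above, while the exact values are assumptions rather than measured ones:

import re

# Header names we would expect a real browser to send on a first request.
BROWSER_HEADERS = {"user-agent", "accept", "accept-language", "accept-encoding"}

def looks_like_browser_traffic(request_headers, response_body):
    """Apply the facts above to one HTTP transaction.
    request_headers: dict of header name -> value from the client request
    response_body: raw bytes of the server response body
    """
    sent = {h.lower() for h in request_headers}
    full_headers = BROWSER_HEADERS.issubset(sent)

    # Fact: first responses from real sites are normally pages over 1 KB.
    big_enough = len(response_body) > 1024

    # Fact: normal pages embed images, scripts, links, and so on.
    text = response_body.decode("utf-8", errors="ignore").lower()
    structured = re.search(r"<(img|script|a|link)\b", text) is not None

    return full_headers and big_enough and structured

# A tiny, unstructured response with a single header fails every check.
print(looks_like_browser_traffic({"User-Agent": "x"}, b"0|ok|sleep:6"))  # False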

Assume we choose to monitor a machine under idle conditions, when the user is not logged on; under those conditions we can distinguish botnet activity with a high level of accuracy. We monitor the idle host because that is the period when the traffic volume is low. It is relatively easy to identify an idle host from the nature of the traffic it generates (usually version updates, version checks, keepalives, etc.). We'd never expect an idle host to generate traffic to yahoo.com or hotmail.com.
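
A crude way to flag a host as idle is by request volume alone. Here is a small sketch with a hypothetical per-window request cap; a real deployment would also look at what kind of traffic it is, as noted above:

# Hypothetical thresholds -- tune for your environment.
IDLE_MAX_REQUESTS = 20  # requests per window before a host counts as active
WINDOW_SECONDS = 7200   # two-hour observation window

def host_is_idle(request_timestamps, window_end):
    """Treat a host as idle if it made very few requests in the last window."""
    window_start = window_end - WINDOW_SECONDS
    recent = [t for t in request_timestamps if window_start <= t <= window_end]
    return len(recent) <= IDLE_MAX_REQUESTS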

Under these conditions, if the machine is infected with the SpyEye botnet, the traffic will look like this:

Notice that Zeus (in the previous screenshot) and SpyEye each connect to one control domain and keep sending HTTP POST requests to a specific server resource, every 6 and 31 seconds respectively. Algorithmically, while a host is idle, we'd deem its activity suspicious when:
  1. The number of unique domains a system connects to is less than a certain threshold
  2. The number of unique URIs a system connects to is less than a certain threshold
  3. For each unique domain, the number of times a URI is repetitively connected to is greater than a certain threshold
Assuming the volume of traffic from the host is low, if we apply the preceding conditions over a window of, say, two hours, we might come up with the following (a code sketch of these checks appears after the list):
  1. Number of unique domains = 1 (less than the threshold)
  2. Number of unique URIs connected = 1 (less than the threshold)
  3. For each unique domain, the number of times a unique URI is repetitively connected to = 13 (greater than the threshold)
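
Below is a minimal Python sketch of those three checks over one monitoring window. The threshold values are hypothetical placeholders, and the domain in the usage example is made up:

from collections import Counter

# Hypothetical thresholds, corresponding to the three conditions above.
MAX_UNIQUE_DOMAINS = 3  # condition 1: unique domains must stay below this
MAX_UNIQUE_URIS = 5     # condition 2: unique URIs must stay below this
MIN_REPEAT_COUNT = 10   # condition 3: repeats of one URI must reach this

def is_suspicious(requests):
    """requests: list of (domain, uri) pairs seen from one idle host
    within the monitoring window (e.g., two hours)."""
    if not requests:
        return False
    domains = {d for d, _ in requests}
    uri_counts = Counter(requests)  # hit count per (domain, uri) pair

    if len(domains) >= MAX_UNIQUE_DOMAINS:  # condition 1
        return False
    if len(uri_counts) >= MAX_UNIQUE_URIS:  # condition 2
        return False
    # Condition 3: some single (domain, URI) pair was hit repeatedly.
    return max(uri_counts.values()) >= MIN_REPEAT_COUNT

# The example from the text: one domain, one URI, 13 repetitions.
print(is_suspicious([("evil.example.net", "/gate.php")] * 13))  # True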
The approach, however, does not mandate that repetitive activity be seen at fixed intervals. If we choose to monitor within a larger window, we can detect more stealthy activities. The following flowchart represents a possible sequence of operations.

The first few checks are important to determine that the host isn't talking too much. First, Total URIs > X establishes that we have enough traffic to look into. Next, Total Domains accessed <= Y establishes that the number of domains accessed is not too large. The final check is whether Total unique URIs < Z. The source ends up on the suspicious list if we believe it has generated repetitive connections.

For instance, if Total URIs = 30, Total Domains accessed = 3, and Total unique URIs accessed = 5, repetitive URI access from the host is guaranteed. Now if the number of repetitive accesses to any particular URI crosses the threshold (for example, one URI accessed 15 times within a window), we can further examine the connection and apply some heuristics to increase our confidence level and eliminate false positives. Some heuristics we can apply (combined into a scoring sketch after this list):
  • Minimal HTTP headers sent in the request
  • Absence of UA/referrer headers
  • Small server responses lacking the structure of a usual web page
  • Domain registration time, and perhaps domain reputation as well
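
Here is a minimal sketch combining these heuristics into a single score. The weights, the three-header cutoff, and the 30-day registration window are assumptions chosen for illustration, and the WHOIS lookup that would supply the domain creation time is left out:

import time

def heuristic_score(request_headers, response_body, domain_created_ts):
    """Score one repetitive connection; a higher score is more suspicious.
    domain_created_ts: domain creation time as epoch seconds, assumed to
    come from an upstream WHOIS lookup."""
    score = 0
    sent = {h.lower() for h in request_headers}

    if len(sent) <= 3:  # minimal HTTP headers in the request
        score += 1
    if "user-agent" not in sent or "referer" not in sent:
        score += 1      # missing UA / referrer headers
    if len(response_body) < 1024 and b"<html" not in response_body.lower():
        score += 1      # small response lacking usual page structure
    if time.time() - domain_created_ts < 30 * 86400:
        score += 1      # domain registered less than 30 days ago

    return score  # e.g., escalate a connection when score >= 3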
Let’s look at an example of SpyEye sending minimal HTTP headers without a referrer header:

I implemented a proof of concept for this approach and could detect the repetitive activity with relative ease.

Applying this method to several top botnet families exhibiting similar behaviour, I could detect them with a medium to high level of confidence.

Behavioural detection methods will be key to detecting next-generation threats. Given the complexity and sophistication of recent advanced attacks, such detection approaches can address threats proactively, without waiting for signature updates, and will prove to be much faster.