C. Dietrich, C. Rossow, N. Pohlmann,
“CoCoSpot: Clustering and Recognizing Botnet Command and Control Channels using Traffic Analysis”.
A Special Issue of Computer Networks On “Botnet Activity: Analysis,
Detection and Shutdown”,
Elsevier, July 2012.
We present CoCoSpot, a novel approach to recognize botnet command and control channels solely based on traffic analysis features, namely carrier protocol distinction, message length sequences and encoding differences. Thus, CoCoSpot can deal with obfuscated and encrypted C&C protocols and complements current methods to fingerprint and recognize botnet C&C channels. Using average-linkage hierarchical clustering of labeled C&C flows, we show that for more than 20 recent botnets and over 87,000 C&C flows, CoCoSpot can recognize more than 88% of the C&C flows at a false positive rate below 0.1%.
A defining characteristic of a bot is its ability to be remote-controlled by way of command and control (C&C). Typically, a bot receives commands from its master, performs tasks and reports back on the execution results. All communication between a C&C server and a bot is performed using a specific C&C protocol over a certain C&C channel. Consequently, in order to
instruct and control their bots, bot masters – knowingly or not – have to define and use a certain command and control protocol. The C&C protocol is thus considered a bot-inherent property. Historically, bots used cleartext C&C protocols, such as plaintext messages transmitted using IRC or HTTP. However, a C&C channel relying on a plaintext protocol can be detected
reliably. Methods such as payload byte signatures as shown by Rieck et al. or heuristics on common C&C message elements such as IRC nicknames as proposed by Goebel and Holz are examples of such detection techniques. To evade payload-based detection, botnets have evolved and often employ C&C protocols with obfuscated or encrypted messages as is the case
with Waledac, Zeus, Hlux, TDSS, Virut and Feederbot, to name but a few. The change towards encrypted or obfuscated C&C messages effectively prevents detection approaches that rely on plaintext C&C message contents. In this article, we take a different approach: we recognize and fingerprint botnet C&C channels based on traffic analysis properties. The rationale behind our methodology is that for a variety of botnets, characteristics of their C&C protocol manifest in the C&C communication behavior. For this reason, our recognition approach is solely based on traffic analysis. As an example, consider a C&C protocol that defines a specific handshake – e.g., for mutual authentication – to be performed at the beginning of each C&C connection. Each request and
response exchanged during this handshake procedure conforms to a predefined structure and length, which in turn leads to a characteristic sequence of message lengths. In fact, we found that in the context of botnet C&C, the sequence of message lengths works well as a
traffic analysis feature. Table 1 shows the message length sequences of the first 8 messages in four Virut C&C flows and two Palevo C&C flows. Whereas Virut exhibits similar message lengths for the first message (in the range 60-69) and a typical sequence of message lengths at positions five to eight, for Palevo, the first three message lengths provide a characteristic fingerprint.
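To make the message length feature concrete, the following Python sketch extracts the length sequence of the first messages of a flow and compares two such sequences position by position. It is only an illustration under stated assumptions: the flow representation (a list of direction/payload pairs), the function names and the distance measure (mean relative length difference) are hypothetical and are not the distance function used by CoCoSpot.

```python
from typing import List, Sequence, Tuple

# Assumed flow representation: one (direction, payload) pair per message,
# where direction is "out" (bot to server) or "in" (server to bot).
Message = Tuple[str, bytes]


def message_length_sequence(messages: Sequence[Message], n: int = 8) -> List[int]:
    """Return the payload lengths of the first n messages of a flow."""
    return [len(payload) for _, payload in messages[:n]]


def length_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Illustrative distance in [0, 1] between two length sequences.

    Averages the relative length difference over the positions present in
    both sequences. This is an assumption made for this sketch, not the
    measure defined in the paper.
    """
    k = min(len(a), len(b))
    if k == 0:
        return 1.0  # nothing to compare: treat as maximally distant
    total = 0.0
    for x, y in zip(a[:k], b[:k]):
        longest = max(x, y)
        # Two empty messages at the same position count as identical.
        total += abs(x - y) / longest if longest else 0.0
    return total / k


# Usage with invented payloads (these lengths are NOT taken from Table 1).
flow_a = [("out", b"x" * 64), ("in", b"y" * 20)]
flow_b = [("out", b"x" * 64), ("in", b"y" * 20)]
seq_a = message_length_sequence(flow_a)
seq_b = message_length_sequence(flow_b)
print(seq_a, length_distance(seq_a, seq_b))
```

Flows whose sequences are close under such a measure would be candidates for the same cluster. Because only message lengths are inspected, the comparison is unaffected by whether the payload is encrypted or obfuscated.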