BotHound

BotHound is an automatic attack detector and botnet classifier. Its purpose is to create a historical classification of the attacks with detailed information regarding the attackers (country-based, time-based, etc.).

In brief, BotHound:

  • detects anomalies in traffic,
  • clusters the IPs into groups of botnet or non-botnet IPs, and
  • classifies the attacks.

In more detail, BotHound detects and classifies attacks using the anomaly-detection and machine-learning tool Grey Memory. When Grey Memory detects an anomaly, the BotHound attack classifier starts gathering live information from the Deflect network and computes a behaviour vector for every visitor of the network. BotHound then groups the client IPs into clusters to profile the malicious visitors, and feeds the behaviour vectors of the bot IPs into a classifier to determine whether the botnet has attacked the Deflect network before. Finally, BotHound generates a report of its conclusions, sends it to Deflect’s sysops, and uses their feedback to improve its classification performance.
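
The trigger step can be illustrated with a small sketch. This is not Grey Memory's algorithm: it is a plain standard-deviation threshold used as a stand-in, applied to the channel listed under Modules (the percentage of failed HTTP requests). Treating any status code of 400 or above as "failed", the function names, and the threshold value are all assumptions made for illustration.

```python
from statistics import mean, stdev


def failed_percentage(status_codes):
    """Percentage of requests in one time window that failed.

    Assumption: a request counts as failed when its status code is >= 400.
    """
    if not status_codes:
        return 0.0
    failed = sum(1 for code in status_codes if code >= 400)
    return 100.0 * failed / len(status_codes)


def is_anomaly(history, current, threshold=3.0):
    """Flag `current` when it lies more than `threshold` standard
    deviations above the mean of the previous windows in `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    spread = stdev(history)
    if spread == 0:
        return current > mean(history)
    return (current - mean(history)) / spread > threshold


# Hypothetical per-minute values of the monitored channel.
history = [2.0, 3.0, 2.5, 3.5, 2.0]
current = failed_percentage([200, 503, 503, 404, 503])  # 80.0
start_incident_recording = is_anomaly(history, current)  # True
```

In a real deployment the anomaly detector would be Grey Memory itself; the sketch only shows where incident recording would be switched on.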

Definitions

  • Session - an IP and a vector of feature values recorded and calculated during a period of the IP’s activity
  • Feature - an individual measurable property of a session
  • Incident - a set of sessions recorded during a time interval
  • Attack - a subset of sessions in an incident which was labeled as an attack
  • Botnet - a list of IPs that participated in similar attacks
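
The definitions above map naturally onto a handful of record types. The following is a minimal sketch of one possible data model, not BotHound's actual schema: the class layout, the field names, and the feature names `request_rate` and `error_ratio` are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Session:
    """An IP plus the feature values computed over one period of its activity."""
    ip: str
    features: dict[str, float]  # feature name -> value (names are illustrative)


@dataclass
class Incident:
    """All sessions recorded during one time interval."""
    start: datetime
    stop: datetime
    sessions: list[Session] = field(default_factory=list)


@dataclass
class Attack:
    """The subset of an incident's sessions that was labeled as an attack."""
    attack_id: int
    incident: Incident
    sessions: list[Session] = field(default_factory=list)

    def ips(self) -> set[str]:
        return {s.ip for s in self.sessions}


@dataclass
class Botnet:
    """The IPs that participated in a group of similar attacks."""
    attacks: list[Attack] = field(default_factory=list)

    def ips(self) -> set[str]:
        return set().union(*(a.ips() for a in self.attacks))


# Usage with made-up values (documentation-range IP addresses).
incident = Incident(datetime(2016, 1, 1, 12, 0), datetime(2016, 1, 1, 12, 30))
incident.sessions = [
    Session("198.51.100.7", {"request_rate": 42.0, "error_ratio": 0.9}),
    Session("203.0.113.5", {"request_rate": 0.3, "error_ratio": 0.0}),
]
attack = Attack(attack_id=1, incident=incident, sessions=incident.sessions[:1])
botnet = Botnet(attacks=[attack])
```

Here the botnet's IP list is derived from its attacks rather than stored separately, which keeps the two from drifting apart; a persistent store may well choose differently.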

Modules

Bothound scheme

  • Anomaly Detector - monitors traffic for anomalies and triggers incident recording (gathering live information from the Deflect network) when one occurs. In BotHound this role is filled by the anomaly-detection tool Grey Memory, which currently monitors only one channel: the percentage of failed HTTP requests.
  • Session Computer - requests detailed logs from Elasticsearch and Sniffles for each incident, creates sessions, and calculates session features (inherited from Learn2ban). Feature selection is extremely important and has a huge impact on clustering. Currently, feature selection is not automated; it is performed manually as part of the clustering process.
  • Clustering & Attack Detection - a semi-supervised tool for labeling a subset of incident sessions as an attack. Clustering currently involves human interaction: the user visually examines a 10-dimensional feature space for every incident and selects the features to cluster on. Then, using different clustering methods (DBSCAN and K-means), the user visually chooses groups of sessions (clusters) and labels them with a manually chosen attack ID. We use an approach we call “double clustering”: DBSCAN in a first iteration, then K-means in a second iteration on the suspected clusters from the first.
  • Botnet Database - a database of all confirmed attacks and botnets with all the corresponding metrics (IPs, features, history, analytics, comments).
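
The “double clustering” idea from the Clustering & Attack Detection module can be sketched in a few lines. This is an illustration under stated assumptions, not BotHound's implementation: the two algorithms are written out in plain Python to keep the example self-contained, the feature vectors are made-up 2-dimensional points, and the `suspect` argument stands in for the cluster a human analyst would pick by eye after the first pass.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may turn into a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbours(j)
            if len(nearby) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nearby)
    return labels


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm with deterministic farthest-point seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels


def double_cluster(points, eps, min_pts, suspect, k):
    """First pass: DBSCAN over all sessions. Second pass: K-means inside
    the DBSCAN cluster chosen as suspect. Returns the first-pass labels and
    a {point index: sub-cluster} mapping for the suspect cluster."""
    first = dbscan(points, eps, min_pts)
    idx = [i for i, label in enumerate(first) if label == suspect]
    second = kmeans([points[i] for i in idx], k)
    return first, dict(zip(idx, second))


# Made-up feature vectors: regular visitors, one dense suspect group that
# hides two sub-groups, and a single outlier.
sessions = [
    (0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2),
    (5.0, 5.0), (5.2, 5.0), (5.0, 5.2),
    (5.9, 5.0), (6.1, 5.0), (6.0, 5.2),
    (10.0, 0.0),
]
first, second = double_cluster(sessions, eps=1.0, min_pts=3, suspect=1, k=2)
# first  == [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, -1]
# second == {4: 0, 5: 0, 6: 0, 7: 1, 8: 1, 9: 1}
```

DBSCAN needs no cluster count and sets outliers aside as noise, which suits a first rough pass; K-means then forces a split of a group that DBSCAN treats as a single density-connected cluster.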