Learn2ban

Overview

The goal of this project is to introduce intelligent, adaptive features to the Fail2Ban system. While regular expressions work to frustrate most brute-force attacks, they are still vulnerable to even minor changes in the pattern or content of requests sent during a Denial of Service attack. By simply rotating the User Agent field or tuning the timing of attacks, assailants are able to work around the traditional Fail2Ban system.

Using Machine Learning techniques, the system can make decisions about whether or not a given request is spurious. Each decision draws on a range of information about the request, which increases the number of dimensions that can be considered at once. It is not necessary to simply match a recurring pattern; rather, based on a broad range of example data, Learn2Ban is capable of highlighting and capturing significant patterns of behaviour. These patterns act as fingerprints which identify a DDoS attack.

Machine Learning in brief

In the case of Learn2Ban, our use of the term Machine Learning refers to a process of Classification. For any given set of data, we want to determine algorithmically which pieces of information represent genuine requests to our servers and which are malicious attempts to damage the system. Machine Learning works by training the system to correctly Classify requests, based on presenting it with the sample data of past DDoS attacks. The greater the number and variety of sample data presented, the broader the capability of the model will be. In other words, the more types of DDoS attacks the system sees, the more accurately it will be able to deflect them.

To this end, eQualit.ie has collected a wide range of example logs that detail the many different narratives of DDoS attacks. These logs are annotated, initially using Fail2Ban filters, to distinguish between GENUINE and MALICIOUS requests. The annotated logs are then presented to the Learn2Ban system, and a model is constructed and tested against attacks it has never seen before. The model is measured for initial accuracy, and then multiple experiments are run to refine it until it is regarded as no longer improvable. The goal is to create a model that is very good at catching specific attacks while still being able to recognise new patterns when they first arise.
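
The annotation step described above can be sketched roughly as follows. This is a minimal illustration only, not the project's actual annotation code: the regular expression, the log lines and the `annotate` helper are all made up for the example, and real Fail2Ban filters are defined in Fail2Ban's own configuration rather than inline like this.

```python
import re

# Hypothetical filter (an assumption for this sketch): flag any request
# whose final quoted field, the User Agent, is empty ("-").
FAILREGEX = re.compile(r'^(?P<host>\S+) .* "-"$')


def annotate(log_lines, pattern=FAILREGEX):
    """Pair each log line with MALICIOUS if the pattern matches, else GENUINE."""
    return [
        (line, "MALICIOUS" if pattern.search(line) else "GENUINE")
        for line in log_lines
    ]


# Made-up access log lines, using documentation-only IP addresses.
sample = [
    '203.0.113.5 - - [01/Jan/2013:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "-"',
    '198.51.100.7 - - [01/Jan/2013:00:00:02 +0000] "GET /about HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

labels = [label for _, label in annotate(sample)]
print(labels)
```

Labels produced this way are only a starting point: they inherit the blind spots of the regular expression that generated them, which is presumably why the annotation is described as being done "initially" with Fail2Ban filters.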

Learn2Ban

Learn2Ban has been designed to work as an additional filter for the Fail2Ban system, building on the existing protocols for blacklisting, banning and blocking attacks. This new extension of its capabilities does not affect the standard operability for Systems Administrators and Operations Systems. Once a suspect IP address has been identified, the predefined rules for (temporarily or permanently) banning the IP are enforced through the Fail2Ban system. Learn2Ban has the added advantage of reducing administration overheads, since it is no longer necessary for Admins to notice attacks - either through system alerts or, in a worst-case scenario, through systems going offline. Instead, Learn2Ban proactively detects attacks and responds according to the patterns it observes.

Implementation Details

At present, Learn2Ban uses the Support Vector Machine classification algorithm; however, we have designed the system so that it is completely agnostic of which classifier it uses. It can use any classifier supported by the scikit-learn (sklearn) Python ML toolkit. This would include Neural Networks, Naive Bayes classifiers, Decision Tree learning and more.
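
As a rough illustration of what classifier-agnostic means in practice, the sketch below trains three different scikit-learn classifiers through the same `fit`/`predict` interface. It is not Learn2Ban's own code: the `train_and_predict` helper, the two-column feature layout and all of the numbers are assumptions invented for the example.

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, made-up rows: [average request interval in seconds,
# number of distinct User Agents seen for the IP].
X_train = [
    [5.0, 1], [8.0, 1], [12.0, 2], [6.5, 1],     # labelled genuine (0)
    [0.1, 9], [0.2, 12], [0.05, 20], [0.3, 7],   # labelled malicious (1)
]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]

# Two unseen rows to classify.
X_new = [[7.0, 1], [0.1, 15]]


def train_and_predict(classifier, X_train, y_train, X_new):
    """Fit any scikit-learn style classifier and return its predictions as ints."""
    classifier.fit(X_train, y_train)
    return [int(label) for label in classifier.predict(X_new)]


# The same helper works unchanged for each classifier type.
results = {}
for clf in (SVC(kernel="linear"), GaussianNB(), DecisionTreeClassifier(random_state=0)):
    results[type(clf).__name__] = train_and_predict(clf, X_train, y_train, X_new)

print(results)
```

Because every scikit-learn classifier exposes the same two methods, swapping the algorithm only requires changing the object passed in; the surrounding training and prediction code stays the same.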

To ensure that the Learn2Ban tool searches for the most relevant patterns, we have drawn on several academic and industry papers to identify the key features generally exhibited by a Distributed Denial of Service attack, specifically at the HTTP level. The Learn2Ban tool collects these features from all available log files, and from them we can construct a model which represents the general characteristics of a DDoS attack at the HTTP layer.

The primary features which Learn2Ban examines are, at present:

  • Average request interval - this feature essentially captures the rate of requests. An abnormally high rate would be a strong indication that the associated IP is performing an attack.
  • User Agent cycling - in order to frustrate traditional regex or pattern-based systems, such as Fail2Ban, attackers will generally rotate the User Agent they send with each request. This has the effect of limiting the efficacy of a hard-coded regex. By considering this behaviour in the context of the associated IP’s overall behaviour, Learn2Ban aims to see through this obfuscation.
  • Page depth request - human users will generally browse through a site in ways which lead to a deeper link exploration than an automated bot is inclined to do. Thus if an IP is generally only requesting certain front-end pages rather than digging into the site, this feature will highlight that behaviour.
  • HTML to Image/CSS ratio - again, unlike a human user, bots will not usually cause requests to load additional content such as images or CSS, since their requests are generally narrowly targeted and not made through a browser.
  • Variance of request interval - in this case, bots are again unmasked by their fundamental behaviour. Unless the attacker has the foresight to build in an algorithmic random-wait interval between requests, the request interval variance will once again expose the attack. It is quite difficult to model the stochastic behaviour of a human user, so given a large enough sample set, a pattern will generally emerge in the behaviour of a bot.
  • Payload size - if an IP is consistently requesting large files - such as PDFs - this is a good indication that it is an intentional attack to waste server resources.
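
To make the features listed above more concrete, the sketch below computes one plausible version of each from the requests of a single IP. It is an illustration under stated assumptions rather than Learn2Ban's actual feature code: the `Request` record, the `extract_features` function, the asset suffix list and the exact definition chosen for each feature (for example, page depth as the number of path segments) are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pvariance


@dataclass
class Request:
    timestamp: float  # seconds since the epoch
    path: str         # requested path, assumed to carry no query string
    user_agent: str
    size: int         # response size in bytes


# Assumed set of suffixes treated as image/CSS assets.
ASSET_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".css")


def path_depth(path):
    """Number of non-empty path segments: '/' is 0, '/a/b' is 2."""
    return len([segment for segment in path.split("/") if segment])


def extract_features(requests):
    """Return a feature dict for one IP; expects at least one request."""
    ordered = sorted(requests, key=lambda r: r.timestamp)
    intervals = [b.timestamp - a.timestamp for a, b in zip(ordered, ordered[1:])]
    assets = sum(1 for r in ordered if r.path.lower().endswith(ASSET_SUFFIXES))
    pages = len(ordered) - assets
    return {
        "avg_interval": mean(intervals) if intervals else 0.0,
        "interval_variance": pvariance(intervals) if intervals else 0.0,
        "distinct_user_agents": len({r.user_agent for r in ordered}),
        "avg_page_depth": mean(path_depth(r.path) for r in ordered),
        # Divide by 1 when no assets were requested, to avoid dividing by zero.
        "html_to_asset_ratio": pages / max(assets, 1),
        "avg_payload_size": mean(r.size for r in ordered),
    }


# Made-up bot-like traffic: one request per second to the front page,
# with a different User Agent each time and no image or CSS requests.
bot_requests = [
    Request(timestamp=float(i), path="/", user_agent=f"agent-{i}", size=100)
    for i in range(4)
]
bot_features = extract_features(bot_requests)
print(bot_features)
```

In this made-up example the interval variance is zero and every request carries a new User Agent, which is the kind of combined signal the features are meant to expose. A dictionary like this, one per IP, could then be turned into a numeric row for whichever classifier is in use.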

These features work together to build up a pattern of behaviour that attempts to distinguish between legitimate website users and algorithmic bots designed to bring down a site or a server. To a certain degree, this acts like a reverse Turing test. DDoS attacks cannot generally hide their fundamental behaviour, relying instead on sheer volume of requests to achieve their goals. Thus by utilising the pattern recognition and Machine Learning capabilities of Learn2Ban, it is possible to further weaken the efficacy of most DDoS attacks. This method of handling DDoS attacks is also evolutionary, insofar as whenever the attacks adapt to a new degree of sophistication, the model adapts in turn to frustrate them.

Future Work

Some of the next steps we would like to develop for this project are:

  • new models based on more diverse examples of DDoS attacks
  • an extension to the set of features we have developed to cover more variations in attack
  • a test for new classifiers
  • a way to facilitate communication between instances of Learn2Ban, allowing data on current attacks to spread to other vulnerable nodes