Baskerville

Baskerville is a complete pipeline that receives incoming web logs as input, either from a Kafka topic, from a locally saved raw log file, or from log files saved to an Elasticsearch instance. It processes these logs in batches, forming request sets by grouping them by requested host and requesting IP. It then extracts features for these request sets and predicts whether they are malicious or benign using a model trained offline on previously observed and labelled data. Baskerville saves all the data and results to a Postgres database, and publishes metrics on its processing (e.g. number of logs processed, percentage predicted malicious, percentage predicted benign, processing speed, etc.) that can be consumed by Prometheus and visualised in a Grafana dashboard.
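The batching step above can be sketched in a few lines. The snippet below is a simplified illustration, not Baskerville's actual implementation (which runs on Spark with a much richer feature set): log records are grouped into request sets keyed by (host, IP), and toy features are computed per set. All record fields and feature names here are hypothetical.

```python
from collections import defaultdict

# Hypothetical, simplified log records; the real pipeline consumes web logs
# from Kafka, raw files, or Elasticsearch.
logs = [
    {"host": "example.com", "ip": "10.0.0.1", "path": "/a", "status": 200},
    {"host": "example.com", "ip": "10.0.0.1", "path": "/b", "status": 404},
    {"host": "example.com", "ip": "10.0.0.2", "path": "/a", "status": 200},
]

# Group logs into request sets keyed by (requested host, requesting IP).
request_sets = defaultdict(list)
for log in logs:
    request_sets[(log["host"], log["ip"])].append(log)

# Extract toy features per request set (illustrative only; the real
# feature set includes e.g. request-interval statistics).
def extract_features(requests):
    n = len(requests)
    return {
        "num_requests": n,
        "error_rate": sum(r["status"] >= 400 for r in requests) / n,
        "unique_paths": len({r["path"] for r in requests}),
    }

features = {key: extract_features(reqs) for key, reqs in request_sets.items()}
```

Each feature vector would then be passed to the offline-trained model for a malicious/benign prediction.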

Baskerville Schematic

As well as an engine that consumes and processes web logs, a set of offline analysis tools has been developed for use in conjunction with Baskerville. These tools may be accessed directly, or via two Jupyter notebooks, which walk the user through the machine learning tools and the investigation tools, respectively. The machine learning notebook comprises tools for training, evaluating, and updating the model used in the Baskerville engine. The investigations notebook comprises tools for processing, analysing, and visualising past attacks, for reporting purposes.

Baskerville Pipeline

A brief overview of the current state of the Baskerville project is here, and the full in-depth documentation is available here.

You can also download some presentation slides here.