DeepLog is a deep neural network model built using LSTM (Long Short-Term Memory) that processes system logs as natural language sequences. It's designed to automatically learn normal operational patterns and detect anomalies when new log entries deviate from these established patterns.

JacobHess03/DeepLog

Thesis Work: Anomaly Detection in System Logs using Deep Learning

This project implements DeepLog, a deep neural network model based on LSTM (Long Short-Term Memory) for anomaly detection in system logs. The DeepLog approach treats system logs as natural language sequences, automatically learning patterns during normal operation and flagging anomalies when new logs deviate from these learned patterns.

Description

System logs are a fundamental resource for debugging and monitoring the performance of a computer system. In this work, DeepLog:

Learns log patterns under normal conditions.
Detects anomalies when logs don't follow the learned patterns.
Allows for incremental and online model updates to adapt to emerging new patterns.
Generates workflows from logs to effectively diagnose anomalies and analyze their causes.
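
The detection step listed above can be sketched in a few lines. This is a minimal illustration, not the repository's code: it assumes the model outputs one probability per log key and that an event counts as normal when it appears among the most probable candidates, which is how the `--num-candidates` parameter is assumed to be used here. The function names `top_candidates` and `is_anomalous` are hypothetical.

```python
from typing import List, Sequence


def top_candidates(probabilities: Sequence[float], num_candidates: int) -> List[int]:
    """Return the indices (log keys) with the highest predicted probability."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda key: probabilities[key],
        reverse=True,
    )
    return ranked[:num_candidates]


def is_anomalous(
    probabilities: Sequence[float], observed_key: int, num_candidates: int
) -> bool:
    """Flag the observed log key if it is not among the top candidates."""
    return observed_key not in top_candidates(probabilities, num_candidates)


# Hypothetical model output over four log keys (illustrative values only).
probs = [0.05, 0.60, 0.25, 0.10]
print(is_anomalous(probs, observed_key=2, num_candidates=2))  # False
print(is_anomalous(probs, observed_key=0, num_candidates=2))  # True
```

A larger candidate count makes the detector more tolerant (fewer false alarms, more missed anomalies), so it is usually tuned on held-out normal logs.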

The model is implemented in PyTorch and includes functionalities for:

Loading and preprocessing logs (from local files or an S3 bucket).
Training the LSTM model in a distributed manner (supports multi-GPU/multi-node training).
Saving and loading the trained model.
Serializing input and output for deployment, for example on Amazon SageMaker.
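
As an illustration of the serialization step, the sketch below shows what a JSON-based `input_fn` and `output_fn` pair might look like. The function names follow the SageMaker PyTorch serving convention, but the payload shape (a `"line"` field holding a list of log keys) and the response fields are assumptions made for this example, not the repository's actual format.

```python
import json

JSON_CONTENT_TYPE = "application/json"


def input_fn(request_body, request_content_type=JSON_CONTENT_TYPE):
    """Deserialize a JSON request into a list of log keys.

    Assumed payload shape (hypothetical): {"line": [5, 5, 11, 9]}
    """
    if request_content_type != JSON_CONTENT_TYPE:
        raise ValueError(f"Unsupported content type: {request_content_type}")
    payload = json.loads(request_body)
    return payload["line"]


def output_fn(prediction, accept=JSON_CONTENT_TYPE):
    """Serialize the prediction result back to JSON."""
    if accept != JSON_CONTENT_TYPE:
        raise ValueError(f"Unsupported accept type: {accept}")
    return json.dumps(prediction)


keys = input_fn('{"line": [5, 5, 11, 9]}')
print(keys)
print(output_fn({"anomaly": False, "checked_events": len(keys)}))
```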

Project Structure

Imports and Configuration: Handles logging and sets the logging level for debugging.
Model Definition (class Model): Implements an LSTM model with linear layers for detecting anomalous events in logs.
Generate Class: Responsible for loading logs, which can be read from local files or an AWS S3 bucket.
    generate(): Converts log sequences into datasets suitable for training.
    init_line() and readline(): Manage log access depending on the execution mode.
Data Loader and Auxiliary Functions:
    _get_train_data_loader(): Prepares the DataLoader for training.
    _average_gradients(): Auxiliary function for distributed training.
Training and Saving Functions:
    train(): Main function to train the model.
    save_model(): Saves the model and necessary information for restoration.
    model_fn(): Function for loading the model during inference.
Inference Functions:
    input_fn(): Deserializes input data.
    predict_fn(): Predicts the next event and determines if it's anomalous.
    output_fn(): Serializes the prediction output.
Main Script: Uses argparse to define parameters (batch size, number of epochs, window size, etc.) and starts the model training.
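
To make the role of the window size concrete, here is a minimal sketch of how a sequence of log keys could be turned into training pairs, in the spirit of `generate()`. It assumes each sample is a window of consecutive keys and the label is the key that follows it; the helper name `make_windows` is hypothetical and the repository's own implementation may differ.

```python
from typing import List, Sequence, Tuple


def make_windows(
    keys: Sequence[int], window_size: int
) -> Tuple[List[List[int]], List[int]]:
    """Slide a fixed-size window over the keys; the label is the next key."""
    inputs: List[List[int]] = []
    labels: List[int] = []
    for start in range(len(keys) - window_size):
        inputs.append(list(keys[start:start + window_size]))
        labels.append(keys[start + window_size])
    return inputs, labels


# One parsed log session, represented as log-key IDs (illustrative values).
inputs, labels = make_windows([5, 5, 11, 9, 26], window_size=3)
print(inputs)  # [[5, 5, 11], [5, 11, 9]]
print(labels)  # [9, 26]
```

Sessions shorter than or equal to the window size yield no samples under this scheme, so very short sessions would need padding or a smaller window.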

Requirements

Python 3.7+
PyTorch
boto3 (if using logs from AWS S3)
argparse and other standard-library Python modules

Execution Instructions

Configure the environment:
Ensure all dependencies are installed and that, if using logs from AWS S3, credentials are correctly configured.

Start training:
Execute the main script, passing the desired parameters. For example:

python deep_log.py --batch-size 64 --epochs 50 --window-size 10 --input-size 1 --hidden-size 64 --num-layers 2 --num-classes <NUM_CLASSES> --num-candidates <NUM_CANDIDATES> --local True
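
For reference, a parser accepting the flags in the command above might look like the following sketch. The flag names come from the example command; the defaults, the required flags, and the `str_to_bool` helper are assumptions for illustration and not the repository's actual parser. The helper is included because `type=bool` in argparse treats any non-empty string, including "False", as true.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse 'True'/'False'-style strings explicitly."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"Expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Train a DeepLog-style LSTM")
    # Defaults mirror the example command; they are illustrative assumptions.
    parser.add_argument("--batch-size", type=int, default=64)
    parser.add_argument("--epochs", type=int, default=50)
    parser.add_argument("--window-size", type=int, default=10)
    parser.add_argument("--input-size", type=int, default=1)
    parser.add_argument("--hidden-size", type=int, default=64)
    parser.add_argument("--num-layers", type=int, default=2)
    parser.add_argument("--num-classes", type=int, required=True)
    parser.add_argument("--num-candidates", type=int, required=True)
    parser.add_argument("--local", type=str_to_bool, default=True)
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(
    ["--num-classes", "28", "--num-candidates", "9", "--local", "False"]
)
print(args.window_size, args.num_candidates, args.local)
```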

Note: This thesis work involved studying, modifying, and adapting the authors' original model, then evaluating the results of these changes.

MIT License

Copyright (c) 2019 Yifan Wu

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
