GitHub - JacobHess03/DeepLog: DeepLog is a deep neural network model built using LSTM (Long Short-Term Memory) that processes system logs as natural language sequences. It's designed to automatically learn normal operational patterns and detect anomalies when new log entries deviate from these established patterns.

Thesis Work: Anomaly Detection in System Logs using Deep Learning

This project implements DeepLog, a deep neural network model based on LSTM (Long Short-Term Memory) for anomaly detection in system logs. The DeepLog approach treats system logs as natural language sequences, automatically learning patterns during normal operation and flagging anomalies when new logs deviate from these learned patterns.

Description

System logs are a fundamental resource for debugging and monitoring the performance of a computer system. In this work, DeepLog:

Learns log patterns under normal conditions.
Detects anomalies when logs don't follow the learned patterns.
Allows for incremental and online model updates to adapt to emerging new patterns.
Generates workflows from logs to effectively diagnose anomalies and analyze their causes.
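
As a rough illustration of the detection step, the sketch below assumes the rule described in the original DeepLog paper: an observed log key counts as normal if it is among the most probable next keys predicted by the model, and as anomalous otherwise. The function name `is_anomalous` and the example probabilities are hypothetical and are not taken from this repository.

```python
from typing import Sequence


def is_anomalous(next_key_probs: Sequence[float], observed_key: int, num_candidates: int) -> bool:
    """Return True if observed_key is not among the num_candidates most probable keys."""
    # Rank log-key indices from most to least probable.
    ranked = sorted(range(len(next_key_probs)), key=lambda k: next_key_probs[k], reverse=True)
    return observed_key not in ranked[:num_candidates]


# Hypothetical distribution over four log keys, as a trained model might output.
probs = [0.05, 0.60, 0.30, 0.05]
print(is_anomalous(probs, observed_key=2, num_candidates=2))  # False: key 2 is in the top 2
print(is_anomalous(probs, observed_key=0, num_candidates=2))  # True: key 0 is not in the top 2
```

A larger number of candidates makes the detector more tolerant, at the cost of missing subtler deviations.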

The model is implemented in PyTorch and includes functionalities for:

Loading and preprocessing logs (from local files or an S3 bucket).
Training the LSTM model in a distributed manner (supports multi-GPU/multi-node training).
Saving and loading the trained model.
Serializing input and output for deployment, for example on Amazon SageMaker.
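
To make the serialization step more concrete, here is a minimal sketch of a pair of handlers that follow the `input_fn(request_body, request_content_type)` and `output_fn(prediction, accept)` convention used by SageMaker PyTorch serving. The JSON payload shape (a `"line"` field holding integer log keys) and the fields of the returned dictionary are assumptions made for illustration; the repository's actual format may differ.

```python
import json


def input_fn(request_body, request_content_type):
    """Deserialize a JSON request into a list of integer log keys."""
    if request_content_type != "application/json":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    payload = json.loads(request_body)
    # Assumed payload shape: {"line": [5, 5, 22, 11, ...]}
    return [int(key) for key in payload["line"]]


def output_fn(prediction, accept):
    """Serialize the prediction dictionary as a JSON string."""
    if accept != "application/json":
        raise ValueError(f"Unsupported accept type: {accept}")
    return json.dumps(prediction)


keys = input_fn('{"line": [5, 5, 22, 11]}', "application/json")
print(keys)  # [5, 5, 22, 11]
print(output_fn({"anomaly": False, "observed_key": 11}, "application/json"))
```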

Project Structure

Imports and Configuration: Handles logging and sets the logging level for debugging.
Model Definition (class Model): Implements an LSTM model with linear layers for detecting anomalous events in logs.
Generate Class: Responsible for loading logs, which can be read from local files or an AWS S3 bucket.
    generate(): Converts log sequences into datasets suitable for training.
    init_line() and readline(): Manage log access depending on the execution mode.
Data Loader and Auxiliary Functions:
    _get_train_data_loader(): Prepares the DataLoader for training.
    _average_gradients(): Auxiliary function for distributed training.
Training and Saving Functions:
    train(): Main function to train the model.
    save_model(): Saves the model and necessary information for restoration.
    model_fn(): Function for loading the model during inference.
Inference Functions:
    input_fn(): Deserializes input data.
    predict_fn(): Predicts the next event and determines if it's anomalous.
    output_fn(): Serializes the prediction output.
Main Script: Uses argparse to define parameters (batch size, number of epochs, window size, etc.) and starts the model training.
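
For orientation, the model described above could look roughly like the following sketch: an LSTM followed by a linear layer that scores every possible next log key. This is an assumption based on the description in this README, not the repository's exact code, and the value `num_classes=28` is an arbitrary placeholder.

```python
import torch
import torch.nn as nn


class Model(nn.Module):
    """Sketch of an LSTM next-log-key classifier (not the repository's exact code)."""

    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x has shape (batch, window_size, input_size).
        out, _ = self.lstm(x)
        # Use the output at the last time step to score every log key.
        return self.fc(out[:, -1, :])


model = Model(input_size=1, hidden_size=64, num_layers=2, num_classes=28)
windows = torch.randn(4, 10, 1)  # a batch of 4 windows, each with 10 log keys
logits = model(windows)
print(logits.shape)  # torch.Size([4, 28])
```

The output has one score per log key for each window, so it can be trained with a standard cross-entropy loss against the key that actually followed the window.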

Requirements

Python 3.7+
PyTorch
boto3 (if using logs from AWS S3)
argparse and other modules from the Python standard library

Execution Instructions

Configure the environment:
Ensure all dependencies are installed and that, if using logs from AWS S3, credentials are correctly configured.

Start training:
Execute the main script, passing the desired parameters. For example:

python deep_log.py --batch-size 64 --epochs 50 --window-size 10 --input-size 1 --hidden-size 64 --num-layers 2 --num-classes <NUM_CLASSES> --num-candidates <NUM_CANDIDATES> --local True
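
To show what the window size controls, the sketch below splits one session of log keys into training pairs: each input is a window of consecutive keys and the label is the key that follows it. The helper name `make_windows` and the sample session are hypothetical; the repository's `generate()` function may organize the data differently.

```python
def make_windows(keys, window_size):
    """Split one session of log keys into (input window, next key) pairs."""
    pairs = []
    for i in range(len(keys) - window_size):
        pairs.append((keys[i:i + window_size], keys[i + window_size]))
    return pairs


session = [5, 5, 22, 11, 9, 11]
for window, label in make_windows(session, window_size=3):
    print(window, "->", label)
# [5, 5, 22] -> 11
# [5, 22, 11] -> 9
# [22, 11, 9] -> 11
```

Sessions no longer than the window size yield no training pairs in this sketch.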

Note: This thesis work involved studying, modifying, and adapting the authors' original model, then evaluating the results of these changes.

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
