13 changes: 13 additions & 0 deletions .flake8
@@ -0,0 +1,13 @@
[flake8]
max-line-length = 120
extend-ignore = E203, W503
exclude =
    .git,
    __pycache__,
    build,
    dist,
    .venv,
    .mypy_cache,
    .pytest_cache
max-complexity = 15
select = B,C,E,F,W,T4,B9
58 changes: 58 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,58 @@
name: LogFlow CI

on:
  push:
    branches: [ main, dev/main ]
  pull_request:
    branches: [ main ]

jobs:
  quality-checks:
    name: Linting & Type Checking
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python 3.12
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: 'pip'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"
      - name: Formatting (Isort & Black)
        run: |
          isort --check-only .
          black --check .
      - name: Linting (Flake8)
        run: |
          flake8 .
      - name: Type Checking (Mypy)
        run: |
          mypy .

  unit-tests:
    name: Unit Tests & Coverage
    needs: quality-checks
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python 3.12
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: 'pip'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"
      - name: Run Tests
        run: |
          pytest tests --junitxml=test-report.xml --cov=logflow --cov-report=xml --cov-report=term
      - name: Upload Test Results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: test-results
          path: test-report.xml
137 changes: 137 additions & 0 deletions .gitignore
@@ -0,0 +1,137 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script, and contain python code for
# the collector to be able to find dependencies.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the user is
# responsible for their own environments.
# .python-version

# pipenv
# According to pypa/pipenv#1181, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if you are working with a library, you should perhaps
# include Pipfile in your version control system.
# Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is recommended to include poetry.lock in version control.
# poetry.lock

# pdm
# Similar to Pipfile.lock, it is recommended to include pdm.lock in version control.
# pdm.lock

# runtime-configuration-files
# Ignore these files for now since they are project-specific.
# .env
# .venv
# venv/
# ENV/
# env.bak/
# venv.bak/

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
.idea/

# VS Code
.vscode/

# Project-specific ignores
.venv/
demo_logs/
experiment_logs/
logs/
30 changes: 30 additions & 0 deletions FUTURE_DEVELOPMENT.md
@@ -0,0 +1,30 @@
# LogFlow: Future Development & Roadmap

This document outlines the strategic enhancements and next-level features for **LogFlow**, intended to solidify its position as the premier logging solution for High-Performance Computing (HPC) and Machine Learning (ML).

---

## 1. JSON Structured Logging (Observability)
**Goal:** Make logs machine-readable for modern observability stacks (ELK, Splunk, Grafana Loki).
- **Implementation:** Add a `serialize=True` option to file sinks.
- **Benefit:** Allows for easy parsing, filtering, and aggregation of distributed training logs in centralized dashboards.
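A Loguru-style `serialize=True` sink typically emits one JSON object per line. As a hedged sketch of what such a record could look like (the field names and the `serialize_record` helper are illustrative, not LogFlow's actual schema):

```python
import json
from datetime import datetime, timezone

def serialize_record(level: str, message: str, **extra) -> str:
    """Render one log record as a single JSON line (illustrative schema)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
        **extra,  # e.g. rank, epoch, step bound by the caller
    }
    return json.dumps(record)

line = serialize_record("INFO", "epoch finished", epoch=3, loss=0.042)
```

One-object-per-line output is what makes ELK, Splunk, and Loki ingestion trivial: each line parses independently, so aggregators never need multi-line stitching.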

## 2. Automatic Experiment Context (ML Lifecycle)
**Goal:** Automatically inject ML-specific metadata into every log record.
- **Implementation:** Create a context manager/provider for `epoch`, `step`, and `experiment_id`.
- **Benefit:** Eliminates the need for manual `logger.bind` in every function; logs automatically carry their training context.
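One way such a provider could work is with `contextvars`, which nests and unwinds cleanly and stays correct across async code. A minimal sketch (the `experiment_context` and `format_line` names are hypothetical, not LogFlow's API):

```python
import contextvars
from contextlib import contextmanager
from typing import Iterator

# Hypothetical context store; LogFlow's real provider may differ.
_ml_context: contextvars.ContextVar = contextvars.ContextVar("ml_context", default={})

@contextmanager
def experiment_context(**fields) -> Iterator[None]:
    """Push ML metadata (experiment_id, epoch, step, ...) for a nested scope."""
    token = _ml_context.set({**_ml_context.get(), **fields})
    try:
        yield
    finally:
        _ml_context.reset(token)  # restore the outer scope's metadata

def format_line(message: str) -> str:
    """Prefix a message with whatever context is currently active."""
    ctx = _ml_context.get()
    prefix = " ".join(f"{k}={v}" for k, v in ctx.items())
    return f"[{prefix}] {message}" if prefix else message

with experiment_context(experiment_id="exp-42", epoch=1):
    with experiment_context(step=100):
        line = format_line("loss=0.5")
```

Nested scopes merge rather than replace, so a training step only declares `step` and inherits `experiment_id` and `epoch` from the enclosing loop.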

## 3. Rich Framework Interoperability
**Goal:** Deep integration with specialized ML frameworks.
- **Implementation:** Specialized adapters for TensorFlow (`absl`), PyTorch Lightning, and JAX.
- **Benefit:** Preserves framework-specific metadata (component names, internal timestamps) while maintaining a unified UI.

## 4. Dynamic Reconfiguration (Runtime Control)
**Goal:** Adjust log levels without restarting long-running training jobs.
- **Implementation:** Use Unix signals (e.g., SIGHUP) or a file watcher to reload configuration.
- **Benefit:** Allows developers to increase verbosity (e.g., INFO -> DEBUG) to diagnose a mid-training anomaly on the fly.
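A minimal sketch of the signal-based variant, assuming a POSIX host (`SIGHUP` does not exist on Windows) and using a placeholder dict in place of the logger's real configuration:

```python
import os
import signal

LOG_LEVEL = {"value": "INFO"}  # stand-in for the logger's live configuration

def _reload_config(signum, frame):
    # A real implementation would re-read a config file here;
    # this placeholder just raises verbosity.
    LOG_LEVEL["value"] = "DEBUG"

signal.signal(signal.SIGHUP, _reload_config)

# An operator would run `kill -HUP <pid>` against the training job;
# we simulate that by signalling our own process.
os.kill(os.getpid(), signal.SIGHUP)
```

Because the handler only mutates in-process state, the training loop never pauses: the level change takes effect at the next log call.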

## 5. Performance Optimization (Zero-Copy)
**Goal:** Further reduce the impact of logging on the "Critical Path" of training.
- **Implementation:** Explore zero-copy serialization or specialized background threads for high-volume metric logging.
- **Benefit:** Ensures that logging overhead never impacts GPU utilization or training throughput.
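The background-sink idea can be sketched with a thread and a queue (threads shown here for simplicity; the zero-copy and multiprocess variants above would build on the same hand-off, and the `BackgroundSink` class is illustrative rather than LogFlow's implementation):

```python
import queue
import threading

class BackgroundSink:
    """Hand records to a worker thread so the training loop never blocks on I/O."""

    def __init__(self) -> None:
        self._queue: queue.Queue = queue.Queue()
        self.lines: list = []  # stand-in for a slow file or network sink
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def _drain(self) -> None:
        while True:
            record = self._queue.get()
            if record is None:  # sentinel: stop draining
                return
            self.lines.append(record)  # a real sink would write to storage here

    def log(self, message: str) -> None:
        self._queue.put(message)  # O(1) enqueue, no I/O on the caller's thread

    def close(self) -> None:
        self._queue.put(None)
        self._worker.join()  # flush everything before shutdown

sink = BackgroundSink()
for step in range(3):
    sink.log(f"step {step}")
sink.close()
```

The caller's cost is a single queue put; all latency from slow network storage lands on the worker, which is exactly the property that protects GPU utilization.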
87 changes: 87 additions & 0 deletions Jenkinsfile
@@ -0,0 +1,87 @@
pipeline {
    agent any

    environment {
        // Local virtual environment within the Jenkins workspace for portability
        VENV_PATH = "${WORKSPACE}/.venv"
        VENV_BIN = "${VENV_PATH}/bin"
    }

    stages {
        stage('Initialize') {
            steps {
                echo 'Creating Isolated Virtual Environment...'
                sh "python3 -m venv ${VENV_PATH}"

                echo 'Installing Dependencies in Editable Mode...'
                sh "${VENV_BIN}/pip install --upgrade pip"
                sh "${VENV_BIN}/pip install -e .[dev]"
            }
        }

        stage('Linting') {
            parallel {
                stage('Black') {
                    steps {
                        sh "${VENV_BIN}/black --check logflow tests examples"
                    }
                }
                stage('Isort') {
                    steps {
                        sh "${VENV_BIN}/isort --check-only logflow tests examples"
                    }
                }
                stage('Flake8') {
                    steps {
                        // Clean up previous reports
                        sh "rm -f flake8.txt flake8-report.xml"
                        // Use || true to prevent the stage from stopping before the report is generated
                        sh "${VENV_BIN}/flake8 logflow tests examples --tee --output-file=flake8.txt || true"
                        // Convert report to JUnit XML
                        sh "if [ -f flake8.txt ]; then ${VENV_BIN}/flake8_junit flake8.txt flake8-report.xml; fi"
                    }
                    post {
                        always {
                            // Archive the report if it was generated
                            junit allowEmptyResults: true, testResults: 'flake8-report.xml'
                        }
                    }
                }
            }
        }

        stage('Type Check') {
            steps {
                sh "${VENV_BIN}/mypy logflow tests examples"
            }
        }

        stage('Unit Tests') {
            steps {
                sh "${VENV_BIN}/pytest tests --junitxml=test-report.xml --cov=logflow --cov-report=xml:coverage.xml --cov-report=term"
            }
            post {
                always {
                    // Archive and display JUnit test results
                    junit allowEmptyResults: true, testResults: 'test-report.xml'

                    // Display Coverage in Jenkins UI using Code Coverage API Plugin
                    recordCoverage tools: [[parser: 'COBERTURA', pattern: 'coverage.xml']]
                }
            }
        }
    }

    post {
        always {
            echo 'LogFlow Pipeline Complete.'
        }
        success {
            echo 'Project is healthy and ready for publication.'
        }
        failure {
            echo 'Build failed. Please check linting or test failures.'
        }
    }
}
45 changes: 45 additions & 0 deletions RATIONALE.md
@@ -0,0 +1,45 @@
# LogFlow: Rationale & Architectural Comparison

## Executive Summary
**LogFlow** is a modern, multiprocess-safe logging library specifically engineered for High-Performance Computing (HPC) and Machine Learning (ML) environments. While general-purpose logging libraries exist, LogFlow bridges the gap between raw logging primitives and the specialized needs of distributed training (e.g., PyTorch DDP, TensorFlow Distribution).

---

## The Landscape: Existing Alternatives

| Library | Mechanism | ML/Distributed Suitability | Pros | Cons |
| :--- | :--- | :--- | :--- | :--- |
| **Standard `logging`** | Lock-based (Thread-safe) | **Low**. Requires complex `QueueHandler` setup for MP. | Zero dependencies, built-in. | Extremely verbose setup for MP; no built-in rank awareness. |
| **`Loguru`** | `enqueue=True` (Queue-based) | **Medium**. Great UI/UX, but no native rank/DDP logic. | Beautiful output, thread/MP safe, easy rotation. | Not aware of SLURM/DDP ranks; requires manual wrapping for ML. |
| **`Concurrent-Log-Handler`** | File Locking (`fcntl`/`flock`) | **Low**. Slow on network filesystems (NFS). | Simple drop-in for standard logging. | High latency; prone to stale-lock issues on some clusters. |
| **`Lightning/Accelerate`** | Framework-specific wrappers | **High** (but locked-in). | Automatic rank-0 filtering. | Tied to specific training frameworks; hard to use in standalone scripts. |

---

## Why LogFlow? (The Gap)

Existing solutions force ML engineers to choose between **ease of use** (Loguru) and **robust distributed logic** (Lightning). LogFlow provides both.

### 1. Unified Distributed Awareness
LogFlow automatically detects the execution environment (SLURM, TorchRun, MPI) and adjusts its behavior.
- **The "Log Storm" Problem:** In a 128-GPU cluster, standard loggers write 128 identical lines.
- **LogFlow Solution:** Filters console output to rank 0 while still letting every rank write to unique or shared persistent files with atomic safety.
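Rank detection itself usually reduces to environment-variable inspection. A hedged sketch of what such a check might look like (`RANK` is set by torchrun, `SLURM_PROCID` by SLURM, `OMPI_COMM_WORLD_RANK` by Open MPI; the `detect_rank`/`console_enabled` names are illustrative, not LogFlow's API):

```python
import os

def detect_rank() -> int:
    """Best-effort global-rank lookup across common launchers (illustrative)."""
    for var in ("RANK", "SLURM_PROCID", "OMPI_COMM_WORLD_RANK"):
        value = os.environ.get(var)
        if value is not None:
            return int(value)
    return 0  # single-process fallback: behave like rank 0

def console_enabled(rank: int) -> bool:
    """Only rank 0 talks to the console; the other ranks stay quiet there."""
    return rank == 0
```

With this shape, the same script logs normally when run standalone (no launcher variables set) and collapses to a single console stream on a 128-GPU job.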

### 2. Framework Interoperability
ML projects often use a mix of libraries (PyTorch, TensorFlow, JAX, HuggingFace). Each has its own logging style.
- **LogFlow Solution:** Automatically intercepts standard `logging`, `warnings`, and `absl` (TensorFlow) calls, redirecting them into a single, unified, and color-coded stream.
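The stdlib side of this interception is a small handler swap; a minimal sketch, assuming a list stands in for LogFlow's real sink (the `InterceptHandler` name is illustrative):

```python
import logging

captured = []  # stand-in for LogFlow's unified, color-coded stream

class InterceptHandler(logging.Handler):
    """Route every stdlib `logging` record into the unified sink."""
    def emit(self, record: logging.LogRecord) -> None:
        captured.append(f"{record.levelname} | {record.name} | {record.getMessage()}")

root = logging.getLogger()
root.handlers = [InterceptHandler()]  # replace any previously installed handlers
root.setLevel(logging.INFO)
logging.captureWarnings(True)         # `warnings.warn` now flows through logging too

# A third-party library logging normally is captured without any changes on its side:
logging.getLogger("third_party.lib").warning("deprecated call")
```

Because every stdlib logger propagates to the root by default, one handler at the root catches PyTorch, HuggingFace, and friends; `absl` needs its own adapter since it bypasses this path.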

### 3. Startup-Consistent Rotation
ML experiments are iterative.
- **The Problem:** Standard loggers append to old files or overwrite them, making it hard to find the start of "Experiment #42".
- **LogFlow Solution:** Implements **Startup Rotation**. Every time a script starts, the old log is archived with a timestamp, and a fresh log is created. This ensures 1:1 mapping between a run and a log file.
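The mechanism can be sketched in a few lines, assuming a timestamped-rename scheme (the `rotate_on_startup` helper and archive-name format are illustrative, not LogFlow's exact behavior):

```python
import os
import tempfile
import time

def rotate_on_startup(path: str) -> None:
    """Archive an existing log under its last-modified timestamp, so the
    caller can open a fresh file at `path` for this run."""
    if os.path.exists(path):
        stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(os.path.getmtime(path)))
        base, ext = os.path.splitext(path)
        os.replace(path, f"{base}.{stamp}{ext}")  # atomic rename on POSIX

# Demo: a leftover log from the previous run gets archived.
workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "train.log")
with open(log_path, "w") as fh:
    fh.write("old run\n")
rotate_on_startup(log_path)
archived = sorted(os.listdir(workdir))
```

After rotation, `train.log` is free for the new run and the old contents survive as `train.<timestamp>.log`, preserving the 1:1 run-to-file mapping.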

### 4. Zero-Latency "Enqueue"
By utilizing a dedicated background process for log sinking, LogFlow ensures that the main training loop (the "Critical Path") is never blocked by I/O operations, even when writing to slow network storage.

---

## Design Goals for Implementation
- **Developer First:** `from logflow import get_logger` should be the only line needed.
- **Framework Agnostic:** Behaves the same in a pure Python script, a Jupyter notebook, or a massive SLURM cluster.
- **Structured by Default:** Optional JSON output for integration with ELK or custom ML dashboards.
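As a sketch of what that one-line developer experience could reduce to, here is a hypothetical facade on top of stdlib `logging` (the `get_logger` signature and formatter are assumptions illustrating the "configure once, import anywhere" shape, not LogFlow's actual implementation):

```python
import logging
import sys

_configured = False

def get_logger(name: str = "logflow") -> logging.Logger:
    """Hypothetical facade: the first call installs a sensible default console sink."""
    global _configured
    logger = logging.getLogger(name)
    if not _configured:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s | %(levelname)-8s | %(name)s - %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        _configured = True
    return logger

log = get_logger()
log.info("hello from a plain script")
```

Repeat calls return the same configured logger, so any module can `get_logger()` without worrying about duplicate handlers or import order.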