13 changes: 13 additions & 0 deletions .flake8
@@ -0,0 +1,13 @@
[flake8]
max-line-length = 120
extend-ignore = E203, W503
exclude =
    .git,
    __pycache__,
    build,
    dist,
    .venv,
    .mypy_cache,
    .pytest_cache
max-complexity = 15
select = B,C,E,F,W,T4,B9
58 changes: 58 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,58 @@
name: LogFlow CI

on:
  push:
    branches: [ main, dev/main ]
  pull_request:
    branches: [ main ]

jobs:
  quality-checks:
    name: Linting & Type Checking
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python 3.12
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: 'pip'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"
      - name: Formatting (Isort & Black)
        run: |
          isort --check-only .
          black --check .
      - name: Linting (Flake8)
        run: |
          flake8 .
      - name: Type Checking (Mypy)
        run: |
          mypy .

  unit-tests:
    name: Unit Tests & Coverage
    needs: quality-checks
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python 3.12
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: 'pip'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"
      - name: Run Tests
        run: |
          pytest tests --junitxml=test-report.xml --cov=logflow --cov-report=xml --cov-report=term
      - name: Upload Test Results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: test-results
          path: test-report.xml
137 changes: 137 additions & 0 deletions .gitignore
@@ -0,0 +1,137 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script, and contain python code for
# the collector to be able to find dependencies.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the user is
# responsible for their own environments.
# .python-version

# pipenv
# According to pypa/pipenv#1181, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if you are working with a library, you should perhaps
# include Pipfile in your version control system.
# Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is recommended to include poetry.lock in version control.
# poetry.lock

# pdm
# Similar to Pipfile.lock, it is recommended to include pdm.lock in version control.
# pdm.lock

# runtime-configuration-files
# Ignore these files for now since they are project-specific.
# .env
# .venv
# venv/
# ENV/
# env.bak/
# venv.bak/

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
.idea/

# VS Code
.vscode/

# Project-specific ignores
.venv/
demo_logs/
experiment_logs/
logs/
30 changes: 30 additions & 0 deletions FUTURE_DEVELOPMENT.md
@@ -0,0 +1,30 @@
# LogFlow: Future Development & Roadmap

This document outlines the strategic enhancements and next-level features for **LogFlow**, intended to solidify its position as the premier logging solution for High-Performance Computing (HPC) and Machine Learning (ML).

---

## 1. JSON Structured Logging (Observability)
**Goal:** Make logs machine-readable for modern observability stacks (ELK, Splunk, Grafana Loki).
- **Implementation:** Add a `serialize=True` option to file sinks.
- **Benefit:** Allows for easy parsing, filtering, and aggregation of distributed training logs in centralized dashboards.
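A Loguru-style `serialize=True` sink typically emits one JSON object per line. As a hedged sketch of what such a record could look like (the field names and the `serialize_record` helper are illustrative, not LogFlow's actual schema):

```python
import json
from datetime import datetime, timezone

def serialize_record(level: str, message: str, **extra) -> str:
    """Render one log record as a single JSON line (illustrative schema)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
        **extra,  # e.g. rank, epoch, step bound by the caller
    }
    return json.dumps(record)

line = serialize_record("INFO", "epoch finished", epoch=3, loss=0.042)
```

One-object-per-line output is what makes ELK, Splunk, and Loki ingestion trivial: each line parses independently, so aggregators never need multi-line stitching.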

## 2. Automatic Experiment Context (ML Lifecycle)
**Goal:** Automatically inject ML-specific metadata into every log record.
- **Implementation:** Create a context manager/provider for `epoch`, `step`, and `experiment_id`.
- **Benefit:** Eliminates the need for manual `logger.bind` in every function; logs automatically carry their training context.
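One way such a provider could work is with `contextvars`, which nests and unwinds cleanly and stays correct across async code. A minimal sketch (the `experiment_context` and `format_line` names are hypothetical, not LogFlow's API):

```python
import contextvars
from contextlib import contextmanager
from typing import Iterator

# Hypothetical context store; LogFlow's real provider may differ.
_ml_context: contextvars.ContextVar = contextvars.ContextVar("ml_context", default={})

@contextmanager
def experiment_context(**fields) -> Iterator[None]:
    """Push ML metadata (experiment_id, epoch, step, ...) for a nested scope."""
    token = _ml_context.set({**_ml_context.get(), **fields})
    try:
        yield
    finally:
        _ml_context.reset(token)  # restore the outer scope's metadata

def format_line(message: str) -> str:
    """Prefix a message with whatever context is currently active."""
    ctx = _ml_context.get()
    prefix = " ".join(f"{k}={v}" for k, v in ctx.items())
    return f"[{prefix}] {message}" if prefix else message

with experiment_context(experiment_id="exp-42", epoch=1):
    with experiment_context(step=100):
        line = format_line("loss=0.5")
```

Nested scopes merge rather than replace, so a training step only declares `step` and inherits `experiment_id` and `epoch` from the enclosing loop.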

## 3. Rich Framework Interoperability
**Goal:** Deep integration with specialized ML frameworks.
- **Implementation:** Specialized adapters for TensorFlow (`absl`), PyTorch Lightning, and JAX.
- **Benefit:** Preserves framework-specific metadata (component names, internal timestamps) while maintaining a unified UI.

## 4. Dynamic Reconfiguration (Runtime Control)
**Goal:** Adjust log levels without restarting long-running training jobs.
- **Implementation:** Use Unix signals (e.g., SIGHUP) or a file watcher to reload configuration.
- **Benefit:** Allows developers to increase verbosity (e.g., INFO -> DEBUG) to diagnose a mid-training anomaly on the fly.
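A minimal sketch of the signal-based variant, assuming a POSIX host (`SIGHUP` does not exist on Windows) and using a placeholder dict in place of the logger's real configuration:

```python
import os
import signal

LOG_LEVEL = {"value": "INFO"}  # stand-in for the logger's live configuration

def _reload_config(signum, frame):
    # A real implementation would re-read a config file here;
    # this placeholder just raises verbosity.
    LOG_LEVEL["value"] = "DEBUG"

signal.signal(signal.SIGHUP, _reload_config)

# An operator would run `kill -HUP <pid>` against the training job;
# we simulate that by signalling our own process.
os.kill(os.getpid(), signal.SIGHUP)
```

Because the handler only mutates in-process state, the training loop never pauses: the level change takes effect at the next log call.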

## 5. Performance Optimization (Zero-Copy)
**Goal:** Further reduce the impact of logging on the "Critical Path" of training.
- **Implementation:** Explore zero-copy serialization or specialized background threads for high-volume metric logging.
- **Benefit:** Ensures that logging overhead never impacts GPU utilization or training throughput.
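The background-sink idea can be sketched with a thread and a queue (threads shown here for simplicity; the zero-copy and multiprocess variants above would build on the same hand-off, and the `BackgroundSink` class is illustrative rather than LogFlow's implementation):

```python
import queue
import threading

class BackgroundSink:
    """Hand records to a worker thread so the training loop never blocks on I/O."""

    def __init__(self) -> None:
        self._queue: queue.Queue = queue.Queue()
        self.lines: list = []  # stand-in for a slow file or network sink
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def _drain(self) -> None:
        while True:
            record = self._queue.get()
            if record is None:  # sentinel: stop draining
                return
            self.lines.append(record)  # a real sink would write to storage here

    def log(self, message: str) -> None:
        self._queue.put(message)  # O(1) enqueue, no I/O on the caller's thread

    def close(self) -> None:
        self._queue.put(None)
        self._worker.join()  # flush everything before shutdown

sink = BackgroundSink()
for step in range(3):
    sink.log(f"step {step}")
sink.close()
```

The caller's cost is a single queue put; all latency from slow network storage lands on the worker, which is exactly the property that protects GPU utilization.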
87 changes: 87 additions & 0 deletions Jenkinsfile
@@ -0,0 +1,87 @@
pipeline {
    agent any

    environment {
        // Local virtual environment within the Jenkins workspace for portability
        VENV_PATH = "${WORKSPACE}/.venv"
        VENV_BIN = "${VENV_PATH}/bin"
    }

    stages {
        stage('Initialize') {
            steps {
                echo 'Creating Isolated Virtual Environment...'
                sh "python3 -m venv ${VENV_PATH}"

                echo 'Installing Dependencies in Editable Mode...'
                sh "${VENV_BIN}/pip install --upgrade pip"
                sh "${VENV_BIN}/pip install -e .[dev]"
            }
        }

        stage('Linting') {
            parallel {
                stage('Black') {
                    steps {
                        sh "${VENV_BIN}/black --check logflow tests examples"
                    }
                }
                stage('Isort') {
                    steps {
                        sh "${VENV_BIN}/isort --check-only logflow tests examples"
                    }
                }
                stage('Flake8') {
                    steps {
                        // Clean up previous reports
                        sh "rm -f flake8.txt flake8-report.xml"
                        // Use || true to prevent the stage from stopping before the report is generated
                        sh "${VENV_BIN}/flake8 logflow tests examples --tee --output-file=flake8.txt || true"
                        // Convert report to JUnit XML
                        sh "if [ -f flake8.txt ]; then ${VENV_BIN}/flake8_junit flake8.txt flake8-report.xml; fi"
                    }
                    post {
                        always {
                            // Archive the report if it was generated
                            junit allowEmptyResults: true, testResults: 'flake8-report.xml'
                        }
                    }
                }
            }
        }

        stage('Type Check') {
            steps {
                sh "${VENV_BIN}/mypy logflow tests examples"
            }
        }

        stage('Unit Tests') {
            steps {
                sh "${VENV_BIN}/pytest tests --junitxml=test-report.xml --cov=logflow --cov-report=xml:coverage.xml --cov-report=term"
            }
            post {
                always {
                    // Archive and display JUnit test results
                    junit allowEmptyResults: true, testResults: 'test-report.xml'

                    // Display Coverage in Jenkins UI using Code Coverage API Plugin
                    recordCoverage tools: [[parser: 'COBERTURA', pattern: 'coverage.xml']]
                }
            }
        }
    }

    post {
        always {
            echo 'LogFlow Pipeline Complete.'
        }
        success {
            echo 'Project is healthy and ready for publication.'
        }
        failure {
            echo 'Build failed. Please check linting or test failures.'
        }
    }
}
45 changes: 45 additions & 0 deletions RATIONALE.md
@@ -0,0 +1,45 @@
# LogFlow: Rationale & Architectural Comparison

## Executive Summary
**LogFlow** is a modern, multiprocess-safe logging library specifically engineered for High-Performance Computing (HPC) and Machine Learning (ML) environments. While general-purpose logging libraries exist, LogFlow bridges the gap between raw logging primitives and the specialized needs of distributed training (e.g., PyTorch DDP, TensorFlow Distribution).

---

## The Landscape: Existing Alternatives

| Library | Mechanism | ML/Distributed Suitability | Pros | Cons |
| :--- | :--- | :--- | :--- | :--- |
| **Standard `logging`** | Lock-based (Thread-safe) | **Low**. Requires complex `QueueHandler` setup for MP. | Zero dependencies, built-in. | Extremely verbose setup for MP; no built-in rank awareness. |
| **`Loguru`** | `enqueue=True` (Queue-based) | **Medium**. Great UI/UX, but no native rank/DDP logic. | Beautiful output, thread/MP safe, easy rotation. | Not aware of SLURM/DDP ranks; requires manual wrapping for ML. |
| **`Concurrent-Log-Handler`** | File Locking (`fcntl`/`flock`) | **Low**. Slow on network filesystems (NFS). | Simple drop-in for standard logging. | High latency; prone to stale-lock issues on some clusters. |
| **`Lightning/Accelerate`** | Framework-specific wrappers | **High** (but locked-in). | Automatic rank-0 filtering. | Tied to specific training frameworks; hard to use in standalone scripts. |

---

## Why LogFlow? (The Gap)

Existing solutions force ML engineers to choose between **ease of use** (Loguru) and **robust distributed logic** (Lightning). LogFlow provides both.

### 1. Unified Distributed Awareness
LogFlow automatically detects the execution environment (SLURM, TorchRun, MPI) and adjusts its behavior.
- **The "Log Storm" Problem:** In a 128-GPU cluster, standard loggers write 128 identical lines.
- **LogFlow Solution:** Filters console output to rank 0 while still letting every rank write to unique or shared persistent files with atomic safety.
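Rank detection itself usually reduces to environment-variable inspection. A hedged sketch of what such a check might look like (`RANK` is set by torchrun, `SLURM_PROCID` by SLURM, `OMPI_COMM_WORLD_RANK` by Open MPI; the `detect_rank`/`console_enabled` names are illustrative, not LogFlow's API):

```python
import os

def detect_rank() -> int:
    """Best-effort global-rank lookup across common launchers (illustrative)."""
    for var in ("RANK", "SLURM_PROCID", "OMPI_COMM_WORLD_RANK"):
        value = os.environ.get(var)
        if value is not None:
            return int(value)
    return 0  # single-process fallback: behave like rank 0

def console_enabled(rank: int) -> bool:
    """Only rank 0 talks to the console; the other ranks stay quiet there."""
    return rank == 0
```

With this shape, the same script logs normally when run standalone (no launcher variables set) and collapses to a single console stream on a 128-GPU job.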

### 2. Framework Interoperability
ML projects often use a mix of libraries (PyTorch, TensorFlow, JAX, HuggingFace). Each has its own logging style.
- **LogFlow Solution:** Automatically intercepts standard `logging`, `warnings`, and `absl` (TensorFlow) calls, redirecting them into a single, unified, and color-coded stream.
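The stdlib side of this interception is a small handler swap; a minimal sketch, assuming a list stands in for LogFlow's real sink (the `InterceptHandler` name is illustrative):

```python
import logging

captured = []  # stand-in for LogFlow's unified, color-coded stream

class InterceptHandler(logging.Handler):
    """Route every stdlib `logging` record into the unified sink."""
    def emit(self, record: logging.LogRecord) -> None:
        captured.append(f"{record.levelname} | {record.name} | {record.getMessage()}")

root = logging.getLogger()
root.handlers = [InterceptHandler()]  # replace any previously installed handlers
root.setLevel(logging.INFO)
logging.captureWarnings(True)         # `warnings.warn` now flows through logging too

# A third-party library logging normally is captured without any changes on its side:
logging.getLogger("third_party.lib").warning("deprecated call")
```

Because every stdlib logger propagates to the root by default, one handler at the root catches PyTorch, HuggingFace, and friends; `absl` needs its own adapter since it bypasses this path.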

### 3. Startup-Consistent Rotation
ML experiments are iterative.
- **The Problem:** Standard loggers append to old files or overwrite them, making it hard to find the start of "Experiment #42".
- **LogFlow Solution:** Implements **Startup Rotation**. Every time a script starts, the old log is archived with a timestamp, and a fresh log is created. This ensures 1:1 mapping between a run and a log file.
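The mechanism can be sketched in a few lines, assuming a timestamped-rename scheme (the `rotate_on_startup` helper and archive-name format are illustrative, not LogFlow's exact behavior):

```python
import os
import tempfile
import time

def rotate_on_startup(path: str) -> None:
    """Archive an existing log under its last-modified timestamp, so the
    caller can open a fresh file at `path` for this run."""
    if os.path.exists(path):
        stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(os.path.getmtime(path)))
        base, ext = os.path.splitext(path)
        os.replace(path, f"{base}.{stamp}{ext}")  # atomic rename on POSIX

# Demo: a leftover log from the previous run gets archived.
workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "train.log")
with open(log_path, "w") as fh:
    fh.write("old run\n")
rotate_on_startup(log_path)
archived = sorted(os.listdir(workdir))
```

After rotation, `train.log` is free for the new run and the old contents survive as `train.<timestamp>.log`, preserving the 1:1 run-to-file mapping.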

### 4. Zero-Latency "Enqueue"
By utilizing a dedicated background process for log sinking, LogFlow ensures that the main training loop (the "Critical Path") is never blocked by I/O operations, even when writing to slow network storage.

---

## Design Goals for Implementation
- **Developer First:** `from logflow import get_logger` should be the only line needed.
- **Framework Agnostic:** Behaves the same in a pure Python script, a Jupyter notebook, or a massive SLURM cluster.
- **Structured by Default:** Optional JSON output for integration with ELK or custom ML dashboards.
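As a sketch of what that one-line developer experience could reduce to, here is a hypothetical facade on top of stdlib `logging` (the `get_logger` signature and formatter are assumptions illustrating the "configure once, import anywhere" shape, not LogFlow's actual implementation):

```python
import logging
import sys

_configured = False

def get_logger(name: str = "logflow") -> logging.Logger:
    """Hypothetical facade: the first call installs a sensible default console sink."""
    global _configured
    logger = logging.getLogger(name)
    if not _configured:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s | %(levelname)-8s | %(name)s - %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        _configured = True
    return logger

log = get_logger()
log.info("hello from a plain script")
```

Repeat calls return the same configured logger, so any module can `get_logger()` without worrying about duplicate handlers or import order.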