Production Logging Guide - @prompter Training Data Collection

Purpose: Comprehensive guide to the production logging system used for DSPy training data collection

Status: ✅ Operational (2025-10-29)


Architecture Overview

User Request → @prompter Agent → System Prompt Generation
                    ↓
            Production Logger
                    ↓
      JSONL Log File (YYYY-MM-DD.jsonl)
                    ↓
        Feedback Collection (Interactive)
                    ↓
     Training Data Export (DSPy Format)
                    ↓
          Retraining Pipeline

Data Flow

  1. Request → User invokes @prompter with domain/deliverable requirements
  2. Generation → @prompter generates optimized system prompt (tracked: generation time, length, sections)
  3. Logging → Production logger captures input/output/metadata to JSONL file
  4. Feedback → Human reviews logs, marks deployed prompts, assigns validation scores
  5. Export → High-scoring logs (≥90%) exported to DSPy training format
  6. Retraining → New training examples improve next optimization cycle
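Step 5 can be sketched in Python as follows. This is a minimal illustration, not the actual export-production-data.py: the function name `select_for_export` is hypothetical, and requiring `deployed` to be true is an assumption. Only the ≥90 threshold and the `feedback` field names come from this guide.

```python
import json

def select_for_export(jsonl_text, threshold=90):
    """Return log records whose feedback score meets the threshold."""
    selected = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        feedback = record.get("feedback") or {}
        score = feedback.get("validation_score")
        # Assumption: only deployed prompts that have a score are eligible.
        if feedback.get("deployed") is True and score is not None and score >= threshold:
            selected.append(record)
    return selected

sample = "\n".join([
    json.dumps({"input": {"agent_domain": "legal"},
                "feedback": {"deployed": True, "validation_score": 94}}),
    json.dumps({"input": {"agent_domain": "seo"},
                "feedback": {"deployed": True, "validation_score": 72}}),
    json.dumps({"input": {"agent_domain": "finance"},
                "feedback": {"deployed": None, "validation_score": None}}),
])
print([r["input"]["agent_domain"] for r in select_for_export(sample)])
```

Records that have not been reviewed yet (all feedback fields null) are skipped rather than treated as failures, so the filter can be rerun as feedback accumulates.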

Log Format Specification

JSONL Schema

{
  "timestamp": "2025-10-29T15:45:00Z",
  "agent": "prompter",
  "version": "1.0.0",
  "input": {
    "agent_domain": "legal",
    "deliverable_count": 15,
    "categories": ["Contract Review", "Compliance", "Risk Assessment"],
    "user_request": "Create optimized prompt for @legal with 15 deliverables"
  },
  "output": {
    "prompt_length": 6543,
    "sections": 9,
    "generation_time_ms": 2847,
    "truncated": false
  },
  "metadata": {
    "user": "michael",
    "session_id": "abc123",
    "model": "claude-sonnet-4-5-20250929",
    "lens_pipeline": "minimal"
  },
  "feedback": {
    "deployed": null,
    "validation_score": null,
    "user_rating": null,
    "notes": null
  }
}

Field Descriptions

Input Fields:

  • agent_domain - Target agent (marketing, finance, legal, seo, etc.)
  • deliverable_count - Number of template types requested
  • categories - How deliverables are grouped (extracted from user request)
  • user_request - Original prompt (PII-redacted, truncated to 500 chars)

Output Fields:

  • prompt_length - Generated system prompt character count
  • sections - Number of major sections (## headings)
  • generation_time_ms - Time from request to completion
  • truncated - Whether output was truncated
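As a rough illustration of how the output fields could be derived from a generated prompt, here is a Python sketch. The real logger is prompter-logger.cjs; the function name `summarize_output` and the use of a monotonic clock are assumptions made for this example.

```python
import re
import time

def summarize_output(prompt_text, started_at, truncated=False):
    """Build the `output` object for one generated system prompt."""
    # Count lines that start with exactly two '#' characters and a space,
    # so "### Detail" subsections are not counted as major sections.
    sections = len(re.findall(r"^## ", prompt_text, flags=re.MULTILINE))
    elapsed_ms = int((time.monotonic() - started_at) * 1000)
    return {
        "prompt_length": len(prompt_text),
        "sections": sections,
        "generation_time_ms": elapsed_ms,
        "truncated": truncated,
    }

started = time.monotonic()
prompt = "# Title\n\n## Role\ntext\n\n### Detail\n\n## Output Format\ntext\n"
summary = summarize_output(prompt, started)
print(summary["prompt_length"], summary["sections"])
```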

Metadata Fields:

  • user - User identifier (currently hardcoded to "michael")
  • session_id - Session tracking (currently always null because there is no session support; the "abc123" in the schema example is illustrative)
  • model - LLM model used for generation
  • lens_pipeline - Validation pipeline applied (minimal for @prompter)

Feedback Fields (populated by collect-feedback.py):

  • deployed - Whether prompt was deployed to production (true/false/null)
  • validation_score - Quality score 0-100 (null if not deployed)
  • user_rating - User satisfaction 1-5 (null if not deployed)
  • notes - Free-text feedback
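The field descriptions above can be turned into a small consistency check. The sketch below is hypothetical (there is no `validate_record` in the codebase as far as this guide states); it only checks that the documented fields are present and that the two numeric feedback fields stay inside their documented ranges.

```python
REQUIRED_FIELDS = {
    "input": ["agent_domain", "deliverable_count", "categories", "user_request"],
    "output": ["prompt_length", "sections", "generation_time_ms", "truncated"],
    "metadata": ["user", "session_id", "model", "lens_pipeline"],
    "feedback": ["deployed", "validation_score", "user_rating", "notes"],
}

def validate_record(record):
    """Return a list of problems found in one log record (empty if none)."""
    problems = [f"missing field: {key}"
                for key in ("timestamp", "agent", "version") if key not in record]
    for group, fields in REQUIRED_FIELDS.items():
        section = record.get(group)
        if not isinstance(section, dict):
            problems.append(f"missing object: {group}")
            continue
        problems += [f"missing field: {group}.{name}"
                     for name in fields if name not in section]
    feedback = record.get("feedback")
    if isinstance(feedback, dict):
        score = feedback.get("validation_score")
        if score is not None and not 0 <= score <= 100:
            problems.append("validation_score must be 0-100 or null")
        rating = feedback.get("user_rating")
        if rating is not None and not 1 <= rating <= 5:
            problems.append("user_rating must be 1-5 or null")
    return problems

example = {
    "timestamp": "2025-10-29T15:45:00Z", "agent": "prompter", "version": "1.0.0",
    "input": {"agent_domain": "legal", "deliverable_count": 15,
              "categories": ["Contract Review", "Compliance", "Risk Assessment"],
              "user_request": "Create optimized prompt for @legal with 15 deliverables"},
    "output": {"prompt_length": 6543, "sections": 9,
               "generation_time_ms": 2847, "truncated": False},
    "metadata": {"user": "michael", "session_id": None,
                 "model": "claude-sonnet-4-5-20250929", "lens_pipeline": "minimal"},
    "feedback": {"deployed": None, "validation_score": None,
                 "user_rating": None, "notes": None},
}
print(validate_record(example))
```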

Integration Points

Backend Integration (council.js)

Location: /home/michael/soulfield/backend/council.js lines 1453-1476

IF/THEN/BECAUSE Logic:

IF: @prompter execution completes successfully (id === 'prompter')
THEN: Log to production-logs/prompter/{date}.jsonl
BECAUSE: Captures usage patterns for retraining
DEPENDS ON: backend/services/prompter-logger.cjs exists
FAILURE MODES:
  - Logger throws error → Catch, log to stderr, continue
  - File write fails → Fallback to console.log() with [PROMPTER-LOG] prefix

Code Pattern:

if (id === 'prompter') {
  try {
    const { logPrompterUsage } = require('./services/prompter-logger.cjs');
    await logPrompterUsage({
      prompt: claudePrompt,
      output: out,
      startTime: startTime,
      metadata: {
        user: 'michael',
        session_id: null,
        model: agent.model || 'claude-sonnet-4-5-20250929',
        lens_pipeline: agent.lensPipeline || 'minimal'
      }
    });
    console.log('[council:prompter] Production usage logged for training data collection');
  } catch (logErr) {
    console.error('[council:prompter] Logging failed (non-fatal):', logErr.message);
  }
}

Non-Blocking Guarantee:

  • Logging wrapped in try/catch
  • Errors logged to console.error()
  • Agent execution continues regardless of logging success/failure

File Locations

Production Logs

Primary Storage:

/home/michael/soulfield/workspace/training-examples/production-logs/prompter/
├── 2025-10-29.jsonl
├── 2025-10-30.jsonl
├── 2025-10-31.jsonl
└── archive/
    ├── 2025-09-01.jsonl.gz
    └── 2025-09-02.jsonl.gz

Fallback Storage (if primary fails):

/tmp/prompter-logs/
└── YYYY-MM-DD.jsonl
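The fallback chain can be sketched as below. This is a Python illustration only; the production logger is the Node module prompter-logger.cjs, and its exact behavior is not shown in this guide. The function name `append_log`, the UTC-based file name, and the ordering (primary directory, then fallback directory, then a `[PROMPTER-LOG]` console line) are assumptions that combine the failure modes and fallback storage described above.

```python
import json
import os
import sys
import tempfile
from datetime import datetime, timezone

def append_log(record, primary_dir, fallback_dir):
    """Append one record as a JSON line; return the path written, or None."""
    # Assumption: the daily file name is derived from the UTC date.
    filename = datetime.now(timezone.utc).strftime("%Y-%m-%d") + ".jsonl"
    line = json.dumps(record) + "\n"
    for directory in (primary_dir, fallback_dir):
        try:
            os.makedirs(directory, exist_ok=True)
            path = os.path.join(directory, filename)
            with open(path, "a", encoding="utf-8") as handle:
                handle.write(line)
            return path
        except OSError as err:
            # Non-fatal: report on stderr and try the next location.
            print(f"[PROMPTER-LOG] write to {directory} failed: {err}", file=sys.stderr)
    # Last resort: emit the record on stdout with the documented prefix.
    print("[PROMPTER-LOG] " + line.rstrip())
    return None

# Demo: force the primary location to fail by nesting it under a plain file.
base = tempfile.mkdtemp()
blocker = os.path.join(base, "blocker")
open(blocker, "w").close()
written = append_log({"agent": "prompter"},
                     primary_dir=os.path.join(blocker, "logs"),
                     fallback_dir=os.path.join(base, "fallback"))
print(written)
```

The function never raises on I/O errors, which mirrors the non-blocking guarantee: the caller continues whether or not a log line was stored.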

Exported Training Data

DSPy Training Examples:

/home/michael/soulfield/workspace/training-examples/
├── marketing/
│   └── production-2025-10-29T15-45-00Z.json
├── finance/
│   └── production-2025-10-29T16-30-00Z.json
├── legal/
│   └── production-2025-10-29T17-15-00Z.json
└── other/
    └── production-2025-10-29T18-00-00Z.json
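The layout above suggests a simple naming rule, sketched here in Python. The rule is inferred from the tree, not taken from export-production-data.py: the `export_path` helper is hypothetical, the set of known domains is read off the directory names shown, and routing every other domain to `other/` is an assumption.

```python
from pathlib import PurePosixPath

# Assumption: these are the only dedicated domain folders (taken from the tree above).
KNOWN_DOMAINS = {"marketing", "finance", "legal"}

def export_path(root, record):
    """Return the export file path for one log record."""
    domain = record["input"]["agent_domain"]
    folder = domain if domain in KNOWN_DOMAINS else "other"
    # Colons are not safe in file names on every platform, so the
    # timestamp 15:45:00Z becomes 15-45-00Z, as in the tree above.
    stamp = record["timestamp"].replace(":", "-")
    return PurePosixPath(root) / folder / f"production-{stamp}.json"

legal = {"timestamp": "2025-10-29T17:15:00Z", "input": {"agent_domain": "legal"}}
seo = {"timestamp": "2025-10-29T18:00:00Z", "input": {"agent_domain": "seo"}}
print(export_path("workspace/training-examples", legal))
print(export_path("workspace/training-examples", seo))
```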

Privacy and Security

PII Redaction

Automatic Redaction (backend/services/prompter-logger.cjs:78-95):

function redactPII(text) {
  // Email addresses → [EMAIL]
  text = text.replace(/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g, '[EMAIL]');

  // Phone numbers → [PHONE]
  text = text.replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, '[PHONE]');

  // SSN → [SSN]
  text = text.replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN]');

  // Credit cards → [CARD]
  text = text.replace(/\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b/g, '[CARD]');

  return text;
}

What Gets Redacted:

  • Email addresses → [EMAIL]
  • Phone numbers → [PHONE]
  • Social Security Numbers → [SSN]
  • Credit card numbers → [CARD]

Verification:

# Check production logs for PII leaks
grep -E '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' workspace/training-examples/production-logs/prompter/*.jsonl
# Should return no results if redaction working
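The grep check above only covers email addresses. A broader check can reuse all four patterns from redactPII(); the Python sketch below ports those regexes unchanged. The `find_pii` helper is hypothetical, and because it scans whole log lines it may report false positives on unrelated long digit runs.

```python
import re

# Same patterns as redactPII(), keyed by the placeholder they should have become.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CARD": re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b"),
}

def find_pii(lines):
    """Yield (line_number, label) for every pattern that still matches."""
    for number, line in enumerate(lines, start=1):
        for label, pattern in PII_PATTERNS.items():
            if pattern.search(line):
                yield number, label

sample = [
    '{"input": {"user_request": "Contact [EMAIL] or [PHONE]"}}',
    '{"input": {"user_request": "Call 555-123-4567 about SSN 123-45-6789"}}',
]
print(list(find_pii(sample)))
```

To scan a real log, pass an open file object instead of `sample`; an empty result means none of the four patterns matched.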

Retention Policy

IF/THEN/BECAUSE:

IF: Log older than 90 days AND no feedback
THEN: Delete or archive to cold storage
BECAUSE: Logs without feedback have low training value
DEPENDS ON: manage-logs.py cron job
FAILURE MODES:
  - Useful logs deleted by mistake → Review manually before deletion to avoid data loss

Lifecycle:

  • 0-30 days: Active logs (uncompressed, hot storage)
  • 30-90 days: Compressed logs (.jsonl.gz, warm storage)
  • 90+ days: Archived or deleted (logs with feedback are kept, the rest are deleted)
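The lifecycle tiers can be expressed as a small decision function. This is a sketch under stated assumptions, not the logic of manage-logs.py: the name `lifecycle_action` is hypothetical, a log that is exactly 30 days old is treated as due for compression, and at 90 days logs with feedback are archived while the rest are deleted.

```python
from datetime import date

def lifecycle_action(log_date, today, has_feedback):
    """Return the action the retention policy implies for one daily log."""
    age_days = (today - log_date).days
    if age_days < 30:
        return "keep"       # active, uncompressed
    if age_days < 90:
        return "compress"   # .jsonl.gz, warm storage
    # Assumption: feedback decides between archiving and deleting.
    return "archive" if has_feedback else "delete"

today = date(2025, 10, 30)
for day, feedback in [(date(2025, 10, 29), False),
                      (date(2025, 9, 1), False),
                      (date(2025, 7, 1), True),
                      (date(2025, 7, 1), False)]:
    print(day.isoformat(), lifecycle_action(day, today, feedback))
```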

Troubleshooting

Problem: Logs not being created

Diagnosis:

# Check if directory exists
ls -la workspace/training-examples/production-logs/prompter/

# Check for permission errors
touch workspace/training-examples/production-logs/prompter/test.txt
rm workspace/training-examples/production-logs/prompter/test.txt

# Check server logs for error messages
tail -f /tmp/soulfield-debug.log | grep prompter-logger

Solutions:

  1. Create directory manually: mkdir -p workspace/training-examples/production-logs/prompter/
  2. Fix permissions: chmod -R 755 workspace/training-examples/production-logs/
  3. Check fallback location: ls /tmp/prompter-logs/

Problem: JSONL file corrupted

Diagnosis:

# Validate all lines are valid JSON
while IFS= read -r line; do
  echo "$line" | jq . > /dev/null || echo "Invalid JSON: $line"
done < workspace/training-examples/production-logs/prompter/2025-10-29.jsonl

Solutions:

  1. Remove corrupted lines manually
  2. Restore from backup (if available)
  3. If concurrent writes caused the corruption, no code change should be needed: logs now use append-only (atomic) writes
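Removing corrupted lines by hand is error-prone on large files. A minimal Python sketch for separating valid from invalid lines is shown below; the helper name `split_valid_lines` is hypothetical, and writing the valid lines back to disk is left to the operator so that nothing is overwritten without review.

```python
import json

def split_valid_lines(lines):
    """Return (valid_lines, invalid) where invalid holds (line_number, text)."""
    valid, invalid = [], []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        try:
            json.loads(text)
            valid.append(text)
        except json.JSONDecodeError:
            invalid.append((number, text))
    return valid, invalid

sample = [
    '{"agent": "prompter", "version": "1.0.0"}',
    '{"agent": "prompter"}{"agent": "pro',   # two writes interleaved on one line
    '',
    '{"agent": "prompter", "version": "1.0.0"}',
]
valid, invalid = split_valid_lines(sample)
print(len(valid), [number for number, _ in invalid])
```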

Problem: PII not being redacted

Verification Test:

# Create test log with PII
curl -X POST http://localhost:8790/chat \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt": "@prompter Create prompt with email test@example.com and phone 555-123-4567"
}'

# Check if redacted
grep -i "test@example.com" workspace/training-examples/production-logs/prompter/*.jsonl
# Should return nothing

grep -F "[EMAIL]" workspace/training-examples/production-logs/prompter/*.jsonl
# Should return the redacted entry (-F matches the brackets literally instead of as a character class)

Solutions:

  1. Update redactPII() regex patterns
  2. Add test case to prompter-logger.test.cjs
  3. Run tests: node backend/tests/prompter-logger.test.cjs

Usage Examples

Collect Feedback

# List all logs needing feedback
python3 workspace/training-examples/collect-feedback.py --list

# Review today's logs interactively
python3 workspace/training-examples/collect-feedback.py --review $(date +%Y-%m-%d)

# Review all pending logs
python3 workspace/training-examples/collect-feedback.py --review ""

Export Training Data

# Export all high-scoring logs (≥90%)
python3 workspace/training-examples/export-production-data.py

# Export with custom threshold
python3 workspace/training-examples/export-production-data.py --threshold 85

# Export specific domain only
python3 workspace/training-examples/export-production-data.py --domain marketing

# Export logs since specific date
python3 workspace/training-examples/export-production-data.py --since 2025-10-01

# Verbose output for debugging
python3 workspace/training-examples/export-production-data.py --verbose

Manage Logs

# View statistics
python3 workspace/training-examples/manage-logs.py --stats

# Compress logs older than 30 days
python3 workspace/training-examples/manage-logs.py --compress --days 30

# Archive logs older than 90 days
python3 workspace/training-examples/manage-logs.py --archive --days 90

# Health check for anomalies
python3 workspace/training-examples/manage-logs.py --health-check

# Dry-run before actual compression
python3 workspace/training-examples/manage-logs.py --compress --dry-run
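What `--compress` with `--dry-run` might do internally can be sketched as follows. This is not the source of manage-logs.py: the function `compress_old_logs` is hypothetical, the log age is taken from the YYYY-MM-DD file name, files are treated as eligible once they are at least `days` days old, and compression happens in place rather than into `archive/`.

```python
import gzip
import os
import shutil
import tempfile
from datetime import date, timedelta

def compress_old_logs(directory, days=30, dry_run=False, today=None):
    """Gzip YYYY-MM-DD.jsonl files at least `days` old; return their names."""
    today = today or date.today()
    cutoff = today - timedelta(days=days)
    affected = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".jsonl"):
            continue
        try:
            log_date = date.fromisoformat(name[: -len(".jsonl")])
        except ValueError:
            continue  # not a daily log file
        if log_date > cutoff:
            continue  # still inside the active window
        affected.append(name)
        if not dry_run:
            path = os.path.join(directory, name)
            with open(path, "rb") as source, gzip.open(path + ".gz", "wb") as target:
                shutil.copyfileobj(source, target)
            os.remove(path)  # remove the original only after the copy succeeds
    return affected

# Demo in a throwaway directory.
workdir = tempfile.mkdtemp()
for name in ("2025-09-01.jsonl", "2025-10-29.jsonl"):
    with open(os.path.join(workdir, name), "w", encoding="utf-8") as handle:
        handle.write('{"agent": "prompter"}\n')

today = date(2025, 10, 30)
planned = compress_old_logs(workdir, days=30, dry_run=True, today=today)
done = compress_old_logs(workdir, days=30, today=today)
print(planned, sorted(os.listdir(workdir)))
```

Running the dry run first and comparing its result with the real run is a cheap way to confirm that only the expected files are touched.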

Testing

Run Test Suite

# Full test suite (8 tests)
node backend/tests/prompter-logger.test.cjs

# Expected output:
# ✅ PASS: Logger writes to correct file path (YYYY-MM-DD.jsonl)
# ✅ PASS: JSONL format validates against schema
# ✅ PASS: PII redaction removes emails/phones/SSNs
# ✅ PASS: Disk full scenario logs to stderr and continues
# ✅ PASS: Permission error falls back to /tmp/
# ✅ PASS: Concurrent writes do not corrupt JSONL
# ✅ PASS: Statistics calculation works correctly
# ✅ PASS: Input parameter extraction works correctly
#
# === Test Summary ===
# Total Tests: 8
# Passed: 8
# Failed: 0

Manual Integration Test

# Start server
npm start

# In another terminal, invoke @prompter
curl -X POST http://localhost:8790/chat \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt": "@prompter Create optimized prompt for @marketing with 35 deliverables"
}'

# Verify log created
ls -la workspace/training-examples/production-logs/prompter/$(date +%Y-%m-%d).jsonl

# View log contents
cat workspace/training-examples/production-logs/prompter/$(date +%Y-%m-%d).jsonl | jq .

Future Enhancements

Phase 1 (Current) ✅

  • Basic logging to JSONL
  • PII redaction
  • Feedback collection tool
  • Training data export
  • Log management (compress, archive, stats)
  • Test suite (8/8 passing)

Phase 2 (Next)

  • Session tracking (requires session management in council.js)
  • Automatic quality scoring (integrate with lens validation results)
  • Auto-approve high-quality logs (threshold-based)
  • Prometheus metrics export (for monitoring)

Phase 3 (Future)

  • Real-time feedback dashboard (web UI)
  • A/B testing support (compare prompt variations)
  • Automated retraining trigger (when N new examples collected)
  • Integration with CI/CD (validate before deployment)

Related Documentation

  • FEEDBACK-COLLECTION-WORKFLOW.md - Step-by-step feedback workflow
  • LOGGING-QUICK-REFERENCE.md - One-page cheat sheet
  • TRAINING-DATA-INVENTORY.md - Complete training data catalog
  • DSPY-ENVIRONMENT-SETUP.md - DSPy setup with logging verification
  • workspace/docs/Obsidian-v2/docs/reference/agents/prompter.md - @prompter reference

Last Updated: 2025-10-29

Maintainer: Michael

Status: Production-ready, all tests passing