Purpose: Comprehensive guide to production logging system for DSPy training data collection
Status: ✅ Operational (2025-10-29)
User Request → @prompter Agent → System Prompt Generation
↓
Production Logger
↓
JSONL Log File (YYYY-MM-DD.jsonl)
↓
Feedback Collection (Interactive)
↓
Training Data Export (DSPy Format)
↓
Retraining Pipeline
- Request → User invokes @prompter with domain/deliverable requirements
- Generation → @prompter generates optimized system prompt (tracked: generation time, length, sections)
- Logging → Production logger captures input/output/metadata to JSONL file
- Feedback → Human reviews logs, marks deployed prompts, assigns validation scores
- Export → High-scoring logs (≥90%) exported to DSPy training format
- Retraining → New training examples improve next optimization cycle
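The logging step in this flow can be sketched as a dated JSONL append. This is an illustrative Python sketch, not the production implementation (which lives in backend/services/prompter-logger.cjs); the helper name `append_log` is hypothetical:

```python
import json
import os
from datetime import datetime, timezone

def append_log(entry: dict, log_dir: str) -> str:
    """Append one log entry as a single JSON line to today's YYYY-MM-DD.jsonl file."""
    os.makedirs(log_dir, exist_ok=True)
    date = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    path = os.path.join(log_dir, f"{date}.jsonl")
    # One JSON object per line; append-only writes keep concurrent writers from interleaving
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return path

entry = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "agent": "prompter",
    "feedback": {"deployed": None, "validation_score": None},
}
print(append_log(entry, "/tmp/prompter-logs"))
```

Each invocation appends exactly one line, so the file stays valid JSONL even across restarts.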
{
  "timestamp": "2025-10-29T15:45:00Z",
  "agent": "prompter",
  "version": "1.0.0",
  "input": {
    "agent_domain": "legal",
    "deliverable_count": 15,
    "categories": ["Contract Review", "Compliance", "Risk Assessment"],
    "user_request": "Create optimized prompt for @legal with 15 deliverables"
  },
  "output": {
    "prompt_length": 6543,
    "sections": 9,
    "generation_time_ms": 2847,
    "truncated": false
  },
  "metadata": {
    "user": "michael",
    "session_id": "abc123",
    "model": "claude-sonnet-4-5-20250929",
    "lens_pipeline": "minimal"
  },
  "feedback": {
    "deployed": null,
    "validation_score": null,
    "user_rating": null,
    "notes": null
  }
}

Input Fields:
- agent_domain - Target agent (marketing, finance, legal, seo, etc.)
- deliverable_count - Number of template types requested
- categories - How deliverables are grouped (extracted from user request)
- user_request - Original prompt (PII-redacted, truncated to 500 chars)
Output Fields:
- prompt_length - Generated system prompt character count
- sections - Number of major sections (## headings)
- generation_time_ms - Time from request to completion
- truncated - Whether output was truncated
Metadata Fields:
- user - User identifier (currently hardcoded to "michael")
- session_id - Session tracking (currently null - no session support)
- model - LLM model used for generation
- lens_pipeline - Validation pipeline applied (minimal for @prompter)
Feedback Fields (populated by collect-feedback.py):
- deployed - Whether prompt was deployed to production (true/false/null)
- validation_score - Quality score 0-100 (null if not deployed)
- user_rating - User satisfaction 1-5 (null if not deployed)
- notes - Free-text feedback
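Given this schema, entries still awaiting feedback can be filtered with a few lines of Python. This is a sketch; collect-feedback.py is the real tool, and the sample data here is invented:

```python
import json

def pending_feedback(jsonl_text: str) -> list:
    """Return log entries whose feedback has not been collected yet."""
    pending = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        fb = entry.get("feedback", {})
        # deployed stays null until a human reviews the log
        if fb.get("deployed") is None:
            pending.append(entry)
    return pending

sample = "\n".join([
    json.dumps({"agent": "prompter", "feedback": {"deployed": None}}),
    json.dumps({"agent": "prompter", "feedback": {"deployed": True, "validation_score": 95}}),
])
print(len(pending_feedback(sample)))  # → 1
```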
Location: /home/michael/soulfield/backend/council.js lines 1453-1476
IF/THEN/BECAUSE Logic:
IF: @prompter execution completes successfully (id === 'prompter')
THEN: Log to production-logs/prompter/{date}.jsonl
BECAUSE: Captures usage patterns for retraining
DEPENDS ON: backend/services/prompter-logger.cjs exists
FAILURE MODES:
- Logger throws error → Catch, log to stderr, continue
- File write fails → Fallback to console.log() with [PROMPTER-LOG] prefix
Code Pattern:
if (id === 'prompter') {
  try {
    const { logPrompterUsage } = require('./services/prompter-logger.cjs');
    await logPrompterUsage({
      prompt: claudePrompt,
      output: out,
      startTime: startTime,
      metadata: {
        user: 'michael',
        session_id: null,
        model: agent.model || 'claude-sonnet-4-5-20250929',
        lens_pipeline: agent.lensPipeline || 'minimal'
      }
    });
    console.log('[council:prompter] Production usage logged for training data collection');
  } catch (logErr) {
    console.error('[council:prompter] Logging failed (non-fatal):', logErr.message);
  }
}

Non-Blocking Guarantee:
- Logging wrapped in try/catch
- Errors logged to console.error()
- Agent execution continues regardless of logging success/failure
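The same guarantee can be expressed as a small wrapper. This Python sketch is illustrative only (the production code is the CommonJS block above; `safe_log` is a hypothetical name):

```python
import sys

def safe_log(log_fn, *args, **kwargs) -> bool:
    """Call a logging function; swallow any error so the caller never fails."""
    try:
        log_fn(*args, **kwargs)
        return True
    except Exception as exc:
        # Errors go to stderr and execution continues
        print(f"[PROMPTER-LOG] logging failed (non-fatal): {exc}", file=sys.stderr)
        return False

def broken_logger(entry):
    raise IOError("disk full")

print(safe_log(broken_logger, {}))  # → False (but never raises)
```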
Primary Storage:
/home/michael/soulfield/workspace/training-examples/production-logs/prompter/
├── 2025-10-29.jsonl
├── 2025-10-30.jsonl
├── 2025-10-31.jsonl
└── archive/
├── 2025-09-01.jsonl.gz
└── 2025-09-02.jsonl.gz
Fallback Storage (if primary fails):
/tmp/prompter-logs/
└── YYYY-MM-DD.jsonl
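The primary-then-fallback behaviour might look like the following sketch (the directory constants mirror the paths above, but the function is an assumption, not the production code):

```python
import json
import os

PRIMARY = "workspace/training-examples/production-logs/prompter"
FALLBACK = "/tmp/prompter-logs"

def write_line(entry: dict, date: str) -> str:
    """Try the primary log directory; fall back to /tmp on any OS error."""
    line = json.dumps(entry) + "\n"
    for base in (PRIMARY, FALLBACK):
        try:
            os.makedirs(base, exist_ok=True)
            with open(os.path.join(base, f"{date}.jsonl"), "a") as f:
                f.write(line)
            return base
        except OSError:
            continue  # primary failed (permissions, disk full): try the fallback
    raise OSError("both primary and fallback log directories unwritable")

where = write_line({"agent": "prompter"}, "2025-10-29")
print(where)
```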
DSPy Training Examples:
/home/michael/soulfield/workspace/training-examples/
├── marketing/
│ └── production-2025-10-29T15-45-00Z.json
├── finance/
│ └── production-2025-10-29T16-30-00Z.json
├── legal/
│ └── production-2025-10-29T17-15-00Z.json
└── other/
└── production-2025-10-29T18-00-00Z.json
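A sketch of how a reviewed log entry might map to one of these DSPy-style example files. The field mapping below is an assumption derived from the log schema; export-production-data.py is the authoritative implementation:

```python
import json

def to_training_example(entry: dict):
    """Convert a reviewed log entry into an input/output training example, or None."""
    fb = entry.get("feedback", {})
    # Only deployed, high-scoring (≥90) logs are exported
    if not fb.get("deployed") or (fb.get("validation_score") or 0) < 90:
        return None
    return {
        "inputs": {
            "agent_domain": entry["input"]["agent_domain"],
            "deliverable_count": entry["input"]["deliverable_count"],
        },
        "metadata": {
            "validation_score": fb["validation_score"],
            "timestamp": entry["timestamp"],
        },
    }

entry = {
    "timestamp": "2025-10-29T15:45:00Z",
    "input": {"agent_domain": "legal", "deliverable_count": 15},
    "feedback": {"deployed": True, "validation_score": 95},
}
print(json.dumps(to_training_example(entry), indent=2))
```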
Automatic Redaction (backend/services/prompter-logger.cjs:78-95):
function redactPII(text) {
  // Email addresses → [EMAIL]
  text = text.replace(/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g, '[EMAIL]');
  // Phone numbers → [PHONE]
  text = text.replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, '[PHONE]');
  // SSN → [SSN]
  text = text.replace(/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN]');
  // Credit cards → [CARD]
  text = text.replace(/\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b/g, '[CARD]');
  return text;
}

What Gets Redacted:
- Email addresses → [EMAIL]
- Phone numbers → [PHONE]
- Social Security Numbers → [SSN]
- Credit card numbers → [CARD]
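The same patterns can be exercised quickly in Python. This is a port of the regexes above for illustration and testing; the production implementation remains the CommonJS function:

```python
import re

# Same order as the CommonJS implementation: email, phone, SSN, card
PATTERNS = [
    (re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b"), "[CARD]"),
]

def redact_pii(text: str) -> str:
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text

print(redact_pii("Mail bob@example.com or call 555-123-4567"))
# → Mail [EMAIL] or call [PHONE]
```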
Verification:
# Check production logs for PII leaks
grep -E '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' workspace/training-examples/production-logs/prompter/*.jsonl
# Should return no results if redaction is working

IF/THEN/BECAUSE:
IF: Log older than 90 days AND no feedback
THEN: Delete or archive to cold storage
BECAUSE: Logs without feedback have low training value
DEPENDS ON: manage-logs.py cron job
FAILURE MODES: Manual review before deletion to avoid data loss
Lifecycle:
- 0-30 days: Active logs (uncompressed, hot storage)
- 30-90 days: Compressed logs (.jsonl.gz, warm storage)
- 90+ days: Archived or deleted (logs with feedback are kept; the rest are deleted)
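The lifecycle rules above can be sketched as follows (an illustrative helper; manage-logs.py is the real tool, and this sketch omits the feedback check that protects reviewed logs from deletion):

```python
import gzip
import os
import shutil
from datetime import date

def age_out(log_dir: str, today: date) -> None:
    """Compress logs older than 30 days; move logs older than 90 days to archive/."""
    archive = os.path.join(log_dir, "archive")
    os.makedirs(archive, exist_ok=True)
    for name in os.listdir(log_dir):
        if not name.endswith(".jsonl"):
            continue
        file_date = date.fromisoformat(name[: -len(".jsonl")])
        age = (today - file_date).days
        src = os.path.join(log_dir, name)
        if age > 90:
            # 90+ days: archive (a human should review feedback-less logs before deletion)
            shutil.move(src, os.path.join(archive, name))
        elif age > 30:
            # 30-90 days: compress in place to .jsonl.gz
            with open(src, "rb") as fin, gzip.open(src + ".gz", "wb") as fout:
                shutil.copyfileobj(fin, fout)
            os.remove(src)
```

Passing `today` explicitly keeps the function deterministic and easy to dry-run.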
Diagnosis:
# Check if directory exists
ls -la workspace/training-examples/production-logs/prompter/
# Check for permission errors
touch workspace/training-examples/production-logs/prompter/test.txt
rm workspace/training-examples/production-logs/prompter/test.txt
# Check server logs for error messages
tail -f /tmp/soulfield-debug.log | grep prompter-logger

Solutions:
- Create directory manually: mkdir -p workspace/training-examples/production-logs/prompter/
- Fix permissions: chmod -R 755 workspace/training-examples/production-logs/
- Check fallback location: ls /tmp/prompter-logs/
Diagnosis:
# Validate all lines are valid JSON
while IFS= read -r line; do
echo "$line" | jq . > /dev/null || echo "Invalid JSON: $line"
done < workspace/training-examples/production-logs/prompter/2025-10-29.jsonl

Solutions:
- Remove corrupted lines manually
- Restore from backup (if available)
- Concurrent write issue → Logs now use append-only writes (atomic)
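Removing corrupted lines can also be scripted. A sketch that separates parseable lines from broken ones (write the good lines to a new file rather than editing in place):

```python
import json

def salvage(jsonl_text: str):
    """Split JSONL text into parseable lines and corrupted ones."""
    good, bad = [], []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        try:
            json.loads(line)
            good.append(line)
        except json.JSONDecodeError:
            bad.append(line)
    return good, bad

good, bad = salvage('{"a": 1}\n{"broken": \n{"b": 2}')
print(len(good), len(bad))  # → 2 1
```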
Verification Test:
# Create test log with PII
curl -X POST http://localhost:8790/chat \
  -H "Content-Type: application/json" \
  -d '{
  "prompt": "@prompter Create prompt with email test@example.com and phone 555-123-4567"
}'
# Check if redacted
grep -i "test@example.com" workspace/training-examples/production-logs/prompter/*.jsonl
# Should return nothing
grep "[EMAIL]" workspace/training-examples/production-logs/prompter/*.jsonl
# Should return the redacted entrySolutions:
- Update redactPII() regex patterns
- Add test case to prompter-logger.test.cjs
- Run tests:
node backend/tests/prompter-logger.test.cjs
# List all logs needing feedback
python3 workspace/training-examples/collect-feedback.py --list
# Review today's logs interactively
python3 workspace/training-examples/collect-feedback.py --review $(date +%Y-%m-%d)
# Review all pending logs
python3 workspace/training-examples/collect-feedback.py --review ""

# Export all high-scoring logs (≥90%)
python3 workspace/training-examples/export-production-data.py
# Export with custom threshold
python3 workspace/training-examples/export-production-data.py --threshold 85
# Export specific domain only
python3 workspace/training-examples/export-production-data.py --domain marketing
# Export logs since specific date
python3 workspace/training-examples/export-production-data.py --since 2025-10-01
# Verbose output for debugging
python3 workspace/training-examples/export-production-data.py --verbose

# View statistics
python3 workspace/training-examples/manage-logs.py --stats
# Compress logs older than 30 days
python3 workspace/training-examples/manage-logs.py --compress --days 30
# Archive logs older than 90 days
python3 workspace/training-examples/manage-logs.py --archive --days 90
# Health check for anomalies
python3 workspace/training-examples/manage-logs.py --health-check
# Dry-run before actual compression
python3 workspace/training-examples/manage-logs.py --compress --dry-run

# Full test suite (8 tests)
node backend/tests/prompter-logger.test.cjs
# Expected output:
# ✅ PASS: Logger writes to correct file path (YYYY-MM-DD.jsonl)
# ✅ PASS: JSONL format validates against schema
# ✅ PASS: PII redaction removes emails/phones/SSNs
# ✅ PASS: Disk full scenario logs to stderr and continues
# ✅ PASS: Permission error falls back to /tmp/
# ✅ PASS: Concurrent writes do not corrupt JSONL
# ✅ PASS: Statistics calculation works correctly
# ✅ PASS: Input parameter extraction works correctly
#
# === Test Summary ===
# Total Tests: 8
# Passed: 8
# Failed: 0

# Start server
npm start
# In another terminal, invoke @prompter
curl -X POST http://localhost:8790/chat \
  -H "Content-Type: application/json" \
  -d '{
  "prompt": "@prompter Create optimized prompt for @marketing with 35 deliverables"
}'
# Verify log created
ls -la workspace/training-examples/production-logs/prompter/$(date +%Y-%m-%d).jsonl
# View log contents
cat workspace/training-examples/production-logs/prompter/$(date +%Y-%m-%d).jsonl | jq .

Implemented:
- Basic logging to JSONL
- PII redaction
- Feedback collection tool
- Training data export
- Log management (compress, archive, stats)
- Test suite (8/8 passing)

Not Yet Implemented:
- Session tracking (requires session management in council.js)
- Automatic quality scoring (integrate with lens validation results)
- Auto-approve high-quality logs (threshold-based)
- Prometheus metrics export (for monitoring)
- Real-time feedback dashboard (web UI)
- A/B testing support (compare prompt variations)
- Automated retraining trigger (when N new examples collected)
- Integration with CI/CD (validate before deployment)
- FEEDBACK-COLLECTION-WORKFLOW.md - Step-by-step feedback workflow
- LOGGING-QUICK-REFERENCE.md - One-page cheat sheet
- TRAINING-DATA-INVENTORY.md - Complete training data catalog
- DSPY-ENVIRONMENT-SETUP.md - DSPy setup with logging verification
- workspace/docs/Obsidian-v2/docs/reference/agents/prompter.md - @prompter reference
Last Updated: 2025-10-29 Maintainer: Michael Status: Production-ready, all tests passing