This guide explains how to collect logs and debug issues with the chunking strategy library, whether you're a regular user experiencing problems or a developer contributing to the project.
- Quick Start: Report a Bug
- Understanding Log Levels
- Collecting Debug Information
- Using Logging in Your Code
- Advanced Debugging
- Troubleshooting Common Issues
If you're experiencing an issue and want to report a bug, follow these steps:
# One command to collect everything needed for a bug report
chunking-strategy debug archive "Brief description of the issue you're experiencing"
This creates a debug archive with all the information developers need to help you.
# Enable debug mode
chunking-strategy --debug chunk your-problem-file.txt
# Or for more control:
chunking-strategy --log-level debug --log-file debug.log chunk your-file.txt
# Collect debug information
chunking-strategy debug collect --description "What you were trying to do"
The commands above create a .zip file containing:
- System information
- Log files
- Performance data
- Configuration used
- Error details
This file contains NO sensitive data - only technical information needed for debugging.
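If you want to check what the .zip file holds before attaching it to a report, Python's standard `zipfile` module can list its entries. This is a minimal sketch: `list_archive` is a hypothetical helper, and `debug-archive.zip` is a placeholder path, not a name the tool is known to produce.

```python
import zipfile
from pathlib import Path


def list_archive(archive_path):
    """Return (name, uncompressed_size_in_bytes) for every entry in a zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


if __name__ == "__main__":
    # Placeholder path: substitute the .zip file reported by the debug command.
    archive_path = Path("debug-archive.zip")
    if archive_path.exists():
        for name, size in list_archive(archive_path):
            print(f"{size:>10}  {name}")
    else:
        print(f"No archive found at {archive_path}")
```

Listing the entries first makes it easy to open any single file and review it before sharing.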
The library uses different log levels for different audiences:
- silent: Only critical errors
- minimal: Basic status updates
- normal (default): Standard user information with progress and results
# Examples
chunking-strategy --log-level minimal chunk document.pdf
chunking-strategy --quiet chunk document.pdf # Same as minimal
- verbose: Detailed progress and performance information
- debug: Comprehensive debugging information
- trace: Maximum verbosity for development
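The six levels appear to form an ordering from least to most output. The sketch below shows one way such names could map onto Python's standard `logging` thresholds. The numeric values, the `LEVELS` table, and the `enabled_levels` helper are assumptions for illustration only; the library's actual mapping may differ.

```python
import logging

# Assumed values for illustration; not taken from the library.
TRACE = 5      # below logging.DEBUG (10)
VERBOSE = 15   # between logging.DEBUG (10) and logging.INFO (20)

logging.addLevelName(TRACE, "TRACE")
logging.addLevelName(VERBOSE, "VERBOSE")

# Hypothetical mapping from the guide's level names to numeric thresholds.
LEVELS = {
    "silent": logging.CRITICAL,
    "minimal": logging.WARNING,
    "normal": logging.INFO,
    "verbose": VERBOSE,
    "debug": logging.DEBUG,
    "trace": TRACE,
}


def enabled_levels(configured):
    """List the level names whose messages pass a logger set to `configured`."""
    logger = logging.getLogger("levels_demo")
    logger.setLevel(LEVELS[configured])
    return [name for name, value in LEVELS.items() if logger.isEnabledFor(value)]


if __name__ == "__main__":
    for name in LEVELS:
        print(f"{name:>8}: shows {enabled_levels(name)}")
```

Under this assumed mapping, each step toward trace only adds output, so raising the level never hides messages that a quieter level would have shown.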
# Examples
chunking-strategy --verbose chunk document.pdf
chunking-strategy --debug chunk document.pdf
chunking-strategy --log-level trace --log-file trace.log chunk document.pdf
# Enable debug mode with default log file
chunking-strategy debug enable
# Enable with custom log file
chunking-strategy debug enable --log-file my-debug.log
# Basic collection
chunking-strategy debug collect
# With description and custom output location
chunking-strategy debug collect \
  --description "Chunking fails on large PDF files" \
  --output ./debug-archives/
# Test all log levels and see what each produces
chunking-strategy debug test-logging
import chunking_strategy as cs
# Enable debug mode programmatically
debug_dir = cs.enable_debug_mode()
print(f"Debug info will be collected in: {debug_dir}")
# Your chunking operations here
chunker = cs.create_chunker("sentence_based")
result = chunker.chunk("your content")
# Collect debug archive
debug_info = cs.create_debug_archive("Description of issue")
print(f"Debug archive created: {debug_info['debug_archive']}")
import chunking_strategy as cs
# Configure logging for your use case
cs.configure_logging(
    level=cs.LogLevel.VERBOSE,
    file_output=True,
    log_file="my_app_chunking.log",
    collect_performance=True
)
# Use user-friendly logging in your application
cs.user_info("Starting document processing...")
chunker = cs.create_chunker("semantic")
result = chunker.chunk("content")
cs.user_success(f"Processing complete: {len(result.chunks)} chunks created")
Use the user-friendly logging functions for clean, consistent output:
import chunking_strategy as cs
# User-facing messages (shown at normal log level)
cs.user_info("Processing started...")
cs.user_success("Processing completed successfully!")
cs.user_warning("Large file detected, this may take longer")
cs.user_error("Failed to process file")
# Developer debugging (shown at debug log level)
cs.debug_operation("file_processing", {
    "file_size": 1024000,
    "strategy": "semantic",
    "parameters": {"threshold": 0.7}
})
# Performance tracking
cs.performance_log("chunking_operation", duration_seconds=1.23,
                   chunks_created=15, file_size_mb=2.1)
# Metrics logging
cs.metrics_log({
    "quality_score": 0.85,
    "processing_speed": "1.2MB/s",
    "memory_usage": "150MB"
})
Use standard Python logging in your modules:
# In your module
from chunking_strategy.logging_config import get_logger
logger = get_logger(__name__)
def your_function():
    logger.info("Starting processing")
    logger.debug("Detailed processing info")
    logger.warning("Something might be wrong")
    logger.error("Something went wrong")
import chunking_strategy as cs
from pathlib import Path
# Advanced configuration
config = cs.LogConfig(
    level=cs.LogLevel.DEBUG,
    console_output=True,
    file_output=True,
    log_file=Path("detailed_debug.log"),
    format_json=False,  # Set to True for structured logging
    collect_performance=True,
    collect_metrics=True,
    max_file_size="50MB",
    backup_count=5
)
cs.configure_logging(config)
import time
import chunking_strategy as cs
def monitored_chunking(content):
    # perf_counter is a monotonic clock, better suited to measuring durations
    start_time = time.perf_counter()
    # Your chunking operation
    chunker = cs.create_chunker("semantic")
    result = chunker.chunk(content)
    # Log performance
    cs.performance_log(
        operation="semantic_chunking",
        duration=time.perf_counter() - start_time,
        input_size=len(content),
        chunks_created=len(result.chunks),
        strategy_used=result.strategy_used
    )
    return result
import json
import chunking_strategy as cs
# Configure JSON logging for external log aggregation
cs.configure_logging(
    level=cs.LogLevel.VERBOSE,
    format_json=True,  # Structured JSON output
    log_file="chunking_metrics.jsonl"
)
# Your operations will now produce structured JSON logs
# suitable for Elasticsearch, Splunk, etc.
Solution:
# Check your log level
chunking-strategy --log-level debug chunk yourfile.txt
# Or enable verbose mode
chunking-strategy --verbose chunk yourfile.txt
Solution:
# Use quiet mode for minimal output
chunking-strategy --quiet chunk yourfile.txt
# Or set minimal log level
chunking-strategy --log-level minimal chunk yourfile.txt
Solution:
# One command to collect everything
chunking-strategy debug archive "Describe your issue here"
Solution:
import chunking_strategy as cs
# Enable performance collection
cs.configure_logging(
    collect_performance=True,
    collect_metrics=True
)
Solution:
# Specify explicit log file location
chunking-strategy --log-file /path/to/my/logs.log chunk yourfile.txt
# Or check the debug command output
chunking-strategy debug enable --log-file debug.log
- Use normal log level for day-to-day operations
- Use debug mode when experiencing issues
- Collect debug archives when reporting bugs
- Include descriptions when creating debug archives
- Use appropriate log levels - debug info shouldn't spam users
- Use structured logging for performance metrics
- Include context in debug operations
- Test your logging with different log levels
- Use minimal or normal log levels in production
- Enable file logging for troubleshooting
- Set up log rotation for long-running applications
- Monitor performance logs for optimization opportunities
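For the log-rotation recommendation above, applications that manage their own log files can use the standard library's `RotatingFileHandler`. This is a minimal sketch that does not depend on the chunking library. `build_rotating_logger` is a hypothetical helper, and the defaults of 50 MB and 5 backups simply mirror the values shown in the `LogConfig` example earlier in this guide.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_rotating_logger(name, log_file, max_bytes=50 * 1024 * 1024, backup_count=5):
    """Create a logger that writes to log_file and rotates it by size.

    When the file would exceed max_bytes it is renamed to log_file.1,
    older backups shift to .2, .3, ..., and at most backup_count are kept.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)

    handler = RotatingFileHandler(
        log_file,
        maxBytes=max_bytes,
        backupCount=backup_count,
        encoding="utf-8",
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

A long-running service would call `build_rotating_logger("myapp", "myapp.log")` once at startup and reuse the returned logger. Calling it again with the same name attaches a second handler, so keep it to a single call per logger name.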
# Normal usage - clean, minimal output
chunking-strategy chunk document.pdf
# With progress information
chunking-strategy --verbose chunk large-document.pdf
# Having issues? Create debug archive
chunking-strategy debug archive "PDF processing fails on page 10"
# Detailed performance logging
chunking-strategy --log-level verbose --log-file performance.log \
  batch-directory ./documents/ --parallel
# Analyze the logs
chunking-strategy debug test-logging
# Maximum debugging information
chunking-strategy --log-level trace --log-file debug.log \
  chunk test-file.txt --strategy my_new_strategy
# Or in code:
import chunking_strategy as cs
cs.configure_logging(level=cs.LogLevel.DEBUG)
cs.debug_operation("new_feature_test", {
    "feature": "my_new_strategy",
    "test_file": "test-file.txt"
})
# Your development code here
import logging
import chunking_strategy as cs
# Set up logging for your application
cs.configure_logging(
    level=cs.LogLevel.NORMAL,
    file_output=True,
    log_file="myapp_chunking.log"
)
# Use in your application
def process_documents(file_paths):
    cs.user_info(f"Processing {len(file_paths)} documents...")
    for i, file_path in enumerate(file_paths):
        cs.user_info(f"Processing {i+1}/{len(file_paths)}: {file_path.name}")
        try:
            chunker = cs.create_chunker("auto")
            result = chunker.chunk(file_path)
            cs.user_success(f"Created {len(result.chunks)} chunks")
        except Exception as e:
            cs.user_error(f"Failed to process {file_path}: {e}")
            # Debug info automatically collected in debug mode
    cs.user_success("All documents processed!")
If this guide doesn't solve your issue:
- Create a debug archive with a detailed description
- Check existing issues on GitHub
- Open a new issue with the debug archive attached
- Include the session ID from the debug archive
The debug archive contains everything developers need to reproduce and fix your issue!