All notable changes to Percy will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/), and this project adheres to [Semantic Versioning](https://semver.org/).
- Initialized Next.js 14 web interface with static export support
- Added Shadcn UI components and Tailwind CSS setup
- Added initial project structure for web interface
- Added data schema validation using Zod
- Added theme support with next-themes provider
- Added `publish_handbook.py` script to generate a PDF handbook from markdown files
- Added automatic chapter organization and PDF compilation
- Added support for consistent PDF formatting using pandoc
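A hedged sketch of what that compilation step can look like (the file layout, names, and choice of PDF engine are assumptions, not the script's exact behavior):

```python
import subprocess
from pathlib import Path

def compile_handbook(chapters_dir: str, output_pdf: str) -> None:
    """Concatenate chapter markdown files in sorted order, then run pandoc."""
    chapters = sorted(Path(chapters_dir).glob("*.md"))  # e.g. 01-intro.md, 02-...
    combined = "\n\n".join(p.read_text() for p in chapters)
    Path("handbook_combined.md").write_text(combined)
    # --toc adds a table of contents for consistent formatting across chapters
    subprocess.run(
        ["pandoc", "handbook_combined.md", "-o", output_pdf,
         "--toc", "--pdf-engine=xelatex"],
        check=True,
    )
```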
- Added `handbook_indexer.py` script for a RAG pipeline with FAISS vector search
- Added caching system for handbook embeddings to improve performance
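A minimal sketch of the caching idea, assuming LangChain's FAISS wrapper and OpenAI embeddings (the cache path and function name are illustrative):

```python
from pathlib import Path
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

def load_or_build_index(chunks: list[str], cache_dir: str = ".handbook_index") -> FAISS:
    """Reuse a saved FAISS index when present; otherwise embed and cache."""
    embeddings = OpenAIEmbeddings()
    if Path(cache_dir).exists():
        # recent LangChain versions require opting in to pickle deserialization
        return FAISS.load_local(cache_dir, embeddings,
                                allow_dangerous_deserialization=True)
    index = FAISS.from_texts(chunks, embeddings)
    index.save_local(cache_dir)
    return index
```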
- Added comprehensive tests for answer extraction in Chain of Thought responses
- Added Chain of Thought (CoT) reasoning capability with `--cot` flag
- Added CoT usage tracking in test results
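With `--cot` enabled, the model returns free-form reasoning before its final choice, so the answer letter has to be recovered from prose; the extraction tests above cover exactly that. A toy illustration of the idea (the patterns are illustrative, not the project's actual rules):

```python
import re

def extract_answer(response: str) -> str | None:
    """Pull a final answer letter (A-D) out of a Chain of Thought response."""
    patterns = [
        # an explicit "final answer is X" / "answer: X" statement
        r"(?:final answer|answer)\s*(?:is|:)?\s*\(?([A-D])\)?",
        # a bare letter on its own line
        r"^\s*\(?([A-D])\)?\s*$",
    ]
    for pat in patterns:
        matches = re.findall(pat, response, re.IGNORECASE | re.MULTILINE)
        if matches:
            return matches[-1].upper()  # prefer the last occurrence
    return None
```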
- Added `generate_handbook.py` script to generate a comprehensive amateur radio handbook using LLMs
- Added support for generating topic-based study guides from question pools
- Added LLM-based content generation with OpenAI and Anthropic provider support
- Added markdown output format for generated handbook content
- Added `batch_evaluate.py` script for running multiple model evaluations
- Added YAML configuration support for batch evaluation parameters
- Added progress tracking and resume capability for batch runs
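A hedged sketch of how such a YAML-driven batch run with resume support might be wired up (the config keys, results layout, and `evaluate_model` helper are assumptions for illustration):

```python
import json
from pathlib import Path
import yaml  # PyYAML

def evaluate_model(run: dict) -> dict:
    """Placeholder for the real evaluation call."""
    return {"model": run["model"], "score": None}

def run_batch(config_path: str, results_dir: str = "batch_results") -> None:
    config = yaml.safe_load(Path(config_path).read_text())
    Path(results_dir).mkdir(exist_ok=True)
    for run in config["runs"]:  # e.g. {"provider": "openai", "model": "gpt-4o"}
        out = Path(results_dir) / f"{run['provider']}_{run['model']}.json"
        if out.exists():  # resume: skip runs that already produced results
            continue
        out.write_text(json.dumps(evaluate_model(run), indent=2))
```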
- Improved answer extraction logic to better handle various response formats
- Enhanced error handling for answer extraction
- Updated documentation with CoT usage examples
- Updated `requirements.txt` to include necessary LLM dependencies
- Enhanced logging configuration for better debugging and monitoring
- Support for evaluating non-OpenAI models in `evaluate_test.py`.
- Integrated `ChatAnthropic` and `ChatOpenRouter` for Anthropic and OpenRouter providers.
- Added `--provider` argument to select different LLM providers (`openai`, `anthropic`, `ollama`, `openrouter`).
- Implemented provider-specific logic for initializing and invoking LLMs.
- Enhanced image handling logic to accommodate limitations of different providers.
- Updated `README.md` with instructions for setting up API keys for Anthropic, OpenRouter, and Ollama.
- Added usage examples in `README.md` for evaluating models from different providers.
- New dependencies in `requirements.txt` to support Anthropic and community integrations: `langchain-anthropic>=0.0.5`, `langchain-community>=0.0.11`, `anthropic>=0.4.0`.
- Refactored `initialize_llm` function to support multiple providers (a sketch appears after this list).
- Updated `evaluate_test.py` to handle provider-specific message formatting and response handling.
- Modified prompt creation to account for providers that do not support certain features (e.g., image input with Anthropic models).
- Enhanced installation and usage documentation in `README.md` to reflect multi-provider support.
- Improved error handling when unsupported providers are selected.
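A condensed sketch of what the provider dispatch can look like (model and class choices are illustrative; OpenRouter is shown via its OpenAI-compatible endpoint, which is one common approach rather than necessarily the script's):

```python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_community.chat_models import ChatOllama

def initialize_llm(provider: str, model: str):
    """Return a chat model for the requested provider."""
    if provider == "openai":
        return ChatOpenAI(model=model)
    if provider == "anthropic":
        return ChatAnthropic(model=model)
    if provider == "ollama":
        return ChatOllama(model=model)
    if provider == "openrouter":
        # OpenRouter speaks the OpenAI API, so point the client at its endpoint
        return ChatOpenAI(model=model, base_url="https://openrouter.ai/api/v1")
    raise ValueError(f"Unsupported provider: {provider}")
```

The trailing `ValueError` mirrors the improved handling of unsupported providers noted above.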
- Added support for Extra class question pool (2024-2028)
- Added support for Extra class diagrams (pages 1-3) in evaluation script
- Added handling for multiple figures per diagram page in Extra class
- Added support for General class question pool (2023-2027)
- Added support for General class image `G7-1.png` in evaluation script
- Updated test filenames to include license class and version (e.g., `general-2023-2027-test_20250113_130333.json`)
- Modified `extract_pool.py` to automatically detect license class and version from the filename
- Enhanced `evaluate_test.py` to handle both JPG and PNG image formats
- Improved figure reference detection in `evaluate_test.py` to handle variations (e.g., "Figure G7-1", "Figure G-7")
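A toy version of both detection steps (the regexes are illustrative, not the exact ones in the scripts):

```python
import re

def parse_pool_filename(filename: str) -> tuple[str, str] | None:
    """Infer license class and version from names like 'general-2023-2027-...'."""
    m = re.match(r"(technician|general|extra)-(\d{4}-\d{4})", filename.lower())
    return (m.group(1), m.group(2)) if m else None

def find_figure_refs(question_text: str) -> list[str]:
    """Find figure references, tolerating variants like 'G7-1' and 'G-7'."""
    return re.findall(r"[Ff]igure\s+([TGE]-?\d+(?:-\d+)?)", question_text)
```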
- Initial project structure and documentation
- README with project overview, folder structure, and usage instructions
- Script to convert DOCX question pools to JSON format (`extract_pool.py`)
- Script to generate randomized tests (`generate_test.py`), selecting one question from each group
- Virtual environment setup instructions
- `requirements.txt` with `python-docx` dependency
- LangChain-based evaluation system (`evaluate_test.py`) for testing LLM performance
- Added dependencies: langchain, langchain-openai, python-dotenv, pydantic, pillow
- Added multimodal support for image-based questions (`T1.jpg`, `T2.jpg`, `T3.jpg`)
- Added langchain-community for image handling
- Added analysis script (`analyze_results.py`) to determine pass/fail status based on license class criteria
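For context, pass/fail follows the standard FCC thresholds (74%: 26 of 35 correct for Technician and General, 37 of 50 for Extra); a minimal sketch, assuming those criteria:

```python
# (questions, minimum correct) per license class, per the FCC's 74% pass mark
PASS_THRESHOLDS = {
    "technician": (35, 26),
    "general": (35, 26),
    "extra": (50, 37),
}

def passed(license_class: str, correct: int) -> bool:
    _, minimum = PASS_THRESHOLDS[license_class]
    return correct >= minimum
```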
- Made `--input` and `--output` parameters required in `extract_pool.py`
- Updated README with detailed usage instructions for `extract_pool.py`
- Modified `generate_test.py` to require explicit question pool file input
- Enhanced test output to include source question pool information
- Updated `evaluate_test.py` to use GPT-4 Vision for image-based questions
- Updated analysis script to handle different test lengths for each license class (35 questions for Technician/General, 50 for Extra)