-#10

Merged
dongkoony merged 68 commits into docs/readme-v1 from main
Jan 20, 2026

Conversation

@dongkoony
Owner

dongkoony and others added 30 commits November 17, 2025 15:58
- Updated Docker configuration to point to the new app directory.
- Changed the port mapping for the dashboard service.
- Added a new .dockerignore file to exclude unnecessary files.
- Implemented FastAPI endpoints for health checks and metrics summary.
- Introduced Pydantic models for response schemas.
- Enhanced database interaction with SQLAlchemy and added settings management.
- Created a Dockerfile for the dashboard service with necessary dependencies.
- Refactored existing code to improve structure and readability.
… integration

- Added a new schema for evaluation results using Pydantic.
- Implemented a basic rule-based evaluation function to assess LLM responses.
- Updated FastAPI endpoints to support evaluation logic and improved health check response.
- Refactored models to include new fields for evaluation results and adjusted database interactions.
- Enhanced .gitignore to include local configuration files.
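
For readers unfamiliar with rule-based evaluation, the following is a minimal sketch of what such a function might look like. The `EvaluationResult` fields, the 1 to 5 scale, the refusal markers, and the length cutoff are all illustrative assumptions, not the rules used in this repository.

```python
from dataclasses import dataclass


@dataclass
class EvaluationResult:
    """Hypothetical result shape; the project's Pydantic schema may differ."""
    score: int    # assumed scale: 1 (worst) to 5 (best)
    label: str
    comment: str


def rule_based_evaluate(response: str) -> EvaluationResult:
    """Score an LLM response with a few simple, illustrative heuristics."""
    text = response.strip()
    if not text:
        return EvaluationResult(1, "empty", "Response is empty.")
    # Assumed refusal phrases; a real rule set would be broader.
    refusal_markers = ("i cannot", "i can't", "as an ai")
    if any(marker in text.lower() for marker in refusal_markers):
        return EvaluationResult(2, "refusal", "Response looks like a refusal.")
    if len(text) < 20:
        return EvaluationResult(3, "short", "Response is very short.")
    return EvaluationResult(5, "ok", "No rule triggered.")
```

Rules are checked in order, so the first match decides the score.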
…localization support

- Added a new API endpoint to fetch time series data for LLM metrics over a specified number of days.
- Introduced new Pydantic models for time series data representation.
- Enhanced the dashboard layout to include a language switcher for localization.
- Updated the overview component to display time series data with charts for quality trends, latency, and request volume.
- Integrated translation support for multiple languages in the dashboard.
- Enhanced web dashboard description to include new statistics cards, model performance charts, and recent activity previews.
- Updated API endpoint details to reflect pagination support for logs and evaluations.
- Improved clarity on model performance comparison features.
…n support

- Enhanced the overview section with updated statistics and time series charts.
- Added multilingual support with a language selection dropdown and localStorage integration.
- Clarified API endpoint details, including the addition of a new time series data endpoint.
- Integrated translation functionality into Evaluations, Logs, and Models components.
- Replaced static text with localized strings for titles, subtitles, and error messages.
- Enhanced user experience by providing multilingual support for various UI elements.
…e evaluation model

- Added support for LLM-as-a-Judge evaluation in the /evaluate-once endpoint, allowing for dynamic evaluation methods based on the judge_type parameter.
- Updated LLMEvaluation model to include optional fields for LLM-specific scores and raw judge responses.
- Improved error handling for evaluation failures, ensuring database rollback on exceptions.
- Added fields for score_instruction_following and score_truthfulness to the EvaluationRead model.
- Updated the evaluation endpoint to return new scoring metrics and raw judge responses, enhancing the evaluation data provided.
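
The `judge_type` dispatch described above could be sketched roughly as follows. The accepted values (`"rule"`, `"llm"`), the injected `call_model` callable, the `score_overall` and `raw_judge_response` keys, and the JSON keys returned by the judge are assumptions for illustration; only `score_instruction_following` and `score_truthfulness` are named in the commit.

```python
import json
from typing import Callable, Dict, Optional

Evaluation = Dict[str, object]


def rule_judge(response: str) -> Evaluation:
    """Rule-based path: the LLM-specific fields stay empty."""
    return {
        "judge_type": "rule",
        "score_overall": 5 if response.strip() else 1,
        "score_instruction_following": None,
        "score_truthfulness": None,
        "raw_judge_response": None,
    }


def llm_judge(response: str, call_model: Callable[[str], str]) -> Evaluation:
    """LLM-as-a-Judge path: call_model is a hypothetical function returning JSON text."""
    raw = call_model(f"Rate this response as JSON: {response}")
    parsed = json.loads(raw)
    return {
        "judge_type": "llm",
        "score_overall": parsed["overall"],
        "score_instruction_following": parsed.get("instruction_following"),
        "score_truthfulness": parsed.get("truthfulness"),
        "raw_judge_response": raw,  # kept verbatim for later inspection
    }


def evaluate_once(
    response: str,
    judge_type: str = "rule",
    call_model: Optional[Callable[[str], str]] = None,
) -> Evaluation:
    """Pick the evaluation method from judge_type."""
    if judge_type == "rule":
        return rule_judge(response)
    if judge_type == "llm":
        if call_model is None:
            raise ValueError("judge_type='llm' requires call_model")
        return llm_judge(response, call_model)
    raise ValueError(f"unknown judge_type: {judge_type!r}")
```

Passing the model call in as a parameter keeps the sketch runnable without network access and makes the judge easy to stub in tests.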
…oring metrics

- Introduced functions to determine judge type and display corresponding badges in the evaluations table.
- Added new columns for score instruction following and score truthfulness in the evaluations table.
- Updated localization files to include translations for new labels and judge types.
Merge pull request #4 from dongkoony/main
- Introduced new settings for automatic evaluation, including evaluation interval, batch size, and judge type.
- Added configuration options for Slack and Discord notifications, including webhook URLs and score thresholds.
- Added an async context manager for managing the lifespan of the FastAPI app, including starting and stopping a scheduler.
- Integrated logging to track the service's startup and shutdown processes.
- Removed redundant table creation code from the app startup as it is now handled in the lifespan context.
- Implemented a new notifier module to send alerts via Slack and Discord for low-quality evaluation results.
- Added functions to send notifications for low-quality alerts and batch evaluation summaries.
- Integrated logging for successful and failed notification attempts.
- Introduced a new scheduler module using APScheduler to automate the evaluation of LLM logs at specified intervals.
- Implemented functions for running batch evaluations, handling pending logs, and sending notifications for low-quality evaluations and summaries.
- Integrated logging for tracking evaluation processes and errors.
- Introduced a new utility function `get_pending_logs` to fetch LLM logs that are yet to be evaluated.
- The function filters logs by status, excludes already evaluated logs, and sorts them by creation date.
- Supports a configurable limit for the number of logs returned.
- Introduced health check endpoint tests for both the evaluator and gateway API services.
- Each test verifies that the health endpoint returns a 200 OK status and the expected JSON response.
- Added initialization files for test organization in both services.
- Introduced a new .flake8 configuration file to enforce coding standards.
- Set maximum line length to 127 and specified ignored error codes.
- Excluded common directories and files from linting to streamline the process.
- Introduced a new GitHub Actions CI workflow to automate linting, building, and testing for the services.
- Configured linting with flake8 and formatting checks with black for code quality enforcement.
- Set up Docker build tests for gateway-api, evaluator, and dashboard services.
- Added unit test placeholders for Gateway API and Evaluator services.
- Included frontend build checks for the dashboard service using Node.js and TypeScript.
- Added OPENAI_MODEL_JUDGE setting for specifying the judge model.
- Introduced batch evaluation scheduler settings, including ENABLE_AUTO_EVALUATION, EVALUATION_INTERVAL_MINUTES, EVALUATION_BATCH_SIZE, and EVALUATION_JUDGE_TYPE.
- Included NOTIFICATION_SCORE_THRESHOLD for setting notification criteria.
- Added `apscheduler` for scheduling tasks and `httpx` for making HTTP requests.
- These dependencies enhance the functionality of the evaluator service for batch evaluations and external API interactions.
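
As a rough illustration of how the scheduler settings named above might be read from the environment, here is a standard-library sketch. The variable names come from the commit message; the default values, the integer type of the threshold, and the `EvaluatorSettings` class are placeholders, and the project may well use its own settings management instead.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class EvaluatorSettings:
    enable_auto_evaluation: bool
    evaluation_interval_minutes: int
    evaluation_batch_size: int
    evaluation_judge_type: str
    notification_score_threshold: int


def load_settings(env: Mapping[str, str] = os.environ) -> EvaluatorSettings:
    """Build settings from a mapping; all defaults below are assumed placeholders."""
    return EvaluatorSettings(
        enable_auto_evaluation=_as_bool(env.get("ENABLE_AUTO_EVALUATION", "false")),
        evaluation_interval_minutes=int(env.get("EVALUATION_INTERVAL_MINUTES", "60")),
        evaluation_batch_size=int(env.get("EVALUATION_BATCH_SIZE", "10")),
        evaluation_judge_type=env.get("EVALUATION_JUDGE_TYPE", "rule"),
        notification_score_threshold=int(env.get("NOTIFICATION_SCORE_THRESHOLD", "3")),
    )
```

Accepting any mapping rather than reading `os.environ` directly makes the loader easy to exercise with a plain dict.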
…ty alerts

- Added a new `/metrics` endpoint for Prometheus to expose service metrics.
- Integrated low-quality alert notifications by invoking `send_low_quality_alert` after each evaluation.
- Committed evaluation results to the database immediately after processing each log.
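
The selection logic of the `get_pending_logs` helper mentioned in the commits above can be sketched in memory as follows. The `LogRecord` shape, the `"success"` status value, and oldest-first ordering are assumptions; the actual helper presumably runs an equivalent database query.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Set


@dataclass
class LogRecord:
    """Hypothetical stand-in for a row of LLM logs."""
    id: int
    status: str
    created_at: datetime


def get_pending_logs(
    logs: Iterable[LogRecord],
    evaluated_ids: Set[int],
    limit: int = 10,
    status: str = "success",
) -> List[LogRecord]:
    """Return up to `limit` logs with the given status that have no evaluation yet."""
    pending = [
        log for log in logs
        if log.status == status and log.id not in evaluated_ids
    ]
    # Assumed ordering: oldest first, so the backlog is processed in arrival order.
    pending.sort(key=lambda log: log.created_at)
    return pending[:limit]
```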
dongkoony and others added 28 commits December 26, 2025 16:31
- Introduced a new JSON configuration file for the LLM Quality Observer dashboard in Grafana.
- Configured multiple panels to visualize metrics from Prometheus, including HTTP request rates, evaluation rates, and notification rates.
- Enhanced monitoring capabilities with detailed visualizations for LLM requests, evaluation scores, and scheduler runs.
- Set refresh interval and added relevant tags for better organization and accessibility.
…ation

- Introduced a detailed guide for setting up email notifications in the LLM Quality Observer system, covering prerequisites, configuration, and testing procedures.
- Included specific setup instructions for popular SMTP providers like Gmail, Microsoft 365, SendGrid, AWS SES, and Mailgun.
- Documented environment variables required for email notifications and provided troubleshooting tips for common issues.
- Added a metrics reference section to monitor email notification performance and delivery rates, enhancing observability and support for email-related metrics.
…tings

- Added example environment variables for SMTP configuration to facilitate email notifications.
- Included placeholders for SMTP host, port, username, password, and recipient emails to guide users in setting up email notifications.
- Introduced a new dashboard overview image to improve the visual representation of the LLM Quality Observer dashboard in Grafana.
- This addition aims to provide users with a clearer understanding of the dashboard layout and metrics displayed.
- Introduced comprehensive documentation for the LLM Quality Observer Grafana dashboard in both Korean and English.
- The guides cover dashboard access, structure, performance metrics, quality tracking, and troubleshooting steps, enhancing user understanding and usability.
- Included detailed PromQL query explanations and usage tips for effective monitoring and evaluation.
- Enhanced the README with a comprehensive overview of the LLM-Quality-Observer project, including its purpose, key features, and architecture.
- Updated service components section to reflect the current architecture and added detailed descriptions of each service.
- Improved installation and usage instructions, including environment variable configurations and service verification steps.
- Added a roadmap section outlining completed features and future plans for the project.
- Included security precautions and contributing guidelines to encourage community involvement.
- Introduced a detailed roadmap document outlining the development plans leading to the v1.0.0 release.
- Defined versioning and release policies, including semantic versioning and patch release protocols.
- Outlined minor release plans for upcoming features such as alerting, cost tracking, authentication, and performance improvements.
- Included a visualization of the roadmap and prioritized features by version, enhancing clarity for stakeholders.
- Documented risks and mitigation strategies to address potential challenges during development.
docs: add comprehensive roadmap for LLM Quality Observer development
- Enhanced the README to reflect the new features introduced in version 0.6.0, including Alertmanager integration and advanced analysis API.
- Updated the current version information and improved the service components section to include Alertmanager and its functionalities.
- Revised the monitoring section to highlight the integration of Alertmanager with Slack, Discord, and Email for advanced alerting capabilities.
- Revised the README to include new features from version 0.6.0, such as Alertmanager integration and advanced analytics API.
- Updated the current version information and enhanced the service components section to reflect the addition of Alertmanager and its functionalities.
- Modified the monitoring section to emphasize Alertmanager's integration with Slack, Discord, and Email for improved alerting capabilities.
- Introduced a comprehensive guide for the new Analytics API features in version 0.6.0, including detailed documentation for the `/analytics/trends`, `/analytics/compare-models`, and `/alerts/history` endpoints.
- Included query parameters, response schemas, and usage examples to facilitate user understanding and implementation.
- Enhanced the documentation with performance considerations and error handling guidelines for improved usability.
- Introduced two new Grafana dashboards: Advanced Analytics and Alert History.
- The Advanced Analytics dashboard includes various panels for monitoring quality scores, request rates, latency, and error rates, providing insights into model performance.
- The Alert History dashboard focuses on alert monitoring, displaying currently firing alerts, total active alerts, and alert frequency, enhancing visibility into system health.
- Updated Prometheus configuration to integrate Alertmanager and added alert rules for HTTP and LLM metrics, improving alerting capabilities.
…omparisons

- Added `/analytics/trends` endpoint to provide hourly breakdowns of quality trends, including average scores, latency, and error rates.
- Introduced `/analytics/compare-models` endpoint for detailed performance comparisons between models over a specified period, including success rates and latency percentiles.
- Implemented `/alerts/history` endpoint to retrieve and paginate alert history from Prometheus, enhancing monitoring capabilities.
- Updated schemas to support new response models for analytics and alert history.
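
To make the hourly breakdown behind `/analytics/trends` concrete, here is a small aggregation sketch. The `Sample` fields and the output keys are hypothetical; the endpoint's actual response schema is not shown in this pull request.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, List


@dataclass
class Sample:
    """Hypothetical per-request record used only for this illustration."""
    created_at: datetime
    score: float
    latency_ms: float
    is_error: bool


def hourly_trends(samples: Iterable[Sample]) -> List[Dict[str, object]]:
    """Group samples by the hour they fall in and summarize each hour."""
    buckets: Dict[datetime, List[Sample]] = defaultdict(list)
    for sample in samples:
        # Truncate the timestamp to the start of its hour.
        hour = sample.created_at.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(sample)

    rows = []
    for hour in sorted(buckets):
        group = buckets[hour]
        rows.append({
            "hour": hour.isoformat(),
            "avg_score": mean(s.score for s in group),
            "avg_latency_ms": mean(s.latency_ms for s in group),
            "error_rate": sum(s.is_error for s in group) / len(group),
            "count": len(group),
        })
    return rows
```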
- Introduced a detailed guide for the newly added Alert History & Monitoring and Advanced Analytics dashboards in Grafana.
- The guide includes an overview, panel configurations, usage scenarios, and metric requirements for each dashboard, enhancing user understanding and usability.
- Updated to reflect the latest features and functionalities available in version 0.6.0, providing clear instructions for effective monitoring and analysis.
- Updated the current version to v0.6.0 and revised the last updated date to January 2, 2026.
- Marked the completion of development for v0.6.0, highlighting the addition of advanced alerting and analytics features.
- Included checkmarks for completed major features such as Prometheus Alertmanager integration, advanced analytics capabilities, API improvements, and dashboard enhancements.
- Deferred technical debt resolutions to v0.7.0, ensuring clarity on future development priorities.
- Added a reference to the release notes for v0.6.0 for detailed feature descriptions.
…tification

- Introduced a new `alertmanager.yml` file to configure alert routing and notification settings.
- Defined global settings, including resolve timeout and default receiver.
- Established routing rules for critical, warning, and specific alerts, directing them to appropriate receivers.
- Configured receivers for critical alerts, warning alerts, operations team, and quality team, with placeholders for webhook and email configurations.
- Added inhibition rules to prevent duplicate alerts based on severity, enhancing alert management capabilities.
- Introduced a new README.md file for the Alertmanager configuration, detailing file structure, quick start instructions, and configuration components.
- Included sections on setting up webhook URLs for Slack and Discord, email configuration, and testing procedures.
- Provided guidelines for monitoring, troubleshooting, and security considerations related to Alertmanager, enhancing user understanding and implementation.
- Introduced a new README.md file detailing the structure and configuration of Prometheus Alert Rules.
- Included sections for HTTP, LLM, evaluation, and system alerts, outlining alert names, severity levels, conditions, and descriptions.
- Provided guidelines for modifying alert thresholds, adjusting wait times, adding new alerts, and validating configurations.
- Enhanced user understanding of alert management and monitoring practices within the Prometheus ecosystem.
- Introduced a new Alertmanager service in the Docker Compose setup, enabling alert management and notification capabilities.
- Configured Alertmanager with necessary command options, volume mounts for configuration files, and defined dependencies on Prometheus.
- Added a new volume for Alertmanager data to ensure persistent storage.
- Updated the Prometheus service to include a volume for alert configurations, enhancing overall monitoring setup.
- Introduced comprehensive release notes detailing the new features and enhancements in version 0.6.0, focusing on advanced alerting and analytics capabilities.
- Highlighted key features such as Prometheus Alertmanager integration, comprehensive alert rules across multiple categories, and new analytics API endpoints.
- Documented new Grafana dashboards for monitoring and analytics, along with configuration changes and upgrade instructions.
- Included performance metrics, security notes, and a roadmap for future development, ensuring users are well-informed about the latest updates and best practices.
- Introduced a detailed testing guide for LLM Quality Observer v0.6.0, outlining systematic testing procedures for new features and enhancements.
- Included sections on system requirements, basic validation, Alertmanager and Alert Rules testing, new API endpoint testing, Grafana dashboard verification, and integration scenarios.
- Provided performance testing guidelines and troubleshooting tips to ensure effective testing and validation of the system.
- Enhanced user understanding of the testing process and best practices for ensuring system reliability and performance.
- Introduced a new script to quickly validate core functionalities of version 0.6.0, including container status checks, service health checks, alert rules verification, and API endpoint testing.
- Implemented detailed logging for test results, including success and failure messages, to enhance troubleshooting and monitoring.
- The script covers performance checks and Grafana dashboard accessibility, ensuring comprehensive validation before production deployment.
- The script aims to streamline testing and improve confidence in system reliability.
- Added various badges for release status, license, stars, Docker readiness, Prometheus integration, and Grafana dashboards to both README.md and docs/README-main-us.md.
- Enhanced the project description to highlight its purpose as a production-ready MLOps platform for LLM quality monitoring.
Phase 1: Basic token tracking and cost calculation

Changes:
- Add token usage fields to llm_logs table (input/output/cached/reasoning)
- Add cost tracking fields (cost_input_usd, cost_output_usd, cost_total_usd)
- Create llm_model_pricing table for model pricing configuration
- Implement cost calculation logic in cost_utils.py
- Update llm_client.py to extract token information from API responses
- Modify /chat endpoint to save token and cost data
- Add architecture design document for v0.7.0
- Add AI agent documentation (agent/ directory)

Database:
- Migration script: scripts/migrate_v0.7.0.sql
- New table: llm_model_pricing with 7 initial models
- New columns in llm_logs: 8 fields (5 token + 3 cost)
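
A minimal sketch of how the three cost fields could be derived from token counts and a pricing row. The per-million-token pricing unit, the `ModelPricing` class, and the function signature are assumptions, not the contents of `cost_utils.py`; cached and reasoning tokens are ignored here for brevity.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict


@dataclass(frozen=True)
class ModelPricing:
    """Hypothetical pricing row: USD per one million tokens."""
    input_per_million_usd: Decimal
    output_per_million_usd: Decimal


def calculate_cost(
    input_tokens: int, output_tokens: int, pricing: ModelPricing
) -> Dict[str, Decimal]:
    """Compute input, output, and total cost in USD."""
    million = Decimal(1_000_000)
    cost_input = pricing.input_per_million_usd * input_tokens / million
    cost_output = pricing.output_per_million_usd * output_tokens / million
    return {
        "cost_input_usd": cost_input,
        "cost_output_usd": cost_output,
        "cost_total_usd": cost_input + cost_output,
    }
```

`Decimal` is used instead of `float` so that small per-request amounts do not pick up binary rounding error when summed.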

API Changes:
- ChatResponse now includes usage and cost fields (optional)
- Backwards compatible: existing API calls work without changes

Docs:
- docs/ARCHITECTURE_v0.7.0.md: Complete v0.7.0 design document
- agent/: AI agent documentation for Claude Code and OpenAI Codex

Related:
- Issue: Cost tracking feature request
- Roadmap: v0.7.0 milestone
feat(v0.7.0): implement cost tracking and token usage
@dongkoony merged commit 6c22f38 into docs/readme-v1 on Jan 20, 2026
5 of 6 checks passed
dongkoony added a commit that referenced this pull request Jan 20, 2026
Merge pull request #10 from dongkoony/main