- by dongkoony · Pull Request #10 · dongkoony/LLM-Quality-Observer

dongkoony · 2026-01-02T11:25:05Z

…environment configuration

- Updated Docker configuration to point to the new app directory.
- Changed the port mapping for the dashboard service.
- Added a new .dockerignore file to exclude unnecessary files.
- Implemented FastAPI endpoints for health checks and metrics summary.
- Introduced Pydantic models for response schemas.
- Enhanced database interaction with SQLAlchemy and added settings management.
- Created a Dockerfile for the dashboard service with necessary dependencies.
- Refactored existing code to improve structure and readability.

… integration

- Added a new schema for evaluation results using Pydantic.
- Implemented a basic rule-based evaluation function to assess LLM responses.
- Updated FastAPI endpoints to support evaluation logic and improved health check response.
- Refactored models to include new fields for evaluation results and adjusted database interactions.
- Enhanced .gitignore to include local configuration files.
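
A rule-based evaluator of the kind mentioned above might look roughly like this. It is a minimal sketch only: the names `EvaluationResult` and `rule_based_evaluate`, the specific rules, the refusal markers, and the 1 to 5 score scale are all assumptions made for illustration, not the project's actual logic (which uses a Pydantic schema).

```python
from dataclasses import dataclass, field


@dataclass
class EvaluationResult:
    """Hypothetical result schema; the real project uses a Pydantic model."""
    score: int
    comments: list[str] = field(default_factory=list)


# Hypothetical refusal markers, chosen only for illustration.
REFUSAL_MARKERS = ("i cannot", "i can't", "as an ai")


def rule_based_evaluate(response: str) -> EvaluationResult:
    """Score a response from 1 (worst) to 5 (best) using simple heuristics."""
    text = response.strip()
    if not text:
        return EvaluationResult(score=1, comments=["empty response"])

    score = 5
    comments: list[str] = []
    if len(text) < 20:
        # Assumed rule: very short answers are penalised.
        score -= 2
        comments.append("response is very short")
    if any(marker in text.lower() for marker in REFUSAL_MARKERS):
        # Assumed rule: refusal-like phrasing is penalised.
        score -= 2
        comments.append("response looks like a refusal")
    return EvaluationResult(score=max(score, 1), comments=comments)
```

Keeping the rules as independent penalties makes it easy to add or remove a heuristic without touching the others.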

Feat/dashboard v1

…localization support

- Added a new API endpoint to fetch time series data for LLM metrics over a specified number of days.
- Introduced new Pydantic models for time series data representation.
- Enhanced the dashboard layout to include a language switcher for localization.
- Updated the overview component to display time series data with charts for quality trends, latency, and request volume.
- Integrated translation support for multiple languages in the dashboard.
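
The per-day aggregation behind such a time series endpoint could be sketched as follows. `LogRow`, `daily_timeseries`, and the field names are hypothetical stand-ins; the real endpoint presumably aggregates in the database rather than in memory.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime, timedelta


@dataclass
class LogRow:
    """Hypothetical stand-in for a stored LLM log row."""
    created_at: datetime
    latency_ms: float
    score: float


def daily_timeseries(rows: list[LogRow], days: int, today: date) -> list[dict]:
    """Return one point per day for the last `days` days, oldest first."""
    start = today - timedelta(days=days - 1)
    buckets: dict[date, list[LogRow]] = defaultdict(list)
    for row in rows:
        day = row.created_at.date()
        if start <= day <= today:
            buckets[day].append(row)

    points = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        group = buckets.get(day, [])
        count = len(group)
        points.append({
            "date": day.isoformat(),
            "request_count": count,
            # Days without traffic report None rather than a misleading 0.
            "avg_latency_ms": sum(r.latency_ms for r in group) / count if count else None,
            "avg_score": sum(r.score for r in group) / count if count else None,
        })
    return points
```

Emitting a point for every day, including empty ones, keeps the chart's x-axis continuous.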

- Enhanced web dashboard description to include new statistics cards, model performance charts, and recent activity previews.
- Updated API endpoint details to reflect pagination support for logs and evaluations.
- Improved clarity on model performance comparison features.

Feat/dashboard v1

…n support

- Enhanced the overview section with updated statistics and time series charts.
- Added multilingual support with a language selection dropdown and localStorage integration.
- Clarified API endpoint details, including the addition of a new time series data endpoint.

- Integrated translation functionality into Evaluations, Logs, and Models components.
- Replaced static text with localized strings for titles, subtitles, and error messages.
- Enhanced user experience by providing multilingual support for various UI elements.

…e evaluation model

- Added support for LLM-as-a-Judge evaluation in the /evaluate-once endpoint, allowing for dynamic evaluation methods based on the judge_type parameter.
- Updated LLMEvaluation model to include optional fields for LLM-specific scores and raw judge responses.
- Improved error handling for evaluation failures, ensuring database rollback on exceptions.
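
The combination of `judge_type` dispatch and rollback on failure can be illustrated with a small sketch. The judge names `"rule"` and `"llm"`, the `JUDGES` table, the result dictionary, and the `Session` protocol are assumptions for illustration; the LLM judge is deliberately left unimplemented.

```python
from typing import Callable, Protocol


class Session(Protocol):
    """Minimal subset of a SQLAlchemy-style session used in this sketch."""
    def add(self, obj: object) -> None: ...
    def commit(self) -> None: ...
    def rollback(self) -> None: ...


def rule_judge(response: str) -> dict:
    # Placeholder rule: non-empty responses get the top score.
    return {"judge_type": "rule", "score_overall": 5 if response.strip() else 1}


def llm_judge(response: str) -> dict:
    # Placeholder for a call to a judge model; out of scope here.
    raise NotImplementedError("LLM judge call is not part of this sketch")


# Hypothetical registry mapping judge_type values to judge functions.
JUDGES: dict[str, Callable[[str], dict]] = {"rule": rule_judge, "llm": llm_judge}


def evaluate_once(db: Session, response: str, judge_type: str = "rule") -> dict:
    """Pick a judge by name, persist the result, roll back on any failure."""
    try:
        judge = JUDGES[judge_type]
    except KeyError:
        raise ValueError(f"unknown judge_type: {judge_type!r}") from None
    try:
        result = judge(response)
        db.add(result)
        db.commit()
        return result
    except Exception:
        # Leave the session clean before propagating the error.
        db.rollback()
        raise
```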

- Added fields for score_instruction_following and score_truthfulness to the EvaluationRead model.
- Updated the evaluation endpoint to return new scoring metrics and raw judge responses, enhancing the evaluation data provided.

…oring metrics

- Introduced functions to determine judge type and display corresponding badges in the evaluations table.
- Added new columns for score instruction following and score truthfulness in the evaluations table.
- Updated localization files to include translations for new labels and judge types.

PR

Merge pull request #4 from dongkoony/main

- Introduced new settings for automatic evaluation, including evaluation interval, batch size, and judge type.
- Added configuration options for Slack and Discord notifications, including webhook URLs and score thresholds.

- Added an async context manager for managing the lifespan of the FastAPI app, including starting and stopping a scheduler.
- Integrated logging to track the service's startup and shutdown processes.
- Removed redundant table creation code from the app startup as it is now handled in the lifespan context.
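
The lifespan pattern described above can be sketched with the standard library alone. `FakeScheduler` is a hypothetical stand-in for the real scheduler, and the logger name and messages are assumptions.

```python
import asyncio
import logging
from contextlib import asynccontextmanager

logger = logging.getLogger("evaluator")


class FakeScheduler:
    """Hypothetical stand-in for a real scheduler object."""
    def __init__(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True

    def shutdown(self) -> None:
        self.running = False


scheduler = FakeScheduler()


@asynccontextmanager
async def lifespan(app: object):
    """Run startup work before the app serves requests, cleanup afterwards."""
    logger.info("starting evaluator service")
    scheduler.start()
    try:
        yield
    finally:
        # Runs on normal shutdown and when the app exits with an error.
        scheduler.shutdown()
        logger.info("evaluator service stopped")


async def demo() -> tuple[bool, bool]:
    """Report whether the scheduler runs inside and after the lifespan."""
    async with lifespan(app=None):
        during = scheduler.running
    return during, scheduler.running


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With FastAPI, a function of this shape is passed as `FastAPI(lifespan=lifespan)`, so startup and shutdown logic live in one place.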

- Implemented a new notifier module to send alerts via Slack and Discord for low-quality evaluation results.
- Added functions to send notifications for low-quality alerts and batch evaluation summaries.
- Integrated logging for successful and failed notification attempts.
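
A notifier of this kind needs a threshold check and a payload per channel. The sketch below builds the JSON bodies without sending anything; `is_low_quality`, `build_alert_payloads`, the message wording, and the inclusive threshold comparison are assumptions. Slack incoming webhooks read the message from a `text` field and Discord webhooks from a `content` field.

```python
import json


def is_low_quality(score: float, threshold: float) -> bool:
    """Treat scores at or below the threshold as low quality (an assumption)."""
    return score <= threshold


def build_alert_payloads(log_id: int, score: float, threshold: float) -> dict[str, str]:
    """Return JSON request bodies for Slack and Discord webhooks."""
    message = (
        f"Low-quality LLM response: log {log_id} scored {score} "
        f"(threshold {threshold})"
    )
    return {
        # Slack incoming webhooks read the message from "text".
        "slack": json.dumps({"text": message}),
        # Discord webhooks read the message from "content".
        "discord": json.dumps({"content": message}),
    }
```

Each body would then be sent as an HTTP POST to the configured webhook URL, for example with `httpx`.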

- Introduced a new scheduler module using APScheduler to automate the evaluation of LLM logs at specified intervals.
- Implemented functions for running batch evaluations, handling pending logs, and sending notifications for low-quality evaluations and summaries.
- Integrated logging for tracking evaluation processes and errors.

- Introduced a new utility function `get_pending_logs` to fetch LLM logs that are yet to be evaluated.
- The function filters logs by status, excludes already evaluated logs, and sorts them by creation date.
- Supports a configurable limit for the number of logs returned.
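
The selection logic of `get_pending_logs` can be illustrated in memory. This is only a sketch: the real function presumably queries the database through SQLAlchemy, and the `LLMLog` fields, the `"success"` status value, the default limit, and the oldest-first ordering are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LLMLog:
    """Hypothetical in-memory stand-in for a row in the logs table."""
    id: int
    status: str
    created_at: datetime


def get_pending_logs(
    logs: list[LLMLog],
    evaluated_ids: set[int],
    limit: int = 10,
    status: str = "success",
) -> list[LLMLog]:
    """Return up to `limit` unevaluated logs with the given status, oldest first."""
    pending = [
        log for log in logs
        if log.status == status and log.id not in evaluated_ids
    ]
    pending.sort(key=lambda log: log.created_at)
    return pending[:limit]
```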

- Introduced health check endpoint tests for both the evaluator and gateway API services.
- Each test verifies that the health endpoint returns a 200 OK status and the expected JSON response.
- Added initialization files for test organization in both services.

- Introduced a new .flake8 configuration file to enforce coding standards.
- Set maximum line length to 127 and specified ignored error codes.
- Excluded common directories and files from linting to streamline the process.

- Introduced a new GitHub Actions CI workflow to automate linting, building, and testing for the services.
- Configured linting with flake8 and formatting checks with black for code quality enforcement.
- Set up Docker build tests for gateway-api, evaluator, and dashboard services.
- Added unit test placeholders for Gateway API and Evaluator services.
- Included frontend build checks for the dashboard service using Node.js and TypeScript.

- Added OPENAI_MODEL_JUDGE setting for specifying the judge model.
- Introduced batch evaluation scheduler settings, including ENABLE_AUTO_EVALUATION, EVALUATION_INTERVAL_MINUTES, EVALUATION_BATCH_SIZE, and EVALUATION_JUDGE_TYPE.
- Included NOTIFICATION_SCORE_THRESHOLD for setting notification criteria.
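
Reading these settings from the environment could look roughly like this. The variable names come from the commit message; the `EvaluatorSettings` class, the `load_settings` helper, and every default value are placeholders invented for this sketch, not the project's real defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EvaluatorSettings:
    enable_auto_evaluation: bool
    evaluation_interval_minutes: int
    evaluation_batch_size: int
    evaluation_judge_type: str
    notification_score_threshold: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> EvaluatorSettings:
    """Read scheduler settings from environment variables.

    All defaults below are placeholders, not the project's real defaults.
    """
    if env is None:
        env = os.environ
    enabled = env.get("ENABLE_AUTO_EVALUATION", "false").lower() in ("1", "true", "yes")
    return EvaluatorSettings(
        enable_auto_evaluation=enabled,
        evaluation_interval_minutes=int(env.get("EVALUATION_INTERVAL_MINUTES", "60")),
        evaluation_batch_size=int(env.get("EVALUATION_BATCH_SIZE", "10")),
        evaluation_judge_type=env.get("EVALUATION_JUDGE_TYPE", "rule"),
        notification_score_threshold=float(env.get("NOTIFICATION_SCORE_THRESHOLD", "3")),
    )
```

Accepting a mapping instead of reading `os.environ` directly makes the loader easy to exercise in tests.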

- Added `apscheduler` for scheduling tasks and `httpx` for making HTTP requests.
- These dependencies enhance the functionality of the evaluator service for batch evaluations and external API interactions.

Feat/dashboard v1

…ty alerts

- Added a new `/metrics` endpoint for Prometheus to expose service metrics.
- Integrated low-quality alert notifications by invoking `send_low_quality_alert` after each evaluation.
- Committed evaluation results to the database immediately after processing each log.

- Introduced a new JSON configuration file for the LLM Quality Observer dashboard in Grafana.
- Configured multiple panels to visualize metrics from Prometheus, including HTTP request rates, evaluation rates, and notification rates.
- Enhanced monitoring capabilities with detailed visualizations for LLM requests, evaluation scores, and scheduler runs.
- Set refresh interval and added relevant tags for better organization and accessibility.

…ation

- Introduced a detailed guide for setting up email notifications in the LLM Quality Observer system, covering prerequisites, configuration, and testing procedures.
- Included specific setup instructions for popular SMTP providers like Gmail, Microsoft 365, SendGrid, AWS SES, and Mailgun.
- Documented environment variables required for email notifications and provided troubleshooting tips for common issues.
- Added a metrics reference section to monitor email notification performance and delivery rates, enhancing observability and support for email-related metrics.

…tings

- Added example environment variables for SMTP configuration to facilitate email notifications.
- Included placeholders for SMTP host, port, username, password, and recipient emails to guide users in setting up email notifications.

- Introduced a new dashboard overview image to improve the visual representation of the LLM Quality Observer dashboard in Grafana.
- This addition aims to provide users with a clearer understanding of the dashboard layout and metrics displayed.

- Introduced comprehensive documentation for the LLM Quality Observer Grafana dashboard in both Korean and English.
- The guides cover dashboard access, structure, performance metrics, quality tracking, and troubleshooting steps, enhancing user understanding and usability.
- Included detailed PromQL query explanations and usage tips for effective monitoring and evaluation.

…tion

- Enhanced the README with a comprehensive overview of the LLM-Quality-Observer project, including its purpose, key features, and architecture.
- Updated service components section to reflect the current architecture and added detailed descriptions of each service.
- Improved installation and usage instructions, including environment variable configurations and service verification steps.
- Added a roadmap section outlining completed features and future plans for the project.
- Included security precautions and contributing guidelines to encourage community involvement.

Feat/dashboard v1

- Introduced a detailed roadmap document outlining the development plans leading to the v1.0.0 release.
- Defined versioning and release policies, including semantic versioning and patch release protocols.
- Outlined minor release plans for upcoming features such as alerting, cost tracking, authentication, and performance improvements.
- Included a visualization of the roadmap and prioritized features by version, enhancing clarity for stakeholders.
- Documented risks and mitigation strategies to address potential challenges during development.

docs: add comprehensive roadmap for LLM Quality Observer development

- Enhanced the README to reflect the new features introduced in version 0.6.0, including Alertmanager integration and advanced analysis API.
- Updated the current version information and improved the service components section to include Alertmanager and its functionalities.
- Revised the monitoring section to highlight the integration of Alertmanager with Slack, Discord, and Email for advanced alerting capabilities.

- Revised the README to include new features from version 0.6.0, such as Alertmanager integration and advanced analytics API.
- Updated the current version information and enhanced the service components section to reflect the addition of Alertmanager and its functionalities.
- Modified the monitoring section to emphasize Alertmanager's integration with Slack, Discord, and Email for improved alerting capabilities.

- Introduced a comprehensive guide for the new Analytics API features in version 0.6.0, including detailed documentation for the `/analytics/trends`, `/analytics/compare-models`, and `/alerts/history` endpoints.
- Included query parameters, response schemas, and usage examples to facilitate user understanding and implementation.
- Enhanced the documentation with performance considerations and error handling guidelines for improved usability.

- Introduced two new Grafana dashboards: Advanced Analytics and Alert History.
- The Advanced Analytics dashboard includes various panels for monitoring quality scores, request rates, latency, and error rates, providing insights into model performance.
- The Alert History dashboard focuses on alert monitoring, displaying currently firing alerts, total active alerts, and alert frequency, enhancing visibility into system health.
- Updated Prometheus configuration to integrate Alertmanager and added alert rules for HTTP and LLM metrics, improving alerting capabilities.

…omparisons

- Added `/analytics/trends` endpoint to provide hourly breakdowns of quality trends, including average scores, latency, and error rates.
- Introduced `/analytics/compare-models` endpoint for detailed performance comparisons between models over a specified period, including success rates and latency percentiles.
- Implemented `/alerts/history` endpoint to retrieve and paginate alert history from Prometheus, enhancing monitoring capabilities.
- Updated schemas to support new response models for analytics and alert history.
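
The per-model summary behind an endpoint like `/analytics/compare-models` might be computed along these lines. The record keys, the output field names, the `"success"` status value, and the nearest-rank percentile method are assumptions; the actual endpoint may compute percentiles differently or in SQL.

```python
import math
from collections import defaultdict


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile; `values` must be non-empty."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def compare_models(records: list[dict]) -> dict[str, dict]:
    """Summarise success rate and latency percentiles per model."""
    by_model: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        by_model[record["model"]].append(record)

    summary = {}
    for model, rows in by_model.items():
        latencies = [r["latency_ms"] for r in rows]
        successes = sum(1 for r in rows if r["status"] == "success")
        summary[model] = {
            "requests": len(rows),
            "success_rate": successes / len(rows),
            "latency_p50_ms": percentile(latencies, 50),
            "latency_p95_ms": percentile(latencies, 95),
        }
    return summary
```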

- Introduced a detailed guide for the newly added Alert History & Monitoring and Advanced Analytics dashboards in Grafana.
- The guide includes an overview, panel configurations, usage scenarios, and metric requirements for each dashboard, enhancing user understanding and usability.
- Updated to reflect the latest features and functionalities available in version 0.6.0, providing clear instructions for effective monitoring and analysis.

- Updated the current version to v0.6.0 and revised the last updated date to January 2, 2026.
- Marked the completion of development for v0.6.0, highlighting the addition of advanced alerting and analytics features.
- Included checkmarks for completed major features such as Prometheus Alertmanager integration, advanced analytics capabilities, API improvements, and dashboard enhancements.
- Deferred technical debt resolutions to v0.7.0, ensuring clarity on future development priorities.
- Added a reference to the release notes for v0.6.0 for detailed feature descriptions.

…tification

- Introduced a new `alertmanager.yml` file to configure alert routing and notification settings.
- Defined global settings, including resolve timeout and default receiver.
- Established routing rules for critical, warning, and specific alerts, directing them to appropriate receivers.
- Configured receivers for critical alerts, warning alerts, operations team, and quality team, with placeholders for webhook and email configurations.
- Added inhibition rules to prevent duplicate alerts based on severity, enhancing alert management capabilities.

- Introduced a new README.md file for the Alertmanager configuration, detailing file structure, quick start instructions, and configuration components.
- Included sections on setting up webhook URLs for Slack and Discord, email configuration, and testing procedures.
- Provided guidelines for monitoring, troubleshooting, and security considerations related to Alertmanager, enhancing user understanding and implementation.

- Introduced a new README.md file detailing the structure and configuration of Prometheus Alert Rules.
- Included sections for HTTP, LLM, evaluation, and system alerts, outlining alert names, severity levels, conditions, and descriptions.
- Provided guidelines for modifying alert thresholds, adjusting wait times, adding new alerts, and validating configurations.
- Enhanced user understanding of alert management and monitoring practices within the Prometheus ecosystem.

- Introduced a new Alertmanager service in the Docker Compose setup, enabling alert management and notification capabilities.
- Configured Alertmanager with necessary command options, volume mounts for configuration files, and defined dependencies on Prometheus.
- Added a new volume for Alertmanager data to ensure persistent storage.
- Updated the Prometheus service to include a volume for alert configurations, enhancing overall monitoring setup.

- Introduced comprehensive release notes detailing the new features and enhancements in version 0.6.0, focusing on advanced alerting and analytics capabilities.
- Highlighted key features such as Prometheus Alertmanager integration, comprehensive alert rules across multiple categories, and new analytics API endpoints.
- Documented new Grafana dashboards for monitoring and analytics, along with configuration changes and upgrade instructions.
- Included performance metrics, security notes, and a roadmap for future development, ensuring users are well-informed about the latest updates and best practices.

- Introduced a detailed testing guide for LLM Quality Observer v0.6.0, outlining systematic testing procedures for new features and enhancements.
- Included sections on system requirements, basic validation, Alertmanager and Alert Rules testing, new API endpoint testing, Grafana dashboard verification, and integration scenarios.
- Provided performance testing guidelines and troubleshooting tips to ensure effective testing and validation of the system.
- Enhanced user understanding of the testing process and best practices for ensuring system reliability and performance.

- Introduced a new script to quickly validate core functionalities of version 0.6.0, including container status checks, service health checks, alert rules verification, and API endpoint testing.
- Implemented detailed logging for test results, including success and failure messages, to enhance troubleshooting and monitoring.
- The script covers performance checks and Grafana dashboard accessibility, ensuring comprehensive validation before production deployment.
- Aimed at streamlining the testing process and improving user confidence in system reliability.

Feat/dashboard v1

- Added various badges for release status, license, stars, Docker readiness, Prometheus integration, and Grafana dashboards to both README.md and docs/README-main-us.md.
- Enhanced the project description to highlight its purpose as a production-ready MLOps platform for LLM quality monitoring.

Phase 1: Basic token tracking and cost calculation

Changes:
- Add token usage fields to llm_logs table (input/output/cached/reasoning)
- Add cost tracking fields (cost_input_usd, cost_output_usd, cost_total_usd)
- Create llm_model_pricing table for model pricing configuration
- Implement cost calculation logic in cost_utils.py
- Update llm_client.py to extract token information from API responses
- Modify /chat endpoint to save token and cost data
- Add architecture design document for v0.7.0
- Add AI agent documentation (agent/ directory)

Database:
- Migration script: scripts/migrate_v0.7.0.sql
- New table: llm_model_pricing with 7 initial models
- New columns in llm_logs: 8 fields (5 token + 3 cost)

API Changes:
- ChatResponse now includes usage and cost fields (optional)
- Backwards compatible: existing API calls work without changes

Docs:
- docs/ARCHITECTURE_v0.7.0.md: Complete v0.7.0 design document
- agent/: AI agent documentation for Claude Code and OpenAI Codex

Related:
- Issue: Cost tracking feature request
- Roadmap: v0.7.0 milestone
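
The cost calculation can be sketched as below. The output keys match the column names listed above; the `ModelPricing` class, the `calculate_cost` helper, and the per-million-token pricing unit are assumptions, and this sketch ignores cached and reasoning tokens. The actual logic lives in cost_utils.py and may differ.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ModelPricing:
    """Hypothetical pricing row; prices are USD per 1,000,000 tokens."""
    input_per_million: Decimal
    output_per_million: Decimal


MILLION = Decimal(1_000_000)


def calculate_cost(
    pricing: ModelPricing, input_tokens: int, output_tokens: int
) -> dict[str, Decimal]:
    """Compute input, output, and total cost in USD for one request."""
    cost_input = pricing.input_per_million * input_tokens / MILLION
    cost_output = pricing.output_per_million * output_tokens / MILLION
    return {
        "cost_input_usd": cost_input,
        "cost_output_usd": cost_output,
        "cost_total_usd": cost_input + cost_output,
    }
```

`Decimal` is used instead of `float` so that small per-request amounts add up without binary rounding drift.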

feat(v0.7.0): implement cost tracking and token usage

Merge pull request #10 from dongkoony/main

dongkoony and others added 30 commits November 17, 2025 15:58

fix: update links in README files for consistency

21075f2

chore: update .gitignore to include .env.local and add example local …

c033661

…environment configuration

Merge pull request #1 from dongkoony/feat/dashboard-v1

dba69b1

Feat/dashboard v1

feat: add web dashboard and dashboard API endpoints for v0.2.0

a7cf208

feat: add CORS middleware to enable API access from web dashboard

f3bca05

Merge pull request #2 from dongkoony/feat/dashboard-v1

c3b2c02

Feat/dashboard v1

Merge pull request #3 from dongkoony/feat/dashboard-v1

cbedba4

Feat/dashboard v1

Merge pull request #4 from dongkoony/main

5675b8c

PR

Merge pull request #5 from dongkoony/feat/dashboard-v1

a28c7ee

Merge pull request #4 from dongkoony/main

feat: add flake8 configuration for code style enforcement

fd63813

feat: add new dependencies for scheduling and HTTP requests

a450eb4

Merge pull request #6 from dongkoony/feat/dashboard-v1

70cddb0

Feat/dashboard v1

dongkoony and others added 28 commits December 26, 2025 16:31

docs: add complete release notes (v0.1.0-v0.5.0) and update documenta…

8397273

…tion

Merge pull request #7 from dongkoony/feat/dashboard-v1

2ecce66

Feat/dashboard v1

Merge pull request #8 from dongkoony/feat/dashboard-v1

f58cd0a

docs: add comprehensive roadmap for LLM Quality Observer development

Merge pull request #9 from dongkoony/feat/dashboard-v1

7e3b261

Feat/dashboard v1

Merge pull request #11 from dongkoony/feat/cost-tracking-v0.7.0

5f79046

feat(v0.7.0): implement cost tracking and token usage

dongkoony merged commit 6c22f38 into docs/readme-v1 Jan 20, 2026
5 of 6 checks passed

dongkoony added a commit that referenced this pull request Jan 20, 2026

Merge pull request #12 from dongkoony/docs/readme-v1

2a0c6e9

Merge pull request #10 from dongkoony/main

dongkoony merged 68 commits into docs/readme-v1 from main

dongkoony commented Jan 2, 2026

1 participant