Feature/rag dashboard#5

Merged
usnaveen merged 12 commits into main from feature/rag-dashboard on Feb 2, 2026

@usnaveen (Owner) commented Feb 2, 2026

Note

High Risk
Large changes to core API surface (routing, auth/error handling, rate limiting) plus new Firestore/Gemini integrations; misconfiguration or quota/permission issues could break scoring or new persistence features in production.

Overview
This PR pivots the backend into a multi-agent Flask API with new endpoints for POST /score, POST /audit, POST /coach/analyze, and a Firestore-backed Librarian (/librarian/index|search|chat|stats|video/...) plus highlight CRUD and backup/restore operations.

It introduces centralized env/config management via Config, adds Flask-Limiter rate limiting, and replaces ad-hoc errors with standardized JSON error codes and global handlers; GET /health is expanded to report dependency status (YouTube/Gemini/Firestore).

Deployment/docs are updated accordingly: Docker now runs via Gunicorn, deploy_to_cloud_run.sh deploys from source and passes API keys, secrets are better ignored via .gitignore, and multiple new/updated architecture/agent docs and changelogs are added while removing old model/test scripts and Cloud Build config.

Written by Cursor Bugbot for commit e958e88.

…
- Update Dockerfile and build scripts
- Refactor API endpoints and scoring modules
- Add latency testing and YouTube client utilities
- Remove deprecated files and update dependencies
- Prepare for local model caching feature

- Migrate from deprecated google-generativeai to google-genai SDK
- Update all agents (Auditor, Coach) to use new Client API
- Fix transcript_service.py to use stable get_transcript() API
- Fix NumPy version constraint (<2.0) for ChromaDB compatibility
- Add comprehensive architecture documentation with Mermaid diagrams
- Add transcript approach change documentation
- Disable Redis caching (connection issues)
- Add test scripts for transcript functionality

New files:
- auditor_agent.py, coach_agent.py, librarian_agent.py
- transcript_service.py (simplified, stable API)
- ARCHITECTURE_DOCUMENTATION.md (comprehensive system docs)
- TRANSCRIPT_APPROACH_CHANGE.md (implementation summary)
- test_transcript.py, test_transcript_service.py

Project Reorganization:
- Renamed folder from "YouTube Productivity Score Development Container" to "backend"
- Created docs/ folder for all documentation
- Created scripts/ folder for test and utility scripts
- Created tests/ folder for future test suites
- Removed obsolete files (old test scripts, model pickles, logs)
- Updated .gitignore with cleaner professional structure

Cleanup:
- Deleted old test_*.py files (moved relevant ones to scripts/)
- Removed debug_*.py, verify_*.py (moved to scripts/)
- Removed score_model_*.pkl files (build artifacts)
- Removed __pycache__, logs, temporary files
- Fixed ARCHITECTURE_DOCUMENTATION.md quadrantChart syntax

Documentation:
- Kept key docs in root: README, ARCHITECTURE_DOCUMENTATION, AGENTS_IMPLEMENTATION_SUMMARY
- Moved supplementary docs to docs/ folder
- All documentation now properly organized

This follows industry-standard monorepo practices for better maintainability.

- Added .cursorrules file with backend-specific AI coding guidelines
- Added changelogs directory with workflow documentation
- See changelogs/2026-01-22-cursor-rules-mcp-git-workflow.md
- Added coach modes: strict, balanced, relaxed, custom
- User can provide custom instructions to the coach
- Break reminders after configurable time (default 1 hour)
- "Back on track" encouragement when user improves after distraction
- Score trend detection with declining focus warnings
- Comment analysis capability for video quality assessment
- Watch time tracking integration
- Mode-specific intervention thresholds
- Session summary with performance metrics

This makes the coach more contextual and user-controlled.

Persistent Storage:
- Added firebase-admin and google-cloud-firestore to requirements
- Created firestore_service.py with full Firestore integration
- Highlights saved to Firestore collection
- ChromaDB backup/restore to Google Cloud Storage
- Session and video metadata storage

New API Endpoints:
- POST /highlights - Save a highlight
- GET /highlights - Get all highlights
- GET /highlights/video/<id> - Get highlights for a video
- DELETE /highlights/<id> - Delete a highlight
- POST /backup/chromadb - Backup to GCS
- POST /restore/chromadb - Restore from GCS

This enables persistent storage across Cloud Run container restarts.

@usnaveen merged commit c8a3d2c into main on Feb 2, 2026
1 of 2 checks passed

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 7 potential issues.

Comment thread api.py
analysis = coach.analyze_session(
    session_id=session_id,
    session_data=session_data,
    goal=goal

CoachAgent missing analyze_session method causes runtime error

High Severity

The /coach/analyze endpoint calls coach.analyze_session(), but the CoachAgent class in coach_agent.py does not define an analyze_session method. This will raise an AttributeError at runtime, making the entire coach analysis endpoint non-functional. The available methods in CoachAgent are start_session, record_video, analyze_comments, update_watch_status, get_session_summary, and end_session.

Comment thread api.py
require_api_key()
try:
    librarian = get_librarian_agent()
    video = librarian.get_video_by_id(video_id)

LibrarianAgent missing get_video_by_id method causes failure

High Severity

The /librarian/video/<video_id> endpoint calls librarian.get_video_by_id(video_id), but LibrarianAgent does not implement this method. This will raise an AttributeError at runtime, breaking the video retrieval endpoint.

Comment thread api.py
require_api_key()
try:
    librarian = get_librarian_agent()
    highlights = librarian.get_all_highlights()

LibrarianAgent missing get_all_highlights method causes failure

High Severity

The /librarian/get_highlights endpoint calls librarian.get_all_highlights(), but LibrarianAgent does not implement this method. This will raise an AttributeError at runtime, breaking the highlights retrieval endpoint.
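
This finding and the two before it share one failure mode: an endpoint calls a method the agent class does not define, and the error only appears when the endpoint is hit. A minimal sketch of a check that could run in a test or at startup follows. `missing_methods` is a hypothetical helper, `FakeLibrarian` is a stand-in rather than the real `LibrarianAgent`, and its `search` method is assumed purely for illustration.

```python
from typing import Iterable, List


def missing_methods(obj: object, names: Iterable[str]) -> List[str]:
    """Return the names in `names` that `obj` does not expose as callables."""
    return [name for name in names if not callable(getattr(obj, name, None))]


class FakeLibrarian:
    """Stand-in for LibrarianAgent (assumption: the real class differs)."""

    def search(self, query: str) -> list:
        return []


# Methods the endpoints are reported to call, plus one that exists on the stand-in.
required = ["search", "get_video_by_id", "get_all_highlights"]

print(missing_methods(FakeLibrarian(), required))
# → ['get_video_by_id', 'get_all_highlights']
```

Running the same check against `CoachAgent` with `["analyze_session"]` would surface the first finding without a live request.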

Comment thread api.py
# Reinitialize librarian to pick up restored data
from librarian_agent import LibrarianAgent
global _librarian_instance
_librarian_instance = LibrarianAgent()

Restore endpoint fails to properly reset librarian singleton

Medium Severity

The /restore/chromadb endpoint attempts to reinitialize the librarian singleton by declaring global _librarian_instance in api.py's namespace, but _librarian_instance is defined in librarian_agent.py. This creates a new variable in api.py rather than resetting the actual singleton, so the restored data won't be used.
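
The namespace problem can be reproduced in isolation. The sketch below simulates `librarian_agent.py` as a separate module object, so the layout is an assumption: it supposes `get_librarian_agent` lives next to `_librarian_instance`, and `reset_librarian_agent` is a hypothetical helper that does not exist in the PR.

```python
import types

# Simulate librarian_agent.py as its own module namespace (illustration only).
librarian_agent = types.ModuleType("librarian_agent")
exec(
    '''
_librarian_instance = None


class LibrarianAgent:
    """Placeholder for the real agent."""


def get_librarian_agent():
    global _librarian_instance
    if _librarian_instance is None:
        _librarian_instance = LibrarianAgent()
    return _librarian_instance


def reset_librarian_agent():
    """Hypothetical helper: rebuild the singleton inside the owning module."""
    global _librarian_instance
    _librarian_instance = LibrarianAgent()
    return _librarian_instance
''',
    librarian_agent.__dict__,
)

before = librarian_agent.get_librarian_agent()

# Pattern flagged in the review: this binds a name in the caller's namespace only.
_librarian_instance = librarian_agent.LibrarianAgent()
unchanged = librarian_agent.get_librarian_agent() is before

# Resetting through the owning module replaces the object the getter returns.
after = librarian_agent.reset_librarian_agent()
replaced = librarian_agent.get_librarian_agent() is after and after is not before

print(unchanged, replaced)
# → True True
```

Assigning `librarian_agent._librarian_instance = ...` from `api.py` would also work, but a reset function keeps the private name inside the module that owns it.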

Comment thread Dockerfile
 ENV PYTHONPATH=/app
 ENV FLASK_APP=api.py
-ENV FLASK_ENV=production
+ENV FLASK_APP=api.py

Dockerfile duplicates FLASK_APP and removes FLASK_ENV

Low Severity

The Dockerfile now has a duplicate ENV FLASK_APP=api.py statement (line 31 and line 33), and ENV FLASK_ENV=production was removed. This appears to be an accidental edit that duplicated one line while removing another, leaving the production environment setting unset.

Comment thread simple_scoring.py
-    print(f"Simple Title+CleanDesc Score - URL: {video_url}, Goal: '{goal}', Score: {final_score}")
-    return final_score
+    score, _, _ = compute_simple_score(video_url, goal)
+    return score

Scoring modes are functionally identical despite different names

Medium Severity

The compute_simple_score_from_title and compute_simple_score_title_and_clean_desc functions both call compute_simple_score(video_url, goal) with identical parameters. Despite mode names suggesting different behavior (title_only vs title_and_clean_desc), all modes use both title AND description for scoring. When users select "title_only" expecting faster scoring with just the title, they actually get full title+description analysis.
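
One way the mode names could map to different scoring input is to build the text per mode before calling the scorer. This is a sketch under assumptions, not the code in simple_scoring.py: `text_for_mode` and `clean_description` are hypothetical helpers, and the cleaning rule (drop URLs, collapse whitespace) is invented for illustration.

```python
import re


def clean_description(description: str) -> str:
    """Hypothetical cleaner: drop URLs and collapse runs of whitespace."""
    without_urls = re.sub(r"https?://\S+", " ", description)
    return " ".join(without_urls.split())


def text_for_mode(mode: str, title: str, description: str) -> str:
    """Return the text a scorer should see for the given mode."""
    if mode == "title_only":
        return title
    if mode == "title_and_description":
        return f"{title}\n{description}"
    if mode == "title_and_clean_desc":
        return f"{title}\n{clean_description(description)}"
    raise ValueError(f"unknown scoring mode: {mode!r}")


print(repr(text_for_mode("title_only", "Intro to RAG", "Slides: https://example.com/s")))
# → 'Intro to RAG'
```

With this split, `title_only` never touches the description, so it can skip any description fetching or cleaning cost.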

Comment thread api.py
video_url = data.get('video_url')
goal = data.get('goal')
mode = data.get('mode', 'title_and_description') # Default to "title_and_description"
transcript = data.get('transcript', '')

Missing transcript type validation causes TypeError crash

Medium Severity

The /score endpoint accepts transcript from JSON input without validating it's a string. If a client sends a non-string value like {"transcript": 123}, the integer passes through to _get_scoring_prompt where transcript[:2000] raises a TypeError: 'int' object is not subscriptable. The same issue exists in the /audit endpoint's transcript handling. While unlikely in normal usage, this missing type check causes an unhandled crash instead of a proper validation error.
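
A minimal sketch of the missing check, written as a plain function so it can be shared by /score and /audit. `read_transcript` and `ValidationError` are hypothetical names, and treating a JSON `null` as an empty transcript is an assumption about the desired behavior.

```python
from typing import Any, Mapping


class ValidationError(ValueError):
    """Hypothetical error type a handler could map to a 400 JSON response."""


def read_transcript(data: Mapping[str, Any]) -> str:
    """Return the transcript from parsed JSON, rejecting non-string values."""
    transcript = data.get("transcript", "")
    if transcript is None:
        # Assumption: an explicit null means "no transcript supplied".
        return ""
    if not isinstance(transcript, str):
        raise ValidationError(
            f"'transcript' must be a string, got {type(transcript).__name__}"
        )
    return transcript


print(read_transcript({"transcript": "hello"}))  # → hello

try:
    read_transcript({"transcript": 123})
except ValidationError as exc:
    print(exc)  # → 'transcript' must be a string, got int
```

Because the check runs before any slicing such as `transcript[:2000]`, a bad payload becomes a validation error instead of an unhandled TypeError.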
