Feature/rag dashboard by usnaveen · Pull Request #5 · usnaveen/TubeFocus-Backend

usnaveen · 2026-02-02T13:47:43Z

Note

High Risk
Large changes to core API surface (routing, auth/error handling, rate limiting) plus new Firestore/Gemini integrations; misconfiguration or quota/permission issues could break scoring or new persistence features in production.

Overview
This PR pivots the backend into a multi-agent Flask API with new endpoints for POST /score, POST /audit, POST /coach/analyze, and a Firestore-backed Librarian (/librarian/index|search|chat|stats|video/...) plus highlight CRUD and backup/restore operations.

It introduces centralized env/config management via Config, adds Flask-Limiter rate limiting, and replaces ad-hoc errors with standardized JSON error codes and global handlers; GET /health is expanded to report dependency status (YouTube/Gemini/Firestore).

Deployment/docs are updated accordingly: Docker now runs via Gunicorn, deploy_to_cloud_run.sh deploys from source and passes API keys, secrets are better ignored via .gitignore, and multiple new/updated architecture/agent docs and changelogs are added while removing old model/test scripts and Cloud Build config.

Written by Cursor Bugbot for commit e958e88. This will update automatically on new commits.

… - Update Dockerfile and build scripts - Refactor API endpoints and scoring modules - Add latency testing and YouTube client utilities - Remove deprecated files and update dependencies - Prepare for local model caching feature

…ide keys

- Migrate from deprecated google-generativeai to google-genai SDK
- Update all agents (Auditor, Coach) to use new Client API
- Fix transcript_service.py to use stable get_transcript() API
- Fix NumPy version constraint (<2.0) for ChromaDB compatibility
- Add comprehensive architecture documentation with Mermaid diagrams
- Add transcript approach change documentation
- Disable Redis caching (connection issues)
- Add test scripts for transcript functionality

New files:
- auditor_agent.py, coach_agent.py, librarian_agent.py
- transcript_service.py (simplified, stable API)
- ARCHITECTURE_DOCUMENTATION.md (comprehensive system docs)
- TRANSCRIPT_APPROACH_CHANGE.md (implementation summary)
- test_transcript.py, test_transcript_service.py

Project Reorganization:
- Renamed folder from "YouTube Productivity Score Development Container" to "backend"
- Created docs/ folder for all documentation
- Created scripts/ folder for test and utility scripts
- Created tests/ folder for future test suites
- Removed obsolete files (old test scripts, model pickles, logs)
- Updated .gitignore with cleaner professional structure

Cleanup:
- Deleted old test_*.py files (moved relevant ones to scripts/)
- Removed debug_*.py, verify_*.py (moved to scripts/)
- Removed score_model_*.pkl files (build artifacts)
- Removed __pycache__, logs, temporary files
- Fixed ARCHITECTURE_DOCUMENTATION.md quadrantChart syntax

Documentation:
- Kept key docs in root: README, ARCHITECTURE_DOCUMENTATION, AGENTS_IMPLEMENTATION_SUMMARY
- Moved supplementary docs to docs/ folder
- All documentation now properly organized

This follows industry-standard monorepo practices for better maintainability.

- Added .cursorrules file with backend-specific AI coding guidelines - Added changelogs directory with workflow documentation - See changelogs/2026-01-22-cursor-rules-mcp-git-workflow.md

- Added coach modes: strict, balanced, relaxed, custom
- User can provide custom instructions to the coach
- Break reminders after configurable time (default 1 hour)
- "Back on track" encouragement when user improves after distraction
- Score trend detection with declining focus warnings
- Comment analysis capability for video quality assessment
- Watch time tracking integration
- Mode-specific intervention thresholds
- Session summary with performance metrics

This makes the coach more contextual and user-controlled.

Persistent Storage:
- Added firebase-admin and google-cloud-firestore to requirements
- Created firestore_service.py with full Firestore integration
- Highlights saved to Firestore collection
- ChromaDB backup/restore to Google Cloud Storage
- Session and video metadata storage

New API Endpoints:
- POST /highlights - Save a highlight
- GET /highlights - Get all highlights
- GET /highlights/video/<id> - Get highlights for a video
- DELETE /highlights/<id> - Delete a highlight
- POST /backup/chromadb - Backup to GCS
- POST /restore/chromadb - Restore from GCS

This enables persistent storage across Cloud Run container restarts.

cursor

Cursor Bugbot has reviewed your changes and found 7 potential issues.

cursor · 2026-02-02T13:54:54Z

+        analysis = coach.analyze_session(
+            session_id=session_id,
+            session_data=session_data,
+            goal=goal


CoachAgent missing analyze_session method causes runtime error

High Severity

The /coach/analyze endpoint calls coach.analyze_session(), but the CoachAgent class in coach_agent.py does not define an analyze_session method. This will raise an AttributeError at runtime, making the entire coach analysis endpoint non-functional. The available methods in CoachAgent are start_session, record_video, analyze_comments, update_watch_status, get_session_summary, and end_session.

Additional Locations (1)

coach_agent.py#L15-L420

cursor · 2026-02-02T13:54:54Z

+    require_api_key()
+    try:
+        librarian = get_librarian_agent()
+        video = librarian.get_video_by_id(video_id)


LibrarianAgent missing get_video_by_id method causes failure

High Severity

The /librarian/video/<video_id> endpoint calls librarian.get_video_by_id(video_id), but LibrarianAgent does not implement this method. This will raise an AttributeError at runtime, breaking the video retrieval endpoint.

Additional Locations (1)

librarian_agent.py#L16-L237

cursor · 2026-02-02T13:54:54Z

+    require_api_key()
+    try:
+        librarian = get_librarian_agent()
+        highlights = librarian.get_all_highlights()


LibrarianAgent missing get_all_highlights method causes failure

High Severity

The /librarian/get_highlights endpoint calls librarian.get_all_highlights(), but LibrarianAgent does not implement this method. This will raise an AttributeError at runtime, breaking the highlights retrieval endpoint.

Additional Locations (1)

librarian_agent.py#L16-L237
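
The three missing-method findings above share one cause: a route calls an agent method that the class never defines, and the gap only surfaces as an AttributeError when the route is hit. A minimal sketch of a startup or test-time check follows. The helper `missing_methods` and the stub class are hypothetical; only the method names `get_video_by_id` and `get_all_highlights` come from the review.

```python
from typing import Iterable


def missing_methods(agent: object, required: Iterable[str]) -> list[str]:
    """Return the names in `required` that `agent` does not expose as callables."""
    return [name for name in required if not callable(getattr(agent, name, None))]


# Hypothetical stand-in for the real LibrarianAgent; it deliberately lacks
# the two methods that the review reports as missing.
class StubLibrarianAgent:
    def search(self, query: str) -> list[str]:
        return []


# Methods the API routes call, per the review findings.
REQUIRED_BY_ROUTES = ("get_video_by_id", "get_all_highlights")

gaps = missing_methods(StubLibrarianAgent(), REQUIRED_BY_ROUTES)
if gaps:
    print(f"Agent is missing methods required by routes: {gaps}")
```

Running a check like this in a test suite would flag the mismatch before deployment instead of at request time.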

cursor · 2026-02-02T13:54:54Z

+            # Reinitialize librarian to pick up restored data
+            from librarian_agent import LibrarianAgent
+            global _librarian_instance
+            _librarian_instance = LibrarianAgent()


Restore endpoint fails to properly reset librarian singleton

Medium Severity

The /restore/chromadb endpoint attempts to reinitialize the librarian singleton by declaring global _librarian_instance in api.py's namespace, but _librarian_instance is defined in librarian_agent.py. This creates a new variable in api.py rather than resetting the actual singleton, so the restored data won't be used.
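
One way to address this is to let the module that owns the singleton also own its reset. The sketch below assumes `librarian_agent.py` holds both `_librarian_instance` and the `get_librarian_agent()` accessor; `reset_librarian_agent()` is a hypothetical helper, and the `LibrarianAgent` class here is a stand-in that only counts instances.

```python
from typing import Optional


class LibrarianAgent:
    """Stand-in for the real agent; records how many instances were built."""

    created = 0

    def __init__(self) -> None:
        LibrarianAgent.created += 1
        self.generation = LibrarianAgent.created


# Module-level cache, living in the same module as the accessor and the reset.
_librarian_instance: Optional[LibrarianAgent] = None


def get_librarian_agent() -> LibrarianAgent:
    """Return the cached agent, building it on first use."""
    global _librarian_instance
    if _librarian_instance is None:
        _librarian_instance = LibrarianAgent()
    return _librarian_instance


def reset_librarian_agent() -> None:
    """Hypothetical helper: drop the cached agent so the next get rebuilds it."""
    global _librarian_instance
    _librarian_instance = None


first = get_librarian_agent()
reset_librarian_agent()          # what the restore endpoint would call
second = get_librarian_agent()   # rebuilt, so it can pick up restored data
```

Because `global` here refers to the module that actually defines `_librarian_instance`, the reset affects the same variable the accessor reads, which a `global` declaration in api.py cannot do.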

cursor · 2026-02-02T13:54:54Z

 ENV PYTHONPATH=/app
 ENV FLASK_APP=api.py
-ENV FLASK_ENV=production
+ENV FLASK_APP=api.py


Dockerfile duplicates FLASK_APP and removes FLASK_ENV

Low Severity

The Dockerfile now has a duplicate ENV FLASK_APP=api.py statement (line 31 and line 33), and ENV FLASK_ENV=production was removed. This appears to be an accidental edit that duplicated one line while removing another, leaving the production environment setting unset.

cursor · 2026-02-02T13:54:54Z

-    print(f"Simple Title+CleanDesc Score - URL: {video_url}, Goal: '{goal}', Score: {final_score}")
-    return final_score 
+    score, _, _ = compute_simple_score(video_url, goal)
+    return score


Scoring modes are functionally identical despite different names

Medium Severity

The compute_simple_score_from_title and compute_simple_score_title_and_clean_desc functions both call compute_simple_score(video_url, goal) with identical parameters. Despite mode names suggesting different behavior (title_only vs title_and_clean_desc), all modes use both title AND description for scoring. When users select "title_only" expecting faster scoring with just the title, they actually get full title+description analysis.
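
A sketch of how the modes could select different inputs before scoring. The function names `build_scoring_text` and `clean_description` are hypothetical, and the URL-stripping cleanup is an assumption about what "clean" means; only the mode names come from the PR.

```python
import re


def clean_description(description: str) -> str:
    """Assumed cleanup: drop URLs and collapse whitespace."""
    without_urls = re.sub(r"https?://\S+", "", description)
    return " ".join(without_urls.split())


def build_scoring_text(title: str, description: str, mode: str) -> str:
    """Pick the text that gets scored, so each mode differs in what it reads."""
    if mode == "title_only":
        return title
    if mode == "title_and_description":
        return f"{title}\n{description}"
    if mode == "title_and_clean_desc":
        return f"{title}\n{clean_description(description)}"
    raise ValueError(f"Unknown scoring mode: {mode!r}")


title_text = build_scoring_text("Intro to RAG", "Slides: https://example.com/s", "title_only")
clean_text = build_scoring_text("Intro to RAG", "Slides: https://example.com/s now", "title_and_clean_desc")
```

With the selection isolated like this, "title_only" never touches the description, which matches what the mode name promises.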

cursor · 2026-02-02T13:54:54Z

        video_url = data.get('video_url')
        goal = data.get('goal')
        mode = data.get('mode', 'title_and_description')  # Default to "title_and_description"
+        transcript = data.get('transcript', '')


Missing transcript type validation causes TypeError crash

Medium Severity

The /score endpoint accepts transcript from JSON input without validating it's a string. If a client sends a non-string value like {"transcript": 123}, the integer passes through to _get_scoring_prompt where transcript[:2000] raises a TypeError: 'int' object is not subscriptable. The same issue exists in the /audit endpoint's transcript handling. While unlikely in normal usage, this missing type check causes an unhandled crash instead of a proper validation error.

Additional Locations (1)

simple_scoring.py#L17-L20
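
A minimal sketch of the missing type check, written against a plain dict so it does not depend on Flask. `read_transcript` and `ValidationError` are hypothetical names; the 2000-character limit mirrors the `transcript[:2000]` slice mentioned above, and treating an explicit null as an empty transcript is an assumption.

```python
from typing import Any, Mapping


class ValidationError(ValueError):
    """Raised when request input has the wrong type; map this to a 400 response."""


def read_transcript(payload: Mapping[str, Any], max_chars: int = 2000) -> str:
    """Return the transcript as a bounded string, or raise ValidationError."""
    transcript = payload.get("transcript", "")
    if transcript is None:  # assumption: explicit null means "no transcript"
        return ""
    if not isinstance(transcript, str):
        raise ValidationError(
            f"'transcript' must be a string, got {type(transcript).__name__}"
        )
    return transcript[:max_chars]


try:
    read_transcript({"transcript": 123})
except ValidationError as exc:
    print(f"Rejected request: {exc}")
```

Validating at the endpoint boundary means both /score and /audit could share the same helper, and the client receives a validation error instead of an unhandled TypeError.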

usnaveen added 12 commits August 12, 2025 20:44

refactor(backend): migrate to gemini api, remove local models

0414942

feat(firebase): add cloud functions setup for serverless deployment

69b62ec

fix(backend): switch to functions-framework for standard GCP deployment

6adc694

feat(security): return video metadata from backend to remove client-s…

412e397

…ide keys

chore: add cursor rules and changelog documentation

854d164

- Added .cursorrules file with backend-specific AI coding guidelines - Added changelogs directory with workflow documentation - See changelogs/2026-01-22-cursor-rules-mcp-git-workflow.md

Implement RAG chat and highlights endpoints

1d77f94

README changes

e958e88

usnaveen merged commit c8a3d2c into main Feb 2, 2026
1 of 2 checks passed

cursor bot reviewed Feb 2, 2026

usnaveen merged 12 commits into main from feature/rag-dashboard
