Conversation
…
- Update Dockerfile and build scripts
- Refactor API endpoints and scoring modules
- Add latency testing and YouTube client utilities
- Remove deprecated files and update dependencies
- Prepare for local model caching feature
- Migrate from deprecated google-generativeai to google-genai SDK
- Update all agents (Auditor, Coach) to use new Client API
- Fix transcript_service.py to use stable get_transcript() API
- Fix NumPy version constraint (<2.0) for ChromaDB compatibility
- Add comprehensive architecture documentation with Mermaid diagrams
- Add transcript approach change documentation
- Disable Redis caching (connection issues)
- Add test scripts for transcript functionality

New files:
- auditor_agent.py, coach_agent.py, librarian_agent.py
- transcript_service.py (simplified, stable API)
- ARCHITECTURE_DOCUMENTATION.md (comprehensive system docs)
- TRANSCRIPT_APPROACH_CHANGE.md (implementation summary)
- test_transcript.py, test_transcript_service.py
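The SDK migration above swaps the deprecated module-global configuration for an explicit client object. A minimal sketch of the before/after shape, wrapped in a hypothetical `generate` helper (the agents' actual wrapper names may differ):

```python
# The deprecated SDK configured a module-level global and built a model object:
#   import google.generativeai as genai
#   genai.configure(api_key=key)
#   text = genai.GenerativeModel("gemini-1.5-flash").generate_content(prompt).text
#
# The google-genai SDK replaces that with an explicit Client:
#   from google import genai
#   client = genai.Client(api_key=key)
#   response = client.models.generate_content(model="gemini-1.5-flash", contents=prompt)

def generate(client, prompt: str, model: str = "gemini-1.5-flash") -> str:
    """Thin wrapper over the new Client API; `client` is a genai.Client."""
    response = client.models.generate_content(model=model, contents=prompt)
    return response.text
```

Passing the client in explicitly (rather than relying on a configured module) is what lets each agent own its own client instance.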
Project Reorganization:
- Renamed folder from "YouTube Productivity Score Development Container" to "backend"
- Created docs/ folder for all documentation
- Created scripts/ folder for test and utility scripts
- Created tests/ folder for future test suites
- Removed obsolete files (old test scripts, model pickles, logs)
- Updated .gitignore with cleaner professional structure

Cleanup:
- Deleted old test_*.py files (moved relevant ones to scripts/)
- Removed debug_*.py, verify_*.py (moved to scripts/)
- Removed score_model_*.pkl files (build artifacts)
- Removed __pycache__, logs, temporary files
- Fixed ARCHITECTURE_DOCUMENTATION.md quadrantChart syntax

Documentation:
- Kept key docs in root: README, ARCHITECTURE_DOCUMENTATION, AGENTS_IMPLEMENTATION_SUMMARY
- Moved supplementary docs to docs/ folder
- All documentation now properly organized

This follows industry-standard monorepo practices for better maintainability.
- Added .cursorrules file with backend-specific AI coding guidelines
- Added changelogs directory with workflow documentation
- See changelogs/2026-01-22-cursor-rules-mcp-git-workflow.md
- Added coach modes: strict, balanced, relaxed, custom
- User can provide custom instructions to the coach
- Break reminders after configurable time (default 1 hour)
- "Back on track" encouragement when user improves after distraction
- Score trend detection with declining focus warnings
- Comment analysis capability for video quality assessment
- Watch time tracking integration
- Mode-specific intervention thresholds
- Session summary with performance metrics

This makes the coach more contextual and user-controlled.
Persistent Storage:
- Added firebase-admin and google-cloud-firestore to requirements
- Created firestore_service.py with full Firestore integration
- Highlights saved to Firestore collection
- ChromaDB backup/restore to Google Cloud Storage
- Session and video metadata storage

New API Endpoints:
- POST /highlights - Save a highlight
- GET /highlights - Get all highlights
- GET /highlights/video/&lt;id&gt; - Get highlights for a video
- DELETE /highlights/&lt;id&gt; - Delete a highlight
- POST /backup/chromadb - Backup to GCS
- POST /restore/chromadb - Restore from GCS

This enables persistent storage across Cloud Run container restarts.
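The highlight-save path could look roughly like the sketch below. The `collection(...).document(...).set(...)` chain is the real google-cloud-firestore API; the document fields and collection name are assumptions about what firestore_service.py stores:

```python
import uuid
from datetime import datetime, timezone

def save_highlight(db, video_id: str, text: str, timestamp_s: float) -> dict:
    """Persist one highlight document; `db` is a firestore.Client.
    Field names here are illustrative, not taken from the PR."""
    doc = {
        "id": str(uuid.uuid4()),
        "video_id": video_id,
        "text": text,
        "timestamp_s": timestamp_s,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    db.collection("highlights").document(doc["id"]).set(doc)
    return doc
```

Because each document gets its own generated id, the DELETE and per-video GET endpoints can address highlights directly by document id.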
Cursor Bugbot has reviewed your changes and found 7 potential issues.
```python
analysis = coach.analyze_session(
    session_id=session_id,
    session_data=session_data,
    goal=goal
```
CoachAgent missing analyze_session method causes runtime error
High Severity
The /coach/analyze endpoint calls coach.analyze_session(), but the CoachAgent class in coach_agent.py does not define an analyze_session method. This will raise an AttributeError at runtime, making the entire coach analysis endpoint non-functional. The available methods in CoachAgent are start_session, record_video, analyze_comments, update_watch_status, get_session_summary, and end_session.
Additional Locations (1)
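One possible fix is a thin `analyze_session` that delegates to the existing `get_session_summary` and enriches the result with the request context. The sketch below is hypothetical; the real signatures in coach_agent.py may differ:

```python
class CoachAgent:
    """Minimal stand-in; the real class defines start_session, record_video, etc."""

    def get_session_summary(self, session_id: str) -> dict:
        # Stub data standing in for the real summary logic.
        return {"session_id": session_id, "avg_score": 0.0}

    def analyze_session(self, session_id: str, session_data: dict, goal: str) -> dict:
        # Delegate to the existing summary method, then add request context
        # so the /coach/analyze endpoint keeps its current call signature.
        summary = self.get_session_summary(session_id)
        summary["goal"] = goal
        summary["videos_watched"] = len(session_data.get("videos", []))
        return summary
```

The alternative fix is to change the endpoint to call `get_session_summary` directly; either way the endpoint and the agent must agree on a method name.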
```python
require_api_key()
try:
    librarian = get_librarian_agent()
    video = librarian.get_video_by_id(video_id)
```
LibrarianAgent missing get_video_by_id method causes failure
High Severity
The /librarian/video/<video_id> endpoint calls librarian.get_video_by_id(video_id), but LibrarianAgent does not implement this method. This will raise an AttributeError at runtime, breaking the video retrieval endpoint.
Additional Locations (1)
```python
require_api_key()
try:
    librarian = get_librarian_agent()
    highlights = librarian.get_all_highlights()
```
LibrarianAgent missing get_all_highlights method causes failure
High Severity
The /librarian/get_highlights endpoint calls librarian.get_all_highlights(), but LibrarianAgent does not implement this method. This will raise an AttributeError at runtime, breaking the highlights retrieval endpoint.
Additional Locations (1)
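If highlights live in Firestore (as the persistent-storage commit suggests), `get_all_highlights` could be a one-line stream over the collection. `.stream()` and `.to_dict()` are the real google-cloud-firestore calls; the collection name is an assumption:

```python
def get_all_highlights(db) -> list[dict]:
    """Return every highlight document; `db` is a firestore.Client.
    Streams the 'highlights' collection and converts each snapshot to a dict."""
    return [doc.to_dict() for doc in db.collection("highlights").stream()]
```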
```python
# Reinitialize librarian to pick up restored data
from librarian_agent import LibrarianAgent
global _librarian_instance
_librarian_instance = LibrarianAgent()
```
Restore endpoint fails to properly reset librarian singleton
Medium Severity
The /restore/chromadb endpoint attempts to reinitialize the librarian singleton by declaring global _librarian_instance in api.py's namespace, but _librarian_instance is defined in librarian_agent.py. This creates a new variable in api.py rather than resetting the actual singleton, so the restored data won't be used.
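The standard fix is to reset the singleton inside the module that owns it, so api.py never touches the global directly. A minimal sketch of the pattern, with a stand-in class (the real one lives in librarian_agent.py):

```python
class LibrarianAgent:
    """Stand-in; the real class loads ChromaDB state on construction."""
    pass

_librarian_instance = None

def get_librarian_agent() -> LibrarianAgent:
    """Lazily build and cache the singleton."""
    global _librarian_instance
    if _librarian_instance is None:
        _librarian_instance = LibrarianAgent()
    return _librarian_instance

def reset_librarian_agent() -> None:
    """Drop the cached instance so the next get() rebuilds from restored data.
    The restore endpoint in api.py would call this instead of declaring
    `global _librarian_instance` in its own namespace."""
    global _librarian_instance
    _librarian_instance = None
```

Because `global` only binds names in the current module, a reset function exported by the owning module is the only reliable way for callers to invalidate the cache.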
```dockerfile
ENV PYTHONPATH=/app
ENV FLASK_APP=api.py
ENV FLASK_ENV=production
ENV FLASK_APP=api.py
```
Dockerfile duplicates FLASK_APP and removes FLASK_ENV
Low Severity
The Dockerfile now has a duplicate ENV FLASK_APP=api.py statement (line 31 and line 33), and ENV FLASK_ENV=production was removed. This appears to be an accidental edit that duplicated one line while removing another, leaving the production environment setting unset.
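Assuming the duplication was accidental, the intended ENV block is likely the original three distinct settings:

```dockerfile
# Likely intended block: one FLASK_APP, with FLASK_ENV=production restored.
ENV PYTHONPATH=/app
ENV FLASK_APP=api.py
ENV FLASK_ENV=production
```

Note that with Gunicorn as the entrypoint, `FLASK_ENV` mainly matters if the app reads it to toggle debug behavior; restoring it keeps the configuration explicit either way.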
```python
print(f"Simple Title+CleanDesc Score - URL: {video_url}, Goal: '{goal}', Score: {final_score}")
return final_score

score, _, _ = compute_simple_score(video_url, goal)
return score
```
Scoring modes are functionally identical despite different names
Medium Severity
The compute_simple_score_from_title and compute_simple_score_title_and_clean_desc functions both call compute_simple_score(video_url, goal) with identical parameters. Despite mode names suggesting different behavior (title_only vs title_and_clean_desc), all modes use both title AND description for scoring. When users select "title_only" expecting faster scoring with just the title, they actually get full title+description analysis.
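One way to make the modes actually differ is to branch on the mode when assembling the text sent to the scorer. This is a hypothetical sketch (function name, mode strings, and the "clean description" rule are assumptions, not the PR's implementation):

```python
def build_scoring_text(title: str, description: str, mode: str) -> str:
    """Assemble the text to score, per mode, so each mode name is honest.
    The real compute_simple_score_* helpers would each pass a different mode."""
    if mode == "title_only":
        return title
    if mode == "title_and_clean_desc":
        # Hypothetical cleanup: drop links and hashtags before scoring.
        cleaned = " ".join(
            word for word in description.split()
            if not word.startswith(("http", "#"))
        )
        return f"{title}\n{cleaned}"
    # Default: title_and_description uses everything.
    return f"{title}\n{description}"
```

With this split, `title_only` really does skip the description, which is what a user selecting it for faster scoring would expect.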
```python
video_url = data.get('video_url')
goal = data.get('goal')
mode = data.get('mode', 'title_and_description')  # Default to "title_and_description"
transcript = data.get('transcript', '')
```
Missing transcript type validation causes TypeError crash
Medium Severity
The /score endpoint accepts transcript from JSON input without validating it's a string. If a client sends a non-string value like {"transcript": 123}, the integer passes through to _get_scoring_prompt where transcript[:2000] raises a TypeError: 'int' object is not subscriptable. The same issue exists in the /audit endpoint's transcript handling. While unlikely in normal usage, this missing type check causes an unhandled crash instead of a proper validation error.
Note (High Risk): Large changes to core API surface (routing, auth/error handling, rate limiting) plus new Firestore/Gemini integrations; misconfiguration or quota/permission issues could break scoring or new persistence features in production.
Overview

This PR pivots the backend into a multi-agent Flask API with new endpoints for POST /score, POST /audit, and POST /coach/analyze, plus a Firestore-backed Librarian (/librarian/index|search|chat|stats|video/...), highlight CRUD, and backup/restore operations.

It introduces centralized env/config management via Config, adds Flask-Limiter rate limiting, and replaces ad-hoc errors with standardized JSON error codes and global handlers; GET /health is expanded to report dependency status (YouTube/Gemini/Firestore).

Deployment and docs are updated accordingly: Docker now runs via Gunicorn, deploy_to_cloud_run.sh deploys from source and passes API keys, secrets are better ignored via .gitignore, and multiple new/updated architecture/agent docs and changelogs are added, while old model/test scripts and the Cloud Build config are removed.

Written by Cursor Bugbot for commit e958e88.