This guide covers how to handle operational issues that require manual intervention. The system is designed to heal itself from minor glitches (retries), but some issues demand a human touch.
Status: Partially Automated (3 Retries)
Symptom: "Invalid response from LLM call" or "ValueError" in logs.
Action:
- Check Logs: Go to `logs/market_rover.log`.
- Verify Quota: Ensure your Gemini API key hasn't hit its rate limit or monthly quota.
- Manual Restart: If retries fail 3 times, the API might be down or blocked.
  - Wait 15 minutes.
  - Restart the server: `python app.py` (or restart the Docker container).
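For orientation, the retry behaviour described above can be pictured with a minimal sketch. This is not the application's actual retry code: the helper name `call_with_retries`, its parameters, and the reading of "3 Retries" as three attempts in total are all assumptions made for illustration.

```python
import time


def call_with_retries(func, attempts=3, delay_seconds=0.0, retry_on=(ValueError,)):
    """Call func(); retry on the listed exceptions, up to `attempts` calls in total.

    Hypothetical helper: the real application may count or space retries differently.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                # Retries exhausted: this is the point where manual action starts.
                raise
            time.sleep(delay_seconds)
```

Once the final attempt raises, the error reaches the logs and the manual steps above apply.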
Status: Manual Fix Required
Symptom: "cannot import name 'xyz'" or "ModuleNotFoundError".
Action:
- Rebuild Environment: Dependencies might be out of sync. On Windows, run these commands in order:
  - `deactivate`
  - `rm -r .venv`
  - `python -m venv .venv`
  - `.\.venv\Scripts\Activate`
  - `pip install -r requirements.txt`
- Check Deployment: If on Streamlit Cloud, check `packages.txt` and `requirements.txt`.
Status: Manual Fix Required
Symptom: "File not found: Portfolio.csv"
Action:
- Upload Data: Ensure `Portfolio.csv` is present in the root directory.
- Format Check: Ensure it has columns: `Symbol`, `Qty`, `Avg Price`.
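The format check can be done by hand, but a short script makes it repeatable. The sketch below assumes the three column names appear exactly as written above in the first row of the file; the function name `missing_portfolio_columns` is hypothetical and not part of the project.

```python
import csv
from pathlib import Path

# Column names taken from the Format Check above.
REQUIRED_COLUMNS = ("Symbol", "Qty", "Avg Price")


def missing_portfolio_columns(path="Portfolio.csv"):
    """Return the required columns that are absent from the CSV header.

    Raises FileNotFoundError if the file does not exist.
    """
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"File not found: {csv_path}")
    # utf-8-sig drops the byte-order mark that spreadsheet exports sometimes add.
    with csv_path.open(newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [column for column in REQUIRED_COLUMNS if column not in present]
```

An empty list means the header is complete; any returned names are the columns to add before restarting.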
Status: Automated Warning / Retry
Symptom: "
- No action required usually; the system retries automatically (3 times).
- If persistent (>24 hours), check `logs/market_rover.log` for "API Down" patterns.
- Mine Logs: Run `python scripts/mine_logs.py` to see failure timestamps and frequency.
Status: GitHub Actions / Dependabot
Symptom: "Daily Report" didn't post, or Dependabot PR didn't merge.
Action:
- Check Actions Tab: Go to `Actions` -> `Daily Issue Report` or `Dependabot Automation` to see the failure log.
- Dependabot:
  - If auto-merge failed, check if the "checks" passed (e.g. tests).
  - Manually merge if it's a safe update.
- Backtest:
  - If data is missing for `batch_backtester.py`, verify `yfinance` is up.
Status: Managed by SRE Support Sentinel (Autonomous Response)
Symptom: GitHub Actions red-dot on main or HIL-Rover.
Safeguard:
- Pre-Flight Integrity Check: Every build starts with `scripts/build_integrity_check.py`.
- SRE Agent Escalation: If a build fails, the SRE Support Sentinel (Gemini-powered) analyzes the logs and proposes a remediation to the HIL Dashboard.
Hard Lesson Learned (Payload Timeouts):
- The Incident: HIL-Rover (#22) failed to deploy repeatedly because `gcloud builds submit .` uploaded over 1.5 GB of historical log files and `node_modules` from the root directory to Cloud Build, triggering massive timeouts.
- The Rule: Never deploy without a hardened `.gcloudignore` and `.dockerignore`. Always verify that these files are explicitly blocking `*.log`, `ci_log*.txt`, `.venv`, and `node_modules/`.
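The rule above lends itself to an automated check. The following is a minimal sketch, not the contents of `scripts/build_integrity_check.py`: the function name `missing_ignore_patterns` is hypothetical, and the comparison is a literal line match rather than a full gitignore-style pattern evaluation.

```python
from pathlib import Path

# Patterns named in the rule above.
REQUIRED_PATTERNS = ("*.log", "ci_log*.txt", ".venv", "node_modules/")
IGNORE_FILES = (".gcloudignore", ".dockerignore")


def _normalize(pattern):
    # Treat "node_modules" and "node_modules/" (or ".venv/") as the same entry.
    return pattern.strip().strip("/")


def missing_ignore_patterns(repo_root="."):
    """Map each ignore file to the required patterns it does not list.

    A missing ignore file is reported as lacking every pattern.
    """
    report = {}
    for name in IGNORE_FILES:
        path = Path(repo_root) / name
        listed = set()
        if path.is_file():
            for line in path.read_text(encoding="utf-8").splitlines():
                line = line.strip()
                if line and not line.startswith("#"):  # skip blanks and comments
                    listed.add(_normalize(line))
        missing = [p for p in REQUIRED_PATTERNS if _normalize(p) not in listed]
        if missing:
            report[name] = missing
    return report
```

An empty dictionary means both files list all four patterns. Because the match is literal, a broader pattern that happens to cover one of them (for example `*` followed by negations) would still be reported as missing.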
Action (Developer):
- Run Integrity Check Locally: Run `python scripts/build_integrity_check.py` to confirm the regression is fixed before pushing.
- Verify Payload Size: Ensure no new massive data/log files are being accidentally tracked or uploaded to the build context.
- Review HIL Dashboard: Approve SRE-proposed code or infrastructure fixes.
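One way to spot oversized files before they reach the build context is to list the largest files under the project root. The sketch below is only an approximation: the helper name `largest_files` is hypothetical, and it does not read `.gcloudignore` or `.dockerignore`, so it shows what could be uploaded if those files were missing or incomplete.

```python
import os


def largest_files(root=".", top=10, skip_dirs=(".git",)):
    """Return (size_in_bytes, path) pairs for the largest files under root."""
    sizes = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            try:
                sizes.append((os.path.getsize(full_path), full_path))
            except OSError:
                continue  # unreadable or vanished file; skip it
    return sorted(sizes, reverse=True)[:top]
```

Large log files or anything inside `node_modules` appearing near the top of the result are candidates for the ignore files.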
Status: Manual Fix Required
Symptom: "The user-provided container failed to start and listen on the port... PORT=8080"
Common Causes:
- ModuleNotFoundError: (Python) Occurs if `__init__.py` files are missing in parent directories, preventing the server (e.g., Uvicorn) from importing the application.
  - Fix: Add empty `__init__.py` files to the `backend/` and `src/` folders of the microservice.
- Hardcoded Port: Container listens on a port other than 8080.
  - Fix: Ensure `uvicorn` (Python) or `node` (JS) is bound to `0.0.0.0:8080`.
- Missing Node Engine: (Node.js) Occurs if the required Node version is not matched by the Docker base image.
  - Fix: Update `FROM node:XX-alpine` in the microservice `Dockerfile`.
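For the hardcoded-port case, a common pattern is to read the port from the environment rather than fixing it in code. The sketch below assumes the hosting platform supplies the port through a `PORT` environment variable, as the error message suggests; the helper name `resolve_bind_address` is hypothetical.

```python
import os

DEFAULT_PORT = 8080  # port named in the error message above


def resolve_bind_address(environ=None):
    """Return the (host, port) pair the server should listen on."""
    environ = os.environ if environ is None else environ
    raw_port = environ.get("PORT", "").strip()
    port = int(raw_port) if raw_port else DEFAULT_PORT
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    # 0.0.0.0 accepts connections from outside the container;
    # 127.0.0.1 would only be reachable from inside it.
    return "0.0.0.0", port
```

With Uvicorn, the resulting pair would be passed along as `uvicorn.run(app, host=host, port=port)`, where `app` stands for the microservice's own application object.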
Please note what is NOT covered by the automated retry system:
- Streamlit UI Crashes: If the web page freezes or shows a big red traceback box, that is a UI error. You must refresh the page (`F5`).
- External CI/CD: Errors in GitHub Actions or Docker deployment pipelines are outside this application's control. Check the GitHub "Actions" tab.
- Infrastructure: If the server runs out of memory (OOM) or disk space, the application will crash. This requires system-level monitoring.
If you encounter a new bug, please log it using this template.
Tip: Run `python scripts/mine_logs.py` first to extract recent error messages from the logs automatically.
Bug Report Template:
**Date**: YYYY-MM-DD
**Component**: (e.g., News Scraper, LLM, Web UI)
**Error Message**: (Paste the traceback here)
**Steps to Reproduce**:
1.
2.
**Context**: (e.g., Was the market closed? Was VPN on?)
---
## 🧠 AI Training (Monthly Cycle)
To improve the Agent's accuracy without waiting for organic user activity, run this **Mental Calibration Cycle** once a month.
### Step 1: Feed the Brain (Day 1)
Run the training script to analyze Nifty 50 stocks and generate new predictions.
```powershell
python scripts/train_brain.py
```
- What it does: Overwrites `Portfolio.csv` with top 20 Nifty stocks, runs the full analysis, and saves "Buy/Sell" signals to `data/memory.json`.

After market movement has occurred (at least 3 days later), run the validator.
```powershell
python scripts/validate_outcomes.py
```
- What it does: Checks Yahoo Finance for actual price changes. Updates the memory with "Success" or "Fail".
- Result: Agents will see these outcomes in their next run and self-correct (e.g., "I failed on Banking stocks last month, I should be cautious").