GenAI-First Career Transformation | Production Code from Day 1 | Systematic 37-Month Journey
Current Stage: GenAI-First Data Analyst & AI Engineer (Stage 1 of 5)
Next Milestone: Land Data Analyst & AI Engineer role with GenAI integration skills
Ultimate Goal: Senior LLM Engineer building production AI Trading Assistant
Study Commitment: 25 hours/week systematic learning
📋 View Complete 37-Month Interactive Roadmap (v8.2) →
👔 For Recruiters / Hiring Managers:
- 💼 Production Projects → Live ETL system + 7 production-grade projects ⭐ START HERE
- 🤖 GenAI-First Differentiation → What sets this apart
- 📊 Complete Roadmap → 37-month visualization
- 🔗 LinkedIn → Professional background
🎓 For Fellow Learners:
- 🤖 AI Integration Strategy → My AI stack and approach
- 📚 Repository Structure → Course materials organization
- 💡 Learning Philosophy → Core principles and approach
- 🛠️ Setup Guides → Environment configuration
Most learning repositories: Tutorial completions and course exercises with no real-world application.
This repository demonstrates:
- ✅ Production system deployed - Live ETL pipeline saving $15K/year with public code
- ✅ 7 production-grade projects - From ETL foundations to RAG, Multimodal AI, and statistical research systems
- ✅ GenAI integration from Day 1 - LLM SDKs (Gemini, OpenAI, Claude), RAG, Multimodal AI, Pydantic structured outputs, PandasAI, Cursor AI
- ✅ Evaluation-driven development - DeepEval + pytest integrated into every project; RAGAS RAG Triad metrics; Docker containerization across all repos
- ✅ Skills progression by design - Each project introduces new capabilities that build on the previous
- ✅ Domain expertise - 15+ years data experience, 8 years finance, 6 years trading
- ✅ Measurable business impact - 95% efficiency gains, documented results
- ✅ Systematic GenAI-first progression - Clear 37-month path with GenAI/LLM engineering at every stage
The key differentiator: Already delivering production value while building toward LLM engineering, with transparent GenAI integration throughout.
In 2026, GenAI engineering is essential for data professionals. While most candidates learn traditional tools only, this journey integrates GenAI/LLM engineering systematically from the start.
Each stage combines traditional data skills with GenAI augmentation:
Stage 1: GenAI-First Data Analyst & AI Engineer 🟢 ACTIVE
Foundation: Python, SQL, Statistics, Visualization
+ GenAI Layer: IBM GenAI Engineering cert, LLM SDKs (Gemini, OpenAI, Claude), RAG, Multimodal AI, Pydantic, Streamlit, PandasAI, Cursor AI
+ Evaluation Layer: DeepEval + pytest integration, RAGAS (RAG Triad metrics), LangSmith observability
+ Containerization: Docker fundamentals, Dockerfile for every project
= Result: AI-powered dashboards with natural language interfaces + production GenAI applications with evaluation-driven development
Stage 2: GenAI Data Engineer + AI Systems Architect 📅 PLANNED
Foundation: AWS, Airflow, PySpark, PostgreSQL, BigQuery
+ AI Systems Layer: Vector DBs (Pinecone/Weaviate/Qdrant), RAG infrastructure, embedding pipelines
+ Containerization: Docker & Kubernetes Masterclass, production container orchestration
= Result: AI-first data pipelines feeding LLM systems with unstructured data ETL
Stage 3: ML Engineer + Local LLM Specialist 📅 PLANNED
Foundation: scikit-learn, TensorFlow/Keras, PyTorch, MLOps, NVIDIA DLI certification
+ LLM Layer: Ollama (local LLMs), fine-tuning (LoRA/QLoRA/PEFT), on-premise AI for finance
= Result: Private AI systems solving finance's data privacy problem
Stage 4: Agentic AI Engineer & LLM Specialist 📅 PLANNED
Foundation: Advanced LLM architecture, system design
+ Agentic Layer: MCP (Anthropic), LangGraph, CrewAI, Andrew Ng's Agentic AI, multi-agent orchestration
= Result: Autonomous AI trading systems with multi-agent collaboration
Stage 5: Senior LLM Engineer 📅 PLANNED
Foundation: Production architecture, thought leadership
+ Evaluation Layer: Automated Testing for LLMOps, CI/CD for AI, production monitoring
= Result: Senior-level AI systems with evaluation-driven development ($180-250K+)
| Traditional Path | GenAI-First Path (This Journey) |
|---|---|
| Learn tools → Get job → Maybe add AI later | Learn tools + GenAI together → Land AI-ready role |
| Positioned with the majority of candidates | Positioned ahead of 95% of candidates |
| Standard market rates | 15-20% salary premium for GenAI skills |
| Limited future trajectory | Clear path to Senior LLM Engineer ($180-250K+) |
Market reality: The agentic AI market is exploding in 2026 with MCP (Anthropic) and A2A (Google) protocols becoming industry standards. Companies need professionals who BUILD production AI systems—not just prompt ChatGPT.
7 projects ordered by skills progression — each builds on the previous, from ETL foundations to flagship research system.
🏗️ Production GitHub Standard (v8.2): Every project ships with: architecture diagram (Mermaid), Dockerfile, evaluation metrics table (DeepEval + pytest), demo GIF, and "What I Learned" section. All projects include DeepEval evaluation framework and Docker containerization support.
1. 1099 Reconciliation ETL Pipeline ✅ Live Production
Automated Python ETL pipeline reconciling retirement plan distribution data between Relius and Matrix financial systems at Daybright Financial.
Business Challenge: Manual reconciliation took 4-6 hours weekly, was error-prone, and blocked critical 1099-R tax reporting deadlines.
Impact: 95% time reduction (4-6 hours → 15 min/week) | $15,000+ annual savings | 10x scalability | Zero errors
Tech: Python • pandas • openpyxl • Excel • Matplotlib • pytest • GitHub Actions CI • faker (synthetic data)
Skills established: ETL pipelines, testing, CI/CD, production deployment
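The heart of a reconciliation pipeline like this is matching records across two systems and flagging discrepancies. A minimal pandas sketch of that step, with hypothetical column names and amounts (not the production schema or tolerance):

```python
import pandas as pd

# Illustrative records from the two systems (hypothetical columns and values)
relius = pd.DataFrame({"participant_id": [1, 2, 3], "amount": [1000.00, 250.50, 75.00]})
matrix = pd.DataFrame({"participant_id": [1, 2, 4], "amount": [1000.00, 250.75, 90.00]})

# Outer merge with indicator=True flags records missing from either system
merged = relius.merge(matrix, on="participant_id", how="outer",
                      suffixes=("_relius", "_matrix"), indicator=True)

# Rows present in both systems but with amounts outside a 1-cent tolerance
both = merged[merged["_merge"] == "both"]
mismatches = both[(both["amount_relius"] - both["amount_matrix"]).abs() > 0.01]

print(f"Unmatched rows: {(merged['_merge'] != 'both').sum()}, "
      f"amount mismatches: {len(mismatches)}")
```

The `_merge` indicator column distinguishes records found only in one system from true amount mismatches, which is what turns hours of manual Excel comparison into a single scripted pass.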
2. DataVault Analyst ⭐ First AI Project
AI-Powered PII-Safe Data Intelligence | "Chat With Your Data"
Natural language analytics for retirement plan operations with PII protection, AI guardrails, and code transparency.
Business Challenge: Operations teams need to extract insights from Excel data containing sensitive PII (SSN, names, DOB), but manual Excel filtering is slow, error-prone, and creates PII exposure risk.
AI-Powered Solution:
| Feature | Implementation |
|---|---|
| AI Chat | LLM SDK (provider-agnostic) + PandasAI with generated code visibility |
| PII Protection | Governance-as-code: PII leak prevention in AI responses |
| Hybrid Analytics | Pre-built dashboards + AI chat (works even without API key) |
| Structured Outputs | Pydantic-validated AI responses with type-safe schemas |
Tech: Python • pandas • Streamlit • Gemini SDK • PandasAI • Pydantic • DeepEval • Docker • GitHub Actions CI
New skills introduced: + LLM SDK, PandasAI, Streamlit, Pydantic structured outputs, PII handling
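The PII-protection guardrail described above is, in spirit, a pre-processing step: scrub identifiers before any text reaches the LLM. A minimal stdlib sketch, where the patterns and placeholder labels are illustrative rather than the project's actual governance rules:

```python
import re

# Hypothetical guardrail patterns: redact SSNs and dates of birth before prompting
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DOB": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace known PII patterns with typed placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

prompt = "Participant 123-45-6789 (born 01/02/1980) requested a distribution."
print(redact_pii(prompt))
# → Participant [SSN REDACTED] (born [DOB REDACTED]) requested a distribution.
```

Running redaction on the input side (and again on AI responses) is what makes "chat with your data" safe for datasets containing SSNs, names, and DOBs.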
3. PolicyPulse 🧠 RAG Foundation
AI-Powered HR Policy Chatbot | "Ask Your Policies"
RAG chatbot that answers employee policy questions with cited sources and auto-escalates to HR when the AI is uncertain.
Business Challenge: Employees spend hours searching through policy documents for answers to common questions. HR teams are overwhelmed by repetitive inquiries about PTO, benefits, and overtime rules.
AI-Powered Solution:
| Feature | Implementation |
|---|---|
| Semantic Search | Embeddings + ChromaDB vector store + similarity scoring |
| Cited Answers | Every response cites specific policy section & document |
| Smart Escalation | Confidence < 0.7 → auto-generate HR ticket with context |
| RAG Pipeline | Document → Chunk → Embed → Retrieve → Generate |
Tech: Python • ChromaDB • Gemini Embeddings • Streamlit • Pydantic • DeepEval • RAGAS • Docker • GitHub Actions CI
New skills introduced: + Embeddings, ChromaDB, RAG pipeline, semantic search, ticket escalation, RAG Triad evaluation
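The retrieve-then-escalate flow described above can be sketched without any dependencies. The real pipeline uses Gemini embeddings stored in ChromaDB; here, tiny hand-made vectors stand in for embeddings, and the 0.7 confidence threshold drives escalation:

```python
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# Toy "embedded" policy chunks (the real system embeds document chunks via Gemini)
chunks = [
    ("PTO Policy §2.1", [0.9, 0.1, 0.0]),
    ("Overtime Policy §4.3", [0.1, 0.8, 0.2]),
]

def answer(query_vec: list[float], threshold: float = 0.7) -> dict:
    """Retrieve the best-matching chunk; below the threshold, escalate to HR."""
    source, score = max(((s, cosine(query_vec, v)) for s, v in chunks),
                        key=lambda p: p[1])
    if score < threshold:
        return {"escalate": True,
                "ticket": f"HR review needed (best match {source}, score {score:.2f})"}
    return {"escalate": False, "source": source, "score": round(score, 2)}

print(answer([0.85, 0.15, 0.0]))  # close match: answer with a cited section
print(answer([0.0, 0.0, 1.0]))    # off-topic question: auto-generate an HR ticket
```

Citing the winning chunk's section in every response, and refusing to answer below the threshold, is what keeps a policy chatbot trustworthy.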
4. FormSense 📄 Document Intelligence
AI-Powered Distribution Form Validator | "From Paper to Processing"
Multimodal AI system that reads retirement plan distribution forms (handwritten checkboxes, signatures), validates against business rules, and routes the result: complete forms generate processing tickets, incomplete forms trigger advisor emails.
Business Challenge: Distribution form intake requires manual reading of handwritten forms with checkboxes, signatures, and complex fields. Errors in extraction create compliance risk for ERISA-regulated retirement plans.
AI-Powered Solution:
| Feature | Implementation |
|---|---|
| Vision AI | Gemini Vision reads checkboxes, handwriting, printed text |
| Validation | Business rule engine for ERISA-regulated distribution processing |
| Smart Routing | Complete → operations ticket; Incomplete → email to advisor |
| Confidence | Field-level extraction confidence scoring |
Tech: Python • Gemini Vision SDK • Streamlit • Pydantic • DeepEval • Docker • GitHub Actions CI
New skills introduced: + Multimodal AI (Vision LLM), form extraction, business rule validation, email automation
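The vision extraction itself requires the Gemini SDK, but the routing logic above can be sketched standalone. Field names, the confidence threshold, and the extracted values are all hypothetical:

```python
# Hypothetical extracted fields: (value, model confidence) pairs from the vision step
extraction = {
    "participant_name": ("Jane Doe", 0.98),
    "distribution_type": ("Rollover", 0.95),
    "signature_present": (True, 0.62),   # low-confidence field
}

REQUIRED_FIELDS = ["participant_name", "distribution_type", "signature_present"]
MIN_CONFIDENCE = 0.80  # illustrative threshold, not the project's actual value

def route_form(fields: dict) -> dict:
    """Complete and confident -> processing ticket; otherwise -> advisor email."""
    problems = [
        name for name in REQUIRED_FIELDS
        if name not in fields
        or fields[name][1] < MIN_CONFIDENCE      # extraction not confident enough
        or fields[name][0] in (None, "", False)  # field missing or unchecked
    ]
    if problems:
        return {"action": "email_advisor", "flagged_fields": problems}
    return {"action": "create_ticket", "flagged_fields": []}

print(route_form(extraction))  # low signature confidence routes to the advisor
```

Field-level confidence is the key design choice: one uncertain checkbox sends the form back to a human instead of silently creating a compliance risk.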
5. Operations-Demand-Intelligence 📊 Enterprise Analytics | 🚧 In Development
AI-Powered Workflow Demand Analysis for data-driven staffing decisions using OnBase enterprise data.
Analyzing 8+ months of workflow data to enable intelligent resource allocation with AI-powered natural language insights.
Business Challenge:
Operations teams lack visibility into workflow demand patterns, leading to reactive staffing and resource inefficiencies. No data-driven approach for Distribution vs Loan workflow segmentation.
AI-Powered Solution:
| Feature | Implementation | AI Enhancement |
|---|---|---|
| Demand Analysis | Volume patterns, Distribution vs Loan segmentation | AI-powered trend detection, anomaly alerts |
| Interactive Dashboard | Streamlit with Plotly visualizations | LLM SDK + PandasAI chat: "Why did loan volume spike in March?" |
| Insights Generation | Traditional business metrics | AI-generated commentary for stakeholders |
| Data Privacy | PII handling, synthetic data for GitHub | AI with privacy guardrails, read-only access |
Tech: Python • pandas • Streamlit • Gemini SDK • PandasAI • Plotly • DeepEval • Docker • GitHub Actions CI
New skills introduced: + Enterprise real data integration, advanced analytics, stakeholder reporting
6. StreamSmart Optimizer 📺 Consumer AI App
AI-Powered Streaming Subscription Rotation Advisor | "Spend Less, Watch More"
Consumer-facing dashboard that helps households optimize streaming subscriptions through AI-driven rotation scheduling, cost-per-view analytics, and content search via live APIs.
Business Challenge: 36% of U.S. streaming subscribers already rotate services to cut costs (Antenna Research, 2025), but manage it with spreadsheets. No existing tool combines AI rotation planning + content search + cost analytics.
AI-Powered Solution:
| Feature | Implementation |
|---|---|
| Content Search | Watchmode + TMDB API integration ("Where can I watch X?") |
| AI Rotation Planner | LLM analyzes habits + content calendar → optimal schedule |
| Savings Engine | Cost-per-view analytics + annual savings projections |
| Guardrails | Price validation, financial disclaimers, scope limits |
Tech: Python • httpx async • Watchmode/TMDB APIs • Streamlit • Gemini SDK • Pydantic • DeepEval • LangSmith • Docker • GitHub Actions CI
New skills introduced: + External API integration, consumer UX, optimization algorithms, async HTTP
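The savings engine described above reduces to a cost-per-viewing-hour calculation. A dependency-free sketch with made-up prices, hours, and cutoff:

```python
# Illustrative monthly usage: service -> (subscription price, hours watched)
services = {
    "Service A": (15.49, 22.0),
    "Service B": (11.99, 1.5),   # barely watched -> rotation candidate
    "Service C": (7.99, 10.0),
}

COST_PER_HOUR_CUTOFF = 4.00  # hypothetical threshold for pausing a service

def rotation_plan(usage: dict) -> dict:
    """Flag services whose cost per viewing hour exceeds the cutoff."""
    pause = [name for name, (price, hours) in usage.items()
             if hours == 0 or price / hours > COST_PER_HOUR_CUTOFF]
    monthly_savings = sum(usage[name][0] for name in pause)
    return {"pause": pause, "annual_savings": round(monthly_savings * 12, 2)}

print(rotation_plan(services))
# → {'pause': ['Service B'], 'annual_savings': 143.88}
```

In the full app, the LLM layer adds the content calendar on top of this arithmetic: it decides *when* to resume a paused service based on upcoming releases.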
7. Attention-Flow Catalyst 🚀 Flagship
Research Question: Which trigger or combination best predicts +10% price moves within 3 trading days?
Flagship project evolving through all 5 career stages:
| Stage | Focus | AI Integration |
|---|---|---|
| 1 (Active) | Statistical backtesting, signal leaderboard | LLM SDK chat + PandasAI, AI insights |
| 2 | AWS pipelines, 500+ tickers, vector storage | RAG infrastructure, embedding pipelines |
| 3 | ML predictions, ensemble models | Local LLMs (Ollama), fine-tuned financial models |
| 4 | Agentic AI trading system | MCP + LangGraph + Multi-agent orchestration |
| 5 | Production deployment + evaluation | LLMOps testing, CI/CD for AI, monitoring |
Phase 1A (Active): Dynamic stock screener • Alternative data collection (SEC, Wikipedia, News) • Statistical backtesting with bootstrap confidence • Trigger leaderboard
Phase 1B (Next): Streamlit dashboard + LLM SDK chat interface + PandasAI + AI-generated commentary + Natural language queries
What makes it defensible: Walk-forward validation • Survivorship bias controls • Modern stack (DuckDB, Parquet) • Progressive architecture through all career stages
Tech: Python • DuckDB • Parquet • httpx async • edgartools • yfinance • Wikipedia API • Gemini SDK • PandasAI • Streamlit • DeepEval • Docker • GitHub Actions CI
New skills introduced: + Statistical methodology, DuckDB lakehouse, async data collection, multi-source alternative data
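The "bootstrap confidence" step in Phase 1A can be sketched with the stdlib alone. The trigger history below is invented; the structure (resample outcomes with replacement, take the 2.5th/97.5th percentiles of the resampled hit rates) is the standard percentile bootstrap:

```python
import random
import statistics

def bootstrap_ci(outcomes: list[int], n_resamples: int = 2000, seed: int = 42):
    """95% percentile-bootstrap confidence interval for a trigger's hit rate.

    outcomes: 1 if the stock moved +10% within 3 trading days of the trigger, else 0.
    """
    rng = random.Random(seed)  # seeded for reproducible backtests
    rates = sorted(
        statistics.mean(rng.choices(outcomes, k=len(outcomes)))
        for _ in range(n_resamples)
    )
    return rates[int(0.025 * n_resamples)], rates[int(0.975 * n_resamples)]

# Illustrative trigger history: 14 hits out of 40 events
history = [1] * 14 + [0] * 26
low, high = bootstrap_ci(history)
print(f"Hit rate {statistics.mean(history):.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

Reporting the interval rather than the raw hit rate is what makes a trigger leaderboard defensible: a 35% hit rate over 40 events carries far more uncertainty than the point estimate suggests.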
```
learning_journey/
│
├── 📄 README.md # This file - Complete overview with GenAI-first positioning
│
├── 📂 projects/ # ⭐ Project directory (links to separate repos)
│ └── README.md # Comprehensive project index
│ ├── 1099 ETL Pipeline (production, public, $15K savings)
│ ├── DataVault Analyst (first AI project, PII-safe analytics)
│ ├── PolicyPulse (RAG chatbot, citations, ticket escalation)
│ ├── FormSense (multimodal AI, document intelligence)
│ ├── Operations-Demand-Intelligence (enterprise analytics, AI chat)
│ ├── StreamSmart Optimizer (consumer AI, API integration)
│ └── Attention-Flow Catalyst (flagship, 5-stage evolution)
│
├── 📂 getting-started/ # For new visitors
│ ├── README.md # Navigation & overview
│ ├── SETUP_GUIDE.md # Complete dev environment + AI tools
│ ├── environment-verification.py # Test your setup
│ └── prerequisites.md # What you need to begin
│
├── 📂 courses/ # Course-specific materials
│ ├── cs50_harvard/ # CS50 work & notes
│ │
│ ├── python_for_everybody/ # Python course materials
│ │ ├── code/ # Practice scripts (AI-enhanced)
│ │ ├── experiments/ # Enhanced exercises
│ │ ├── notebooks/ # Jupyter notebooks
│ │ └── notes/ # Course notes by module
│ │
│ ├── datacamp_data_analyst/ # DataCamp Data Analyst Track
│ │ ├── notebooks/ # Practice notebooks
│ │ └── README.md # Track progress
│ │
│ ├── vanderbilt_genai_analyst/ # 🤖 GenAI-Powered Analysis
│ │ ├── chatgpt_workflows/ # CLUE/TRUST/CAPTURE frameworks
│ │ ├── prompt_engineering/ # Prompt patterns & examples
│ │ └── README.md # Course progress
│ │
│ ├── ibm_genai_engineering/ # 🤖 IBM GenAI Engineering (16 courses)
│ │
│ ├── ibm_data_analyst/ # IBM course materials
│ │
│ └── sql_mode_thoughtspot/ # SQL practice
│
├── 📂 certifications/ # Certificate tracking
│ ├── README.md # All certifications overview
│ └── in-progress/ # Current progress tracking
│ ├── python-for-everybody-progress.md
│ ├── google-data-analytics-progress.md
│ ├── ibm-data-analyst-progress.md
│ ├── ibm-genai-engineering-progress.md # 🤖 GenAI Engineering cert
│ ├── vanderbilt-genai-analyst-progress.md
│ └── statistics-with-python-progress.md
│
├── 📂 docs/ # Documentation & guides
│ ├── index.html # GitHub Pages landing page
│ ├── roadmap.html # Interactive 37-month GenAI-first roadmap (v8.2)
│ └── activation-plans/ # Structured learning guides
│ ├── README.md # Guide overview
│ ├── WEEK_01_MASTER_ACTIVATION_PLAN.md
│ ├── DAILY_ROUTINE_GUIDE.md
│ ├── ACCEPTANCE_CRITERIA.md
│ └── WEEK_01_QUICK_REFERENCE.md
│
├── 📂 notes/ # Learning journal
│ ├── week1_summary.md # Weekly progress summaries
│ ├── ai_tools_exploration.md # 🤖 AI tools learning notes
│ ├── trading_ideas.md # Trading research
│ └── learning-journal.md # Daily reflections
│
├── .gitignore # Ignore .venv, cache, etc.
├── .vscode/ # VS Code settings
└── requirements.txt # Python dependencies
```
Systematic progression with GenAI/LLM engineering at every stage. Income secured from Stage 1 onward.
Core Skills: Python • SQL • Statistics • Visualization • Power BI/Tableau
GenAI Skills: LLM SDKs (Gemini, OpenAI, Claude) • RAG (ChromaDB) • Multimodal AI (Gemini Vision) • Pydantic • Streamlit • PandasAI • Cursor AI • Prompt Engineering
Evaluation: DeepEval + pytest • RAGAS (RAG Triad) • LangSmith
Containerization: Docker fundamentals (KodeKloud)
Learning Path:
- CS50 (Harvard) - Computer Science fundamentals
- Python for Everybody (University of Michigan)
- Google Data Analytics Professional Certificate
- IBM Data Analyst Professional Certificate
- Statistics with Python (University of Michigan)
- 🤖 IBM Generative AI Engineering Professional Certificate (16 courses) — RAG, LangChain, fine-tuning, deployment
- 🤖 AI Python for Beginners (DeepLearning.AI) — Andrew Ng's AI-first Python
- 🤖 Generative AI Data Analyst Specialization (Vanderbilt)
- 🤖 ChatGPT Prompt Engineering (DeepLearning.AI)
- 🤖 30 Days of Streamlit Challenge — Build AI UIs fast
- 🧪 Building & Evaluating Advanced RAG (DeepLearning.AI) — RAG Triad metrics, evaluation-driven development
- 🐳 Docker for Beginners with Hands-on Labs (KodeKloud/Coursera) — Containerization fundamentals
Key Deliverables (7 projects, easy → flagship):
- 1099 ETL Pipeline ✅ — Production system, $15K savings (Foundation: ETL + Testing + CI/CD)
- DataVault Analyst — First AI project (+ LLM SDK, Pydantic, PII handling)
- PolicyPulse — RAG foundation (+ Embeddings, ChromaDB, semantic search)
- FormSense — Document intelligence (+ Multimodal AI, Vision LLM)
- Operations-Demand-Intelligence 🚧 — Enterprise analytics (+ real data, advanced analytics)
- StreamSmart Optimizer — Consumer AI app (+ external APIs, optimization engine)
- Attention-Flow Catalyst 🚀 — Flagship (+ statistical methodology, DuckDB, async)
Outcome: GenAI-First Data Analyst & AI Engineer position
Core Skills: AWS • Airflow • PySpark • PostgreSQL • BigQuery • Data warehousing
AI Systems Skills: Vector DBs (Pinecone/Weaviate/Qdrant) • RAG infrastructure • Embedding pipelines • Unstructured data ETL
Containerization: Docker & Kubernetes Masterclass (Months 12-13)
Key Deliverable: All 7 projects evolve — Cloud deployment, production databases, scheduled pipelines
Core Skills: scikit-learn • TensorFlow/Keras • PyTorch • MLOps • NVIDIA DLI certification
LLM Skills: Ollama (local LLMs) • Fine-tuning (LoRA/QLoRA/PEFT) • On-premise AI for finance
Key Deliverable: Fine-tuned financial LLM solving finance's data privacy problem
Core Skills: Advanced LLM architecture • System design • Production deployment
Agentic Skills: MCP (Anthropic) • LangGraph • CrewAI • Andrew Ng's Agentic AI • Multi-agent orchestration
Key Deliverable: AI Trading Assistant with multi-agent collaboration (research + analysis + execution agents)
Core Skills: Production AI architecture • Thought leadership • System design interviews
Evaluation Skills: Automated Testing for LLMOps • CI/CD for AI • Production monitoring
Key Deliverable: Production-grade AI Trading Platform with evaluation-driven development
Final Target: Senior LLM Engineer ($180-250K+) with advanced expertise and global opportunities.
Development:
- Cursor AI IDE - Primary editor with AI pair programming (Composer mode)
- VS Code + Codeium - Secondary environment with code completion
GenAI Engineering:
- LLM SDKs (Gemini, OpenAI, Claude) - Provider-agnostic API integration for production AI systems
- Pydantic - Structured output validation for all AI responses
- ChromaDB - Vector store for RAG pipelines (PolicyPulse)
- Gemini Vision SDK - Multimodal AI for document understanding (FormSense)
- LangChain - Framework for building GenAI applications
- Streamlit - AI-powered web app interfaces
- PandasAI - Natural language data querying for dashboard integration
AI Evaluation (v8.2 Cross-Project Standard):
- DeepEval + pytest - Evaluation-driven development integrated into CI/CD for all projects
- RAGAS - RAG Triad metrics (Context Relevance, Groundedness, Answer Relevance) for PolicyPulse
- LangSmith - LLM observability, tracing, and debugging for StreamSmart and beyond
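The evaluation-driven pattern these tools share is simple: compute a metric over an LLM output and fail the build if it falls below a threshold. A dependency-free stand-in (this is *not* the DeepEval or RAGAS API; the word-overlap "groundedness" score below is a deliberately crude illustration of the shape of such a test):

```python
def groundedness(answer: str, context: str) -> float:
    """Crude stand-in for a groundedness metric: share of answer words found in context.

    Real projects use DeepEval/RAGAS metrics; this only illustrates the
    assert-on-a-threshold pattern that runs in CI.
    """
    answer_words = {w.strip(".,").lower() for w in answer.split()}
    context_words = {w.strip(".,").lower() for w in context.split()}
    return len(answer_words & context_words) / max(len(answer_words), 1)

def test_answer_is_grounded():
    context = "Employees accrue 15 days of PTO per year, prorated monthly."
    answer = "Employees accrue 15 days of PTO per year."
    assert groundedness(answer, context) >= 0.9  # fail the build if below threshold

test_answer_is_grounded()  # under pytest, this function is discovered and run in CI
```

Swapping the toy metric for a DeepEval or RAGAS metric keeps the same structure: a pytest test per quality dimension, gated in GitHub Actions.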
Containerization:
- Docker - Dockerfile for every portfolio project (Stage 1 fundamentals via KodeKloud course)
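The per-project Dockerfiles typically look like the minimal sketch below; the base image, file names, and port are illustrative defaults for a Streamlit app, not any specific repo's actual file:

```dockerfile
# Minimal per-project Dockerfile sketch (illustrative, not a specific repo's file)
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8501  # Streamlit's default port; adjust per project
CMD ["streamlit", "run", "app.py"]
```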
Analysis:
- ChatGPT Plus - Advanced Data Analysis, code generation, debugging
Learning:
- IBM GenAI Engineering Professional Certificate (RAG, LangChain, fine-tuning, deployment) — Stage 1 primary
- Vanderbilt GenAI Specialization (CLUE/TRUST/CAPTURE frameworks)
- DeepLearning.AI Prompt Engineering (API integration, production patterns)
- AI Python for Beginners (Andrew Ng's AI-first Python foundation)
- Building & Evaluating Advanced RAG (DeepLearning.AI) — RAG Triad evaluation with TruLens/DeepEval
- Docker for Beginners with Hands-on Labs (KodeKloud/Coursera) — Containerization fundamentals
Transparency: Document AI assistance in commits and comments
Validation: Always test AI-generated code
Production: Implement guardrails (read-only, cost controls, disclaimers)
Progressive: Expand GenAI capabilities systematically across stages
| Stage | AI Tools & Frameworks |
|---|---|
| 2 | Vector DBs (Pinecone/Weaviate) + RAG infrastructure + Embedding pipelines + Docker & Kubernetes Masterclass + BigQuery |
| 3 | Ollama (local LLMs) + Fine-tuning (LoRA/QLoRA/PEFT) + Generative AI with LLMs (AWS) + NVIDIA DLI |
| 4 | MCP (Anthropic) + LangGraph + CrewAI + Andrew Ng's Agentic AI + Multi-agent systems |
| 5 | Automated Testing for LLMOps + CI/CD for AI + Production evaluation + Monitoring |
Languages: Python 3.11+, SQL
Data: pandas, NumPy, Matplotlib, Seaborn, Plotly
Databases: SQLite, DuckDB, ChromaDB, PostgreSQL
AI/GenAI: LLM SDKs (Gemini, OpenAI, Claude), Gemini Vision, Pydantic, LangChain, Streamlit, PandasAI, Cursor AI, ChatGPT Plus
Evaluation: DeepEval + pytest, RAGAS (RAG Triad), LangSmith observability
Containerization: Docker (Stage 1 fundamentals), Docker & Kubernetes (Stage 2)
Platforms: Coursera, DataCamp, DeepLearning.AI
```shell
# Clone repository
git clone https://github.com/manuel-reyes-ml/learning_journey.git
cd learning_journey

# See detailed setup guide (includes AI tools setup)
open getting-started/SETUP_GUIDE.md

# Verify environment
python getting-started/environment-verification.py
```

Setup: See Complete Setup Guide
GenAI-Enhanced Practice: Daily coding with AI pair programming, but always understanding and validating outputs
Production-First: Every project built to production standards with proper error handling, testing, and documentation
Transparent Integration: Document when/how AI assists, show reasoning not just outputs
Enhancement Always: Never just complete exercises—optimize, expand, and apply to real-world scenarios
Domain Application: Every skill applied to trading/finance domain for authentic learning
Systematic Progression: Clear 37-month path with measurable milestones
Every exercise is enhanced with additional functionality, error handling, testing, and real-world application. Now with AI assistance documented transparently.
Standard Approach:
```python
# Calculate average
numbers = [1, 2, 3, 4, 5]
average = sum(numbers) / len(numbers)
print(f"Average: {average}")
```

My GenAI-Enhanced Approach:
```python
import statistics
from collections import Counter


def calculate_statistics(data: list[float], include_outliers: bool = True) -> dict:
    """
    Calculate comprehensive statistics with multiple methods.

    Args:
        data: List of numeric values
        include_outliers: Whether to include outlier analysis

    Returns:
        dict: Statistics including mean, median, mode, std dev

    Note: Developed with Cursor AI assistance for statistical functions
    """
    if not data:
        raise ValueError("Cannot calculate statistics on empty dataset")

    stats = {
        'mean': statistics.mean(data),
        'median': statistics.median(data),
        'mode': statistics.mode(data) if len(Counter(data)) < len(data) else None,
        'std_dev': statistics.stdev(data) if len(data) > 1 else 0,
        'range': (min(data), max(data)),
    }

    if include_outliers:
        q1, _, q3 = statistics.quantiles(data, n=4)
        iqr = q3 - q1
        stats['outliers'] = [x for x in data
                             if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

    return stats


# Apply to real trading data
stock_returns = [0.05, 0.03, -0.02, 0.04, 0.01, 0.15]  # 15% is a potential outlier
analysis = calculate_statistics(stock_returns)
print(f"Return Analysis: {analysis}")
```

25 hours/week structured as:
- Mornings (4:30-6 AM): Theory, lectures, reading
- Evenings (8-10 PM): Hands-on coding with AI tools, projects
- Weekends: Deep work on complex projects and integration
Sustainable pace designed for 37-month journey while working full-time.
Active Stage: 1 of 5 (GenAI-First Data Analyst & AI Engineer)
Projects: 1 deployed (production), 1 in development (ODI), 5 scoped and queued
Total Pipeline: 7 production-grade projects (easy → flagship)
Certifications: 8 in progress (including 3 GenAI-focused)
Study Hours: 25/week consistent
Next Milestones:
- Complete DataVault Analyst (first AI project to publish)
- Launch PolicyPulse (RAG foundation)
- Launch FormSense (Multimodal AI)
- Complete Operations-Demand-Intelligence (enterprise analytics)
- Build StreamSmart Optimizer (consumer AI app)
- Complete Attention-Flow Catalyst Phase 1A & 1B (flagship)
- Finish 5 core certifications + IBM GenAI Engineering cert
- Secure GenAI-First Data Analyst & AI Engineer role
Professional:
- LinkedIn: Manuel Reyes
- GitHub: @manuel-reyes-ml
- Email: manuelreyesv410@gmail.com
Portfolio:
- Data Portfolio Repository
- 1099 ETL Pipeline ✅ Production
- DataVault Analyst — First AI Project
- PolicyPulse — RAG Foundation
- FormSense — Document Intelligence
- Operations-Demand-Intelligence 🚧
- StreamSmart Optimizer — Consumer AI
- Attention-Flow Catalyst 🚀 Flagship
Open To:
- 💼 GenAI-First Data Analyst & AI Engineer opportunities (remote preferred)
- 🤝 Networking with data professionals and traders
- 🤖 GenAI tool and workflow discussions
- 💡 Code reviews and technical discussions
- 🎓 Mentorship (giving or receiving)
Welcome:
- Code quality feedback and best practices
- GenAI integration approaches and tool recommendations
- Trading strategy discussions
- Career advice and networking
- Collaboration on projects
How:
- Open GitHub issues for technical discussions
- Connect on LinkedIn for professional networking
- Comment on commits with feedback
- Share your own GenAI-powered learning journey
This repository documents a complete career transformation: from business ops professional to Senior LLM Engineer, with GenAI/LLM engineering from Day 1.
What this represents:
- 37-month systematic journey (5,000+ hours)
- 7 production-grade projects demonstrating progressive skill mastery
- Production systems with measurable business impact
- GenAI-first approach positioning ahead of traditional candidates
- Foundation for six-figure remote tech career
- Path to building revenue-generating AI systems
- Demonstration that structured learning + GenAI integration enables career reinvention
Ultimate goal: Production AI Trading Assistant combining deep finance expertise with cutting-edge agentic AI capabilities.
Real-time documentation of a GenAI-first career transformation from Day 1 to Senior LLM Engineer.
- ⭐ Star this repository to follow the journey
- 🔔 Watch for updates on GenAI integration and project progress
- 🔗 Connect for professional discussions and collaboration
Current Stage: GenAI-First Data Analyst & AI Engineer (1 of 5) | Building GenAI-Enhanced Foundations
Status: 🟢 Active • Learning in Public • Deploying Production Systems