manuel-reyes-ml/learning_journey
🚀 Learning Journey: Business Ops Professional → Senior LLM Engineer

GenAI-First Career Transformation | Production Code from Day 1 | Systematic 37-Month Journey

Current Stage: GenAI-First Data Analyst & AI Engineer (Stage 1 of 5)
Next Milestone: Land Data Analyst & AI Engineer role with GenAI integration skills
Ultimate Goal: Senior LLM Engineer building production AI Trading Assistant
Study Commitment: 25 hours/week systematic learning


📋 View Complete 37-Month Interactive Roadmap (v8.2) →


🗺️ Quick Navigation

👔 For Recruiters / Hiring Managers:

  1. 💼 Production Projects → - 7 production-grade projects, including a live ETL system ⭐ START HERE
  2. 🤖 GenAI-First Differentiation → - What sets this apart
  3. 📊 Complete Roadmap → - 37-month visualization
  4. 🔗 LinkedIn → - Professional background

🎓 For Fellow Learners:

  1. 🤖 AI Integration Strategy → - My AI stack and approach
  2. 📚 Repository Structure → - Course materials organization
  3. 💡 Learning Philosophy → - Core principles and approach
  4. 🛠️ Setup Guides → - Environment configuration

💪 What Makes This Different

Most learning repositories: Tutorial completions and course exercises with no real-world application.

This repository demonstrates:

  • Production system deployed - Live ETL pipeline saving $15K/year with public code
  • 7 production-grade projects - From ETL foundations to RAG, Multimodal AI, and statistical research systems
  • GenAI integration from Day 1 - LLM SDKs (Gemini, OpenAI, Claude), RAG, Multimodal AI, Pydantic structured outputs, PandasAI, Cursor AI
  • Evaluation-driven development - DeepEval + pytest integrated into every project; RAGAS RAG Triad metrics; Docker containerization across all repos
  • Skills progression by design - Each project introduces new capabilities that build on the previous
  • Domain expertise - 15+ years data experience, 8 years finance, 6 years trading
  • Measurable business impact - 95% efficiency gains, documented results
  • Systematic GenAI-first progression - Clear 37-month path with GenAI/LLM engineering at every stage

The key differentiator: Already delivering production value while building toward LLM engineering, with transparent GenAI integration throughout.


🤖 GenAI-First Differentiation (2026 Market Advantage)

In 2026, GenAI engineering is essential for data professionals. While most candidates learn traditional tools only, this journey integrates GenAI/LLM engineering systematically from the start.

The GenAI-First Framework

Each stage combines traditional data skills with GenAI augmentation:

Stage 1: GenAI-First Data Analyst & AI Engineer 🟢 ACTIVE

Foundation: Python, SQL, Statistics, Visualization
+ GenAI Layer: IBM GenAI Engineering cert, LLM SDKs (Gemini, OpenAI, Claude), RAG, Multimodal AI, Pydantic, Streamlit, PandasAI, Cursor AI
+ Evaluation Layer: DeepEval + pytest integration, RAGAS (RAG Triad metrics), LangSmith observability
+ Containerization: Docker fundamentals, Dockerfile for every project
= Result: AI-powered dashboards with natural language interfaces + production GenAI applications with evaluation-driven development

Stage 2: GenAI Data Engineer + AI Systems Architect 📅 PLANNED

Foundation: AWS, Airflow, PySpark, PostgreSQL, BigQuery
+ AI Systems Layer: Vector DBs (Pinecone/Weaviate/Qdrant), RAG infrastructure, embedding pipelines
+ Containerization: Docker & Kubernetes Masterclass, production container orchestration
= Result: AI-first data pipelines feeding LLM systems with unstructured data ETL

Stage 3: ML Engineer + Local LLM Specialist 📅 PLANNED

Foundation: scikit-learn, TensorFlow/Keras, PyTorch, MLOps, NVIDIA DLI certification
+ LLM Layer: Ollama (local LLMs), fine-tuning (LoRA/QLoRA/PEFT), on-premise AI for finance
= Result: Private AI systems solving finance's data privacy problem

Stage 4: Agentic AI Engineer & LLM Specialist 📅 PLANNED

Foundation: Advanced LLM architecture, system design
+ Agentic Layer: MCP (Anthropic), LangGraph, CrewAI, Andrew Ng's Agentic AI, multi-agent orchestration
= Result: Autonomous AI trading systems with multi-agent collaboration

Stage 5: Senior LLM Engineer 📅 PLANNED

Foundation: Production architecture, thought leadership
+ Evaluation Layer: Automated Testing for LLMOps, CI/CD for AI, production monitoring
= Result: Senior-level AI systems with evaluation-driven development ($180-250K+)

Why This Matters

Traditional Path | GenAI-First Path (This Journey)
Learn tools → Get job → Maybe add AI later | Learn tools + GenAI together → Land AI-ready role
Positioned with majority of candidates | Positioned ahead of 95% of candidates
Standard market rates | 15-20% salary premium for GenAI skills
Limited future trajectory | Clear path to Senior LLM Engineer ($180-250K+)

Market reality: The agentic AI market is exploding in 2026 with MCP (Anthropic) and A2A (Google) protocols becoming industry standards. Companies need professionals who BUILD production AI systems—not just prompt ChatGPT.


🏆 Production & Portfolio Highlights

7 projects ordered by skills progression — each builds on the previous, from ETL foundations to flagship research system.

🏗️ Production GitHub Standard (v8.2): Every project ships with an architecture diagram (Mermaid), a Dockerfile, an evaluation metrics table (DeepEval + pytest), a demo GIF, and a "What I Learned" section.

1. 1099 Reconciliation ETL Pipeline ✅ Live Production

Automated Python ETL pipeline reconciling retirement plan distribution data between Relius and Matrix financial systems at Daybright Financial.

Business Challenge: Manual reconciliation took 4-6 hours weekly, was error-prone, and blocked critical 1099-R tax reporting deadlines.

Impact: 95% time reduction (4-6 hours → 15 min/week) | $15,000+ annual savings | 10x scalability | Zero errors

Tech: Python • pandas • openpyxl • Excel • Matplotlib • pytest • GitHub Actions CI • faker (synthetic data)

Skills established: ETL pipelines, testing, CI/CD, production deployment


2. DataVault Analyst ⭐ First AI Project

AI-Powered PII-Safe Data Intelligence | "Chat With Your Data"

Natural language analytics for retirement plan operations with PII protection, AI guardrails, and code transparency.

Business Challenge: Operations teams need to extract insights from Excel data containing sensitive PII (SSN, names, DOB), but manual Excel filtering is slow, error-prone, and creates PII exposure risk.

AI-Powered Solution:

Feature | Implementation
AI Chat | LLM SDK (provider-agnostic) + PandasAI with generated code visibility
PII Protection | Governance-as-code: PII leak prevention in AI responses
Hybrid Analytics | Pre-built dashboards + AI chat (works even without API key)
Structured Outputs | Pydantic-validated AI responses with type-safe schemas

Tech: Python • pandas • Streamlit • Gemini SDK • PandasAI • Pydantic • DeepEval • Docker • GitHub Actions CI

New skills introduced: + LLM SDK, PandasAI, Streamlit, Pydantic structured outputs, PII handling
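The structured-output layer can be sketched with Pydantic. This is a minimal illustration, not the project's actual code: the `AnalyticsAnswer` schema, its field names, and the sample JSON are all hypothetical.

```python
import json

from pydantic import BaseModel, ValidationError


class AnalyticsAnswer(BaseModel):
    """Hypothetical schema an LLM response must satisfy before reaching the UI."""

    answer: str
    generated_code: str  # pandas code the AI ran, surfaced for transparency
    contains_pii: bool = False


# Raw JSON as it might come back from an LLM SDK call (illustrative payload)
raw = '{"answer": "Average balance is 42150", "generated_code": "df[\'balance\'].mean()", "contains_pii": false}'

try:
    parsed = AnalyticsAnswer(**json.loads(raw))
except (ValidationError, json.JSONDecodeError):
    parsed = None  # reject malformed AI output instead of rendering it
```

Validating before display is what makes the output "type-safe": a response missing a field, or with the wrong type, never reaches the dashboard.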


3. PolicyPulse 🧠 RAG Foundation

AI-Powered HR Policy Chatbot | "Ask Your Policies"

RAG chatbot that answers employee policy questions with cited sources and auto-escalates to HR when the AI is uncertain.

Business Challenge: Employees spend hours searching through policy documents for answers to common questions. HR teams are overwhelmed by repetitive inquiries about PTO, benefits, and overtime rules.

AI-Powered Solution:

Feature | Implementation
Semantic Search | Embeddings + ChromaDB vector store + similarity scoring
Cited Answers | Every response cites specific policy section & document
Smart Escalation | Confidence < 0.7 → auto-generate HR ticket with context
RAG Pipeline | Document → Chunk → Embed → Retrieve → Generate

Tech: Python • ChromaDB • Gemini Embeddings • Streamlit • Pydantic • DeepEval • RAGAS • Docker • GitHub Actions CI

New skills introduced: + Embeddings, ChromaDB, RAG pipeline, semantic search, ticket escalation, RAG Triad evaluation
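The retrieve-then-escalate flow can be sketched in plain Python. The 0.7 threshold comes from the feature description above; `Retrieval` and `answer_or_escalate` are illustrative names, not the project's API.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.7  # below this, auto-generate an HR ticket


@dataclass
class Retrieval:
    text: str          # retrieved policy chunk
    source: str        # document + section, used for the citation
    similarity: float  # similarity score from the vector store


def answer_or_escalate(retrievals: list[Retrieval]) -> dict:
    """Cite the best-matching chunk, or escalate when retrieval confidence is low."""
    best = max(retrievals, key=lambda r: r.similarity, default=None)
    if best is None or best.similarity < CONFIDENCE_THRESHOLD:
        return {"action": "escalate_to_hr", "reason": "low retrieval confidence"}
    return {"action": "answer", "citation": best.source, "context": best.text}
```

The key design choice is that uncertainty is handled in code, not left to the LLM: a weak retrieval never produces an uncited answer.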


4. FormSense 📄 Document Intelligence

AI-Powered Distribution Form Validator | "From Paper to Processing"

Multimodal AI system that reads retirement plan distribution forms (handwritten checkboxes, signatures), validates against business rules, and routes the result: complete forms generate processing tickets, incomplete forms trigger advisor emails.

Business Challenge: Distribution form intake requires manual reading of handwritten forms with checkboxes, signatures, and complex fields. Errors in extraction create compliance risk for ERISA-regulated retirement plans.

AI-Powered Solution:

Feature | Implementation
Vision AI | Gemini Vision reads checkboxes, handwriting, printed text
Validation | Business rule engine for ERISA-regulated distribution processing
Smart Routing | Complete → operations ticket | Incomplete → email to advisor
Confidence | Field-level extraction confidence scoring

Tech: Python • Gemini Vision SDK • Streamlit • Pydantic • DeepEval • Docker • GitHub Actions CI

New skills introduced: + Multimodal AI (Vision LLM), form extraction, business rule validation, email automation
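The routing rule can be sketched as a small function over the vision model's output. The field names and the 0.85 threshold are hypothetical, chosen only to illustrate the complete-vs-incomplete decision.

```python
REQUIRED_FIELDS = {"participant_name", "distribution_type", "signature_present"}
MIN_CONFIDENCE = 0.85  # hypothetical field-level extraction threshold


def route_form(extracted: dict[str, tuple[object, float]]) -> str:
    """Route a form based on (value, confidence) pairs from the vision model.

    Complete, high-confidence forms become operations tickets; anything
    missing or uncertain goes back to the advisor by email.
    """
    missing = REQUIRED_FIELDS - extracted.keys()
    low_confidence = [f for f, (_, conf) in extracted.items() if conf < MIN_CONFIDENCE]
    return "email_advisor" if missing or low_confidence else "create_ticket"
```

Treating low confidence the same as a missing field keeps compliance-sensitive extractions out of the automated path.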


5. Operations-Demand-Intelligence 📊 Enterprise Analytics | 🚧 In Development

AI-Powered Workflow Demand Analysis for data-driven staffing decisions using OnBase enterprise data.

Analyzing 8+ months of workflow data to enable intelligent resource allocation with AI-powered natural language insights.

Business Challenge:
Operations teams lack visibility into workflow demand patterns, leading to reactive staffing and resource inefficiencies. No data-driven approach for Distribution vs Loan workflow segmentation.

AI-Powered Solution:

Feature | Implementation | AI Enhancement
Demand Analysis | Volume patterns, Distribution vs Loan segmentation | AI-powered trend detection, anomaly alerts
Interactive Dashboard | Streamlit with Plotly visualizations | LLM SDK + PandasAI chat: "Why did loan volume spike in March?"
Insights Generation | Traditional business metrics | AI-generated commentary for stakeholders
Data Privacy | PII handling, synthetic data for GitHub | AI with privacy guardrails, read-only access

Tech: Python • pandas • Streamlit • Gemini SDK • PandasAI • Plotly • DeepEval • Docker • GitHub Actions CI

New skills introduced: + Enterprise real data integration, advanced analytics, stakeholder reporting


6. StreamSmart Optimizer 📺 Consumer AI App

AI-Powered Streaming Subscription Rotation Advisor | "Spend Less, Watch More"

Consumer-facing dashboard that helps households optimize streaming subscriptions through AI-driven rotation scheduling, cost-per-view analytics, and content search via live APIs.

Business Challenge: 36% of U.S. streaming subscribers already rotate services to cut costs (Antenna Research, 2025), but manage it with spreadsheets. No existing tool combines AI rotation planning + content search + cost analytics.

AI-Powered Solution:

Feature | Implementation
Content Search | Watchmode + TMDB API integration ("Where can I watch X?")
AI Rotation Planner | LLM analyzes habits + content calendar → optimal schedule
Savings Engine | Cost-per-view analytics + annual savings projections
Guardrails | Price validation, financial disclaimers, scope limits

Tech: Python • httpx async • Watchmode/TMDB APIs • Streamlit • Gemini SDK • Pydantic • DeepEval • LangSmith • Docker • GitHub Actions CI

New skills introduced: + External API integration, consumer UX, optimization algorithms, async HTTP
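The savings-engine math is simple enough to sketch directly; both function names are illustrative, not the project's API.

```python
def cost_per_view(monthly_price: float, views: int) -> float:
    """Dollars paid per title actually watched; infinite if the service went unused."""
    return monthly_price / views if views else float("inf")


def annual_rotation_savings(prices: dict[str, float], active_months: dict[str, int]) -> float:
    """Savings from paying only for the months each service is actually in use."""
    always_on = sum(price * 12 for price in prices.values())
    rotated = sum(prices[s] * active_months.get(s, 12) for s in prices)
    return always_on - rotated
```

For example, keeping a $15/month service for only 4 months of the year instead of 12 saves $120 on that service alone.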


7. Attention-Flow Catalyst 🚀 Flagship

Research Question: Which trigger or combination best predicts +10% price moves within 3 trading days?

Flagship project evolving through all 5 career stages:

Stage | Focus | AI Integration
1 (Active) | Statistical backtesting, signal leaderboard | LLM SDK chat + PandasAI, AI insights
2 | AWS pipelines, 500+ tickers, vector storage | RAG infrastructure, embedding pipelines
3 | ML predictions, ensemble models | Local LLMs (Ollama), fine-tuned financial models
4 | Agentic AI trading system | MCP + LangGraph + Multi-agent orchestration
5 | Production deployment + evaluation | LLMOps testing, CI/CD for AI, monitoring

Phase 1A (Active): Dynamic stock screener • Alternative data collection (SEC, Wikipedia, News) • Statistical backtesting with bootstrap confidence • Trigger leaderboard

Phase 1B (Next): Streamlit dashboard + LLM SDK chat interface + PandasAI + AI-generated commentary + Natural language queries

What makes it defensible: Walk-forward validation • Survivorship bias controls • Modern stack (DuckDB, Parquet) • Progressive architecture through all career stages

Tech: Python • DuckDB • Parquet • httpx async • edgartools • yfinance • Wikipedia API • Gemini SDK • PandasAI • Streamlit • DeepEval • Docker • GitHub Actions CI

New skills introduced: + Statistical methodology, DuckDB lakehouse, async data collection, multi-source alternative data
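The "bootstrap confidence" step in Phase 1A can be sketched with a percentile bootstrap over trigger outcomes. This is a stdlib-only illustration of the technique, not the project's actual backtesting code.

```python
import random


def bootstrap_hit_rate_ci(
    hits: list[int], n_boot: int = 2000, alpha: float = 0.05, seed: int = 7
) -> tuple[float, float]:
    """Percentile-bootstrap confidence interval for a trigger's hit rate.

    `hits` is 1 when the trigger preceded a +10% move within 3 trading
    days, else 0. Resampling with replacement estimates how much the
    observed hit rate could vary by chance.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(hits)
    means = sorted(sum(rng.choices(hits, k=n)) / n for _ in range(n_boot))
    lo = means[int(n_boot * alpha / 2)]
    hi = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi
```

A wide interval around a trigger's hit rate is exactly what keeps a noisy signal off the leaderboard.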


📂 Repository Structure

learning_journey/
│
├── 📄 README.md                          # This file - Complete overview with GenAI-first positioning
│
├── 📂 projects/                          # ⭐ Project directory (links to separate repos)
│   └── README.md                         # Comprehensive project index
│       ├── 1099 ETL Pipeline (production, public, $15K savings)
│       ├── DataVault Analyst (first AI project, PII-safe analytics)
│       ├── PolicyPulse (RAG chatbot, citations, ticket escalation)
│       ├── FormSense (multimodal AI, document intelligence)
│       ├── Operations-Demand-Intelligence (enterprise analytics, AI chat)
│       ├── StreamSmart Optimizer (consumer AI, API integration)
│       └── Attention-Flow Catalyst (flagship, 5-stage evolution)
│
├── 📂 getting-started/                   # For new visitors
│   ├── README.md                         # Navigation & overview
│   ├── SETUP_GUIDE.md                    # Complete dev environment + AI tools
│   ├── environment-verification.py       # Test your setup
│   └── prerequisites.md                  # What you need to begin
│
├── 📂 courses/                           # Course-specific materials
│   ├── cs50_harvard/                     # CS50 work & notes
│   │
│   ├── python_for_everybody/             # Python course materials
│   │   ├── code/                         # Practice scripts (AI-enhanced)
│   │   ├── experiments/                  # Enhanced exercises
│   │   ├── notebooks/                    # Jupyter notebooks
│   │   └── notes/                        # Course notes by module
│   │
│   ├── datacamp_data_analyst/            # DataCamp Data Analyst Track
│   │   ├── notebooks/                    # Practice notebooks
│   │   └── README.md                     # Track progress
│   │
│   ├── vanderbilt_genai_analyst/         # 🤖 GenAI-Powered Analysis
│   │   ├── chatgpt_workflows/            # CLUE/TRUST/CAPTURE frameworks
│   │   ├── prompt_engineering/           # Prompt patterns & examples
│   │   └── README.md                     # Course progress
│   │
│   ├── ibm_genai_engineering/            # 🤖 IBM GenAI Engineering (16 courses)
│   │
│   ├── ibm_data_analyst/                 # IBM course materials
│   │
│   └── sql_mode_thoughtspot/             # SQL practice
│
├── 📂 certifications/                    # Certificate tracking
│   ├── README.md                         # All certifications overview
│   └── in-progress/                      # Current progress tracking
│       ├── python-for-everybody-progress.md
│       ├── google-data-analytics-progress.md
│       ├── ibm-data-analyst-progress.md
│       ├── ibm-genai-engineering-progress.md  # 🤖 GenAI Engineering cert
│       ├── vanderbilt-genai-analyst-progress.md
│       └── statistics-with-python-progress.md
│
├── 📂 docs/                              # Documentation & guides
│   ├── index.html                        # GitHub Pages landing page
│   ├── roadmap.html                      # Interactive 37-month GenAI-first roadmap (v8.2)
│   └── activation-plans/                 # Structured learning guides
│       ├── README.md                     # Guide overview
│       ├── WEEK_01_MASTER_ACTIVATION_PLAN.md
│       ├── DAILY_ROUTINE_GUIDE.md
│       ├── ACCEPTANCE_CRITERIA.md
│       └── WEEK_01_QUICK_REFERENCE.md
│
├── 📂 notes/                             # Learning journal
│   ├── week1_summary.md                  # Weekly progress summaries
│   ├── ai_tools_exploration.md           # 🤖 AI tools learning notes
│   ├── trading_ideas.md                  # Trading research
│   └── learning-journal.md               # Daily reflections
│
├── .gitignore                            # Ignore .venv, cache, etc.
├── .vscode/                              # VS Code settings
└── requirements.txt                      # Python dependencies

🎯 The 37-Month GenAI-First Roadmap

Systematic progression with GenAI/LLM engineering at every stage. Income secured from Stage 1 onward.

Stage 1: GenAI-First Data Analyst & AI Engineer (Months 1-5) 🟢 ACTIVE

Core Skills: Python • SQL • Statistics • Visualization • Power BI/Tableau
GenAI Skills: LLM SDKs (Gemini, OpenAI, Claude) • RAG (ChromaDB) • Multimodal AI (Gemini Vision) • Pydantic • Streamlit • PandasAI • Cursor AI • Prompt Engineering
Evaluation: DeepEval + pytest • RAGAS (RAG Triad) • LangSmith
Containerization: Docker fundamentals (KodeKloud)

Learning Path:

  • CS50 (Harvard) - Computer Science fundamentals
  • Python for Everybody (University of Michigan)
  • Google Data Analytics Professional Certificate
  • IBM Data Analyst Professional Certificate
  • Statistics with Python (University of Michigan)
  • 🤖 IBM Generative AI Engineering Professional Certificate (16 courses) — RAG, LangChain, fine-tuning, deployment
  • 🤖 AI Python for Beginners (DeepLearning.AI) — Andrew Ng's AI-first Python
  • 🤖 Generative AI Data Analyst Specialization (Vanderbilt)
  • 🤖 ChatGPT Prompt Engineering (DeepLearning.AI)
  • 🤖 30 Days of Streamlit Challenge — Build AI UIs fast
  • 🧪 Building & Evaluating Advanced RAG (DeepLearning.AI) — RAG Triad metrics, evaluation-driven development
  • 🐳 Docker for Beginners with Hands-on Labs (KodeKloud/Coursera) — Containerization fundamentals

Key Deliverables (7 projects, easy → flagship):

  1. 1099 ETL Pipeline ✅ — Production system, $15K savings (Foundation: ETL + Testing + CI/CD)
  2. DataVault Analyst — First AI project (+ LLM SDK, Pydantic, PII handling)
  3. PolicyPulse — RAG foundation (+ Embeddings, ChromaDB, semantic search)
  4. FormSense — Document intelligence (+ Multimodal AI, Vision LLM)
  5. Operations-Demand-Intelligence 🚧 — Enterprise analytics (+ real data, advanced analytics)
  6. StreamSmart Optimizer — Consumer AI app (+ external APIs, optimization engine)
  7. Attention-Flow Catalyst 🚀 — Flagship (+ statistical methodology, DuckDB, async)

Outcome: GenAI-First Data Analyst & AI Engineer position


Stage 2: GenAI Data Engineer + AI Systems Architect (Months 6-15) 📅 Planned

Core Skills: AWS • Airflow • PySpark • PostgreSQL • BigQuery • Data warehousing
AI Systems Skills: Vector DBs (Pinecone/Weaviate/Qdrant) • RAG infrastructure • Embedding pipelines • Unstructured data ETL
Containerization: Docker & Kubernetes Masterclass (Months 12-13)

Key Deliverable: All 7 projects evolve — Cloud deployment, production databases, scheduled pipelines


Stage 3: ML Engineer + Local LLM Specialist (Months 16-29) 📅 Planned

Core Skills: scikit-learn • TensorFlow/Keras • PyTorch • MLOps • NVIDIA DLI certification
LLM Skills: Ollama (local LLMs) • Fine-tuning (LoRA/QLoRA/PEFT) • On-premise AI for finance

Key Deliverable: Fine-tuned financial LLM solving finance's data privacy problem


Stage 4: Agentic AI Engineer & LLM Specialist (Months 30-34) 📅 Planned

Core Skills: Advanced LLM architecture • System design • Production deployment
Agentic Skills: MCP (Anthropic) • LangGraph • CrewAI • Andrew Ng's Agentic AI • Multi-agent orchestration

Key Deliverable: AI Trading Assistant with multi-agent collaboration (research + analysis + execution agents)


Stage 5: Senior LLM Engineer (Months 35-37) 📅 Planned

Core Skills: Production AI architecture • Thought leadership • System design interviews
Evaluation Skills: Automated Testing for LLMOps • CI/CD for AI • Production monitoring

Key Deliverable: Production-grade AI Trading Platform with evaluation-driven development

Final Target: Senior LLM Engineer ($180-250K+) with advanced expertise and global opportunities.

📋 View Interactive Roadmap →


🤖 AI Tools & Workflows Integration

Current Stack (Stage 1)

Development:

  • Cursor AI IDE - Primary editor with AI pair programming (Composer mode)
  • VS Code + Codeium - Secondary environment with code completion

GenAI Engineering:

  • LLM SDKs (Gemini, OpenAI, Claude) - Provider-agnostic API integration for production AI systems
  • Pydantic - Structured output validation for all AI responses
  • ChromaDB - Vector store for RAG pipelines (PolicyPulse)
  • Gemini Vision SDK - Multimodal AI for document understanding (FormSense)
  • LangChain - Framework for building GenAI applications
  • Streamlit - AI-powered web app interfaces
  • PandasAI - Natural language data querying for dashboard integration

AI Evaluation (v8.2 Cross-Project Standard):

  • DeepEval + pytest - Evaluation-driven development integrated into CI/CD for all projects
  • RAGAS - RAG Triad metrics (Context Relevance, Groundedness, Answer Relevance) for PolicyPulse
  • LangSmith - LLM observability, tracing, and debugging for StreamSmart and beyond
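Evaluation-driven development means metrics run as tests. A minimal pytest-style sketch of the groundedness idea is below; this token-overlap score is a naive stand-in for illustration only, not the actual DeepEval or RAGAS metric.

```python
def groundedness(answer: str, contexts: list[str]) -> float:
    """Fraction of answer tokens that appear in the retrieved context (naive proxy)."""
    answer_tokens = set(answer.lower().split())
    context_tokens = set(" ".join(contexts).lower().split())
    return len(answer_tokens & context_tokens) / len(answer_tokens) if answer_tokens else 0.0


def test_answer_is_grounded():
    # Runs under pytest alongside the rest of the suite, so a regression
    # in answer quality fails CI just like a broken unit test.
    contexts = ["Employees accrue 15 PTO days per year."]
    assert groundedness("Employees accrue 15 PTO days per year.", contexts) >= 0.9
    assert groundedness("You get unlimited vacation!", contexts) < 0.5
```

The frameworks above replace this crude overlap score with LLM-judged metrics, but the CI/CD wiring is the same: assert on a score, fail the build when it drops.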

Containerization:

  • Docker - Dockerfile for every portfolio project (Stage 1 fundamentals via KodeKloud course)

Analysis:

  • ChatGPT Plus - Advanced Data Analysis, code generation, debugging

Learning:

  • IBM GenAI Engineering Professional Certificate (RAG, LangChain, fine-tuning, deployment) — Stage 1 primary
  • Vanderbilt GenAI Specialization (CLUE/TRUST/CAPTURE frameworks)
  • DeepLearning.AI Prompt Engineering (API integration, production patterns)
  • AI Python for Beginners (Andrew Ng's AI-first Python foundation)
  • Building & Evaluating Advanced RAG (DeepLearning.AI) — RAG Triad evaluation with TruLens/DeepEval
  • Docker for Beginners with Hands-on Labs (KodeKloud/Coursera) — Containerization fundamentals

Integration Principles

Transparency: Document AI assistance in commits and comments
Validation: Always test AI-generated code
Production: Implement guardrails (read-only, cost controls, disclaimers)
Progressive: Expand GenAI capabilities systematically across stages

Evolution Path

Stage | AI Tools & Frameworks
2 | Vector DBs (Pinecone/Weaviate) + RAG infrastructure + Embedding pipelines + Docker & Kubernetes Masterclass + BigQuery
3 | Ollama (local LLMs) + Fine-tuning (LoRA/QLoRA/PEFT) + Generative AI with LLMs (AWS) + NVIDIA DLI
4 | MCP (Anthropic) + LangGraph + CrewAI + Andrew Ng's Agentic AI + Multi-agent systems
5 | Automated Testing for LLMOps + CI/CD for AI + Production evaluation + Monitoring

💻 Development Environment

Languages: Python 3.11+, SQL
Data: pandas, NumPy, Matplotlib, Seaborn, Plotly
Databases: SQLite, DuckDB, ChromaDB, PostgreSQL
AI/GenAI: LLM SDKs (Gemini, OpenAI, Claude), Gemini Vision, Pydantic, LangChain, Streamlit, PandasAI, Cursor AI, ChatGPT Plus
Evaluation: DeepEval + pytest, RAGAS (RAG Triad), LangSmith observability
Containerization: Docker (Stage 1 fundamentals), Docker & Kubernetes (Stage 2)
Platforms: Coursera, DataCamp, DeepLearning.AI

# Clone repository
git clone https://github.com/manuel-reyes-ml/learning_journey.git
cd learning_journey

# See detailed setup guide (includes AI tools setup)
open getting-started/SETUP_GUIDE.md

# Verify environment
python getting-started/environment-verification.py

Setup: See Complete Setup Guide


💡 Learning Philosophy

Core Principles

GenAI-Enhanced Practice: Daily coding with AI pair programming, but always understanding and validating outputs

Production-First: Every project built to production standards with proper error handling, testing, and documentation

Transparent Integration: Document when/how AI assists, show reasoning not just outputs

Enhancement Always: Never just complete exercises—optimize, expand, and apply to real-world scenarios

Domain Application: Every skill applied to trading/finance domain for authentic learning

Systematic Progression: Clear 37-month path with measurable milestones

👨‍💻 Enhancement Philosophy

Beyond Basic Completion:

Every exercise is enhanced with additional functionality, error handling, testing, and real-world application. Now with AI assistance documented transparently.

Standard Approach:

# Calculate average
numbers = [1, 2, 3, 4, 5]
average = sum(numbers) / len(numbers)
print(f"Average: {average}")

My GenAI-Enhanced Approach:

def calculate_statistics(data: list[float], include_outliers: bool = True) -> dict:
    """
    Calculate comprehensive statistics with multiple methods.
    
    Args:
        data: List of numeric values
        include_outliers: Whether to include outlier analysis
    
    Returns:
        dict: Statistics including mean, median, mode, std dev
        
    Note: Developed with Cursor AI assistance for statistical functions
    """
    import statistics
    from collections import Counter
    
    if not data:
        raise ValueError("Cannot calculate statistics on empty dataset")
    
    stats = {
        'mean': statistics.mean(data),
        'median': statistics.median(data),
        'mode': statistics.mode(data) if len(Counter(data)) < len(data) else None,
        'std_dev': statistics.stdev(data) if len(data) > 1 else 0,
        'range': (min(data), max(data))
    }
    
    if include_outliers:
        q1 = statistics.quantiles(data, n=4)[0]
        q3 = statistics.quantiles(data, n=4)[2]
        iqr = q3 - q1
        stats['outliers'] = [x for x in data if x < (q1 - 1.5*iqr) or x > (q3 + 1.5*iqr)]
    
    return stats

# Apply to real trading data
stock_returns = [0.05, 0.03, -0.02, 0.04, 0.01, 0.15]  # 15% is potential outlier
analysis = calculate_statistics(stock_returns)
print(f"Return Analysis: {analysis}")

Study Commitment

25 hours/week structured as:

  • Mornings (4:30-6 AM): Theory, lectures, reading
  • Evenings (8-10 PM): Hands-on coding with AI tools, projects
  • Weekends: Deep work on complex projects and integration

Sustainable pace designed for 37-month journey while working full-time.


📊 Current Progress

Active Stage: 1 of 5 (GenAI-First Data Analyst & AI Engineer)
Projects: 1 deployed (production), 1 in development (ODI), 5 scoped and queued
Total Pipeline: 7 production-grade projects (easy → flagship)
Certifications: 8 in progress (including 3 GenAI-focused)
Study Hours: 25/week consistent

Next Milestones:

  • Complete DataVault Analyst (first AI project to publish)
  • Launch PolicyPulse (RAG foundation)
  • Launch FormSense (Multimodal AI)
  • Complete Operations-Demand-Intelligence (enterprise analytics)
  • Build StreamSmart Optimizer (consumer AI app)
  • Complete Attention-Flow Catalyst Phase 1A & 1B (flagship)
  • Finish 5 core certifications + IBM GenAI Engineering cert
  • Secure GenAI-First Data Analyst & AI Engineer role

🔗 Connect & Collaborate

Professional:

Portfolio:

Open To:

  • 💼 GenAI-First Data Analyst & AI Engineer opportunities (remote preferred)
  • 🤝 Networking with data professionals and traders
  • 🤖 GenAI tool and workflow discussions
  • 💡 Code reviews and technical discussions
  • 🎓 Mentorship (giving or receiving)

🤝 How to Engage

Welcome:

  • Code quality feedback and best practices
  • GenAI integration approaches and tool recommendations
  • Trading strategy discussions
  • Career advice and networking
  • Collaboration on projects

How:

  • Open GitHub issues for technical discussions
  • Connect on LinkedIn for professional networking
  • Comment on commits with feedback
  • Share your own GenAI-powered learning journey

💭 The Vision

This repository documents a complete career transformation: from business ops professional to Senior LLM Engineer, with GenAI/LLM engineering from Day 1.

What this represents:

  • 37-month systematic journey (5,000+ hours)
  • 7 production-grade projects demonstrating progressive skill mastery
  • Production systems with measurable business impact
  • GenAI-first approach positioning ahead of traditional candidates
  • Foundation for six-figure remote tech career
  • Path to building revenue-generating AI systems
  • Demonstration that structured learning + GenAI integration enables career reinvention

Ultimate goal: Production AI Trading Assistant combining deep finance expertise with cutting-edge agentic AI capabilities.


⭐ Follow the Journey

Real-time documentation of a GenAI-first career transformation from Day 1 to Senior LLM Engineer.

  • Star this repository to follow the journey
  • 🔔 Watch for updates on GenAI integration and project progress
  • 🔗 Connect for professional discussions and collaboration

💡 "37 months. 7 projects. GenAI-first from Day 1. Production code. Clear trajectory."

Current Stage: GenAI-First Data Analyst & AI Engineer (1 of 5) | Building GenAI-Enhanced Foundations
Status: 🟢 Active • Learning in Public • Deploying Production Systems

→ View Live Progress & Interactive Roadmap

About

37-month learning roadmap from Financial Services Professional to LLM Engineer. Includes comprehensive course notes (CS50, Python, SQL, IBM DA) and enhanced project implementations. Active learning documentation.
