You need to open an issue on the kubeflow/docs-agent GitHub repository immediately. Use specific technical details from the README to show you have done your homework.
Here is a highly targeted draft you can post right now:
Title: GSoC 2026: Transitioning docs-agent to an Agentic RAG Architecture (KEP-867)
Hi @chasecadet, @tarekabouzeid, and Francisco,
I am highly interested in the Agentic RAG on Kubeflow project for GSoC 2026.
I have been reviewing the docs-agent repository and the KEP-867 architecture. I see the current implementation uses KFP for the ETL pipeline into Milvus and serves Llama 3.1-8B via KServe/vLLM with basic JSON tool-calling enabled.
To transition this into the agentic architecture described in the project idea, I propose introducing a LangGraph state machine on top of the LLM. Instead of single-pass retrieval, the agent would follow a reasoning loop:
Query Routing: Decompose the prompt to determine which specialized Milvus index (or GitHub tool) to target.
Context Grading: Evaluate the retrieved chunks. If the context is insufficient, the agent autonomously modifies its search parameters and re-queries before generating a final response.
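To make the loop concrete, here is a minimal sketch in plain Python (deliberately not the LangGraph API). Every name in it (`AgentState`, `route`, `grade`, `run_agent`, the `retrieve` callable) is hypothetical and does not come from the docs-agent repository. The keyword router and the term-overlap grader are stand-ins for LLM calls, and doubling `top_k` is only one possible way to modify the search parameters.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentState:
    """Hypothetical state carried between steps of the loop."""
    query: str
    target: str = ""          # which index or tool the router picked
    top_k: int = 3            # search parameter widened on each retry
    chunks: List[str] = field(default_factory=list)
    attempts: int = 0


def route(state: AgentState) -> AgentState:
    # Placeholder router: a real one would ask the LLM to classify the query.
    state.target = "github" if "issue" in state.query.lower() else "docs"
    return state


def grade(state: AgentState, min_relevant: int = 2) -> bool:
    # Placeholder grader: counts chunks sharing at least one term with the query.
    terms = set(state.query.lower().split())
    relevant = [c for c in state.chunks if terms & set(c.lower().split())]
    return len(relevant) >= min_relevant


def run_agent(
    query: str,
    retrieve: Callable[[str, str, int], List[str]],
    max_attempts: int = 3,
) -> AgentState:
    """Route once, then retrieve and grade until the context passes or retries run out."""
    state = route(AgentState(query=query))
    while state.attempts < max_attempts:
        state.chunks = retrieve(state.target, state.query, state.top_k)
        state.attempts += 1
        if grade(state):
            break
        state.top_k *= 2  # widen the search before re-querying
    return state
```

In the real design, `retrieve` would wrap the Milvus search or a GitHub tool call, and each function would become a node in the graph, with the grader deciding the conditional edge back to retrieval.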
I also noticed in the README's "Future Improvements" a goal to extract the embedding model into a standalone service rather than installing sentence-transformers in the pipeline every time.
My Question: For the GSoC proposal, should I include an architectural blueprint for decoupling the embedding service, or would you prefer that I keep the scope entirely on the LangGraph agentic routing and the KServe deployment manifests?
I am drafting my proposal now and would love your insight!