feat: add search_github_issues MCP tool and issues ingestion pipeline #140
kmr-rohit wants to merge 1 commit into
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files: Approvers can indicate their approval by writing
Hey @kmr-rohit, I requested a few changes. Can you help me understand a few things here?
Hi @SanthoshToorpu, this PR introduces 3 main things:
Closing in favor of #205, which consolidates this PR along with #143 into a single clean PR. #205 includes all the work from here (issues pipeline,
Test coverage for all of this is in #206.

Summary
- `search_github_issues` MCP tool for semantic search across Kubeflow GitHub issues (kubeflow/kubeflow, kubeflow/pipelines, kubeflow/manifests, kubeflow/katib, kserve/kserve, kubeflow/website)
- `_search_collection` helper in the MCP server to reduce duplication between docs and issues search
- KFP pipeline (`issues-pipeline.py`) with comment-boundary-aware chunking for GitHub issues ingestion
- Indexing script (`index_real_issues.py`) with incremental mode support

Details
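A minimal sketch of the comment-boundary-aware chunking summarized above, assuming each issue arrives as one string whose body and comments are joined by the `\n\n---\n` separator, and that chunk size is capped in characters rather than tokens. The names `chunk_issue`, `build_prefix`, and `max_chars`, and the `[repo#number] title` prefix format, are hypothetical illustrations and are not taken from `issues_utils.py`.

```python
from typing import List

# Separator between the issue body and each comment (from the PR description).
COMMENT_SEPARATOR = "\n\n---\n"


def build_prefix(repo: str, number: int, title: str) -> str:
    """Hypothetical context prefix prepended to every chunk of one issue."""
    return f"[{repo}#{number}] {title}\n"


def chunk_issue(text: str, prefix: str, max_chars: int = 1000) -> List[str]:
    """Split an issue into chunks, preferring comment boundaries.

    Whole comments are packed greedily into a chunk until the next one would
    exceed the budget. A single comment longer than the budget is split into
    fixed-size slices as a fallback. Every chunk starts with `prefix`.
    """
    budget = max_chars - len(prefix)
    if budget <= 0:
        raise ValueError("prefix leaves no room for content")

    segments = [s.strip() for s in text.split(COMMENT_SEPARATOR) if s.strip()]
    chunks: List[str] = []
    current = ""
    for seg in segments:
        if len(seg) > budget:
            # Oversized comment: flush the pending chunk, then hard-split it.
            if current:
                chunks.append(prefix + current)
                current = ""
            for i in range(0, len(seg), budget):
                chunks.append(prefix + seg[i:i + budget])
            continue
        candidate = seg if not current else current + COMMENT_SEPARATOR + seg
        if len(candidate) <= budget:
            current = candidate  # still fits: keep packing whole comments
        else:
            chunks.append(prefix + current)
            current = seg  # start a new chunk at this comment boundary
    if current:
        chunks.append(prefix + current)
    return chunks
```

Packing whole comments keeps a question and its replies together where possible; the hard split only applies when a single comment is longer than the remaining budget.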
MCP Server (`server.py`)
- `search_github_issues` tool searches the `issues_rag` Milvus collection
- `_search_collection` helper eliminates code duplication between the docs and issues search paths

Issues Pipeline (`pipelines/`)
- `issues-pipeline.py`: KFP pipeline with `download_github_issues`, `chunk_and_embed_issues`, and `store_issues_milvus` components
- `issues_utils.py`: utility functions for parsing issue metadata, building prefixes, and smart chunking at comment boundaries (`\n\n---\n`)

Indexing Script (`scripts/index_real_issues.py`)
- `--fresh` flag for a full re-index

Agent CRD (`setup.yaml`)
- Adds `search_github_issues` to the agent `toolNames`

Test plan
- `issues_utils.py` chunking logic (planned for the test-infrastructure branch)

🤖 Generated with Claude Code
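The shared `_search_collection` helper described under MCP Server can be sketched as below. This assumes a client with a pymilvus-`MilvusClient`-style `search(collection_name=..., data=..., limit=..., output_fields=...)` that returns, per query vector, a list of hit dicts carrying `distance` and `entity`; a toy in-memory client stands in so the sketch runs without Milvus. The `docs_rag` collection name, the `text`/`url` output fields, `embed`, and `toy_embed` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Protocol, Sequence

Hit = Dict[str, Any]


class VectorClient(Protocol):
    """Assumed client shape, modelled on pymilvus MilvusClient.search."""

    def search(self, collection_name: str, data: List[List[float]], limit: int,
               output_fields: List[str]) -> List[List[Hit]]: ...


def _search_collection(client: VectorClient, embed: Callable[[str], List[float]],
                       collection: str, query: str, top_k: int = 5,
                       fields: Sequence[str] = ("text", "url")) -> List[Hit]:
    """Shared path: embed the query, search one collection, flatten the hits."""
    # One query vector in, so take the first (and only) result list.
    hits = client.search(collection_name=collection, data=[embed(query)],
                         limit=top_k, output_fields=list(fields))[0]
    # "score" is whatever the backend reports; its meaning depends on the metric.
    return [{"score": h["distance"], **{f: h["entity"].get(f) for f in fields}}
            for h in hits]


def search_github_issues(client: VectorClient, embed: Callable[[str], List[float]],
                         query: str, top_k: int = 5) -> List[Hit]:
    return _search_collection(client, embed, "issues_rag", query, top_k)


def search_docs(client: VectorClient, embed: Callable[[str], List[float]],
                query: str, top_k: int = 5) -> List[Hit]:
    # "docs_rag" is a hypothetical collection name.
    return _search_collection(client, embed, "docs_rag", query, top_k)


class InMemoryClient:
    """Toy stand-in for Milvus: ranks rows by dot product, highest first."""

    def __init__(self, rows: Dict[str, List[Dict[str, Any]]]):
        self.rows = rows

    def search(self, collection_name: str, data: List[List[float]], limit: int,
               output_fields: List[str]) -> List[List[Hit]]:
        results = []
        for vec in data:
            hits = [{"id": i,
                     "distance": sum(a * b for a, b in zip(vec, row["vector"])),
                     "entity": {f: row.get(f) for f in output_fields}}
                    for i, row in enumerate(self.rows[collection_name])]
            hits.sort(key=lambda h: h["distance"], reverse=True)
            results.append(hits[:limit])
        return results


def toy_embed(text: str) -> List[float]:
    # Hypothetical two-dimensional "embedding" for the demo only.
    return [0.0, 1.0] if "kserve" in text.lower() else [1.0, 0.0]


client = InMemoryClient({
    "issues_rag": [
        {"vector": [1.0, 0.0], "text": "Katib trial stuck", "url": "u1"},
        {"vector": [0.0, 1.0], "text": "KServe 503 on scale-to-zero", "url": "u2"},
    ],
    "docs_rag": [],
})
print(search_github_issues(client, toy_embed, "KServe returns 503", top_k=1))
# [{'score': 1.0, 'text': 'KServe 503 on scale-to-zero', 'url': 'u2'}]
```

Passing the collection name and output fields as parameters is what lets both tools share one code path; registering each wrapper as an MCP tool is omitted here.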