
feat: add search_github_issues MCP tool and issues ingestion pipeline #140

Closed
kmr-rohit wants to merge 1 commit into kubeflow:main from kmr-rohit:feat/github-issues-pipeline

@kmr-rohit

Summary

  • Add search_github_issues MCP tool for semantic search across Kubeflow GitHub issues (kubeflow/kubeflow, kubeflow/pipelines, kubeflow/manifests, kubeflow/katib, kserve/kserve, kubeflow/website)
  • Extract shared _search_collection helper in MCP server to reduce duplication between docs and issues search
  • Add KFP pipeline (issues-pipeline.py) with comment-boundary-aware chunking for GitHub issues ingestion
  • Add standalone indexing script (index_real_issues.py) with incremental mode support
  • Update kagent Agent CRD with two-tool routing system prompt

Details

MCP Server (server.py)

  • New search_github_issues tool searches the issues_rag Milvus collection
  • Returns issue content with metadata: repo name, issue number, state, labels, and GitHub URL
  • Shared _search_collection helper eliminates code duplication
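A minimal sketch of the shared-helper shape described above, assuming each hit exposes repo, issue number, state, labels, and content. `IssueHit`, `format_issue_hit`, and the `search_fn` parameter are hypothetical stand-ins (the real server queries the Milvus collection directly); only the `https://github.com/{repo}/issues/{number}` URL pattern is GitHub's own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class IssueHit:
    """One hit from the issues_rag collection (hypothetical field names)."""
    repo: str
    number: int
    state: str
    labels: List[str] = field(default_factory=list)
    content: str = ""


def issue_url(repo: str, number: int) -> str:
    # Canonical GitHub web URL for an issue.
    return f"https://github.com/{repo}/issues/{number}"


def format_issue_hit(hit: IssueHit) -> str:
    # Render the metadata the tool returns: repo, number, state, labels, URL.
    labels = ", ".join(hit.labels) if hit.labels else "none"
    return (
        f"[{hit.repo}#{hit.number}] state={hit.state} labels={labels}\n"
        f"URL: {issue_url(hit.repo, hit.number)}\n"
        f"{hit.content}"
    )


def _search_collection(
    query: str,
    search_fn: Callable[[str, int], Sequence[T]],
    formatter: Callable[[T], str],
    top_k: int = 5,
) -> str:
    """Shared path for every search tool: run the search, format each hit.

    search_fn stands in for the vector query against one Milvus collection.
    """
    hits = search_fn(query, top_k)
    if not hits:
        return "No results found."
    return "\n\n".join(formatter(hit) for hit in hits)


def search_github_issues(
    query: str, search_fn: Callable[[str, int], Sequence[IssueHit]], top_k: int = 5
) -> str:
    # The issues tool supplies only its own formatter; the search path is shared.
    return _search_collection(query, search_fn, format_issue_hit, top_k)
```

With this shape, a docs tool and an issues tool differ only in the `search_fn` and `formatter` they pass in, which is the duplication the helper is meant to remove.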

Issues Pipeline (pipelines/)

  • issues-pipeline.py: KFP pipeline with download_github_issues, chunk_and_embed_issues, store_issues_milvus components
  • issues_utils.py: Utility functions for parsing issue metadata, building prefixes, and smart chunking at comment boundaries (\n\n---\n)
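The comment-boundary chunking could look roughly like the sketch below. It assumes the downloaded issue text joins the body and each comment with the `\n\n---\n` separator; the prefix format, the `max_chars` budget, and the function names are hypothetical, not taken from issues_utils.py.

```python
COMMENT_SEPARATOR = "\n\n---\n"


def build_prefix(repo: str, number: int, title: str, state: str) -> str:
    # Hypothetical prefix: issue metadata repeated on every chunk so each
    # embedding carries the issue's context.
    return f"Repo: {repo} | Issue #{number} | State: {state}\nTitle: {title}\n\n"


def chunk_issue(body: str, prefix: str, max_chars: int = 2000) -> list[str]:
    """Pack whole comments into chunks, splitting only at comment boundaries.

    A single comment longer than the budget is hard-split as a fallback.
    """
    budget = max_chars - len(prefix)
    if budget <= 0:
        raise ValueError("max_chars must exceed the prefix length")

    segments = [s.strip() for s in body.split(COMMENT_SEPARATOR) if s.strip()]
    chunks: list[str] = []
    current = ""
    for seg in segments:
        # Fallback: cut an oversized comment into budget-sized pieces.
        pieces = [seg[i:i + budget] for i in range(0, len(seg), budget)]
        for piece in pieces:
            candidate = piece if not current else current + COMMENT_SEPARATOR + piece
            if len(candidate) <= budget:
                current = candidate  # still fits: keep packing
            else:
                chunks.append(prefix + current)  # flush at a boundary
                current = piece
    if current:
        chunks.append(prefix + current)
    return chunks
```

Packing whole comments keeps a question and its answer in the same chunk when they fit, while the hard-split fallback only fires for unusually long comments such as pasted logs.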

Indexing Script (scripts/index_real_issues.py)

  • Fetches real GitHub issues via API, chunks with issues_utils, embeds with sentence-transformers, stores in Milvus
  • Supports incremental mode (default) — queries existing issue numbers per repo to avoid duplicates
  • --fresh flag for full re-index
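A sketch of the incremental filter, assuming the script has already fetched raw issue dicts from the GitHub REST API and queried the set of issue numbers already stored in Milvus for that repo. `select_issues_to_index` and `parse_args` are hypothetical names; the `pull_request` check reflects the fact that GitHub's issues endpoint also returns pull requests.

```python
import argparse
from typing import Iterable, Optional, Sequence


def select_issues_to_index(
    fetched: Iterable[dict],
    existing_numbers: set[int],
    fresh: bool = False,
) -> list[dict]:
    """Return the issues that still need indexing for one repo.

    existing_numbers is assumed to hold the issue numbers already stored in
    the Milvus collection for this repo.
    """
    selected = []
    for item in fetched:
        # GitHub's REST issues endpoint also lists pull requests; those items
        # carry a "pull_request" key, so skip them.
        if "pull_request" in item:
            continue
        # Incremental mode (the default): skip issues that are already indexed.
        if not fresh and item["number"] in existing_numbers:
            continue
        selected.append(item)
    return selected


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Index GitHub issues (sketch)")
    parser.add_argument(
        "--fresh",
        action="store_true",
        help="full re-index instead of the default incremental mode",
    )
    return parser.parse_args(argv)
```

This sketch filters on issue number alone, so it would not pick up new comments on an issue that is already indexed; one assumed approach is to also compare each issue's `updated_at` against a stored timestamp. With `--fresh`, the sketch simply stops filtering; a real full re-index would presumably also clear the existing collection first.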

Agent CRD (setup.yaml)

  • Adds search_github_issues to agent toolNames
  • Updated system prompt with two-tool routing: docs for "how to" questions, issues for error/bug troubleshooting

Test plan

  • Tested live on OCI cluster with 1,631 real issue chunks indexed across 6 repos
  • Verified semantic search returns relevant results (e.g., "KServe model 404 error" → correct issue)
  • Confirmed agent correctly routes between docs and issues tools
  • Unit tests for issues_utils.py chunking logic (planned for test-infrastructure branch)

🤖 Generated with Claude Code

@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign franciscojavierarceo for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kmr-rohit
Author

[Screenshot 2026-03-17 at 2:24:44 AM]

@SanthoshToorpu
Contributor

Hey @kmr-rohit, I've requested a few changes. Can you help me understand a few things here?

@kmr-rohit
Author

Hi @SanthoshToorpu, this PR introduces three main things:

  1. A KFP pipeline that ingests a given repo's issues, with comment-aware chunking, into a new collection (issues_rag).
  2. A script that downloads GitHub issues for ingestion and supports an incremental mode controlled by a flag, so chunks that have already been ingested are not re-ingested unless a new comment is added.
  3. An MCP tool, search_github_issues, which lets the agent fetch issues similar to the user's query from the collection.

@kmr-rohit
Author

Closing in favor of #205, which consolidates this PR along with #143 into a single clean PR.

#205 includes all the work from here (issues pipeline, search_github_issues tool) plus:

Test coverage for all of this is in #206.

@kmr-rohit kmr-rohit closed this May 6, 2026
@cursor (bot) deleted the feat/github-issues-pipeline branch May 28, 2026 18:41