feat: ingestion pipeline — ingestSource, atomic swap, examples, integration tests (tasks 4.5-4.8) #5
Merged: johnmcollier merged 2 commits into redhat-ai-dev:main from johnmcollier:feat/epic4-ingestion-pipeline on Jun 19, 2026
@@ -0,0 +1,151 @@
+import crypto from "node:crypto";
+import type { Repositories } from "../db/init.js";
+
+interface ExampleSkill {
+  slug: string;
+  name: string;
+  description: string;
+  content: string;
+}
+
+const EXAMPLE_SKILLS: ExampleSkill[] = [
+  {
+    slug: "git-conventional-commit",
+    name: "Git Conventional Commit",
+    description: "Writes a conventional commit message following the Conventional Commits specification.",
+    content: `---
+name: Git Conventional Commit
+description: Writes a conventional commit message following the Conventional Commits specification.
+allowed-tools:
+- Bash
+---
+
+## Git Conventional Commit
+
+Analyse the staged diff and write a well-formed [Conventional Commit](https://www.conventionalcommits.org/) message.
+
+### Format
+
+\`\`\`
+<type>(<scope>): <short summary>
+
+[optional body]
+
+[optional footer(s)]
+\`\`\`
+
+**Types:** \`feat\`, \`fix\`, \`docs\`, \`style\`, \`refactor\`, \`perf\`, \`test\`, \`chore\`, \`ci\`, \`build\`, \`revert\`
+
+### Instructions
+
+1. Run \`git diff --staged\` to inspect the changes.
+2. Choose the correct type based on what changed.
+3. Keep the summary under 72 characters, imperative mood, no period.
+4. Add a body if the change needs context that the diff alone cannot convey.
+5. Add a \`BREAKING CHANGE:\` footer if the change breaks any public API.
+`,
+  },
+  {
+    slug: "code-review-checklist",
+    name: "Code Review Checklist",
+    description: "Reviews a code change against a standard checklist of common issues.",
+    content: `---
+name: Code Review Checklist
+description: Reviews a code change against a standard checklist of common issues.
+allowed-tools:
+- Read
+- Bash
+---
+
+## Code Review Checklist
+
+Review the provided code or diff against the following checklist and report findings.
+
+### Checklist
+
+**Correctness**
+- [ ] Logic is correct and handles edge cases
+- [ ] Error paths are handled (exceptions, nulls, empty collections)
+- [ ] No off-by-one errors
+
+**Security**
+- [ ] No secrets or credentials in code
+- [ ] Inputs are validated and sanitised
+- [ ] No SQL injection or command injection vectors
+
+**Maintainability**
+- [ ] Functions/methods are focused and small
+- [ ] Names are descriptive and consistent
+- [ ] Dead code has been removed
+
+**Tests**
+- [ ] New behaviour is covered by tests
+- [ ] Existing tests still pass
+
+### Output
+
+For each finding, report: **severity** (critical / major / minor / nit), **location** (file + line), and **recommendation**.
+`,
+  },
+  {
+    slug: "explain-code",
+    name: "Explain Code",
+    description: "Explains what a code block or file does in plain language.",
+    content: `---
+name: Explain Code
+description: Explains what a code block or file does in plain language.
+allowed-tools:
+- Read
+---
+
+## Explain Code
+
+Read the target code and explain it clearly for the intended audience.
+
+### Steps
+
+1. Identify the language and any key frameworks/libraries in use.
+2. Summarise the **purpose** of the code in one sentence.
+3. Walk through the **main logic flow** step by step.
+4. Call out any **non-obvious design decisions** or trade-offs.
+5. List **side effects** (I/O, mutations, external calls) if present.
+6. Flag any **potential bugs or issues** you notice while reading.
+
+### Output format
+
+- Start with a one-sentence TL;DR.
+- Use numbered steps for the logic walk-through.
+- Use a short bullet list for side effects and issues.
+- Avoid jargon unless the user's context makes it appropriate.
+`,
+  },
+];
+
+function sha256(content: string): string {
+  return crypto.createHash("sha256").update(content, "utf-8").digest("hex");
+}
+
+export async function loadExamplesIfEmpty(repos: Repositories): Promise<void> {
+  if (repos.skills.count() !== 0 || repos.sources.findAll().length !== 0) {
+    return;
+  }
+
+  // Wrap both writes in a single transaction so a transient failure cannot
+  // leave a sources row with no skills (which would permanently suppress retry).
+  repos.skills.transactionSync(() => {
+    const source = repos.sources.create({ slug: "examples", url: "built-in" });
+    repos.skills.upsertMany(
+      EXAMPLE_SKILLS.map((skill) => ({
+        sourceId: source.id,
+        sourceSlug: "examples",
+        slug: skill.slug,
+        name: skill.name,
+        description: skill.description,
+        artifactType: "skill-md" as const,
+        digest: sha256(skill.content),
+        content: skill.content,
+        supportingFiles: [],
+      }))
+    );
+  });
+}
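
The comment in `loadExamplesIfEmpty` says that both writes share one transaction so that a failed skills write cannot leave a stray source row behind and block later retries. The sketch below illustrates that reasoning under loud assumptions: `Store`, `transactionSync` and `seedIfEmpty` are hypothetical in-memory stand-ins written for this example, not the project's repositories or its SQLite layer.

```typescript
// Hypothetical in-memory stand-in for the repositories; not the project's real API.
interface Store {
  sources: string[];
  skills: string[];
}

// Runs fn against the store and restores the previous state if fn throws,
// mimicking the all-or-nothing behaviour of a database transaction.
function transactionSync(store: Store, fn: () => void): void {
  const snapshot = { sources: [...store.sources], skills: [...store.skills] };
  try {
    fn();
  } catch (err) {
    store.sources = snapshot.sources;
    store.skills = snapshot.skills;
    throw err;
  }
}

// Seeds the store only when it is completely empty, like the guard in the diff.
function seedIfEmpty(store: Store, insertSkills: (s: Store) => void): boolean {
  if (store.skills.length !== 0 || store.sources.length !== 0) {
    return false;
  }
  transactionSync(store, () => {
    store.sources.push("examples");
    insertSkills(store);
  });
  return true;
}

const store: Store = { sources: [], skills: [] };

// First attempt: the skills write fails after the source row was added.
let firstAttemptFailed = false;
try {
  seedIfEmpty(store, () => {
    throw new Error("transient failure");
  });
} catch {
  firstAttemptFailed = true;
}

// The rollback removed the source row, so the guard still sees an empty store
// and the second attempt is allowed to run.
const retried = seedIfEmpty(store, (s) => {
  s.skills.push("explain-code");
});

console.log({ firstAttemptFailed, retried, store });
```

Without the rollback in `transactionSync`, the first attempt would leave `"examples"` in `sources`, and the emptiness guard would skip every later attempt.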
@@ -0,0 +1,136 @@
+import fs from "node:fs";
+import os from "node:os";
+import path from "node:path";
+import { clone, discoverSkills, parseFrontmatter, bundleSkill } from "./index.js";
+import type { Repositories } from "../db/init.js";
+import type { UpsertSkillInput } from "../db/types.js";
+
+export interface SkillIndexEntry {
+  slug: string;
+  name: string;
+  description: string;
+  allowedTools: string[];
+  artifactType: "skill-md" | "archive";
+  digest: string;
+  content: string;
+  supportingFiles: string[];
+}
+
+export interface SkillFailure {
+  path: string;
+  reason: string;
+}
+
+export interface SyncReport {
+  discovered: number;
+  indexed: number;
+  failed: number;
+  failures: SkillFailure[];
+}
+
+/**
+ * Inner pipeline: discover → parse → bundle → atomic swap.
+ * Exported so integration tests can call it directly without a real clone.
+ */
+export async function ingestFromClonedPath(
+  sourceId: number,
+  sourceSlug: string,
+  repoPath: string,
+  repos: Repositories
+): Promise<SyncReport> {
+  const candidates = discoverSkills(repoPath);
+  const indexed: SkillIndexEntry[] = [];
+  const failures: SkillFailure[] = [];
+
+  for (const candidate of candidates) {
+    const relativePath = path.relative(repoPath, candidate.skillMdPath);
+    try {
+      const bundleResult = await bundleSkill(candidate);
+
+      // For skill-md, the artifact IS the raw SKILL.md content — reuse it to
+      // avoid a second readFileSync. For archives the artifact is base64 tar.gz,
+      // so we read the file directly.
+      const skillMdContent =
+        bundleResult.artifactType === "skill-md"
+          ? bundleResult.artifact
+          : fs.readFileSync(candidate.skillMdPath, "utf-8");
+
+      const fmResult = parseFrontmatter(skillMdContent);
+      if (!fmResult.ok) {
+        failures.push({ path: relativePath, reason: fmResult.reason });
+        continue;
+      }
+      indexed.push({
+        slug: candidate.slug,
+        name: fmResult.data.name,
+        description: fmResult.data.description,
+        allowedTools: fmResult.data.allowedTools,
+        artifactType: bundleResult.artifactType,
+        digest: bundleResult.digest,
+        content: bundleResult.artifact,
+        supportingFiles: candidate.supportingFiles,
+      });
+    } catch (err) {
+      failures.push({
+        path: relativePath,
+        reason: err instanceof Error ? err.message : String(err),
+      });
+    }
+  }
+
+  atomicSwap(sourceId, sourceSlug, indexed, repos);
+
+  return {
+    discovered: candidates.length,
+    indexed: indexed.length,
+    failed: failures.length,
+    failures,
+  };
+}
+
+/**
+ * Atomically replaces all skills for a source in a single SQLite transaction:
+ * deletes the old set and inserts the new set in one commit.
+ */
+export function atomicSwap(
+  sourceId: number,
+  sourceSlug: string,
+  skills: SkillIndexEntry[],
+  repos: Repositories
+): void {
+  const inputs: UpsertSkillInput[] = skills.map((s) => ({
+    sourceId,
+    sourceSlug,
+    slug: s.slug,
+    name: s.name,
+    description: s.description,
+    artifactType: s.artifactType,
+    digest: s.digest,
+    content: s.content,
+    supportingFiles: s.supportingFiles,
+  }));
+
+  repos.skills.transactionSync(() => {
+    repos.skills.deleteBySource(sourceId);
+    repos.skills.upsertMany(inputs);
+  });
+}
+
+/**
+ * Full ingestion pipeline: clone → discover → parse → bundle → atomic swap.
+ * Clone failures propagate up without being caught.
+ */
+export async function ingestSource(
+  sourceId: number,
+  sourceSlug: string,
+  url: string,
+  repos: Repositories
+): Promise<SyncReport> {
+  const tmpDir = path.join(os.tmpdir(), `rhess-sync-${sourceId}-${Date.now()}`);
+  try {
+    await clone(url, tmpDir);
+    return await ingestFromClonedPath(sourceId, sourceSlug, tmpDir, repos);
+  } finally {
+    fs.rmSync(tmpDir, { recursive: true, force: true });
+  }
+}
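
The loop in `ingestFromClonedPath` isolates failures per skill: a candidate that cannot be parsed or bundled is recorded in `failures`, and the remaining candidates are still indexed and counted in the `SyncReport`. The sketch below shows that pattern in isolation. It is a simplified illustration, not the PR's code: `Candidate`, `parseName` and `summarise` are hypothetical names, and the parser here only looks for a `name:` line rather than performing real front-matter parsing.

```typescript
interface SkillFailure {
  path: string;
  reason: string;
}

interface SyncReport {
  discovered: number;
  indexed: number;
  failed: number;
  failures: SkillFailure[];
}

// Hypothetical candidate shape: a relative path plus the raw SKILL.md text.
interface Candidate {
  path: string;
  raw: string;
}

// Hypothetical stand-in for the parse step: returns the value of the first
// line that starts with "name:", or throws when no such line exists.
function parseName(raw: string): string {
  const value = /^name:[ \t]*(.+)$/m.exec(raw)?.[1];
  if (value === undefined) {
    throw new Error("missing required field: name");
  }
  return value.trim();
}

// Processes every candidate, recording failures instead of aborting the run.
function summarise(candidates: Candidate[]): { names: string[]; report: SyncReport } {
  const names: string[] = [];
  const failures: SkillFailure[] = [];
  for (const candidate of candidates) {
    try {
      names.push(parseName(candidate.raw));
    } catch (err) {
      failures.push({
        path: candidate.path,
        reason: err instanceof Error ? err.message : String(err),
      });
    }
  }
  return {
    names,
    report: {
      discovered: candidates.length,
      indexed: names.length,
      failed: failures.length,
      failures,
    },
  };
}

const result = summarise([
  { path: "good/SKILL.md", raw: "---\nname: Explain Code\n---\n" },
  { path: "bad/SKILL.md", raw: "---\ndescription: no name here\n---\n" },
]);

console.log(result.report);
```

One consequence of this design, visible in the diff as well, is that the swap receives only the successfully indexed set, so a skill that fails in a later sync is removed from the source rather than kept at its previous version.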