Skip to content

Feat/refactor search#42

Merged
linkai0924 merged 2 commits intomainfrom
feat/refactor-search
Apr 7, 2026
Merged

Feat/refactor search#42
linkai0924 merged 2 commits intomainfrom
feat/refactor-search

Conversation

@linkai0924
Copy link
Copy Markdown
Collaborator

@linkai0924 linkai0924 commented Apr 7, 2026

Summary by CodeRabbit

  • New Features

    • Security scans now include category classification data in all scan results.
    • Updated API responses to expose security status and category information.
  • Documentation

    • Updated design specifications and implementation roadmap with revised database schema and backend completion status.
  • Chores

    • Added database migration to support category field in scan records.
    • Introduced scan backfill tool for bulk reprocessing of historical scans with category data.

Integrate automatic category classification into the security scan workflow:

- Add fixed taxonomy of 12 allowed categories (frontend-development, backend-development, etc.)

- Update LLM system prompt to include category selection requirement

- Add Category field to SecurityScan model

- Validate and write back category to capability_items on scan completion

- Auto-create category in item_categories via CategoryService.EnsureCategory()

- Inject ScanJobService into SyncService for post-sync auto-scan trigger

- Update SKILL_SCAN_DESIGN.md to reflect backend implementation status

Signed-off-by: 林凯90331 <90331@sangfor.com>

Co-authored-by: CoStrict <zgsm@sangfor.com.cn>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 7, 2026

📝 Walkthrough

Walkthrough

The pull request integrates category tracking into the capability item security scanning system. It updates the design document, adds scan backfill functionality, extends database models with category support, implements category validation and persistence in the scan service, and seeds taxonomy data via migration.

Changes

Cohort / File(s) Summary
Design & Documentation
docs/proposals/SKILL_SCAN_DESIGN.md
Updated security scan flow to use security_scans table instead of scan_results, adjusted API responses to reflect security_status instead of scan_status, and documented category taxonomy integration with validation constraints.
Build Configuration
package.json
Added backfill:scan npm script to execute scan backfill via Go worker CLI.
Submodule Reference
portal
Updated submodule commit pointer to incorporate related changes.
CLI & Backfill
server/cmd/worker/main.go
Implemented scan-backfill subcommand with CLI flags (--all, --security-status, --registry-id, --item-type, --limit, --trigger-type, --priority, --max-attempts, --dry-run) for filtering and re-enqueueing capability items. Refactored worker initialization to inject ScanJobService and CategorySvc into service dependencies.
Data Models
server/internal/models/models.go
Added Category field to SecurityScan struct with empty string default for JSON serialization.
Scan Service Logic
server/internal/services/scan_service.go
Extended scan workflow to validate category taxonomy, persist category to security_scans, propagate to capability_items, and invoke CategoryService.EnsureCategory(). Added allowedScanCategories constant, isValidScanCategory() validation helper, and reportCategoryValue() accessor. Updated ScanReport struct and ScanService with CategorySvc dependency.
Service Tests
server/internal/services/scan_service_test.go
Added comprehensive test TestScanItemUpdatesCategoryFromScanResult verifying category extraction from LLM response, persistence to database, and ItemCategory record creation.
Database Migration
server/migrations/20260407110000_add_scan_category_and_seed_taxonomy.sql
Adds non-nullable category column to security_scans table and seeds predefined category taxonomy entries into item_categories with localized names and sort order.

Sequence Diagram

sequenceDiagram
    participant Client as CLI/Sync
    participant Worker as ScanService
    participant LLM as LLM API
    participant DB as Database
    participant Cat as CategoryService

    Client->>Worker: ScanItem(capabilityItem)
    Worker->>LLM: callLLM(scan prompt + expected JSON schema)
    LLM-->>Worker: ScanReport{Category, RiskLevel, Verdict, ...}
    Worker->>Worker: isValidScanCategory(category)
    Worker->>DB: INSERT security_scans(category, ...)
    Worker->>DB: UPDATE capability_items SET security_status, last_scan_id
    alt category not empty
        Worker->>DB: UPDATE capability_items SET category
        Worker->>Cat: EnsureCategory(category slug)
        Cat->>DB: INSERT/UPDATE item_categories
    end
    Worker-->>Client: ScanResult{Category, ...}
Loading

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~28 minutes

Possibly Related PRs

  • Feat/refactor search #41: Introduces the ItemCategory model and CategoryService with EnsureCategory() method that this PR directly integrates into the scan workflow via ScanService.CategorySvc injection and category persistence logic.

Poem

🐰 Categories hop into scans so bright,
A backfill command makes rescans right!
From LLM parsed to database stored,
Category flows like clover galore! 🌱

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Title check ⚠️ Warning The PR title 'Feat/refactor search' does not align with the changeset content, which is primarily about security scan functionality, database migrations, and worker backfill capabilities. Update the title to accurately reflect the main changes, such as 'Add security scan category support and backfill infrastructure' or 'Refactor scan service with category taxonomy integration'.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/refactor-search

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (4)
server/cmd/worker/main.go (1)

124-129: Consider reusing the CategoryService instance.

CategoryService is instantiated twice (line 88 and line 128). While functionally correct since both share the same DB, you could extract a single shared instance for minor memory efficiency and clearer dependency relationships.

♻️ Proposed refactor
+	categorySvc := &services.CategoryService{DB: db}
 	scanJobSvc := &services.ScanJobService{DB: db}
 	syncSvc := &services.SyncService{
 		DB:             db,
 		Git:            &services.GitService{TempBaseDir: tmpDir},
 		Parser:         &services.ParserService{},
 		ScanJobService: scanJobSvc,
-		CategorySvc:    &services.CategoryService{DB: db},
+		CategorySvc:    categorySvc,
 	}
 	// ...
 		scanSvc := &services.ScanService{
 			DB:          db,
 			LLMClient:   scanLLMClient,
 			ModelName:   llmCfg.Model,
-			CategorySvc: &services.CategoryService{DB: db},
+			CategorySvc: categorySvc,
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/cmd/worker/main.go` around lines 124 - 129, CategoryService is being
instantiated twice; create a single shared instance (e.g., categorySvc :=
&services.CategoryService{DB: db}) and reuse that variable when constructing
ScanService and any other consumers instead of allocating a second
&services.CategoryService{DB: db}; update the ScanService initialization
(scanSvc) to reference the shared categorySvc to eliminate the duplicate
allocation and clarify dependencies.
server/cmd/scan-backfill/main.go (1)

138-141: Use log.Printf instead of log.Println(fmt.Sprintf(...)).

log.Println(fmt.Sprintf(...)) is redundant and slightly less efficient. Use log.Printf directly.

♻️ Proposed fix
-	log.Println(fmt.Sprintf(
+	log.Printf(
 		"backfill complete: matched=%d enqueued=%d skipped=%d failed=%d",
 		len(items), enqueued, skipped, failed,
-	))
+	)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/cmd/scan-backfill/main.go` around lines 138 - 141, Replace the
redundant log.Println(fmt.Sprintf(...)) call with a direct log.Printf call:
locate the logging statement that currently calls
log.Println(fmt.Sprintf("backfill complete: matched=%d enqueued=%d skipped=%d
failed=%d", len(items), enqueued, skipped, failed)) and change it to use
log.Printf with the same format string and arguments so the formatted message is
produced directly (no fmt.Sprintf wrapper).
docs/proposals/SKILL_SCAN_DESIGN.md (1)

190-190: Minor: Add language specifier to fenced code block.

The code block starting at line 190 is missing a language identifier, which triggers a markdownlint warning and may affect syntax highlighting.

📝 Proposed fix
-```
+```text
 SyncService 写入 CapabilityItem(新增或更新)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/proposals/SKILL_SCAN_DESIGN.md` at line 190, The fenced code block
containing the line "SyncService 写入 CapabilityItem(新增或更新)" is missing a language
specifier; update that fence to include a language identifier (e.g., add ```text
before the line) so Markdown linting and syntax highlighting work correctly for
the block in SKILL_SCAN_DESIGN.md.
server/internal/services/scan_service.go (1)

226-228: Consider logging errors from EnsureCategory instead of silently discarding them.

The _, _ = s.CategorySvc.EnsureCategory(...) pattern silently ignores any errors. While taxonomy creation failures shouldn't block scan persistence, silently swallowing errors makes debugging difficult when categories aren't being created as expected.

♻️ Proposed fix to log errors
 	if scanRecord.Category != "" && s.CategorySvc != nil {
-		_, _ = s.CategorySvc.EnsureCategory(scanRecord.Category, "scan")
+		if _, err := s.CategorySvc.EnsureCategory(scanRecord.Category, "scan"); err != nil {
+			// Log but don't fail - category creation is non-critical
+			log.Printf("warn: failed to ensure category %q: %v", scanRecord.Category, err)
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/services/scan_service.go` around lines 226 - 228, The call to
s.CategorySvc.EnsureCategory currently discards errors (_, _ =
s.CategorySvc.EnsureCategory...), so change it to capture the error and log it
instead of swallowing it; e.g., call _, err :=
s.CategorySvc.EnsureCategory(scanRecord.Category, "scan") and if err != nil log
a warning or error containing scanRecord.Category and the err using the
service's logger (e.g., s.Logger or s.log) or the standard log package if no
logger field exists so failures to create categories are visible but do not
block persistence.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@docs/proposals/SKILL_SCAN_DESIGN.md`:
- Line 190: The fenced code block containing the line "SyncService 写入
CapabilityItem(新增或更新)" is missing a language specifier; update that fence to
include a language identifier (e.g., add ```text before the line) so Markdown
linting and syntax highlighting work correctly for the block in
SKILL_SCAN_DESIGN.md.

In `@server/cmd/scan-backfill/main.go`:
- Around line 138-141: Replace the redundant log.Println(fmt.Sprintf(...)) call
with a direct log.Printf call: locate the logging statement that currently calls
log.Println(fmt.Sprintf("backfill complete: matched=%d enqueued=%d skipped=%d
failed=%d", len(items), enqueued, skipped, failed)) and change it to use
log.Printf with the same format string and arguments so the formatted message is
produced directly (no fmt.Sprintf wrapper).

In `@server/cmd/worker/main.go`:
- Around line 124-129: CategoryService is being instantiated twice; create a
single shared instance (e.g., categorySvc := &services.CategoryService{DB: db})
and reuse that variable when constructing ScanService and any other consumers
instead of allocating a second &services.CategoryService{DB: db}; update the
ScanService initialization (scanSvc) to reference the shared categorySvc to
eliminate the duplicate allocation and clarify dependencies.

In `@server/internal/services/scan_service.go`:
- Around line 226-228: The call to s.CategorySvc.EnsureCategory currently
discards errors (_, _ = s.CategorySvc.EnsureCategory...), so change it to
capture the error and log it instead of swallowing it; e.g., call _, err :=
s.CategorySvc.EnsureCategory(scanRecord.Category, "scan") and if err != nil log
a warning or error containing scanRecord.Category and the err using the
service's logger (e.g., s.Logger or s.log) or the standard log package if no
logger field exists so failures to create categories are visible but do not
block persistence.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 599b66a2-eaf3-48c5-9c8a-76e449cfb1ec

📥 Commits

Reviewing files that changed from the base of the PR and between 07ec96f and 5962260.

📒 Files selected for processing (9)
  • docs/proposals/SKILL_SCAN_DESIGN.md
  • package.json
  • portal
  • server/cmd/scan-backfill/main.go
  • server/cmd/worker/main.go
  • server/internal/models/models.go
  • server/internal/services/scan_service.go
  • server/internal/services/scan_service_test.go
  • server/migrations/20260407110000_add_scan_category_and_seed_taxonomy.sql

Add a new CLI tool to backfill scan jobs for capability items.

Features:

- Filter by security status, registry ID, or item type

- Support dry-run mode for previewing items

- Configurable priority and max attempts for scan jobs

- Add npm script 'backfill:scan' for convenient execution

Signed-off-by: 林凯90331 <90331@sangfor.com>

Co-authored-by: CoStrict <zgsm@sangfor.com.cn>
@linkai0924 linkai0924 force-pushed the feat/refactor-search branch from 5962260 to 92ab675 Compare April 7, 2026 07:36
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/cmd/worker/main.go (1)

63-65: ⚠️ Potential issue | 🟡 Minor

Reassign the post-migration GORM session to restore logging.

Line 82 discards the return value from db.Session(), which means the db variable retains the silent logger. The session with the restored originalLogger is never used. All subsequent code—including runPostMigrations(db) and service initialization—continues using the silent logger.

Suggested fix
-	// Re-enable GORM logging after migration
-	_ = db.Session(&gorm.Session{Logger: originalLogger})
+	// Re-enable GORM logging after migration
+	db = db.Session(&gorm.Session{Logger: originalLogger})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/cmd/worker/main.go` around lines 63 - 65, The current code sets db =
db.Session(&gorm.Session{Logger: db.Logger.LogMode(gormlogger.Silent)}) to
silence logs but never restores the original logger; reassign the returned
session that uses originalLogger after migrations so subsequent calls (e.g.,
runPostMigrations(db) and service initialization) use the restored logger.
Locate the variables db and originalLogger and replace the discarded Session
call after AutoMigrate with a reassignment that sets db =
db.Session(&gorm.Session{Logger: originalLogger}) (or equivalent) to re-enable
the original GORM logger.
🧹 Nitpick comments (1)
server/cmd/worker/main.go (1)

11-11: Use errors.Is for ErrScanJobAlreadyQueued.

server/internal/services/scan_job_service.go:12 exposes a sentinel error. Message matching on Line 293 is brittle and will stop classifying wrapped sentinel errors correctly.

Suggested fix
-	"strings"
+	"errors"
@@
-		case err != nil && strings.Contains(err.Error(), services.ErrScanJobAlreadyQueued.Error()):
+		case errors.Is(err, services.ErrScanJobAlreadyQueued):

Also applies to: 293-295

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/cmd/worker/main.go` at line 11, The code is brittle because it matches
the error string to detect the sentinel ErrScanJobAlreadyQueued; instead import
the standard errors package and replace string-based message matching with
errors.Is(err, server/internal/services.ErrScanJobAlreadyQueued) (or errors.As
if you need to extract a wrapped error type), updating the check in the worker
main where the current err.Error() substring check is used (apply the same
change to the other occurrence in this file). Ensure the errors import is added
and remove the fragile string comparison so wrapped sentinel errors are
correctly recognized.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/cmd/worker/main.go`:
- Around line 206-211: The CLI currently allows negative values for the flags
`priority` and `maxAttempts` (set via `fs.IntVar`) which are later passed
through unchanged; add validation immediately after `_ = fs.Parse(args)` in
`main` to reject negatives: if `priority < 0` or `maxAttempts < 0` print a clear
error and exit (non-zero) so invalid `--priority`/`--max-attempts` are refused
rather than persisted; ensure the check uses the same variable names
(`priority`, `maxAttempts`, `dryRun`, `limit`, `triggerType`) so callers fail
fast before any call into the scan job creation/`scan_job_service` code.
- Around line 271-274: The current fallback to revision = 1 for items with no
version row (using revisionByItemID[item.ID]) can enqueue jobs for items lacking
any models.CapabilityVersion; instead, check the map presence with the comma-ok
idiom (or equivalent) when reading revisionByItemID for item.ID and skip the
item if the revision is missing or <= 0 (e.g., continue the loop) so only items
with an actual version row are enqueued; update the logic around
revisionByItemID, item.ID and the surrounding enqueuing code in main.go
accordingly.

---

Outside diff comments:
In `@server/cmd/worker/main.go`:
- Around line 63-65: The current code sets db = db.Session(&gorm.Session{Logger:
db.Logger.LogMode(gormlogger.Silent)}) to silence logs but never restores the
original logger; reassign the returned session that uses originalLogger after
migrations so subsequent calls (e.g., runPostMigrations(db) and service
initialization) use the restored logger. Locate the variables db and
originalLogger and replace the discarded Session call after AutoMigrate with a
reassignment that sets db = db.Session(&gorm.Session{Logger: originalLogger})
(or equivalent) to re-enable the original GORM logger.

---

Nitpick comments:
In `@server/cmd/worker/main.go`:
- Line 11: The code is brittle because it matches the error string to detect the
sentinel ErrScanJobAlreadyQueued; instead import the standard errors package and
replace string-based message matching with errors.Is(err,
server/internal/services.ErrScanJobAlreadyQueued) (or errors.As if you need to
extract a wrapped error type), updating the check in the worker main where the
current err.Error() substring check is used (apply the same change to the other
occurrence in this file). Ensure the errors import is added and remove the
fragile string comparison so wrapped sentinel errors are correctly recognized.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2720797f-b89c-432d-8b00-e8adf2fc7dda

📥 Commits

Reviewing files that changed from the base of the PR and between 5962260 and 92ab675.

📒 Files selected for processing (2)
  • package.json
  • server/cmd/worker/main.go
✅ Files skipped from review due to trivial changes (1)
  • package.json

Comment on lines +206 to +211
fs.IntVar(&limit, "limit", 0, "max number of items to enqueue (0 = no limit)")
fs.StringVar(&triggerType, "trigger-type", "manual", "scan job trigger type to record")
fs.IntVar(&priority, "priority", 1, "scan job priority")
fs.IntVar(&maxAttempts, "max-attempts", 2, "scan job max attempts")
fs.BoolVar(&dryRun, "dry-run", false, "preview items without inserting scan jobs")
_ = fs.Parse(args)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Reject negative queue controls.

Line 282 forwards priority and maxAttempts verbatim, and server/internal/services/scan_job_service.go:24-63 only normalizes zero. scan-backfill --priority=-1 or --max-attempts=-1 therefore persists invalid job settings instead of failing fast.

Suggested fix
 	fs.BoolVar(&dryRun, "dry-run", false, "preview items without inserting scan jobs")
 	_ = fs.Parse(args)
+	if priority < 0 {
+		log.Fatal("--priority must be >= 0")
+	}
+	if maxAttempts < 0 {
+		log.Fatal("--max-attempts must be >= 0")
+	}
 
 	cfg := config.Load()

Also applies to: 282-285

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/cmd/worker/main.go` around lines 206 - 211, The CLI currently allows
negative values for the flags `priority` and `maxAttempts` (set via `fs.IntVar`)
which are later passed through unchanged; add validation immediately after `_ =
fs.Parse(args)` in `main` to reject negatives: if `priority < 0` or `maxAttempts
< 0` print a clear error and exit (non-zero) so invalid
`--priority`/`--max-attempts` are refused rather than persisted; ensure the
check uses the same variable names (`priority`, `maxAttempts`, `dryRun`,
`limit`, `triggerType`) so callers fail fast before any call into the scan job
creation/`scan_job_service` code.

Comment on lines +271 to +274
revision := revisionByItemID[item.ID]
if revision <= 0 {
revision = 1
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Skip items that have no version row.

server/internal/services/scan_job_service.go:24-63 inserts whatever itemRevision it receives. Falling back to 1 on Lines 272-274 can enqueue pending jobs for items that have no models.CapabilityVersion at all.

Suggested fix
-		revision := revisionByItemID[item.ID]
-		if revision <= 0 {
-			revision = 1
-		}
+		revision, ok := revisionByItemID[item.ID]
+		if !ok || revision <= 0 {
+			skipped++
+			log.Printf("skipped item=%s slug=%s reason=no-capability-version", item.ID, item.Slug)
+			continue
+		}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
revision := revisionByItemID[item.ID]
if revision <= 0 {
revision = 1
}
revision, ok := revisionByItemID[item.ID]
if !ok || revision <= 0 {
skipped++
log.Printf("skipped item=%s slug=%s reason=no-capability-version", item.ID, item.Slug)
continue
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/cmd/worker/main.go` around lines 271 - 274, The current fallback to
revision = 1 for items with no version row (using revisionByItemID[item.ID]) can
enqueue jobs for items lacking any models.CapabilityVersion; instead, check the
map presence with the comma-ok idiom (or equivalent) when reading
revisionByItemID for item.ID and skip the item if the revision is missing or <=
0 (e.g., continue the loop) so only items with an actual version row are
enqueued; update the logic around revisionByItemID, item.ID and the surrounding
enqueuing code in main.go accordingly.

@linkai0924 linkai0924 merged commit f8979b0 into main Apr 7, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant