
Concurrent chat.message ingestion can bypass inputHash dedup #17

@oritwoen


What happened?

Noticed this while load-testing the OpenCode plugin hooks. chat.message appears to treat inputHash as a dedup key, but concurrent calls with the same payload can still create two rows.

In src/opencode.ts, the flow is check-then-insert:

  • look up via observation.getByInputHash(projectId, hash)
  • if not found, call observation.add(...)
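To make the race window concrete, here is a minimal sketch of the same check-then-insert shape against a hypothetical in-memory store. `Row`, `simulateIo`, `ingest` and the three store functions are stand-ins named after the code paths in this issue, not obsxa's actual implementation. It assumes the found branch increments frequency, as the expected behavior implies. Each `await` plays the role of a storage round-trip that lets the other call run in between.

```typescript
// Hypothetical in-memory stand-ins for the storage calls named in the issue.
// None of this is obsxa's real code; it only reproduces the check-then-insert shape.
interface Row {
  projectId: string;
  inputHash: string;
  frequency: number;
}

// Each storage call awaits this once, yielding so other pending calls can run,
// much like a real database round-trip would.
const simulateIo = (): Promise<void> => Promise.resolve();

async function runCheckThenInsert(): Promise<Row[]> {
  const rows: Row[] = [];

  const getByInputHash = async (projectId: string, hash: string): Promise<Row | undefined> => {
    await simulateIo();
    return rows.find((r) => r.projectId === projectId && r.inputHash === hash);
  };
  const add = async (projectId: string, hash: string): Promise<void> => {
    await simulateIo();
    rows.push({ projectId, inputHash: hash, frequency: 1 });
  };
  const incrementFrequency = async (row: Row): Promise<void> => {
    await simulateIo();
    row.frequency += 1;
  };

  // Check-then-insert: nothing stops a second caller from running its lookup
  // between this caller's lookup and its insert.
  const ingest = async (projectId: string, hash: string): Promise<void> => {
    const existing = await getByInputHash(projectId, hash);
    if (existing) {
      await incrementFrequency(existing);
    } else {
      await add(projectId, hash);
    }
  };

  // Same payload, fired concurrently, as in the repro.
  await Promise.all([ingest("p1", "h1"), ingest("p1", "h1")]);
  return rows;
}

runCheckThenInsert().then((rows) => {
  console.log("count", rows.length); // count 2
  console.log("freqs", rows.map((r) => r.frequency).join(",")); // freqs 1,1
});
```

Both lookups complete before either insert runs, so the sketch reproduces the same `count 2` / `freqs 1,1` shape as the real repro.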

At the storage layer, src/core/db.ts defines an input_hash column plus a non-unique index (idx_observations_project_input_hash), but no unique constraint on (project_id, input_hash). Under concurrency, both calls can miss the lookup and both go on to insert.
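One way to close the window is to make the existence check and the write a single atomic step. At the database level that would presumably mean a unique index on (project_id, input_hash) plus an upsert. The SQL in the comments below assumes SQLite (the `.db` file in the repro suggests it, but the driver is not confirmed here) and assumes the column names `project_id`, `input_hash` and `frequency`. The index name is made up. Since that SQL needs a real database, the runnable part models the same semantics with a hypothetical in-memory `ObservationStore`. `upsertByInputHash` is an invented name, not an existing obsxa method.

```typescript
// Hypothetical in-memory model of an atomic upsert keyed by (projectId, inputHash).
// Assumed SQLite equivalent (index name and column names are assumptions):
//   CREATE UNIQUE INDEX observations_project_input_hash_unique
//     ON observations (project_id, input_hash);
//   INSERT INTO observations (project_id, input_hash, frequency) VALUES (?, ?, 1)
//     ON CONFLICT (project_id, input_hash) DO UPDATE SET frequency = frequency + 1;
interface Row {
  projectId: string;
  inputHash: string;
  frequency: number;
}

class ObservationStore {
  private readonly byKey = new Map<string, Row>();

  // JSON-encoding the pair avoids collisions that a plain "a:b" join could produce.
  private static key(projectId: string, hash: string): string {
    return JSON.stringify([projectId, hash]);
  }

  // The existence check and the write run in one synchronous step after the
  // single await, so two concurrent callers cannot both see "missing".
  async upsertByInputHash(projectId: string, hash: string): Promise<Row> {
    await Promise.resolve(); // stand-in for one database round-trip
    const k = ObservationStore.key(projectId, hash);
    const existing = this.byKey.get(k);
    if (existing) {
      existing.frequency += 1;
      return existing;
    }
    const row: Row = { projectId, inputHash: hash, frequency: 1 };
    this.byKey.set(k, row);
    return row;
  }

  list(): Row[] {
    return Array.from(this.byKey.values());
  }
}

async function runAtomicUpsert(): Promise<Row[]> {
  const store = new ObservationStore();
  await Promise.all([store.upsertByInputHash("p1", "h1"), store.upsertByInputHash("p1", "h1")]);
  return store.list();
}

runAtomicUpsert().then((rows) => {
  console.log("count", rows.length); // count 1
  console.log("freqs", rows.map((r) => r.frequency).join(",")); // freqs 2
});
```

Two SQLite caveats apply if this route is taken. `CREATE UNIQUE INDEX` fails if duplicate pairs already exist, so existing rows would need merging first. SQLite also treats NULLs as distinct in unique indexes, so rows without an input hash would not be constrained.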

Expected:

  • one observation row
  • frequency incremented to reflect repeated event

Actual:

  • duplicate observation rows with frequency 1

How to reproduce

node --experimental-strip-types --input-type=module -e "
import { mkdtempSync, rmSync } from 'node:fs';
import { join } from 'node:path';
import { tmpdir } from 'node:os';
import { createObsxaPlugin } from './src/opencode.ts';
import { createObsxa } from './src/index.ts';
const dir = mkdtempSync(join(tmpdir(), 'obsxa-plugin-'));
const db = join(dir, 'obsxa.db');
const plugin = createObsxaPlugin({ db, projectId: 'p1', projectName: 'P1' });
const hooks = await plugin({ project: { id: 'p1' }, directory: dir, worktree: dir });
const msgIn = { sessionID: 's1', agent: 'assistant', messageID: 'm1' };
const msgOut = {
  message: { summary: { title: 'same' } },
  parts: [{ type: 'text', text: 'same payload message that should dedupe by input hash' }],
};
await Promise.all([hooks['chat.message']?.(msgIn, msgOut), hooks['chat.message']?.(msgIn, msgOut)]);
const obsxa = await createObsxa({ db });
const rows = await obsxa.observation.list('p1');
console.log('count', rows.length);
console.log('freqs', rows.map(r => r.frequency).join(','));
await obsxa.close();
await hooks.destroy?.();
rmSync(dir, { recursive: true, force: true });
"

Observed output:

  • count 2
  • freqs 1,1

Anything else?

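This looks like a data-integrity bug in the dedup semantics of plugin ingestion, not just a cosmetic duplicate. Repeated events are split across several rows with frequency 1 instead of one row with an incremented frequency. That inflates row counts and skews any downstream analysis that assumes one hashed payload maps to one observation identity.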
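For data that has already been written, duplicates could be collapsed by summing frequency per (projectId, inputHash). That matches the expected outcome above: two identical events become one row with frequency 2. Below is a minimal sketch over a hypothetical row shape (`id`, `projectId`, `inputHash`, `frequency`). `planDuplicateMerge` is an invented helper. It only computes which rows to keep and delete and does not touch a database. Keeping the lowest `id` as the survivor is an arbitrary choice.

```typescript
// Hypothetical row shape; field names mirror the issue text, not a confirmed schema.
interface ObservationRow {
  id: number;
  projectId: string;
  inputHash: string | null;
  frequency: number;
}

interface MergePlan {
  keep: ObservationRow[]; // one survivor per (projectId, inputHash), frequency summed
  deleteIds: number[]; // redundant duplicates to remove
}

function planDuplicateMerge(rows: ObservationRow[]): MergePlan {
  const survivors = new Map<string, ObservationRow>();
  const keep: ObservationRow[] = [];
  const deleteIds: number[] = [];

  // Process in id order so the oldest row survives; inputs are copied, not mutated.
  for (const row of [...rows].sort((a, b) => a.id - b.id)) {
    if (row.inputHash === null) {
      keep.push({ ...row }); // rows without a hash are never deduped
      continue;
    }
    const key = JSON.stringify([row.projectId, row.inputHash]);
    const survivor = survivors.get(key);
    if (survivor) {
      survivor.frequency += row.frequency;
      deleteIds.push(row.id);
    } else {
      const copy = { ...row };
      survivors.set(key, copy);
      keep.push(copy);
    }
  }
  return { keep, deleteIds };
}

const plan = planDuplicateMerge([
  { id: 1, projectId: "p1", inputHash: "h1", frequency: 1 },
  { id: 2, projectId: "p1", inputHash: "h1", frequency: 1 },
  { id: 3, projectId: "p1", inputHash: "h2", frequency: 3 },
]);
console.log(plan.keep.map((r) => `${r.id}:${r.frequency}`).join(","), plan.deleteIds); // 1:2,3:3 [ 2 ]
```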

Related code paths:

  • src/opencode.ts (findByHash, chat.message)
  • src/core/observation.ts (getByInputHash, add, incrementFrequency)
  • src/core/db.ts (observations.inputHash, idx_observations_project_input_hash)
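As a stopgap inside src/opencode.ts, concurrent chat.message calls for the same payload could be serialized per (projectId, hash). Below is a minimal sketch with a hypothetical `KeyedSerializer` and an in-memory store as stand-ins. Each task is chained behind the previous task for the same key, so the second lookup runs only after the first insert has finished. This only covers calls within a single process. A second process writing to the same database file would still race, which is why the missing unique constraint in src/core/db.ts looks like the more durable fix.

```typescript
// Hypothetical helper: serializes async tasks that share a key, within one process.
class KeyedSerializer {
  private readonly tails = new Map<string, Promise<void>>();

  run<T>(key: string, task: () => Promise<T>): Promise<T> {
    const prev = this.tails.get(key) ?? Promise.resolve();
    // Start this task only after the previous task for the same key has settled.
    const result = prev.then(() => task());
    // The stored tail never rejects, so one failed task does not block later ones.
    const tail: Promise<void> = result.then(
      () => undefined,
      () => undefined,
    );
    this.tails.set(key, tail);
    // Forget the key once nothing newer has been queued behind this task.
    void tail.then(() => {
      if (this.tails.get(key) === tail) this.tails.delete(key);
    });
    return result;
  }
}

interface Row {
  projectId: string;
  inputHash: string;
  frequency: number;
}

const simulateIo = (): Promise<void> => Promise.resolve();

// The same check-then-insert shape, but run under a per-(projectId, hash) lock.
async function runSerializedIngest(): Promise<Row[]> {
  const rows: Row[] = [];
  const serializer = new KeyedSerializer();

  const ingest = (projectId: string, hash: string): Promise<void> =>
    serializer.run(JSON.stringify([projectId, hash]), async () => {
      await simulateIo(); // lookup round-trip
      const existing = rows.find((r) => r.projectId === projectId && r.inputHash === hash);
      await simulateIo(); // write round-trip
      if (existing) {
        existing.frequency += 1;
      } else {
        rows.push({ projectId, inputHash: hash, frequency: 1 });
      }
    });

  await Promise.all([ingest("p1", "h1"), ingest("p1", "h1")]);
  return rows;
}

runSerializedIngest().then((rows) => {
  console.log("count", rows.length); // count 1
  console.log("freqs", rows.map((r) => r.frequency).join(",")); // freqs 2
});
```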

Metadata

Assignees: No one assigned
Labels: No labels
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests