Query

Query Modes

GsRag supports 6 query modes:

| Mode | Keywords | Vector Search | Graph Traversal | Chunk Retrieval | Use Case |
|---|---|---|---|---|---|
| local | low-level | entity vectors | Entity neighbors | No | Entity-specific questions |
| global | high-level | relation vectors | From relations | No | Broad/topic questions |
| hybrid | both | entity + relation | Both paths | No | Balanced local+global |
| naive | | chunk vectors | No | Yes | Simple text retrieval |
| mix | both | entity + relation + chunk | Both paths | Yes | All retrieval paths combined |
| bypass | | None | None | None | Direct LLM call, no RAG |
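The table can also be read as a lookup from mode to retrieval paths. The sketch below is only a restatement of the table in code: the `RetrievalPlan` type, the `RETRIEVAL_PLANS` constant, and the `usesRetrieval` helper are hypothetical names for illustration and are not part of the GsRag API.

```typescript
// Hypothetical restatement of the mode table; not GsRag's internal representation.
type QueryMode = "local" | "global" | "hybrid" | "naive" | "mix" | "bypass";

interface RetrievalPlan {
  entityVectors: boolean;   // search entity embeddings
  relationVectors: boolean; // search relation embeddings
  chunkVectors: boolean;    // search and return raw text chunks
  graphTraversal: boolean;  // follow graph edges from the vector hits
}

const RETRIEVAL_PLANS: Record<QueryMode, RetrievalPlan> = {
  local:  { entityVectors: true,  relationVectors: false, chunkVectors: false, graphTraversal: true },
  global: { entityVectors: false, relationVectors: true,  chunkVectors: false, graphTraversal: true },
  hybrid: { entityVectors: true,  relationVectors: true,  chunkVectors: false, graphTraversal: true },
  naive:  { entityVectors: false, relationVectors: false, chunkVectors: true,  graphTraversal: false },
  mix:    { entityVectors: true,  relationVectors: true,  chunkVectors: true,  graphTraversal: true },
  bypass: { entityVectors: false, relationVectors: false, chunkVectors: false, graphTraversal: false },
};

// True when the mode performs any vector search before calling the LLM.
function usesRetrieval(mode: QueryMode): boolean {
  const plan = RETRIEVAL_PLANS[mode];
  return plan.entityVectors || plan.relationVectors || plan.chunkVectors;
}
```

Only `bypass` performs no retrieval at all, and only `naive` and `mix` return text chunks.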

Query Methods

Three query methods expose different amounts of pipeline output:

// Level 1 — Standard: best for most use cases
const result: QueryResult = await gsrag.query("What is...", { mode: "hybrid" });
console.log(result.content); // LLM-generated answer

// Level 2 — Raw data: retrieval context only, no LLM generation
const data = await gsrag.queryData("What is...", { mode: "hybrid" });
// Inspect entities, relations, chunks returned

// Level 3 — Full pipeline: includes llmResponse
const raw = await gsrag.queryLlm("What is...", { mode: "hybrid" });
console.log(raw.llmResponse?.content); // Full LLM payload

Two Call Signatures

// Object style (recommended)
await gsrag.query({ query: "...", mode: "hybrid", topK: 20 });

// String + param style: pass a QueryParam instance or a plain options object
await gsrag.query("...", new QueryParam({ mode: "hybrid", topK: 20 }));
await gsrag.query("...", { mode: "hybrid", enableRerank: false });
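Supporting both call styles usually comes down to normalizing the arguments into one request object before any work is done. The following is a minimal sketch of that idea, not GsRag's actual implementation: `QueryOptions`, `QueryRequest`, and `normalizeQueryArgs` are hypothetical names, and only two of the documented options are modeled.

```typescript
// Hypothetical option shape; only mode and topK are modeled here.
interface QueryOptions {
  mode?: string;
  topK?: number;
}

interface QueryRequest extends QueryOptions {
  query: string;
}

// Accepts either ("text", options) or ({ query: "text", ...options })
// and returns a single request object.
function normalizeQueryArgs(
  first: string | QueryRequest,
  options: QueryOptions = {},
): QueryRequest {
  if (typeof first === "string") {
    return { ...options, query: first };
  }
  // Object style: the second argument is ignored in this sketch.
  return { ...first };
}
```

Both `normalizeQueryArgs("q", { mode: "hybrid" })` and `normalizeQueryArgs({ query: "q", mode: "hybrid" })` yield the same request object.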

QueryParam

import { QueryParam } from "@gsrag/core";

const param = new QueryParam({
  mode: "local",
  topK: 30,
  chunkTopK: 15,
  maxEntityTokens: 4000,
  maxRelationTokens: 5000,
  maxTotalTokens: 20000,
  enableRerank: false,
  stream: false,
  onlyNeedContext: false,
  onlyNeedPrompt: false,
  conversationHistory: [
    { role: "user", content: "Previous question" },
    { role: "assistant", content: "Previous answer" },
  ],
  historyTurns: 2,
  userPrompt: "Focus on technical details.",
  hlKeywords: ["AI", "machine learning"],
  llKeywords: ["transformer", "attention"],
  includeReferences: true,
});

See Configuration for all default values and env overrides.
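The `maxEntityTokens`, `maxRelationTokens`, and `maxTotalTokens` limits suggest that ranked context items are kept until a token budget is spent. The sketch below illustrates that kind of budget truncation under stated assumptions: `truncateToBudget` and `approxTokenCount` are hypothetical helpers, and counting whitespace-separated words stands in for a real tokenizer. How GsRag counts tokens is not described here.

```typescript
// Rough stand-in for a tokenizer: counts whitespace-separated words.
function approxTokenCount(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Keeps items in rank order and stops at the first item that would
// push the running total past maxTokens.
function truncateToBudget(
  items: string[],
  maxTokens: number,
  countTokens: (text: string) => number = approxTokenCount,
): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const item of items) {
    const cost = countTokens(item);
    if (used + cost > maxTokens) break;
    kept.push(item);
    used += cost;
  }
  return kept;
}
```

Stopping at the first overflowing item, rather than skipping it and trying smaller ones, preserves the ranking order of what is kept.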

Streaming

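const result = await gsrag.query("Tell me a story...", { stream: true });

if (result.isStreaming && result.responseIterator) {
  // Write each chunk as soon as it arrives
  for await (const chunk of result.responseIterator) {
    process.stdout.write(chunk);
  }
} else {
  // No stream was returned: fall back to the complete answer
  console.log(result.content);
}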
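When the full text is also needed after streaming (for logging or storage), the chunks can be accumulated while they are forwarded. This is a small sketch that works with any `AsyncIterable<string>`; `collectStream` and `fakeStream` are hypothetical names, and `fakeStream` merely stands in for `result.responseIterator`.

```typescript
// Forwards each chunk to an optional callback and returns the joined text.
async function collectStream(
  chunks: AsyncIterable<string>,
  onChunk?: (chunk: string) => void,
): Promise<string> {
  let text = "";
  for await (const chunk of chunks) {
    onChunk?.(chunk);
    text += chunk;
  }
  return text;
}

// Stand-in for result.responseIterator, used only for demonstration.
async function* fakeStream(): AsyncGenerator<string> {
  yield "Once ";
  yield "upon ";
  yield "a time";
}
```

Calling `collectStream(fakeStream(), (c) => process.stdout.write(c))` would print the chunks as they arrive and resolve to the complete string.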

QueryResult

declare class QueryResult {
  content?: string;                           // Generated answer text
  responseIterator?: AsyncIterable<string>;    // Streaming chunks
  rawData?: QueryRawData;                      // Structured retrieval data
  isStreaming: boolean;

  // Convenience properties:
  get referenceList(): Array<{ referenceId: string; filePath: string }>;
  get metadata(): Record<string, unknown>;
  get status(): "success" | "failure" | undefined;
  get data(): Record<string, unknown> | undefined;  // entities, relations, chunks
  get llmResponse(): Record<string, unknown> | undefined;
}
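The `referenceList` getter returns `referenceId` and `filePath` pairs, which are convenient to render as a source list under an answer. The helper below is a sketch of one way to do that: `Reference` mirrors the element type shown above, while `formatReferences` and its output format are assumptions for illustration.

```typescript
// Mirrors the element type of QueryResult.referenceList.
interface Reference {
  referenceId: string;
  filePath: string;
}

// Renders one "[id] path" line per unique referenceId, in first-seen order.
function formatReferences(refs: Reference[]): string {
  const seen = new Set<string>();
  const lines: string[] = [];
  for (const ref of refs) {
    if (seen.has(ref.referenceId)) continue;
    seen.add(ref.referenceId);
    lines.push(`[${ref.referenceId}] ${ref.filePath}`);
  }
  return lines.join("\n");
}
```

With a real result this would be called as `formatReferences(result.referenceList)`.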

Query Flow

User Query
    │
    ▼
Keyword Extraction (LLM or fallback)
    │
    ├── local  ──► Entity vector search  ──► Graph neighbor traversal
    ├── global ──► Relation vector search ──► Entity lookup
    ├── hybrid ──► Both paths (round-robin merge)
    ├── mix    ──► Both paths + chunk search
    └── bypass ──► Skip retrieval
    │
    ▼
Context Assembly → Reranking (optional) → Token Truncation
    │
    ▼
LLM Generation → QueryResult
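The round-robin merge named in the `hybrid` branch can be sketched as alternating between two ranked lists while skipping items already taken. This is an illustration of the general technique, not GsRag's code: `roundRobinMerge` and the `keyOf` deduplication callback are hypothetical, and whether GsRag deduplicates during the merge is an assumption.

```typescript
// Interleaves two ranked lists (a first, then b, at each rank) and
// drops any item whose key has already been emitted.
function roundRobinMerge<T>(a: T[], b: T[], keyOf: (item: T) => string): T[] {
  const merged: T[] = [];
  const seen = new Set<string>();
  const longest = Math.max(a.length, b.length);
  for (let i = 0; i < longest; i++) {
    for (const list of [a, b]) {
      if (i >= list.length) continue; // this list is exhausted
      const item = list[i];
      const key = keyOf(item);
      if (seen.has(key)) continue; // already contributed by the other path
      seen.add(key);
      merged.push(item);
    }
  }
  return merged;
}
```

Alternating by rank keeps the top results of both paths near the front, so neither the local nor the global path dominates the assembled context.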

Cache

Query responses are cached when enableLlmCache is enabled. Cache keys are computed from the query text, the mode, and the query parameters. A cache hit returns the stored response without calling the LLM.
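A key built from query text, mode, and parameters has to be independent of the order in which parameters are written, or identical queries would miss the cache. The sketch below shows one way to get such a key: `buildCacheKey`, `CacheKeyParams`, and `cachedAnswer` are hypothetical names, and the key format (a JSON string; a real implementation would likely hash it) is an assumption rather than GsRag's actual scheme.

```typescript
type CacheKeyParams = Record<string, string | number | boolean | undefined>;

// Builds a deterministic key: parameter names are sorted so that
// { topK, enableRerank } and { enableRerank, topK } produce the same key.
function buildCacheKey(query: string, mode: string, params: CacheKeyParams): string {
  const sortedEntries = Object.keys(params)
    .filter((name) => params[name] !== undefined)
    .sort()
    .map((name) => [name, params[name]]);
  return JSON.stringify([mode, query, sortedEntries]);
}

const responseCache = new Map<string, string>();

// Returns the stored answer on a hit; otherwise calls generate() once and stores it.
function cachedAnswer(key: string, generate: () => string): string {
  const hit = responseCache.get(key);
  if (hit !== undefined) return hit;
  const answer = generate();
  responseCache.set(key, answer);
  return answer;
}
```

Any change to the query text, the mode, or a parameter value produces a different key and therefore a fresh LLM call.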