GsRag supports six query modes:

| Mode | Keywords | Vector Search | Graph Traversal | Chunk Retrieval | Use Case |
|---|---|---|---|---|---|
| `local` | low-level | entity vectors | Entity neighbors | No | Entity-specific questions |
| `global` | high-level | relation vectors | From relations | No | Broad/topic questions |
| `hybrid` | both | entity + relation | Both paths | No | Balanced local+global |
| `naive` | none | chunk vectors | No | Yes | Simple text retrieval |
| `mix` | both | entity + relation + chunk | Both paths | Yes | All retrieval paths combined |
| `bypass` | none | None | None | None | Direct LLM call, no RAG |

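The mode table can also be read as data. The sketch below encodes which retrieval paths each mode uses, taken directly from the table. `MODE_PATHS`, `RetrievalPaths`, and `modesUsingChunks` are hypothetical names chosen for illustration and are not part of GsRag's API.

```typescript
// Hypothetical encoding of the mode table; not part of GsRag's API.
type QueryMode = "local" | "global" | "hybrid" | "naive" | "mix" | "bypass";

interface RetrievalPaths {
  entityVectors: boolean;   // entity vector search
  relationVectors: boolean; // relation vector search
  chunkVectors: boolean;    // chunk vector search / chunk retrieval
  graphTraversal: boolean;  // graph traversal of any kind
}

const MODE_PATHS: Record<QueryMode, RetrievalPaths> = {
  local:  { entityVectors: true,  relationVectors: false, chunkVectors: false, graphTraversal: true },
  global: { entityVectors: false, relationVectors: true,  chunkVectors: false, graphTraversal: true },
  hybrid: { entityVectors: true,  relationVectors: true,  chunkVectors: false, graphTraversal: true },
  naive:  { entityVectors: false, relationVectors: false, chunkVectors: true,  graphTraversal: false },
  mix:    { entityVectors: true,  relationVectors: true,  chunkVectors: true,  graphTraversal: true },
  bypass: { entityVectors: false, relationVectors: false, chunkVectors: false, graphTraversal: false },
};

// List the modes that retrieve raw text chunks.
function modesUsingChunks(): QueryMode[] {
  return (Object.keys(MODE_PATHS) as QueryMode[]).filter(
    (mode) => MODE_PATHS[mode].chunkVectors,
  );
}
```

A lookup like this can help when deciding which mode to request, for example preferring `mix` or `naive` when the answer must quote source text.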

There are three query methods, each returning progressively more data:

    // Level 1 (standard): best for most use cases
    const result: QueryResult = await gsrag.query("What is...", { mode: "hybrid" });
    console.log(result.content); // LLM-generated answer

    // Level 2 (raw data): retrieval context only, no LLM generation
    const data = await gsrag.queryData("What is...", { mode: "hybrid" });
    // Inspect entities, relations, chunks returned

    // Level 3 (full pipeline): includes llmResponse
    const raw = await gsrag.queryLlm("What is...", { mode: "hybrid" });
    console.log(raw.llmResponse?.content); // Full LLM payload

`query` accepts its arguments in either of two styles:

    // Object style (recommended)
    await gsrag.query({ query: "...", mode: "hybrid", topK: 20 });

    // String + param style
    await gsrag.query("...", new QueryParam({ mode: "hybrid", topK: 20 }));
    await gsrag.query("...", { mode: "hybrid", enableRerank: false });

Options can also be collected in a `QueryParam` instance:

    import { QueryParam } from "@gsrag/core";

    const param = new QueryParam({
      mode: "local",
      topK: 30,
      chunkTopK: 15,
      maxEntityTokens: 4000,
      maxRelationTokens: 5000,
      maxTotalTokens: 20000,
      enableRerank: false,
      stream: false,
      onlyNeedContext: false,
      onlyNeedPrompt: false,
      conversationHistory: [
        { role: "user", content: "Previous question" },
        { role: "assistant", content: "Previous answer" },
      ],
      historyTurns: 2,
      userPrompt: "Focus on technical details.",
      hlKeywords: ["AI", "machine learning"],
      llKeywords: ["transformer", "attention"],
      includeReferences: true,
    });

See Configuration for all default values and environment-variable overrides.
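
The `maxEntityTokens`, `maxRelationTokens`, and `maxTotalTokens` fields suggest per-category and overall token caps on the assembled context. As a rough illustration only, the sketch below keeps ranked items until a budget is reached. It assumes a crude whitespace tokenizer; GsRag's actual tokenizer and truncation order are not described in this section, and `countTokens` and `truncateByTokenBudget` are hypothetical helpers.

```typescript
// Assumption: tokens are approximated by whitespace-separated words.
// GsRag's real tokenizer is not described in this section.
function countTokens(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Keep items in ranked order until adding the next one would exceed the budget.
function truncateByTokenBudget(items: string[], maxTokens: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const item of items) {
    const cost = countTokens(item);
    if (used + cost > maxTokens) break;
    kept.push(item);
    used += cost;
  }
  return kept;
}

// Three four-word entity descriptions against an eight-token budget.
const entityDescriptions = [
  "Transformer: a neural architecture",
  "Attention: a weighting mechanism",
  "BERT: an encoder model",
];
const keptEntities = truncateByTokenBudget(entityDescriptions, 8);
console.log(keptEntities.length); // 2
```

Stopping at the first item that does not fit preserves ranking order; skipping it and trying smaller items would pack the budget more tightly at the cost of dropping higher-ranked context.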

To stream a response, set `stream: true` and iterate over `responseIterator`:

    const result = await gsrag.query("Tell me a story...", { stream: true });
    if (result.isStreaming && result.responseIterator) {
      for await (const chunk of result.responseIterator) {
        process.stdout.write(chunk);
      }
    }

`query` resolves to a `QueryResult`:

    class QueryResult {
      content?: string; // Generated answer text
      responseIterator?: AsyncIterable<string>; // Streaming chunks
      rawData?: QueryRawData; // Structured retrieval data
      isStreaming: boolean;

      // Convenience properties:
      get referenceList(): Array<{ referenceId: string; filePath: string }>;
      get metadata(): Record<string, unknown>;
      get status(): "success" | "failure" | undefined;
      get data(): Record<string, unknown> | undefined; // entities, relations, chunks
      get llmResponse(): Record<string, unknown> | undefined;
    }

Each query passes through the following pipeline:

    User Query
        │
        ▼
    Keyword Extraction (LLM or fallback)
        │
        ├── local  ──► Entity vector search ──► Graph neighbor traversal
        ├── global ──► Relation vector search ──► Entity lookup
        ├── hybrid ──► Both paths (round-robin merge)
        ├── mix    ──► Both paths + chunk search
        └── bypass ──► Skip retrieval
        │
        ▼
    Context Assembly → Reranking (optional) → Token Truncation
        │
        ▼
    LLM Generation → QueryResult
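
The pipeline names a round-robin merge for `hybrid` mode. A minimal sketch of that idea follows, assuming each path yields a ranked list and that duplicates are dropped by a caller-supplied key. `roundRobinMerge` is a hypothetical helper; GsRag's own merge logic may differ in tie-breaking and deduplication.

```typescript
// Interleave two ranked lists, alternating between them and skipping duplicates.
// Hypothetical helper for illustration; not GsRag's actual implementation.
function roundRobinMerge<T>(a: T[], b: T[], key: (item: T) => string): T[] {
  const seen = new Set<string>();
  const merged: T[] = [];
  const longest = Math.max(a.length, b.length);
  for (let i = 0; i < longest; i++) {
    for (const list of [a, b]) {
      if (i >= list.length) continue; // this list is exhausted
      const item = list[i];
      const k = key(item);
      if (seen.has(k)) continue; // already contributed by the other path
      seen.add(k);
      merged.push(item);
    }
  }
  return merged;
}

const localHits = ["A", "B", "C"];   // e.g. results from the entity path
const globalHits = ["B", "D"];       // e.g. results from the relation path
const mergedHits = roundRobinMerge(localHits, globalHits, (x) => x);
console.log(mergedHits); // [ 'A', 'B', 'D', 'C' ]
```

Alternating between the two lists keeps the top results of both paths near the front, so a later truncation step does not favor one path over the other.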

Query responses are cached when `enableLlmCache` is enabled. Cache keys are computed from the query text, mode, and parameters. A cache hit returns the stored response without calling the LLM.
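
The exact key derivation is not specified here. The sketch below shows one way a key could be built from the query text, mode, and parameters: serialize them with sorted object keys so that property order does not matter, then hash the result. `stableStringify`, `fnv1a`, and `cacheKey` are hypothetical helpers, and FNV-1a is used only to keep the example dependency-free; a production cache might prefer a cryptographic hash.

```typescript
// Serialize a value with object keys sorted, so key order does not affect the output.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map((v) => stableStringify(v)).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => JSON.stringify(k) + ":" + stableStringify(obj[k]));
    return "{" + body.join(",") + "}";
  }
  return JSON.stringify(value) ?? "null";
}

// 32-bit FNV-1a hash, returned as 8 hex characters (illustrative choice only).
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Hypothetical key: hash of the query text, mode, and remaining parameters.
function cacheKey(query: string, mode: string, params: Record<string, unknown>): string {
  return fnv1a(stableStringify({ query, mode, params }));
}

const keyA = cacheKey("What is...", "hybrid", { topK: 20, enableRerank: false });
const keyB = cacheKey("What is...", "hybrid", { enableRerank: false, topK: 20 });
console.log(keyA === keyB); // true: parameter order does not change the key
```

Sorting keys before hashing matters because two calls with the same options in a different order should hit the same cache entry.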