> 🧪 **v1.0.0rc1** is available as a pre-release. Install with `pip install graphrag-sdk --pre` or pin `==1.0.0rc1`. This is a breaking rewrite from v0.x. Stable users: `pip install graphrag-sdk` still gives you v0.8.2 by default.

GraphRAG-SDK

The simplest, most accurate GraphRAG framework built on FalkorDB

Python 3.10+ · Apache 2.0 licensed

*(Figure: knowledge graph construction)*

Most GraphRAG systems work in demos and break under production constraints. GraphRAG SDK was built from real deployments around a simple idea: the retrieval harness matters more than the model. The result is a modular, benchmark-leading framework with predictable cost and sensible defaults that gets you from raw documents to cited answers quickly.


Benchmarks

| Rank | System | Fact retrieval | Complex | Contextual | Creative | Overall |
|------|--------|---------------:|--------:|-----------:|---------:|--------:|
| 1 | **FalkorDB GraphRAG SDK** | 65.22 | 58.63 | 69.54 | 57.08 | **63.73** |
| 2 | AutoPrunedRetriever | 45.99 | 62.80 | 83.10 | 62.97 | 63.72 |
| 3 | G-Reasoner | 60.07 | 53.92 | 71.28 | 50.48 | 58.94 |
| 4 | HippoRAG2 | 60.14 | 53.38 | 64.10 | 48.28 | 56.48 |
| 5 | Fast-GraphRAG | 56.95 | 48.55 | 56.41 | 46.18 | 52.02 |
| 6 | MS-GraphRAG (local) | 49.29 | 50.93 | 64.40 | 39.10 | 50.93 |
| 7 | RAG (w/ rerank) | 60.92 | 42.93 | 51.30 | 38.26 | 48.35 |
| 8 | LightRAG | 58.62 | 49.07 | 48.85 | 23.80 | 45.09 |
| 9 | HippoRAG | 52.93 | 38.52 | 48.70 | 38.85 | 44.75 |

FalkorDB scored with gpt-4o-mini (Azure OpenAI) on the GraphRAG-Bench Novel dataset — 20 novels, 2,010 questions, automated evaluation (ROUGE-L + answer-correctness with gpt-4o-mini). Competitor numbers are sourced from the GraphRAG-Bench published leaderboard. See docs/benchmark.md for full methodology and reproduction instructions.


*(Figure: document-to-provenance answer flow)*

Ingestion & Retrieval Pipeline

| Area | Item | Execution | Description |
|------|------|-----------|-------------|
| Ingestion | 1. Load | Sequential | Read raw text from files (PDF, TXT) or strings. |
| Ingestion | 2. Chunk | Sequential | Split content into overlapping text chunks. |
| Ingestion | 3. Lexical Graph | Sequential | Create Document and Chunk nodes with provenance edges. |
| Ingestion | 4. Extract | Sequential | Run GLiNER2 local NER and LLM-based relationship extraction. |
| Ingestion | 5. Quality Filter | Sequential | Remove invalid extracted nodes (empty IDs, malformed shape). |
| Ingestion | 6. Prune | Sequential | Filter nodes/relations against the schema; drop orphan relations. |
| Ingestion | 7. Resolve | Sequential | Deduplicate entities (exact match, semantic, LLM-verified). |
| Ingestion | 8. Write | Sequential | Persist graph updates with batched MERGE operations in FalkorDB. |
| Ingestion | 9a. Mentions | Parallel | Link entities back to source chunks. |
| Ingestion | 9b. Index | Parallel | Embed and index chunks for retrieval. |
| Retrieval | Vector search | Runtime | Finds semantically similar chunks. |
| Retrieval | Full-text search | Runtime | Matches exact terms and keywords. |
| Retrieval | Cypher queries | Runtime | Executes structured graph lookups. |
| Retrieval | Relationship expansion | Runtime | Traverses connected entities and context. |
| Retrieval | Cosine reranking | Runtime | Reorders candidates by relevance. |
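The overlap logic of step 2 can be sketched in plain Python. This is a character-based illustration only; the `chunk_size` and `overlap` values below are arbitrary choices for this sketch, not the SDK's defaults:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows so a fact that spans a chunk
    boundary still appears whole in at least one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    # Stop once the remaining tail is fully covered by the previous chunk.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Consecutive chunks share `overlap` characters, so an entity split across a boundary is still extractable from one of them.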

> 💡 Every answer is traceable to its source chunks via `MENTIONS` edges. Pass `return_context=True` to `completion()` to get the retrieval trail alongside the answer.
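The final retrieval stage, cosine reranking, reorders candidate chunks by similarity to the query embedding. A minimal sketch of the math (the toy vectors below stand in for real embeddings; the SDK's actual reranker interface may differ):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rerank(query_vec: list[float], candidates: list[tuple[str, list[float]]]):
    """Reorder (chunk, vector) candidates by cosine similarity to the query."""
    return sorted(candidates, key=lambda c: cosine(query_vec, c[1]), reverse=True)
```

For example, `rerank([1.0, 0.0], [("a", [0.0, 1.0]), ("b", [0.9, 0.1])])` puts `"b"` first.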

Quick Start

1. Install and start FalkorDB

```bash
pip install graphrag-sdk[litellm]
docker run -d -p 6379:6379 -p 3000:3000 --name falkordb falkordb/falkordb:latest
export OPENAI_API_KEY="sk-..."
```

For PDF ingestion, install the `pdf` extra instead: `pip install graphrag-sdk[litellm,pdf]`.

2. Ingest a document

```python
import asyncio
from graphrag_sdk import GraphRAG, ConnectionConfig, LiteLLM, LiteLLMEmbedder

async def main():
    async with GraphRAG(
        connection=ConnectionConfig(host="localhost", graph_name="my_graph"),
        llm=LiteLLM(model="openai/gpt-5.4"),
        embedder=LiteLLMEmbedder(model="openai/text-embedding-3-large", dimensions=1536),
    ) as rag:
        # Ingest raw text (pass a file path with the `pdf` extra installed for PDFs)
        result = await rag.ingest(
            "my_doc",
            text="Alice Johnson is a software engineer at Acme Corp in London.",
        )
        print(f"Nodes: {result.nodes_created}, Edges: {result.relationships_created}")

        # Finalize: deduplicate entities, backfill embeddings, create indexes
        await rag.finalize()

        # Full RAG: retrieve + generate
        answer = await rag.completion("Where does Alice work?")
        print(answer.answer)

asyncio.run(main())
```

3. Define a schema (optional)

```python
from graphrag_sdk import GraphSchema, EntityType, RelationType, SchemaPattern

schema = GraphSchema(
    entities=[
        EntityType(label="Person", description="A human being"),
        EntityType(label="Organization", description="A company or institution"),
        EntityType(label="Location", description="A geographic location"),
    ],
    relations=[
        RelationType(label="WORKS_AT", description="Is employed by"),
        RelationType(label="LOCATED_IN", description="Is situated in"),
    ],
    patterns=[
        SchemaPattern(source="Person", relationship="WORKS_AT", target="Organization"),
        SchemaPattern(source="Organization", relationship="LOCATED_IN", target="Location"),
    ],
)

async with GraphRAG(
    connection=ConnectionConfig(host="localhost", graph_name="my_graph"),
    llm=LiteLLM(model="openai/gpt-5.4"),
    embedder=LiteLLMEmbedder(model="openai/text-embedding-3-large", dimensions=1536),
    schema=schema,
) as rag:
    ...  # ingest / completion as above
```

Examples

| # | Example | What it demonstrates |
|---|---------|----------------------|
| 1 | Quick Start | Minimal ingest + query |
| 2 | PDF with Schema | PDF ingestion with custom entity types |
| 3 | Custom Strategies | Benchmark-winning pipeline configuration |
| 4 | Custom Provider | Implement your own LLM/Embedder |
| 5 | Notebook Demo | Interactive walkthrough with provenance inspection |
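Example 4 shows how to plug in your own LLM or embedder. As a rough sketch of what an embedder must do — take a batch of texts, return fixed-size vectors — here is a toy, deterministic stand-in. The class name and `embed` signature are assumptions for illustration, not the SDK's actual abstract base class; see the Strategies and Providers guides for the real interface:

```python
class HashEmbedder:
    """Toy deterministic embedder: hashes characters into buckets.
    Illustrates only the shape of a provider (batch of texts in,
    fixed-size unit vectors out); it carries no semantic meaning."""

    def __init__(self, dimensions: int = 8):
        self.dimensions = dimensions

    def embed(self, texts: list[str]) -> list[list[float]]:
        vecs = []
        for text in texts:
            vec = [0.0] * self.dimensions
            for ch in text:
                vec[ord(ch) % self.dimensions] += 1.0
            # L2-normalize so downstream cosine scores are comparable.
            norm = sum(v * v for v in vec) ** 0.5 or 1.0
            vecs.append([v / norm for v in vec])
        return vecs
```

A real provider would call an embedding model here; the key contract is that every text maps to a vector of the configured dimensionality.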

Documentation

| Guide | Description |
|-------|-------------|
| Getting Started | Step-by-step tutorial from install to first query |
| Architecture | Pipeline design, graph schema, retrieval strategy |
| Configuration | Connection, providers, and tuning reference |
| Strategies | All ABCs and built-in implementations |
| Providers | LLM and embedder configuration guide |
| Benchmark | Methodology, results, and reproduction instructions |
| API Reference | Full API documentation |

Contributing

We welcome contributions! See CONTRIBUTING.md for development setup, testing, and code style guidelines.

Please read our Code of Conduct before participating.

Citation

If you use GraphRAG SDK in your research, please cite:

```bibtex
@software{graphrag_sdk,
  title  = {GraphRAG SDK: A Modular Graph RAG Framework},
  author = {FalkorDB},
  year   = {2026},
  url    = {https://github.com/FalkorDB/GraphRAG-SDK},
}
```

License

Apache License 2.0
