> 🧪 **v1.0.0rc1** is available as a pre-release. Install with `pip install graphrag-sdk --pre` or pin `==1.0.0rc1`. This is a breaking rewrite from v0.x. Stable users: `pip install graphrag-sdk` still gives you v0.8.2 by default.

GraphRAG-SDK

The simplest, most accurate GraphRAG framework built on FalkorDB

Python 3.10+ · Apache 2.0 licensed

*(Figure: knowledge graph construction)*

Most GraphRAG systems work in demos and break under production constraints. GraphRAG SDK was built from real deployments around a simple idea: the retrieval harness matters more than the model. The result is a modular, benchmark-leading framework with predictable cost and sensible defaults that gets you from raw documents to cited answers quickly.


Benchmarks

| Rank | System | Fact retrieval | Complex | Contextual | Creative | Overall |
|------|--------|---------------:|--------:|-----------:|---------:|--------:|
| 1 | **FalkorDB GraphRAG SDK** | 65.22 | 58.63 | 69.54 | 57.08 | **63.73** |
| 2 | AutoPrunedRetriever | 45.99 | 62.80 | 83.10 | 62.97 | 63.72 |
| 3 | G-Reasoner | 60.07 | 53.92 | 71.28 | 50.48 | 58.94 |
| 4 | HippoRAG2 | 60.14 | 53.38 | 64.10 | 48.28 | 56.48 |
| 5 | Fast-GraphRAG | 56.95 | 48.55 | 56.41 | 46.18 | 52.02 |
| 6 | MS-GraphRAG (local) | 49.29 | 50.93 | 64.40 | 39.10 | 50.93 |
| 7 | RAG (w/ rerank) | 60.92 | 42.93 | 51.30 | 38.26 | 48.35 |
| 8 | LightRAG | 58.62 | 49.07 | 48.85 | 23.80 | 45.09 |
| 9 | HippoRAG | 52.93 | 38.52 | 48.70 | 38.85 | 44.75 |

FalkorDB scored with gpt-4o-mini (Azure OpenAI) on the GraphRAG-Bench Novel dataset — 20 novels, 2,010 questions, automated evaluation (ROUGE-L + answer-correctness with gpt-4o-mini). Competitor numbers are sourced from the GraphRAG-Bench published leaderboard. See docs/benchmark.md for full methodology and reproduction instructions.


*(Figure: document-to-provenance answer flow)*

Ingestion & Retrieval Pipeline

| Area | Item | Execution | Description |
|------|------|-----------|-------------|
| Ingestion | 1. Load | Sequential | Read raw text from files (PDF, TXT) or strings. |
| Ingestion | 2. Chunk | Sequential | Split content into overlapping text chunks. |
| Ingestion | 3. Lexical Graph | Sequential | Create Document and Chunk nodes with provenance edges. |
| Ingestion | 4. Extract | Sequential | Run GLiNER2 local NER and LLM-based relationship extraction. |
| Ingestion | 5. Quality Filter | Sequential | Remove invalid extracted nodes (empty IDs, malformed shape). |
| Ingestion | 6. Prune | Sequential | Filter nodes/relations against the schema; drop orphan relations. |
| Ingestion | 7. Resolve | Sequential | Deduplicate entities (exact match, semantic, LLM-verified). |
| Ingestion | 8. Write | Sequential | Persist graph updates with batched MERGE operations in FalkorDB. |
| Ingestion | 9a. Mentions | Parallel | Link entities back to source chunks. |
| Ingestion | 9b. Index | Parallel | Embed and index chunks for retrieval. |
| Retrieval | Vector search | Runtime | Finds semantically similar chunks. |
| Retrieval | Full-text search | Runtime | Matches exact terms and keywords. |
| Retrieval | Cypher queries | Runtime | Executes structured graph lookups. |
| Retrieval | Relationship expansion | Runtime | Traverses connected entities and context. |
| Retrieval | Cosine reranking | Runtime | Reorders candidates by relevance. |
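The overlap logic of step 2 can be sketched in plain Python. This is a character-based illustration only; the `chunk_size` and `overlap` values below are arbitrary choices for this sketch, not the SDK's defaults:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows so a fact that spans a chunk
    boundary still appears whole in at least one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    # Stop once the remaining tail is fully covered by the previous chunk.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Consecutive chunks share `overlap` characters, so an entity split across a boundary is still extractable from one of them.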

> 💡 Every answer is traceable to its source chunks via `MENTIONS` edges. Pass `return_context=True` to `completion()` to get the retrieval trail alongside the answer.
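The final retrieval stage, cosine reranking, reorders candidate chunks by similarity to the query embedding. A minimal sketch of the math (the toy vectors below stand in for real embeddings; the SDK's actual reranker interface may differ):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rerank(query_vec: list[float], candidates: list[tuple[str, list[float]]]):
    """Reorder (chunk, vector) candidates by cosine similarity to the query."""
    return sorted(candidates, key=lambda c: cosine(query_vec, c[1]), reverse=True)
```

For example, `rerank([1.0, 0.0], [("a", [0.0, 1.0]), ("b", [0.9, 0.1])])` puts `"b"` first.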

Quick Start

1. Install and start FalkorDB

```bash
pip install graphrag-sdk[litellm]
docker run -d -p 6379:6379 -p 3000:3000 --name falkordb falkordb/falkordb:latest
export OPENAI_API_KEY="sk-..."
```

For PDF ingestion, install the `pdf` extra instead: `pip install graphrag-sdk[litellm,pdf]`.

2. Ingest a document

```python
import asyncio
from graphrag_sdk import GraphRAG, ConnectionConfig, LiteLLM, LiteLLMEmbedder

async def main():
    async with GraphRAG(
        connection=ConnectionConfig(host="localhost", graph_name="my_graph"),
        llm=LiteLLM(model="openai/gpt-5.4"),
        embedder=LiteLLMEmbedder(model="openai/text-embedding-3-large", dimensions=1536),
    ) as rag:
        # Ingest raw text (pass a file path with the `pdf` extra installed for PDFs)
        result = await rag.ingest(
            "my_doc",
            text="Alice Johnson is a software engineer at Acme Corp in London.",
        )
        print(f"Nodes: {result.nodes_created}, Edges: {result.relationships_created}")

        # Finalize: deduplicate entities, backfill embeddings, create indexes
        await rag.finalize()

        # Full RAG: retrieve + generate
        answer = await rag.completion("Where does Alice work?")
        print(answer.answer)

asyncio.run(main())
```

3. Define a schema (optional)

```python
from graphrag_sdk import GraphSchema, EntityType, RelationType, SchemaPattern

schema = GraphSchema(
    entities=[
        EntityType(label="Person", description="A human being"),
        EntityType(label="Organization", description="A company or institution"),
        EntityType(label="Location", description="A geographic location"),
    ],
    relations=[
        RelationType(label="WORKS_AT", description="Is employed by"),
        RelationType(label="LOCATED_IN", description="Is situated in"),
    ],
    patterns=[
        SchemaPattern(source="Person", relationship="WORKS_AT", target="Organization"),
        SchemaPattern(source="Organization", relationship="LOCATED_IN", target="Location"),
    ],
)

async with GraphRAG(
    connection=ConnectionConfig(host="localhost", graph_name="my_graph"),
    llm=LiteLLM(model="openai/gpt-5.4"),
    embedder=LiteLLMEmbedder(model="openai/text-embedding-3-large", dimensions=1536),
    schema=schema,
) as rag:
    ...  # ingest / completion as above
```

Examples

| # | Example | What it demonstrates |
|---|---------|----------------------|
| 1 | Quick Start | Minimal ingest + query |
| 2 | PDF with Schema | PDF ingestion with custom entity types |
| 3 | Custom Strategies | Benchmark-winning pipeline configuration |
| 4 | Custom Provider | Implement your own LLM/Embedder |
| 5 | Notebook Demo | Interactive walkthrough with provenance inspection |
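Example 4 shows how to plug in your own LLM or embedder. As a rough sketch of what an embedder must do — take a batch of texts, return fixed-size vectors — here is a toy, deterministic stand-in. The class name and `embed` signature are assumptions for illustration, not the SDK's actual abstract base class; see the Strategies and Providers guides for the real interface:

```python
class HashEmbedder:
    """Toy deterministic embedder: hashes characters into buckets.
    Illustrates only the shape of a provider (batch of texts in,
    fixed-size unit vectors out); it carries no semantic meaning."""

    def __init__(self, dimensions: int = 8):
        self.dimensions = dimensions

    def embed(self, texts: list[str]) -> list[list[float]]:
        vecs = []
        for text in texts:
            vec = [0.0] * self.dimensions
            for ch in text:
                vec[ord(ch) % self.dimensions] += 1.0
            # L2-normalize so downstream cosine scores are comparable.
            norm = sum(v * v for v in vec) ** 0.5 or 1.0
            vecs.append([v / norm for v in vec])
        return vecs
```

A real provider would call an embedding model here; the key contract is that every text maps to a vector of the configured dimensionality.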

Documentation

| Guide | Description |
|-------|-------------|
| Getting Started | Step-by-step tutorial from install to first query |
| Architecture | Pipeline design, graph schema, retrieval strategy |
| Configuration | Connection, providers, and tuning reference |
| Strategies | All ABCs and built-in implementations |
| Providers | LLM and embedder configuration guide |
| Benchmark | Methodology, results, and reproduction instructions |
| API Reference | Full API documentation |

Contributing

We welcome contributions! See CONTRIBUTING.md for development setup, testing, and code style guidelines.

Please read our Code of Conduct before participating.

Citation

If you use GraphRAG SDK in your research, please cite:

```bibtex
@software{graphrag_sdk,
  title  = {GraphRAG SDK: A Modular Graph RAG Framework},
  author = {FalkorDB},
  year   = {2026},
  url    = {https://github.com/FalkorDB/GraphRAG-SDK},
}
```

License

Apache License 2.0
