What do you want to be done?
Implement the GDS (Graph Data Science) extension for NeuG based on the design specification in PR #64 (feat: Add GDS Design Spec).
This is a major feature that brings Graph Data Science capabilities to NeuG, enabling users to run complex graph algorithms on projected subgraphs without copying data.
Background
PR #64 introduced the comprehensive GDS design specification (specs/004-gds/spec.md, 1120+ lines) which covers the full stack from product requirements to implementation details:
- 8 core graph algorithms: PageRank, Connected Components, Louvain, Leiden, Label Propagation, Weakly Connected Components, Shortest Path, BFS
- User-facing Cypher API: INSTALL/LOAD EXTENSION, project_graph, CALL algo
- C++ implementation structures: ProjectedSubgraph, GDSGraph, GDSAlgo physical plan
- Developer extension API and prioritized roadmap
- Platform support: Multi-platform .so distribution (Linux/macOS, x86_64/ARM64)
Implementation Requirements
1. Extension Infrastructure
Goal: Enable NeuG to load and manage external extensions dynamically.
Key Files to Create/Modify:
include/neug/extension/extension_api.h - Extension entry point and callback interfaces
include/neug/extension/extension_registry.h - Extension catalog management
src/extension/extension_registry.cpp - Implementation
src/extension/extension_loader.cpp - dlopen/dlclose wrapper
CMakeLists.txt updates for extension build targets
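The install/load split can be sketched as a small in-memory registry. This is only an illustration under stated assumptions: `FunctionCatalog`, `ExtensionInitFn`, and the `Install`/`Load`/`IsLoaded` methods are hypothetical names, and a callback stands in for the symbol that a real loader would resolve from the `.so` file via `dlopen()`.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical catalog that an extension's Init() registers functions into.
struct FunctionCatalog {
  std::vector<std::string> functions;
  void Register(const std::string& name) { functions.push_back(name); }
};

// Hypothetical entry-point signature; the real one would live in extension_api.h.
using ExtensionInitFn = std::function<void(FunctionCatalog&)>;

class ExtensionRegistry {
 public:
  // INSTALL: remember the entry point (stands in for the downloaded .so file).
  void Install(const std::string& name, ExtensionInitFn init) {
    installed_[name] = std::move(init);
  }

  // LOAD: run Init() once. Returns false if the extension was never installed.
  bool Load(const std::string& name, FunctionCatalog& catalog) {
    auto it = installed_.find(name);
    if (it == installed_.end()) return false;
    if (loaded_[name]) return true;  // loading twice is a no-op
    it->second(catalog);
    loaded_[name] = true;
    return true;
  }

  bool IsLoaded(const std::string& name) const {
    auto it = loaded_.find(name);
    return it != loaded_.end() && it->second;
  }

 private:
  std::map<std::string, ExtensionInitFn> installed_;
  std::map<std::string, bool> loaded_;
};
```

Keeping installation and loading as separate steps mirrors the two Cypher commands, and making a repeated load a no-op avoids registering the same functions twice.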
2. Graph Projection (ProjectedSubgraph)
Goal: Create zero-copy subgraph projections that store labels + predicates, not duplicated data.
Key Files to Create/Modify:
include/neug/gds/projected_subgraph.h - ProjectedSubgraph definition
src/gds/projected_subgraph.cpp - Implementation
include/neug/main/session.h - Add subgraph context storage
src/compiler/ - Add CALL procedure parser support
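A minimal sketch of the projection data structures, assuming predicates are kept as text until the physical plan binds them. The field names and the `SubgraphContext` class with its `Store`/`Lookup` methods are hypothetical; only the struct names `ProjectedSubgraph`, `VertexEntry`, and `EdgeEntry` come from this issue.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// One projected node label plus its filter expression, kept as text.
struct VertexEntry {
  std::string label;      // e.g. "Person"
  std::string predicate;  // e.g. "true"
};

// One projected edge in triplet form <src_label, edge_type, dst_label>.
struct EdgeEntry {
  std::string src_label;
  std::string edge_type;
  std::string dst_label;
  std::string predicate;
};

// Only labels and predicates are stored; no vertex or edge data is copied.
struct ProjectedSubgraph {
  std::unordered_map<std::string, VertexEntry> vertices;
  std::unordered_map<std::string, EdgeEntry> edges;
};

// Hypothetical per-session store, keyed by the projection name.
class SubgraphContext {
 public:
  void Store(const std::string& name, ProjectedSubgraph graph) {
    graphs_[name] = std::move(graph);
  }

  // Returns nullptr when no projection with that name exists.
  const ProjectedSubgraph* Lookup(const std::string& name) const {
    auto it = graphs_.find(name);
    return it == graphs_.end() ? nullptr : &it->second;
  }

 private:
  std::unordered_map<std::string, ProjectedSubgraph> graphs_;
};
```

Because a projection holds only strings, creating one costs the same regardless of how many vertices and edges match it.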
3. GDSGraph Runtime View
Goal: Provide a runtime view over the full graph with predicate filtering applied at scan time.
Key Files to Create/Modify:
include/neug/gds/gds_graph.h - GDSGraph definition
src/gds/gds_graph.cpp - Implementation with CSR integration
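Filtering at scan time could look like the following sketch. It assumes a toy adjacency list in place of the CSR storage and a `std::function` in place of a compiled `Expression`; the `Vertex` struct, `GDSGraphView`, `Contains`, and `Neighbors` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy vertex record: a label id plus one property to filter on.
struct Vertex {
  int label_id;
  int age;
};

using VertexPredicate = std::function<bool(const Vertex&)>;

// A view that borrows the full graph and filters while scanning.
// The referenced vectors must outlive the view; nothing is copied.
class GDSGraphView {
 public:
  GDSGraphView(const std::vector<Vertex>& vertices,
               const std::vector<std::vector<std::size_t>>& adjacency,
               int label_id, VertexPredicate predicate)
      : vertices_(vertices),
        adjacency_(adjacency),
        label_id_(label_id),
        predicate_(std::move(predicate)) {}

  // A vertex is visible if its label matches and the predicate holds.
  bool Contains(std::size_t v) const {
    return vertices_[v].label_id == label_id_ && predicate_(vertices_[v]);
  }

  // Neighbors of v restricted to visible vertices; empty if v is filtered out.
  std::vector<std::size_t> Neighbors(std::size_t v) const {
    std::vector<std::size_t> out;
    if (!Contains(v)) return out;
    for (std::size_t u : adjacency_[v]) {
      if (Contains(u)) out.push_back(u);
    }
    return out;
  }

 private:
  const std::vector<Vertex>& vertices_;
  const std::vector<std::vector<std::size_t>>& adjacency_;
  int label_id_;
  VertexPredicate predicate_;
};
```

The trade-off is that every scan re-evaluates the predicate, which costs CPU time but keeps memory usage independent of the projection size.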
4. Algorithm Implementation
Goal: Implement 8 graph algorithms with proper result streaming.
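Phase 1 - Core Algorithms (MVP): PageRank, Connected Components, Label Propagation, Leiden
Phase 2 - Extended Algorithms: Louvain, Weakly Connected Components (WCC), Shortest Path, BFS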
Key Files to Create/Modify:
include/neug/gds/algorithms/ - Algorithm headers (one per algo)
src/gds/algorithms/ - Algorithm implementations
- Each algorithm should follow the pattern:
class PageRankAlgo : public GDSAlgoBase {
 public:
  // Runs the algorithm over the runtime graph view and returns its result rows.
  Result Execute(const GDSGraph& graph, const AlgoParams& params) override;
};
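To make the pattern concrete, here is a self-contained sketch of PageRank by power iteration. The definitions of `GDSGraph`, `AlgoParams`, `Result`, and `GDSAlgoBase` below are simplified stand-ins invented for this example, not the types from the spec; the parameter names follow the `page_rank` call (`damping`, `tolerance`, `max_iter`).

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Simplified stand-ins for the real GDS types (assumptions for this sketch).
struct GDSGraph {
  std::vector<std::vector<std::size_t>> out_edges;  // out_edges[v] = targets of v
};
struct AlgoParams {
  double damping = 0.85;
  double tolerance = 1e-6;
  int max_iter = 100;
};
struct Result {
  std::vector<double> rank;  // rank[v] for each vertex v
};

class GDSAlgoBase {
 public:
  virtual ~GDSAlgoBase() = default;
  virtual Result Execute(const GDSGraph& graph, const AlgoParams& params) = 0;
};

class PageRankAlgo : public GDSAlgoBase {
 public:
  Result Execute(const GDSGraph& graph, const AlgoParams& params) override {
    const std::size_t n = graph.out_edges.size();
    Result result;
    if (n == 0) return result;
    result.rank.assign(n, 1.0 / n);  // start from the uniform distribution

    for (int iter = 0; iter < params.max_iter; ++iter) {
      // Every vertex receives the teleport share first.
      std::vector<double> next(n, (1.0 - params.damping) / n);
      double dangling = 0.0;  // rank held by vertices without out-edges
      for (std::size_t v = 0; v < n; ++v) {
        const auto& outs = graph.out_edges[v];
        if (outs.empty()) {
          dangling += result.rank[v];
          continue;
        }
        const double share = params.damping * result.rank[v] / outs.size();
        for (std::size_t u : outs) next[u] += share;
      }
      // Spread dangling rank evenly so the ranks keep summing to 1.
      for (double& r : next) r += params.damping * dangling / n;

      // Stop once the L1 change between iterations falls below tolerance.
      double delta = 0.0;
      for (std::size_t v = 0; v < n; ++v) {
        delta += std::fabs(next[v] - result.rank[v]);
      }
      result.rank = std::move(next);
      if (delta < params.tolerance) break;
    }
    return result;
  }
};
```

On a directed 3-cycle every vertex ends with rank 1/3, which makes a convenient sanity check. A streaming implementation would emit (node, rank) tuples instead of returning one vector.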
5. Physical Plan Integration (GDSAlgo)
Goal: Integrate GDS algorithms into the NeuG query execution pipeline.
Key Files to Create/Modify:
include/neug/execution/gds_algo_physical_plan.h
src/execution/gds_algo_physical_plan.cpp
src/compiler/ - Add logical-to-physical plan conversion for GDS calls
6. Cypher Parser Support
Goal: Parse GDS-specific Cypher syntax and integrate with query pipeline.
Key Files to Modify:
src/compiler/parser/ - ANTLR4 grammar updates
src/compiler/binder/ - Function binding for GDS procedures
src/compiler/ - Logical plan generation
Technical Notes
Key Design Decisions (from spec):
- Zero-copy ProjectedSubgraph: Store predicates, not duplicated data. Apply filters at runtime during algorithm execution.
- Triplet Edge Labels: Edge projections use the <src_label, edge_type, dst_label> format for precise filtering.
- Weight Semantics: shortest_path with weight_property: null is equivalent to BFS, but the bfs procedure has an additional max_depth parameter. Document this distinction clearly.
- Platform Support: Multi-platform .so distribution requires platform detection at install time.
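The weight-semantics decision can be illustrated with a small sketch in which both procedures share one traversal. `BfsDepths` and `UnweightedShortestPath` are hypothetical helper names, and a negative `max_depth` is an assumed convention for "no limit".

```cpp
#include <cstddef>
#include <queue>
#include <vector>

using Adjacency = std::vector<std::vector<std::size_t>>;

// Hop count from source to every vertex; -1 means unreachable or beyond max_depth.
// A negative max_depth disables the cutoff (assumed convention for this sketch).
std::vector<int> BfsDepths(const Adjacency& adj, std::size_t source, int max_depth) {
  std::vector<int> depth(adj.size(), -1);
  std::queue<std::size_t> frontier;
  depth[source] = 0;
  frontier.push(source);
  while (!frontier.empty()) {
    const std::size_t v = frontier.front();
    frontier.pop();
    // Do not expand vertices that already sit at the depth limit.
    if (max_depth >= 0 && depth[v] >= max_depth) continue;
    for (std::size_t u : adj[v]) {
      if (depth[u] == -1) {
        depth[u] = depth[v] + 1;
        frontier.push(u);
      }
    }
  }
  return depth;
}

// With no weight property every edge costs 1, so the shortest distance
// equals the BFS depth computed without a depth cutoff.
int UnweightedShortestPath(const Adjacency& adj, std::size_t source, std::size_t target) {
  return BfsDepths(adj, source, -1)[target];
}
```

The two calls differ only in the cutoff: on the path 0 → 1 → 2 → 3, a BFS with max_depth 2 never reaches vertex 3, while the unweighted shortest path from 0 to 3 is 3 hops.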
Architecture Overview (from spec sequence diagram):
User → NeuG: INSTALL EXTENSION 'gds'
NeuG → OSS: Download the gds .neug_extension file
OSS → NeuG: .so file
User → NeuG: LOAD EXTENSION 'gds'
NeuG → ExtReg: dlopen() → call Init()
ExtReg → NeuG: Functions registered in catalog
User → NeuG: CALL project_graph('g', {Person:'true'}, {KNOWS:'true'})
NeuG → SubgraphCtx: Store ProjectedSubgraph (labels + predicates, no data copy)
SubgraphCtx → NeuG: OK
User → NeuG: CALL k_core('g', {min_k:3}) YIELD node, core_number
NeuG → SubgraphCtx: Lookup ProjectedSubgraph by name 'g'
SubgraphCtx → NeuG: VertexEntries + EdgeEntries
NeuG → NeuG: Compile to GDSAlgo physical plan (bind label_ids + Expression predicates)
NeuG → GDSAlgo: Execute with GDSGraph (scan full graph, apply predicates at runtime)
GDSAlgo → NeuG: (node, core_number) tuples
NeuG → User: Result set
Known Issues from Spec Review (PR #64 Greptile comments):
- Algorithm count mismatch in the "AI/GraphRAG 刚需算法" ("must-have algorithms for AI/GraphRAG") section, which claims 3 algorithms but lists 2 - needs fix in spec
- Malformed platform support table in §2.2 - needs fix in spec
- Ambiguous semantics between bfs and shortest_path with no weight - needs clarification in implementation
Acceptance Criteria
Implementation Phases
| Phase | Scope | Priority |
| --- | --- | --- |
| Phase 0 | Extension infrastructure + build system | P0 |
| Phase 1 | ProjectedSubgraph + GDSGraph runtime view | P0 |
| Phase 2 | Phase 1 algorithms (PageRank, Connected Components, Label Propagation, Leiden) | P0 |
| Phase 3 | Physical plan integration + Cypher parser | P0 |
| Phase 4 | Phase 2 algorithms (Louvain, WCC, Shortest Path, BFS) | P1 |
| Phase 5 | Python bindings + benchmarks | P1 |
Implementation Requirements
1. Extension Infrastructure
Goal: Enable NeuG to load and manage external extensions dynamically.
- INSTALL EXTENSION 'gds' Cypher command
- Download the .neug_extension file from OSS or load it from the local filesystem
- LOAD EXTENSION 'gds' Cypher command
- dlopen() the extension .so file
- Call the Init() function to register functions in the catalog
- Extension API headers (include/neug/extension/)
Key Files to Create/Modify:
- include/neug/extension/extension_api.h - Extension entry point and callback interfaces
- include/neug/extension/extension_registry.h - Extension catalog management
- src/extension/extension_registry.cpp - Implementation
- src/extension/extension_loader.cpp - dlopen/dlclose wrapper
- CMakeLists.txt updates for extension build targets
2. Graph Projection (ProjectedSubgraph)
Goal: Create zero-copy subgraph projections that store labels + predicates, not duplicated data.
- struct ProjectedSubgraph containing:
  - std::unordered_map<std::string, VertexEntry> - node labels + filter expressions
  - std::unordered_map<std::string, EdgeEntry> - edge labels + filter expressions (triplet format: <src_label, edge_type, dst_label>)
- struct VertexEntry - stores label filter (e.g., "Person: true" or "User.age > 30")
- struct EdgeEntry - stores triplet edge label + relationship filter
- Store ProjectedSubgraph in session context keyed by name
- CALL project_graph('g', {Person:'true'}, {KNOWS:'true'})
Key Files to Create/Modify:
- include/neug/gds/projected_subgraph.h - ProjectedSubgraph definition
- src/gds/projected_subgraph.cpp - Implementation
- include/neug/main/session.h - Add subgraph context storage
- src/compiler/ - Add CALL procedure parser support
3. GDSGraph Runtime View
Goal: Provide a runtime view over the full graph with predicate filtering applied at scan time.
- struct GDSGraph wrapping:
  - ProjectedSubgraph for label/predicate info
  - Expression objects
Key Files to Create/Modify:
- include/neug/gds/gds_graph.h - GDSGraph definition
- src/gds/gds_graph.cpp - Implementation with CSR integration
4. Algorithm Implementation
Goal: Implement 8 graph algorithms with proper result streaming.
Phase 1 - Core Algorithms (MVP):
- PageRank: CALL page_rank('g', {damping: 0.85, tolerance: 1e-6, max_iter: 100}) YIELD node, rank
- Connected Components: CALL connected_components('g') YIELD node, component
- Label Propagation: CALL label_propagation('g', {max_iter: 10}) YIELD node, community
- Leiden (AI/GraphRAG use case): CALL leiden('g', {resolution: 1.0, max_iter: 10}) YIELD node, community, quality
Phase 2 - Extended Algorithms:
- Louvain: CALL louvain('g', {resolution: 1.0}) YIELD node, community, modularity
- Weakly Connected Components (WCC)
- Shortest Path: CALL shortest_path('g', {source: 123, target: 456, weight_property: 'distance'}) YIELD node, distance, path
  - weight_property: null should be documented as equivalent to BFS
- BFS: CALL bfs('g', {source: 123, max_depth: 5}) YIELD node, depth
  - Differs from shortest_path due to the max_depth parameter
Key Files to Create/Modify:
- include/neug/gds/algorithms/ - Algorithm headers (one per algo)
- src/gds/algorithms/ - Algorithm implementations
5. Physical Plan Integration (GDSAlgo)
Goal: Integrate GDS algorithms into the NeuG query execution pipeline.
- GDSAlgoPhysicalPlan extending the base physical plan
- Bind label_ids from ProjectedSubgraph to actual storage IDs
- Convert Expression predicates from string to executable form
- YIELD clause for column projection
- ResultTable output
Key Files to Create/Modify:
- include/neug/execution/gds_algo_physical_plan.h
- src/execution/gds_algo_physical_plan.cpp
- src/compiler/ - Add logical-to-physical plan conversion for GDS calls
6. Cypher Parser Support
Goal: Parse GDS-specific Cypher syntax and integrate with query pipeline.
- CALL procedure_name(params) YIELD columns
- {key1: value1, key2: {nested: value}} (nested map parameters)
- Register page_rank, connected_components, etc. in the function catalog
Key Files to Modify:
- src/compiler/parser/ - ANTLR4 grammar updates
- src/compiler/binder/ - Function binding for GDS procedures
- src/compiler/ - Logical plan generation
Acceptance Criteria
- project_graph creates zero-copy projections (verify no data duplication in memory)
- CALL algo.* syntax fully integrated with Cypher parser
- Database.call_procedure()
Related
- specs/004-gds/spec.md (1120+ lines)