VGAE_KnowledgeGraph_Expansion is a Python-based repository that leverages NebulaGraph and PyTorch Geometric to identify knowledge gaps in a knowledge graph, suggest new node positions in the latent embedding space, and generate structured content for the proposed nodes using OpenAI's language models.
This repository enhances knowledge graphs by identifying underrepresented regions, generating new nodes for them, and integrating those nodes into the graph structure. The core workflow trains a Variational Graph Autoencoder (VGAE) to embed the graph, then runs cluster-based analysis on the embeddings to drive graph expansion.
- Knowledge Gap Identification:
  - Use VGAE latent space to identify sparse regions in the graph.
  - Perform cluster density analysis to suggest new node positions.
- Content Generation:
  - Generate structured content for new nodes using OpenAI's language models.
  - Align generated content with existing graph themes and metadata.
- Graph Expansion:
  - Dynamically insert new nodes and edges into NebulaGraph.
  - Use graph embeddings and cosine similarity to identify relevant neighbors.
- Graph Learning with PyTorch Geometric:
  - Train a VGAE model with a custom encoder using GATConv layers.
  - Encode nodes into a latent embedding space for downstream analysis.
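
The knowledge gap identification feature above can be illustrated with a minimal sketch. It assumes the latent vectors and cluster labels already exist (for example, from the trained VGAE and a clustering step such as scikit-learn's KMeans). The function names and the sparsity score (mean distance to the centroid divided by cluster size) are hypothetical choices for illustration, not the repository's actual implementation.

```python
import math
from collections import defaultdict
from statistics import fmean


def cluster_density(embeddings, labels):
    """Return {label: (centroid, mean distance to centroid, size)}."""
    groups = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        groups[label].append(vec)

    stats = {}
    for label, vecs in groups.items():
        # Centroid: per-dimension mean of all vectors in the cluster.
        centroid = [fmean(dim) for dim in zip(*vecs)]
        # Spread: average Euclidean distance of members to the centroid.
        spread = fmean(math.dist(v, centroid) for v in vecs)
        stats[label] = (centroid, spread, len(vecs))
    return stats


def suggest_position(embeddings, labels):
    """Propose the centroid of the sparsest cluster as a new node position.

    Sparsity is scored as spread / size (an assumed heuristic): wide
    clusters with few members score highest.
    """
    stats = cluster_density(embeddings, labels)
    sparsest = max(stats, key=lambda k: stats[k][1] / stats[k][2])
    return sparsest, stats[sparsest][0]


if __name__ == "__main__":
    latent = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0], [7.0, 7.0]]
    labels = [0, 0, 0, 0, 1, 1]
    print(suggest_position(latent, labels))
```

Placing the candidate at the centroid keeps the new node inside an existing theme while targeting the region with the fewest documents per unit of spread. Other scores, such as nearest-neighbor distance, would fit the same interface.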
- Connects to NebulaGraph to query document nodes and edges.
- Embeds document features using BERT for textual data.
- Prepares Data objects compatible with PyTorch Geometric.
- Implements a VGAE with a custom GATConv-based encoder.
- Trains the model to encode the graph structure and node features.
- Provides methods for cluster density analysis and node position suggestions.
- Uses OpenAI's GPT models to create structured document content.
- Incorporates metadata and neighbor nodes to generate contextually relevant content.
- Executes the pipeline:
  - Prepares graph data.
  - Trains the VGAE model.
  - Analyzes cluster density.
  - Suggests new node positions.
  - Generates and integrates new content.
- Contains data for the nodes and edges of the example input knowledge graph.
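
As an illustration of how the content-generation component described above might combine metadata and neighbor nodes, here is a minimal sketch that only assembles the prompt text. The function name, the `title` and `summary` fields, and the requested output keys are all assumptions made for this example. The actual call to OpenAI's API is deliberately left out.

```python
import json


def build_prompt(metadata, neighbors, max_neighbors=5):
    """Assemble a prompt from target metadata and neighbor summaries.

    `metadata` is a plain dict and `neighbors` is a list of dicts with
    hypothetical 'title' and 'summary' keys.
    """
    lines = [
        "You are extending a knowledge graph with one new document node.",
        "Target metadata: " + json.dumps(metadata, sort_keys=True),
        "Closest existing documents:",
    ]
    # Limit the context to the most relevant neighbors to keep the prompt short.
    for neighbor in neighbors[:max_neighbors]:
        lines.append(f"- {neighbor['title']}: {neighbor['summary']}")
    lines.append("Return a JSON object with the keys: title, summary, keywords.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        build_prompt(
            {"topic": "graph learning"},
            [{"title": "GAT basics", "summary": "Attention over neighbors."}],
        )
    )
```

Asking for a fixed set of JSON keys is one way to get structured output that can be parsed before the node is written to the graph.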
Use the main script to identify knowledge gaps, suggest new nodes, and generate content:
    python main.py

- Query and preprocess document nodes and edges from NebulaGraph.
- Train a VGAE to encode graph structure and identify sparse regions.
- Suggest positions for new nodes using cluster analysis.
- Generate structured content for these nodes.
- Insert the new nodes and edges into NebulaGraph.
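
The last step needs a set of edges for each new node. A minimal sketch of neighbor selection by cosine similarity is shown below. It assumes embeddings are available as plain lists keyed by node ID; the function names and the choice of `k` are hypothetical, and writing the resulting edges to NebulaGraph is not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_neighbors(new_vec, node_embeddings, k=3):
    """Return the k (node_id, similarity) pairs most similar to new_vec."""
    scored = [(node_id, cosine(new_vec, vec)) for node_id, vec in node_embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    existing = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0], "doc_c": [1.0, 1.0]}
    for node_id, score in top_k_neighbors([1.0, 0.1], existing, k=2):
        print(node_id, round(score, 3))
```

Each returned pair can then become one edge from the new node to an existing document, optionally filtered by a minimum similarity threshold.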
- Python 3.8+
- Libraries:
  - torch
  - torch_geometric
  - transformers
  - nebula3-python
  - openai
  - scikit-learn
- Support for multi-modal graphs with additional edge types and node categories.
- Integration with real-time graph updates for dynamic applications.
- Exploration of advanced graph neural networks for richer embeddings.
- Prompt engineering and additional LLM strategies to refine document content generation.
Built by Kaustav Malik for dynamic knowledge graph enrichment and expansion.