A comprehensive Backstage plugin that integrates Ollama LLM with Retrieval-Augmented Generation (RAG) to provide intelligent Q&A capabilities for your Backstage entities.
- 🤖 AI-Powered Q&A: Ask natural language questions about your services and entities
- 📚 RAG Integration: Uses RAG to ground answers in actual Backstage catalog and TechDocs data
- 🔍 Vector Search: Efficient similarity search using embeddings
- 🎯 Entity-Aware: Contextually aware of the current entity being viewed
- 🔧 Configurable: Flexible configuration for models, indexing, and behavior
- 🏗️ Clean Architecture: Built with SOLID principles and modular design
The plugin is structured following Clean Code principles with clear separation of concerns.
The plugin uses the Strategy Pattern to support multiple RAG techniques and algorithms. The RAGService acts as a generic orchestrator that delegates all RAG operations to the active strategy implementation.
Key Components:
- `IRAGStrategy` interface: Contract that all RAG strategies must implement
- `RAGStrategyFactory`: Factory that instantiates the configured strategy
- `SimpleRAGStrategy`: Default implementation (traditional retrieve-then-generate)
- `RAGService`: Generic service that orchestrates strategy execution
Benefits:
- ✅ Extensible: Add new RAG techniques without modifying core service
- ✅ Testable: Each strategy can be tested independently
- ✅ Configurable: Switch strategies via configuration
- ✅ Clean: Follows Open/Closed Principle
Implementing a Custom Strategy:
Create a new file in plugins/ask-ai-backend/src/rag/strategies/:
import { DocumentChunk, IRAGStrategy, RAGAnswer, RAGContext } from '../types';
import { RAGServiceDependencies } from '../../interfaces';
export class MyCustomRAGStrategy implements IRAGStrategy {
readonly name = 'custom';
constructor(private readonly deps: RAGServiceDependencies) {}
async indexAll(): Promise<void> {
// Your indexing logic
}
async indexEntity(entityRef: string): Promise<void> {
// Your entity-specific indexing
}
async retrieve(context: RAGContext): Promise<DocumentChunk[]> {
// Your retrieval logic (e.g., hybrid search, reranking)
}
async answer(context: RAGContext): Promise<RAGAnswer> {
// Your answer generation logic
}
}

Register it in RAGStrategyFactory.ts:
case 'custom':
return new MyCustomRAGStrategy(dependencies);

Configure it in app-config.yaml:
askAi:
ragStrategy: "custom"ask-ai-backend/
├── src/
│ ├── models/ # Domain models and types
│ ├── interfaces/ # Service interfaces (SOLID)
│ ├── rag/ # RAG strategy pattern
│ │ ├── types.ts # RAG interfaces and types
│ │ ├── index.ts # Public exports
│ │ ├── RAGStrategyFactory.ts # Strategy factory
│ │ └── strategies/ # Strategy implementations
│ │ ├── SimpleRAGStrategy.ts # Default RAG strategy
│ │ └── SimpleRAGStrategy.test.ts
│ ├── services/ # Service implementations
│ │ ├── ConfigService.ts
│ │ ├── OllamaLLMService.ts
│ │ ├── InMemoryVectorStore.ts
│ │ ├── PgVectorStore.ts
│ │ ├── VectorStoreFactory.ts
│ │ ├── DocumentProcessor.ts
│ │ ├── CatalogCollector.ts
│ │ ├── TechDocsCollector.ts
│ │ └── RAGService.ts # Generic orchestrator
│ ├── router.ts # Express router
│ └── index.ts
ask-ai/
├── src/
│ ├── api/ # API client
│ ├── hooks/ # React hooks
│ ├── components/ # React components
│ ├── plugin.ts # Plugin definition
│ └── index.ts
Before installing the plugin, ensure you have:
- A running Backstage instance - See Backstage getting started docs
- Ollama server - Install and run Ollama:

# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Or use Docker
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Pull models
ollama pull llama3.2
ollama pull all-minilm # For embeddings
Add the backend plugin to your Backstage backend:
# From your Backstage root directory
cd plugins
# The plugin code should be in plugins/ask-ai-backend

Add the plugin to your packages/backend/package.json:
{
"dependencies": {
"@internal/ask-ai-backend": "link:../../plugins/ask-ai-backend"
}
}

Add the frontend plugin to your Backstage app:
Add to packages/app/package.json:
{
"dependencies": {
"@internal/ask-ai": "link:../../plugins/ask-ai"
}
}

In packages/backend/src/index.ts, register the router:
import { createAskAiRouter } from '@internal/ask-ai-backend';
// In your createBackend function or similar setup
const askAiRouter = await createAskAiRouter({
logger: env.logger,
config: env.config,
discovery: env.discovery,
});
backend.use('/api/ask-ai', askAiRouter);

Add configuration to your app-config.yaml:
askAi:
  # Default LLM model for chat
  defaultModel: "llama3.2"
  # Model for generating embeddings
  embeddingModel: "all-minilm"
  # Ollama server URL
  ollamaBaseUrl: "http://localhost:11434"
  # Enable RAG functionality
  ragEnabled: true
  # RAG strategy to use: 'simple' (default), or custom implementations
  ragStrategy: "simple"
  # Number of similar chunks to retrieve
  defaultTopK: 5
  # Document chunking configuration
  chunkSize: 512
  chunkOverlap: 50
  # Vector store configuration (memory or postgresql)
  vectorStore:
    type: memory  # or 'postgresql' for production

In packages/app/src/components/catalog/EntityPage.tsx, add the Ask AI card:
import { EntityAskAiCard } from '@internal/ask-ai';
// Add to your service entity page
const serviceEntityPage = (
<EntityLayout>
<EntityLayout.Route path="/" title="Overview">
<Grid container spacing={3}>
{/* Other cards */}
<Grid item md={12}>
<EntityAskAiCard />
</Grid>
</Grid>
</EntityLayout.Route>
{/* Or add as a separate tab */}
<EntityLayout.Route path="/ask-ai" title="Ask AI">
<EntityAskAiCard />
</EntityLayout.Route>
</EntityLayout>
);

- Navigate to any service or entity page in your Backstage catalog
- Scroll to the "Ask AI" card
- Type your question in the text field
- Click "Ask AI" or press Enter
- View the AI-generated answer with sources
- "What APIs does this service expose?"
- "Who owns this service?"
- "What other services depend on this one?"
- "What is the purpose of this component?"
- "What technologies does this service use?"
When RAG is enabled (default), the plugin uses the configured RAG strategy to answer questions:
Simple RAG Strategy (default), sketched in code after this list:
- Converts your question to an embedding
- Searches for relevant documentation chunks using vector similarity
- Provides these as context to the LLM
- Generates an answer grounded in actual Backstage data
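A minimal sketch of this retrieve-then-generate flow follows. The interface shapes used here (embed, similaritySearch, generate) are illustrative assumptions, not the plugin's actual ILLMService/IVectorStore signatures:

```typescript
// Illustrative retrieve-then-generate flow; the interfaces below are assumptions,
// not the plugin's real ILLMService / IVectorStore contracts.
interface LlmClient {
  embed(text: string): Promise<number[]>;
  generate(prompt: string): Promise<string>;
}

interface ChunkSearch {
  similaritySearch(
    embedding: number[],
    topK: number,
  ): Promise<Array<{ content: string; source: string }>>;
}

async function answerWithSimpleRag(
  question: string,
  topK: number,
  llm: LlmClient,
  store: ChunkSearch,
): Promise<{ answer: string; sources: string[] }> {
  // 1. Convert the question to an embedding
  const queryEmbedding = await llm.embed(question);

  // 2. Search for the most similar documentation chunks
  const chunks = await store.similaritySearch(queryEmbedding, topK);

  // 3. Provide the chunks as context to the LLM
  const context = chunks.map(c => c.content).join('\n---\n');
  const prompt = `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;

  // 4. Generate an answer grounded in the retrieved context
  const answer = await llm.generate(prompt);
  return { answer, sources: chunks.map(c => c.source) };
}
```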
Future Strategies (extensible via IRAGStrategy):
- Hybrid RAG: Combines semantic search with keyword matching (BM25)
- ReRank RAG: Uses cross-encoders to rerank retrieved chunks
- Multi-Query RAG: Generates multiple query variations for better coverage
- Agentic RAG: LLM decides when to retrieve more context iteratively
- Self-RAG: Includes verification and self-correction steps
- Graph RAG: Uses knowledge graphs for entity relationships
Toggle off "Use RAG" to ask questions directly to the LLM without context retrieval.
Ask a question with optional RAG.
Request:
{
"prompt": "What APIs does this service expose?",
"model": "llama3.2",
"entityId": "component:default/my-service",
"useRAG": true,
"topK": 5
}

Response:
{
"answer": "Based on the documentation...",
"sources": [...],
"model": "llama3.2"
}

Trigger indexing of all documents.
Get indexing status.
Index a specific entity.
Request:
{
"entityRef": "component:default/my-service"
}

Health check endpoint.
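For reference, a client call against the question endpoint might look like the sketch below. The /api/ask-ai prefix matches the backend mounting shown earlier; the /ask sub-path and the response typing are assumptions based on the request/response shapes documented above:

```typescript
// Hypothetical client call; the '/ask' sub-path is an assumption.
interface AskAiResponse {
  answer: string;
  sources: unknown[];
  model: string;
}

export async function askAi(
  backendBaseUrl: string,
  entityRef: string,
  prompt: string,
): Promise<AskAiResponse> {
  const response = await fetch(`${backendBaseUrl}/api/ask-ai/ask`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      prompt,              // natural language question
      model: 'llama3.2',   // chat model to use
      entityId: entityRef, // e.g. 'component:default/my-service'
      useRAG: true,        // set to false to skip context retrieval
      topK: 5,             // number of similar chunks to retrieve
    }),
  });
  if (!response.ok) {
    throw new Error(`Ask AI request failed with status ${response.status}`);
  }
  return (await response.json()) as AskAiResponse;
}
```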
# Backend
cd plugins/ask-ai-backend
yarn test
# Frontend
cd plugins/ask-ai
yarn test

# Backend
cd plugins/ask-ai-backend
yarn build
# Frontend
cd plugins/ask-ai
yarn build

yarn lint

This plugin strictly follows SOLID principles:
- Each service has one clear responsibility
  - `OllamaLLMService`: Only handles LLM operations
  - `PgVectorStore`/`InMemoryVectorStore`: Only handles vector storage
  - `DocumentProcessor`: Only handles document processing
  - `RAGService`: Only orchestrates strategy execution
  - `SimpleRAGStrategy`: Only implements the simple RAG algorithm
- Services are open for extension via interfaces
- Easy to add new RAG strategies without modifying `RAGService`
- Easy to add new vector stores by implementing `IVectorStore`
- Easy to add new LLM providers by implementing `ILLMService`
- Strategy pattern enables adding techniques like:
- Hybrid retrieval (semantic + keyword)
- Re-ranking strategies
- Multi-query generation
- Agentic RAG
- Self-RAG with verification
- All services implement interfaces
- Any `IRAGStrategy` can replace another without breaking `RAGService`
- Any `IVectorStore` implementation works with any strategy
- Services can be swapped with alternative implementations
- Small, focused interfaces
  - `IRAGStrategy` defines only core RAG operations
  - `IVectorStore` defines only vector operations
- Clients depend only on interfaces they use
- No fat interfaces forcing unused methods
- High-level modules depend on abstractions
  - `RAGService` depends on `IRAGStrategy`, not concrete strategies
  - `SimpleRAGStrategy` depends on `ILLMService` and `IVectorStore` interfaces
- All dependencies are injected via constructors
- Enables easy testing with mocks
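As a concrete illustration of these dependency-inversion points, a stripped-down orchestrator might look like the sketch below (the names and method signatures are simplified stand-ins, not the plugin's actual RAGService):

```typescript
// Simplified stand-in for the real RAGService, showing constructor injection
// against an abstraction rather than a concrete strategy.
interface StrategyLike {
  readonly name: string;
  answer(question: string, entityRef?: string): Promise<{ answer: string; sources: string[] }>;
  indexAll(): Promise<void>;
}

class RagOrchestrator {
  // The strategy is injected, so it can be swapped or mocked without touching this class.
  constructor(private readonly strategy: StrategyLike) {}

  ask(question: string, entityRef?: string) {
    return this.strategy.answer(question, entityRef);
  }

  reindex() {
    return this.strategy.indexAll();
  }
}

// In tests, a stub strategy replaces the real implementation.
const stubStrategy: StrategyLike = {
  name: 'stub',
  answer: async () => ({ answer: 'stubbed answer', sources: [] }),
  indexAll: async () => {},
};
const orchestrator = new RagOrchestrator(stubStrategy);
void orchestrator.ask('Who owns this service?');
```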
The plugin supports multiple vector store backends for storing document embeddings. Choose the option that best fits your deployment scenario.
Best for: Local development, testing, proof-of-concept
The default in-memory vector store stores all embeddings in RAM. Simple and fast for development, but:
- ❌ Data is lost on restart
- ❌ Not scalable beyond ~10k vectors
- ❌ No persistence across deployments
Configuration:
askAi:
  vectorStore:
    type: memory

Best for: Production deployments, self-hosted environments
PostgreSQL with the pgvector extension provides persistent, scalable vector storage:
- ✅ Persistent storage (survives restarts)
- ✅ ACID transactions
- ✅ Efficient similarity search with HNSW index (O(log n))
- ✅ Scales to millions of vectors
- ✅ Familiar PostgreSQL operations and tooling
- ✅ Self-hosted with full control
Quick Start:
- Start PostgreSQL with Docker:

  docker-compose up -d postgres

- Configure the plugin:

  askAi:
    vectorStore:
      type: postgresql
      postgresql:
        host: localhost
        port: 5432
        database: backstage_vectors
        user: backstage
        password: ${POSTGRES_PASSWORD}
        maxConnections: 10

- Run migrations: The plugin automatically initializes the schema on first connection.
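Under the hood, a pgvector similarity search reduces to ordering by a distance operator over an indexed vector column. The sketch below uses the node-postgres client; the document_chunks table and embedding column are hypothetical, not the plugin's actual schema:

```typescript
import { Pool } from 'pg';

// Hypothetical schema: document_chunks(id text, content text, embedding vector(384)).
// pgvector's `<=>` operator computes cosine distance and can be served by an HNSW index.
export async function findSimilarChunks(pool: Pool, queryEmbedding: number[], topK: number) {
  const vectorLiteral = `[${queryEmbedding.join(',')}]`; // pgvector accepts '[x,y,z]' text input
  const { rows } = await pool.query(
    `SELECT id, content, embedding <=> $1 AS distance
       FROM document_chunks
      ORDER BY embedding <=> $1
      LIMIT $2`,
    [vectorLiteral, topK],
  );
  return rows;
}
```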
| Feature | In-Memory | PostgreSQL + pgvector |
|---|---|---|
| Persistence | ❌ None | ✅ Full |
| Scalability | ~10k vectors | Millions |
| Search Speed | O(n) | O(log n) with HNSW |
| Setup Complexity | None | Medium |
| Production Ready | ❌ No | ✅ Yes |
| Cost | Free | Database hosting |
The plugin's interface-based design makes it easy to add other vector stores:
Pinecone (Managed Cloud):
export class PineconeVectorStore implements IVectorStore {
// Implementation using Pinecone SDK
}

Weaviate (Open-Source):
export class WeaviateVectorStore implements IVectorStore {
// Implementation using Weaviate client
}Qdrant, Milvus, Chroma, etc. can all be added by implementing the IVectorStore interface.
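The actual IVectorStore contract lives in the backend's interfaces module and is not reproduced here; as a rough orientation, a minimal vector-store contract for this kind of plugin typically covers the operations sketched below (illustrative shape only):

```typescript
// Illustrative shape only; see plugins/ask-ai-backend/src/interfaces for the real IVectorStore.
interface StoredChunk {
  id: string;
  content: string;
  embedding: number[];
  metadata: Record<string, unknown>;
}

interface MinimalVectorStore {
  // Insert or update chunks together with their embeddings.
  upsert(chunks: StoredChunk[]): Promise<void>;
  // Return the topK chunks most similar to the query embedding.
  similaritySearch(queryEmbedding: number[], topK: number): Promise<StoredChunk[]>;
  // Remove chunks, e.g. when an entity is re-indexed or deleted.
  deleteByIds(ids: string[]): Promise<void>;
}
```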
- Initial indexing runs 10 seconds after startup
- Re-index periodically or on catalog updates
- Consider incremental indexing for large catalogs
- Batch embed requests for efficiency (see the sketch after this list)
- Cache embeddings when possible
- Use appropriate chunk sizes for your use case
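A small sketch of the batching and caching ideas above; the embedBatch callback is a hypothetical stand-in for whatever embedding call your setup exposes:

```typescript
// Hypothetical batching + caching wrapper around an embedding call.
type EmbedBatchFn = (texts: string[]) => Promise<number[][]>;

export function createCachedBatchEmbedder(embedBatch: EmbedBatchFn, batchSize = 32) {
  const cache = new Map<string, number[]>();

  return async function embedAll(texts: string[]): Promise<number[][]> {
    // Only embed texts that are not already cached.
    const missing = [...new Set(texts.filter(t => !cache.has(t)))];

    // Send the missing texts in fixed-size batches instead of one request per chunk.
    for (let i = 0; i < missing.length; i += batchSize) {
      const batch = missing.slice(i, i + batchSize);
      const embeddings = await embedBatch(batch);
      batch.forEach((text, j) => cache.set(text, embeddings[j]));
    }

    return texts.map(t => cache.get(t)!);
  };
}
```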
# Check if Ollama is running
curl http://localhost:11434/api/tags
# Check logs
docker logs ollama # if using Docker

- Ensure indexing has completed: `GET /api/ask-ai/index/status`
- Trigger manual indexing: `POST /api/ask-ai/index`
- Check that entities have descriptions or TechDocs

- Increase `topK` to retrieve more context
- Adjust `chunkSize` and `chunkOverlap`
- Try different models (llama3.2, mistral, etc.)
Contributions are welcome! Please ensure:
- Code follows SOLID principles
- Tests are included
- Documentation is updated
- Linting passes
This project is licensed under the GNU General Public License v3.0 (GPL-3.0) for personal and non-commercial use only.
For personal, educational, and non-commercial purposes, this software is freely available under the GPL-3.0 license:
✅ You Can:
- Use this plugin for personal projects and learning
- Modify and adapt the code for non-commercial purposes
- Contribute improvements back to the project
- Disclose source and include license notices
- Share modifications under the same GPL-3.0 license
- Clearly state any significant changes made
❌ You Cannot:
- Sublicense under different terms
- Hold authors liable for damages
Commercial use of this software requires a separate commercial license.
Commercial use includes, but is not limited to:
- Integration into commercial products or services
- Use within organizations generating revenue
- Deployment in enterprise or production environments for business purposes
- Distribution as part of commercial offerings
For commercial licensing inquiries, please contact inbox.
We offer flexible commercial licensing options tailored to your organization's needs, including support and maintenance agreements.
The GPL-3.0 license terms for non-commercial use can be found in the LICENSE file.
Copyright (C) 2025-2026 flickleafy
This program is free software for personal use: you can redistribute it
and/or modify it under the terms of the GNU General Public License as
published by the Free Software Foundation, either version 3 of the License,
or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
Commercial use requires a separate commercial license. Please contact
the copyright holder for commercial licensing terms.
For GPL-3.0 license details: https://www.gnu.org/licenses/gpl-3.0.html