This guide is written for architects, data engineers, and full-stack developers who want to understand how MongoDB Atlas powers an agentic AI application using a loan-portfolio domain.
A true Customer 360 view is hard to achieve when data is fragmented across isolated systems (origination, servicing, risk, communications), because the information cannot easily be unified or searched semantically. Traditional datastores also separate OLTP and vector workloads, so data has to be synchronised between systems before it can be explored or enriched. MongoDB Atlas provides a single, flexible document store with built-in vector search and hybrid search, so teams can deliver AI-powered insights across the entire customer lifecycle.
| Folder/File | Purpose |
|---|---|
| src/app/lib/types.ts | Single source of truth for TypeScript loan schema |
| src/app/lib/seed-data.ts | ETL + enrichment logic: generates realistic loans, analyst notes, embeddings |
| src/app/lib/agent.ts | LangGraph workflow + 7 domain tools backed by MongoDB queries |
| package.json | Dependencies: @langchain/*, mongodb 6.x, next 15, react 19, etc. |
The demo simulates five canonical sources:
- Loan Origination System (LOS) – applicant KYC, requested amount, purpose
- Loan Management System (LMS) – repayment schedule & live status
- Customer Communication Logs – call center notes, emails, WhatsApp interactions
- Payment History – monthly on-time / delayed / default flags
- Risk Engines – debt-to-income, credit bureau scores, internal risk factors
In production these arrive as JSON events (Kafka, webhooks, SFTP dumps). For the demo, we synthesise them with an LLM so that the records read realistically.
File excerpt ↓
```typescript
export type LoanApplication = {
  application_id: string;
  applicant_details: {
    name: string;
    age: number;
    occupation: string;
    monthly_income: number;
    existing_loans: number;
    credit_score: number;
    sector: string;
    years_of_experience: number;
  };
  loan_details: {
    amount: number;
    tenure: number;
    interest_rate: number;
    purpose: string;
    category: 'personal' | 'business' | 'home_improvement';
    emi_amount: number;
  };
  risk_metrics: {
    debt_to_income: number;
    payment_to_income: number;
    existing_emi_ratio: number;
    risk_score: number;
  };
  application_status: 'approved' | 'rejected' | 'pending';
  performance_data: {
    payment_history: Array<{
      month: number;
      status: 'on_time' | 'delayed' | 'default';
      payment_date: Date;
      amount_paid: number;
    }>;
    current_status: 'active' | 'closed' | 'defaulted';
  };
  metadata: {
    submission_date: Date;
    processing_time: number;
    analyst_notes: string;
    last_updated: Date;
    vector_embedding?: number[]; // ✨ Atlas Vector Search field
  };
};
```

Why MongoDB?
- Document model fits nested sub-objects (`applicant_details`, `payment_history`) with no joins.
- Fields can evolve independently (flexible schema) – e.g. adding `mitigating_factors[]` later.
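To make the flexible-schema point concrete, here is a minimal sketch. It assumes `mitigating_factors` is added as an optional array under `risk_metrics`; the source only names the field, so its placement and the helper `mitigatingFactorsOf` are hypothetical.

```typescript
// Trimmed-down version of the risk block from LoanApplication.
type RiskMetrics = {
  debt_to_income: number;
  risk_score: number;
  // Hypothetical later addition: optional, so existing documents stay valid.
  mitigating_factors?: string[];
};

// Reads the new field defensively; older documents simply yield an empty list.
function mitigatingFactorsOf(risk: RiskMetrics): string[] {
  return risk.mitigating_factors ?? [];
}

// A document written before the field existed, and one written after.
const legacyDoc: RiskMetrics = { debt_to_income: 0.32, risk_score: 4.7 };
const newerDoc: RiskMetrics = {
  debt_to_income: 0.41,
  risk_score: 6.1,
  mitigating_factors: ['co-applicant with stable income'],
};
```

Because the field is optional, both documents can live in the same collection and no backfill is needed before the new code ships.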
```mermaid
sequenceDiagram
  autonumber
  participant Source System
  participant LLM as LLM
  participant Emb as Embeddings API
  participant DB as MongoDB Atlas
  Source System->>DB: JSON loan doc
  DB->>LLM: prompt(generateAnalystNotes)
  LLM-->>DB: analyst_notes
  DB->>Emb: embedQuery(analyst_notes)
  Emb-->>DB: 1536-dim vector
```
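The flow in the diagram can be sketched as a small function. This is an illustration under stated assumptions, not the code in `seed-data.ts`: it assumes the seeding script orchestrates the calls, and the names `enrichAndStore`, `EnrichmentDeps`, `generateNotes`, `embed`, and `save` are hypothetical. The dependencies are injected so the flow can be exercised without network access.

```typescript
// Minimal shapes for the sketch; the real LoanApplication type has more fields.
type LoanDoc = {
  application_id: string;
  metadata: { analyst_notes?: string; vector_embedding?: number[] };
};

// Hypothetical injected dependencies (LLM, embeddings API, database write).
type EnrichmentDeps = {
  generateNotes: (loan: LoanDoc) => Promise<string>;
  embed: (text: string) => Promise<number[]>;
  save: (loan: LoanDoc) => Promise<void>;
};

// Runs the steps in order: notes -> embedding -> persist a single document.
async function enrichAndStore(loan: LoanDoc, deps: EnrichmentDeps): Promise<LoanDoc> {
  const analyst_notes = await deps.generateNotes(loan);
  const vector_embedding = await deps.embed(analyst_notes);
  const enriched: LoanDoc = {
    ...loan,
    metadata: { ...loan.metadata, analyst_notes, vector_embedding },
  };
  await deps.save(enriched);
  return enriched;
}
```

In a real run, `save` would wrap an insert into the loan collection, so the notes and their embedding land in the same document.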
seed-data.ts › generateAnalystNotes() crafts a role-tailored prompt:
As a loan analyst, write detailed notes for the following loan application:
Applicant: {{name}}
Occupation: {{occupation}} at {{company}}
Loan Amount: ₹{{amount}}
Purpose: {{detailed_purpose}}
Risk Score: {{risk_score}}
Include:
1. Key strengths and concerns
2. Analysis of repayment capacity
3. Evaluation of purpose and amount justification
4. Overall recommendation
- Uses the Indian Rupee symbol and domain-specific language.
- Temperature = 0.8 → varied narratives; factual grounding comes from the loan fields interpolated into the prompt.
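The source does not show how `generateAnalystNotes()` fills the `{{…}}` placeholders, so the following is only a sketch of one way to do it. `renderPrompt` is a hypothetical helper, and the template is abbreviated to a few of the fields listed above.

```typescript
// Replaces {{key}} placeholders with values; unknown keys are left untouched
// so a missing field is easy to spot in the rendered prompt.
function renderPrompt(template: string, values: Record<string, string | number>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) =>
    key in values ? String(values[key]) : match,
  );
}

// Abbreviated version of the analyst-notes template.
const template = [
  'As a loan analyst, write detailed notes for the following loan application:',
  'Applicant: {{name}}',
  'Loan Amount: ₹{{amount}}',
  'Risk Score: {{risk_score}}',
].join('\n');

const prompt = renderPrompt(template, { name: 'Ananya Iyer', amount: 750000, risk_score: 4.7 });
```

Leaving unknown placeholders in place, rather than substituting an empty string, makes it obvious when a structured field was not supplied to the LLM.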
The resulting paragraph (≈150–200 words) is persisted under metadata.analyst_notes in the same document. This co-location enables hybrid search (vector + metadata filters) without joins.
- Human bankers can review LLM-generated rationale.
- Vector embedding (§4) turns unstructured prose into semantic search keys for the agent.
```typescript
import { OpenAIEmbeddings } from '@langchain/openai'

const embeddings = new OpenAIEmbeddings({ modelName: 'text-embedding-3-small' })
const embedding = await embeddings.embedQuery(analystNotes) // number[] of length 1536
loanData.metadata.vector_embedding = embedding
```

- You can swap `OpenAIEmbeddings` with VoyageAI for domain-tuned representations. The vector index's `numDimensions` must match the output dimension of whichever model you choose.

```typescript
import { VoyageEmbeddings } from '@langchain/community/embeddings/voyage'

const embeddings = new VoyageEmbeddings({ modelName: 'voyage-2', apiKey: process.env.VOYAGE_API })
```

Voyage AI also publishes domain-specific models, including finance-oriented ones.
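Since the vector index in this guide is declared with `numDimensions: 1536`, swapping embedding models can silently produce vectors the index cannot use. A small guard before writing helps catch that early. This is a sketch; `assertIndexable` and `INDEX_DIMENSIONS` are hypothetical names, not part of the demo code.

```typescript
// Dimension the Atlas vector index in this guide is declared with.
const INDEX_DIMENSIONS = 1536;

// Throws if an embedding does not match the declared dimension
// or contains values that are not finite numbers; otherwise returns it.
function assertIndexable(embedding: number[], expected: number = INDEX_DIMENSIONS): number[] {
  if (embedding.length !== expected) {
    throw new Error(`Expected ${expected} dimensions, got ${embedding.length}`);
  }
  if (!embedding.every((v) => Number.isFinite(v))) {
    throw new Error('Embedding contains non-finite values');
  }
  return embedding;
}
```

A call such as `loanData.metadata.vector_embedding = assertIndexable(embedding)` would then fail fast at seed time instead of leaving documents that never match a query.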
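```sh
# Run against your Atlas cluster (pass the connection string to mongosh as needed).
mongosh <<'EOF'
use loan_portfolio
db.loan_applications_demo.createSearchIndex(
  'vector_index',
  'vectorSearch',
  {
    fields: [
      { type: 'vector', path: 'metadata.vector_embedding', numDimensions: 1536, similarity: 'cosine' },
      // Fields used in $vectorSearch filters must be indexed as type 'filter'.
      { type: 'filter', path: 'loan_details.category' },
      { type: 'filter', path: 'risk_metrics.risk_score' }
    ]
  }
)
EOF
```

- Similarity options: `cosine` | `euclidean` | `dotProduct`
- Vector quantization can be enabled for memory-efficient HNSW graphs. See Atlas Vector Quantization.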
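```typescript
const vectorResults = await collection.aggregate([
  {
    $vectorSearch: {
      index: 'vector_index',
      queryVector: embedding,
      path: 'metadata.vector_embedding',
      numCandidates: 100, // nearest-neighbour candidates examined
      limit: 10,          // documents returned
      // Filter fields must be indexed as type 'filter' in the vector index.
      filter: { 'loan_details.category': 'business', 'risk_metrics.risk_score': { $lte: 6 } }
    }
  }
]).toArray()
```

- Combines semantic proximity with exact-match filters.
- Supported on Atlas clusters running MongoDB 6.0.11+ or 7.0.2+ across AWS, Azure, and GCP.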
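The hybrid query can be wrapped in a small pipeline builder so that tools share one definition. This is a sketch under assumptions: `buildLoanSearchPipeline`, the default oversampling factor of 10, and the projected fields are illustrative choices, not taken from the source. It uses `numCandidates` and `limit`, which are the parameters `$vectorSearch` takes, and exposes the similarity score through `$meta: 'vectorSearchScore'`.

```typescript
type VectorSearchOptions = {
  queryVector: number[];
  filter?: Record<string, unknown>;
  limit?: number;
  // Candidates considered by the ANN search before the top `limit` are returned.
  numCandidates?: number;
};

// Builds an aggregation pipeline: $vectorSearch followed by a projection that
// keeps a few readable fields plus the similarity score.
function buildLoanSearchPipeline(opts: VectorSearchOptions): Record<string, unknown>[] {
  const limit = opts.limit ?? 10;
  const vectorSearch: Record<string, unknown> = {
    index: 'vector_index',
    path: 'metadata.vector_embedding',
    queryVector: opts.queryVector,
    numCandidates: opts.numCandidates ?? limit * 10, // assumed oversampling factor
    limit,
  };
  // Only attach a filter when one is given; filter paths must be indexed as 'filter'.
  if (opts.filter) vectorSearch.filter = opts.filter;
  return [
    { $vectorSearch: vectorSearch },
    {
      $project: {
        _id: 0,
        application_id: 1,
        'applicant_details.name': 1,
        'metadata.analyst_notes': 1,
        score: { $meta: 'vectorSearchScore' },
      },
    },
  ];
}
```

The result would be passed to `collection.aggregate(...).toArray()`. Projecting out the embedding keeps 1536 floats per hit out of the tool response that the agent has to read.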
src/app/lib/agent.ts wires up a LangGraph state machine and seven domain tools that query MongoDB.
| Tool | Purpose | Underlying Mongo Query |
|---|---|---|
| portfolio_metrics | Portfolio-wide KPI aggregation | $group + $avg + $cond |
| loan_analysis | Vector similarity search + quality metrics | $vectorSearch + LangChain vector store |
| risk_analysis | Top loans beyond threshold | find({ 'risk_metrics.risk_score': {$gte:t} }) |
| applicant_search_by_applicant_details | Regex / exact lookup on nested applicant fields | find with dynamic $regex |
| applicant_search_by_field | Flexible ID-based search | find on plain _id or any other property |
| advanced_analytics | Aggregation pipelines (sector risk, monthly disbursement, risk bands) | $group, $bucket, $dateToString |
| payment_analysis | Calculate reliability & delay trends | JavaScript post-processing |
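The table lists `payment_analysis` as JavaScript post-processing over `payment_history`. The source does not give the formula, so the sketch below assumes the simplest one: reliability as the share of on-time payments. `summarizePayments` is a hypothetical name.

```typescript
type PaymentStatus = 'on_time' | 'delayed' | 'default';
type Payment = { month: number; status: PaymentStatus };

// Counts payments per status and derives the on-time share.
// The reliability formula is an assumption for illustration only.
function summarizePayments(history: Payment[]) {
  const counts: Record<PaymentStatus, number> = { on_time: 0, delayed: 0, default: 0 };
  for (const p of history) counts[p.status] += 1;
  const total = history.length;
  // null signals "no history yet" rather than a misleading 0 or 1.
  const reliability = total === 0 ? null : counts.on_time / total;
  return { total, counts, reliability };
}
```

Doing this in application code is reasonable when the history array is already loaded with the document; the same counts could also be computed server-side in an aggregation if many loans are summarised at once.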
```mermaid
graph TD
  Start((Human msg)) --> Agent
  Agent -->|0 tool calls| Final[Return answer]
  Agent -->|1‒N tool calls| Tools
  Tools -->|MongoDB actions| Agent
```
- `shouldContinue()` checks the last AI message for `tool_calls[]`.
- Checkpointing: `MongoDBSaver` records thread state in `loan_portfolio.workflow_states` – persistence = memory for our AI.
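The routing decision can be sketched without any framework types. The shapes below are simplified stand-ins for LangChain's message classes, and the return labels `'tools'` and `'end'` are assumed node names; the actual `agent.ts` may differ.

```typescript
// Minimal message shape for the sketch; LangChain's AIMessage carries more fields.
type ToolCall = { name: string; args: Record<string, unknown> };
type Message = { role: 'human' | 'ai' | 'tool'; content: string; tool_calls?: ToolCall[] };

// Routes to the tool node while the model keeps requesting tools,
// and ends the run once the last AI message has no tool calls.
function shouldContinue(messages: Message[]): 'tools' | 'end' {
  const last = messages[messages.length - 1];
  const pending = last?.role === 'ai' ? (last.tool_calls ?? []) : [];
  return pending.length > 0 ? 'tools' : 'end';
}
```

In a graph, a function like this is what a conditional edge out of the agent node would call after every model turn.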
👤: List top 3 sectors with highest avg risk score.
🤖 (agent): invokes advanced_analytics(metric_type='sector_risk_analysis')
🔧 Tool response: JSON array [{sector:'Transport',avg_risk:7.9,…},…]
🤖: "Transport, Real-Estate and Agriculture show elevated average risk (>7). Recommend tighter LTV ratios and collateral verification."
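The `advanced_analytics` call in this exchange could be backed by an aggregation like the one below. It is a sketch only: the source does not show the tool's pipeline, so `sectorRiskPipeline`, the `loans` count, and the rounding are assumptions. The field paths come from the `LoanApplication` type.

```typescript
// Builds a pipeline that averages risk_score per sector and keeps the top N sectors.
function sectorRiskPipeline(topN: number): Record<string, unknown>[] {
  return [
    {
      $group: {
        _id: '$applicant_details.sector',
        avg_risk: { $avg: '$risk_metrics.risk_score' },
        loans: { $sum: 1 }, // number of loans behind each average
      },
    },
    { $sort: { avg_risk: -1 } },
    { $limit: topN },
    // Reshape into { sector, avg_risk, loans } for the agent to read.
    { $project: { _id: 0, sector: '$_id', avg_risk: { $round: ['$avg_risk', 1] }, loans: 1 } },
  ];
}
```

For the question above, the tool would run `collection.aggregate(sectorRiskPipeline(3)).toArray()` and hand the array back to the agent as JSON.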
| Capability | Benefit to Agentic AI |
|---|---|
| Flexible Document Model | Nested objects (payment_history[]) without complex joins; adapt schema per loan type (home vs business) |
| Transactional Guarantees | Atomic upserts of doc + embedding + notes |
| Embedded Vector Search | No side-car vector DB; low-latency sub-10ms ANN |
| Hybrid Search | Combine semantic & structured filters in single operator |
| Horizontal Scaling | Sharding with vector indexes from 7.2 onward |
| Atlas Triggers | Auto-re-embed notes on update; push to Slack |
| Queryable Encryption | PII like PAN, Aadhaar remains encrypted yet queryable |
📚 Docs: Hybrid Search • Create Embeddings • Explain plans
- Unified Operational + Vector Store – keep source of truth and embeddings together; avoid ETL drift seen in dual-DB setups.
- MQL-native `$vectorSearch` – one query language for CRUD, aggregations, & ANN.
- ACID Transactions on Vectors – update doc & vector atomically (critical for RLHF feedback).
- Explainable ANN – `explain()` shows HNSW graph traversal stats – crucial for latency tuning.
- Vector Quantization & Memory-Tier – shrink 1536-dim vectors ~4× with minimal accuracy drop.
- Cloud Agnostic – Atlas runs on AWS, Azure, GCP; comparable vector latency on each.
- Enterprise-grade Security – SOC2, PII encryption, network isolation.
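The atomic-update point can be illustrated with a single-document write. A single `updateOne` is atomic in MongoDB, so notes and their embedding change together. The helper name `buildNotesUpdate` is hypothetical, and matching on `application_id` is an assumption about how the demo identifies loans.

```typescript
// Builds the filter and update for one updateOne call. Because a single-document
// update is atomic, readers never see new notes paired with a stale vector.
function buildNotesUpdate(
  applicationId: string,
  notes: string,
  embedding: number[],
  now: Date = new Date(),
) {
  return {
    filter: { application_id: applicationId },
    update: {
      $set: {
        'metadata.analyst_notes': notes,
        'metadata.vector_embedding': embedding,
        'metadata.last_updated': now,
      },
    },
  };
}
```

It would be used as `const { filter, update } = buildNotesUpdate(...)` followed by `collection.updateOne(filter, update)`. Note that the search index itself is refreshed asynchronously, so a just-updated vector may take a moment to show up in `$vectorSearch` results.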
Reference: Vector Search Types
- Cluster tier: M10+ recommended for production workloads (dedicated Search Nodes require a dedicated tier)
- Feature flag: `enableVectorSearch=true` (auto-on for v7.2 clusters)
- Search index backup included in online snapshots.
- Atlas Performance Advisor surfaces slow vector queries & suggests HNSW params.

```json
{
  "_id": "666faa...",
  "applicant_details": {
    "name": "Ananya Iyer",
    "occupation": "Software Eng @ Infosys",
    "sector": "IT",
    "monthly_income": 120000
  },
  "loan_details": { "amount": 750000, "interest_rate": 11.25, "category": "personal" },
  "risk_metrics": { "risk_score": 4.7 },
  "application_status": "approved",
  "performance_data": { "current_status": "active" },
  "metadata": {
    "analyst_notes": "Applicant's DTI comfortable...",
    "vector_embedding": [-0.0123, 0.081, ... ]
  }
}
```