Shared database layer providing SQLAlchemy models, Alembic migrations, and query helpers for all GrantFlow.AI services. This package implements a PostgreSQL-based multi-tenant architecture with soft-delete patterns, vector embeddings for RAG operations, and comprehensive relationship management.
For prerequisites, environment setup, and general development workflow, see the Contributing Guide.
This README covers db package-specific architecture and development details.
packages/db/
├── alembic/ # Database migrations
│ ├── versions/ # Individual migration scripts
│ ├── env.py # Alembic environment configuration
│ └── script.py.mako # Migration template
├── alembic.ini # Alembic configuration
├── pg-extensions.sql # PostgreSQL extensions (pgvector)
└── src/
├── __init__.py # Package exports
├── connection.py # Database connection utilities
├── constants.py # Constants (EMBEDDING_DIMENSIONS, source types)
├── enums.py # Shared enumerations (UserRoleEnum, StatusEnum, etc.)
├── json_objects.py # TypedDict definitions for JSON columns
├── query_helpers.py # Soft-delete and metadata filtering helpers
├── tables.py # SQLAlchemy model definitions
└── utils.py # Database utilities
The database schema follows a hierarchical multi-tenant structure with polymorphic RAG sources and collaborative editing support:
erDiagram
Organization ||--o{ OrganizationUser : has
Organization ||--o{ Project : contains
Organization ||--o{ Notification : receives
OrganizationUser ||--o{ ProjectAccess : grants
OrganizationUser }o--|| User : represents
Project ||--o{ GrantApplication : contains
Project ||--o{ ProjectAccess : controls
GrantApplication ||--|| GrantTemplate : has
GrantApplication ||--o{ GrantApplicationSource : references
GrantApplication ||--o{ RagGenerationJob : triggers
GrantApplication ||--o{ EditorDocument : owns
GrantApplication ||--o{ GenerationNotification : receives
GrantApplication }o--o| GrantApplication : "parent/children"
GrantTemplate ||--o{ GrantTemplateSource : references
GrantTemplate ||--o{ RagGenerationJob : triggers
GrantTemplate }o--|| GrantingInstitution : "funded by"
GrantingInstitution ||--o{ Grant : publishes
GrantingInstitution ||--o{ GrantingInstitutionSource : references
RagSource ||--o{ TextVector : vectorized
RagSource ||--o{ GrantApplicationSource : "linked to"
RagSource ||--o{ GrantTemplateSource : "linked to"
RagSource ||--o{ GrantingInstitutionSource : "linked to"
RagSource }o--o| RagSource : "parent/children"
RagFile --|> RagSource : inherits
RagUrl --|> RagSource : inherits
RagGenerationJob }o--o| RagGenerationJob : "parent/children"
EditorDocument }o--|| GrantApplication : belongs_to
Multi-tenancy & Access Control:
Organization: Top-level tenant entityOrganizationUser: User membership with role-based permissions (OWNER, ADMIN, COLLABORATOR)Project: Organizational unit for grant applicationsProjectAccess: Fine-grained project access control
Grant Management:
GrantApplication: Main entity for grant proposals with deep dive questions and research objectivesGrantTemplate: Structured grant template with sections and CFP analysisGrantingInstitution: Funding organizations (NIH, NSF, etc.)Grant: Published grant opportunities with deadlines and requirements
RAG (Retrieval-Augmented Generation):
RagSource: Polymorphic base for indexable content (files and URLs)RagFile: GCS-stored files with extraction metadataRagUrl: Crawled web pages with title and descriptionTextVector: Embeddings with 384-dimensional vectors for similarity search
Workflow Management:
RagGenerationJob: Async job tracking for template/application generation stagesGenerationNotification: Real-time updates during generation pipelineNotification: User notifications for deadlines and system events
Collaboration:
EditorDocument: CRDT-based collaborative editing state (Y.js binary format)
All models inherit from Base which implements a soft-delete pattern using the deleted_at timestamp. Soft-deleted records remain in the database but are excluded from queries.
Always use the select_active() and update_active() helpers from /packages/db/src/query_helpers.py to automatically filter out soft-deleted records:
from packages.db.src.query_helpers import select_active, update_active
from packages.db.src.tables import Project
from sqlalchemy import select
# CORRECT: Use helper to exclude deleted records
query = select_active(Project).where(Project.organization_id == org_id)
projects = await session.scalars(query)
# INCORRECT: Manual filtering is error-prone and violates soft-delete pattern
query = select(Project).where(
Project.organization_id == org_id,
Project.deleted_at.is_(None) # Don't do this
)from packages.db.src.tables import GrantApplication
# Soft delete a record
application.soft_delete() # Sets deleted_at to current timestamp
await session.commit()
# Restore a soft-deleted record
application.restore() # Sets deleted_at to None
await session.commit()
# Check if deleted
if application.is_deleted:
print("This record has been soft-deleted")The package provides comprehensive helpers in /packages/db/src/query_helpers.py:
Basic Filters:
select_active(model): SELECT query excluding deleted recordsselect_active_by_id(model, id): SELECT by ID excluding deleted recordsupdate_active(model): UPDATE query excluding deleted recordsupdate_active_by_id(model, id): UPDATE by ID excluding deleted recordsadd_active_filter(query, *models): Add soft-delete filter to existing query
Metadata Filters (for document_metadata JSON columns):
metadata_has_entity_type(column, entity_type): Filter by entity type in metadatametadata_has_categories(column, categories, match_mode): Filter by categories (any/all)metadata_has_keyword(column, keyword, min_confidence): Filter by keyword with confidence thresholdbuild_metadata_filter(column, ...): Composite filter builder combining entity types, categories, and quality scores
The package exports the following from /packages/db/src/__init__.py:
Models:
All table classes from tables.py including Organization, Project, GrantApplication, GrantTemplate, RagSource, RagFile, RagUrl, TextVector, EditorDocument, etc.
Enumerations:
UserRoleEnum: OWNER, ADMIN, COLLABORATORApplicationStatusEnum: WORKING_DRAFT, IN_PROGRESS, GENERATING, CANCELLEDSourceIndexingStatusEnum: CREATED, INDEXING, FINISHED, FAILEDRagGenerationStatusEnum: PENDING, PROCESSING, COMPLETED, FAILED, CANCELLEDGrantType: RESEARCH, TRANSLATIONALGrantTemplateStageEnum: CFP_ANALYSIS, TEMPLATE_GENERATIONGrantApplicationStageEnum: BLUEPRINT_PREP, WORKPLAN_GENERATION, SECTION_SYNTHESIS
TypedDict Structures:
Chunk: Text chunk with optional page numberResearchTask: Task with number, title, descriptionResearchObjective: Objective with research tasksResearchDeepDive: Research-focused deep dive questionsTranslationalResearchDeepDive: Translational research deep dive questionsCFPAnalysis: Call for Proposals analysis structureGrantElement: Base grant section elementGrantLongFormSection: Extended section with generation instructions
Constants:
EMBEDDING_DIMENSIONS: 384 (dimensionality for text embeddings)RAG_FILE,RAG_URL: Polymorphic identity constants
Query Helpers:
All functions from query_helpers.py for soft-delete filtering and metadata queries.
Connection:
get_async_session_maker(): Factory for async database sessions
All database operations must use async context managers with explicit transaction boundaries:
from packages.db.src.connection import get_async_session_maker
async_session_maker = get_async_session_maker()
# CORRECT: Explicit transaction boundary
async with async_session_maker() as session, session.begin():
project = await session.scalar(
select_active(Project).where(Project.id == project_id)
)
# Modifications are automatically committed
# INCORRECT: Reusing sessions across requests
session = async_session_maker() # Don't do this
# ... multiple operations ...The TextVector table stores 384-dimensional embeddings generated from indexed content:
from packages.db.src.tables import TextVector
from packages.db.src.constants import EMBEDDING_DIMENSIONS
# Query for similar vectors using cosine distance
from sqlalchemy import select, func
query = select(TextVector).order_by(
TextVector.embedding.cosine_distance(query_embedding)
).limit(10)
similar_chunks = await session.scalars(query)The embedding column uses pgvector's HNSW index (Hierarchical Navigable Small World) for efficient approximate nearest neighbor search with the following parameters:
m: 48 (max connections per layer)ef_construction: 256 (size of dynamic candidate list during construction)vector_cosine_ops: Cosine distance optimization
Several columns use JSON/JSONB storage with strict TypedDict schemas:
GrantApplication:
form_inputs:ResearchDeepDive | TranslationalResearchDeepDive- Deep dive questionnaire responsesresearch_objectives:list[ResearchObjective]- Structured research objectives with tasks
GrantTemplate:
grant_sections:list[GrantLongFormSection | GrantElement]- Template structurecfp_analysis:CFPAnalysis- Analyzed call for proposals
TextVector:
chunk:Chunk- Text content with metadata
RagSource:
document_metadata:DocumentMetadata- Extracted document metadata (entities, keywords, categories)
These TypedDict structures provide type safety when accessing JSON fields:
from packages.db.src.json_objects import ResearchDeepDive
application = await session.scalar(
select_active(GrantApplication).where(GrantApplication.id == app_id)
)
# Type-safe access to JSON fields
if application.form_inputs:
deep_dive: ResearchDeepDive = application.form_inputs
hypothesis = deep_dive.get("hypothesis")RagSource uses SQLAlchemy's single-table inheritance pattern:
from packages.db.src.tables import RagSource, RagFile, RagUrl
from packages.db.src.constants import RAG_FILE, RAG_URL
# Query all sources (returns RagFile and RagUrl instances)
all_sources = await session.scalars(select_active(RagSource))
# Query only files
files = await session.scalars(select_active(RagFile))
# Query only URLs
urls = await session.scalars(select_active(RagUrl))
# Check polymorphic type
if source.source_type == RAG_FILE:
# source is a RagFile instance
print(f"File: {source.filename}")
elif source.source_type == RAG_URL:
# source is a RagUrl instance
print(f"URL: {source.url}")Several tables support hierarchical structures:
GrantApplication (parent/children):
# Create revision from existing application
new_version = GrantApplication(
title=original.title,
parent_id=original.id,
project_id=original.project_id,
)RagSource (parent/children):
# Create sub-document from parent source
sub_doc = RagUrl(
url=chapter_url,
parent_id=main_doc.id,
)RagGenerationJob (parent/child jobs):
# Create dependent job
child_job = RagGenerationJob(
parent_job_id=parent_job.id,
grant_template_id=template.id,
template_stage=GrantTemplateStageEnum.TEMPLATE_GENERATION,
)