Merged
4 changes: 2 additions & 2 deletions .cursor/scratchpad.md
@@ -1,5 +1,5 @@
# Scratchpad

Index status: `count_documents` on all vector backends + `index_coverage.get_index_coverage`, admin page `/admin/graph-search/index-status/`, `search_index_status` command with a coverage table. Searcher: if the output of `graph.invoke()` has no `final_results` key, `postprocess_results_node` is called (LangGraph + dict state).
Release 0.3.3: setup.cfg, CHANGELOG, README, RELEASE_NOTES_0.3.3.md, dist/ built.

DONE: full pytest run 87 passed, 1 skipped; ruff on changed files OK.
DONE — pytest 117 passed, 1 skipped; python -m build OK.
1 change: 0 additions & 1 deletion .pylintrc
@@ -16,7 +16,6 @@ disable=
invalid-name,
too-few-public-methods,
too-many-arguments,
too-many-positional-arguments,
too-many-instance-attributes,
too-many-locals,
broad-exception-caught,
39 changes: 39 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,44 @@ project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [0.3.3] — 2026-05-19

Stable **0.3** release (replaces pre-releases `0.3.0a1` and `0.3.1a1`).

```bash
pip install django-graph-search==0.3.3
```

### Added
- **REST search:** each hit includes `score` (0.0–1.0) and `text`; optional `min_score` filters weak matches; response may include `min_score_applied`.
- **Model weights:** `weight_fields` is always parsed (including with `fields: "__all__"`); weight `0.0` excludes a field from indexed text.
- **Async indexing:** `ASYNC_INDEXING` (Celery / daemon `thread` / django-q) plus `django_graph_search.tasks` so `AUTO_INDEX` signals can avoid blocking requests.
- **Non-blocking auto-index (default):** with local SentenceTransformer embeddings, `AUTO_INDEX_NON_BLOCKING` runs indexing in a daemon thread without enabling `ASYNC_INDEXING`.
- **Skip noisy saves:** global `AUTO_INDEX_SKIP_UPDATE_FIELDS` (default `last_login`) and per-model `skip_update_fields` skip re-index when only those fields change (`update_fields` or full save with no other diffs).
- **Pgvector backend:** `django_graph_search.backends.PgvectorBackend` (extra `[pgvector]`).
- **Cloud embeddings:** `OpenAIEmbeddingBackend` and `CohereEmbeddingBackend` (extras `[openai]`, `[cohere]`).
- **Admin index coverage:** `/admin/graph-search/index-status/` shows DB row counts vs vector-store document counts per model, the overall percentage, and static progress bars. The sidebar entries **Поиск** (Search) and **Статус индексации** (Index status) are provided by the unmanaged models `GraphSearch` / `GraphSearchIndexStatus`.
- **`count_documents(filters)`** on ChromaDB, FAISS, Qdrant, and pgvector backends; used by coverage UI and `search_index_status` management command.
- **Admin search:** optional `min_score` query parameter on the Graph Search admin page (same semantics as REST).
- **Component registry:** vector store, embedding backend, and `GraphResolver` are cached per worker configuration (shared by `Searcher`, `Indexer`, signals).

### Changed
- **Vector scores:** ChromaDB / FAISS / Qdrant normalize distances to similarity scores in 0–1; ChromaDB reads the collection’s effective HNSW `space` and maps L2 / cosine / inner-product distances accordingly.
- **Factory / signals:** indexing and search reuse `get_shared_components()` from `component_registry`.

### Security
- **REST API access control:** optional `GRAPH_SEARCH["API"]` (`PERMISSION_CLASSES`, `THROTTLE_CLASSES`, `THROTTLE_RATES`, `REQUIRE_AUTHENTICATION`) via `django_graph_search.permissions`.
- **Safe integer parsing for `limit`:** invalid or negative values return HTTP 400; values above 1000 are clamped with a log warning.

### Fixed
- **LangGraph + `graph.invoke()`:** when the compiled graph omits `final_results`, `Searcher` runs `postprocess_results_node` so results are not empty.
- **ChromaDB:** cosine collections use `hnsw:space=cosine`; query distances mapped to similarity per metric.
- **File delta cache TTL:** `FileDeltaCache` enforces expiry on read; `purge_expired(dry_run=)` and `purge_search_cache` management command.
- **Conversational memory registry:** per-process backends with a lock; `RuntimeWarning` when `inmemory` + conversational enabled + `DEBUG` is false.
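
The TTL behaviour described for `FileDeltaCache` (expiry enforced on read, plus a `purge_expired(dry_run=...)` sweep) can be illustrated with a minimal in-memory sketch. `TTLCacheSketch` is a hypothetical name, not the package's class. It assumes a monotonic clock and does not model whatever storage `FileDeltaCache` actually uses.

```python
import time


class TTLCacheSketch:
    """In-memory cache whose entries expire `ttl` seconds after being stored."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._data = {}  # key -> (value, stored_at)

    def set(self, key, value):
        self._data[key] = (value, self._clock())

    def _expired(self, stored_at, now):
        return now - stored_at >= self.ttl

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, stored_at = entry
        if self._expired(stored_at, self._clock()):
            # Expiry is enforced on read: a stale entry is dropped, never returned.
            del self._data[key]
            return default
        return value

    def purge_expired(self, dry_run=False):
        """Return how many entries are expired; delete them unless dry_run."""
        now = self._clock()
        expired = [k for k, (_, ts) in self._data.items() if self._expired(ts, now)]
        if not dry_run:
            for key in expired:
                del self._data[key]
        return len(expired)
```

With `dry_run=True` the sweep only counts expired entries, which matches what a `--dry-run` flag on a purge command would be expected to do.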

### Tests
- **117** tests passing (+59 vs 0.2.0): admin sidebar, Chroma score mapping, component registry, non-blocking signals, `skip_update_fields`.

## [0.3.1a1] — 2026-05-19

**Pre-release** of the **0.3.1** line. Install for smoke tests:
@@ -202,6 +240,7 @@ and signal handlers behave exactly as before.
- REST endpoints `/api/search/` and `/api/search/similar/<model>/<pk>/`.
- `build_search_index` management command.

[0.3.3]: https://github.com/svalench/django_graph_search/releases/tag/v0.3.3
[0.3.1a1]: https://github.com/svalench/django_graph_search/releases/tag/v0.3.1a1
[0.3.0a1]: https://github.com/svalench/django_graph_search/releases/tag/v0.3.0a1
[0.2.0]: https://github.com/svalench/django_graph_search/releases/tag/v0.2.0
71 changes: 61 additions & 10 deletions README.md
@@ -45,12 +45,12 @@ pip install django-graph-search[cohere]
pip install django-graph-search[all]
```

## What's new in 0.3 (pre-release **0.3.1a1**)
## What's new in **0.3.3**

This line is a **pre-release** for smoke-testing packaging and integrations. Install with:
Stable **0.3** line. Install with:

```bash
pip install --pre django-graph-search==0.3.1a1
pip install django-graph-search==0.3.3
```

Highlights vs **0.2.0** (full detail in [CHANGELOG.md](CHANGELOG.md)):
@@ -59,12 +59,13 @@ Highlights vs **0.2.0** (full detail in [CHANGELOG.md](CHANGELOG.md)):
|------|--------|
| **REST hits** | Each result includes `score` (0.0–1.0) and `text`. Optional `min_score` query parameter filters weak matches; responses may include `min_score_applied`. |
| **Indexing** | `weight_fields` is always honored, including with `fields: "__all__"`; weight `0.0` drops a field from indexed text. |
| **Async signals** | `ASYNC_INDEXING` (Celery, `thread`, or django-q) plus `django_graph_search.tasks` so `AUTO_INDEX` can avoid blocking the request thread. |
| **Backends / embeddings** | **Pgvector** backend (`[pgvector]`). **OpenAI** / **Cohere** embedding backends (`[openai]`, `[cohere]`). |
| **Scores** | ChromaDB / FAISS / Qdrant normalize distances to similarity scores in 0–1 for consistent API output. |
| **Security / API** | Optional `GRAPH_SEARCH["API"]`: `PERMISSION_CLASSES`, `THROTTLE_CLASSES`, `THROTTLE_RATES`, `REQUIRE_AUTHENTICATION` via `django_graph_search.permissions` (defaults keep behaviour open). |
| **Validation** | Invalid or negative `limit` on search, streaming, conversational, and similar endpoints returns **400** (not 500); values above 1000 are clamped with a log warning. |
| **Fixes** | ChromaDB cosine metadata and distance mapping; file delta cache TTL and `purge_search_cache`; conversational in-memory registry + `RuntimeWarning` when `DEBUG` is false. |
| **Async / non-blocking signals** | `ASYNC_INDEXING` (Celery, `thread`, django-q) or default `AUTO_INDEX_NON_BLOCKING` (daemon thread for local SentenceTransformer). `AUTO_INDEX_SKIP_UPDATE_FIELDS` / per-model `skip_update_fields` skip noisy saves (`last_login`, etc.). |
| **Admin** | Sidebar entries **Поиск** (Search) and **Статус индексации** (Index status); index coverage page; `min_score` on admin search. |
| **Backends / embeddings** | **Pgvector** (`[pgvector]`). **OpenAI** / **Cohere** (`[openai]`, `[cohere]`). Shared component registry per worker. |
| **Scores** | ChromaDB / FAISS / Qdrant normalize distances to 0–1; Chroma respects collection metric (L2 / cosine / IP). |
| **Security / API** | Optional `GRAPH_SEARCH["API"]`: permissions, throttling, `REQUIRE_AUTHENTICATION` (defaults stay open). |
| **Validation** | Invalid `limit` → **400**; values above 1000 clamped with a log warning. |
| **Fixes** | LangGraph results no longer empty when `final_results` is missing; Chroma metadata; delta cache TTL; conversational memory registry warning. |

## Quick Start (5 minutes)

@@ -128,6 +129,8 @@ GRAPH_SEARCH = {
To restrict access to the main search, streaming, and conversational HTTP endpoints,
add an `"API"` block as described [below](#securing-the-rest-api-optional).

**Search relevance (semantic noise).** Vector search scores the full string built for indexing (all configured fields plus related rows when `follow_relations` is true). If results feel noisy or scores look flat, narrow `fields` to the attributes users actually query (e.g. `username`, `email`), set `follow_relations` / `relation_depth` lower, then rebuild the index. Admin Graph Search shows a **text preview** of indexed text per hit and supports optional **`min_score`** (same semantics as the REST API).
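
As a sketch of that advice, the configuration below narrows indexing to two fields that users actually search. The placement of `follow_relations` and `relation_depth` inside the model entry is an assumption, so check where your version reads them. Rebuild the index afterwards with `python manage.py build_search_index`.

```python
# settings.py (sketch): index only the attributes users actually query.
GRAPH_SEARCH = {
    "MODELS": [
        {
            "model": "auth.User",
            "fields": ["username", "email"],
            # Assumed per-model keys: stop pulling related rows into the indexed text.
            "follow_relations": False,
            "relation_depth": 0,
        },
    ],
}
```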

### 3. Add URLs

```python
@@ -253,6 +256,45 @@ When ``AUTO_INDEX`` is on, saves can block on large graphs. Enable ``ASYNC_INDEX
With ``thread``, indexing runs in a daemon thread (no retries). With ``celery``, install Celery
and register tasks; if Celery is missing, the task module falls back to synchronous execution with a warning.
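
The dispatch rules above can be sketched as follows: a daemon thread with no retries, or Celery with a synchronous fallback and a warning. `dispatch_indexing` and its `celery_task` parameter are hypothetical. The real entry points live in `django_graph_search.tasks` and may differ.

```python
import importlib.util
import threading
import warnings


def dispatch_indexing(func, *args, backend="thread", celery_task=None):
    """Run func(*args) according to a backend name (hypothetical helper)."""
    if backend == "thread":
        # Fire-and-forget daemon thread: it won't block interpreter shutdown,
        # and a failure is not retried.
        worker = threading.Thread(target=func, args=args, daemon=True)
        worker.start()
        return worker
    if backend == "celery":
        if celery_task is not None and importlib.util.find_spec("celery") is not None:
            celery_task.delay(*args)  # enqueue on the broker
            return None
        warnings.warn("Celery unavailable; indexing synchronously", RuntimeWarning)
    # Fallback (and any other backend name): run inline in the caller's thread.
    func(*args)
    return None
```

The inline fallback keeps indexing correct when Celery is missing, at the cost of blocking the request again.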

### Production: zero impact on unrelated requests

``AUTO_INDEX`` hooks **every** ``post_save`` for models listed in ``MODELS``. A login that updates
``auth.User.last_login``, or any frequent save on an indexed model, can load a local
**sentence-transformers** model and block the request thread for seconds.

Recommended for production web workers:

| Setting | Recommendation |
|---------|----------------|
| ``AUTO_INDEX`` | ``False`` if you rebuild with ``build_search_index`` or a Celery beat job |
| ``ASYNC_INDEXING`` | ``ENABLED: True`` with ``thread`` or ``celery`` when ``AUTO_INDEX`` stays on |
| ``MODELS`` | Do **not** index ``auth.User`` (or similar) unless you need user search; login saves are noisy |
| ``EMBEDDINGS`` | Prefer ``OpenAIEmbeddingBackend`` / ``CohereEmbeddingBackend`` in Gunicorn workers to avoid loading PyTorch in-process |
| ``AUTO_INDEX_SKIP_UPDATE_FIELDS`` | Default ``["last_login"]`` — skips indexing when ``save(update_fields=...)`` touches only those fields |
| ``AUTO_INDEX_NON_BLOCKING`` | Default ``True`` — with local **SentenceTransformer**, signal indexing runs in a **daemon thread** so login/API are not blocked (model may still load in background) |
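
The skip rule for ``AUTO_INDEX_SKIP_UPDATE_FIELDS`` can be sketched as a pure function over the `update_fields` value that Django passes to `post_save` receivers. That value is `None` for a full `save()` and otherwise a frozenset of field names. `should_reindex` is a hypothetical helper. It only covers the `update_fields` case and does not reproduce the changelog's check for full saves with no other changes.

```python
DEFAULT_SKIP_UPDATE_FIELDS = frozenset({"last_login"})  # the documented default


def should_reindex(update_fields, skip_fields=DEFAULT_SKIP_UPDATE_FIELDS):
    """Decide whether a post_save should trigger re-indexing.

    `update_fields` is what Django passes to post_save receivers: None for a
    full save(), otherwise a frozenset of the field names that were saved.
    """
    if update_fields is None:
        # Full save: without comparing old and new values, assume content changed.
        return True
    # Re-index unless every saved field is on the skip list.
    return not set(update_fields) <= set(skip_fields)
```

Inside a receiver, this would be a guard such as `if should_reindex(kwargs.get("update_fields")):` placed before any indexing work is scheduled.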

Minimal example that keeps indexing off the login path:

```python
GRAPH_SEARCH = {
    "AUTO_INDEX": False,
    # or keep AUTO_INDEX and offload:
    # "ASYNC_INDEXING": {"ENABLED": True, "BACKEND": "thread"},
    # "AUTO_INDEX_SKIP_UPDATE_FIELDS": ["last_login"],
    "MODELS": [
        # avoid auth.User unless required
        {"model": "shop.Product", "fields": ["name", "description"]},
    ],
}
```

Heavy components (vector store client, embedding backend, graph resolver) are **cached once per
worker process** after the first search or index operation. Restart workers after changing
``GRAPH_SEARCH`` backends or embedding models.
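
One way to get "built once per worker process" semantics is a module-level cache keyed by configuration. The sketch below uses `functools.lru_cache`. `Components` and `get_components` are hypothetical stand-ins, not the package's `component_registry` API. Because the cache lives in process memory, each Gunicorn worker builds its own copy, and entries are never evicted. That is one reason to restart workers after changing backends.

```python
from functools import lru_cache


class Components:
    """Stand-in for the heavy per-process objects (store client, embedder, resolver)."""

    def __init__(self, vector_backend, embedding_model):
        # A real implementation would open clients and load models here (slow).
        self.vector_backend = vector_backend
        self.embedding_model = embedding_model


@lru_cache(maxsize=None)
def get_components(vector_backend, embedding_model):
    """Build components once per process for each distinct configuration."""
    return Components(vector_backend, embedding_model)
```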

For local **sentence-transformers**, run indexing in a dedicated Celery worker if web workers
must stay lean.

### Securing the REST API (optional)

**Scope:** Settings under `GRAPH_SEARCH["API"]` apply only to **`GET /api/search/`**,
@@ -300,7 +342,16 @@ python manage.py purge_search_cache --dry-run # Count expired entries wit

## Admin UI

After installation, navigate to `/admin/graph-search/` for a semantic search interface directly in Django Admin — useful for content managers and debugging.
With `django.contrib.admin` installed, the app adds a **Django Graph Search** section to the admin index (`/admin/`) with **Поиск** (Search) and **Статус индексации** (Index status) entries. The legacy URL `/admin/graph-search/` still works for bookmarks and docs.

Disable the admin section and custom URLs with:

```python
GRAPH_SEARCH = {
    # ...
    "ADMIN_SEARCH_ENABLED": False,
}
```

## Supported Backends

66 changes: 66 additions & 0 deletions RELEASE_NOTES_0.3.3.md
@@ -0,0 +1,66 @@
# django-graph-search 0.3.3

**Release date:** 2026-05-19
**Type:** Stable (0.3 line — replaces pre-releases `0.3.0a1`, `0.3.1a1`)

```bash
pip install django-graph-search==0.3.3
# optional extras unchanged, e.g.:
pip install django-graph-search[pgvector,openai,all]
```

## Summary

First **stable** 0.3 release: REST scores and `min_score`, smarter indexing weights, async/non-blocking auto-index, pgvector + cloud embeddings, hardened REST API settings, admin index coverage with sidebar navigation, and ChromaDB score fixes aligned with collection metrics.

Upgrading from **0.2.x** is backward-compatible — new settings default to safe/off or sensible production defaults (`AUTO_INDEX_NON_BLOCKING=True` only affects local SentenceTransformer profiles).

## Highlights

### Search & API
- Result objects include **`score`** (0.0–1.0) and indexed **`text`**
- Query param **`?min_score=`** on REST and admin search
- Optional **`GRAPH_SEARCH["API"]`**: DRF-style permission/throttle hooks, `REQUIRE_AUTHENTICATION`
- Invalid **`limit`** → HTTP 400; values above 1000 clamped with warning
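
The `limit` rules above reduce to a small parser: HTTP 400 for invalid or negative values, and clamping above 1000 with a logged warning. `parse_limit` and its default of 10 are hypothetical. A view would turn a non-`None` error into a 400 response.

```python
import logging

logger = logging.getLogger(__name__)

MAX_LIMIT = 1000  # values above this are clamped, per the notes above


def parse_limit(raw, default=10):
    """Return (limit, error); a non-None error should become an HTTP 400."""
    if raw is None or raw == "":
        return default, None
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return None, "limit must be an integer"
    if value < 0:
        return None, "limit must not be negative"
    if value > MAX_LIMIT:
        logger.warning("limit=%d exceeds %d; clamping", value, MAX_LIMIT)
        value = MAX_LIMIT
    return value, None
```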

### Indexing & signals
- **`weight_fields`** always applied (`fields: "__all__"` supported; `0.0` = exclude field)
- **`ASYNC_INDEXING`**: Celery, `thread`, or django-q via `django_graph_search.tasks`
- **`AUTO_INDEX_NON_BLOCKING`** (default **on**): daemon-thread indexing for local SentenceTransformer models without Celery
- **`AUTO_INDEX_SKIP_UPDATE_FIELDS`** / per-model **`skip_update_fields`**: skip re-index on `last_login`-only updates
- **`component_registry`**: one vector store + embedder + resolver per worker config

### Backends & embeddings
- **Pgvector** (`pip install django-graph-search[pgvector]`)
- **OpenAI** / **Cohere** embedding backends
- Normalized **0–1 similarity** across ChromaDB, FAISS, Qdrant; Chroma reads effective HNSW metric
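
Here is a sketch of the distance-to-similarity normalization. It assumes Chroma-style distance definitions: cosine distance = 1 - cos(theta), inner-product distance = 1 - dot(a, b), and `l2` as a (squared) Euclidean distance. The function name and the exact formulas are illustrative assumptions, not the package's implementation.

```python
def distance_to_similarity(distance, space):
    """Map a raw vector-store distance to a similarity score in [0, 1]."""
    if space in ("cosine", "ip"):
        # 1 - cos and 1 - dot span [0, 2] for unit vectors; rescale linearly.
        score = 1.0 - distance / 2.0
    elif space == "l2":
        # Euclidean distances are unbounded above; squash them into (0, 1].
        score = 1.0 / (1.0 + distance)
    else:
        raise ValueError(f"unsupported distance space: {space!r}")
    # Clamp: non-normalized vectors or float error can push scores out of range.
    return max(0.0, min(1.0, score))
```

Reading the collection's configured metric before mapping matters because the same raw distance means very different things under L2 and cosine.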

### Admin
- Sidebar: **Поиск** (Search), **Статус индексации** (Index status)
- **`/admin/graph-search/index-status/`** — DB vs vector store coverage (static snapshot)
- Legacy URLs **`/admin/graph-search/`** preserved
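
The coverage page compares DB row counts with vector-store document counts. The hypothetical `coverage_report` helper below shows one way to compute per-model and overall percentages. Two choices are assumptions: the overall figure is weighted by row count, and an empty table counts as 100% covered.

```python
def coverage_report(db_counts, indexed_counts):
    """Return ({model_label: percent}, overall_percent), rounded to 0.1."""
    per_model = {}
    covered_total = 0
    for label, rows in db_counts.items():
        # Documents beyond the row count (e.g. stale entries) don't add coverage.
        covered = min(indexed_counts.get(label, 0), rows)
        covered_total += covered
        per_model[label] = 100.0 if rows == 0 else round(100.0 * covered / rows, 1)
    total_rows = sum(db_counts.values())
    overall = 100.0 if total_rows == 0 else round(100.0 * covered_total / total_rows, 1)
    return per_model, overall
```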

### Fixes
- LangGraph: results are no longer empty when `final_results` is missing from the `graph.invoke()` output
- Chroma cosine / L2 / IP distance mapping
- File delta cache TTL + `purge_search_cache` command
- Conversational in-memory backend warning in production multi-worker setups
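
The LangGraph fix amounts to a fallback after `graph.invoke()`. In this sketch, `run_pipeline` is hypothetical. `postprocess` stands in for `postprocess_results_node` and is assumed to take the state dict and return a partial update, which is the usual LangGraph node convention.

```python
def run_pipeline(graph, state, postprocess):
    """Invoke a compiled graph; post-process manually if final_results is absent."""
    result = graph.invoke(state)
    if "final_results" not in result:
        # Nodes return partial updates, so merge the post-processing output into the state.
        result = {**result, **postprocess(result)}
    return result.get("final_results", [])
```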

## Upgrade notes

| From | Action |
|------|--------|
| `0.2.x` | `pip install -U django-graph-search==0.3.3` — no mandatory settings changes |
| `0.3.0a1` / `0.3.1a1` | Drop `--pre`; pin `==0.3.3`. Behaviour matches pre-releases plus admin sidebar, skip-fields, non-blocking default, component registry |

Re-indexing is **not** required unless you change embedding model or smart-indexing templates.

## Tests

117 passed, 1 skipped (pytest suite in CI).

## Links

- [CHANGELOG.md](CHANGELOG.md) — full categorized list
- [PyPI](https://pypi.org/project/django-graph-search/)
- [Documentation](https://github.com/svalench/django_graph_search#readme)
2 changes: 1 addition & 1 deletion setup.cfg
@@ -1,6 +1,6 @@
[metadata]
name = django-graph-search
version = 0.3.1a1
version = 0.3.3
description = Vector search for Django models with graph relations, optional LangGraph pipeline, conversational search, smart indexing and streaming.
long_description = file: README.md
long_description_content_type = text/markdown
81 changes: 75 additions & 6 deletions src/django_graph_search/admin.py
@@ -5,43 +5,92 @@
from django.urls import path

from .index_coverage import get_index_coverage
from .models import GraphSearch, GraphSearchIndexStatus
from .searcher import Searcher
from .settings import get_settings
from .views import _parse_float_param

_admin_site_configured: set[int] = set()

def graph_search_view(request):

def graph_search_view(request, admin_site=None):
    site = admin_site if admin_site is not None else admin.site
    config = get_settings()
    query = request.GET.get("q", "").strip()
    models = request.GET.get("models")
    model_list = [m.strip() for m in models.split(",")] if models else None
    min_score, min_score_err = _parse_float_param(
        request.GET.get("min_score"),
        "min_score",
        default=None,
        min_value=0.0,
        max_value=1.0,
    )
    min_score_error = None
    if min_score_err is not None:
        min_score = None
        min_score_error = "Параметр min_score: число от 0.0 до 1.0."

    results = []
    if query:
    if query and min_score_error is None:
        searcher = Searcher(config=config)
        results = searcher.search(query, models=model_list, limit=config.default_results_limit)
        if min_score is not None:
            results = [r for r in results if float(r.get("score") or 0) >= min_score]

    context = dict(
        admin.site.each_context(request),
        site.each_context(request),
        title="Graph Search",
        query=query,
        results=results,
        model_list=models or "",
        available_models=[cfg.model for cfg in config.models],
        min_score=request.GET.get("min_score", "").strip(),
        min_score_applied=min_score,
        min_score_error=min_score_error,
    )
    return TemplateResponse(request, "django_graph_search/admin/search.html", context)


def graph_search_index_status_view(request):
def graph_search_index_status_view(request, admin_site=None):
    """Static snapshot of index coverage (no auto-refresh)."""
    site = admin_site if admin_site is not None else admin.site
    report = get_index_coverage()
    context = dict(
        admin.site.each_context(request),
        site.each_context(request),
        title="Статус индексации",
        report=report,
    )
    return TemplateResponse(request, "django_graph_search/admin/index_status.html", context)


class _GraphSearchMenuAdmin(admin.ModelAdmin):
    """Base ModelAdmin for menu entries without CRUD."""

    def has_add_permission(self, request):
        return False

    def has_change_permission(self, request, obj=None):
        return False

    def has_delete_permission(self, request, obj=None):
        return False


class GraphSearchAdmin(_GraphSearchMenuAdmin):
    def changelist_view(self, request, extra_context=None):
        return graph_search_view(request, admin_site=self.admin_site)


class GraphSearchIndexStatusAdmin(_GraphSearchMenuAdmin):
    def changelist_view(self, request, extra_context=None):
        return graph_search_index_status_view(request, admin_site=self.admin_site)


def _inject_admin_urls(admin_site):
    if getattr(admin_site, "_graph_search_urls_injected", False):
        return

    original_get_urls = admin_site.get_urls

    def get_urls():
@@ -61,7 +110,27 @@ def get_urls():
        return custom + urls

    admin_site.get_urls = get_urls
    admin_site._graph_search_urls_injected = True


def _register_menu_models(admin_site):
    if not admin_site.is_registered(GraphSearch):
        admin_site.register(GraphSearch, GraphSearchAdmin)
    if not admin_site.is_registered(GraphSearchIndexStatus):
        admin_site.register(GraphSearchIndexStatus, GraphSearchIndexStatusAdmin)


def setup_admin_site(admin_site=None):
    """Register the admin section and legacy URLs (idempotent)."""
    site = admin_site if admin_site is not None else admin.site
    site_id = id(site)
    if site_id in _admin_site_configured:
        return

_inject_admin_urls(admin.site)
    if not get_settings().admin_search_enabled:
        _admin_site_configured.add(site_id)
        return

    _register_menu_models(site)
    _inject_admin_urls(site)
    _admin_site_configured.add(site_id)
4 changes: 4 additions & 0 deletions src/django_graph_search/apps.py
@@ -12,4 +12,8 @@ def ready(self) -> None:
        from . import signals  # noqa: WPS433,F401

        get_settings()
        if get_settings().admin_search_enabled:
            from . import admin  # noqa: WPS433

            admin.setup_admin_site()
