Merged
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,10 @@ and this project adheres to

## [unreleased] - 2026-05-21

### Added

- ✨(maintenance) add maintenance mode

### Changed

- 🔒️(front) disable yarn install scripts in docker build
115 changes: 115 additions & 0 deletions docs/service_status.md
@@ -0,0 +1,115 @@
# Service Status

Two admin-controlled mechanisms communicate service state to users:

- **Status banner** — non-blocking notice shown at the top of the SPA
(announcements, incidents, planned-but-not-started maintenance).
- **Maintenance mode** — blocking; the app returns `503` and the SPA renders a
dedicated maintenance page.

Both are exposed to the frontend through `/api/<version>/config/`.

---

## Status banner

A single, time-windowed banner driven by the `SiteConfiguration` singleton.

### Admin fields

Edit at **Core > Site Configuration**:

- `status_banner_level` — `info` / `warning` / `alert` (controls styling).
- `status_banner_title` — required; the banner is hidden when blank.
- `status_banner_content` — body text (markdown rendered by the SPA).
- `status_banner_starts_at` / `status_banner_ends_at` — optional window.
Outside the window, the banner is hidden even with a title set.

### Visibility logic

The banner is visible when **all** of these hold:

1. `status_banner_title` is non-empty.
2. `starts_at` is unset or in the past.
3. `ends_at` is unset or in the future.

When hidden, `/config/` returns `status_banner: null`.
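
The three conditions above can be sketched as a single predicate. This is a minimal illustration, not the project's actual code: the function name `banner_is_visible` and its explicit `now` argument are assumptions, and the treatment of the exact boundary instants is a guess since the text only says "in the past" and "in the future".

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def banner_is_visible(
    title: str,
    starts_at: Optional[datetime],
    ends_at: Optional[datetime],
    now: datetime,
) -> bool:
    """Return True only when all three visibility conditions hold."""
    if not title:
        return False  # condition 1: a title is required
    if starts_at is not None and starts_at > now:
        return False  # condition 2: the window has not opened yet
    if ends_at is not None and ends_at < now:
        return False  # condition 3: the window has already closed
    return True


# Example: a banner scheduled to start in one hour is still hidden.
now = datetime(2026, 5, 21, 12, 0, tzinfo=timezone.utc)
scheduled = banner_is_visible("Upgrade tonight", now + timedelta(hours=1), None, now)
```

Passing `now` explicitly keeps the check deterministic, which makes the window edges easy to test.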

---

## Maintenance mode

When active, the backend short-circuits every non-exempt request with HTTP
`503` and the SPA flips to a dedicated maintenance page.

### Two toggles, OR-combined

Maintenance is ON when **either** is true:

1. **Env var** `MAINTENANCE_MODE=true` (escape hatch; when set, maintenance is forced ON regardless of the DB value).
2. **DB singleton** `MaintenanceMode` has `enabled=True` and the current time
falls inside `[starts_at, ends_at]` (both optional).

If the env var is set, the admin form shows a warning that the DB value is
overridden.
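
As a rough sketch of the OR-combination, assuming the DB toggle treats `[starts_at, ends_at]` as an inclusive window with either bound optional. The helper names `db_toggle_active` and `maintenance_is_on` are hypothetical; the real check is `is_maintenance_active()` in `core/middleware.py`, which delegates to the model's `is_active_now()`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def db_toggle_active(
    enabled: bool,
    starts_at: Optional[datetime],
    ends_at: Optional[datetime],
    now: datetime,
) -> bool:
    """DB toggle: enabled AND the current time is inside the optional window."""
    if not enabled:
        return False
    if starts_at is not None and now < starts_at:
        return False
    if ends_at is not None and now > ends_at:
        return False
    return True


def maintenance_is_on(
    env_flag: bool,
    enabled: bool,
    starts_at: Optional[datetime],
    ends_at: Optional[datetime],
    now: datetime,
) -> bool:
    """Either toggle is enough; the env flag never needs the DB values."""
    return env_flag or db_toggle_active(enabled, starts_at, ends_at, now)
```

Note that the env var can only force maintenance ON: setting it to false does not switch off a window that is active in the DB.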

### Toggling via Django admin

Go to **Core > Maintenance Mode** and edit the singleton:

- `enabled` — master switch.
- `message` — shown on the maintenance page (blank = default copy).
- `starts_at` / `ends_at` — optional window. Outside it, `enabled` has no
effect.

`updated_at` / `updated_by` are filled automatically. State changes are logged
at `WARNING` level.

### Exempt paths

`MaintenanceMiddleware` lets these through even when maintenance is on:

- `/admin/...` — so you can toggle it back off.
- `/__heartbeat__`, `/__lbheartbeat__` — load-balancer health checks.
- `/api/<version>/config/` — the SPA polls this to detect maintenance state.

Static files are served by `WhiteNoiseMiddleware` upstream and never reach the
maintenance middleware.
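
To check which paths are exempt, the pattern can be exercised on its own. The regular expression below is copied from `core/middleware.py` in this change; `is_exempt` is a hypothetical helper name used only for this illustration.

```python
import re

# Same pattern as _EXEMPT_PATH_RE in core/middleware.py.
EXEMPT_PATH_RE = re.compile(
    r"^/(?:"
    r"admin(?:/|$)"
    r"|__heartbeat__/?$"
    r"|__lbheartbeat__/?$"
    r"|api/[^/]+/config/?$"
    r")"
)


def is_exempt(path: str) -> bool:
    """Return True when the request path bypasses maintenance mode."""
    return EXEMPT_PATH_RE.match(path) is not None
```

The `admin(?:/|$)` alternative is anchored so that `/admin` and anything under `/admin/` are exempt, while a path such as `/administrator` is not. The config alternative accepts any single version segment but nothing after `config/`.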

### Response

Non-exempt requests get:

```http
HTTP/1.1 503 Service Unavailable
Retry-After: <seconds-until-ends_at> (only if ends_at is set and in the future)

{"code": "maintenance_mode", "detail": "Service under maintenance"}
```
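
The `Retry-After` value is the whole number of seconds left until `ends_at`, and the header is omitted when there is no end time or it has already passed. A small sketch mirroring the computation in `MaintenanceMiddleware` (the function name `retry_after_header` is an assumption):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def retry_after_header(ends_at: Optional[datetime], now: datetime) -> Optional[str]:
    """Return the Retry-After value in seconds, or None to omit the header."""
    if ends_at is None:
        return None
    seconds = int((ends_at - now).total_seconds())
    # A window that already ended (or ends right now) yields no header.
    return str(seconds) if seconds > 0 else None
```

When maintenance is forced through the env var, the middleware skips this lookup entirely, so no `Retry-After` is sent in that case.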

### Frontend behavior

`ConfigProvider` reads `maintenance` from `/api/<version>/config/`. When it is
non-null, the SPA renders the maintenance page instead of the app shell.

Any `503 maintenance_mode` returned by another API call (query or mutation)
invalidates the `config` query, so users flip to the maintenance page on the
next interaction without a manual reload.
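
The SPA implements this detection in its own frontend code. For a script or non-browser API client, the same check could look roughly like the following sketch, where `is_maintenance_response` is a hypothetical helper that takes a status code and raw response body:

```python
import json


def is_maintenance_response(status_code: int, body: bytes) -> bool:
    """Return True for a 503 whose JSON body carries code == "maintenance_mode"."""
    if status_code != 503:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON (for example an HTML error page from a proxy).
        return False
    return isinstance(payload, dict) and payload.get("code") == "maintenance_mode"
```

Checking the `code` field as well as the status avoids treating an unrelated `503` from a proxy or load balancer as maintenance.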

---

## Performance

Both `SiteConfiguration` and `MaintenanceMode` are `django-solo` singletons
cached in the default cache (`SOLO_CACHE_TIMEOUT = 60 * 5` seconds, i.e. 5 minutes). `save()`
invalidates the cache key immediately, so changes are effectively instant;
the timeout is just a safety net.
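
The "invalidate on save, expire as a fallback" behaviour can be illustrated with a toy cache. This is not `django-solo`'s implementation; the class `CachedSingleton`, its `load`/`store` callables and the injectable `clock` are all assumptions made for the sketch.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CachedSingleton:
    """Toy read-through cache: save() invalidates, the timeout is a fallback."""

    def __init__(
        self,
        load: Callable[[], Dict],
        store: Callable[[Dict], None],
        timeout: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load
        self._store = store
        self._timeout = timeout
        self._clock = clock
        self._entry: Optional[Tuple[float, Dict]] = None  # (expires_at, value)

    def get(self) -> Dict:
        now = self._clock()
        if self._entry is not None and self._entry[0] > now:
            return self._entry[1]  # cache hit: no backing-store access
        value = self._load()
        self._entry = (now + self._timeout, value)
        return value

    def save(self, value: Dict) -> None:
        self._store(value)
        self._entry = None  # drop the cached copy so the next get() reloads
```

With this shape, a change made through `save()` is visible on the very next read, and the timeout only matters if the backing store is modified by some other route.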

## Choosing between them

| Situation | Use |
|--------------------------------------------|--------------------|
| Heads-up about an upcoming change | Status banner |
| Ongoing degraded service, app still usable | Status banner (`warning` or `alert`) |
| Hard downtime — block all user traffic | Maintenance mode |
| Emergency lockout (admin DB unreachable) | `MAINTENANCE_MODE` env var |
4 changes: 4 additions & 0 deletions src/backend/chat/tests/views/test_file_stream.py
@@ -5,6 +5,10 @@

from django.core.cache import cache

import pytest

pytestmark = pytest.mark.django_db


def test_file_stream_invalid_key(api_client):
    """Test that invalid temporary keys return 404."""
14 changes: 14 additions & 0 deletions src/backend/conversations/settings.py
@@ -326,6 +326,7 @@ class Base(BraveSettings, Configuration):
MIDDLEWARE = [
"django.middleware.security.SecurityMiddleware",
"whitenoise.middleware.WhiteNoiseMiddleware",
"core.middleware.MaintenanceMiddleware",
"django.contrib.sessions.middleware.SessionMiddleware",
"django.middleware.locale.LocaleMiddleware",
"django.middleware.clickjacking.XFrameOptionsMiddleware",
@@ -481,6 +482,19 @@ class Base(BraveSettings, Configuration):
FRONTEND_SILENT_LOGIN_ENABLED = values.BooleanValue(
default=True, environ_name="FRONTEND_SILENT_LOGIN_ENABLED", environ_prefix=None
)

# Maintenance mode. When true, the app returns 503 to end-users for every
# non-exempt request. Always OR'd with the DB-backed `MaintenanceMode` singleton.
MAINTENANCE_MODE = values.BooleanValue(
default=False, environ_name="MAINTENANCE_MODE", environ_prefix=None
)

# django-solo cache: avoids a DB hit per request for singleton lookups
# (MaintenanceMode, SiteConfiguration). save() invalidates the cache key
# instantly; the timeout is a safety net.
SOLO_CACHE = "default"
SOLO_CACHE_TIMEOUT = 60 * 5
SOLO_CACHE_PREFIX = "solo"
THEME_CUSTOMIZATION_FILE_PATH = values.Value(
os.path.join(BASE_DIR, "conversations/configuration/theme/default.json"),
environ_name="THEME_CUSTOMIZATION_FILE_PATH",
26 changes: 25 additions & 1 deletion src/backend/core/admin.py
@@ -1,6 +1,7 @@
"""Admin classes and registrations for core app."""

-from django.contrib import admin
+from django.conf import settings
+from django.contrib import admin, messages
from django.contrib.auth import admin as auth_admin
from django.utils.translation import gettext_lazy as _

@@ -147,3 +148,26 @@ class SiteConfigurationAdmin(SingletonModelAdmin):
},
),
)


@admin.register(models.MaintenanceMode)
class MaintenanceModeAdmin(SingletonModelAdmin):
    """Admin class for the MaintenanceMode singleton."""

    fields = ("enabled", "message", "starts_at", "ends_at", "updated_at", "updated_by")
    readonly_fields = ("updated_at", "updated_by")

    def save_model(self, request, obj, form, change):
        obj.updated_by = request.user
        super().save_model(request, obj, form, change)

    def changeform_view(self, request, object_id=None, form_url="", extra_context=None):
        if settings.MAINTENANCE_MODE:
            messages.warning(
                request,
                _(
                    "The MAINTENANCE_MODE environment variable is set: maintenance is "
                    "forced ON regardless of the value below."
                ),
            )
        return super().changeform_view(request, object_id, form_url, extra_context)
13 changes: 13 additions & 0 deletions src/backend/core/api/viewsets.py
@@ -16,6 +16,7 @@
from rest_framework.throttling import UserRateThrottle

from core import models, permissions
from core.middleware import is_maintenance_active

from . import serializers

@@ -228,6 +229,7 @@ def get(self, request):
dict_settings["project_images_max_count"] = settings.PROJECT_IMAGES_MAX_COUNT

dict_settings["status_banner"] = self._get_banner()
dict_settings["maintenance"] = self._get_maintenance()

return drf.response.Response(dict_settings)

@@ -274,3 +276,14 @@ def _get_banner(self):
"title": config.status_banner_title,
"content": config.status_banner_content,
}

    def _get_maintenance(self):
        """Return maintenance state for the SPA, or None if inactive."""
        if not is_maintenance_active():
            return None

        config = models.MaintenanceMode.get_solo()
        return {
            "enabled": True,
            "message": config.message,
        }
72 changes: 72 additions & 0 deletions src/backend/core/middleware.py
@@ -0,0 +1,72 @@
"""Middlewares for the core app."""

import re
from logging import getLogger

from django.conf import settings
from django.db import DatabaseError
from django.http import JsonResponse
from django.utils import timezone

from core.models import MaintenanceMode

logger = getLogger(__name__)


# Paths that must remain reachable while maintenance mode is active.
# Anchored prefixes / exact paths. Static files are handled by WhiteNoiseMiddleware
# upstream, so they never reach this middleware.
_EXEMPT_PATH_RE = re.compile(
r"^/(?:"
r"admin(?:/|$)"
r"|__heartbeat__/?$"
r"|__lbheartbeat__/?$"
r"|api/[^/]+/config/?$"
r")"
)


def is_maintenance_active() -> bool:
    """Whether maintenance mode is currently active.

    OR-combination of the env-var escape hatch and the DB-backed singleton.
    """
    if settings.MAINTENANCE_MODE:
        return True
    return MaintenanceMode.get_solo().is_active_now()


class MaintenanceMiddleware:
    """Short-circuit non-exempt requests with 503 when maintenance is active."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if _EXEMPT_PATH_RE.match(request.path) or not is_maintenance_active():
            return self.get_response(request)

        response = JsonResponse(
            {"code": "maintenance_mode", "detail": "Service under maintenance"},
            status=503,
        )

        # Env-var escape hatch: skip the DB entirely. This path is typically
        # used precisely when the DB is unreachable, so a lookup here would
        # turn the 503 into a 500. No singleton means no ends_at → no
        # Retry-After.
        if settings.MAINTENANCE_MODE:
            return response

        # DB-driven: best-effort Retry-After. Swallow DB errors so a transient
        # failure between the active-check cache hit and this lookup still
        # yields a 503 rather than a 500.
        try:
            ends_at = MaintenanceMode.get_solo().ends_at
        except DatabaseError:
            return response
        if ends_at:
            retry_after = int((ends_at - timezone.now()).total_seconds())
            if retry_after > 0:
                response["Retry-After"] = str(retry_after)
        return response
72 changes: 72 additions & 0 deletions src/backend/core/migrations/0008_maintenancemode.py
@@ -0,0 +1,72 @@
from django.conf import settings
from django.db import migrations, models


class Migration(migrations.Migration):
    dependencies = [
        ("core", "0007_siteconfiguration_status_banner_content_and_more"),
        migrations.swappable_dependency(settings.AUTH_USER_MODEL),
    ]

    operations = [
        migrations.CreateModel(
            name="MaintenanceMode",
            fields=[
                (
                    "id",
                    models.AutoField(
                        auto_created=True, primary_key=True, serialize=False, verbose_name="ID"
                    ),
                ),
                (
                    "enabled",
                    models.BooleanField(
                        default=False,
                        help_text="When checked, the app is in maintenance mode for end-users.",
                        verbose_name="Enabled",
                    ),
                ),
                (
                    "message",
                    models.TextField(
                        blank=True,
                        default="",
                        help_text="Shown on the maintenance page. Leave blank for the default message.",
                        verbose_name="Message",
                    ),
                ),
                (
                    "starts_at",
                    models.DateTimeField(
                        blank=True,
                        help_text="If set, maintenance is inactive before this date.",
                        null=True,
                        verbose_name="Starts at",
                    ),
                ),
                (
                    "ends_at",
                    models.DateTimeField(
                        blank=True,
                        help_text="If set, maintenance is inactive after this date.",
                        null=True,
                        verbose_name="Ends at",
                    ),
                ),
                ("updated_at", models.DateTimeField(auto_now=True)),
                (
                    "updated_by",
                    models.ForeignKey(
                        blank=True,
                        null=True,
                        on_delete=models.deletion.SET_NULL,
                        related_name="+",
                        to=settings.AUTH_USER_MODEL,
                    ),
                ),
            ],
            options={
                "verbose_name": "Maintenance Mode",
            },
        ),
    ]