feat(store): add PostgreSQL backend to openai_response_store filter #605

Merged
shaneutt merged 3 commits into praxis-proxy:main from leseb:leseb/postgres-integration-tests on Jun 16, 2026

@leseb leseb commented Jun 15, 2026

Collaborator

Summary

  • Wire PostgresResponseStore into the openai_response_store filter config (StorageBackend::Postgres) and initialization, with SSL mode and root cert support
  • Add PostgresGuard RAII container helper in test utils for spinning up ephemeral postgres containers (podman/docker)
  • Add two #[ignore] integration tests that start a real postgres container to verify end-to-end persistence and passthrough behavior
  • Document PostgreSQL backend options as commented-out alternatives in the existing response-store.yaml example config
  • Add config validation: postgres URL scheme check, SSL field rejection for sqlite backend, and 7 new unit tests

Test plan

  • cargo test -p praxis-proxy-filter --lib -- store::tests — all unit tests pass (including 7 new postgres config tests)
  • cargo test -p praxis-tests-integration --test suite -- openai_response_store_postgres --ignored — 2 postgres integration tests pass
  • cargo test -p praxis-tests-integration --test suite -- openai_response_store — 7 SQLite integration tests pass, 2 postgres ignored
  • cargo test -p praxis-tests-schema — 159 schema tests pass
  • make lint — clippy, fmt, lint-deps, lint-example-tests all clean

🤖 Generated with Claude Code

@leseb leseb requested review from a team June 15, 2026 15:20
@praxis-bot

Collaborator

PR too large: 1609 lines added (limit: 750, excludes Cargo files, tests, docs, examples, and benchmarks). Please split into smaller PRs. Add skip/pr-hygiene label to override.

@praxis-bot praxis-bot left a comment

Review: feat(store): add PostgreSQL backend to openai_response_store filter

Well-structured PR that adds PostgreSQL as a storage backend with thorough SSRF protection, SSL configuration validation, and a Postgres retry-on-failure init strategy. The SSRF validation is notably thorough, covering legacy IPv4 encodings, IPv4-mapped IPv6, DNS rebinding TOCTOU, and Unix socket path traversal. The test coverage for the config validation layer is extensive. Code quality is high overall.

Findings

| Severity | Area | Finding |
| --- | --- | --- |
| Medium | Security | Cloud metadata IP lacks explicit test; CGNAT/shared range (100.64.0.0/10) not blocked |
| Medium | Correctness | validate_config called redundantly on every Postgres build_store attempt |
| Small | Correctness | get_or_try_init retry docstring could clarify that retry stops after first success |
| Small | Security | is_postgres_localhost_name only checks localhost, not other loopback aliases |
| Small | Robustness | Integration test config patching uses string replace, coupled to exact format |
| Medium | Correctness | Duplicate table identifier validation: both Postgres-specific and generic checks run |
| Nit | Style | PostgresGuard hardcodes postgres:17-alpine; consider env var override |
| Nit | Positive | Dump redaction tests for both top-level and branch-chain database_url are thorough |

See inline comments for details.

IpAddr::V4(v4) => v4.is_loopback() || v4.is_private() || v4.is_link_local() || v4.is_unspecified(),
IpAddr::V6(v6) => v6.is_loopback() || v6.is_unique_local() || v6.is_unicast_link_local() || v6.is_unspecified(),
}
}

[Medium] The SSRF check covers loopback, private, link-local, and unspecified, which is solid. The 169.254.169.254 cloud metadata endpoint falls within 169.254.0.0/16 and is covered by v4.is_link_local(), so the most critical SSRF target is blocked. However:

  1. No unit test is dedicated to asserting that 169.254.169.254 is rejected. The existing postgres_config_rejects_link_local_database_host test does use 169.254.169.254, but adding the address to the legacy IPv4 test list as well would lock in this guarantee and document the intent.

  2. The CGNAT/shared range 100.64.0.0/10 is not covered. While less critical than the metadata endpoint, it is used in cloud environments (AWS VPC peering, Tailscale, carrier NAT) and could be an SSRF vector. Consider whether it should be blocked in strict mode.

  3. For IPv6, verify that v6.is_unicast_link_local() is stable on Rust 1.94+. It was stabilized in Rust 1.84 (alongside is_unique_local()), so it should be fine, but it is worth confirming that it compiles on the MSRV.
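
The three points above can be sketched with the standard library alone. This is a minimal illustration, not the PR's implementation: `is_blocked_sketch`, `is_blocked_v4`, `is_blocked_v6`, and `is_cgnat` are hypothetical names, and the IPv6 ranges are checked by hand on the raw segments so that the sketch does not depend on the toolchain version.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Hypothetical helper: true for the CGNAT/shared range 100.64.0.0/10.
fn is_cgnat(v4: Ipv4Addr) -> bool {
    let [a, b, _, _] = v4.octets();
    // A /10 prefix fixes the first octet and the top two bits of the second.
    a == 100 && (b & 0b1100_0000) == 64
}

/// Same IPv4 checks as the reviewed snippet, plus the CGNAT range.
fn is_blocked_v4(v4: Ipv4Addr) -> bool {
    v4.is_loopback() || v4.is_private() || v4.is_link_local() || v4.is_unspecified() || is_cgnat(v4)
}

/// IPv6 checks written against the first 16-bit segment.
fn is_blocked_v6(v6: Ipv6Addr) -> bool {
    let first = v6.segments()[0];
    v6.is_loopback()
        || v6.is_unspecified()
        || (first & 0xfe00) == 0xfc00 // unique local, fc00::/7
        || (first & 0xffc0) == 0xfe80 // unicast link-local, fe80::/10
}

/// Hypothetical combined predicate; IPv4-mapped IPv6 is normalized first.
fn is_blocked_sketch(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        IpAddr::V6(v6) => match v6.to_ipv4_mapped() {
            Some(v4) => is_blocked_v4(v4),
            None => is_blocked_v6(v6),
        },
    }
}
```

Under these assumptions, `169.254.169.254` is rejected through the link-local check and `100.64.0.1` through the added CGNAT check, while a public address such as `203.0.113.10` passes.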

validate_config(&self.config).map_err(|e| {
StoreError::Unavailable(format!("postgres config validation failed before connect: {e}"))
})?;
let ssl_root_cert = self.config.ssl_root_cert.as_ref().map(|s| {

[Medium] build_store calls validate_config(&self.config) on every Postgres init attempt. The comment explains this is for DNS re-validation before SQLx connects, but validate_config performs ~15 checks beyond host validation: URL scheme, table identifiers, SSL config, and SQLite/Postgres field cross-checks — all of which are immutable after construction.

Consider extracting a narrower revalidate_postgres_host function that only re-checks the DNS/IP host portions. This would:

  • Make the defense-in-depth intent clearer
  • Avoid redundant work on every retry
  • Prevent confusion when a future maintainer adds a failing check to validate_config and wonders why it fires at connection time
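
A narrower re-check along these lines might look like the following std-only sketch. The function name, its signature, and the simplified `is_internal` predicate are assumptions for illustration; the source does not show the real helper or how the host is extracted from the URL.

```rust
use std::net::{IpAddr, SocketAddr, ToSocketAddrs};

/// Hypothetical narrow re-check: resolve the host and reject internal
/// addresses. Returning the resolved addresses lets a caller connect to
/// exactly what was validated instead of resolving a second time.
fn revalidate_host_sketch(host: &str, port: u16, allow_private: bool) -> Result<Vec<SocketAddr>, String> {
    let addrs: Vec<SocketAddr> = (host, port)
        .to_socket_addrs()
        .map_err(|e| format!("cannot resolve {host}: {e}"))?
        .collect();
    if addrs.is_empty() {
        return Err(format!("{host} resolved to no addresses"));
    }
    if !allow_private {
        // Reject if any resolved address is internal, not just the first.
        if let Some(bad) = addrs.iter().find(|a| is_internal(a.ip())) {
            return Err(format!("{host} resolves to internal address {}", bad.ip()));
        }
    }
    Ok(addrs)
}

/// Simplified stand-in for the full SSRF predicate.
fn is_internal(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => v4.is_loopback() || v4.is_private() || v4.is_link_local() || v4.is_unspecified(),
        IpAddr::V6(v6) => v6.is_loopback() || v6.is_unspecified(),
    }
}
```

Keeping this separate from the construction-time validation means the immutable checks (scheme, table identifiers, SSL fields) run once, and only the part that can change between calls, the DNS answer, is repeated.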

}

/// Build the store and log successful initialization.
async fn build_logged_store(&self) -> Result<Arc<dyn ResponseStore>, StoreError> {

[Small] The get_or_try_init semantics here are correct: on Err, the OnceCell remains unset so the next request retries. On Ok, it is set permanently. The docstring says "retrying transient Postgres failures" which is accurate, but could be misread as "retries on every request even after success." Consider adding a note like "once initialization succeeds, the store is cached permanently (the connection pool handles reconnection internally)" to prevent future confusion.
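
The semantics described here can be illustrated with a std-only stand-in. `RetryCell` below is a hypothetical type built on a `Mutex`, not the cell type used in the PR; it only mirrors the behavior the comment describes: an `Err` leaves the cell empty, an `Ok` is cached for good.

```rust
use std::sync::{Arc, Mutex};

/// Std-only stand-in for a cell with `get_or_try_init` semantics.
struct RetryCell<T> {
    slot: Mutex<Option<Arc<T>>>,
}

impl<T> RetryCell<T> {
    fn new() -> Self {
        Self { slot: Mutex::new(None) }
    }

    /// Run `init` only while the cell is empty. Cache the value on `Ok`;
    /// leave the cell empty on `Err` so a later call can retry.
    fn get_or_try_init<E>(&self, init: impl FnOnce() -> Result<T, E>) -> Result<Arc<T>, E> {
        let mut slot = self.slot.lock().unwrap();
        if let Some(value) = slot.as_ref() {
            return Ok(Arc::clone(value));
        }
        // `?` returns early on failure, before anything is stored.
        let value = Arc::new(init()?);
        *slot = Some(Arc::clone(&value));
        Ok(value)
    }

    fn is_set(&self) -> bool {
        self.slot.lock().unwrap().is_some()
    }
}
```

In this sketch the lock is held while `init` runs, so only one initializer executes at a time; once a call succeeds, later calls never invoke their initializer.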


u32::from_str_radix(digits, radix).ok()
}

[Small] is_postgres_localhost_name only checks the literal name "localhost" (case-insensitive, with optional trailing dot). Some systems also resolve localhost.localdomain, ip6-localhost, or ip6-loopback to loopback addresses. These are uncommon in database URLs but could theoretically bypass the check. The IP-based checks (is_loopback()) are the primary defense, but documenting that this function is just an early-exit optimization (not the security boundary) would help future readers.
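
For illustration, a name check that also covers the aliases mentioned above could look like this. `is_localhost_name_sketch` and its alias list are hypothetical; the doc comment carries the note the review asks for, namely that the function is an early exit and not the security boundary.

```rust
/// Hypothetical name-based early exit. Not a security boundary:
/// the IP-based loopback checks remain the primary defense.
fn is_localhost_name_sketch(host: &str) -> bool {
    // Accept one optional trailing dot (the fully qualified form).
    let name = host.strip_suffix('.').unwrap_or(host);
    const ALIASES: [&str; 4] = ["localhost", "localhost.localdomain", "ip6-localhost", "ip6-loopback"];
    ALIASES.iter().any(|alias| name.eq_ignore_ascii_case(alias))
}
```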

"database_url: \"sqlite://responses.db?mode=rwc\"",
&format!("database_url: \"{}\"", pg.url()),
)
.replace(

[Small] The config patching uses .replace("backend: sqlite", "backend: postgres") on the raw YAML string. This is fragile — if the example config ever adds a comment containing "backend: sqlite", changes indentation, or reorders fields, the replacement could produce invalid YAML or replace the wrong occurrence.

The Config::from_yaml call below does catch parse failures, so this won't silently produce a broken config. But the coupling to exact formatting is worth noting. Consider adding a brief comment documenting the assumed format.

Comment thread server/src/dump.rs
);
}

#[test]

[Nit] Good addition. The database_url redaction test and the branch-chain variant below are both thorough. This ensures credentials embedded in Postgres connection URLs are never exposed in --dump output, even when the store filter is nested inside a branch chain.
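
As a rough illustration of what such redaction has to achieve, the sketch below masks the password component of a connection URL. The source does not show how the dump code redacts `database_url` (it may replace the whole value), so `redact_database_url` is a hypothetical helper, and it deliberately ignores credentials passed as query parameters.

```rust
/// Hypothetical redaction: mask the password in `scheme://user:pass@host/...`.
/// URLs without a `user:pass@` part are returned unchanged.
fn redact_database_url(url: &str) -> String {
    let Some(scheme_end) = url.find("://") else { return url.to_string() };
    let authority_start = scheme_end + 3;
    let rest = &url[authority_start..];
    // The authority ends at the first '/', '?' or '#'.
    let authority_len = rest
        .find(|c: char| matches!(c, '/' | '?' | '#'))
        .unwrap_or(rest.len());
    let authority = &rest[..authority_len];
    // Userinfo is everything before the last '@' inside the authority.
    let Some(at) = authority.rfind('@') else { return url.to_string() };
    let userinfo = &authority[..at];
    let Some(colon) = userinfo.find(':') else { return url.to_string() };
    format!(
        "{}{}:[REDACTED]{}",
        &url[..authority_start],
        &userinfo[..colon],
        &rest[at..]
    )
}
```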

@shaneutt shaneutt moved this to Review in AI Gateway Jun 15, 2026
@shaneutt shaneutt added this to the v0.4.0 milestone Jun 15, 2026
Ok(store)
}

/// Initialize a SQLite store once, caching failed init permanently.

Suggested change
- /// Initialize a SQLite store once, caching failed init permanently.
+ /// Initialize a DB once, caching failed init permanently.

//! Functional tests for the `openai_response_store` example config
//! with PostgreSQL backend.

use std::collections::HashMap;

it would be good to add postgres to the CI so the integration tests could run

@franciscojavierarceo franciscojavierarceo left a comment

some small nits that can be handled in a follow-up PR

@praxis-bot praxis-bot left a comment

PR Review

Summary: Adds PostgreSQL backend support to the openai_response_store filter, with SSRF-safe URL validation, SSL/TLS config, retry-on-failure semantics, container-based integration tests, and config dump redaction.

Overall: Solid, security-conscious implementation. The SSRF validation is thorough (legacy IPv4, DNS rebinding TOCTOU, mapped-v4 normalization, socket path traversal). The init-retry split between SQLite (permanent) and Postgres (transient) is well-designed. Test coverage is extensive with 50+ new test cases covering config validation, SSRF vectors, SSL modes, and behavioral scenarios. A few items worth addressing below.

| Severity | Count |
| --- | --- |
| Critical | 0 |
| Large | 1 |
| Medium | 2 |
| Small | 2 |
| Nit | 2 |

Findings without inline placement

  • [Small] filter/src/builtins/http/ai/openai/responses/store/config.rs -- StorageBackend derives Clone but not Default. Since backend is a required config field this is fine functionally, but all other backend-like enums in the codebase derive Default for the most common variant. Consider whether #[serde(default)] on backend with Sqlite as default would reduce config boilerplate for the common case, or remove Clone if it is not needed (the struct itself does not derive Clone since SecretString is not Clone).
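
The `Default` option mentioned here can be shown in isolation. `StorageBackendSketch` is a stand-in for the PR's enum with serde attributes left out so the sketch stays std-only; whether SQLite should be the default is the open design question, not something the source settles.

```rust
/// Stand-in enum: `#[default]` marks the variant returned by `Default::default()`.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
enum StorageBackendSketch {
    #[default]
    Sqlite,
    Postgres,
}
```

With a derive like this, a field-level `#[serde(default)]` on `backend` would fall back to the marked variant when the key is absent from the config.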

StorageBackend::Postgres => {
validate_postgres_database_url(database_url, cfg.allow_private_database_url)?;
validate_postgres_table_identifiers(&cfg.responses_table, &cfg.conversations_table)
.map_err(|e| format!("openai_response_store: invalid postgres table identifier: {e}"))?;

[Medium] The Postgres branch calls validate_postgres_table_identifiers (which enforces the PostgreSQL 63-byte limit) and then validate_table_identifier runs again unconditionally on lines 106-109. This means Postgres table names are validated twice by validate_table_identifier -- once inside validate_postgres_table_identifiers (via validate_table_names -> validate_identifier) and again here. The duplicate work is harmless but obscures the flow. Consider either (a) moving the common validate_table_identifier calls above the match (they apply to both backends) and only calling validate_postgres_table_identifiers for the Postgres-specific length check, or (b) skipping the common calls for Postgres since validate_postgres_table_identifiers already covers them.
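
Option (a) can be sketched as follows. All names are hypothetical, and the character rule in `validate_identifier` is an assumption, since the source does not show what the real validator accepts; only the 63-byte PostgreSQL limit comes from the comment above.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Backend {
    Sqlite,
    Postgres,
}

const POSTGRES_MAX_IDENTIFIER_BYTES: usize = 63;

/// Assumed common rule: non-empty, ASCII letters, digits, or underscore,
/// and not starting with a digit.
fn validate_identifier(name: &str) -> Result<(), String> {
    let mut chars = name.chars();
    let first_ok = matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_');
    if !first_ok || !chars.all(|c| c.is_ascii_alphanumeric() || c == '_') {
        return Err(format!("invalid table identifier: {name:?}"));
    }
    Ok(())
}

fn validate_tables(backend: Backend, tables: &[&str]) -> Result<(), String> {
    // Common checks run exactly once, above the backend match.
    for name in tables {
        validate_identifier(name)?;
    }
    match backend {
        Backend::Sqlite => Ok(()),
        // Only the Postgres-specific length limit lives in this arm.
        Backend::Postgres => match tables.iter().find(|n| n.len() > POSTGRES_MAX_IDENTIFIER_BYTES) {
            Some(name) => Err(format!("postgres identifier exceeds 63 bytes: {name:?}")),
            None => Ok(()),
        },
    }
}
```

Each identifier passes through the shared rule once, and the backend arm only adds what is specific to that backend.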

backend = ?self.config.backend,
error = %e,
"response store initialization failed (permanent)"
);

[Medium] get_or_init_store uses get_or_try_init for Postgres, which means a failed init does not populate the OnceCell, allowing retries. However, there is no backoff or rate-limiting on these retries. Every request that arrives while the database is down will attempt a full connection + schema migration. Under load, this could create a thundering herd of connection attempts against an already-struggling database. Consider adding a simple time-based gate (e.g., skip retries for N seconds after the last failure) or limiting concurrent init attempts.
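
A time-based gate of the kind suggested could be as small as the sketch below. `RetryGate` is a hypothetical type, and the clock is passed in as a parameter purely to keep the example deterministic.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical time-based gate: after a failed init, skip further
/// attempts until `cooldown` has elapsed.
struct RetryGate {
    cooldown: Duration,
    last_failure: Mutex<Option<Instant>>,
}

impl RetryGate {
    fn new(cooldown: Duration) -> Self {
        Self { cooldown, last_failure: Mutex::new(None) }
    }

    /// True if an init attempt may run at time `now`.
    fn may_attempt(&self, now: Instant) -> bool {
        match *self.last_failure.lock().unwrap() {
            Some(failed_at) => now.saturating_duration_since(failed_at) >= self.cooldown,
            None => true,
        }
    }

    /// Record that an attempt failed at time `now`.
    fn record_failure(&self, now: Instant) {
        *self.last_failure.lock().unwrap() = Some(now);
    }
}
```

A caller would check `may_attempt` before trying to initialize and call `record_failure` when the attempt errors. Requests arriving before the first failure is recorded can still race, so this gate bounds repeated attempts over time but does not by itself limit concurrent ones.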

// PostgresGuard
// -----------------------------------------------------------------------------

/// RAII guard that manages a `PostgreSQL` container lifecycle.

[Nit] READY_POLL_INTERVAL is 100ms, which means up to 300 polls over the 30s timeout. A slightly longer interval like 250ms would reduce CPU overhead during container startup without meaningfully impacting test latency (PostgreSQL typically takes 2-5 seconds to start).
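
The arithmetic behind this nit, together with a generic readiness loop, is sketched below. `wait_until_ready` and `max_polls` are hypothetical helpers and do not reproduce how `PostgresGuard` probes the container.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical readiness loop: call `probe` every `interval` until it
/// returns true or `timeout` elapses. Returns the number of probes on success.
fn wait_until_ready(mut probe: impl FnMut() -> bool, interval: Duration, timeout: Duration) -> Option<u32> {
    let deadline = Instant::now() + timeout;
    let mut polls = 0;
    loop {
        polls += 1;
        if probe() {
            return Some(polls);
        }
        if Instant::now() >= deadline {
            return None;
        }
        sleep(interval);
    }
}

/// Approximate number of polls that fit into `timeout` at a given `interval`.
fn max_polls(interval: Duration, timeout: Duration) -> u128 {
    // `.max(1)` guards against division by zero for sub-millisecond intervals.
    timeout.as_millis() / interval.as_millis().max(1)
}
```

With a 30 s timeout, a 100 ms interval allows about 300 polls and a 250 ms interval about 120.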

.replace("backend: sqlite", "backend: postgres")
.replace(
"database_url: \"sqlite://responses.db?mode=rwc\"",
&format!("database_url: \"{}\"", pg.url()),

[Small] The config patching here uses string replacements (.replace("backend: sqlite", "backend: postgres") etc.), which is fragile: it depends on the exact whitespace and formatting of the example config. If the example config is reformatted or reordered, a replacement that no longer matches silently becomes a no-op, and the test runs against a config that was never switched to Postgres instead of failing at the replacement step. Consider asserting that each replacement actually changed the string, or using a YAML parse-modify-serialize approach.
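
One way to make the replacement step fail loudly is a small helper like the one below. `replace_exactly_once` is a hypothetical test utility, not something from the PR; it rejects both a missing pattern and an ambiguous one.

```rust
/// Hypothetical test helper: replace `from` with `to`, failing unless
/// `from` occurs exactly once in `input`.
fn replace_exactly_once(input: &str, from: &str, to: &str) -> Result<String, String> {
    match input.matches(from).count() {
        1 => Ok(input.replacen(from, to, 1)),
        0 => Err(format!("pattern not found: {from:?}")),
        n => Err(format!("pattern {from:?} matched {n} times, expected 1")),
    }
}
```

A test would then call `replace_exactly_once(...).expect("example config format changed")` for each patch, so a reformatted example config breaks the test at the patching step rather than later.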

let yaml: serde_yaml::Value = serde_yaml::from_str(&format!(
r#"
backend: postgres
database_url: "postgres://user:pass@203.0.113.10:5432/praxis?host={}"

[Nit] The postgres_store_init_failure_is_not_cached test constructs a URL with host=<socket_dir> targeting a nonexistent Unix socket path. The authority host 203.0.113.10 (RFC 5737 TEST-NET-3) is a good choice. However, the test relies on the socket connection failing because the directory does not exist. Consider adding a brief message to the final two assertions explaining why the OnceCell should remain unset -- e.g., "failed postgres initialization via missing socket should leave OnceCell unset for retry" -- so the test intent is clear if the failure mode changes.

Wire the PostgresResponseStore into the response store filter config
and initialization. Add PostgresGuard RAII container helper for
integration tests, and two #[ignore] integration tests that spin up a
real postgres container to verify end-to-end persistence and
passthrough behavior.

Signed-off-by: Sébastien Han <seb@redhat.com>
@leseb leseb force-pushed the leseb/postgres-integration-tests branch from 1b1ebf4 to 1e75790 Compare June 16, 2026 06:56
- Extract revalidate_postgres_host() for focused SSRF re-check at
  connect time instead of re-running full validate_config on every
  Postgres retry
- Hoist common table identifier validation above the backend match
  to eliminate duplicate validation for Postgres
- Fix init_permanent_store docstring to say "store" not "SQLite store"
- Add postgres integration tests to CI workflow

Signed-off-by: Sébastien Han <seb@redhat.com>
@praxis-proxy praxis-proxy deleted a comment from praxis-bot Jun 16, 2026
@shaneutt shaneutt merged commit b82f419 into praxis-proxy:main Jun 16, 2026
21 of 22 checks passed
@github-project-automation github-project-automation Bot moved this from Review to Done in AI Gateway Jun 16, 2026
4 participants