feat(store): add PostgreSQL backend to openai_response_store filter#605
Conversation
|
PR too large: 1609 lines added (limit: 750, excludes Cargo files, tests, docs, examples, and benchmarks). Please split into smaller PRs. Add |
praxis-bot
left a comment
There was a problem hiding this comment.
Review: feat(store): add PostgreSQL backend to openai_response_store filter
Well-structured PR that adds PostgreSQL as a storage backend with thorough SSRF protection, SSL configuration validation, and a Postgres retry-on-failure init strategy. The SSRF validation is notably thorough, covering legacy IPv4 encodings, IPv4-mapped IPv6, DNS rebinding TOCTOU, and Unix socket path traversal. The test coverage for the config validation layer is extensive. Code quality is high overall.
Findings
| Severity | Area | Finding |
|---|---|---|
| Medium | Security | Cloud metadata IP lacks explicit test; CGNAT/shared range (100.64.0.0/10) not blocked |
| Medium | Correctness | validate_config called redundantly on every Postgres build_store attempt |
| Small | Correctness | get_or_try_init retry docstring could clarify that retry stops after first success |
| Small | Security | is_postgres_localhost_name only checks localhost, not other loopback aliases |
| Small | Robustness | Integration test config patching uses string replace, coupled to exact format |
| Medium | Correctness | Duplicate table identifier validation: both Postgres-specific and generic checks run |
| Nit | Style | PostgresGuard hardcodes postgres:17-alpine; consider env var override |
| Nit | Positive | Dump redaction tests for both top-level and branch-chain database_url are thorough |
See inline comments for details.
| IpAddr::V4(v4) => v4.is_loopback() || v4.is_private() || v4.is_link_local() || v4.is_unspecified(), | ||
| IpAddr::V6(v6) => v6.is_loopback() || v6.is_unique_local() || v6.is_unicast_link_local() || v6.is_unspecified(), | ||
| } | ||
| } |
There was a problem hiding this comment.
[Medium] The SSRF check covers loopback, private, link-local, and unspecified, which is solid. The 169.254.169.254 cloud metadata endpoint falls within 169.254.0.0/16 and is covered by v4.is_link_local(), so the most critical SSRF target is blocked. However:
-
There is no unit test explicitly asserting that
169.254.169.254is rejected. Adding one would lock in this guarantee and document the intent — the existingpostgres_config_rejects_link_local_database_hosttest uses169.254.169.254but it would be worth adding it to the legacy IPv4 test list too. -
The CGNAT/shared range
100.64.0.0/10is not covered. While less critical than the metadata endpoint, it is used in cloud environments (AWS VPC peering, Tailscale, carrier NAT) and could be an SSRF vector. Consider whether it should be blocked in strict mode. -
For IPv6,
v6.is_unicast_link_local()— verify this method is stable on Rust 1.94+. It was stabilized in 1.86, so it should be fine, but worth confirming it compiles on the MSRV.
| validate_config(&self.config).map_err(|e| { | ||
| StoreError::Unavailable(format!("postgres config validation failed before connect: {e}")) | ||
| })?; | ||
| let ssl_root_cert = self.config.ssl_root_cert.as_ref().map(|s| { |
There was a problem hiding this comment.
[Medium] build_store calls validate_config(&self.config) on every Postgres init attempt. The comment explains this is for DNS re-validation before SQLx connects, but validate_config performs ~15 checks beyond host validation: URL scheme, table identifiers, SSL config, and SQLite/Postgres field cross-checks — all of which are immutable after construction.
Consider extracting a narrower revalidate_postgres_host function that only re-checks the DNS/IP host portions. This would:
- Make the defense-in-depth intent clearer
- Avoid redundant work on every retry
- Prevent confusion when a future maintainer adds a failing check to
validate_configand wonders why it fires at connection time
| } | ||
|
|
||
| /// Build the store and log successful initialization. | ||
| async fn build_logged_store(&self) -> Result<Arc<dyn ResponseStore>, StoreError> { |
There was a problem hiding this comment.
[Small] The get_or_try_init semantics here are correct: on Err, the OnceCell remains unset so the next request retries. On Ok, it is set permanently. The docstring says "retrying transient Postgres failures" which is accurate, but could be misread as "retries on every request even after success." Consider adding a note like "once initialization succeeds, the store is cached permanently (the connection pool handles reconnection internally)" to prevent future confusion.
|
|
||
| u32::from_str_radix(digits, radix).ok() | ||
| } | ||
|
|
There was a problem hiding this comment.
[Small] is_postgres_localhost_name only checks the literal name "localhost" (case-insensitive, with optional trailing dot). Some systems also resolve localhost.localdomain, ip6-localhost, or ip6-loopback to loopback addresses. These are uncommon in database URLs but could theoretically bypass the check. The IP-based checks (is_loopback()) are the primary defense, but documenting that this function is just an early-exit optimization (not the security boundary) would help future readers.
| "database_url: \"sqlite://responses.db?mode=rwc\"", | ||
| &format!("database_url: \"{}\"", pg.url()), | ||
| ) | ||
| .replace( |
There was a problem hiding this comment.
[Small] The config patching uses .replace("backend: sqlite", "backend: postgres") on the raw YAML string. This is fragile — if the example config ever adds a comment containing "backend: sqlite", changes indentation, or reorders fields, the replacement could produce invalid YAML or replace the wrong occurrence.
The Config::from_yaml call below does catch parse failures, so this won't silently produce a broken config. But the coupling to exact formatting is worth noting. Consider adding a brief comment documenting the assumed format.
| ); | ||
| } | ||
|
|
||
| #[test] |
There was a problem hiding this comment.
[Nit] Good addition. The database_url redaction test and the branch-chain variant below are both thorough. This ensures credentials embedded in Postgres connection URLs are never exposed in --dump output, even when the store filter is nested inside a branch chain.
| Ok(store) | ||
| } | ||
|
|
||
| /// Initialize a SQLite store once, caching failed init permanently. |
There was a problem hiding this comment.
| /// Initialize a SQLite store once, caching failed init permanently. | |
| /// Initialize a DB once, caching failed init permanently. |
| //! Functional tests for the `openai_response_store` example config | ||
| //! with PostgreSQL backend. | ||
|
|
||
| use std::collections::HashMap; |
There was a problem hiding this comment.
it would be good to add postgres to the CI so the integration tests could run
franciscojavierarceo
left a comment
There was a problem hiding this comment.
some small nits than can be handled in a follow up PR
praxis-bot
left a comment
There was a problem hiding this comment.
PR Review
Summary: Adds PostgreSQL backend support to the openai_response_store filter, with SSRF-safe URL validation, SSL/TLS config, retry-on-failure semantics, container-based integration tests, and config dump redaction.
Overall: Solid, security-conscious implementation. The SSRF validation is thorough (legacy IPv4, DNS rebinding TOCTOU, mapped-v4 normalization, socket path traversal). The init-retry split between SQLite (permanent) and Postgres (transient) is well-designed. Test coverage is extensive with 50+ new test cases covering config validation, SSRF vectors, SSL modes, and behavioral scenarios. A few items worth addressing below.
| Severity | Count |
|---|---|
| Critical | 0 |
| Large | 1 |
| Medium | 2 |
| Small | 2 |
| Nit | 2 |
Findings without inline placement
- [Small]
filter/src/builtins/http/ai/openai/responses/store/config.rs--StorageBackendderivesClonebut notDefault. Sincebackendis a required config field this is fine functionally, but all other backend-like enums in the codebase deriveDefaultfor the most common variant. Consider whether#[serde(default)]onbackendwithSqliteas default would reduce config boilerplate for the common case, or removeCloneif it is not needed (the struct itself does not deriveClonesinceSecretStringis notClone).
| StorageBackend::Postgres => { | ||
| validate_postgres_database_url(database_url, cfg.allow_private_database_url)?; | ||
| validate_postgres_table_identifiers(&cfg.responses_table, &cfg.conversations_table) | ||
| .map_err(|e| format!("openai_response_store: invalid postgres table identifier: {e}"))?; |
There was a problem hiding this comment.
[Medium] The Postgres branch calls validate_postgres_table_identifiers (which enforces the PostgreSQL 63-byte limit) and then validate_table_identifier runs again unconditionally on lines 106-109. This means Postgres table names are validated twice by validate_table_identifier -- once inside validate_postgres_table_identifiers (via validate_table_names -> validate_identifier) and again here. The duplicate work is harmless but obscures the flow. Consider either (a) moving the common validate_table_identifier calls above the match (they apply to both backends) and only calling validate_postgres_table_identifiers for the Postgres-specific length check, or (b) skipping the common calls for Postgres since validate_postgres_table_identifiers already covers them.
| backend = ?self.config.backend, | ||
| error = %e, | ||
| "response store initialization failed (permanent)" | ||
| ); |
There was a problem hiding this comment.
[Medium] get_or_init_store uses get_or_try_init for Postgres, which means a failed init does not populate the OnceCell, allowing retries. However, there is no backoff or rate-limiting on these retries. Every request that arrives while the database is down will attempt a full connection + schema migration. Under load, this could create a thundering herd of connection attempts against an already-struggling database. Consider adding a simple time-based gate (e.g., skip retries for N seconds after the last failure) or limiting concurrent init attempts.
| // PostgresGuard | ||
| // ----------------------------------------------------------------------------- | ||
|
|
||
| /// RAII guard that manages a `PostgreSQL` container lifecycle. |
There was a problem hiding this comment.
[Nit] READY_POLL_INTERVAL is 100ms, which means up to 300 polls over the 30s timeout. A slightly longer interval like 250ms would reduce CPU overhead during container startup without meaningfully impacting test latency (PostgreSQL typically takes 2-5 seconds to start).
| .replace("backend: sqlite", "backend: postgres") | ||
| .replace( | ||
| "database_url: \"sqlite://responses.db?mode=rwc\"", | ||
| &format!("database_url: \"{}\"", pg.url()), |
There was a problem hiding this comment.
[Small] The config patching here uses string replacements (.replace("backend: sqlite", "backend: postgres") etc.) which is fragile -- it depends on the exact whitespace and formatting of the example config. If the example config is reformatted or reordered, these tests will silently produce invalid YAML rather than failing at the replacement step. Consider asserting that each replacement actually changed the string, or using a YAML parse-modify-serialize approach.
| let yaml: serde_yaml::Value = serde_yaml::from_str(&format!( | ||
| r#" | ||
| backend: postgres | ||
| database_url: "postgres://user:pass@203.0.113.10:5432/praxis?host={}" |
There was a problem hiding this comment.
[Nit] The postgres_store_init_failure_is_not_cached test constructs a URL with host=<socket_dir> targeting a nonexistent Unix socket path. The authority host 203.0.113.10 (RFC 5737 TEST-NET-3) is a good choice. However, the test relies on the socket connection failing because the directory does not exist. Consider adding a brief message to the final two assertions explaining why the OnceCell should remain unset -- e.g., "failed postgres initialization via missing socket should leave OnceCell unset for retry" -- so the test intent is clear if the failure mode changes.
Wire the PostgresResponseStore into the response store filter config and initialization. Add PostgresGuard RAII container helper for integration tests, and two #[ignore] integration tests that spin up a real postgres container to verify end-to-end persistence and passthrough behavior. Signed-off-by: Sébastien Han <seb@redhat.com>
1b1ebf4 to
1e75790
Compare
- Extract revalidate_postgres_host() for focused SSRF re-check at connect time instead of re-running full validate_config on every Postgres retry - Hoist common table identifier validation above the backend match to eliminate duplicate validation for Postgres - Fix init_permanent_store docstring to say "store" not "SQLite store" - Add postgres integration tests to CI workflow Signed-off-by: Sébastien Han <seb@redhat.com>
Summary
PostgresResponseStoreinto theopenai_response_storefilter config (StorageBackend::Postgres) and initialization, with SSL mode and root cert supportPostgresGuardRAII container helper in test utils for spinning up ephemeral postgres containers (podman/docker)#[ignore]integration tests that start a real postgres container to verify end-to-end persistence and passthrough behaviorresponse-store.yamlexample configTest plan
cargo test -p praxis-proxy-filter --lib -- store::tests— all unit tests pass (including 7 new postgres config tests)cargo test -p praxis-tests-integration --test suite -- openai_response_store_postgres --ignored— 2 postgres integration tests passcargo test -p praxis-tests-integration --test suite -- openai_response_store— 7 SQLite integration tests pass, 2 postgres ignoredcargo test -p praxis-tests-schema— 159 schema tests passmake lint— clippy, fmt, lint-deps, lint-example-tests all clean🤖 Generated with Claude Code