feat: cache autoconfig in store, init from store #593
LD-Sfalzon wants to merge 33 commits into launchdarkly:v8 from
Conversation
- Add initFromStoreFirst, cacheKey, and cacheEncryptionKey to AutoConfig config
- Cache an encrypted AutoConfig snapshot in Redis or DynamoDB; load it on startup when LaunchDarkly is unreachable so Relay can serve from the last known config
- Update the cache whenever the stream sends a full put
- Encryption key is optional: it defaults to the AutoConfig key (SHA-256 derived)
- Document the new options in configuration.md

Made-with: Cursor
…eletes correctly
- Add MessageReceiver.Seed() to record cached envs/filters without emitting actions
- Add StreamManager.SeedFromPutContent() so envReceiver/filterReceiver stay in sync with the cache
- Load from store returns PutContent; relay applies it to the handler, then seeds the stream before Start()
- Fixes: cached envs no longer skip updates (ActionUpdate), and stale envs are removed by Retain

Made-with: Cursor
- Close autoConfigCache in Relay.Close() to avoid leaking Redis/DynamoDB connections
- Validate DYNAMODB_TABLE when InitFromStoreFirst is used with DynamoDB (the cache uses the global table)
- Use strings.TrimSpace for the cache key to match validation (removes the custom trimCacheKey)

Made-with: Cursor
When REDIS_TLS=true but URL is redis://, set TLSConfig from config so the cache store uses TLS (matches bigsegments/store_redis.go behavior). Made-with: Cursor
Replace the single-blob cache design with per-item storage matching the patterns in go-server-sdk-dynamodb and go-server-sdk-redis-redigo. This avoids DynamoDB's 400KB item size limit for large customers. Key changes:
- Store interface changed from Get/Set []byte to GetAll/SetAll PutContent
- Redis: uses a hash (HGETALL/HSET) with MULTI/EXEC transactions
- DynamoDB: individual items per env/filter with Query + BatchWriteItem, plus a checkSizeLimit guard matching the SDK pattern
- Cached data now flows through the normal handlePut path via StreamManager.ApplyCachedPut, eliminating SeedFromPutContent, the race condition it caused, and the separate applyPutContentToHandler code path
- StreamManager owns the cache write directly, removing the PutContentReceiver interface, the cachingAutoConfigHandler wrapper, and the forwarding method on ProjectRouter
StreamManager now owns the full cache lifecycle: it reads from the cache on Start() before connecting the stream, and writes after each PUT.
- Cache interface (GetAll + SetAll + Close) defined in the autoconfig package
- noopStore implements the interface as a null object, returned by NewStore when no backing store is configured
- Eliminates nil checks: StreamManager always has a valid cache
- Removes ApplyCachedPut; cache read/apply is internal to Start()
- relay.go simplified: it just creates the cache and passes it through
Derive the AES-256 key via SHA-256 from whatever string the user provides, rather than requiring exactly 32 bytes or base64-encoded 32 bytes. Update docs to remove the length restriction.
The cacheKey is user-provided specifically to namespace cache entries. Use it as-is for the Redis hash key and DynamoDB partition key rather than prepending ld:autoconfig:. Suffixes can be added later if we need to store additional data types under the same key.
```go
		}
	}
	return nil
}
```
DynamoDB batch write silently drops unprocessed items
Medium Severity
The batchWrite method discards the BatchWriteItemOutput response (assigned to _), ignoring UnprocessedItems. DynamoDB's BatchWriteItem API can return partial successes under throttling or provisioned throughput limits. Unprocessed items are silently lost, potentially leaving the cache in an incomplete state. On next cold startup, missing environments or filters would not be served.
It isn't silently dropped; we issue a log statement. Continuing on failure is better in this case, since the cache serves solely as a remediation measure, and having some environments is better than no environments.
If Redis is temporarily down at startup, the cache should degrade gracefully rather than preventing the relay from starting. The ping was counterproductive to the resilience goal of the feature.
Log and continue instead of aborting if a single batch of 25 items fails. Partial cache data is better than none for resilience.
Add Upsert and Delete methods to the Cache interface so individual environment/filter changes from PATCH and DELETE stream events are persisted to the cache immediately. Previously only PUT events updated the cache, so changes between PUTs would be lost on restart. Also consolidates the env:/filter: key prefix constants into a shared cacheField helper used by both Redis and DynamoDB stores.
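A consolidated key-prefix helper of the kind described might look like this minimal sketch; the function and prefix names are illustrative, since the PR's exact constants aren't shown here:

```go
package main

import "fmt"

// cacheField builds the namespaced field (Redis hash field / DynamoDB
// sort key) for a cached item, e.g. "env:<id>" or "filter:<id>",
// so both store implementations agree on the layout.
func cacheField(kind, id string) string {
	return kind + ":" + id
}

func main() {
	fmt.Println(cacheField("env", "my-env"))       // env:my-env
	fmt.Println(cacheField("filter", "my-filter")) // filter:my-filter
}
```

A PATCH event would then call the store's Upsert with the appropriate field, and a DELETE would remove it, keeping the cache current between full PUTs.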
Setting AUTO_CONFIG_CACHE_KEY is sufficient to enable AutoConfig caching. The separate AUTO_CONFIG_INIT_FROM_STORE_FIRST boolean was redundant — if you set a cache key, you want caching.
Add a Persist flag to PutContent. Data from the live stream is marked Persist=true so handlePut writes it to cache. Data restored from cache on startup has Persist=false (the default), so it flows through handlePut without a redundant cache write.
Each cached environment/filter is now serialized inside a CachedItem wrapper with kind, modelVersion, and data fields. On read, items with an unrecognized model version are skipped with a warning, giving us a clean path to handle format migrations in the future.
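The versioned-wrapper idea can be sketched with `encoding/json`; field and function names here are illustrative, not necessarily the PR's exact ones:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

const currentModelVersion = 1

// CachedItem wraps each serialized environment/filter so the on-disk
// format can evolve without breaking old readers.
type CachedItem struct {
	Kind         string          `json:"kind"`
	ModelVersion int             `json:"modelVersion"`
	Data         json.RawMessage `json:"data"`
}

// decodeItem skips entries written in an unrecognized format, with a
// warning, rather than failing the whole cache load.
func decodeItem(raw []byte) (CachedItem, bool) {
	var item CachedItem
	if err := json.Unmarshal(raw, &item); err != nil {
		log.Printf("skipping undecodable cache item: %v", err)
		return CachedItem{}, false
	}
	if item.ModelVersion != currentModelVersion {
		log.Printf("skipping cache item with unrecognized model version %d", item.ModelVersion)
		return CachedItem{}, false
	}
	return item, true
}

func main() {
	good := []byte(`{"kind":"env","modelVersion":1,"data":{"id":"abc"}}`)
	stale := []byte(`{"kind":"env","modelVersion":9,"data":{}}`)
	_, ok1 := decodeItem(good)
	_, ok2 := decodeItem(stale)
	fmt.Println(ok1, ok2) // true false
}
```

Bumping `currentModelVersion` on a future format change then makes old relays ignore new-format items instead of misparsing them.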
Each cache store now holds an internal context that is cancelled on Close(), terminating any in-flight operations. The caller's context is combined with the store's context using context.AfterFunc so that either cancellation terminates the operation. The mergeContext helper is shared between Redis and DynamoDB implementations.
```diff
@@ -0,0 +1,57 @@
+package autoconfigcache
```
@launchdarkly/team-product-security mind taking a look at this new addition and let me know if it seems acceptable?
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit b450cc7.


Requirements
Related issues
If a relay proxy is unable to connect to the LD streaming service to retrieve auto config, the persistent cache does not provide resiliency, and it cannot be used for auto scaling during an LD incident.
Describe the solution you've provided
Auto config can now be saved to the persistent cache, encrypted by default with the auto config key (configurable).
I have tested this with Valkey only.
Describe alternatives you've considered
Provide a clear and concise description of any alternative solutions or features you've considered.
Additional context
Add any other context about the pull request here.
Note
Medium Risk
Adds a new encrypted persistence layer and startup race between cache and SSE stream, affecting initialization and consistency when AutoConfig is enabled. Risk centers on cache correctness, encryption/key management, and DynamoDB/Redis write behaviors under partial failures.
Overview
Adds persistent, encrypted caching for AutoConfig data so Relay can start serving environments/filters from Redis/Valkey or DynamoDB when `AUTO_CONFIG_CACHE_KEY` is set.

On startup, `StreamManager` now races an async cache read against establishing the SSE stream; cached content is applied immediately if it wins, and the live stream `put` becomes authoritative and cancels any in-flight cache read. Subsequent `put` events write a full snapshot to the cache, while `patch`/`delete` events incrementally upsert/delete cached items.

Introduces `internal/autoconfigcache` with Redis and DynamoDB store implementations and AES-GCM encryption (key derived from `AUTO_CONFIG_CACHE_ENCRYPTION_KEY` or the AutoConfig key). Configuration/docs add the new cache settings plus validation to require Redis/DynamoDB (and a DynamoDB table when applicable).

Reviewed by Cursor Bugbot for commit aa72bf4.