
feat: cache autoconfig in store, init from store #593

Open
LD-Sfalzon wants to merge 33 commits into launchdarkly:v8 from LD-Sfalzon:feature/autoconfig-cache-init-from-store

Conversation

Contributor

@LD-Sfalzon LD-Sfalzon commented Mar 16, 2026

Requirements

  • I have added test coverage for new or changed functionality
  • [x] I have followed the repository's pull request submission guidelines
  • [x] I have validated my changes against all supported platform versions

Related issues
If a relay proxy is unable to connect to the LaunchDarkly streaming service to retrieve auto config, the persistent cache does not provide resiliency, and it cannot be used for auto scaling during a LaunchDarkly incident.

Describe the solution you've provided
Auto config can now be saved to the persistent cache, encrypted by default with the auto config key (configurable).
I have tested this with Valkey only.




Note

Medium Risk
Adds a new encrypted persistence layer and startup race between cache and SSE stream, affecting initialization and consistency when AutoConfig is enabled. Risk centers on cache correctness, encryption/key management, and DynamoDB/Redis write behaviors under partial failures.

Overview
Adds persistent, encrypted caching for AutoConfig data so Relay can start serving environments/filters from Redis/Valkey or DynamoDB when AUTO_CONFIG_CACHE_KEY is set.

On startup, StreamManager now races an async cache read against establishing the SSE stream; cached content is applied immediately if it wins, and live stream put becomes authoritative and cancels any in-flight cache read. Subsequent put events write a full snapshot to the cache, while patch/delete events incrementally upsert/delete cached items.

Introduces internal/autoconfigcache with Redis and DynamoDB store implementations and AES-GCM encryption (key derived from AUTO_CONFIG_CACHE_ENCRYPTION_KEY or the AutoConfig key). Configuration/docs add the new cache settings plus validation to require Redis/DynamoDB (and a DynamoDB table when applicable).
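The encryption scheme described above (AES-GCM with a key derived via SHA-256 from an arbitrary-length secret) can be sketched with the Go standard library. This is an illustrative sketch of the general technique, not the PR's code; function names are assumptions.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// deriveKey turns an arbitrary-length secret (e.g. the AutoConfig key) into a
// 32-byte AES-256 key, matching the SHA-256 derivation described above.
func deriveKey(secret string) []byte {
	sum := sha256.Sum256([]byte(secret))
	return sum[:]
}

func encrypt(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// prepend the nonce so decrypt can recover it from the ciphertext
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

func decrypt(key, ciphertext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(ciphertext) < n {
		return nil, fmt.Errorf("ciphertext too short")
	}
	return gcm.Open(nil, ciphertext[:n], ciphertext[n:], nil)
}

func main() {
	key := deriveKey("sdk-autoconfig-key") // hypothetical secret
	ct, _ := encrypt(key, []byte(`{"environments":{}}`))
	pt, _ := decrypt(key, ct)
	fmt.Println(string(pt)) // prints {"environments":{}}
}
```

GCM provides authentication as well as confidentiality, so a tampered or wrongly-keyed cache entry fails `Open` rather than decoding to garbage.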

Reviewed by Cursor Bugbot for commit aa72bf4.

LD-Sfalzon and others added 4 commits March 5, 2026 13:52
- Add initFromStoreFirst, cacheKey, and cacheEncryptionKey to AutoConfig config
- Cache encrypted AutoConfig snapshot in Redis or DynamoDB; load on startup
  when LaunchDarkly is unreachable so Relay can serve from last known config
- Update cache whenever stream sends a full put
- Encryption key optional: defaults to AutoConfig key (SHA-256 derived)
- Document new options in configuration.md

Made-with: Cursor
@LD-Sfalzon LD-Sfalzon requested a review from a team as a code owner March 16, 2026 01:50
…eletes correctly

- Add MessageReceiver.Seed() to record cached envs/filters without emitting actions
- Add StreamManager.SeedFromPutContent() so envReceiver/filterReceiver stay in sync with cache
- Load from store returns PutContent; relay applies to handler then seeds stream before Start()
- Fixes: cached envs no longer skip updates (ActionUpdate) and stale envs are removed by Retain

Made-with: Cursor
- Close autoConfigCache in Relay.Close() to avoid leaking Redis/DynamoDB connections
- Validate DYNAMODB_TABLE when InitFromStoreFirst with DynamoDB (cache uses global table)
- Use strings.TrimSpace for cache key to match validation (remove custom trimCacheKey)

Made-with: Cursor
When REDIS_TLS=true but URL is redis://, set TLSConfig from config so the
cache store uses TLS (matches bigsegments/store_redis.go behavior).

Made-with: Cursor
@LD-Sfalzon LD-Sfalzon changed the title Feature/autoconfig cache init from store feat: cache autoconfig in store, init from store Mar 17, 2026
Replace the single-blob cache design with per-item storage matching the
patterns in go-server-sdk-dynamodb and go-server-sdk-redis-redigo. This
avoids DynamoDB's 400KB item size limit for large customers.

Key changes:
- Store interface changed from Get/Set []byte to GetAll/SetAll PutContent
- Redis: uses a Hash (HGETALL/HSET) with MULTI/EXEC transactions
- DynamoDB: individual items per env/filter with Query + BatchWriteItem,
  plus a checkSizeLimit guard matching the SDK pattern
- Cached data now flows through the normal handlePut path via
  StreamManager.ApplyCachedPut, eliminating SeedFromPutContent, the
  race condition it caused, and the separate applyPutContentToHandler
  code path
- StreamManager owns the cache write directly, removing the
  PutContentReceiver interface, cachingAutoConfigHandler wrapper, and
  the forwarding method on ProjectRouter
StreamManager now owns the full cache lifecycle: it reads from the cache
on Start() before connecting the stream, and writes after each PUT.

- Cache interface (GetAll + SetAll + Close) defined in autoconfig package
- noopStore implements the interface as a null object, returned by
  NewStore when no backing store is configured
- Eliminates nil checks — StreamManager always has a valid cache
- Removes ApplyCachedPut — cache read/apply is internal to Start()
- relay.go simplified: just creates the cache and passes it through
Derive the AES-256 key via SHA-256 from whatever string the user
provides, rather than requiring exactly 32 bytes or base64-encoded
32 bytes. Update docs to remove the length restriction.
The cacheKey is user-provided specifically to namespace cache entries.
Use it as-is for the Redis hash key and DynamoDB partition key rather
than prepending ld:autoconfig:. Suffixes can be added later if we
need to store additional data types under the same key.

DynamoDB batch write silently drops unprocessed items

Medium Severity

The batchWrite method discards the BatchWriteItemOutput response (assigned to _), ignoring UnprocessedItems. DynamoDB's BatchWriteItem API can return partial successes under throttling or provisioned throughput limits. Unprocessed items are silently lost, potentially leaving the cache in an incomplete state. On next cold startup, missing environments or filters would not be served.
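One common way to address this (whether or not the maintainers choose it; they note below that logging and continuing is acceptable here) is a bounded retry loop over `UnprocessedItems`. The sketch below abstracts the DynamoDB call behind a hypothetical `writeBatch` function type so it is self-contained; the real AWS SDK call would be `BatchWriteItem` returning unprocessed items in its output.

```go
package main

import "fmt"

// writeBatch stands in for a DynamoDB BatchWriteItem call: it accepts up to
// 25 items and returns any it could not process (hypothetical signature).
type writeBatch func(items []string) (unprocessed []string, err error)

// batchWriteAll retries unprocessed items instead of discarding them, bounded
// so a persistently throttled table cannot loop forever.
func batchWriteAll(write writeBatch, items []string, maxRetries int) error {
	pending := items
	for attempt := 0; len(pending) > 0; attempt++ {
		if attempt > maxRetries {
			return fmt.Errorf("%d items still unprocessed after %d retries", len(pending), maxRetries)
		}
		unprocessed, err := write(pending)
		if err != nil {
			return err
		}
		pending = unprocessed // retry only what DynamoDB rejected
	}
	return nil
}

func main() {
	calls := 0
	flaky := func(items []string) ([]string, error) {
		calls++
		if calls == 1 {
			return items[1:], nil // simulate throttling on the first call
		}
		return nil, nil
	}
	err := batchWriteAll(flaky, []string{"env:a", "env:b", "env:c"}, 3)
	fmt.Println(err, calls) // prints <nil> 2
}
```

A production version would also add exponential backoff between retries, as the DynamoDB documentation recommends for unprocessed items.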


Member


It isn't silently dropped as we issue a log statement. Continuing on failure in this case is better as this is serving solely as a remediation effort, and having some environments is better than no environments.

If Redis is temporarily down at startup, the cache should degrade
gracefully rather than preventing the relay from starting. The ping
was counterproductive to the resilience goal of the feature.
Log and continue instead of aborting if a single batch of 25 items
fails. Partial cache data is better than none for resilience.
Add Upsert and Delete methods to the Cache interface so individual
environment/filter changes from PATCH and DELETE stream events are
persisted to the cache immediately. Previously only PUT events updated
the cache, so changes between PUTs would be lost on restart.

Also consolidates the env:/filter: key prefix constants into a shared
cacheField helper used by both Redis and DynamoDB stores.
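A shared field-naming helper of the kind described might look like the following; the exact names are assumptions, but the idea is one function owning the `env:`/`filter:` prefixes for both stores.

```go
package main

import "fmt"

// Hypothetical sketch of the shared cacheField helper: a single place that
// builds the "env:" / "filter:" prefixed field names used by both the Redis
// and DynamoDB store implementations.
const (
	envPrefix    = "env:"
	filterPrefix = "filter:"
)

func cacheField(isFilter bool, id string) string {
	if isFilter {
		return filterPrefix + id
	}
	return envPrefix + id
}

func main() {
	fmt.Println(cacheField(false, "production")) // prints env:production
	fmt.Println(cacheField(true, "mobile"))      // prints filter:mobile
}
```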
Setting AUTO_CONFIG_CACHE_KEY is sufficient to enable AutoConfig
caching. The separate AUTO_CONFIG_INIT_FROM_STORE_FIRST boolean
was redundant — if you set a cache key, you want caching.
Add a Persist flag to PutContent. Data from the live stream is marked
Persist=true so handlePut writes it to cache. Data restored from cache
on startup has Persist=false (the default), so it flows through
handlePut without a redundant cache write.
Each cached environment/filter is now serialized inside a CachedItem
wrapper with kind, modelVersion, and data fields. On read, items with
an unrecognized model version are skipped with a warning, giving us a
clean path to handle format migrations in the future.
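A versioned wrapper like the one described can be sketched with `encoding/json`: unknown model versions are skipped rather than failing the whole cache read. Field and type names here are assumptions based on the commit description.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const currentModelVersion = 1

// CachedItem is a hypothetical sketch of the versioned wrapper described
// above: kind + modelVersion + opaque payload.
type CachedItem struct {
	Kind         string          `json:"kind"`
	ModelVersion int             `json:"modelVersion"`
	Data         json.RawMessage `json:"data"`
}

// decodeItem returns ok=false for malformed items or items written by an
// unrecognized format version, so one bad entry cannot fail the whole read.
func decodeItem(raw []byte) (item CachedItem, ok bool) {
	if err := json.Unmarshal(raw, &item); err != nil {
		return CachedItem{}, false
	}
	if item.ModelVersion != currentModelVersion {
		// a real implementation would log a warning here
		return CachedItem{}, false
	}
	return item, true
}

func main() {
	current := []byte(`{"kind":"environment","modelVersion":1,"data":{"sdkKey":"x"}}`)
	future := []byte(`{"kind":"environment","modelVersion":99,"data":{}}`)
	_, a := decodeItem(current)
	_, b := decodeItem(future)
	fmt.Println(a, b) // prints true false
}
```

Keeping `Data` as `json.RawMessage` means the envelope can be parsed and version-checked without committing to the payload schema, which is what makes future format migrations cheap.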
Each cache store now holds an internal context that is cancelled on
Close(), terminating any in-flight operations. The caller's context
is combined with the store's context using context.AfterFunc so that
either cancellation terminates the operation. The mergeContext helper
is shared between Redis and DynamoDB implementations.
@@ -0,0 +1,57 @@
package autoconfigcache
Member


@launchdarkly/team-product-security mind taking a look at this new addition and let me know if it seems acceptable?

@keelerm84 keelerm84 requested a review from a team April 6, 2026 19:06

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).


Reviewed by Cursor Bugbot for commit b450cc7.



3 participants