Skip to content

feat(cdk-axum): support redis-cluster for caching#1936

Open
GEET3001 wants to merge 1 commit into
cashubtc:mainfrom
GEET3001:redis-cluster
Open

feat(cdk-axum): support redis-cluster for caching#1936
GEET3001 wants to merge 1 commit into
cashubtc:mainfrom
GEET3001:redis-cluster

Conversation

@GEET3001
Copy link
Copy Markdown

@GEET3001 GEET3001 commented Apr 24, 2026

closes #1578

Description

This PR adds support for Redis Cluster and Valkey Cluster to the cdk-axum HTTP caching backend.

Currently, the cache only supports a single-node Redis connection, which fails in clustered environments because it cannot handle "MOVED" redirections. This change introduces a cluster-aware client that automatically routes requests to the correct nodes and handles cluster topology changes.


Key Changes

1.Introduced RedisClient enum to support both Standard and Cluster connection types.
2. Added cluster and cluster-async features to the redis dependency.
3. Updated HttpCacheRedis to use the appropriate connection methods for both single-node and cluster setups.
4. Added support for cluster configuration via environment variables:
CDK_MINTD_CACHE_REDIS_USE_CLUSTER: Enable cluster mode.
CDK_MINTD_CACHE_REDIS_CLUSTER_NODES: Comma-separated list of node URLs.


Notes to the reviewers

  1. The Config struct in redis.rs was updated to make connection_string optional, as cluster users will use cluster_nodes instead. Backward compatibility is maintained for existing single-node setups.
    2 . The HttpCache::from(config) logic in mod.rs now intelligently selects the client type based on the provided configuration.
  2. Added specific error logging for cluster-specific failures to aid in debugging distributed environments.

Suggested CHANGELOG Updates

CHANGED

ADDED

Redis Cluster and Valkey support for HTTP caching in cdk-axum.
New environment variables for cluster configuration: CDK_MINTD_CACHE_REDIS_USE_CLUSTER and CDK_MINTD_CACHE_REDIS_CLUSTER_NODES.

REMOVED

FIXED

Cache failures in clustered Redis/Valkey environments due to lack of redirection handling.

Checklist

  • I followed the code style guidelines
  • I ran cargo check -p cdk-axum --features redis and verified the build passes
  • If the Wallet API was modified (added/removed/changed), I have reflected those changes in the FFI bindings (crates/cdk-ffi)— N/A (Mint-side cache change only)

@github-project-automation github-project-automation Bot moved this to Backlog in CDK Apr 24, 2026
@thesimplekid
Copy link
Copy Markdown
Collaborator

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 27, 2026

Codecov Report

❌ Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.17%. Comparing base (e06f337) to head (0b74bf4).

Files with missing lines Patch % Lines
crates/cdk-axum/src/cache/config.rs 0.00% 3 Missing ⚠️
crates/cdk-axum/src/cache/mod.rs 0.00% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1936   +/-   ##
=======================================
  Coverage   65.16%   65.17%           
=======================================
  Files         330      330           
  Lines       56742    56734    -8     
=======================================
  Hits        36976    36976           
+ Misses      19766    19758    -8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread crates/cdk-axum/src/cache/config.rs Outdated
Comment thread crates/cdk-axum/src/cache/config.rs Outdated
Copy link
Copy Markdown
Collaborator

@asmogo asmogo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution. Do you think we can reuse the redis cluster connection instead of creating a new connection for every set / get call on the cluster instance?

Comment thread crates/cdk-axum/src/cache/backend/redis.rs Outdated
@thesimplekid
Copy link
Copy Markdown
Collaborator

Copy link
Copy Markdown
Collaborator

@thesimplekid thesimplekid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments and can you squash commits.

Comment thread crates/cdk-axum/src/cache/mod.rs Outdated
Comment thread crates/cdk-axum/src/cache/backend/redis.rs Outdated
Comment thread crates/cdk-axum/Cargo.toml Outdated
Comment thread crates/cdk-axum/Cargo.toml Outdated
Comment thread crates/cdk-axum/src/cache/mod.rs Outdated
@github-project-automation github-project-automation Bot moved this from Backlog to In progress in CDK May 13, 2026
@GEET3001 GEET3001 force-pushed the redis-cluster branch 2 times, most recently from 86d5220 to 9042f8a Compare May 14, 2026 05:55
@thesimplekid
Copy link
Copy Markdown
Collaborator

Code looks good. Waiting on testing and approval from @asmogo

@GEET3001
Copy link
Copy Markdown
Author

Code looks good. Waiting on testing and approval from @asmogo

any changes ?

- Add redis cluster client configuration logic
- Refactor redis configuration into environment parser
- Update redis dependencies to use ConnectionManager and exclude default features
@asmogo
Copy link
Copy Markdown
Collaborator

asmogo commented May 14, 2026

I got some panics with your current implementation on the mintd startup.

requesting some changes here.

GEET3001#1

@thesimplekid
Copy link
Copy Markdown
Collaborator

Panics stem from the use of block on.


title: "Panics in HttpCache initialization due to nested block_on calls"
notify: true
status: confirmed
severity: high
target: cashubtc/cdk
pr: 1936
skills_used: ["rust-security", "cashu"]
findings_count: 1

Summary

The changes introduced in PR #1936 add support for Redis clusters as a caching backend. However, the PR introduces a fatal regression where it initializes Redis clients and connections using tokio::runtime::Handle::current().block_on(...) synchronously inside HttpCache::from(config).

HttpCache::from is invoked inside configure_cache and start_services_with_shutdown in crates/cdk-mintd/src/lib.rs. These functions are part of the application's async startup flow, originating from cdk_mintd::run_mintd, which itself is executed inside an outer rt.block_on(async { ... }) in crates/cdk-mintd/src/main.rs. Because Tokio explicitly forbids calling Handle::block_on from within a thread that is already executing an active Tokio runtime, the .block_on(...) calls added in crates/cdk-axum/src/cache/mod.rs will inevitably panic and crash the cdk-mintd service upon startup if Redis is configured.

Skills Used

  • rust-security: Applied Rust and Tokio concurrency knowledge to identify that nesting block_on inside an already active Tokio thread leads to immediate panics ("Cannot start a runtime from within a runtime").
  • cashu: Analyzed the cdk-mintd caching startup logic where mint settings for supported tokens dictate caching behavior during service instantiation.

Root Cause

In crates/cdk-axum/src/cache/mod.rs (lines 114 and 143), the PR introduces synchronous blocks inside the From<config::Config> implementation for HttpCache:

match tokio::runtime::Handle::current().block_on(cluster_client.get_async_connection())
// ...
match tokio::runtime::Handle::current().block_on(redis::aio::ConnectionManager::new(single_client))

This synchronization happens via .into() when configuring the mint. In crates/cdk-mintd/src/lib.rs, configure_cache and start_services_with_shutdown invoke this exactly:

let cache: HttpCache = settings.info.http_cache.clone().into();

Both of these functions execute during the application startup process inside run_mintd_with_shutdown, which runs within the main Tokio runtime initialized in crates/cdk-mintd/src/main.rs:

    rt.block_on(async {
        // ...
        cdk_mintd::run_mintd(
            // ...
        ).await
    })

Because Handle::current().block_on is called on the current handle while already executing inside an active tokio worker thread (from the outer rt.block_on(async { ... })), Tokio will detect this and panic immediately with the error "Cannot start a runtime from within a runtime".

Attack Steps

  1. The administrator of the cashu mint updates to the version containing this PR.
  2. The administrator enables Redis caching in their configuration (setting backend to redis and specifying a valid connection string).
  3. The server is restarted, executing cdk-mintd.
  4. run_mintd is called inside rt.block_on(async { ... }).
  5. During startup, configure_mint_builder runs and calls configure_cache.
  6. configure_cache calls settings.info.http_cache.clone().into(), which triggers HttpCache::from.
  7. Inside HttpCache::from, Tokio's tokio::runtime::Handle::current().block_on(...) is executed to establish the Redis connection.
  8. Tokio immediately panics because block_on cannot be called from an async context inside an active runtime worker thread. The application crashes.
  9. The caching configuration behaves as an immediate Denial of Service killswitch, preventing the Mint service from successfully coming online.

Impact

This introduces a critical Denial of Service (startup crash) resulting directly from legitimate configurations. Any mint operator updating to use the Redis or Redis Cluster caching backend will face an immediate, persistent crash loop. The application will unconditionally panic and terminate on startup if Redis configuration is present.

Proof of Concept

A simple Rust test reproduces the exact Tokio panic occurring in the application:

fn main() {
    // 1. Emulate what `src/main.rs` does
    let rt = tokio::runtime::Runtime::new().unwrap();
    
    rt.block_on(async {
        // Simulates being within `run_mintd` and calling HttpCache::from()
        let handle = tokio::runtime::Handle::current();
        
        // Simulates the HttpCache::from call added in the PR
        // which naively uses `Handle::current().block_on(...)`
        handle.block_on(async {
            println!("This will not print, tokio will panic first.");
        });
    });
}

Output:

thread 'main' panicked at 'Cannot start a runtime from within a runtime. This happens because a function (like `block_on`) attempted to block the current thread while the thread is being used to drive async tasks.'

@thesimplekid
Copy link
Copy Markdown
Collaborator

@GEET3001 This paniced pretty early, did you run or test this code yourself?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In progress

Development

Successfully merging this pull request may close these issues.

Redis cluster

4 participants