Async Runtime Deadlock Issue with rgb-lib and Tokio Integration #66

@dcorral

Summary

We're integrating rgb-lib into a Lightning node application (RLN) that uses tokio as its async runtime. We've encountered a deadlock that occurs with a single tokio worker thread but not with multiple threads. After investigation, we believe the root cause lies in how rgb-lib handles async operations internally.

The Problem

Our test getchannelid_success hangs indefinitely when run with:

#[tokio::test(flavor = "multi_thread", worker_threads = 1)]

But passes when run with worker_threads = 3 or higher.

This is a classic symptom of a blocking operation starving the async runtime.

Root Cause Analysis

1. Async Runtime Mismatch

rgb-lib uses async-std (deprecated) as its sea-orm runtime:

# rgb-lib Cargo.toml
sea-orm = { version = "1.1.12", default-features = false, features = [
    "macros",
    "runtime-async-std-rustls",  # async-std runtime
    "sqlx-sqlite",
    "with-json",
] }

Our application uses tokio:

# Our Cargo.toml
sea-orm = { version = "1", features = ["sqlx-sqlite", "runtime-tokio-rustls", "macros"] }

When we call rgb-lib methods from our tokio runtime, rgb-lib's internal block_on() calls block the calling thread while sqlx relies on a separate async-std runtime instead of tokio, so two runtimes coexist in the same process. Since async-std is deprecated, please consider moving to runtime-tokio-rustls. I tested rgb-lib with the tokio runtime locally, changing only the runtime feature in Cargo.toml, and all tests pass. Please check whether further investigation is needed before making the change.

2. Pervasive block_on() Usage

Looking at rgb-lib/src/database/mod.rs, every database operation wraps async sea-orm calls with futures::executor::block_on():

pub(crate) fn set_asset(&self, asset: DbAssetActMod) -> Result<i32, InternalError> {
    let res = block_on(Asset::insert(asset).exec(self.get_connection()))?;
    Ok(res.last_insert_id)
}

pub(crate) fn get_asset(&self, asset_id: String) -> Result<Option<DbAsset>, InternalError> {
    Ok(block_on(
        Asset::find()
            .filter(asset::Column::Id.eq(asset_id))
            .one(self.get_connection()),
    )?)
}

There are 48+ instances of this pattern across all database operations.

3. Single Connection Pool

In rgb-lib/src/wallet/offline.rs, the connection pool is configured with only 1 connection:

let mut opt = ConnectOptions::new(connection_string);
opt.max_connections(1)      // Only 1 connection allowed
    .min_connections(0)
    .connect_timeout(Duration::from_secs(8))
    .idle_timeout(Duration::from_secs(8))
    .max_lifetime(Duration::from_secs(8));
let db_cnn = block_on(Database::connect(opt));

How the Deadlock Happens

With a single tokio worker thread:

[Tokio Worker Thread]
    |
    +-- HTTP request arrives (e.g., /openchannel)
    |
    +-- We call spawn_blocking(|| rgb_wallet.some_method())
    |       |
    |       +-- rgb-lib method calls block_on(sea_orm_query)
    |           |
    |           +-- futures::executor blocks the thread
    |           |   waiting for async-std future
    |           |
    |           +-- SQLite connection pool has only 1 slot
    |
    +-- Meanwhile, another task also needs the database
    |       |
    |       +-- Queued, waiting for the single connection
    |
    +-- DEADLOCK: The single thread is blocked, connection is held,
                  nothing can make progress

With 3+ worker threads, other threads can continue processing while one is blocked, avoiding the deadlock.
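
The starvation pattern above can be reproduced in miniature without tokio or rgb-lib. This is only a sketch under stated assumptions: `spawn_pool` and `run_nested` are hypothetical names, and the "runtime" is a plain pool of std threads sharing one job queue. A job blocks while waiting for a second job that is queued on the same pool, which is analogous to (but not necessarily identical to) what we see in the test.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical stand-in for a runtime: `workers` threads pull jobs from one queue.
fn spawn_pool(workers: usize) -> mpsc::Sender<Job> {
    let (tx, rx) = mpsc::channel::<Job>();
    let rx = Arc::new(Mutex::new(rx));
    for _ in 0..workers {
        let rx = Arc::clone(&rx);
        thread::spawn(move || loop {
            // The lock is released at the end of this statement, before the job runs.
            let next = rx.lock().unwrap().recv();
            match next {
                Ok(job) => job(),
                Err(_) => break, // all senders dropped: shut down
            }
        });
    }
    tx
}

/// Returns true if the outer job finished before the timeout.
fn run_nested(workers: usize) -> bool {
    let pool = spawn_pool(workers);
    let inner_pool = pool.clone();
    let (done_tx, done_rx) = mpsc::channel::<()>();

    let outer: Job = Box::new(move || {
        let (res_tx, res_rx) = mpsc::channel::<u32>();
        // Queue follow-up work on the same pool...
        let inner: Job = Box::new(move || {
            let _ = res_tx.send(42);
        });
        inner_pool.send(inner).unwrap();
        // ...then block the worker until it completes (the block_on analogue).
        if res_rx.recv().is_ok() {
            let _ = done_tx.send(());
        }
    });
    pool.send(outer).unwrap();

    // Detect the hang with a timeout instead of hanging the whole program.
    done_rx.recv_timeout(Duration::from_millis(500)).is_ok()
}

fn main() {
    println!("1 worker finished: {}", run_nested(1)); // false: the only worker is blocked
    println!("2 workers finished: {}", run_nested(2)); // true: a second worker runs the inner job
}
```

With one worker, the inner job can never be picked up because the only thread is parked inside the outer job. Adding a worker hides the problem rather than removing it.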

Workaround We're Using

We wrap all rgb-lib calls in tokio::task::spawn_blocking():

let result = tokio::task::spawn_blocking(move || {
    rgb_wallet.some_method()
}).await?;

This moves the blocking work to tokio's blocking thread pool, but it's not a complete solution because:

  • It adds latency and complexity
  • The single connection pool still creates bottlenecks
  • Nested blocking can still cause issues. For instance, the getchannelid_success test does not pass with 1 worker thread and needs a minimum of 3 (we are still trying to find exactly where the deadlock happens)

Proposed Solution

Expose an async API and move to the tokio runtime.

Add async versions of wallet methods that callers can await directly:

// Keep existing sync API for backwards compatibility
pub fn get_asset(&self, asset_id: String) -> Result<Option<DbAsset>, InternalError> {
    block_on(self.get_asset_async(asset_id))
}

// New async API
pub async fn get_asset_async(&self, asset_id: String) -> Result<Option<DbAsset>, InternalError> {
    Ok(Asset::find()
        .filter(asset::Column::Id.eq(asset_id))
        .one(self.get_connection())
        .await?)
}

This would let tokio applications await rgb-lib operations natively without runtime conflicts.
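
To show the shape of that split without pulling in sea-orm, here is a minimal std-only sketch. Everything in it is an assumption for illustration: `Db` is a hypothetical stand-in for the wallet's database handle, and the hand-written `block_on` stands in for `futures::executor::block_on`. The point is only that the sync method becomes a thin wrapper over the async one.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the thread that is driving the future.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal block_on: polls the future on the current thread, parking while pending.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical stand-in for the database handle (not rgb-lib's real type).
struct Db {
    assets: Vec<(String, u64)>,
}

impl Db {
    /// New async API: callers already on an async runtime await this directly.
    async fn get_asset_async(&self, asset_id: &str) -> Option<u64> {
        self.assets
            .iter()
            .find(|(id, _)| id.as_str() == asset_id)
            .map(|(_, value)| *value)
    }

    /// Existing sync API, kept as a thin wrapper for non-async callers.
    fn get_asset(&self, asset_id: &str) -> Option<u64> {
        block_on(self.get_asset_async(asset_id))
    }
}

fn main() {
    let db = Db {
        assets: vec![("asset-1".to_string(), 100)],
    };
    // Sync callers keep working unchanged.
    assert_eq!(db.get_asset("asset-1"), Some(100));
    // Async callers would write `db.get_asset_async(..).await`; here we drive it by hand.
    assert_eq!(block_on(db.get_asset_async("missing")), None);
}
```

The sync wrapper should still be called only from non-async contexts. Calling it from a runtime worker thread would reintroduce the blocking described above.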

(Optional): Increase Connection Pool Size

At minimum, increasing the pool size would reduce contention:

opt.max_connections(5)  // Instead of 1
    .min_connections(1)  // Keep at least 1 connection warm
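
The effect of the pool limit on a second caller can be illustrated with a toy model. This is a sketch only: `Pool` is a hypothetical slot counter, not sea-orm's or sqlx's pool, and it ignores SQLite locking entirely. It shows that with one slot a second acquire times out while the first is held, and that a second slot lets it through.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Hypothetical fixed-size pool that only counts free slots.
struct Pool {
    free: Mutex<usize>,
    available: Condvar,
}

impl Pool {
    fn new(max_connections: usize) -> Self {
        Pool {
            free: Mutex::new(max_connections),
            available: Condvar::new(),
        }
    }

    /// Tries to take a slot, giving up after `timeout`.
    fn acquire(&self, timeout: Duration) -> bool {
        let guard = self.free.lock().unwrap();
        // Wait while no slot is free, up to the timeout.
        let (mut guard, result) = self
            .available
            .wait_timeout_while(guard, timeout, |free| *free == 0)
            .unwrap();
        if result.timed_out() {
            return false;
        }
        *guard -= 1;
        true
    }

    /// Returns a slot to the pool and wakes one waiter.
    fn release(&self) {
        *self.free.lock().unwrap() += 1;
        self.available.notify_one();
    }
}

/// Holds one connection, then reports whether a second acquire succeeds.
fn second_acquire_succeeds(max_connections: usize) -> bool {
    let timeout = Duration::from_millis(100);
    let pool = Pool::new(max_connections);
    assert!(pool.acquire(timeout)); // first caller holds a connection
    let ok = pool.acquire(timeout); // second caller arrives while it is held
    pool.release(); // first caller finishes
    ok
}

fn main() {
    println!("max_connections = 1: {}", second_acquire_succeeds(1)); // false
    println!("max_connections = 2: {}", second_acquire_succeeds(2)); // true
}
```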

Questions for Discussion

  1. Is there a reason for the single-connection pool limit? Would increasing it cause issues with SQLite?
  2. Would the rgb-lib team be open to exposing async APIs? We'd be happy to contribute a PR.

Environment Details

  • rgb-lib version: 0.3.0-beta.4
  • sea-orm version: 1.1.19 with tokio runtime
  • Database: SQLite

Happy to provide more details or test any proposed fixes. Thanks for your time!
