Summary
We're integrating rgb-lib into a Lightning node application (RLN) that uses tokio as its async runtime. We've encountered a deadlock that occurs when running with a single tokio worker thread but works fine with multiple threads. After investigation, we believe the root cause lies in how rgb-lib handles async operations internally.
The Problem
Our test getchannelid_success hangs indefinitely when run with:
#[tokio::test(flavor = "multi_thread", worker_threads = 1)]
It passes when run with worker_threads = 3 or higher.
This is a classic symptom of a blocking operation starving the async runtime.
Root Cause Analysis
1. Async Runtime Mismatch
rgb-lib uses async-std (deprecated) as its sea-orm runtime:
# rgb-lib Cargo.toml
sea-orm = { version = "1.1.12", default-features = false, features = [
    "macros",
    "runtime-async-std-rustls", # async-std runtime
    "sqlx-sqlite",
    "with-json",
] }
Our application uses tokio:
# Our Cargo.toml
sea-orm = { version = "1", features = ["sqlx-sqlite", "runtime-tokio-rustls", "macros"] }
When we call rgb-lib methods from our tokio runtime, rgb-lib's internal block_on() calls block the calling thread while driving futures that depend on a separate async-std runtime, so the two runtimes end up competing for that thread. Given that async-std is deprecated, consider moving to runtime-tokio-rustls. I have already tested rgb-lib with the tokio runtime locally and all tests pass; the only change was switching the runtime feature in Cargo.toml. Please check whether further investigation is needed before making the change.
2. Pervasive block_on() Usage
Looking at rgb-lib/src/database/mod.rs, every database operation wraps async sea-orm calls with futures::executor::block_on():
pub(crate) fn set_asset(&self, asset: DbAssetActMod) -> Result<i32, InternalError> {
    let res = block_on(Asset::insert(asset).exec(self.get_connection()))?;
    Ok(res.last_insert_id)
}

pub(crate) fn get_asset(&self, asset_id: String) -> Result<Option<DbAsset>, InternalError> {
    Ok(block_on(
        Asset::find()
            .filter(asset::Column::Id.eq(asset_id))
            .one(self.get_connection()),
    )?)
}
There are 48+ instances of this pattern across all database operations.
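To make the cost of this pattern concrete, here is a minimal std-only sketch of what a block_on-style helper does to the thread that calls it. It is not the futures::executor::block_on implementation and does not touch rgb-lib or sea-orm; block_on_sketch, ThreadWaker, and last_insert_id_sync are hypothetical names used only for illustration.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the blocked caller by unparking its thread.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Illustrative stand-in for a block_on helper (hypothetical, not the
/// futures crate's code): poll the future, and park the calling thread
/// whenever the future is not ready yet.
fn block_on_sketch<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // The calling thread sleeps here and runs nothing else
            // until some other thread wakes it.
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical sync wrapper in the style of the database helpers above.
fn last_insert_id_sync() -> i32 {
    block_on_sketch(async { 42 })
}
```

The point of the sketch is the Poll::Pending branch: if the thread that parks there is also the only thread able to run whatever would wake the future, nothing can make progress.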
3. Single Connection Pool
In rgb-lib/src/wallet/offline.rs, the connection pool is configured with only 1 connection:
let mut opt = ConnectOptions::new(connection_string);
opt.max_connections(1) // Only 1 connection allowed
    .min_connections(0)
    .connect_timeout(Duration::from_secs(8))
    .idle_timeout(Duration::from_secs(8))
    .max_lifetime(Duration::from_secs(8));
let db_cnn = block_on(Database::connect(opt));
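As a rough model of the contention a one-slot pool creates, the following std-only sketch counts free slots with a Mutex and Condvar and lets a second caller try to acquire a slot while the first is still held. SlotPool and second_caller_gets_slot are hypothetical names; this does not reproduce sqlx's or sea-orm's pool internals, and the 50 ms timeout is an arbitrary value chosen so the example finishes quickly.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a connection pool: it only counts free slots.
struct SlotPool {
    free: Mutex<usize>,
    released: Condvar,
}

impl SlotPool {
    fn new(slots: usize) -> Arc<Self> {
        Arc::new(SlotPool {
            free: Mutex::new(slots),
            released: Condvar::new(),
        })
    }

    /// Try to take a slot, giving up after `timeout`.
    fn acquire(&self, timeout: Duration) -> bool {
        let guard = self.free.lock().unwrap();
        // Wait while no slot is free, up to the timeout.
        let (mut guard, _timed_out) = self
            .released
            .wait_timeout_while(guard, timeout, |free| *free == 0)
            .unwrap();
        if *guard == 0 {
            return false;
        }
        *guard -= 1;
        true
    }

    /// Return a slot and wake one waiter.
    fn release(&self) {
        *self.free.lock().unwrap() += 1;
        self.released.notify_one();
    }
}

/// Holds one slot, then reports whether a second caller could get one
/// before its timeout expired.
fn second_caller_gets_slot(slots: usize) -> bool {
    let pool = SlotPool::new(slots);
    assert!(pool.acquire(Duration::from_millis(50))); // first caller holds a slot
    let contender = Arc::clone(&pool);
    let got = thread::spawn(move || contender.acquire(Duration::from_millis(50)))
        .join()
        .unwrap();
    pool.release();
    got
}
```

With one slot the second caller times out for as long as the first holds its slot; with two slots it succeeds immediately.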
How the Deadlock Happens
With a single tokio worker thread:
[Tokio Worker Thread]
    |
    +-- HTTP request arrives (e.g., /openchannel)
    |
    +-- We call spawn_blocking(|| rgb_wallet.some_method())
    |       |
    |       +-- rgb-lib method calls block_on(sea_orm_query)
    |           |
    |           +-- futures::executor blocks the thread
    |           |   waiting for async-std future
    |           |
    |           +-- SQLite connection pool has only 1 slot
    |
    +-- Meanwhile, another task also needs the database
    |       |
    |       +-- Queued, waiting for the single connection
    |
    +-- DEADLOCK: The single thread is blocked, connection is held,
        nothing can make progress
With 3+ worker threads, other threads can continue processing while one is blocked, avoiding the deadlock.
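The starvation pattern itself can be modelled without tokio. The sketch below is a simplified model, not tokio's scheduler: a fixed set of worker threads pulls jobs from a shared queue, job A blocks its worker waiting for an answer, and job B (queued behind it) is the only thing that can produce that answer. blocking_job_completes and Job are hypothetical names, and a timeout stands in for the indefinite hang so the example terminates.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Runs two dependent jobs on `workers` threads and reports whether the
/// blocking job received its answer before `patience` elapsed.
fn blocking_job_completes(workers: usize, patience: Duration) -> bool {
    assert!(workers >= 1, "the model needs at least one worker");
    let (job_tx, job_rx) = mpsc::channel::<Job>();
    let job_rx = Arc::new(Mutex::new(job_rx));

    // Worker pool: each thread runs queued jobs one at a time.
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            thread::spawn(move || loop {
                // The queue lock is released before the job runs.
                let next = job_rx.lock().unwrap().recv();
                match next {
                    Ok(job) => job(),
                    Err(_) => break, // queue closed, worker exits
                }
            })
        })
        .collect();

    let (answer_tx, answer_rx) = mpsc::channel::<u32>();
    let (done_tx, done_rx) = mpsc::channel::<bool>();

    // Job A: blocks its worker until the answer arrives or patience runs out.
    job_tx
        .send(Box::new(move || {
            let got_answer = answer_rx.recv_timeout(patience).is_ok();
            let _ = done_tx.send(got_answer);
        }))
        .unwrap();

    // Job B: produces the answer, but only runs if some worker is free.
    job_tx
        .send(Box::new(move || {
            let _ = answer_tx.send(7);
        }))
        .unwrap();

    let completed = done_rx.recv().unwrap();
    drop(job_tx); // close the queue so workers can exit
    for handle in handles {
        handle.join().unwrap();
    }
    completed
}
```

With one worker, job A occupies the only thread and job B never gets a chance to run before the timeout; with two workers, job B runs on the second thread and job A completes. This matches the shape of what we observe, though we have not yet confirmed it is the exact mechanism behind the test hang.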
Workaround We're Using
We wrap all rgb-lib calls in tokio::task::spawn_blocking():
let result = tokio::task::spawn_blocking(move || {
    rgb_wallet.some_method()
}).await?;
This moves the blocking work to tokio's blocking thread pool, but it's not a complete solution because:
- It adds latency and complexity
- The single connection pool still creates bottlenecks
- Nested blocking can still cause issues. For instance, the getchannelid_success test does not pass with worker_threads = 1 and needs a minimum of 3 (we are still trying to find exactly where the deadlock happens).
Proposed Solution
Expose an async API and move to the tokio runtime.
Add async versions of wallet methods that callers can await directly:
// Keep existing sync API for backwards compatibility
pub fn get_asset(&self, asset_id: String) -> Result<Option<DbAsset>, InternalError> {
block_on(self.get_asset_async(asset_id))
}
// New async API
pub async fn get_asset_async(&self, asset_id: String) -> Result<Option<DbAsset>, InternalError> {
Ok(Asset::find()
.filter(asset::Column::Id.eq(asset_id))
.one(self.get_connection())
.await?)
}
This would let tokio applications await rgb-lib operations natively without runtime conflicts.
(Optional): Increase Connection Pool Size
At minimum, increasing the pool size would reduce contention:
opt.max_connections(5) // Instead of 1
    .min_connections(1); // Keep at least 1 connection warm
Questions for Discussion
- Is there a reason for the single-connection pool limit? Would increasing it cause issues with SQLite?
- Would the rgb-lib team be open to exposing async APIs? We'd be happy to contribute a PR.
Environment Details
- rgb-lib version: 0.3.0-beta.4
- sea-orm version: 1.1.19 with tokio runtime
- Database: SQLite
Happy to provide more details or test any proposed fixes. Thanks for your time!