Skip to content

Persistence trait#1902

Closed
average-gary wants to merge 39 commits into
stratum-mining:mainfrom
average-gary:persistence-trait
Closed

Persistence trait#1902
average-gary wants to merge 39 commits into
stratum-mining:mainfrom
average-gary:persistence-trait

Conversation

@average-gary
Copy link
Copy Markdown
Contributor

@average-gary average-gary commented Sep 23, 2025

After digging into the ShareAccounting more, I noticed there were calls within the hot path for mining messages. Notably, validate_share.

Since this validation and mining messages in general are on the critical path, I don't think #1866 with Results for ShareAccounting functions is the correct approach.

In this approach, the Channel constructor would pass in a Sender for the Channel to use to pass data through the a Persistence implementation that is then responsible for handling it's own threads, state, errors, etc.

@plebhash
Copy link
Copy Markdown
Member

did you try this suggestion? #1866 (comment)

@average-gary
Copy link
Copy Markdown
Contributor Author

did you try this suggestion? #1866 (comment)

Yes, and if any of the hot path calls return and error, it will need to be handled and bubbled up or ignored. In a case where the function returns early for a Persistence failure, I don't believe we'd want to return early from the function. This would also mean expanding:

pub enum ShareValidationError {
    Invalid,
    Stale,
    InvalidJobId,
    DoesNotMeetTarget,
    VersionRollingNotAllowed,
    DuplicateShare,
    NoChainTip,
}

All these errors are specific to the validation of the shares and it seemed inappropriate to add Persistence errors since it broadens the scope of this function's returned type beyond what the this function actually does.

Without refactoring this critical piece of code, I saw expanding to use Results for ShareAccounting becoming very messy since we want infallible code in the hot path of validating a Share or Block.
protocols/v2/channels-sv2/src/client/extended.rs

    pub fn validate_share(
        &mut self,
        share: SubmitSharesExtended,
    ) -> Result<ShareValidationResult, ShareValidationError> {
        // ........other validation logic......
        // check if a block was found
        if network_target.is_met_by(hash) {
            self.share_accounting.update_share_accounting(
                target_to_difficulty(self.target.clone()) as u64,
                share.sequence_number,
                hash.to_raw_hash(),
            );
            return Ok(ShareValidationResult::BlockFound);
        }

        // check if the share hash meets the channel target
        if hash_as_target < self.target {\
            if self.share_accounting.is_share_seen(hash.to_raw_hash()) {
                return Err(ShareValidationError::DuplicateShare);
            }

            self.share_accounting.update_share_accounting(
                target_to_difficulty(self.target.clone()) as u64,
                share.sequence_number,
                hash.to_raw_hash(),
            );

            // update the best diff
            self.share_accounting.update_best_diff(hash_as_diff);

            return Ok(ShareValidationResult::Valid);
        }

Additionally, anything other than in-memory ShareAccounting has the chance to introduce latency, which I don't think is desirable for processing shares/blocks.

@average-gary
Copy link
Copy Markdown
Contributor Author

FWIW, I want to create an example Persistence implementation just writing to a structured file before removing the Draft tag. Feedback on the approach seemed prudent before doing so.

@average-gary average-gary force-pushed the persistence-trait branch 4 times, most recently from 7660a13 to b20fc3a Compare September 25, 2025 13:23
@average-gary
Copy link
Copy Markdown
Contributor Author

The latest push implements a simple file writer Persistence which outputs a file.

ShareAccepted: channel_id: 1, share_work: 1, share_sequence_number: 1, share_hash: 00000000026fd2491f671027c9f92f95edcfd487b9ab313e3c8bfee782a95233, total_shares_accepted: 1, total_share_work_sum: 1, timestamp: SystemTime { tv_sec: 1758806242, tv_nsec: 877614000 }
BestDifficultyUpdated: channel_id: 1, new_best_diff: 105.05410207283688, previous_best_diff: 0, timestamp: SystemTime { tv_sec: 1758806242, tv_nsec: 877618000 }
ShareAccepted: channel_id: 1, share_work: 1, share_sequence_number: 2, share_hash: 000000005e15aca4e127fe29c208d802b4cc7b2d8c30d226b56051e287cef015, total_shares_accepted: 2, total_share_work_sum: 2, timestamp: SystemTime { tv_sec: 1758806250, tv_nsec: 256314000 }

Of note, I think the Persistence implementation here could use the status_tx channel for errors. I'm not sure if SRI would want this basic implementation as part of the repo but I can improve on the error handling if so.

Comment thread protocols/v2/channels-sv2/src/persistence.rs Outdated
Comment thread protocols/v2/channels-sv2/src/persistence.rs Outdated
Comment thread protocols/v2/channels-sv2/src/persistence.rs Outdated
Comment thread protocols/v2/channels-sv2/src/server/extended.rs Outdated
Comment thread protocols/v2/channels-sv2/src/server/extended.rs Outdated
Comment thread protocols/v2/channels-sv2/Cargo.toml Outdated
Comment thread protocols/v2/channels-sv2/src/client/extended.rs Outdated
@plebhash
Copy link
Copy Markdown
Member

I think there was meaningful progress here, but it still has a lot of unnecessary complexity that should be cleaned up

also not sure why we have #1916 making changes that were are correlated to this PR, while both are draft?

I mean, there are scenarios where breaking things down into different PRs do make sense, but I don't really understand the rationale here and overall it makes the review process unnecessarily confusing and convoluted

@average-gary
Copy link
Copy Markdown
Contributor Author

I think there was meaningful progress here, but it still has a lot of unnecessary complexity that should be cleaned up

ACK, thank you. Will address.

I mean, there are scenarios where breaking things down into different PRs do make sense, but I don't really understand the rationale here and overall it makes the review process unnecessarily confusing and convoluted

Will consolidate into one draft, I just wanted to keep things separate until I knew features are desired by SRI.

@average-gary average-gary force-pushed the persistence-trait branch 4 times, most recently from 06ee15c to bc58320 Compare October 20, 2025 18:24
@average-gary average-gary marked this pull request as ready for review October 21, 2025 14:42
@average-gary average-gary requested a review from plebhash October 21, 2025 16:56
Comment thread protocols/v2/channels-sv2/src/client/extended.rs Outdated
Comment thread protocols/v2/channels-sv2/src/server/extended.rs Outdated
Comment thread protocols/v2/channels-sv2/src/server/extended.rs Outdated
Comment thread roles/jd-server/src/lib/config.rs
Comment thread protocols/v2/channels-sv2/src/client/standard.rs Outdated
- Add optional share_persistence_file_path configuration in pool config examples
- Include comments explaining the purpose and usage of share persistence logging
- Replace `NoPersistence` with generic `Persistence<TestPersistence>` in test cases
- Add `TestPersistence` struct implementing `PersistenceHandler` trait
- Remove `status_tx` parameter from `ShareFileHandler::new()` method
- Remove status sender from `ShareFileHandler` struct
- Simplify share persistence error handling

This refactoring removes the direct status communication from the share persistence module, improving separation of concerns and reducing complexity.
- Relocate share persistence implementation from pool role to stratum-apps
- Remove local share_persistence module from pool role
- Update import paths in channel_manager modules
- Add new share_persistence module to stratum-apps with existing implementation
- Improve modularity and reusability of share persistence code
- Prepare for potential cross-role share persistence usage
- Change share work sum type from u64 to f64 for more precise difficulty tracking
- Remove unnecessary type casting of difficulty to u64 in share accounting methods
- Update share accounting methods to use floating-point difficulty values
- Remove unused NoPersistence import in multiple files
- Simplify share accounting event persistence tracking
- Improve comments describing share work and difficulty calculations
This refactoring improves the precision of share difficulty tracking and simplifies the share accounting implementation across multiple Stratum V2 protocol components.
let receiver = share_file_handler.get_receiver();

// Spawn the share file handler task
task_manager.spawn(async move {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During review, I identified a potential data loss issue with share persistence during pool shutdown.

Current Behavior

The share persistence task runs in the background with a 1024-event buffer. During Pool shutdown:

(roles/pool/src/lib/mod.rs:165-167):

  task_manager.abort_all().await;   // Immediately aborts all tasks
  task_manager.join_all().await;    // Waits for aborted tasks

Any pending share events in the channel buffer are lost when the task is aborted.

Impact

  • When: Only during pool shutdown (Ctrl+C, crashes)
  • What's lost: Up to 1024 pending share events
  • Severity: Depends on use case - critical if used for payment calculation, minor if just for auditing

Possible Solution

I think this can be accomplished in a follow up PR as follows:

Store the persistence task handle separately and add graceful shutdown:

  1. Stop accepting new mining work
  2. Drop persistence field to signal channel closure
  3. Wait for persistence task to drain its queue naturally
  4. Then abort remaining tasks

This ensures all shares are persisted before shutdown completes.

Alternatively, we can just document for implementers as this is an application side consideration/trade off and not protocol specific.

@average-gary
Copy link
Copy Markdown
Contributor Author

average-gary commented Oct 23, 2025

The fmt check is failing but I don't see that issue in the current commit

Tested with BitAxe locally.

@GitGab19
Copy link
Copy Markdown
Member

GitGab19 commented Oct 23, 2025

The fmt check is failing but I don't see that issue in the current commit

Tested with BitAxe locally.

On our CI we use nightly for fmt.

Try by running cargo +nightly fmt locally, I just did it locally and it fixed the error you're getting on CI.

Comment thread protocols/v2/channels-sv2/src/server/share_accounting.rs
@Shourya742
Copy link
Copy Markdown
Member

@average-gary, wouldn’t it make more sense to handle this event logic at the application layer instead of coupling channel_sv2 with the persistence trait? Channel_sv2 already exposes the necessary APIs that the application layer can use to record relevant events into a storage engine.

@average-gary
Copy link
Copy Markdown
Contributor Author

@average-gary, wouldn’t it make more sense to handle this event logic at the application layer instead of coupling channel_sv2 with the persistence trait? Channel_sv2 already exposes the necessary APIs that the application layer can use to record relevant events into a storage engine.

@Shourya742 What APIs are you referring to? I can get the ShareAccounting using Standard or Extended Channel but the ShareAccouting only surfaces summary data. I didn't see a way to access the complete context of a share outside of directly in the channel validation logic but I could have missed a different API.
Screenshot 2025-10-24 at 08 40 47

Perhaps if validate_share() returned more than an enum of the result, the returned data could then be used at the application layer. Definitely open to ideas.

@average-gary
Copy link
Copy Markdown
Contributor Author

This PR becomes an application concern when #1966 lands. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants