Skip to content

Conversation

@sixpluszero
Copy link
Contributor

[server][dvc] Add blob transfer send/receive byte rate metric at both version level and host level

This PR adds rate metric for blob transfer send and receive activity for more observability in the ingestion path.

It will give host level send/receive byte rate, as well as version level send/receive byte rate. With this, we can easily monitor how system is doing the blob transfer from/to other peer nodes.

Solution

Code changes

  • Added new code behind a config. If so list the config names and their default values in the PR description.
  • Introduced new log lines.
    • Confirmed if logs need to be rate limited to avoid excessive logging.

Concurrency-Specific Checks

Both reviewer and PR author to verify

  • Code has no race conditions or thread safety issues.
  • Proper synchronization mechanisms (e.g., synchronized, RWLock) are used where needed.
  • No blocking calls inside critical sections that could lead to deadlocks or performance degradation.
  • Verified thread-safe collections are used (e.g., ConcurrentHashMap, CopyOnWriteArrayList).
  • Validated proper exception handling in multi-threaded code to avoid silent thread termination.

How was this PR tested?

  • New unit tests added.
  • New integration tests added.
  • Modified or extended existing tests.
  • Verified backward compatibility (if applicable).

Does this PR introduce any user-facing or breaking changes?

  • No. You can skip the rest of this section.
  • Yes. Clearly explain the behavior change and its impact.

@sixpluszero sixpluszero marked this pull request as draft December 9, 2025 01:41
@sixpluszero sixpluszero force-pushed the blobTransferTrafficMetric branch from 615e1b9 to ebf4cf1 Compare December 16, 2025 21:07
@sixpluszero sixpluszero force-pushed the blobTransferTrafficMetric branch from ebf4cf1 to 994aa20 Compare January 12, 2026 18:06
@sixpluszero sixpluszero marked this pull request as ready for review January 17, 2026 01:05
package com.linkedin.davinci.stats;

public class AggBlobTransferStats {
private final AggVersionedBlobTransferStats aggVersionedBlobTransferStats;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s add some Javadoc here. Also, could you include the Kafka stats it will be compared against?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add doc that aggVersionedBlobTransferStats reports speed, time, and success/failure counts, and aggHostLevelIngestionStats reports bytes accumulation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good callout, updated the java doc.


// The blob file receiving throughput (in MB/sec) and time (in sec)
protected static final String BLOB_TRANSFER_THROUGHPUT = "blob_transfer_file_receive_throughput";
protected static final String BLOB_TRANSFER_TIME = "blob_transfer_time";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq, do you think we need to remove the BLOB_TRANSFER_THROUGHPUT graph, I can cut a separate PR to remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. But maybe let's first verify this metric is working and we can deprecate the old one

@sixpluszero sixpluszero requested a review from jingy-li January 22, 2026 00:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants