Add metrics for failed topic load operation #18963

@codelipenghui

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Currently, we have topic-load-related metrics such as the following:

topic_load_times{cluster="standalone",quantile="0.5"} 140.0
topic_load_times{cluster="standalone",quantile="0.75"} 183.0
topic_load_times{cluster="standalone",quantile="0.95"} 249.0
topic_load_times{cluster="standalone",quantile="0.99"} 249.0
topic_load_times{cluster="standalone",quantile="0.999"} 249.0
topic_load_times{cluster="standalone",quantile="0.9999"} 249.0
topic_load_times_count{cluster="standalone"} 6.0
topic_load_times_sum{cluster="standalone"} 955.0
topic_load_times_created{cluster="standalone"} 1.671240308864E9

However, these metrics do not tell us whether any topics failed to load due to
ZooKeeper/BookKeeper problems.

It would be better to add a new metric for failed topic load operations so that
users can set up alerts based on it.

Solution

Add a topic_load_failed_count metric.
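
To make the idea concrete, here is a minimal sketch of how such a counter could be maintained. It is not the broker's actual code: the class TopicLoadStats, its methods, and the assumption that a topic load is represented by a CompletableFuture are all hypothetical and only serve as illustration. The output line mirrors the cluster label used by topic_load_times above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper (not an existing Pulsar class): counts failed topic loads.
class TopicLoadStats {
    // LongAdder keeps increments cheap when many loads complete concurrently.
    private final LongAdder failedCount = new LongAdder();

    // Attaches a callback to a topic-load future (assumed representation) and
    // increments the counter if the future completes exceptionally.
    <T> CompletableFuture<T> track(CompletableFuture<T> loadFuture) {
        return loadFuture.whenComplete((topic, error) -> {
            if (error != null) {
                failedCount.increment();
            }
        });
    }

    long getFailedCount() {
        return failedCount.sum();
    }

    // Renders the counter as one Prometheus text-format line, using the same
    // cluster label as the existing topic_load_times metrics.
    String toPrometheusLine(String cluster) {
        return "topic_load_failed_count{cluster=\"" + cluster + "\"} "
                + (double) failedCount.sum();
    }
}
```

With a counter like this exposed, an alert could fire whenever the value increases over a given time window. The exact name, labels, and wiring would be settled in the proposal.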

Alternatives

No response

Anything else?

Metrics changes require a proposal.

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Metadata

Labels

Stale, type/enhancement
