feat(breeze-buddy): add global pickup rate alert background task #717

Open
Devansh-1218 wants to merge 1 commit into juspay:release from Devansh-1218:pickup-rate-alerting
Conversation

Contributor

Devansh-1218 commented Apr 20, 2026

Adds a periodic background task that monitors call pickup rates and sends Slack warning alerts when rates drop below a configurable threshold.

  • Add app/services/pickup_rate/ module with:
    • config.py: AlertConfig dataclass (enabled, interval, threshold, alert_type, scope) with per-merchant fields stubbed for Phase 2 extension
    • calculator.py: compute_pickup_rates() mirroring ScoreMonitor._get_daily_call_stats() for both call-based and lead-based rate calculation
    • monitor.py: PickupRateMonitor class with threshold evaluation, Redis-based dedup (fail-open), and inline slack_alert.send() for alerts
    • task.py: initialize_pickup_rate_tasks() following langfuse task registration pattern
  • Add 3 dynamic config getters in dynamic.py: ENABLE_PICKUP_RATE_ALERT, PICKUP_RATE_ALERT_INTERVAL_SECONDS, PICKUP_RATE_ALERT_THRESHOLD
  • Register task in main.py lifespan after Langfuse task registration

Edge cases handled: zero calls/leads skipped, Redis unavailable fails open, DB errors skip the cycle, Slack failures skip mark-alerted for retry. Lookback window derived from interval_seconds (no separate config needed). Per-merchant alerting scaffolded for Phase 2 (call_execution_config + DB filters).
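For context, the three dynamic getters could look roughly like the sketch below. The _get_dynamic_value helper and the defaults are illustrative assumptions, not the actual contents of dynamic.py:

from typing import Any

async def _get_dynamic_value(key: str, default: Any) -> Any:
    # Stand-in for the real Redis/DevCycle lookup used by dynamic.py (name assumed).
    return default

async def ENABLE_PICKUP_RATE_ALERT() -> bool:
    # Master switch for the pickup-rate alert background task.
    return bool(await _get_dynamic_value("ENABLE_PICKUP_RATE_ALERT", False))

async def PICKUP_RATE_ALERT_INTERVAL_SECONDS() -> int:
    # How often the monitor runs; also used as the dedup TTL.
    return int(await _get_dynamic_value("PICKUP_RATE_ALERT_INTERVAL_SECONDS", 86400))

async def PICKUP_RATE_ALERT_THRESHOLD() -> float:
    # Pickup-rate percentage below which a Slack warning fires (illustrative default).
    return float(await _get_dynamic_value("PICKUP_RATE_ALERT_THRESHOLD", 40.0))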

DEVPROOF: [screenshot]

Summary by CodeRabbit

  • New Features
    • Added a pickup rate alert system that monitors call-based and lead-based pickup rates.
    • Alerts are triggered when pickup rates fall below the configured threshold, with Slack notifications.
    • Feature is runtime-configurable via Redis, allowing dynamic enable/disable and threshold adjustments without redeployment.
    • Includes duplicate alert prevention to avoid notification spam.

Copilot AI review requested due to automatic review settings April 20, 2026 07:01

coderabbitai Bot commented Apr 20, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: eb9e0da6-ebfe-47d4-8d80-deed731a2acc

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

Introduces a new pickup rate alert system comprising runtime-configurable alert enablement and thresholds, a calculator computing call and lead-based pickup rates from database records, a background monitor performing threshold checks with Redis deduplication, and scheduler task initialization integrated into the FastAPI startup sequence.

Changes

  • Dynamic Configuration (app/core/config/dynamic.py): Added three async Redis-backed configuration accessors: ENABLE_PICKUP_RATE_ALERT() (bool), PICKUP_RATE_ALERT_INTERVAL_SECONDS() (int), and PICKUP_RATE_ALERT_THRESHOLD() (float) with specified defaults.
  • Alert Configuration Model (app/services/pickup_rate/config.py): Introduced AlertConfig dataclass modeling alert enablement, evaluation cadence, thresholds, alert type, scope, and lookback window; includes __post_init__ for Redis dedup key computation and a property accessor.
  • Pickup Rate Calculation (app/services/pickup_rate/calculator.py): Added compute_pickup_rates() async function computing call-based and lead-based pickup rates from database call trackers and lead analytics; returns a metrics dictionary or None on exception.
  • Background Monitor (app/services/pickup_rate/monitor.py): Implemented PickupRateMonitor class with check_and_alert() method performing threshold evaluation, Redis deduplication, and Slack alert dispatch; includes helper methods for configuration loading and Redis key management.
  • Task Initialization & Wiring (app/services/pickup_rate/task.py, app/services/pickup_rate/__init__.py, app/main.py): Created initialize_pickup_rate_tasks() to conditionally register the monitor with the scheduler; exposed service exports via __init__.py; integrated task registration into FastAPI lifespan startup.
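To make the config.py entry above concrete, here is a minimal sketch of the AlertConfig shape it describes. The defaults, the Phase 2 stub fields, and the dedup key format are assumptions based on this PR's discussion, not the actual implementation:

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AlertConfig:
    enabled: bool = False
    interval_seconds: int = 86400
    threshold: float = 40.0            # illustrative default
    alert_type: str = "BOTH"           # call-based, lead-based, or both
    scope: str = "global"              # per-merchant scoping is Phase 2
    lookback_hours: int = 24
    reseller_id: Optional[str] = None  # Phase 2 stub
    merchant_id: Optional[str] = None  # Phase 2 stub
    _dedup_key: str = field(init=False, default="")

    def __post_init__(self) -> None:
        # Hypothetical key layout; the real dedup key may differ.
        self._dedup_key = f"pickup_rate_alert:{self.scope}:{self.alert_type}"

    @property
    def dedup_key(self) -> str:
        return self._dedup_key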

Sequence Diagram

sequenceDiagram
    participant FastAPI as FastAPI Startup
    participant Task as Task Initializer
    participant Config as Dynamic Config
    participant Scheduler as Background Scheduler
    participant Monitor as PickupRateMonitor
    participant DB as Database
    participant Redis as Redis
    participant Slack as Slack API

    FastAPI->>Task: initialize_pickup_rate_tasks()
    Task->>Config: ENABLE_PICKUP_RATE_ALERT()
    Config-->>Task: enabled: bool
    alt Alert Enabled
        Task->>Config: PICKUP_RATE_ALERT_INTERVAL_SECONDS()
        Config-->>Task: interval: int
        Task->>Scheduler: register_task(pickup_rate_monitor.check_and_alert, interval)
        Scheduler-->>Task: success
        Note over Scheduler: Task scheduled
    else Alert Disabled
        Task-->>FastAPI: early exit
    end
    
    Note over Scheduler: On scheduled interval...
    Scheduler->>Monitor: check_and_alert()
    Monitor->>Config: load_config()
    Config-->>Monitor: AlertConfig
    Monitor->>DB: fetch call trackers & lead analytics
    DB-->>Monitor: metrics data
    Monitor->>Monitor: compute pickup rates
    alt Threshold Breached
        Monitor->>Redis: check_dedup_key()
        Redis-->>Monitor: exists?
        alt No Recent Alert
            Monitor->>Slack: send_alert(payload)
            Slack-->>Monitor: success
            Monitor->>Redis: set_dedup_key(TTL)
            Redis-->>Monitor: done
        else Recent Alert Exists
            Note over Monitor: Skip (deduplicated)
        end
    else Below Threshold
        Note over Monitor: No alert needed
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hop hop, the alerts now hop so fast,
With Redis dedup, no spam shall last!
Calls picked, leads tracked, thresholds met,
A monitoring system, the best one yet! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately reflects the main change: adding a global pickup rate alert background task, which is the primary feature introduced across all modified and new files.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 93.33%, which is above the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Copilot AI left a comment


Pull request overview

Adds a new Breeze Buddy background task to monitor recent pickup rates (call-based and lead-based) and send Slack warnings when rates fall below a configurable threshold.

Changes:

  • Introduces app/services/pickup_rate/ module containing config, rate calculation, monitoring + Slack alerting, and task registration.
  • Adds dynamic config getters for enabling the alert, setting the check interval, and setting the threshold.
  • Registers the pickup-rate monitor task during app/main.py lifespan startup (after Langfuse task registration).

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Summary per file:
  • app/services/pickup_rate/task.py: Registers the pickup-rate monitor task with the background scheduler using dynamic config.
  • app/services/pickup_rate/monitor.py: Implements the periodic check, threshold evaluation, Redis dedup, and Slack alert payload construction.
  • app/services/pickup_rate/config.py: Adds an AlertConfig dataclass (global scope now; merchant/reseller scaffolding for Phase 2).
  • app/services/pickup_rate/calculator.py: Computes call-based and lead-based pickup rates for the configured time window.
  • app/services/pickup_rate/__init__.py: Exposes the service's public API (monitor singleton + initialization).
  • app/main.py: Wires pickup-rate task initialization into the app lifespan background task setup.
  • app/core/config/dynamic.py: Adds dynamic getters for enable flag, interval, and threshold for pickup-rate alerting.
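As a rough illustration of how the task.py and main.py wiring above fits together, here is a sketch. The register_task name comes from the sequence diagram; the module paths and the pickup_rate_monitor singleton import are assumptions:

# app/services/pickup_rate/task.py (sketch only)
from app.core.config.dynamic import (
    ENABLE_PICKUP_RATE_ALERT,
    PICKUP_RATE_ALERT_INTERVAL_SECONDS,
)
from app.services.pickup_rate.monitor import pickup_rate_monitor  # assumed export

async def initialize_pickup_rate_tasks(scheduler) -> None:
    # Mirrors the Langfuse task registration pattern: read dynamic config at
    # startup and register the monitor only when the feature flag is enabled.
    if not await ENABLE_PICKUP_RATE_ALERT():
        return
    interval = await PICKUP_RATE_ALERT_INTERVAL_SECONDS()
    scheduler.register_task(pickup_rate_monitor.check_and_alert, interval)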

Comment thread app/services/pickup_rate/task.py Outdated
Comment on lines +21 to +24
The ENABLE_PICKUP_RATE_ALERT and PICKUP_RATE_ALERT_INTERVAL_SECONDS values
are fetched from dynamic config (Redis/DevCycle) so they can be toggled
at runtime without a redeploy. SLACK_WEBHOOK_URL is static and must be
present for alerts to have any effect.

Copilot AI Apr 20, 2026


The docstring implies PICKUP_RATE_ALERT_INTERVAL_SECONDS can be changed at runtime without redeploy, but the scheduler interval is fixed at registration time. Changing the dynamic interval later won’t change how often the task runs until the process restarts (only enabled is effectively runtime-togglable with the current scheduler). Consider clarifying the comment or implementing re-registration / dynamic interval handling.

Suggested change
The ENABLE_PICKUP_RATE_ALERT and PICKUP_RATE_ALERT_INTERVAL_SECONDS values
are fetched from dynamic config (Redis/DevCycle) so they can be toggled
at runtime without a redeploy. SLACK_WEBHOOK_URL is static and must be
present for alerts to have any effect.
ENABLE_PICKUP_RATE_ALERT and PICKUP_RATE_ALERT_INTERVAL_SECONDS are read
from dynamic config (Redis/DevCycle) when this initializer runs. The task
is only registered if alerts are enabled, and the configured interval is
applied at registration time. Changing the interval later will not affect
an already-registered task unless it is re-registered or the process
restarts. SLACK_WEBHOOK_URL is static and must be present for alerts to
have any effect.

Contributor Author


DONE

Comment thread app/services/pickup_rate/monitor.py Outdated

Edge-case handling:
- Zero calls / leads in window → skip (no data to evaluate)
- Redis unavailable → fail-open: log warning, allow alert to fire

Copilot AI Apr 20, 2026


The module docstring says “Redis unavailable → fail-open: … allow alert to fire”, but the BackgroundTaskScheduler requires Redis to acquire its distributed lock (scheduler.py:_execute_task). If Redis is down, the task won’t execute at all, so alerts won’t fire. Please adjust the docstring (and/or the PR description) to reflect the actual behavior, or change the scheduler/locking strategy if fail-open execution is a requirement.

Suggested change
- Redis unavailable → fail-open: log warning, allow alert to fire
- Redis unavailable → local dedup/mark-alerted operations fail open
  (log warning / continue), but scheduled execution
  still depends on BackgroundTaskScheduler Redis locking

Contributor Author


DONE

Comment on lines +46 to +47
start_date = now - timedelta(hours=config.lookback_hours)


Copilot AI Apr 20, 2026


PR description says the lookback window is derived from interval_seconds, but the code uses config.lookback_hours (defaulting to 24) and load_config() never sets it from the interval. Either derive lookback_hours from the interval in load_config() or update the PR description/config docs to match the implemented fixed 24h window.
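If the description's intent is kept, the fix could be a small change in load_config(), roughly as sketched below, assuming it already reads the dynamic getters and builds an AlertConfig:

async def load_config() -> AlertConfig:
    # Sketch: derive the lookback window from the alert interval so the PR
    # description ("lookback derived from interval_seconds") and the code agree.
    interval_seconds = await PICKUP_RATE_ALERT_INTERVAL_SECONDS()
    return AlertConfig(
        enabled=await ENABLE_PICKUP_RATE_ALERT(),
        interval_seconds=interval_seconds,
        threshold=await PICKUP_RATE_ALERT_THRESHOLD(),
        lookback_hours=max(1, interval_seconds // 3600),  # floors sub-hour intervals to 1h
    )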

Contributor Author


DONE

Comment on lines +54 to +90
try:
    # ------------------------------------------------------------------
    # 1. Call-based metrics
    # ------------------------------------------------------------------
    # Phase 2 NOTE: pass reseller_id / merchant_id to the accessor once
    # those optional params are supported by get_all_lead_call_trackers().
    call_trackers = await get_all_lead_call_trackers(
        start_date=start_date,
        end_date=end_date,
    )

    calls_attempted = 0
    calls_no_answer = 0

    if call_trackers:
        for tracker, _provider in call_trackers:
            if tracker.status and tracker.status.value == "FINISHED":
                calls_attempted += 1
            outcome_value = tracker.outcome if tracker.outcome else None
            if outcome_value == "NO_ANSWER":
                calls_no_answer += 1

    calls_picked = calls_attempted - calls_no_answer
    call_pickup_rate = (
        round(calls_picked / calls_attempted * 100, 1)
        if calls_attempted > 0
        else 0.0
    )

    # ------------------------------------------------------------------
    # 2. Lead-based metrics
    # ------------------------------------------------------------------
    # Phase 2 NOTE: pass reseller_id / merchant_id once supported.
    lead_data = await get_lead_based_analytics(
        start_date=start_date,
        end_date=end_date,
    )

Copilot AI Apr 20, 2026


compute_pickup_rates() documents returning None on DB error, but the underlying accessors (get_all_lead_call_trackers / get_lead_based_analytics) catch exceptions and return [] on failure (see app/database/accessor/breeze_buddy/lead_call_tracker.py). That means DB failures will be indistinguishable from “no data”, and the monitor will silently skip alerting rather than treating it as an error. Consider either letting the accessors propagate errors / return None on failure, or updating the docs + monitor logic to reflect the current behavior.

Contributor Author


DONE

Comment thread app/services/pickup_rate/calculator.py Outdated

calls_picked = calls_attempted - calls_no_answer
call_pickup_rate = (
    round(calls_picked / calls_attempted * 100, 1)

Copilot AI Apr 20, 2026


Pickup rates are rounded to 1 decimal before being used for threshold comparisons. This can change alert behavior around the threshold boundary (e.g., 39.96% rounds to 40.0% and won’t alert for a 40% threshold). Prefer keeping the unrounded float for evaluation and only rounding/formatting when rendering logs/Slack text.

Suggested change
round(calls_picked / calls_attempted * 100, 1)
calls_picked / calls_attempted * 100
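A quick illustration of the boundary effect in plain Python (not project code):

threshold = 40.0
raw_rate = 39.96               # genuinely below the 40% threshold
rounded = round(raw_rate, 1)   # 40.0
print(raw_rate < threshold)    # True  -> should alert
print(rounded < threshold)     # False -> alert suppressed by rounding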

Contributor Author


DONE


coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
app/services/pickup_rate/monitor.py (2)

94-96: Redundant second datetime.now(timezone.utc).

now is already computed at line 45; recomputing now_display a few ms later just for the date string is pointless and can (at window boundaries) disagree with the now used for the lookback window. Reuse now.

-        now_display = datetime.now(timezone.utc)
-        display_date = now_display.strftime("%d-%m-%y")
+        display_date = now.strftime("%d-%m-%y")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/services/pickup_rate/monitor.py` around lines 94 - 96, The code
recomputes datetime.now(timezone.utc) into now_display before building
display_date/title, which can diverge from the earlier now used for the
lookback; replace use of now_display with the already-computed now (i.e., remove
now_display and set display_date = now.strftime("%d-%m-%y") and use that for
title) so the alert date uses the same timestamp as the rest of the function
(referencing the variables now, display_date, title and the call
datetime.now(timezone.utc) to remove).

242-264: Unused parameter interval_seconds in _should_alert.

interval_seconds is never referenced in the body — dedup is driven entirely by the TTL set in _mark_alerted. Drop the parameter (and the corresponding argument at line 83) to avoid giving the impression it's used.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/services/pickup_rate/monitor.py` around lines 242 - 264, The
_should_alert function currently accepts an unused parameter interval_seconds;
remove interval_seconds from the _should_alert signature and update all call
sites that pass that argument (the caller that currently supplies
interval_seconds) to call _should_alert with only redis_key, and ensure any
references to interval_seconds inside or related logic remain handled by
_mark_alerted's TTL logic; specifically modify function _should_alert and its
invocation(s) so dedup is driven solely by the Redis key existence check and no
unused parameter is left.
app/services/pickup_rate/task.py (1)

32-42: Runtime enablement requires a restart.

ENABLE_PICKUP_RATE_ALERT is read only at startup; if the flag is flipped from false → true in Redis later, the task will not get registered until the next deploy/restart. The monitor's own check_and_alert already re-checks config.enabled per tick, so consider always registering the task when SLACK_WEBHOOK_URL is configured and letting the monitor's runtime check gate execution. This makes the dynamic flag behave truly "dynamic" in both directions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/services/pickup_rate/task.py` around lines 32 - 42, The startup-only
check of ENABLE_PICKUP_RATE_ALERT prevents runtime toggling; change task
registration in pickup_rate.task to not gate registration on
ENABLE_PICKUP_RATE_ALERT — only require SLACK_WEBHOOK_URL to be present to
register the PickupRateMonitor task — and rely on
PickupRateMonitor.check_and_alert to call/read ENABLE_PICKUP_RATE_ALERT (or
config.enabled) on each tick to decide whether to run alerts; locate the
registration logic that references ENABLE_PICKUP_RATE_ALERT and
SLACK_WEBHOOK_URL and remove the early return based on ENABLE_PICKUP_RATE_ALERT,
ensuring the monitor itself performs the runtime check.
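For reference, the per-tick gate the comment relies on would look roughly like this inside check_and_alert(); this is a sketch, and the exact structure and logging in monitor.py may differ:

async def check_and_alert(self) -> None:
    # Re-read dynamic config every tick so ENABLE_PICKUP_RATE_ALERT can be
    # flipped in both directions at runtime; registration then only needs to
    # confirm SLACK_WEBHOOK_URL is configured.
    config = await self.load_config()
    if not config.enabled:
        logger.info("PickupRateMonitor: alerts disabled, skipping cycle")
        return
    # ... existing threshold evaluation, dedup, and Slack dispatch ...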
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/services/pickup_rate/calculator.py`:
- Around line 65-81: calls_no_answer is counted across all trackers while
calls_attempted only counts FINISHED trackers, which can make calls_picked
negative; update the loop in the pickup rate calculation to only increment
calls_no_answer when tracker.status and tracker.status.value == "FINISHED" (same
filter used for calls_attempted) so both counters operate over the same set;
adjust the outcome check (outcome_value == "NO_ANSWER") to be nested under the
FINISHED-status branch and keep the rest of the logic computing calls_picked and
call_pickup_rate unchanged; apply the same fix pattern in
ScoreMonitor._get_daily_call_stats to ensure consistency.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cdd7fe43-df46-40a8-9075-baece35111e0

📥 Commits

Reviewing files that changed from the base of the PR and between 1403006 and 8727f84.

📒 Files selected for processing (7)
  • app/core/config/dynamic.py
  • app/main.py
  • app/services/pickup_rate/__init__.py
  • app/services/pickup_rate/calculator.py
  • app/services/pickup_rate/config.py
  • app/services/pickup_rate/monitor.py
  • app/services/pickup_rate/task.py


Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Comment thread app/services/pickup_rate/monitor.py
Comment on lines +56 to +70
rates = await compute_pickup_rates(
    start_date=start_date,
    end_date=now,
    reseller_id=config.reseller_id,
    merchant_id=config.merchant_id,
)

if rates is None:
    logger.error("PickupRateMonitor: DB query failed, skipping alert cycle")
    return

# Skip when there is no data – avoids false alerts on idle systems
if rates["calls_attempted"] == 0 and rates["total_leads"] == 0:
    logger.info(
        "PickupRateMonitor: no calls or leads in window, skipping alert"

Copilot AI Apr 20, 2026


rates is None is currently not a reliable signal of DB failure. Both get_all_lead_call_trackers() and get_lead_based_analytics() catch exceptions and return [] on error (see app/database/accessor/breeze_buddy/lead_call_tracker.py), so compute_pickup_rates() will typically return a dict (with zeros) even when the DB query fails, causing this monitor to silently treat failures as “no data” and skip alerting.

Suggested fix: either (a) change the accessors to propagate exceptions / return None on failure (optionally behind a flag so existing callers keep current behavior), or (b) have compute_pickup_rates() use lower-level query calls that can surface exceptions and return None so this check cycle is correctly skipped and logged as an error.

Suggested change
rates = await compute_pickup_rates(
start_date=start_date,
end_date=now,
reseller_id=config.reseller_id,
merchant_id=config.merchant_id,
)
if rates is None:
logger.error("PickupRateMonitor: DB query failed, skipping alert cycle")
return
# Skip when there is no data – avoids false alerts on idle systems
if rates["calls_attempted"] == 0 and rates["total_leads"] == 0:
logger.info(
"PickupRateMonitor: no calls or leads in window, skipping alert"
try:
rates = await compute_pickup_rates(
start_date=start_date,
end_date=now,
reseller_id=config.reseller_id,
merchant_id=config.merchant_id,
)
except Exception:
logger.exception(
"PickupRateMonitor: pickup rate calculation failed, skipping alert cycle"
)
return
if rates is None:
logger.error("PickupRateMonitor: DB query failed, skipping alert cycle")
return
# A fully empty aggregate is ambiguous here: downstream accessors may
# collapse DB failures into empty results that compute_pickup_rates()
# turns into zero-valued metrics. Fail closed and skip the cycle
# instead of treating this as a confirmed no-data window.
if (
rates["calls_attempted"] == 0
and rates["total_leads"] == 0
and rates["call_pickup_rate"] == 0
and rates["lead_pickup_rate"] == 0
):
logger.error(
"PickupRateMonitor: received an empty pickup-rate aggregate; "
"this may represent masked DB/query failure, so skipping alert cycle"

Comment thread app/services/pickup_rate/task.py
alert_type="BOTH",
scope="global",
lookback_hours=max(1, interval_seconds // 3600),
)
Contributor


Risk: Lookback window and dedup TTL can diverge for sub-hour intervals

lookback_hours = max(1, interval_seconds // 3600) floors to minimum 1h. The dedup TTL is interval_seconds.

If PICKUP_RATE_ALERT_INTERVAL_SECONDS = 1800 (30 min):

  • lookback = max(1, 0) = 1 hour
  • dedup_ttl = 30 minutes

Next alert cycle at +30min queries a 1h window that overlaps almost entirely with the previous cycle. Alert fires again on the same underperformance data. Engineers get 2× alerts per actual event.

This is fine for the current default (86400s = 24h window, 24h dedup), but becomes confusing if someone sets a shorter interval.

Fix: either enforce dedup_ttl = lookback_hours * 3600 so dedup always covers the full window, or don't floor lookback (allow sub-1h lookback when interval < 3600s).
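A sketch of the first option, pinning the dedup TTL to the lookback window so it always covers the full query window. The _mark_alerted signature, the self.redis attribute, and the dedup_key property are assumptions; a redis-py style async client is assumed for set():

async def _mark_alerted(self, config: AlertConfig, interval_seconds: int) -> None:
    # Keep the dedup TTL pinned to the lookback window so overlapping windows
    # cannot re-alert on the same underperformance data.
    lookback_hours = max(1, interval_seconds // 3600)
    dedup_ttl_seconds = lookback_hours * 3600  # dedup always spans the queried window
    await self.redis.set(config.dedup_key, "1", ex=dedup_ttl_seconds)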

