feat(breeze-buddy): add global pickup rate alert background task #717

Devansh-1218 wants to merge 1 commit.
Conversation
Walkthrough

Introduces a new pickup rate alert system comprising runtime-configurable alert enablement and thresholds, a calculator computing call-based and lead-based pickup rates from database records, a background monitor performing threshold checks with Redis deduplication, and scheduler task initialization integrated into the FastAPI startup sequence.
Sequence Diagram

```mermaid
sequenceDiagram
    participant FastAPI as FastAPI Startup
    participant Task as Task Initializer
    participant Config as Dynamic Config
    participant Scheduler as Background Scheduler
    participant Monitor as PickupRateMonitor
    participant DB as Database
    participant Redis as Redis
    participant Slack as Slack API
    FastAPI->>Task: initialize_pickup_rate_tasks()
    Task->>Config: ENABLE_PICKUP_RATE_ALERT()
    Config-->>Task: enabled: bool
    alt Alert Enabled
        Task->>Config: PICKUP_RATE_ALERT_INTERVAL_SECONDS()
        Config-->>Task: interval: int
        Task->>Scheduler: register_task(pickup_rate_monitor.check_and_alert, interval)
        Scheduler-->>Task: success
        Note over Scheduler: Task scheduled
    else Alert Disabled
        Task-->>FastAPI: early exit
    end
    Note over Scheduler: On scheduled interval...
    Scheduler->>Monitor: check_and_alert()
    Monitor->>Config: load_config()
    Config-->>Monitor: AlertConfig
    Monitor->>DB: fetch call trackers & lead analytics
    DB-->>Monitor: metrics data
    Monitor->>Monitor: compute pickup rates
    alt Threshold Breached
        Monitor->>Redis: check_dedup_key()
        Redis-->>Monitor: exists?
        alt No Recent Alert
            Monitor->>Slack: send_alert(payload)
            Slack-->>Monitor: success
            Monitor->>Redis: set_dedup_key(TTL)
            Redis-->>Monitor: done
        else Recent Alert Exists
            Note over Monitor: Skip (deduplicated)
        end
    else Below Threshold
        Note over Monitor: No alert needed
    end
```
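The startup flow in the diagram above can be sketched roughly as follows. This is a minimal, hedged illustration — `register_task`, `dynamic_config`, and the function bodies are stand-ins, not the repository's real API:

```python
# Illustrative sketch of initialize_pickup_rate_tasks() as described in
# the sequence diagram. All names here are hypothetical stand-ins.

registered_tasks = []

def register_task(func, interval_seconds):
    # Stand-in for the background scheduler's registration call
    registered_tasks.append((func.__name__, interval_seconds))

dynamic_config = {
    "ENABLE_PICKUP_RATE_ALERT": True,
    "PICKUP_RATE_ALERT_INTERVAL_SECONDS": 86400,
}

def check_and_alert():
    pass  # periodic threshold check runs here on each scheduled tick

def initialize_pickup_rate_tasks():
    if not dynamic_config["ENABLE_PICKUP_RATE_ALERT"]:
        return False  # early exit when alerting is disabled
    register_task(
        check_and_alert,
        dynamic_config["PICKUP_RATE_ALERT_INTERVAL_SECONDS"],
    )
    return True

scheduled = initialize_pickup_rate_tasks()
```

As the diagram shows, the dynamic config is consulted once at startup; the disabled branch exits before any scheduler interaction.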
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pull request overview
Adds a new Breeze Buddy background task to monitor recent pickup rates (call-based and lead-based) and send Slack warnings when rates fall below a configurable threshold.
Changes:
- Introduces the `app/services/pickup_rate/` module containing config, rate calculation, monitoring + Slack alerting, and task registration.
- Adds dynamic config getters for enabling the alert, setting the check interval, and setting the threshold.
- Registers the pickup-rate monitor task during the `app/main.py` lifespan startup (after Langfuse task registration).
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| app/services/pickup_rate/task.py | Registers the pickup-rate monitor task with the background scheduler using dynamic config. |
| app/services/pickup_rate/monitor.py | Implements the periodic check, threshold evaluation, Redis dedup, and Slack alert payload construction. |
| app/services/pickup_rate/config.py | Adds an AlertConfig dataclass (global scope now; merchant/reseller scaffolding for Phase 2). |
| app/services/pickup_rate/calculator.py | Computes call-based and lead-based pickup rates for the configured time window. |
| app/services/pickup_rate/__init__.py | Exposes the service’s public API (monitor singleton + initialization). |
| app/main.py | Wires pickup-rate task initialization into the app lifespan background task setup. |
| app/core/config/dynamic.py | Adds dynamic getters for enable flag, interval, and threshold for pickup-rate alerting. |
> The ENABLE_PICKUP_RATE_ALERT and PICKUP_RATE_ALERT_INTERVAL_SECONDS values
> are fetched from dynamic config (Redis/DevCycle) so they can be toggled
> at runtime without a redeploy. SLACK_WEBHOOK_URL is static and must be
> present for alerts to have any effect.
The docstring implies PICKUP_RATE_ALERT_INTERVAL_SECONDS can be changed at runtime without redeploy, but the scheduler interval is fixed at registration time. Changing the dynamic interval later won’t change how often the task runs until the process restarts (only enabled is effectively runtime-togglable with the current scheduler). Consider clarifying the comment or implementing re-registration / dynamic interval handling.
Suggested change:

```diff
-The ENABLE_PICKUP_RATE_ALERT and PICKUP_RATE_ALERT_INTERVAL_SECONDS values
-are fetched from dynamic config (Redis/DevCycle) so they can be toggled
-at runtime without a redeploy. SLACK_WEBHOOK_URL is static and must be
-present for alerts to have any effect.
+ENABLE_PICKUP_RATE_ALERT and PICKUP_RATE_ALERT_INTERVAL_SECONDS are read
+from dynamic config (Redis/DevCycle) when this initializer runs. The task
+is only registered if alerts are enabled, and the configured interval is
+applied at registration time. Changing the interval later will not affect
+an already-registered task unless it is re-registered or the process
+restarts. SLACK_WEBHOOK_URL is static and must be present for alerts to
+have any effect.
```
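One way to make the interval genuinely runtime-dynamic, as the comment suggests, is to re-read the dynamic value between ticks instead of freezing it at registration. The sketch below is a hedged illustration under assumed names (`get_interval_seconds` stands in for the real dynamic-config getter; a real loop would sleep for the returned number of seconds):

```python
# Hypothetical sketch: re-read the interval on every tick so a runtime
# config change takes effect without re-registration or a restart.

config = {"PICKUP_RATE_ALERT_INTERVAL_SECONDS": 2}

def get_interval_seconds():
    return config["PICKUP_RATE_ALERT_INTERVAL_SECONDS"]

def run_ticks(n, task):
    """Run `task` n times, re-reading the interval after every tick."""
    observed = []
    for _ in range(n):
        task()
        observed.append(get_interval_seconds())  # fresh value each tick
    return observed

runs = []
intervals = run_ticks(2, lambda: runs.append(1))
config["PICKUP_RATE_ALERT_INTERVAL_SECONDS"] = 5  # changed at runtime
intervals += run_ticks(1, lambda: runs.append(1))
```

With the current scheduler this pattern would require the scheduler itself to support per-tick interval lookup; otherwise clarifying the docstring is the lighter fix.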
> Edge-case handling:
> - Zero calls / leads in window → skip (no data to evaluate)
> - Redis unavailable → fail-open: log warning, allow alert to fire
The module docstring says “Redis unavailable → fail-open: … allow alert to fire”, but the BackgroundTaskScheduler requires Redis to acquire its distributed lock (scheduler.py:_execute_task). If Redis is down, the task won’t execute at all, so alerts won’t fire. Please adjust the docstring (and/or the PR description) to reflect the actual behavior, or change the scheduler/locking strategy if fail-open execution is a requirement.
Suggested change:

```diff
-- Redis unavailable → fail-open: log warning, allow alert to fire
+- Redis unavailable → local dedup/mark-alerted operations fail open
+  (log warning / continue), but scheduled execution
+  still depends on BackgroundTaskScheduler Redis locking
```
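The fail-open dedup behavior itself (as distinct from the scheduler's Redis lock) can be sketched as below. This is an illustration only — `redis_get` stands in for the real Redis client call:

```python
# Minimal sketch of fail-open deduplication: a Redis outage on the dedup
# lookup should not suppress the alert, only duplicate it at worst.

def should_alert(redis_get, key):
    try:
        return redis_get(key) is None  # alert only if no recent marker
    except ConnectionError:
        # Fail open: prefer a possible duplicate alert over a silently
        # dropped one when Redis is unreachable.
        return True

def down(_key):
    raise ConnectionError("redis unavailable")

recent = {"pickup_rate:alerted": "1"}
suppressed = should_alert(recent.get, "pickup_rate:alerted")  # marker exists
allowed = should_alert(down, "pickup_rate:alerted")           # fail open
```

Note this only covers the dedup check; if the scheduler cannot acquire its distributed lock, this code never runs, which is exactly the gap the review comment points out.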
```python
start_date = now - timedelta(hours=config.lookback_hours)
```
PR description says the lookback window is derived from interval_seconds, but the code uses config.lookback_hours (defaulting to 24) and load_config() never sets it from the interval. Either derive lookback_hours from the interval in load_config() or update the PR description/config docs to match the implemented fixed 24h window.
```python
try:
    # ------------------------------------------------------------------
    # 1. Call-based metrics
    # ------------------------------------------------------------------
    # Phase 2 NOTE: pass reseller_id / merchant_id to the accessor once
    # those optional params are supported by get_all_lead_call_trackers().
    call_trackers = await get_all_lead_call_trackers(
        start_date=start_date,
        end_date=end_date,
    )

    calls_attempted = 0
    calls_no_answer = 0

    if call_trackers:
        for tracker, _provider in call_trackers:
            if tracker.status and tracker.status.value == "FINISHED":
                calls_attempted += 1
            outcome_value = tracker.outcome if tracker.outcome else None
            if outcome_value == "NO_ANSWER":
                calls_no_answer += 1

    calls_picked = calls_attempted - calls_no_answer
    call_pickup_rate = (
        round(calls_picked / calls_attempted * 100, 1)
        if calls_attempted > 0
        else 0.0
    )

    # ------------------------------------------------------------------
    # 2. Lead-based metrics
    # ------------------------------------------------------------------
    # Phase 2 NOTE: pass reseller_id / merchant_id once supported.
    lead_data = await get_lead_based_analytics(
        start_date=start_date,
        end_date=end_date,
    )
```
compute_pickup_rates() documents returning None on DB error, but the underlying accessors (get_all_lead_call_trackers / get_lead_based_analytics) catch exceptions and return [] on failure (see app/database/accessor/breeze_buddy/lead_call_tracker.py). That means DB failures will be indistinguishable from “no data”, and the monitor will silently skip alerting rather than treating it as an error. Consider either letting the accessors propagate errors / return None on failure, or updating the docs + monitor logic to reflect the current behavior.
```python
calls_picked = calls_attempted - calls_no_answer
call_pickup_rate = (
    round(calls_picked / calls_attempted * 100, 1)
```
Pickup rates are rounded to 1 decimal before being used for threshold comparisons. This can change alert behavior around the threshold boundary (e.g., 39.96% rounds to 40.0% and won’t alert for a 40% threshold). Prefer keeping the unrounded float for evaluation and only rounding/formatting when rendering logs/Slack text.
Suggested change:

```diff
-round(calls_picked / calls_attempted * 100, 1)
+calls_picked / calls_attempted * 100
```
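A worked example of the boundary effect described above — the numbers are illustrative, chosen to sit just under a 40% threshold:

```python
# Evaluating the rounded rate can mask a genuine threshold breach.

threshold = 40.0
calls_picked, calls_attempted = 999, 2500  # true rate: 39.96%

raw_rate = calls_picked / calls_attempted * 100
rounded_rate = round(raw_rate, 1)  # 40.0 after rounding to 1 decimal

breaches_raw = raw_rate < threshold          # the real breach
breaches_rounded = rounded_rate < threshold  # hidden by rounding
```

Comparing on `raw_rate` and rounding only when formatting the log line or Slack text avoids the missed alert.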
Actionable comments posted: 1
🧹 Nitpick comments (3)
app/services/pickup_rate/monitor.py (2)

94-96: Redundant second `datetime.now(timezone.utc)`.

`now` is already computed at line 45; recomputing `now_display` a few ms later just for the date string is pointless and can (at window boundaries) disagree with the `now` used for the lookback window. Reuse `now`.

```diff
-    now_display = datetime.now(timezone.utc)
-    display_date = now_display.strftime("%d-%m-%y")
+    display_date = now.strftime("%d-%m-%y")
```

242-264: Unused parameter `interval_seconds` in `_should_alert`.

`interval_seconds` is never referenced in the body — dedup is driven entirely by the TTL set in `_mark_alerted`. Drop the parameter (and the corresponding argument at line 83) to avoid giving the impression it's used.

app/services/pickup_rate/task.py (1)

32-42: Runtime enablement requires a restart.

`ENABLE_PICKUP_RATE_ALERT` is read only at startup; if the flag is flipped from `false` → `true` in Redis later, the task will not get registered until the next deploy/restart. The monitor's own `check_and_alert` already re-checks `config.enabled` per tick, so consider always registering the task when `SLACK_WEBHOOK_URL` is configured and letting the monitor's runtime check gate execution. This makes the dynamic flag behave truly "dynamic" in both directions.
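The register-always, gate-per-tick pattern suggested for task.py can be sketched as below. `dynamic_config` is an illustrative stand-in for the real dynamic-config getters:

```python
# Sketch: register the task unconditionally (given a webhook is
# configured) and re-check the dynamic flag on every tick, so flipping
# ENABLE_PICKUP_RATE_ALERT works in both directions at runtime.

dynamic_config = {"ENABLE_PICKUP_RATE_ALERT": False}
alerts = []

def check_and_alert():
    if not dynamic_config["ENABLE_PICKUP_RATE_ALERT"]:
        return  # gate re-evaluated per tick, not just at startup
    alerts.append("fired")

check_and_alert()  # flag off: no-op
dynamic_config["ENABLE_PICKUP_RATE_ALERT"] = True  # flipped at runtime
check_and_alert()  # flag on: fires without a restart
```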
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/services/pickup_rate/calculator.py`:
- Around line 65-81: calls_no_answer is counted across all trackers while
calls_attempted only counts FINISHED trackers, which can make calls_picked
negative; update the loop in the pickup rate calculation to only increment
calls_no_answer when tracker.status and tracker.status.value == "FINISHED" (same
filter used for calls_attempted) so both counters operate over the same set;
adjust the outcome check (outcome_value == "NO_ANSWER") to be nested under the
FINISHED-status branch and keep the rest of the logic computing calls_picked and
call_pickup_rate unchanged; apply the same fix pattern in
ScoreMonitor._get_daily_call_stats to ensure consistency.
---
Nitpick comments:
In `@app/services/pickup_rate/monitor.py`:
- Around line 94-96: The code recomputes datetime.now(timezone.utc) into
now_display before building display_date/title, which can diverge from the
earlier now used for the lookback; replace use of now_display with the
already-computed now (i.e., remove now_display and set display_date =
now.strftime("%d-%m-%y") and use that for title) so the alert date uses the same
timestamp as the rest of the function (referencing the variables now,
display_date, title and the call datetime.now(timezone.utc) to remove).
- Around line 242-264: The _should_alert function currently accepts an unused
parameter interval_seconds; remove interval_seconds from the _should_alert
signature and update all call sites that pass that argument (the caller that
currently supplies interval_seconds) to call _should_alert with only redis_key,
and ensure any references to interval_seconds inside or related logic remain
handled by _mark_alerted's TTL logic; specifically modify function _should_alert
and its invocation(s) so dedup is driven solely by the Redis key existence check
and no unused parameter is left.
In `@app/services/pickup_rate/task.py`:
- Around line 32-42: The startup-only check of ENABLE_PICKUP_RATE_ALERT prevents
runtime toggling; change task registration in pickup_rate.task to not gate
registration on ENABLE_PICKUP_RATE_ALERT — only require SLACK_WEBHOOK_URL to be
present to register the PickupRateMonitor task — and rely on
PickupRateMonitor.check_and_alert to call/read ENABLE_PICKUP_RATE_ALERT (or
config.enabled) on each tick to decide whether to run alerts; locate the
registration logic that references ENABLE_PICKUP_RATE_ALERT and
SLACK_WEBHOOK_URL and remove the early return based on ENABLE_PICKUP_RATE_ALERT,
ensuring the monitor itself performs the runtime check.
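The calculator counting fix named in the prompt above — restricting both counters to FINISHED trackers so `calls_picked` can never go negative — can be sketched as follows. Trackers are modeled as plain `(status, outcome)` tuples for illustration; the real code uses ORM objects:

```python
# Sketch of the corrected loop: the NO_ANSWER check is nested under the
# FINISHED-status branch so both counters operate over the same set.

def count_calls(trackers):
    calls_attempted = calls_no_answer = 0
    for status, outcome in trackers:
        if status != "FINISHED":
            continue  # unfinished calls excluded from both counters
        calls_attempted += 1
        if outcome == "NO_ANSWER":
            calls_no_answer += 1
    return calls_attempted, calls_attempted - calls_no_answer

trackers = [
    ("FINISHED", "ANSWERED"),
    ("FINISHED", "NO_ANSWER"),
    ("IN_PROGRESS", "NO_ANSWER"),  # previously inflated calls_no_answer
]
attempted, picked = count_calls(trackers)
```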
📒 Files selected for processing (7)

- app/core/config/dynamic.py
- app/main.py
- app/services/pickup_rate/__init__.py
- app/services/pickup_rate/calculator.py
- app/services/pickup_rate/config.py
- app/services/pickup_rate/monitor.py
- app/services/pickup_rate/task.py
Adds a periodic background task that monitors call pickup rates and sends
Slack warning alerts when rates drop below a configurable threshold.
- Add app/services/pickup_rate/ module with:
  - config.py: AlertConfig dataclass (enabled, interval, threshold, alert_type, scope) with per-merchant fields stubbed for Phase 2 extension
  - calculator.py: compute_pickup_rates() mirroring ScoreMonitor._get_daily_call_stats() for both call-based and lead-based rate calculation
  - monitor.py: PickupRateMonitor class with threshold evaluation, Redis-based dedup (fail-open), and inline slack_alert.send() for alerts
  - task.py: initialize_pickup_rate_tasks() following langfuse task registration pattern
- Add 3 dynamic config getters in dynamic.py: ENABLE_PICKUP_RATE_ALERT, PICKUP_RATE_ALERT_INTERVAL_SECONDS, PICKUP_RATE_ALERT_THRESHOLD
- Register task in main.py lifespan after Langfuse task registration
Edge cases handled: zero calls/leads skipped, Redis unavailable fails open,
DB errors skip the cycle, Slack failures skip mark-alerted for retry.
Lookback window derived from interval_seconds (no separate config needed).
Per-merchant alerting scaffolded for Phase 2 (call_execution_config + DB filters).
Force-pushed: 8727f84 to 7a70e93
```python
rates = await compute_pickup_rates(
    start_date=start_date,
    end_date=now,
    reseller_id=config.reseller_id,
    merchant_id=config.merchant_id,
)

if rates is None:
    logger.error("PickupRateMonitor: DB query failed, skipping alert cycle")
    return

# Skip when there is no data – avoids false alerts on idle systems
if rates["calls_attempted"] == 0 and rates["total_leads"] == 0:
    logger.info(
        "PickupRateMonitor: no calls or leads in window, skipping alert"
```
rates is None is currently not a reliable signal of DB failure. Both get_all_lead_call_trackers() and get_lead_based_analytics() catch exceptions and return [] on error (see app/database/accessor/breeze_buddy/lead_call_tracker.py), so compute_pickup_rates() will typically return a dict (with zeros) even when the DB query fails, causing this monitor to silently treat failures as “no data” and skip alerting.
Suggested fix: either (a) change the accessors to propagate exceptions / return None on failure (optionally behind a flag so existing callers keep current behavior), or (b) have compute_pickup_rates() use lower-level query calls that can surface exceptions and return None so this check cycle is correctly skipped and logged as an error.
Suggested change:

```diff
-        rates = await compute_pickup_rates(
-            start_date=start_date,
-            end_date=now,
-            reseller_id=config.reseller_id,
-            merchant_id=config.merchant_id,
-        )
-
-        if rates is None:
-            logger.error("PickupRateMonitor: DB query failed, skipping alert cycle")
-            return
-
-        # Skip when there is no data – avoids false alerts on idle systems
-        if rates["calls_attempted"] == 0 and rates["total_leads"] == 0:
-            logger.info(
-                "PickupRateMonitor: no calls or leads in window, skipping alert"
+        try:
+            rates = await compute_pickup_rates(
+                start_date=start_date,
+                end_date=now,
+                reseller_id=config.reseller_id,
+                merchant_id=config.merchant_id,
+            )
+        except Exception:
+            logger.exception(
+                "PickupRateMonitor: pickup rate calculation failed, skipping alert cycle"
+            )
+            return
+
+        if rates is None:
+            logger.error("PickupRateMonitor: DB query failed, skipping alert cycle")
+            return
+
+        # A fully empty aggregate is ambiguous here: downstream accessors may
+        # collapse DB failures into empty results that compute_pickup_rates()
+        # turns into zero-valued metrics. Fail closed and skip the cycle
+        # instead of treating this as a confirmed no-data window.
+        if (
+            rates["calls_attempted"] == 0
+            and rates["total_leads"] == 0
+            and rates["call_pickup_rate"] == 0
+            and rates["lead_pickup_rate"] == 0
+        ):
+            logger.error(
+                "PickupRateMonitor: received an empty pickup-rate aggregate; "
+                "this may represent masked DB/query failure, so skipping alert cycle"
```
```python
    alert_type="BOTH",
    scope="global",
    lookback_hours=max(1, interval_seconds // 3600),
)
```
Risk: Lookback window and dedup TTL can diverge for sub-hour intervals.

`lookback_hours = max(1, interval_seconds // 3600)` floors to a minimum of 1h, while the dedup TTL is `interval_seconds`.

If `PICKUP_RATE_ALERT_INTERVAL_SECONDS = 1800` (30 min):
- lookback = max(1, 0) = 1 hour
- dedup_ttl = 30 minutes

The next alert cycle at +30 min queries a 1h window that overlaps almost entirely with the previous cycle, so the alert fires again on the same underperformance data and engineers get 2× alerts per actual event. This is fine for the current default (86400 s = 24h window, 24h dedup), but becomes confusing if someone sets a shorter interval.

Fix: either enforce `dedup_ttl = lookback_hours * 3600` so dedup always covers the full window, or don't floor the lookback (allow sub-1h lookback when interval < 3600 s).
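The first suggested fix — deriving the dedup TTL from the same lookback computation — can be sketched as:

```python
# Sketch: tie the dedup TTL to the lookback window (not the raw interval)
# so overlapping query windows cannot re-alert on the same data.

def lookback_hours(interval_seconds: int) -> int:
    return max(1, interval_seconds // 3600)  # current flooring behavior

def dedup_ttl_seconds(interval_seconds: int) -> int:
    # TTL always covers the full lookback window.
    return lookback_hours(interval_seconds) * 3600

thirty_min = dedup_ttl_seconds(1800)  # lookback floors to 1h, so TTL = 3600 s
daily = dedup_ttl_seconds(86400)      # 24h window, TTL = 86400 s
```

With this coupling, the 30-minute-interval case dedups for the full 1h window it queries, eliminating the double-alert.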
DEVPROOF: [Image]