fix: fetch all channel highlights (including retired) to prevent duplicate key error #3731

Closed
idoshamun wants to merge 2 commits into main from fix/channel-highlight-upsert-retired-posts

@idoshamun (Member)

Problem

replaceHighlightsForChannel in api-bg is throwing ~434 errors/6h on the opensource channel:

duplicate key value violates unique constraint "UQ_post_highlight_channel_post"
Key (channel, "postId")=(opensource, HYTRNkTpB) already exists.

Both api-bg pods (td8jq and tx8jf) are affected. Every scheduled run of api.generate-channel-highlight is failing with retries piling up.

Root Cause

replaceHighlightsForChannel only fetched active highlights (retiredAt IS NULL) to build its ID lookup map. When a previously retired post was re-selected by the evaluator, its existing row ID was unknown, so TypeORM's save() attempted an INSERT instead of an UPDATE — hitting the unique constraint on (channel, postId).
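
The failure mode can be reproduced with a small in-memory model. This is only a sketch: `HighlightTable` and `readmit` are hypothetical stand-ins, not the real entity, repository, or TypeORM itself. It assumes `save()` inserts when no `id` is given and updates otherwise, and that the table enforces uniqueness on (channel, postId).

```typescript
// Simplified in-memory stand-in for the highlight table (hypothetical, not TypeORM).
interface HighlightRow {
  id: number;
  channel: string;
  postId: string;
  retiredAt: Date | null;
}

class HighlightTable {
  private rows: HighlightRow[] = [];
  private nextId = 1;

  // In this model, a missing id means INSERT and a present id means UPDATE.
  save(row: Omit<HighlightRow, 'id'> & { id?: number }): HighlightRow {
    if (row.id === undefined) {
      const clash = this.rows.some(
        (r) => r.channel === row.channel && r.postId === row.postId,
      );
      if (clash) {
        // Stand-in for the unique constraint on (channel, postId).
        throw new Error(
          `duplicate key value violates unique constraint (${row.channel}, ${row.postId})`,
        );
      }
      const inserted: HighlightRow = { ...row, id: this.nextId++ };
      this.rows.push(inserted);
      return inserted;
    }
    const existing = this.rows.find((r) => r.id === row.id);
    if (!existing) throw new Error(`row ${row.id} not found`);
    Object.assign(existing, row);
    return existing;
  }

  find(channel: string, onlyActive: boolean): HighlightRow[] {
    return this.rows.filter(
      (r) => r.channel === channel && (!onlyActive || r.retiredAt === null),
    );
  }
}

// Builds the postId -> id lookup, then saves the post as an active highlight.
function readmit(
  table: HighlightTable,
  channel: string,
  postId: string,
  onlyActive: boolean,
): HighlightRow {
  const existingByPostId = new Map(
    table.find(channel, onlyActive).map((r) => [r.postId, r.id] as const),
  );
  return table.save({
    id: existingByPostId.get(postId),
    channel,
    postId,
    retiredAt: null,
  });
}

const table = new HighlightTable();
table.save({ channel: 'opensource', postId: 'HYTRNkTpB', retiredAt: new Date() });

let buggyError = '';
try {
  // Active-only lookup: the retired row's id is unknown, so this becomes an INSERT.
  readmit(table, 'opensource', 'HYTRNkTpB', true);
} catch (err) {
  buggyError = (err as Error).message;
}

// Full lookup (active + retired): the id is found, so this becomes an UPDATE.
const fixed = readmit(table, 'opensource', 'HYTRNkTpB', false);
```

With the active-only lookup the call throws the duplicate key error; with the full lookup the same row is updated and its `retiredAt` is cleared.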

Fix

Fetch all existing highlights for the channel (active + retired) so the existingByPostId map always contains the correct row ID. This turns the conflicting INSERT into an UPDATE that clears retiredAt back to null.

Also narrows the retirement filter to only retire currently-active highlights not in the incoming set.
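
A minimal sketch of the two rules described above, written as a pure function over already-fetched rows. The names `planHighlightChanges`, `ExistingHighlight`, and `IncomingItem` are assumptions for illustration and are not taken from the actual change; the real function works against the database and carries more fields.

```typescript
// Hypothetical shapes; the real entity has more columns (e.g. significance).
interface ExistingHighlight {
  id: string;
  postId: string;
  retiredAt: Date | null;
}

interface IncomingItem {
  postId: string;
  headline: string;
}

function planHighlightChanges(
  existing: ExistingHighlight[], // ALL rows for the channel, active and retired
  items: IncomingItem[],
  now: Date,
) {
  // Built from every row, so a retired post still resolves to its row id.
  const existingByPostId = new Map(
    existing.map((row) => [row.postId, row] as const),
  );
  const incomingPostIds = new Set(items.map((item) => item.postId));

  // Rows with a known id are updated (retiredAt reset to null); others are inserted.
  const toSave = items.map((item) => ({
    id: existingByPostId.get(item.postId)?.id,
    postId: item.postId,
    headline: item.headline,
    retiredAt: null,
  }));

  // Only currently-active rows that dropped out of the incoming set get retired.
  const toRetire = existing
    .filter((row) => row.retiredAt === null && !incomingPostIds.has(row.postId))
    .map((row) => ({ id: row.id, retiredAt: now }));

  return { toSave, toRetire };
}

const plan = planHighlightChanges(
  [
    { id: 'row-a', postId: 'a', retiredAt: null },
    { id: 'row-b', postId: 'b', retiredAt: new Date('2026-03-01') },
    { id: 'row-c', postId: 'c', retiredAt: new Date('2026-03-02') },
  ],
  [
    { postId: 'b', headline: 'Re-admitted post' },
    { postId: 'd', headline: 'Brand new post' },
  ],
  new Date('2026-03-23'),
);
```

In this example, `b` is saved with its existing id (an update), `d` has no id (an insert), `a` is retired, and `c` is left alone because it was already retired.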

Tests

Added __tests__/common/channelHighlightPublish.ts covering:

  • Basic insert of new highlights
  • Retiring removed highlights
  • Re-admitting a previously retired post (the regression case)
  • Cross-channel isolation
  • Empty items handling
  • Updating headline/significance on active posts

…icate key error

When replaceHighlightsForChannel only fetched active highlights (retiredAt IS NULL),
re-admitting a previously retired post would attempt an INSERT instead of an UPDATE,
violating the UQ_post_highlight_channel_post unique constraint.

Root cause: The existingByPostId map was built from only active records, so retired
posts had no known ID. TypeORM's save() with id=undefined performs INSERT.

Fix: Fetch ALL existing highlights for the channel (active + retired) so the map
always contains the correct row ID. This turns the INSERT into an UPDATE when a
retired post is re-selected, and clears its retiredAt back to null.

Also narrows the retirement filter to only retire currently-active highlights that
are no longer in the incoming set (previously retired posts should stay retired
unless re-admitted).

Adds dedicated tests for replaceHighlightsForChannel covering:
- Basic insert
- Retiring removed highlights
- Re-admitting a retired post (the regression case)
- Cross-channel isolation
- Empty items handling
- Updating headline/significance on active posts

pulumi bot commented Mar 23, 2026

🍹 The Update (preview) for dailydotdev/api/prod (at 1573ed4) was successful.

✨ Neo Explanation

This is a standard application rollout deploying a new build across all 7 services, 38 cron jobs, and triggering fresh database and ClickHouse migration jobs to run against the updated schema.

Root Cause Analysis

A new version of the API application has been built and is being deployed to production. Every workload in the cluster is being updated to run the new container image, and the versioned migration jobs from the previous release are being replaced with new ones for this release.

Dependency Chain

The new container image version cascades uniformly across all workloads:

  • 7 Deployments (API, background workers, WebSocket, private, Temporal, personalized digest, worker jobs) are all rolling out the new image
  • 38 CronJobs are updated to reference the new image so future scheduled runs use the latest code
  • 2 one-time migration Jobs (database and ClickHouse) from the previous release are deleted and replaced with new Jobs that run migrations against the new schema using the new image — this is the standard pattern for running migrations on each deploy

Risk Analysis

No stateful resources (databases, storage buckets, persistent volumes) are being replaced or deleted. The migration jobs are transient by design and follow the normal create/delete lifecycle per release. Deployments use rolling updates by default in Kubernetes, so there is no expected downtime.

Resource Changes

    Name                                                       Type                           Operation
~   vpc-native-update-current-streak-cron                      kubernetes:batch/v1:CronJob    update
~   vpc-native-update-trending-cron                            kubernetes:batch/v1:CronJob    update
~   vpc-native-generic-referral-reminder-cron                  kubernetes:batch/v1:CronJob    update
~   vpc-native-update-tags-str-cron                            kubernetes:batch/v1:CronJob    update
~   vpc-native-ws-deployment                                   kubernetes:apps/v1:Deployment  update
~   vpc-native-daily-digest-cron                               kubernetes:batch/v1:CronJob    update
~   vpc-native-channel-digests-cron                            kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-zombie-user-companies-cron                kubernetes:batch/v1:CronJob    update
~   vpc-native-update-highlighted-views-cron                   kubernetes:batch/v1:CronJob    update
-   vpc-native-api-clickhouse-migration-4847c30e               kubernetes:batch/v1:Job        delete
~   vpc-native-update-source-public-threshold-cron             kubernetes:batch/v1:CronJob    update
~   vpc-native-personalized-digest-cron                        kubernetes:batch/v1:CronJob    update
~   vpc-native-update-views-cron                               kubernetes:batch/v1:CronJob    update
~   vpc-native-user-profile-analytics-clickhouse-cron          kubernetes:batch/v1:CronJob    update
~   vpc-native-rotate-daily-quests-cron                        kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-expired-better-auth-sessions-cron         kubernetes:batch/v1:CronJob    update
+   vpc-native-api-clickhouse-migration-53a7329e               kubernetes:batch/v1:Job        create
~   vpc-native-clean-channel-highlights-cron                   kubernetes:batch/v1:CronJob    update
+   vpc-native-api-db-migration-53a7329e                       kubernetes:batch/v1:Job        create
~   vpc-native-calculate-top-readers-cron                      kubernetes:batch/v1:CronJob    update
~   vpc-native-check-analytics-report-cron                     kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-zombie-images-cron                        kubernetes:batch/v1:CronJob    update
~   vpc-native-update-tag-recommendations-cron                 kubernetes:batch/v1:CronJob    update
~   vpc-native-user-posts-analytics-refresh-cron               kubernetes:batch/v1:CronJob    update
~   vpc-native-user-profile-analytics-history-clickhouse-cron  kubernetes:batch/v1:CronJob    update
~   vpc-native-user-profile-updated-sync-cron                  kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-stale-user-transactions-cron              kubernetes:batch/v1:CronJob    update
~   vpc-native-private-deployment                              kubernetes:apps/v1:Deployment  update
~   vpc-native-channel-highlights-cron                         kubernetes:batch/v1:CronJob    update
~   vpc-native-post-analytics-clickhouse-cron                  kubernetes:batch/v1:CronJob    update
~   vpc-native-generate-search-invites-cron                    kubernetes:batch/v1:CronJob    update
~   vpc-native-hourly-notification-cron                        kubernetes:batch/v1:CronJob    update
~   vpc-native-worker-job-deployment                           kubernetes:apps/v1:Deployment  update
~   vpc-native-update-achievement-rarity-cron                  kubernetes:batch/v1:CronJob    update
~   vpc-native-personalized-digest-deployment                  kubernetes:apps/v1:Deployment  update
~   vpc-native-rotate-weekly-quests-cron                       kubernetes:batch/v1:CronJob    update
~   vpc-native-deployment                                      kubernetes:apps/v1:Deployment  update
~   vpc-native-bg-deployment                                   kubernetes:apps/v1:Deployment  update
~   vpc-native-clean-gifted-plus-cron                          kubernetes:batch/v1:CronJob    update
~   vpc-native-post-analytics-history-day-clickhouse-cron      kubernetes:batch/v1:CronJob    update
~   vpc-native-validate-active-users-cron                      kubernetes:batch/v1:CronJob    update
... and 10 other changes

@idoshamun idoshamun closed this Mar 23, 2026
@idoshamun idoshamun deleted the fix/channel-highlight-upsert-retired-posts branch March 23, 2026 07:37