Skip to content

ROCK-8447 Optimizations to reduce load when a server restarts#214

Open
stphnlee wants to merge 3 commits into
masterfrom
sl-bugfix-checkin-cache-only
Open

ROCK-8447 Optimizations to reduce load when a server restarts#214
stphnlee wants to merge 3 commits into
masterfrom
sl-bugfix-checkin-cache-only

Conversation

@stphnlee
Copy link
Copy Markdown
Contributor

What changed and why

CheckinCache.cs

Change Effect
IsReadyToPublish gate Suppresses all bus publishing until IsRockStarted is True() plus a 30-second grace period. The recycling server silently hydrates its cache from the DB and consumes messages from other nodes, without disturbing them.
Clear() rewritten Removes items locally in a loop but publishes only 1 message (the clear-all with key=null) instead of N+1 messages. This alone eliminates the biggest message amplification.

CheckinCacheConsumer.cs

Change Effect
IsFromSelf(IRockMessage) check Skips messages this node sent. The local cache was already updated before publishing, so processing your own message was doubling every operation.
Surgical AllKeys updates Instead of calling RockCache.Remove(allKeysListCacheKey) on every message (which forces a keyFactory() DB call on next access), the consumer now adds/removes the single key directly in the cached list. The AllKeys list only gets invalidated on errors or full clears.

Copilot AI review requested due to automatic review settings April 14, 2026 21:08
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Optimizes the FamilyCheckin cache bus behavior to reduce message amplification and DB load during server restarts and normal cache churn.

Changes:

  • Adds a warm-up gate to suppress cache bus publishing until Rock is started plus a grace period.
  • Reworks cache clearing to publish a single “clear-all” message instead of per-item messages.
  • Updates the cache consumer to skip self-sent messages and to update the cached AllKeys list incrementally instead of invalidating it.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
Plugins/org.secc.FamilyCheckin/Cache/CheckinCacheConsumer.cs Skips self-sent messages and performs “surgical” AllKeys list maintenance on update/remove to avoid invalidation/DB refresh.
Plugins/org.secc.FamilyCheckin/Cache/CheckinCache.cs Adds warm-up publish suppression and changes Clear() to local-remove + single clear-all publish.

Comment on lines 155 to +165
public static void Clear( Func<List<string>> keyFactory )
{
UpdateKeys( keyFactory );

// Create a copy of the keys to avoid collection modification during enumeration
// Remove each cached item locally without publishing per-item messages
foreach ( var key in AllKeys().ToList() )
{
FlushItem( key, keyFactory );
RockCache.Remove( key );
}

// Clear the AllKeys list
RockCache.Remove( AllKey, AllRegion );

Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clear(Func<List<string>> keyFactory) no longer uses keyFactory and only iterates over the current cached AllKeys() list. If the AllKeys entry is missing/stale (e.g., removed due to a prior error or eviction), this loop can be empty and the local cache items will not be cleared—especially since the consumer now skips self-sent clear-all messages. Consider refreshing keys via UpdateKeys(keyFactory) when AllKeys() is empty, or use RockCache.ClearCachedItemsForType(typeof(T)) to ensure the local clear is complete, and then publish the single clear-all message.

Copilot uses AI. Check for mistakes.
Comment thread Plugins/org.secc.FamilyCheckin/Cache/CheckinCache.cs Outdated
Comment thread Plugins/org.secc.FamilyCheckin/Cache/CheckinCacheConsumer.cs Outdated
Comment thread Plugins/org.secc.FamilyCheckin/Cache/CheckinCacheConsumer.cs Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

Plugins/org.secc.FamilyCheckin/Cache/CheckinCache.cs:144

  • In Remove(), the cache publishes the bus message before updating the local AllKeys list. With the new consumer behavior that skips self-sent messages, this node will never process its own remove message, so AllKeys can remain stale and still include the removed key until a later refresh. Update the local key list (UpdateKeys/removeKey) before publishing (or ensure local key removal happens independently of self-message processing).
        public static void Remove( string qualifiedKey, Func<List<string>> keyFactory )
        {
            RockCache.Remove( qualifiedKey );
            PublishCacheUpdateMessage( qualifiedKey, default( T ) );
            UpdateKeys( keyFactory, removeKey: qualifiedKey );
        }

Comment on lines +26 to +35
// Warm-up: suppress publishing until this node is fully online.
// Anchored to the time this type was first loaded (i.e. app start),
// NOT to the first publish attempt, so the grace period cannot
// fire long after the startup burst has already passed.
// Override via web.config appSettings: <add key="CheckinCacheWarmUpSeconds" value="30" />
private static readonly DateTime _typeLoadedUtc = DateTime.UtcNow;
private static readonly TimeSpan WarmUpGracePeriod = TimeSpan.FromSeconds(
int.TryParse( System.Configuration.ConfigurationManager.AppSettings["CheckinCacheWarmUpSeconds"], out int configuredSeconds )
? configuredSeconds
: 30 );
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The warm-up timestamp is stored in a static field on the generic type (CheckinCache), which is initialized when each closed generic type is first used—not necessarily at app start. This can unintentionally suppress publishing for WarmUpGracePeriod the first time a particular T is touched later in runtime (i.e., delayed suppression), contradicting the comment and potentially hiding updates. Consider anchoring the warm-up start to a non-generic, process-wide startup timestamp (shared across all T) or explicitly to Rock's application start time if available.

Copilot uses AI. Check for mistakes.
Comment thread Plugins/org.secc.FamilyCheckin/Cache/CheckinCache.cs Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

{
RockCache.Remove( qualifiedKey );
PublishCacheUpdateMessage( qualifiedKey, default( T ) );
UpdateKeys( keyFactory, removeKey: qualifiedKey );
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove() now calls UpdateKeys() before publishing the bus message. If keyFactory/UpdateKeys throws (DB/transient issues), the method will exit without publishing the remove, leaving other nodes’ caches stale. Consider publishing the remove regardless of key list update success (e.g., publish before updating keys, or wrap UpdateKeys in a try/catch so publish still happens).

Suggested change
UpdateKeys( keyFactory, removeKey: qualifiedKey );
try
{
UpdateKeys( keyFactory, removeKey: qualifiedKey );
}
catch ( Exception ex )
{
Rock.Model.ExceptionLogService.LogException(
new Exception( $"Failed to update cache keys for removed key {qualifiedKey}: {ex.Message}", ex ) );
}

Copilot uses AI. Check for mistakes.
Comment on lines 213 to 242
// If within throttle window and we have existing keys
if ( currentKeys.Any() && ( now - _lastKeysRefreshUtc ) < KeysRefreshInterval )
{
bool modified = false;

// Copy-on-write: clone the list so threads currently
// enumerating the original reference are not affected.
var updatedKeys = new List<string>( currentKeys );

// Ensure the specified key is present
if ( !string.IsNullOrEmpty( ensureKey ) && !currentKeys.Contains( ensureKey ) )
if ( !string.IsNullOrEmpty( ensureKey ) && !updatedKeys.Contains( ensureKey ) )
{
currentKeys.Add( ensureKey );
updatedKeys.Add( ensureKey );
modified = true;
}

// Remove the specified key
if ( !string.IsNullOrEmpty( removeKey ) && currentKeys.Contains( removeKey ) )
if ( !string.IsNullOrEmpty( removeKey ) && updatedKeys.Contains( removeKey ) )
{
currentKeys.Remove( removeKey );
updatedKeys.Remove( removeKey );
modified = true;
}

if ( modified )
{
RockCache.AddOrUpdate( AllKey, AllRegion, currentKeys );
RockCache.AddOrUpdate( AllKey, AllRegion, updatedKeys );
}

return currentKeys;
return updatedKeys;
}
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In UpdateKeys(), the throttle-path always clones currentKeys into updatedKeys even when ensureKey/removeKey don’t result in any change. This creates unnecessary allocations under load (and returns a list instance that isn’t the cached reference when modified is false). Consider only cloning when you actually need to add/remove a key; otherwise return currentKeys directly.

Copilot uses AI. Check for mistakes.
Comment on lines 61 to 64
{
// Constants for the AllKeys list, matching CheckinCache<T>
string allKeysListCacheKey = $"{typeof( T ).Name}:All";
string allKeysListCacheRegion = "AllItems"; // This is the 'AllRegion' constant from CheckinCache<T>
string allKeysListCacheRegion = "AllItems";

Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allKeysListCacheRegion is hard-coded as the string "AllItems" here while CheckinCache<T> also defines the region name. This duplication makes it easy for producer/consumer to drift if the region constant ever changes. Consider exposing the region/key constants from CheckinCache<T> (e.g., internal const) or centralizing them in a shared helper so both sides reference the same value.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants