
hot reload feature proposal #1

Open

Uzziee wants to merge 5 commits into main from
hot-reload-proposal

Conversation

Owner

@Uzziee Uzziee commented Jan 28, 2026

No description provided.

Signed-off-by: Urjit Patel <105218041+Uzziee@users.noreply.github.com>

@gunnarmorling gunnarmorling left a comment


Hey @Uzziee, a few comments/suggestions/questions inline.

@@ -0,0 +1,905 @@
# Hot Reload Feature

As of today, any changes to virtual cluster configs (addition/removal/modification) require a full restart of kroxylicious app. This proposal is to add a dynamic reload feature, which will enable operators to modify virtual cluster configurations (add/remove/modify clusters) while **maintaining service availability for unaffected clusters** without the need for full application restarts. This feature will transform Kroxylicious from a **"restart-to-configure"** system to a **"live-reconfiguration"** system


Suggested change
As of today, any changes to virtual cluster configs (addition/removal/modification) require a full restart of kroxylicious app. This proposal is to add a dynamic reload feature, which will enable operators to modify virtual cluster configurations (add/remove/modify clusters) while **maintaining service availability for unaffected clusters** without the need for full application restarts. This feature will transform Kroxylicious from a **"restart-to-configure"** system to a **"live-reconfiguration"** system
As of today, any changes to virtual cluster configs (addition/removal/modification) require a full restart of kroxylicious app. This proposal is to add a dynamic reload feature, which will enable operators to modify virtual cluster configurations (add/remove/modify clusters) while **maintaining service availability for unaffected clusters** without the need for full application restarts. This feature will transform Kroxylicious from a **"restart-to-configure"** system to a **"live-reconfiguration"** system.


Minor: I recommend to start each sentence on a new line (or even wrap after a line of thought in a longer sentence). That way, comments can be made more targeted rather than on a larger chunk of text at once.


# Part 1: Configuration Change Detection Framework

With this framework, kroxylicious will be able to detect config file changes (using standard fileWatcher service) and using various detector interfaces, it will figure out which virtual clusters are added/removed or modified. The list of affected clusters will be then passed on to the Part 2 of this feature, where the clusters would be gracefully restarted (or rollbacked to previous stable state in case of any failures )


Suggested change
With this framework, kroxylicious will be able to detect config file changes (using standard fileWatcher service) and using various detector interfaces, it will figure out which virtual clusters are added/removed or modified. The list of affected clusters will be then passed on to the Part 2 of this feature, where the clusters would be gracefully restarted (or rollbacked to previous stable state in case of any failures )
With this framework, Kroxylicious will be able to detect config file changes (using standard fileWatcher service) and using various detector interfaces, it will figure out which virtual clusters are added/removed or modified. The list of affected clusters will be then passed on to the Part 2 of this feature, where the clusters would be gracefully restarted (or rollbacked to previous stable state in case of any failures )



So this still mentions using the file watcher for detecting changes, whereas I think the discussion is converging around using an HTTP endpoint to do so? Are you planning to update the proposal in that regard, @Uzziee?

In general, I agree that watching the file system seems not the best approach, in particular for containerized environments, where I'd expect the config to be passed in via a config map. If a config map changes, I'm not sure whether the file system watcher would pick this up (instantly)? Having an HTTP-based trigger to enforce re-reading the config seems like the right thing to do.



POC PR - https://github.com/kroxylicious/kroxylicious/pull/2901


Closed by now, wanna link to the new one?

Comment on lines +284 to +289
public record ConfigurationChangeContext(
Configuration oldConfig,
Configuration newConfig,
List<VirtualClusterModel> oldModels,
List<VirtualClusterModel> newModels
) {}


How will that accommodate changes to config parts other than clusters?

Owner Author


This will also include filter changes; I forgot to include that in this proposal, as it was added later.

Comment on lines +294 to +298
public record ChangeResult(
List<String> clustersToRemove,
List<String> clustersToAdd,
List<String> clustersToModify
) {}


How will that accommodate changes to config parts other than clusters?

Owner Author


This will also include filter changes; I forgot to include that in this proposal, as it was added later.
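For illustration, one way the `ChangeResult` record quoted above could grow to carry filter changes alongside cluster changes. This is a sketch only; the filter fields and the helper methods are hypothetical and not part of the proposal:

```java
import java.util.List;

// Hypothetical extension of the proposal's ChangeResult record; the filter
// fields and helper methods below are illustrative, not in the codebase.
record ChangeResult(
        List<String> clustersToRemove,
        List<String> clustersToAdd,
        List<String> clustersToModify,
        List<String> filtersToRemove,
        List<String> filtersToAdd,
        List<String> filtersToModify
) {
    static ChangeResult empty() {
        return new ChangeResult(List.of(), List.of(), List.of(), List.of(), List.of(), List.of());
    }

    // True if the reload needs to touch any cluster or filter at all.
    boolean hasChanges() {
        return !(clustersToRemove.isEmpty() && clustersToAdd.isEmpty() && clustersToModify.isEmpty()
                && filtersToRemove.isEmpty() && filtersToAdd.isEmpty() && filtersToModify.isEmpty());
    }
}
```

A drawback of this shape is that each new config node type adds three more fields, which is part of why the discussion below converges on generic change descriptors instead.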


## Flow diagram

<img width="2354" height="1394" alt="Image" src="https://github.com/user-attachments/assets/19a7edba-8881-4fa7-98c8-defcdbb132da" />


This doesn't render.

Suggested change
<img width="2354" height="1394" alt="Image" src="https://github.com/user-attachments/assets/19a7edba-8881-4fa7-98c8-defcdbb132da" />
[Flow Diagram](https://github.com/user-attachments/assets/19a7edba-8881-4fa7-98c8-defcdbb132da)




# Part 2: Graceful Virtual Cluster Restart


On a high level, I think the flow should be this:

  • Make changes to the config (e.g. remounting a K8s config map with the file)
  • Trigger a reload via an HTTP endpoint, which causes the file to be re-read
  • Identify changes to all parts of the config, exposing them in a generic way (e.g. as a list of change descriptors, each comprising the id of the affected config node (cluster id, filter id, etc.) and the old and new config of that node; either one of old/new or both may be present, based on the type of change, i.e. create/update/removal)
  • Pass these change descriptors to a pluggable set of change handlers; initially, we might only have a handler for cluster changes, while all other changes would be ignored or even rejected
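The change-descriptor idea in the steps above could be sketched roughly like this. All names here are illustrative, not part of the existing codebase; `C` stands in for whatever config node type is affected (virtual cluster, filter, ...):

```java
import java.util.Optional;

// Hypothetical sketch of a generic change descriptor; none of these names
// exist in the Kroxylicious codebase.
record ChangeDescriptor<C>(
        String nodeId,          // e.g. a cluster id or a filter id
        Optional<C> oldConfig,  // empty for a create
        Optional<C> newConfig   // empty for a removal
) {
    enum Kind { CREATE, UPDATE, REMOVE }

    // The kind of change is implied by which sides are present.
    Kind kind() {
        if (oldConfig().isEmpty()) {
            return Kind.CREATE;
        }
        if (newConfig().isEmpty()) {
            return Kind.REMOVE;
        }
        return Kind.UPDATE;
    }
}
```

A pluggable handler would then only need to accept a `List<ChangeDescriptor<?>>` and skip (or reject) node types it does not understand, which keeps the detection framework decoupled from cluster-specific restart logic.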

public class ConnectionTracker {

// Downstream connections (client → proxy)
private final Map<String, AtomicInteger> downstreamConnections = new ConcurrentHashMap<>();


Suggested change
private final Map<String, AtomicInteger> downstreamConnections = new ConcurrentHashMap<>();
private final ConcurrentMap<String, AtomicInteger> downstreamConnections = new ConcurrentHashMap<>();

(Here and elsewhere)
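To illustrate the suggestion: declaring the field as `ConcurrentMap` rather than `Map` documents the thread-safety contract at the type level and guarantees atomicity for methods like `computeIfAbsent`, which is only best-effort on the plain `Map` interface. A minimal sketch (class and method names hypothetical, not the proposal's actual `ConnectionTracker`):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only, not the proposal's ConnectionTracker.
class ConnectionTrackerSketch {
    // Declared as ConcurrentMap so callers can rely on atomic operations.
    private final ConcurrentMap<String, AtomicInteger> downstreamConnections = new ConcurrentHashMap<>();

    // computeIfAbsent is atomic on a ConcurrentMap, so two threads racing on a
    // new cluster id still end up sharing a single counter.
    int connectionOpened(String clusterId) {
        return downstreamConnections
                .computeIfAbsent(clusterId, id -> new AtomicInteger())
                .incrementAndGet();
    }

    int connectionClosed(String clusterId) {
        AtomicInteger counter = downstreamConnections.get(clusterId);
        return counter == null ? 0 : counter.decrementAndGet();
    }
}
```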

Uzziee and others added 4 commits January 29, 2026 09:45
Signed-off-by: Urjit Patel <105218041+Uzziee@users.noreply.github.com>
Rewrite the hot reload proposal to focus on architectural decisions
rather than implementation detail. The PR discussion has established
consensus on several key points that the document didn't reflect:

- Reframe around applyConfiguration(Configuration) as the core API,
  decoupled from trigger mechanisms (HTTP, file watcher, operator)
- Remove all Java class implementations and handler chains — these
  belong in the code PR where they're reviewable in context
- Call out remove+add with brief per-cluster downtime as deliberate
- Call out all-or-nothing rollback as the initial default, consistent
  with startup behaviour
- Move ReloadOptions to deployment-level static configuration rather
  than per-call parameters
- Identify plugin resource tracking as a known gap with pointer to
  separate proposal
- Flag open questions (config granularity, failure behaviour options,
  drain timeout configurability)
- Defer trigger mechanism design as explicit future work

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
- Fix summary to read as proposed behaviour, not existing
- Use "administrators" instead of "operators" for humans to avoid
  confusion with the Kubernetes operator process
- Fix filter config examples (KMS endpoint, key selection pattern)
- Clarify failure behaviour is consistent across trigger mechanisms
- Note thundering herd as a known trade-off of remove+add
- Fix "original proposal" to "earlier iterations"

Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>