Conversation
Signed-off-by: Urjit Patel <105218041+Uzziee@users.noreply.github.com>
gunnarmorling
left a comment
There was a problem hiding this comment.
Hey @Uzziee, a few comments/suggestions/questions inline.
proposals/012-hot-reload-feature.md
Outdated
| @@ -0,0 +1,905 @@ | |||
| # Hot Reload Feature | |||
|
|
|||
| As of today, any changes to virtual cluster configs (addition/removal/modification) require a full restart of kroxylicious app. This proposal is to add a dynamic reload feature, which will enable operators to modify virtual cluster configurations (add/remove/modify clusters) while **maintaining service availability for unaffected clusters** without the need for full application restarts. This feature will transform Kroxylicious from a **"restart-to-configure"** system to a **"live-reconfiguration"** system | |||
There was a problem hiding this comment.
| As of today, any changes to virtual cluster configs (addition/removal/modification) require a full restart of kroxylicious app. This proposal is to add a dynamic reload feature, which will enable operators to modify virtual cluster configurations (add/remove/modify clusters) while **maintaining service availability for unaffected clusters** without the need for full application restarts. This feature will transform Kroxylicious from a **"restart-to-configure"** system to a **"live-reconfiguration"** system | |
| As of today, any changes to virtual cluster configs (addition/removal/modification) require a full restart of kroxylicious app. This proposal is to add a dynamic reload feature, which will enable operators to modify virtual cluster configurations (add/remove/modify clusters) while **maintaining service availability for unaffected clusters** without the need for full application restarts. This feature will transform Kroxylicious from a **"restart-to-configure"** system to a **"live-reconfiguration"** system. |
proposals/012-hot-reload-feature.md
Outdated
| @@ -0,0 +1,905 @@ | |||
| # Hot Reload Feature | |||
|
|
|||
| As of today, any changes to virtual cluster configs (addition/removal/modification) require a full restart of kroxylicious app. This proposal is to add a dynamic reload feature, which will enable operators to modify virtual cluster configurations (add/remove/modify clusters) while **maintaining service availability for unaffected clusters** without the need for full application restarts. This feature will transform Kroxylicious from a **"restart-to-configure"** system to a **"live-reconfiguration"** system | |||
There was a problem hiding this comment.
Minor: I recommend to start each sentence on a new line (or even wrap after a line of thought in a longer sentence). That way, comments can be made more targeted rather than on a larger chunk of text at once.
proposals/012-hot-reload-feature.md
Outdated
|
|
||
| # Part 1: Configuration Change Detection Framework | ||
|
|
||
| With this framework, kroxylicious will be able to detect config file changes (using standard fileWatcher service) and using various detector interfaces, it will figure out which virtual clusters are added/removed or modified. The list of affected clusters will be then passed on to the Part 2 of this feature, where the clusters would be gracefully restarted (or rollbacked to previous stable state in case of any failures ) |
There was a problem hiding this comment.
| With this framework, kroxylicious will be able to detect config file changes (using standard fileWatcher service) and using various detector interfaces, it will figure out which virtual clusters are added/removed or modified. The list of affected clusters will be then passed on to the Part 2 of this feature, where the clusters would be gracefully restarted (or rollbacked to previous stable state in case of any failures ) | |
| With this framework, Kroxylicious will be able to detect config file changes (using standard fileWatcher service) and using various detector interfaces, it will figure out which virtual clusters are added/removed or modified. The list of affected clusters will be then passed on to the Part 2 of this feature, where the clusters would be gracefully restarted (or rollbacked to previous stable state in case of any failures ) |
proposals/012-hot-reload-feature.md
Outdated
|
|
||
| # Part 1: Configuration Change Detection Framework | ||
|
|
||
| With this framework, kroxylicious will be able to detect config file changes (using standard fileWatcher service) and using various detector interfaces, it will figure out which virtual clusters are added/removed or modified. The list of affected clusters will be then passed on to the Part 2 of this feature, where the clusters would be gracefully restarted (or rollbacked to previous stable state in case of any failures ) |
There was a problem hiding this comment.
So this still mentions using the file watcher for detecting changes, wheras I think the discussion is converging around using an HTTP endpoint to do so? Are you planning to update the proposal in that regard, @Uzziee?
In general, I agree that watching the file system seems not the best approach, in particular for containerized environments, where I'd expect the config to be passed in via a config map. If a config map changes, I'm not sure whether the file system watcher would pick this up (instantly)? Having a HTTP-based trigger to enforce re-reading the config seems like the right thing to do.
proposals/012-hot-reload-feature.md
Outdated
|
|
||
| With this framework, kroxylicious will be able to detect config file changes (using standard fileWatcher service) and using various detector interfaces, it will figure out which virtual clusters are added/removed or modified. The list of affected clusters will be then passed on to the Part 2 of this feature, where the clusters would be gracefully restarted (or rollbacked to previous stable state in case of any failures ) | ||
|
|
||
| POC PR - https://github.com/kroxylicious/kroxylicious/pull/2901 |
There was a problem hiding this comment.
Closed by now, wanna link to the new one?
proposals/012-hot-reload-feature.md
Outdated
| public record ConfigurationChangeContext( | ||
| Configuration oldConfig, | ||
| Configuration newConfig, | ||
| List<VirtualClusterModel> oldModels, | ||
| List<VirtualClusterModel> newModels | ||
| ) {} |
There was a problem hiding this comment.
How will that accommodate changes to config parts other than clusters?
There was a problem hiding this comment.
This will also include filter changes, I forgot to add it as part of this proposal as it was added later
proposals/012-hot-reload-feature.md
Outdated
| public record ChangeResult( | ||
| List<String> clustersToRemove, | ||
| List<String> clustersToAdd, | ||
| List<String> clustersToModify | ||
| ) {} |
There was a problem hiding this comment.
How will that accommodate changes to config parts other than clusters?
There was a problem hiding this comment.
This will also include filter changes, I forgot to add it as part of this proposal as it was added later
proposals/012-hot-reload-feature.md
Outdated
|
|
||
| ## Flow diagram | ||
|
|
||
| <img width="2354" height="1394" alt="Image" src="https://github.com/user-attachments/assets/19a7edba-8881-4fa7-98c8-defcdbb132da" /> |
There was a problem hiding this comment.
This doesn't render.
| <img width="2354" height="1394" alt="Image" src="https://github.com/user-attachments/assets/19a7edba-8881-4fa7-98c8-defcdbb132da" /> | |
| [Flow Diagram](https://github.com/user-attachments/assets/19a7edba-8881-4fa7-98c8-defcdbb132da) |
proposals/012-hot-reload-feature.md
Outdated
|
|
||
|
|
||
|
|
||
| # Part 2: Graceful Virtual Cluster Restart |
There was a problem hiding this comment.
On a high level, I think the flow should be this:
- Make changes to the config (e.g. remounting a K8s config map with the file)
- Trigger a reload via an HTTP endpoint, which causes the file to be re-read
- Identify changes to all parts of the config, exposing them in a generic way (e.g. as a list of change descriptors, each comprising the id of affected config node (cluster id, filter id, etc.) and the old and new config of that node (either one of old/new or both may be present, based on the type of change, i.e. create/update/removal)
- Pass these change descriptors to a pluggable set of change handlers; initially, we might only have an handler for cluster changes, while all other changes would be ignored or even rejected)
proposals/012-hot-reload-feature.md
Outdated
| public class ConnectionTracker { | ||
|
|
||
| // Downstream connections (client → proxy) | ||
| private final Map<String, AtomicInteger> downstreamConnections = new ConcurrentHashMap<>(); |
There was a problem hiding this comment.
| private final Map<String, AtomicInteger> downstreamConnections = new ConcurrentHashMap<>(); | |
| private final ConcurrentMap<String, AtomicInteger> downstreamConnections = new ConcurrentHashMap<>(); |
(Here and elsewhere)
Signed-off-by: Urjit Patel <105218041+Uzziee@users.noreply.github.com>
Rewrite the hot reload proposal to focus on architectural decisions rather than implementation detail. The PR discussion has established consensus on several key points that the document didn't reflect: - Reframe around applyConfiguration(Configuration) as the core API, decoupled from trigger mechanisms (HTTP, file watcher, operator) - Remove all Java class implementations and handler chains — these belong in the code PR where they're reviewable in context - Call out remove+add with brief per-cluster downtime as deliberate - Call out all-or-nothing rollback as the initial default, consistent with startup behaviour - Move ReloadOptions to deployment-level static configuration rather than per-call parameters - Identify plugin resource tracking as a known gap with pointer to separate proposal - Flag open questions (config granularity, failure behaviour options, drain timeout configurability) - Defer trigger mechanism design as explicit future work Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
- Fix summary to read as proposed behaviour, not existing - Use "administrators" instead of "operators" for humans to avoid confusion with the Kubernetes operator process - Fix filter config examples (KMS endpoint, key selection pattern) - Clarify failure behaviour is consistent across trigger mechanisms - Note thundering herd as a known trade-off of remove+add - Fix "original proposal" to "earlier iterations" Assisted-by: Claude claude-opus-4-6 <noreply@anthropic.com> Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
No description provided.