Add JDBC client throttling — analysis document, implementation and documentation#523

Draft
Copilot wants to merge 20 commits into main from copilot/discuss-throttling-implementation

Conversation

Contributor

Copilot AI commented May 16, 2026

  • Merge main branch into this branch

Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/a19cc33b-051c-409c-b8cf-9f7e1c9a4e98

Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Copilot AI requested a review from rrobetti May 16, 2026 16:11
Contributor

@rrobetti rrobetti left a comment


Evaluate the following idea:
Add new information to the SessionInfo returned to clients. The extra info will be two new fields:
1 - the number of clients connected to that datasource with the same user and password (session hash)
2 - the maximum number of concurrent requests admitted by the server (max admission).

The OJP JDBC client shall use this info to create a semaphore and restrict how many concurrent requests it sends to the OJP server. Each client shall size its semaphore with the formula: (max admission of the OJP server divided by the number of clients of that datasource) multiplied by the number of OJP servers in the cluster.

…IVE_THROTTLING_ANALYSIS.md

Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/474a33bb-3629-4cce-9651-8df0768af306

Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Contributor Author

Copilot AI commented May 17, 2026

Evaluate the following idea:
Add new information to the SessionInfo returned to clients, the extra info will be 2 new fields:
1 - the number of clients connected to that datasource with same user and ...

Evaluated in commit 028c38e — a new section "Option: Server-Cooperative Fair-Share Throttling via SessionInfo" was added to CLIENT_REACTIVE_THROTTLING_ANALYSIS.md.

Summary of findings:

The formula works, but two things need to be pinned before implementation:

  1. Is maxAdmission per-node or cluster-aggregate? The formula multiplies by numOjpServers, so if maxAdmission is per-node the result is correct; if it's cluster-aggregate the multiplication is wrong. Must be decided upfront.
  2. What does clientCount count? It must be distinct clientUUID values (JVMs) per connHash, not raw session/connection count — otherwise a single JVM with a 50-connection pool would inflate the count by 50 and collapse everyone's limit.

Main risks identified:

  • Integer division wastes capacity: floor(20/7) = 2, leaving 6 slots permanently unused. Ceiling division + a 10% safety headroom is safer.
  • Thundering herd on client disconnect: When several clients drop, every remaining client gets a larger budget simultaneously and may burst together. A step-limited increase (AIMD-style) mitigates this.
  • Semaphore resizing is non-trivial: java.util.concurrent.Semaphore has no setPermits(). The driver must track currentLimit separately and safely release/acquire the delta.
  • Cross-node clientCount accuracy: In multinode mode, each node only sees its own clients. Recommended v1 approach: use per-node snapshot (conservative, slightly over-throttles), document it clearly, and add cluster-aggregate gossip in a later version.
  • In-transaction bypass: Mid-transaction statements must skip the local semaphore or they risk deadlocking while the server waits for commit/rollback.

Positive: clusterHealth already carries the node UP/DOWN list, so numOjpServers can be derived from it without a new field. maxAdmission = 0 can cleanly mean "admission control not configured → no client-side limit."

Overall this design is materially better than purely reactive for steady-state behavior. The protocol cost (two int32 fields on SessionInfo) is minimal.
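
For reference, the arithmetic discussed above can be sketched as follows. This is a minimal illustration, not the PR's implementation: the class and method names are hypothetical, and the order of operations (ceiling division per node, multiply by the number of servers, then keep 90% and round down, never below 1) is an assumption about how the ceiling division and the 10% headroom combine.

```java
/** Hypothetical helper; names are illustrative and not taken from the OJP code base. */
final class FairShareSketch {
    private FairShareSketch() {}

    /** Floor division, the variant the review flags as wasting capacity. */
    static int floorShare(int maxAdmission, int clientCount) {
        return maxAdmission / Math.max(1, clientCount);
    }

    /**
     * Ceiling division per node, scaled by the node count, then reduced by 10%.
     * maxAdmission == 0 is read as "admission control not configured", so no
     * client-side limit applies. The rounding and the minimum of 1 are assumptions.
     */
    static int fairShareLimit(int maxAdmission, int clientCount, int numOjpServers) {
        if (maxAdmission <= 0) {
            return Integer.MAX_VALUE;
        }
        long clients = Math.max(1, clientCount);
        long servers = Math.max(1, numOjpServers);
        long perNode = (maxAdmission + clients - 1) / clients; // ceiling division
        long withHeadroom = perNode * servers * 9 / 10;        // keep 90%, rounded down
        return (int) Math.max(1, Math.min(Integer.MAX_VALUE, withHeadroom));
    }
}
```

With maxAdmission = 20 and 7 clients, floor division gives 2 permits per client and leaves 20 - 14 = 6 slots idle. Under the assumptions above, the same clients on a three-node cluster would each get 8 permits (56 of 60 slots).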

Copilot AI changed the title Add server-triggered JDBC throttling options analysis document Add JDBC client throttling analysis documents May 17, 2026
Copilot AI requested a review from rrobetti May 17, 2026 07:54
@rrobetti
Contributor

@copilot
1 - I believe maxAdmission is per-node, check the code and confirm.
2 - clientUUID values (JVMs) per connHash is correct.

Use ceiling division + a 10% safety headroom and document the rationale.

Do include a step-limited increase (AIMD-style) and document the rationale.

Regarding "Semaphore resizing is non-trivial": is there any cheap (efficient) alternative to control concurrency that can be easily resized and that does not add latency to the flow?

"Cross-node clientCount accuracy": I do not understand what this means. Explain it in simple language with a simple example.

"In-transaction bypass": I am not sure this is a problem. Explain it in simple language with a simple example.

Keep only the analysis for now, do not change the code yet.

…ve, AIMD, plain-language examples

Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/221f3b7b-572e-4c3a-a6b3-19471bba8684

Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Contributor Author

Copilot AI commented May 17, 2026

@copilot
1 - I believe maxAdmission is per-node, check the code and confirm.
2 - clientUUID values (JVMs) per connHash is correct.
...

Updated in commit c9b629c. Here's what changed:

1. maxAdmission is per-node — confirmed.
SlotManager.totalSlots is set to actualPoolSize (the HikariCP connection pool size on this specific node). The formula multiplying by numOjpServers is therefore correct. This is now documented in the proposal section, marked resolved in Q1, and the Cons #4 "ambiguous" note has been replaced with a confirmation.

2. clientUUID per connHash — confirmed and resolved in C1.

3. Ceiling division + 10% headroom. S1 now explains the rationale: floor division leaves capacity permanently wasted; ceiling slightly over-allocates, which the 10% headroom compensates for by absorbing one stale clientCount error.

4. AIMD step-limited increase. S4 now has the full algorithm: decrease applies immediately (fast overload response), increase is capped at min(newLimit, currentLimit + 1) per SessionInfo update. Rationale is documented with an example showing how 4 simultaneous disconnects without AIMD would spike all remaining clients at once.

5. Semaphore alternative. The new Con #5 shows that a simple AtomicInteger counter + volatile int limit is the right choice for fail-fast (non-blocking) control. Resizing is a single volatile write — zero overhead on the happy path (one atomic increment + one comparison). No drain/inject logic needed.

6. Cross-node clientCount — plain-language example added. Con #8 now shows concretely: App1→Node A, App2→Node B, each node sees only its own client (clientCount = 1), both clients compute (10/1)*2 = 20 permits but total cluster capacity is only 20. Safe v1 approach: accept per-node snapshot; server's own SlotManager remains the final safety gate.

7. In-transaction bypass — plain-language example added. C5 now walks through the step-by-step scenario where Thread A holds an open transaction, the client semaphore fills, Thread A blocks waiting for a permit, and the server's transaction timeout fires before Thread A ever sends its next statement. Fix: skip the inFlight check when autoCommit == false.
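
The step-limited update from item 4 can be sketched as follows. This is an illustration under stated assumptions rather than the committed code: the class name `AimdLimitSketch` and its methods are hypothetical, updates are serialized with `synchronized` for simplicity, and the limit is clamped to at least 1.

```java
/** Hypothetical sketch of the AIMD rule: decrease at once, increase by at most one. */
final class AimdLimitSketch {
    // volatile so request threads can read the limit without locking
    private volatile int currentLimit;

    AimdLimitSketch(int initialLimit) {
        this.currentLimit = Math.max(1, initialLimit);
    }

    /** Applies a freshly computed target limit, for example after a SessionInfo update. */
    synchronized void onNewTarget(int newLimit) {
        int target = Math.max(1, newLimit);
        int current = currentLimit;
        if (target <= current) {
            currentLimit = target;                        // fast response to overload
        } else {
            currentLimit = Math.min(target, current + 1); // one step per update
        }
    }

    int currentLimit() {
        return currentLimit;
    }
}
```

Starting from a limit of 5, a new target of 20 moves the limit to 6, then 7 on the next update, while a target of 3 takes effect immediately. This is why several simultaneous disconnects would not let every remaining client jump to its larger budget at once.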

Contributor

@rrobetti rrobetti left a comment


@copilot we should have two configurable modes, proactive and reactive. In proactive mode we do the client throttling with the atomic integer and fail fast, as suggested. For reactive mode, I want to explore, and hear your opinions on, keeping a record of how many simultaneous connections the server was serving before one got an admission timeout. The goal is to send a smarter number to the client instead of the fixed total the server was configured to take; in real scenarios it might break before that. For example, if the OJP server was able to serve 50 simultaneous requests without errors but got an admission timeout error when it tried to serve 51, its limit is 50, and that is the number we should send to the client. Tell me your concerns, opinions, suggestions and questions about this new idea.

Contributor Author

Copilot AI commented May 17, 2026

@copilot we should have two modes configurable, proactive and reactive, in the proactive mode we do the client throttling with the atomic integer and fail fast as suggested, if reactive mode I want to ...

Evaluated and documented in commit 014e56e. The new sections added are:

Two-Mode Framework — a short overview table comparing proactive, reactive, and combined modes, showing what each requires and recommends.

Reactive Mode with observedPeak — Deep Analysis — full evaluation of the idea:

How it would work in SlotManager:

  • Add highWaterMark (tracks peak in-flight count) and observedPeak (initialized to maxAdmission)
  • On wait-timeout: observedPeak = max(floor, min(observedPeak, currentActive))
  • AIMD recovery: every totalSlots × 2 successful releases: observedPeak = min(maxAdmission, observedPeak + 1)
  • Sent as a third int32 field in SessionInfo; clients use it in place of maxAdmission in the fair-share formula

Main concerns raised:

  1. "False floor" collapse — a single slow query or DB blip can drive observedPeak to near-zero. Fix: 10% floor (max(1, totalSlots * 0.1)).
  2. Recovery rate tuning — too fast = burst risk, too slow = wasted capacity. K should be configurable.
  3. Timeout path ambiguity — only the semaphore wait-timeout is a clean signal; queue-depth rejections may produce misleading counts.
  4. SQS interaction — a slow-lane timeout ≠ total server overload; using activeSlow + activeFast total is safer than per-lane.
  5. Multiple concurrent timeouts race: AtomicInteger.updateAndGet with a pre-snapshotted currentActive handles this correctly.

Opinion (confidence: 85%): The idea is sound — it's the TCP CWND model applied to OJP. The "false floor" and recovery rate are the only significant risks and both have clear mitigations. The recommended v1 is combined mode: send both maxAdmission and observedPeak, driver uses min(proactiveLimit, reactiveLimit). Reactive-only (no clientCount tracking) is simpler and could be the right first cut if fairness between clients is not the immediate priority.
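
The observedPeak bookkeeping described above can be sketched as follows. This is a simplified illustration and not the SlotManager change itself: the class name `ObservedPeakSketch` is hypothetical, maxAdmission is taken to equal totalSlots, the floor uses integer division for `totalSlots * 0.1`, and `recoveryFactor` stands in for the configurable K (2 in the description above).

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of observedPeak tracking with a 10% floor and slow recovery. */
final class ObservedPeakSketch {
    private final int maxAdmission;    // assumed equal to totalSlots
    private final int floor;           // max(1, 10% of totalSlots)
    private final int releasesPerStep; // totalSlots * recoveryFactor
    private final AtomicInteger observedPeak;
    private final AtomicInteger releases = new AtomicInteger();

    ObservedPeakSketch(int totalSlots, int recoveryFactor) {
        this.maxAdmission = totalSlots;
        this.floor = Math.max(1, totalSlots / 10);
        this.releasesPerStep = Math.max(1, totalSlots * recoveryFactor);
        this.observedPeak = new AtomicInteger(totalSlots);
    }

    /** Called on a wait-timeout with the in-flight count snapshotted by the caller. */
    void onWaitTimeout(int currentActive) {
        observedPeak.updateAndGet(peak -> Math.max(floor, Math.min(peak, currentActive)));
    }

    /** Called on each successful release; raises the peak by one every releasesPerStep calls. */
    void onSuccessfulRelease() {
        if (releases.incrementAndGet() % releasesPerStep == 0) {
            observedPeak.updateAndGet(peak -> Math.min(maxAdmission, peak + 1));
        }
    }

    int observedPeak() {
        return observedPeak.get();
    }
}
```

With 100 slots, a timeout observed at 50 active requests lowers the value to 50, a later timeout at 3 active requests is clamped to the floor of 10, and 200 successful releases raise it back by one.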

Copilot AI requested a review from rrobetti May 17, 2026 09:50
Contributor

@rrobetti rrobetti left a comment


@copilot the documents are too verbose, reduce the volume of information and focus on the latest ideas and solutions we're discussing, keep notes of previous ideas considered and why they were dropped.

Contributor Author

Copilot AI commented May 17, 2026

@copilot the documents are too verbose, reduce the volume of information and focus on the latest ideas and solutions we're discussing, keep notes of previous ideas considered and why they were dropped...

Done in commit c265470.

  • CLIENT_REACTIVE_THROTTLING_ANALYSIS.md: 1012 → 177 lines. Now structured as: protocol additions → proactive mode (formula, AtomicInteger, AIMD, in-transaction bypass, cross-node caveat) → reactive mode (observedPeak, risks table, mitigations) → combined mode → open questions → summary table → "Dropped Approaches" (purely client-reactive, Semaphore, floor division — each with a one-line reason why dropped).
  • JDBC_SERVER_TRIGGERED_THROTTLING_OPTIONS.md: 222 → 74 lines. Now a concise options-considered log with a forward reference to the current design doc. Resolved option sets (signaling method, computation method, queue scope) are collapsed to one-liners showing what was chosen and why alternatives were skipped.

… by CLIENT_REACTIVE_THROTTLING_ANALYSIS.md

Agent-Logs-Url: https://github.com/Open-J-Proxy/ojp/sessions/8ea9c312-8c7e-4888-a908-9741462f67bf

Co-authored-by: rrobetti <7221783+rrobetti@users.noreply.github.com>
Copilot AI changed the title Add JDBC client throttling — analysis documents and implementation Add JDBC client throttling — analysis document and implementation May 19, 2026
* Update limits from a fresh SessionInfo.
* AIMD: decrease immediately; increase capped at currentLimit + 1.
*/
public void updateFromSessionInfo(SessionInfo sessionInfo) {
Contributor


@copilot does this run on every request response? I am worried about the performance impact

Contributor Author


No, updateFromSessionInfo does not run on every row. It runs when a new block of rows arrives from the server (at ResultSet block boundaries), after each executeUpdate, and after each PreparedStatement execution — not per individual row.

The cost per call is low: a few protobuf field reads, a String.split(";") over the short clusterHealth string, a handful of arithmetic operations, and two volatile writes. Amortized over potentially hundreds or thousands of rows per block, the overhead is negligible.

Contributor Author


updateFromSessionInfo is called only when Connection.setSession() is called — which happens on executeUpdate responses (the server returns a new SessionInfo in the OpResult) and on the initial connect. It does not run on every executeQuery response because queries return an iterator over results, not a SessionInfo.

When it does run, the cost is minimal: a few arithmetic operations (one Math.ceil, one multiply, one multiply by 0.9) plus two volatile int writes. The total overhead is in the low-nanosecond range — negligible compared to the network roundtrip to OJP server that already dominates query latency.

On the hot path (tryAcquire / release), called on every request, the cost is even cheaper: one AtomicInteger.get() + one compareAndSet — both single CPU instructions.
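
A minimal sketch of such a hot path, assuming a CAS loop over an AtomicInteger and a volatile limit, could look like the following. The class name `FailFastCounterSketch` and its methods are hypothetical and do not reproduce ClientThrottleManager. Under contention the compareAndSet can fail and the loop retries, so the per-call cost is small but not strictly constant.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical fail-fast limiter: never blocks, and resizing is one volatile write. */
final class FailFastCounterSketch {
    private final AtomicInteger inFlight = new AtomicInteger();
    private volatile int limit;

    FailFastCounterSketch(int limit) {
        this.limit = limit;
    }

    /** Returns true if a slot was taken, false if the caller should fail fast. */
    boolean tryAcquire() {
        while (true) {
            int current = inFlight.get();
            if (current >= limit) {
                return false; // at or above the limit: reject without waiting
            }
            if (inFlight.compareAndSet(current, current + 1)) {
                return true;
            }
            // another thread changed the counter first; retry with the fresh value
        }
    }

    /** Must be called exactly once for every successful tryAcquire(). */
    void release() {
        inFlight.decrementAndGet();
    }

    /** Resizing needs no drain or inject logic; in-flight requests simply finish. */
    void setLimit(int newLimit) {
        this.limit = newLimit;
    }

    int inFlight() {
        return inFlight.get();
    }
}
```

Lowering the limit below the current in-flight count does not interrupt anything: new acquisitions are rejected until enough requests have been released.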

ClientThrottleMode mode = this.connection.getThrottleMode();
// getAutoCommit() may throw SQLException; evaluate before acquiring a slot
// so that release() is never called without a matching acquire.
boolean inTransaction = !this.connection.getAutoCommit();
Contributor


@copilot is this call being sent to OJP Server or we keep a boolean in the OJP JDBC side?

Contributor Author


Purely local — no network call. Connection.getAutoCommit() simply returns this.autoCommit, a local boolean field that is kept in sync on the driver side when setAutoCommit(), commit(), or rollback() is called.

Contributor Author


getAutoCommit() reads a local boolean autoCommit field on the Connection object — no call to OJP server. The field is updated only when setAutoCommit() is called (which does make a gRPC call to start or commit a transaction), but reading it is a plain local field access. Zero network overhead.
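
To illustrate why the autoCommit flag is read before a slot is taken, here is a hedged sketch of the in-transaction bypass. The class `TransactionBypassSketch`, its `execute` method, and the use of IllegalStateException are assumptions made for this example; a real driver would more likely raise a SQLException subtype.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical wrapper showing the bypass: statements inside a transaction skip the limit. */
final class TransactionBypassSketch {
    private final AtomicInteger inFlight = new AtomicInteger();
    private final int limit;

    TransactionBypassSketch(int limit) {
        this.limit = limit;
    }

    <T> T execute(boolean autoCommit, Supplier<T> request) {
        // Decide first, so release only ever follows a successful acquire.
        boolean inTransaction = !autoCommit;
        boolean acquired = false;
        if (!inTransaction) {
            acquired = tryAcquire();
            if (!acquired) {
                throw new IllegalStateException("client throttle limit reached");
            }
        }
        try {
            return request.get();
        } finally {
            if (acquired) {
                inFlight.decrementAndGet();
            }
        }
    }

    private boolean tryAcquire() {
        while (true) {
            int current = inFlight.get();
            if (current >= limit) {
                return false;
            }
            if (inFlight.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    int inFlight() {
        return inFlight.get();
    }
}
```

With a limit of 1 and one autocommit request already in flight, a second autocommit request is rejected, while a statement belonging to an open transaction still goes through and can reach its commit or rollback.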

@rrobetti
Contributor

@copilot update relevant documentation including but not restricted to the eBook about this new throttling solution. Keep the explanation in simple language and provide examples. You can use mermaid diagram if required.

Contributor Author

Copilot AI commented May 20, 2026

@copilot update relevant documentation including but not restricted to the eBook about this new throttling solution. Keep the explanation in simple language and provide examples. You can use mermaid d...

Done in commit 57f4ce1. Documentation updated across all relevant areas:

New eBook chapter (part3-chapter8a-client-throttling.md):

  • Opens with a highway/tunnel analogy to set the context simply
  • Shows the overload cascade problem with a Mermaid sequence diagram (before)
  • Shows how throttling prevents it (after diagram)
  • Explains the three SessionInfo signals the server sends (maxAdmission, clientCount, observedPeak)
  • Walks through the fair-share formula step by step with a concrete example
  • Explains why ceiling division + 10% headroom (not floor division)
  • Shows how the fail-fast AtomicInteger counter works (code snippet)
  • Explains AIMD step-limited increase with the "4 clients disconnect simultaneously" example
  • Covers the in-transaction bypass and why it matters
  • Explains the three modes (proactive, reactive, combined) with a Mermaid decision diagram
  • Deep-dives on observedPeak with a numeric example
  • Configuration section with property table and per-datasource override
  • SQS interaction section with a layered architecture Mermaid diagram
  • Multinode behaviour and the v1 cross-node caveat
  • Monitoring and troubleshooting section

Updated existing docs:

  • part2-chapter5-jdbc-configuration.md — new Section 5.5 with property table and examples
  • part2-chapter6-server-configuration.md — new Section 6.8 covering server-side signals and observedPeakRecoveryFactor tuning
  • part3-chapter8-slow-query-segregation.md — updated closing paragraph with navigation link to chapter 8a
  • documents/ebook/README.md — chapter 8a added to table of contents and advanced users reading path
  • documents/configuration/ojp-jdbc-configuration.md: ojp.jdbc.clientThrottle.mode added to the JDBC Client Properties reference table

Copilot AI changed the title Add JDBC client throttling — analysis document and implementation Add JDBC client throttling — analysis document, implementation and documentation May 20, 2026
Contributor

@rrobetti rrobetti left a comment


@copilot merge main branch into this branch. Update the two OJP client specification documents that describe how to create clients for OJP in other languages.

Contributor Author

Copilot AI commented May 20, 2026

@copilot merge main branch into this branch. Update the two documents OJP client specification about how to create clients in other languages for OJP.

Done in commits b415310 and 15cdd82.

Main was already merged (the branch was up to date). Both client specification documents have been updated with the throttling feature:

CLIENT_SPEC.md (human-readable guide for non-Java implementers):

  • §2.4 Client Responsibilities: added client-side throttling to the list
  • §4.2 SessionInfo fields table: added clientCount, maxAdmission, observedPeak
  • §7.10 Configuration: added ojp.jdbc.clientThrottle.mode (default: combined)
  • New §7.14 Client-Side Throttling: why it exists, the three server signals, fair-share formula with before/after example, four modes, AIMD, in-transaction bypass, cross-node caveat, and pseudo-code walkthrough
  • §8 Testing: added throttle test scenarios (all four modes, in-transaction bypass, AIMD recovery)
  • Appendix B: added ClientThrottleManager and ClientThrottleMode

CLIENT_SPEC_AI.md (normative spec for automated client generation):

  • §1 Terminology: added maxAdmission, observedPeak, clientCount, ClientThrottleManager
  • §3.1 SessionInfo structure: added the three new fields
  • New §8 Client-Side Throttling: normative rules (8.1–8.8) covering signals, formula, AIMD, modes, concurrency rules, in-transaction bypass, CAS counter contract
  • §10.1 MUST Implement: added throttle acquire/release, AIMD update, and CAS counter requirements
  • §10.2 SHOULD Implement: added all four modes via ojp.jdbc.clientThrottle.mode
  • Sections 8→9, 9→10, 10→11 renumbered

Copilot AI requested a review from rrobetti May 20, 2026 06:42

@rrobetti
Contributor

@copilot merge main branch into this branch

Contributor Author

Copilot AI commented May 20, 2026

@copilot merge main branch into this branch

The branch is already up to date with main — no new commits on main that aren't already in this branch. Nothing to merge.

@copilot merge main branch into this branch

The branch is already up to date with main — no new commits have landed on main since the last merge. Nothing to do.

Copilot AI requested a review from rrobetti May 20, 2026 17:38
2 participants