Spring AI advisor + auto-configuration that adds budget enforcement to `ChatClient` invocations. Integrates with the Cycles Protocol for runtime authority over LLM spend, multi-tenant agent governance, and tamper-evident audit. Built for production Spring AI applications that need to gate LLM calls before they hit the provider.

Per-call lifecycle: reserve → call → commit on success, reserve → call → release on exception. When the Cycles server denies the reservation, the LLM call never happens and a `CyclesBudgetDeniedException` is thrown. Compatible with Java 21+, Spring Boot 3.5+, and Spring AI 1.0+.
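A hedged sketch of what that contract looks like at a call site (`CyclesBudgetDeniedException` is the starter's exception per the text above; the surrounding method is illustrative):

```java
// Illustrative only: degrade gracefully when the Cycles server denies the
// reservation. The advisor throws BEFORE any provider traffic, so catching
// the exception costs no LLM tokens.
public String answer(ChatClient chatClient, String question) {
    try {
        return chatClient.prompt().user(question).call().content();
    } catch (CyclesBudgetDeniedException e) {
        return "Budget exhausted; please retry later.";
    }
}
```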
```xml
<dependency>
  <groupId>io.runcycles</groupId>
  <artifactId>cycles-spring-ai-starter</artifactId>
  <version>0.3.1</version>
</dependency>
```

This dependency transitively pulls in `cycles-client-java-spring`, which provides the underlying HTTP client to the Cycles server.
In `application.yml`:

```yaml
cycles:
  base-url: http://localhost:7878   # Cycles server URL
  api-key: ${CYCLES_API_KEY}        # provisioned via Cycles Admin
  tenant: acme-corp                 # subject defaults applied to every call
  workspace: production
  app: order-agent

cycles.spring-ai:
  enabled: true                     # default true; set false to bypass
  default-estimate: 1000            # default per-call estimate (USD_MICROCENTS)
  estimate-unit: USD_MICROCENTS     # also accepts TOKENS, CREDITS, RISK_POINTS
  action-kind: llm.chat
  action-name: spring-ai-chat
  fail-open: false                  # true = log + proceed on transport errors
```

The first block (`cycles.*`) is owned by the underlying `cycles-client-java-spring` SDK; the second block (`cycles.spring-ai.*`) is owned by this starter.
```java
@Service
public class OrderAgent {

    private final ChatClient chatClient;

    public OrderAgent(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    public String summarize(String order) {
        // Cycles reserves budget BEFORE this call hits the LLM provider.
        // If the budget is exhausted, CyclesBudgetDeniedException is thrown
        // and the LLM call never happens. On success, usage is committed
        // back to Cycles. On exception, the reservation is released.
        return chatClient.prompt()
                .user("Summarize: " + order)
                .call()
                .content();
    }
}
```

No code changes to your call sites. The advisor is auto-attached to every `ChatClient` built from the auto-configured `ChatClient.Builder` via a `ChatClientCustomizer` bean.
For agents that call tools, wrap each `ToolCallback` with the auto-configured `CyclesToolGate` to reserve / commit / release per tool call. Tool reservations report `tool.call` / `spring-ai-tool:<name>` action labels so they're separable from chat reservations in audit history.
```java
@Configuration
class ToolWiring {

    @Bean
    ToolCallback getWeatherTool(CyclesToolGate cyclesToolGate) {
        ToolCallback raw = MethodToolCallback.builder()
                .toolDefinition(ToolDefinition.builder().name("get_weather").build())
                .toolMethod(...)
                .build();
        return cyclesToolGate.wrap(raw); // ← Cycles-gated
    }
}
```

Tool gating is opt-in: Spring AI doesn't provide a hook to auto-decorate every registered tool, so you choose which tools to gate. Currently tool reservations commit `default-estimate` as actual (tool callbacks don't expose token usage to the gate).
The auto-configured `CyclesChatClientObservationConvention` appends low-cardinality Cycles attribution tags (`cycles.tenant`, `cycles.workspace`, `cycles.app`, `cycles.action_kind`, `cycles.action_name`) to every chat-client trace. Apply it explicitly on a `ChatClient.Builder` to opt in:
```java
@Service
class TracedAgent {

    private final ChatClient chatClient;

    TracedAgent(ChatClient.Builder builder, CyclesChatClientObservationConvention cyclesConvention) {
        this.chatClient = builder
                .observationConvention(cyclesConvention)
                .build();
    }

    // ...
}
```

The bean is auto-configured but NOT auto-attached — applying a convention has cross-cutting trace-visibility implications that should be a deliberate user decision.
Non-streaming chat (`chatClient.prompt(...).call()`):

| Step | Cycles wire call | Spring AI insertion point |
|---|---|---|
| Pre-call | `POST /v1/reservations` with subject + action + estimate | `CallAdvisor.adviseCall(...)` runs at `HIGHEST_PRECEDENCE + 100` |
| Call | (advisor delegates to `chain.nextCall(request)`) | Spring AI continues advisor chain → provider call |
| Commit on success | `POST /v1/reservations/{id}/commit` with actual amount | After `chain.nextCall` returns |
| Release on error | `POST /v1/reservations/{id}/release` with reason | Catch block re-throws original after release |
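As a hedged sketch, the table above corresponds to roughly this shape (the Spring AI types `ChatClientRequest` and `CallAdvisorChain` are real; `cyclesClient` and its `reserve`/`commit`/`release` method names are illustrative, not this starter's actual internals):

```java
// Sketch of the call-advisor lifecycle from the table above.
@Override
public ChatClientResponse adviseCall(ChatClientRequest request, CallAdvisorChain chain) {
    Reservation reservation = cyclesClient.reserve(subject, action, estimate); // POST /v1/reservations
    try {
        ChatClientResponse response = chain.nextCall(request);                 // provider call
        cyclesClient.commit(reservation.id(), actualAmount(response));         // POST /v1/reservations/{id}/commit
        return response;
    } catch (RuntimeException e) {
        cyclesClient.release(reservation.id(), e.getMessage());                // POST /v1/reservations/{id}/release
        throw e;                                                               // re-throw the original
    }
}
```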
Streaming chat (`chatClient.prompt(...).stream()`) — same lifecycle adapted to the reactive signal model. The entire pipeline is wrapped in `Flux.defer(...)` so reservation state is per-subscription (no leak when the `Flux` is assembled but never subscribed; resubscribing gets a fresh reservation):
| Step | Cycles wire call | Reactor signal |
|---|---|---|
| Pre-stream | `POST /v1/reservations` | On subscription (inside `Flux.defer`). Reservation failures (denial, transport) surface as `onError` to the subscriber — the reactive-idiomatic shape; handle via `.onErrorResume(...)`. |
| Stream | (advisor passes chunks through, tracking last seen) | `doOnNext(lastResponse::set)` |
| Commit on complete | `POST /v1/reservations/{id}/commit` with usage from the last chunk | `concatWith(Mono.defer(...))` after the upstream emits `onComplete`. Commit runs before the subscriber observes terminal completion, so a fail-closed commit failure correctly surfaces as `onError` (the way the non-streaming advisor fails the call). |
| Release on error | `POST /v1/reservations/{id}/release` | `doOnError` |
| Release on cancel | `POST /v1/reservations/{id}/release` | `doOnCancel` |
| Release on assembly failure | `POST /v1/reservations/{id}/release` | If `chain.nextStream(request)` throws synchronously after we reserved, we release and re-throw. |
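A hedged Reactor sketch of the signal wiring the table describes (operator chain only; `reserve()`, `commit()`, and `release()` are placeholders, not the starter's API):

```java
// Sketch of the streaming lifecycle: per-subscription state via Flux.defer,
// commit spliced in before terminal completion, release on error/cancel.
Flux<ChatClientResponse> gated = Flux.defer(() -> {
    Reservation reservation = reserve();               // on subscription
    AtomicReference<ChatClientResponse> last = new AtomicReference<>();
    return chain.nextStream(request)
            .doOnNext(last::set)                       // track last chunk (carries usage)
            .doOnError(e -> release(reservation))      // release on upstream error
            .doOnCancel(() -> release(reservation))    // release on subscriber cancel
            .concatWith(Mono.defer(() -> {             // runs after upstream onComplete
                commit(reservation, last.get());       // fail-closed: a throw here surfaces as onError
                return Mono.empty();
            }));
});
```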
Tool invocations (when wrapped via `CyclesToolGate.wrap`):

| Step | Cycles wire call | Tool insertion point |
|---|---|---|
| Pre-call | `POST /v1/reservations` with `tool.call` action kind | Before `delegate.call(...)` |
| Call | (wrapper delegates to the wrapped tool) | Spring AI invokes the tool |
| Commit on success | `POST /v1/reservations/{id}/commit` with `default-estimate` as actual | After delegate returns |
| Release on exception | `POST /v1/reservations/{id}/release` | Wrapper re-throws original after release |
Both chat advisors are registered automatically via Spring AI's `ChatClientCustomizer` mechanism — `ChatClientAutoConfiguration` discovers customizer beans and applies them to the builder. Simply exposing a `CallAdvisor` bean is not enough in Spring AI 1.0+; the customizer is the supported wiring path. The tool gate and observation convention are exposed as beans for explicit opt-in (see Quick Start steps 4 and 5).
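Under stated assumptions (the advisor bean names are illustrative; `ChatClientCustomizer` and `defaultAdvisors` are real Spring AI API), the auto-config's wiring amounts to something like:

```java
// Roughly what the starter's auto-configuration registers. Bean and advisor
// names here are assumptions; the ChatClientCustomizer mechanism is the real
// wiring path described above.
@Bean
ChatClientCustomizer cyclesAdvisorCustomizer(CyclesBudgetCallAdvisor callAdvisor,
                                             CyclesBudgetStreamAdvisor streamAdvisor) {
    return builder -> builder.defaultAdvisors(callAdvisor, streamAdvisor);
}
```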
- Java: 21+
- Spring Boot: 3.5.x
- Spring AI: 1.0.x (BOM-managed; tested compatible with 1.1.x via the post-scaffold Dependabot bump to 1.1.6)
Three new extension points and a trace-correlation tag, on top of v0.2.0's full feature surface:

- ✅ Pluggable `SubjectResolver` — multi-tenant agents can route the Cycles `Subject` per call (tenant from `@AuthenticationPrincipal`, request header, thread-local, etc.) instead of using the static property defaults. Register a `SubjectResolver` bean and the auto-config's default backs off via `@ConditionalOnMissingBean`. See Extension points below.
- ✅ Pluggable `PromptTokenEstimator` — replace the v0.2.0 `chars / 4` heuristic with real BPE tokenization. The starter ships a jtokkit-based estimator (`cl100k_base` / `o200k_base`, etc. — opt in via `cycles.spring-ai.token-estimator-encoding`) or you can supply your own bean for provider-specific tokenizers.
- ✅ `cycles.reservation_id` on chat traces — the `CyclesChatClientObservationConvention` now emits the active reservation id as a high-cardinality KeyValue on every chat-client observation, enabling trace ↔ Cycles reservation correlation in your tracing backend. Opt out via `cycles.spring-ai.emit-reservation-id-on-trace=false`.
- ✅ End-to-end integration test — the test bundle now boots a Spring context with the real auto-configuration and verifies the advisor attachment + reserve/commit lifecycle through a stub `ChatModel`. Closes the "what if a regression breaks the wiring but unit tests still pass?" gap.
All known limitations from v0.1.0 are addressed:

- ✅ Streaming chat gating. `CyclesBudgetStreamAdvisor` mirrors the lifecycle of the non-streaming advisor for `chatClient.prompt(...).stream()` invocations. Reserves before subscribing; commits on stream complete; releases on error or subscriber cancellation. Both advisors are auto-attached to the auto-configured `ChatClient.Builder`.
- ✅ Real `ChatResponse.Usage` extraction on commit — when the LLM provider returns usage and either `input-cost-per-token` or `output-cost-per-token` is configured (or `estimate-unit=TOKENS`), the advisor commits the actual cost computed from tokens rather than the estimate. Falls back to estimate-as-actual when usage data is missing. Applies to both the call and stream advisors (the stream advisor uses the last chunk that carried usage).
- ✅ Prompt-based per-call estimate. When `cycles.spring-ai.estimate-from-prompt=true` and one of the cost-per-token rates is configured, the pre-call reservation is sized from the prompt's character count (`chars / 4` → tokens) rather than the fixed `default-estimate`. Falls back to `default-estimate` when the prompt is empty or rates are zero.
- ✅ `ToolCallback` decoration. `CyclesToolCallback` wraps any Spring AI `ToolCallback` with the same reserve / commit / release lifecycle. Users opt in via the auto-configured `CyclesToolGate.wrap(...)` factory. Tool reservations report distinct `tool.call` / `spring-ai-tool:<name>` action labels so they're separable from chat reservations in audit history.
- ✅ `ObservationConvention` for chat-client traces. `CyclesChatClientObservationConvention` extends Spring AI's default convention and appends low-cardinality Cycles attribution tags (`cycles.tenant`, `cycles.workspace`, `cycles.app`, `cycles.action_kind`, `cycles.action_name`) to every chat-client observation. Auto-configured as a bean but NOT auto-attached — users apply it explicitly via `chatClientBuilder.observationConvention(cyclesConvention)`.
| Property | Default | Description |
|---|---|---|
| `cycles.spring-ai.enabled` | `true` | Master switch. Set `false` to disable Cycles wiring entirely. |
| `cycles.spring-ai.default-estimate` | `1000` | Default per-call estimate, in the configured unit. Used unless `estimate-from-prompt=true` derives a per-call value from prompt size. |
| `cycles.spring-ai.estimate-unit` | `USD_MICROCENTS` | Unit for the estimate. Cycles `Unit` enum values: `USD_MICROCENTS`, `TOKENS`, `CREDITS`, `RISK_POINTS`. |
| `cycles.spring-ai.action-kind` | `llm.chat` | `Action.kind` label reported to Cycles. |
| `cycles.spring-ai.action-name` | `spring-ai-chat` | `Action.name` label reported to Cycles. |
| `cycles.spring-ai.fail-open` | `false` | When `true`, transport errors against Cycles are logged and the LLM call proceeds. Budget denials are always surfaced. |
| `cycles.spring-ai.input-cost-per-token` | `0` | Per-input-token cost in the estimate unit. When set (with `output-cost-per-token`), the advisor commits actual token-based cost instead of the estimate. Example: `250` (= $2.50/1M tokens for OpenAI gpt-4o input, since 1 USD = 100,000,000 USD_MICROCENTS). |
| `cycles.spring-ai.output-cost-per-token` | `0` | Per-output-token cost. Example: `1000` (= $10.00/1M tokens for OpenAI gpt-4o output). |
| `cycles.spring-ai.estimate-from-prompt` | `false` | When `true` and at least one cost-per-token rate is set, sizes the pre-call reservation from the prompt char count (`chars / 4` × combined rate). Falls back to `default-estimate` when the prompt is empty or rates are zero. |
| `cycles.spring-ai.tool-action-kind` | `tool.call` | `Action.kind` label reported for `CyclesToolCallback`-wrapped tool invocations (distinct from chat's `action-kind`). |
| `cycles.spring-ai.tool-action-name-prefix` | `spring-ai-tool:` | Prefix prepended to the wrapped tool's name to produce the `action.name` label (e.g. `spring-ai-tool:get_weather`). |
| `cycles.spring-ai.token-estimator-encoding` | unset | When set AND jtokkit is on the classpath, swaps the default `chars / 4` prompt-token estimator for real BPE encoding. Values: `cl100k_base` (gpt-3.5-turbo, gpt-4), `o200k_base` (gpt-4o family), `p50k_base` / `p50k_edit` / `r50k_base` (older models). Requires adding `com.knuddels:jtokkit:1.1.0` to your app's pom; the dep is `optional=true` on this starter. |
| `cycles.spring-ai.emit-reservation-id-on-trace` | `true` | When the `CyclesChatClientObservationConvention` is applied, emit the active `cycles.reservation_id` as a high-cardinality KeyValue on chat-client observations (enables trace ↔ reservation correlation). Set `false` to omit when your tracing backend charges by unique tag-value combinations. |
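To make the cost-per-token math concrete, a worked example using the gpt-4o rates from the table (variable names are illustrative, not the starter's internals):

```java
// 1 USD = 100_000_000 USD_MICROCENTS, so $2.50/1M input tokens = 250
// microcents/token and $10.00/1M output tokens = 1_000 microcents/token.
long inputTokens = 1_200, outputTokens = 300;            // from ChatResponse usage
long actual = inputTokens * 250 + outputTokens * 1_000;
// actual == 600_000 USD_MICROCENTS (= $0.006), committed instead of the estimate
```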
Connection + subject properties (`cycles.base-url`, `cycles.api-key`, `cycles.tenant`, etc.) come from `cycles-client-java-spring` — see that repo's README for the full list.

The starter exposes three pluggable beans so you can replace the defaults without touching the advisor code. Each backs off via `@ConditionalOnMissingBean`, so registering your own bean is the only thing you need to do.

By default the starter reads tenant/workspace/app/etc. from `cycles.*` properties on every reservation, so every call from a given app gets the same Cycles `Subject`. Multi-tenant SaaS agents need per-request attribution. Register a `SubjectResolver` bean and the advisor calls it per request:
```java
@Bean
public SubjectResolver tenantAwareSubjectResolver(CyclesProperties defaults) {
    return request -> {
        var auth = SecurityContextHolder.getContext().getAuthentication();
        String tenant = (auth != null && auth.isAuthenticated()) ? auth.getName() : defaults.getTenant();
        return Subject.builder()
                .tenant(tenant)
                .workspace(defaults.getWorkspace())
                .app(defaults.getApp())
                .build();
    };
}
```

The `request` parameter is the originating `ChatClientRequest` (or `null` on the tool-gating path — tool callbacks don't carry a request). Implementations should handle `null` defensively, typically by falling back to the property defaults.
Default is `CharsPerTokenEstimator` (the v0.2.0 `chars / 4` heuristic). For tighter estimates:

Option 1: jtokkit (real OpenAI BPE encoding). Set the property:

```yaml
cycles:
  spring-ai:
    estimate-from-prompt: true
    input-cost-per-token: 250              # gpt-4o: $2.50/1M tokens × 100M microcents/USD = 250
    output-cost-per-token: 1000            # gpt-4o: $10.00/1M tokens = 1000 microcents/token
    token-estimator-encoding: cl100k_base  # or o200k_base for the gpt-4o family
```

Add the jtokkit dep to your app pom (it's `optional=true` on this starter, so it's not pulled transitively):
```xml
<dependency>
  <groupId>com.knuddels</groupId>
  <artifactId>jtokkit</artifactId>
  <version>1.1.0</version>
</dependency>
```

When the property is set without the dep on the classpath, the starter logs a WARN at app startup and falls back to `chars / 4` — you'll see the misconfig immediately, not at first call.
Option 2: custom bean. Register your own `PromptTokenEstimator` for provider-specific tokenizers or domain-aware heuristics:

```java
@Bean
public PromptTokenEstimator anthropicTokenEstimator() {
    return request -> /* count tokens using Anthropic's tokenizer */;
}
```

The `CyclesChatClientObservationConvention` (Quick Start step 5) emits `cycles.reservation_id` as a high-cardinality KeyValue on every chat-client observation when applied. The advisor stores the reservation id in `request.context()` after a successful reserve; the convention reads it at observation-stop time. Disable via:
```yaml
cycles:
  spring-ai:
    emit-reservation-id-on-trace: false
```

The low-cardinality Cycles attribution tags (`cycles.tenant`, `cycles.workspace`, etc.) are always emitted by the convention regardless of this setting.
The two Java integrations are complementary, not competing — they target different layers of the same problem. This starter actually depends on `cycles-spring-boot-starter` and reuses its `CyclesClient`, `CyclesProperties`, and connection plumbing.
| Aspect | cycles-spring-boot-starter | cycles-spring-ai-starter (this repo) |
|---|---|---|
| Maven artifact | `io.runcycles:cycles-client-java-spring` | `io.runcycles:cycles-spring-ai-starter` |
| Integration mechanism | Spring AOP via `@Cycles` annotation | Spring AI `CallAdvisor` + `ChatClientCustomizer` |
| Where it intercepts | Any Java method you annotate | Every `chatClient.prompt(...).call()` invocation |
| Granularity | Method-level, explicit opt-in | Framework-level, transparent |
| Call-site changes | Yes — annotate methods with `@Cycles` | No — wired automatically |
| Estimate computation | SpEL: `@Cycles("#tokens * 10")` (dynamic per-call) | `default-estimate`, or prompt-chars × token rate when `estimate-from-prompt=true` |
| Subject routing | SpEL: can pull tenant from method args | Property defaults, or per-call via a custom `SubjectResolver` bean (see Extension points) |
| Knows about LLMs? | No — generic | Yes — Spring AI `ChatClient` specific |
| Scope | Any cost-incurring Java code | Only Spring AI chat calls |
In one line: the Java/Spring starter is a method-level integration where you decide where to put the gates. This starter is a framework-level integration where every Spring AI call surface is gated transparently.
Use `cycles-spring-boot-starter` when:

- You call LLMs through code that is not Spring AI's `ChatClient` — direct HTTP calls, custom OpenAI / Anthropic / Bedrock SDKs, LangChain4j, in-house wrappers, etc.
- You want per-method dynamic estimates via SpEL (e.g. `@Cycles("#tokens * 10")` where `#tokens` is a method arg).
- You want per-method subject routing — extract tenant from a DTO, request context, or thread-local.
- You want explicit control over which methods are gated, not blanket coverage.
- You're cost-gating non-LLM operations: vector-store queries, document processing, third-party metered APIs.
- You're not using Spring AI at all.

Use `cycles-spring-ai-starter` (this repo) when:

- You're using Spring AI's `ChatClient` as your LLM call surface.
- You want transparent gating of every chat call without touching call sites.
- You want minimal integration friction — add the dep, set 6 properties, done. (Per-call estimates from prompt size are available via `estimate-from-prompt=true`; for richer dynamic estimates use the `cycles-spring-boot-starter` SpEL surface.)
Use both when:
- You have a Spring AI app that also has non-Spring-AI cost-incurring code (e.g., a service method that runs a vector-store query and then a Spring AI chat call — the vector store has cost, the chat has cost).
- They wire on different conditions and don't conflict at the bean-wiring layer.
Because this starter declares a dependency on `cycles-client-java-spring`, the `@Cycles` annotation is always on your classpath when you use this starter — no need to explicitly add the other dependency to use both.
The two starters are designed to coexist, but you can accidentally double-charge if you wrap a Spring AI chat call inside an `@Cycles`-annotated method:
```java
@Service
class SummaryService {

    private final ChatClient chatClient; // built from the auto-configured builder

    @Cycles("#tokens * 10")        // ← Reservation #1 (AOP)
    public String summarize(String text, int tokens) {
        return chatClient.prompt() // ← Reservation #2 (Spring AI advisor)
                .user(text)
                .call()
                .content();
    }
}
```

That method consumes budget twice for one user-perceivable operation. Both reservations charge against the same budget.
Rule of thumb: pick one strategy per call path.
| Your call path | Use |
|---|---|
| Spring AI `ChatClient.call()` directly | cycles-spring-ai-starter alone — don't also `@Cycles` the caller |
| LLM via a non-Spring-AI client | cycles-spring-boot-starter with `@Cycles` on the method |
| Non-LLM cost-incurring operation | cycles-spring-boot-starter with `@Cycles` on the method |
| Method that both does non-LLM work and a Spring AI chat call | Either `@Cycles` (charging once for the whole method) or let the Spring AI advisor handle just the chat part — not both |
- Pure Spring AI app: this starter alone. Transparent gating, no code changes.
- Pure non-Spring-AI Java/Spring app: `cycles-spring-boot-starter` with `@Cycles` on the methods that cost money.
- Mixed: depend on this starter (you get the other transitively), use `@Cycles` for non-LLM paths, let the Spring AI advisor handle Spring AI paths, and don't combine them on the same path — see the sketch after this list.
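A hedged sketch of that split, reworking the `SummaryService` anti-pattern from above (the method decomposition and the constant SpEL estimate are illustrative):

```java
// Gate the non-LLM work with @Cycles and let the advisor gate the chat call,
// instead of stacking both reservations on one method.
@Service
class SummaryService {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    SummaryService(ChatClient.Builder builder, VectorStore vectorStore) {
        this.chatClient = builder.build();
        this.vectorStore = vectorStore;
    }

    @Cycles("50")                           // reservation covers the vector-store query only
    public List<Document> retrieve(String query) {
        return vectorStore.similaritySearch(query);
    }

    public String summarize(String text) {  // no @Cycles here:
        return chatClient.prompt()          // the Spring AI advisor reserves for the chat call
                .user(text)
                .call()
                .content();
    }
}
```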
```
cycles-spring-ai-starter/
├── cycles-spring-ai-starter/   ← the library (published to Maven Central)
└── cycles-spring-ai-demo/      ← a runnable demo app (not published)
```

```bash
mvn -B verify --file cycles-spring-ai-starter/pom.xml
mvn -B install --file cycles-spring-ai-starter/pom.xml -DskipTests
mvn -B verify --file cycles-spring-ai-demo/pom.xml
```

(In Claude Code remote environments, use `mvn-proxy` instead of `mvn` — see CLAUDE.md.)
The project uses Maven CI-friendly versions via the `${revision}` property, driven from `.mvn/maven.config` at the repo root. Both poms (starter + demo) declare `<version>${revision}</version>` and the demo's dep on the starter uses `${revision}` as well, so a version bump is a single-line edit.

```
# .mvn/maven.config (single source of truth — applies to every mvn invocation)
-Drevision=X.Y.Z-SNAPSHOT
```

The inline `<revision>` defaults in each pom's `<properties>` block should be kept in lockstep with `.mvn/maven.config` so the flattened pom that ships to Maven Central doesn't carry stale `<revision>` metadata. (The flatten-maven-plugin's `resolveCiFriendliesOnly` mode resolves `<version>` but preserves the `<properties>` block as-is, so an IDE or build that bypasses `.mvn/maven.config` reads the inline default — and the published pom carries it verbatim.)
To cut a release (concrete example: cutting X.Y.Z from a X.Y.Z-SNAPSHOT dev branch):

- Edit `.mvn/maven.config`: `-Drevision=X.Y.Z-SNAPSHOT` → `-Drevision=X.Y.Z`. Also bump the inline `<revision>` defaults in both poms to `X.Y.Z` to match. Commit and push to `main`.
- Create a GitHub Release for the new version (e.g. via `gh release create vX.Y.Z --generate-notes` or the GitHub UI). Creating the release also creates the tag if it doesn't exist. The publish workflow triggers on `release: [created]` — pushing a bare tag does not trigger publishing; only the release event does.
- The publish workflow checks `mvn help:evaluate -Dexpression=project.version` against the tag — both now read `X.Y.Z` from `.mvn/maven.config`, so the version-vs-tag gate passes and the artifact deploys to Maven Central.
- After the release ships, bump `.mvn/maven.config` and the inline pom `<revision>` defaults to the next SNAPSHOT (e.g. `X.Y.Z+1-SNAPSHOT` or `X.Y+1.0-SNAPSHOT`). Commit, push.
To test a release build without publishing (e.g. to verify GPG signing works on a new key): trigger the publish workflow via `workflow_dispatch` from the Actions tab. That runs the `test-release-build` job only — no deploy.
The flatten-maven-plugin (configured on both poms in `resolveCiFriendliesOnly` mode) substitutes `${revision}` with the resolved value at `process-resources` and produces a `.flattened-pom.xml` that gets installed/deployed. Sonatype Central requires a literal version in the published pom; non-CI-friendly properties (BOM versions, etc.) remain as `${...}` in the published pom and are interpolated against the same pom's `<properties>` block at consumer-resolve time — the standard behavior.
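For reference, a sketch of the standard flatten-maven-plugin setup that mode implies (the common documented configuration, not necessarily this repo's exact pom):

```xml
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>flatten-maven-plugin</artifactId>
  <configuration>
    <!-- resolves ${revision}/${sha1}/${changelist} only; leaves other props alone -->
    <flattenMode>resolveCiFriendliesOnly</flattenMode>
  </configuration>
  <executions>
    <execution>
      <id>flatten</id>
      <phase>process-resources</phase> <!-- produces .flattened-pom.xml -->
      <goals><goal>flatten</goal></goals>
    </execution>
    <execution>
      <id>flatten-clean</id>
      <phase>clean</phase>             <!-- removes the flattened pom on mvn clean -->
      <goals><goal>clean</goal></goals>
    </execution>
  </executions>
</plugin>
```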
Apache 2.0 — see LICENSE.