SFF-065: Context window-aware model policy #77

@jmrpineda

Reported by: repository user
Requested by: repository user
Priority: P1
Affected surfaces: model catalog, configuration, orchestration safeguards, portal guidance
Constraints: context-window data may vary by provider and model version; the policy must remain configurable rather than hard-coded

Summary

SpecForge should discover or maintain the effective context window for configured models so the product can anticipate prompt-size risk before requests exceed model limits and react according to an explicit policy.

Problem / opportunity

Without reliable context-window knowledge, the product discovers over-limit situations only after the fact, once prompt assembly has already failed or degraded. If SpecForge knows the limit ahead of time, it can warn early, recommend a larger model, compact or summarize context, or deliberately stop before producing a bad request.

Requested behavior

The system should determine the usable context window for the models it supports, expose that capability to the runtime, and apply a configurable policy whenever a request is predicted to exceed or dangerously approach the limit.
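As one possible illustration of the "determine and expose" part, the sketch below resolves a usable context window from operator overrides first and built-in defaults second. Every name here (`ModelRef`, `DEFAULT_CONTEXT_WINDOWS`, `resolve_context_window`, `reserved_output_tokens`) is hypothetical, and the token counts are placeholders rather than real provider limits.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical built-in defaults. The numbers are placeholders for
# illustration only, not real limits of any provider or model.
DEFAULT_CONTEXT_WINDOWS: Dict[Tuple[str, str], int] = {
    ("example-provider", "small-model"): 8_000,
    ("example-provider", "large-model"): 128_000,
}


@dataclass(frozen=True)
class ModelRef:
    """Identifies a configured model (hypothetical shape)."""
    provider: str
    model: str


def resolve_context_window(
    ref: ModelRef,
    overrides: Dict[Tuple[str, str], int],
    reserved_output_tokens: int = 0,
) -> Optional[int]:
    """Return the usable input window in tokens, or None if unknown.

    Operator overrides win over built-in defaults, so the limit stays
    configurable rather than hard-coded. Tokens reserved for the model's
    output are subtracted from the raw limit.
    """
    key = (ref.provider, ref.model)
    limit = overrides.get(key, DEFAULT_CONTEXT_WINDOWS.get(key))
    if limit is None:
        return None  # caller decides how to treat an unknown limit
    return max(limit - reserved_output_tokens, 0)
```

Returning `None` for an unknown model, instead of guessing a limit, keeps the "unknown" case visible to the runtime so the policy can handle it explicitly.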

Scope

  • In scope: collecting or resolving model context-window limits, estimating request size before execution, warning when the current model is insufficient, and adding configuration for the overflow-handling policy.
  • Out of scope: silently rewriting all prompts without operator visibility, or assuming a single strategy works for every provider and workflow.

Acceptance criteria

  1. SpecForge can resolve or store context-window limits for the models it uses in a way the runtime can query before execution.
  2. When a request is predicted to exceed or dangerously approach the configured model limit, the product can either recommend a larger model or apply a configured mitigation strategy.
  3. Configuration exposes an explicit policy choice for overflow handling, including alternatives such as compacting, summarizing, or stopping with a blocking error.
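The criteria above could be read as a small decision function over an estimated request size, a resolved limit, and a configured policy. The following is a minimal sketch under that reading; `OverflowPolicy`, `PolicyConfig`, `Decision`, `evaluate_request`, and the 0.9 `warn_ratio` default are all assumptions made for illustration, not part of SpecForge.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OverflowPolicy(Enum):
    """Policy choices named in the acceptance criteria (hypothetical enum)."""
    RECOMMEND_LARGER_MODEL = "recommend_larger_model"
    COMPACT = "compact"
    SUMMARIZE = "summarize"
    BLOCK = "block"


@dataclass(frozen=True)
class PolicyConfig:
    policy: OverflowPolicy = OverflowPolicy.BLOCK
    # Assumed threshold: fraction of the window treated as "dangerously close".
    warn_ratio: float = 0.9


@dataclass(frozen=True)
class Decision:
    status: str                        # "ok", "near_limit", or "over_limit"
    action: Optional[OverflowPolicy]   # None when no mitigation is needed


def evaluate_request(
    estimated_tokens: int, context_window: int, config: PolicyConfig
) -> Decision:
    """Classify a request before execution and attach the configured policy."""
    if estimated_tokens > context_window:
        return Decision("over_limit", config.policy)
    if estimated_tokens >= config.warn_ratio * context_window:
        return Decision("near_limit", config.policy)
    return Decision("ok", None)
```

A caller might use it like this:

```python
config = PolicyConfig(policy=OverflowPolicy.SUMMARIZE)
decision = evaluate_request(estimated_tokens=1200, context_window=1000, config=config)
# decision.status is "over_limit" and decision.action is OverflowPolicy.SUMMARIZE
```

The function only returns a decision; it does not rewrite the prompt itself, which keeps any mitigation visible to the operator.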

Notes

  • This likely needs both provider/model metadata support and runtime token-estimation support.
  • The policy decision should be operator-configurable because some flows prefer fidelity, while others prefer continuity.
  • Related blocking bug: model configuration validity should already prevent running with incomplete linked models.
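For the runtime token-estimation side, a provider-specific tokenizer would be the accurate option. As a fallback, a rough character-based heuristic can be sketched as follows; the 4-characters-per-token ratio and the 10% safety margin are assumptions that would need tuning per provider and language, and `estimate_tokens` and `estimate_prompt_tokens` are hypothetical names.

```python
import math
from typing import Iterable


def estimate_tokens(
    text: str, chars_per_token: float = 4.0, safety_margin: float = 1.1
) -> int:
    """Rough token estimate for one piece of text.

    Heuristic only: divides the character count by an assumed average
    characters-per-token ratio, inflates the result by a safety margin,
    and rounds up so the estimate errs on the high side.
    """
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    return math.ceil(len(text) / chars_per_token * safety_margin)


def estimate_prompt_tokens(parts: Iterable[str]) -> int:
    """Sum the estimates for all prompt parts (system text, context, input)."""
    return sum(estimate_tokens(part) for part in parts)
```

Rounding up and applying a margin means the estimate may trigger the policy slightly early, which is usually the safer failure mode for a pre-execution check.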

Metadata

Labels: enhancement
