
python sdk multimodal #9

Merged
pint-drinker merged 18 commits into main from subcon-468-full-multimodal-support-images on Apr 23, 2026

pint-drinker commented Apr 20, 2026

Summary

This PR ships the full multimodal surface for the Python SDK and bumps the package to 1.0.0. It's a breaking change — response types are overhauled to match the API wire format, Python 3.8/3.9 are dropped, and pydantic becomes a hard dependency.


What changed and why

Multimodal input (Image helper + ContentBlock types)

New: Image.from_path(), Image.from_bytes(), Image.from_url(), Image.from_blob_ref() in subconscious/content.py. Users can now pass images alongside text instructions in a content: list[ContentBlock] field on RunInput.

Why: SUBCON-468. The API has accepted multimodal content blocks for a while; the SDK didn't expose them. This surfaces the full TextContent | ImageContent union with all three image source kinds (base64, url, blob_ref) and enforces MIME validation client-side.
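
As a rough illustration of what client-side MIME validation for a base64 image block could look like, here is a minimal sketch. The function name image_block_from_bytes, the ALLOWED_MIME_TYPES set, and the dict layout are all assumptions made for illustration; they are not taken from subconscious/content.py or from the real wire format.

```python
import base64

# Hypothetical allow-list; the SDK's real set of accepted MIME types may differ.
ALLOWED_MIME_TYPES = {"image/png", "image/jpeg", "image/gif", "image/webp"}


def image_block_from_bytes(data: bytes, mime_type: str) -> dict:
    """Build a base64 image content block, rejecting unsupported MIME types."""
    if mime_type not in ALLOWED_MIME_TYPES:
        raise ValueError(f"unsupported image MIME type: {mime_type}")
    return {
        "type": "image",
        "source": {
            "kind": "base64",  # the other source kinds named above are url and blob_ref
            "mime_type": mime_type,  # key name is an assumption
            "data": base64.b64encode(data).decode("ascii"),
        },
    }


block = image_block_from_bytes(b"\x89PNG\r\n\x1a\n", "image/png")
print(block["source"]["kind"])  # base64
```

Validating before encoding means a bad MIME type fails locally instead of costing a round trip to the API.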

ToolResponse type

New: ToolResponse.build(tool_call_id, content) lets function-tool endpoints return images alongside text in a single normalized envelope.

Why: Tools that return screenshots, charts, or captured images previously had no clean path. This unblocks visual tool results end-to-end.
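
To make the "single normalized envelope" idea concrete, the sketch below normalizes a mixed list of strings and images into one dict. ImageRef, build_tool_response, and the envelope keys are hypothetical stand-ins; the real ToolResponse.build may produce a different shape.

```python
from dataclasses import dataclass


@dataclass
class ImageRef:
    """Hypothetical stand-in for an image content block (not the SDK's Image)."""

    url: str


def build_tool_response(tool_call_id: str, content: list) -> dict:
    """Normalize a mixed list of strings and images into one envelope."""
    blocks = []
    for item in content:
        if isinstance(item, str):
            blocks.append({"type": "text", "text": item})
        elif isinstance(item, ImageRef):
            blocks.append({"type": "image", "source": {"kind": "url", "url": item.url}})
        else:
            raise TypeError(f"unsupported content item: {type(item).__name__}")
    return {"tool_call_id": tool_call_id, "content": blocks}


response = build_tool_response(
    "call_123",
    ["Here's the screenshot:", ImageRef("https://example.com/result.png")],
)
print([block["type"] for block in response["content"]])  # ['text', 'image']
```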

Response models → Pydantic BaseModel (breaking)

Run, RunResult, Usage, RunError, ReasoningTask, AgentToolUse are now Pydantic BaseModel instead of @dataclass. Field aliases handle the camelCase ↔ snake_case mapping automatically — _parse_run is now just Run.model_validate(data).

Why: The old dataclass construction in _parse_run was manual and fragile — it silently dropped any field the API added. Pydantic aliases give us automatic wire-format handling, validation at the boundary, and a natural path to strict mode.

Breaking: code that checks is_dataclass(run) must switch to isinstance(run, BaseModel). Construction syntax is unchanged.

Usage flattened (breaking)

Usage.models and Usage.platform_tools are removed. New fields: input_tokens, output_tokens, duration_ms — matching the actual RunUsage shape in the monorepo.

Why: The old nested ModelUsage/PlatformToolUsage lists never matched what the API returned. Any code accessing run.usage.models[0].input_tokens was already broken silently.

ReasoningNode → ReasoningTask (breaking)

ReasoningNode is deleted. ReasoningTask replaces it. subtask renamed to subtasks. tooluse is now Optional[AgentToolUse] (typed) instead of List[Any].

Why: Aligns with the monorepo ReasoningTask schema. The old name and shape were leftovers from an earlier design.

RunResult.reasoning is now list[ReasoningTask] | None (breaking)

Was Optional[ReasoningNode] (a single node). Now a list.

Why: The API always returns a list at the top level. The old model forced callers to treat a tree root specially.
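
With a list at the top level, callers can walk the whole reasoning tree with one recursive helper and no special case for the root. The sketch below uses a hypothetical Task dataclass carrying only title and subtasks; the real ReasoningTask has more fields.

```python
from collections.abc import Iterator
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical stand-in for ReasoningTask with only the fields used here."""

    title: str
    subtasks: "list[Task] | None" = None


def walk(tasks: "list[Task] | None", depth: int = 0) -> Iterator[tuple[int, str]]:
    """Yield (depth, title) pairs depth-first; a None list is treated as empty."""
    for task in tasks or []:
        yield depth, task.title
        yield from walk(task.subtasks, depth + 1)


reasoning = [
    Task("plan", subtasks=[Task("search"), Task("summarize")]),
    Task("answer"),
]
for depth, title in walk(reasoning):
    print("  " * depth + title)
```

The same helper handles run.result.reasoning being None, since `tasks or []` falls back to an empty list.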

Run.error — new field

RunError model with code and message is now populated on failed runs.

Why: Previously a failed run had status="failed" with no detail. This surfaces the error without requiring a separate fetch.

Engine literal expanded

Added: "tim", "tim-edge", "tim-oss-local", "tim-1.5", "tim-gpt-heavy-tc".

Why: These engines are live in the monorepo and were missing from the type.
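
A Literal type is only enforced by type checkers, so a runtime check needs typing.get_args. The sketch below shows one way to do that; the literal lists only the engines named above (the SDK's full Engine type may contain others), and is_known_engine is a hypothetical helper, not part of the SDK.

```python
from typing import Literal, get_args

# Engine names taken from the list above; the SDK's full literal may include others.
Engine = Literal["tim", "tim-edge", "tim-oss-local", "tim-1.5", "tim-gpt-heavy-tc"]


def is_known_engine(name: str) -> bool:
    """Return True if name is one of the literal's allowed values."""
    return name in get_args(Engine)


print(is_known_engine("tim-1.5"))  # True
print(is_known_engine("tim-2"))    # False
```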

Request wire-format (CreateRunBody + RunInputWire)

New CreateRunBody.build(engine, input).to_dict() centralizes all payload construction for both run() and stream(). Built-in 5 MB size check raises RequestTooLargeError before the request goes out.

Why: Payload construction was duplicated between run() and stream() with diverging logic. Consolidating into Pydantic models validates structure, enforces the size limit, and normalizes tools/content in one place.
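
The size guard can be pictured as "serialize once, measure, then send". The sketch below assumes that 5 MB means 5 * 1024 * 1024 bytes and that the check runs on the UTF-8 JSON body; encode_body and the payload keys are hypothetical, and RequestTooLargeError here is a local stand-in for the SDK exception of the same name.

```python
import json

MAX_REQUEST_BYTES = 5 * 1024 * 1024  # assumption: "5 MB" read as 5 MiB


class RequestTooLargeError(ValueError):
    """Stand-in for the SDK exception; the base class is an assumption."""


def encode_body(engine: str, run_input: dict) -> bytes:
    """Serialize the request once and refuse to send oversized payloads."""
    body = json.dumps({"engine": engine, "input": run_input}).encode("utf-8")
    if len(body) > MAX_REQUEST_BYTES:
        raise RequestTooLargeError(
            f"request body is {len(body)} bytes; limit is {MAX_REQUEST_BYTES}"
        )
    return body


small = encode_body("tim-claude", {"instructions": "What's in this image?"})
print(len(small) <= MAX_REQUEST_BYTES)  # True
```

Because both run() and stream() would go through the same function, the limit cannot drift between the two code paths.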

Python 3.10+ required (breaking)

requires-python = ">=3.10". CI drops 3.8 and 3.9 test matrix rows, adds 3.13.

Why: X | Y union syntax and built-in generics (dict[str, Any], list[T]) are used throughout. Backporting to 3.8 would require from __future__ import annotations everywhere and Optional[X] / Union[X, Y] in all signatures — net negative for readability.

Tooling: ruff replaces black + isort, pre-commit added

Why: Faster, single tool, consistent with other repos in the workspace. Pre-commit hooks enforce formatting before commit and in CI.

pydantic>=2.0.0 added as a hard dependency

Why: Required by the new Pydantic response models and ToolResponse. Previously it was an assumed-present optional dep for structured output users.


Breaking changes at a glance

Symbol                                Change
ModelUsage                            Removed
PlatformToolUsage                     Removed
ReasoningNode                         Removed → use ReasoningTask
run.usage.models / .platform_tools    Removed → use run.usage.input_tokens / .output_tokens
ReasoningTask.subtask                 Renamed → .subtasks
run.result.reasoning                  Type changed: Optional[ReasoningNode] → list[ReasoningTask] | None
Run, RunResult, Usage, etc.           @dataclass → BaseModel
requires-python                       >=3.8 → >=3.10
pydantic                              Now a required dependency

Full migration guide with before/after code: .cursor/skills/sdk-migration/SKILL.md


Things to check before merging

Examples (subconscious/ repo)

  • Any example using run.usage.models or run.usage.platform_tools will raise AttributeError at runtime
  • Any example importing ReasoningNode, ModelUsage, or PlatformToolUsage will fail at import
  • Examples using run.result.reasoning.title (treating reasoning as a single node) need to move to run.result.reasoning[0].title or iteration

Customer projects that may break

  • Customers on Python 3.8 or 3.9 will get a hard install failure — confirm no known customers are on older Pythons before shipping to PyPI
  • run.usage access patterns: the flat model is a silent behavior change for any code that was iterating .models
  • isinstance(run, Run) still works; is_dataclass(run) will now be False

Internal — Co-op and other product surfaces

  • Any internal code constructing Usage(models=[...], platform_tools=[...]) will fail — update to Usage(input_tokens=..., output_tokens=...)
  • Any internal code calling ReasoningNode(...) directly needs to migrate to ReasoningTask
  • Co-op's reasoning display: if it iterates a single .reasoning node's .subtask list, move to .subtasks on the new list-based model
  • Check any internal dashboards or observability code that parses raw usage payloads from run responses

Docs (subconscious-docs/)

  • Usage section references run.usage.models — update to run.usage.input_tokens
  • Reasoning section references ReasoningNode — update to ReasoningTask + list shape
  • Add multimodal quickstart showing Image.from_path()
  • Note pydantic dependency and Python 3.10+ requirement in install instructions

Migration guide (quick reference)

# Install
pip install "subconscious-sdk>=1.0.0"
# pydantic is now a required dep — no separate install needed

# Usage — was
run.usage.models[0].input_tokens   # broken / wrong shape
# now
run.usage.input_tokens
run.usage.output_tokens
run.usage.duration_ms              # Optional[int]

# Reasoning — was
from subconscious import ReasoningNode
node = run.result.reasoning        # single node
node.subtask                       # list

# now
from subconscious import ReasoningTask
tasks = run.result.reasoning or [] # list
tasks[0].subtasks                  # Optional[List[ReasoningTask]]
tasks[0].tooluse.tool_name         # typed AgentToolUse

# Failed runs — new
if run.status == "failed" and run.error:
    print(f"{run.error.code}: {run.error.message}")

# Multimodal — new
from subconscious import Image, RunInput
client.run(
    engine="tim-claude",
    input=RunInput(
        instructions="What's in this image?",
        content=[Image.from_path("screenshot.png")],
    ),
)

# Tool responses with images — new
from subconscious import ToolResponse, Image
return ToolResponse.build(tool_call_id, [
    "Here's the screenshot:",
    Image.from_path("result.png"),
])

pint-drinker and others added 12 commits April 17, 2026 10:56
…ypes (SUBCON-468)

Generated Pydantic schemas from monorepo packages/schemas/ JSON Schema
source of truth. Adds Image helper (from_path/bytes/url/blob_ref), extends
RunInput.content with canonical ContentBlock list, client-side capability
check + 5MB size guard, densify_trace async helper for post-training data
collection with bounded concurrency + streaming JSONL writer.

MM-11 subconscious._schemas/ + Image helper + content types
MM-12 client serialization + capability snapshot + RequestTooLargeError
MM-13 densify_trace + python -m subconscious densify CLI

Min Python bumped to 3.10 (datamodel-code-generator requires it).
pydantic>=2.0.0 added as dep. 33 tests green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pint-drinker marked this pull request as ready for review April 22, 2026 12:30
Comment thread subconscious/types.py

# Run status types
-RunStatus = Literal["queued", "running", "succeeded", "failed", "canceled", "timed_out"]
+RunStatus = Literal['queued', 'running', 'succeeded', 'failed', 'canceled', 'timed_out']

Do we use timed_out? If not, let's delete it and count timeouts as a kind of failure.

Comment thread subconscious/types.py Outdated
Comment on lines +12 to +20
Engine = Literal[
    'tim',
    'tim-edge',
    'tim-claude',
    'tim-claude-heavy',
    'tim-oss-local',
    'tim-1.5',
    'tim-gpt-heavy-tc',
]

See Slack. I'd imagine this will change often, so it's not high priority.

The only models we should publicly support are:

tim-1.5 (default)
tim-claude
tim-claude-heavy
tim (offline)
tim-edge (offline)

Remove from SDK and platform

tim-gpt
tim-gpt-heavy
tim-gpt-heavy-tc
tim-oss-local

Comment thread subconscious/types.py

answer: str = ''
reasoning: list[ReasoningTask] | None = None
parsed_answer: Any = None

Looks like a cleaner way to handle answer_format. As long as Hongyin signs off, this looks good.

Comment thread subconscious/types.py
answer_format: Optional[OutputSchema] = None
tools: list[Tool] = field(default_factory=list)
resources: list[str] | None = None
skills: list[str] | None = None

What is the skills plan here? Is this the complete skill dumped into the request, or just the headline? How is the rest retrieved?

I think you can leave skills out for the moment, and we can handle this appropriately in the future. If you feel strongly, it's your call.

Comment thread subconscious/types.py
"""JSON Schema for the answer output format. Use pydantic_to_schema() to generate from Pydantic."""
reasoning_format: Optional[OutputSchema] = None
"""JSON Schema for the reasoning output format. Use pydantic_to_schema() to generate from Pydantic."""
content: list[ContentBlock] | None = None

This is great

pint-drinker merged commit b2ecb17 into main Apr 23, 2026
5 checks passed