viamrobotics · btshrewsbury-viam · Jul 1, 2026
diff --git a/docs/ai-control/_index.md b/docs/ai-control/_index.md
@@ -0,0 +1,30 @@
+---
+linkTitle: "AI & learned control"
+title: "AI and learned control"
+weight: 45
+layout: "docs"
+type: "docs"
+no_list: true
+description: "Run learned policies, vision-language-action models, and LLM-driven task planning on a machine using modules and the Viam APIs."
+---
+
+Classic robot control is written by hand: a PID loop, a motion planner, a
+state machine. A growing class of applications instead runs a **learned
+model** in the loop: a reinforcement-learning policy, a vision-language-action
+(VLA) model, or a large language model that decomposes a goal into skills.
+
+On Viam these run the same way any custom capability does: you package the
+model in a [module](/build-modules/) that implements a component or service
+API, and your application talks to it through the standard APIs. This section
+explains how each kind of model fits that pattern.
+
+- [Inference latency and loop rate](inference-latency/): why a model in the
+  loop cannot run faster than its own inference time, and how to size it.
+- [Learned and policy-based control](learned-and-policy-control/): when a
+  trained policy beats a hand-written controller, and how it runs on a machine.
+- [Run a vision-language-action model](run-a-vla/): drive a robot from a camera
+  frame plus a language prompt.
+- [Integrate an LLM with a robot](integrate-an-llm/): use a language model to
+  plan tasks and dispatch robot skills, safely.
+- [Simulation and sim-to-real](simulation-and-sim-to-real/): develop and
+  validate a policy before it touches hardware.
diff --git a/docs/ai-control/inference-latency.md b/docs/ai-control/inference-latency.md
@@ -0,0 +1,118 @@
+---
+linkTitle: "Inference latency and loop rate"
+title: "Inference latency and loop rate"
+weight: 5
+layout: "docs"
+type: "docs"
+description: "Understand why model inference latency sets the ceiling on how fast a perception or control loop can run, and estimate an achievable loop rate from model size, image resolution, and hardware."
+aliases:
+  - "/concepts/inference-latency/"
+---
+
+Consider a small program that watches a camera and reacts to what it sees:
+
+```python
+while True:
+    detections = await detector.get_detections_from_camera("my-camera")
+    steer(detections)
+    await asyncio.sleep(0.05)  # aim for 20 Hz
+```
+
+The `sleep(0.05)` looks like it sets the pace: run 20 times per second.
+In practice the loop rate depends far more on the line above it.
+The call to [`GetDetectionsFromCamera`](/vision/) captures a frame, runs it through a machine learning model, and returns the results.
+Although the call is awaited, the loop body cannot move past it until inference finishes and the result returns.
+If the model takes 200&nbsp;ms to produce detections, one trip through the loop takes at least 200&nbsp;ms, and the loop runs near 5 Hz no matter what number you pass to `sleep`.
+
+This page explains why inference latency sets the ceiling on loop rate, and how to estimate that ceiling before you build a real-time task.
+
+## Why each iteration waits for inference
+
+A perception or control loop does its work one iteration at a time, in order.
+Each iteration acquires an input, computes on it, and acts on the result.
+When the compute step is a model inference call, the loop reaches that call and waits for a return value before it can act or start the next iteration.
+
+The wall-clock time of one iteration is the sum of its blocking steps.
+The single largest term is usually inference:
+
+- Acquiring a frame from a camera: a few milliseconds to tens of milliseconds.
+- Running inference on that frame: milliseconds to hundreds of milliseconds.
+- Acting on the result (sending a command to a motor or base): typically a few milliseconds.
+
+Your `sleep` only adds idle time on top of that sum.
+It can slow the loop down, but it cannot speed it up past the inference call.
+The achievable loop rate is therefore bounded by the total time of the blocking steps, roughly:
+
+```text
+max_loop_rate ≈ 1 / (frame_time + inference_time + actuation_time)
+```
+
+In most vision loops the inference term dominates, so `max_loop_rate ≈ 1 / inference_time`.
+A 200&nbsp;ms model caps the loop near 5 Hz; a 20&nbsp;ms model allows up to about 50 Hz, before you add any deliberate `sleep`.
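
The arithmetic above can be sketched as a small helper. This is an illustrative calculation, not part of any Viam SDK: the function name `max_loop_rate_hz` and the example step times are assumptions, and it treats one iteration as the sum of its blocking steps plus any deliberate sleep.

```python
def max_loop_rate_hz(frame_s: float, inference_s: float, actuation_s: float,
                     sleep_s: float = 0.0) -> float:
    """Upper bound on loop rate when every step blocks the next one."""
    iteration_s = frame_s + inference_s + actuation_s + sleep_s
    if iteration_s <= 0:
        raise ValueError("step times must sum to a positive duration")
    return 1.0 / iteration_s


# Assumed example: 10 ms capture, 200 ms inference, 5 ms actuation, no sleep.
slow = max_loop_rate_hz(0.010, 0.200, 0.005)
# The same loop with a 50 ms sleep added on top.
slow_with_sleep = max_loop_rate_hz(0.010, 0.200, 0.005, sleep_s=0.05)
print(f"{slow:.2f} Hz without sleep, {slow_with_sleep:.2f} Hz with sleep")
```

The sleep term only ever lowers the result, which is why tuning it cannot recover rate lost to inference.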
+
+## What determines inference time
+
+Inference time is a property of three things together: the model, the input, and the hardware.
+
+**Model size and architecture.**
+A larger model with more parameters and layers performs more arithmetic per frame.
+A compact detector aimed at edge devices runs much faster than a large, high-accuracy backbone.
+Quantized formats such as TFLite `int8` trade a small amount of accuracy for a substantial speedup.
+
+**Input resolution.**
+Compute grows with the number of pixels, which grows with the square of the linear resolution.
+Halving both image dimensions cuts pixel count to a quarter and often cuts inference time by a similar factor.
+Feeding a model a smaller frame is one of the cheapest ways to raise loop rate.
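
As a rough first estimate, the resolution effect can be sketched by scaling a measured latency by the ratio of pixel counts. The helper name `scaled_inference_ms` and the example numbers are hypothetical, and the linear-in-pixels assumption is only approximate because fixed per-call overhead does not shrink with the frame. Measure on the target device before relying on the result.

```python
def scaled_inference_ms(measured_ms: float,
                        measured_size: tuple[int, int],
                        new_size: tuple[int, int]) -> float:
    """Estimate latency at a new resolution, assuming cost is linear in pixel count."""
    measured_pixels = measured_size[0] * measured_size[1]
    new_pixels = new_size[0] * new_size[1]
    return measured_ms * new_pixels / measured_pixels


# Assumed example: 200 ms measured at 1280x720, estimated at half the linear resolution.
estimate = scaled_inference_ms(200.0, (1280, 720), (640, 360))
print(f"estimated latency at 640x360: {estimate:.0f} ms")
```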
+
+**Hardware.**
+Where inference runs matters more than any other single factor.
+As rough orders of magnitude, for a typical object detector:
+
+| Where inference runs                                             | Rough per-frame latency     | Rough loop ceiling |
+| ---------------------------------------------------------------- | --------------------------- | ------------------ |
+| TFLite on a Raspberry Pi CPU                                     | ~150-500&nbsp;ms            | ~2-6 Hz            |
+| Same model on a coprocessor or small GPU (for example, a Jetson) | ~10-40&nbsp;ms              | ~25-100 Hz         |
+| Larger model on a desktop or server GPU                          | single-digit to ~20&nbsp;ms | ~50-200 Hz         |
+
+Treat these as illustrative ranges, not specifications.
+Actual numbers depend on the exact model, resolution, framework, and device, and the only reliable figure is one you measure on your own hardware.
+The pattern holds across cases: moving the same model from a general-purpose CPU to an accelerator built for tensor math changes latency by roughly an order of magnitude.
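
One way to get a measured figure is a small timing harness like the sketch below. It is a generic illustration using only the Python standard library: `measure_latency_ms` and `fake_infer` are hypothetical names, and `fake_infer` stands in for whatever call runs your model. For an async client, the same idea applies with the timer wrapped around the `await`.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(infer: Callable[[], object], warmup: int = 3, runs: int = 20) -> dict:
    """Time repeated calls to `infer` and summarize latency in milliseconds."""
    for _ in range(warmup):          # discard warm-up runs (caches, lazy model loading)
        infer()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


def fake_infer() -> None:
    """Hypothetical stand-in for a real model call."""
    time.sleep(0.002)


stats = measure_latency_ms(fake_infer, warmup=1, runs=10)
print(stats)
```

Reporting a high percentile alongside the median matters for control loops, where an occasional slow frame hurts more than a slightly slower average.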
+
+## Why remote inference adds latency
+
+For an ML-backed [vision service](/vision/), the detection ultimately blocks on the ML model service's `Infer` method, and you can run that model locally on the machine or call one hosted elsewhere. (A heuristic detector such as `color_detector` runs no model and skips this cost.)
+Running inference on a remote or cloud server can give you access to hardware far more powerful than an edge device.
+That power comes with an added cost: every frame travels to the server and every result travels back.
+
+Remote inference latency is the sum of the network round trip and the server-side compute:
+
+```text
+remote_inference_time ≈ upload_time + server_inference_time + download_time
+```
+
+Uploading a full-resolution frame over a constrained or high-latency link can add tens to hundreds of milliseconds and can vary from frame to frame.
+For a monitoring task that reports a status every few seconds, that overhead is comfortably absorbed.
+For a control loop steering a moving base, a variable extra 100&nbsp;ms per iteration both lowers the loop rate and makes its timing less predictable.
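
The remote-latency sum can be estimated with a back-of-the-envelope calculation like the one below. The function names, the symmetric link speed, and the example figures are all assumptions for illustration; the sketch ignores connection setup, image encoding, and queuing on the server, so real numbers will usually be higher and more variable.

```python
def transfer_ms(num_bytes: int, link_mbps: float) -> float:
    """Time to push num_bytes through a link of link_mbps megabits per second."""
    return num_bytes * 8 / (link_mbps * 1_000_000) * 1000.0


def remote_inference_ms(frame_bytes: int, result_bytes: int, link_mbps: float,
                        rtt_ms: float, server_inference_ms: float) -> float:
    """Estimate upload + server compute + download for one frame."""
    upload_ms = rtt_ms / 2 + transfer_ms(frame_bytes, link_mbps)
    download_ms = rtt_ms / 2 + transfer_ms(result_bytes, link_mbps)
    return upload_ms + server_inference_ms + download_ms


# Assumed example: 150 kB JPEG, 2 kB result, 10 Mbps link, 40 ms round trip, 15 ms on the server.
total_ms = remote_inference_ms(150_000, 2_000, 10.0, 40.0, 15.0)
print(f"estimated remote inference time: {total_ms:.1f} ms")
```

In this example the upload dominates the total, which is why sending a smaller or more heavily compressed frame often helps remote inference more than a faster server does.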
+
+## Sizing a real-time task
+
+The tolerable loop rate follows from what the loop controls.
+
+**Real-time control** acts on the physical world, where staleness compounds.
+A base moving at 1 m/s travels 20 cm during a 200&nbsp;ms inference call, so at 5 Hz every decision is based on a frame already 20 cm out of date.
+For steering, obstacle avoidance, or closed-loop reaction, you generally want inference well under the physical time constant of the system, and you want that latency to be steady rather than bursty.
+Viam's feedback controllers expose their own cadence directly: a [sensor-controlled base](/reference/components/base/sensor-controlled/) accepts a `control_frequency_hz` value (default 10 Hz), and its movement sensors must report at least that fast for the loop to hold rate.
+
+**Monitoring and logging** consume detections rather than steering on them, so a loop running at 1 Hz, or slower, is often plenty.
+Here you can favor a larger, more accurate model or a remote GPU and accept the higher per-frame latency, because no actuator is waiting on the result.
+
+To size a task, work backward from the required rate.
+Decide the loop rate the application needs, invert it to get the latency budget per iteration, subtract the frame-capture and actuation time, and choose a model, resolution, and hardware whose measured inference time fits what remains.
+If nothing fits, you have three levers: shrink or quantize the model, lower the input resolution, or move inference to faster hardware.
+Because these levers trade accuracy and cost against speed, measuring inference time on the target device is the step that turns an estimate into a design you can rely on.
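
The work-backward procedure reduces to two small calculations, sketched here with hypothetical helper names and example values: the inference budget left in one iteration, and the distance a base travels while a decision is in flight.

```python
def inference_budget_ms(required_hz: float, frame_ms: float, actuation_ms: float) -> float:
    """Latency left for inference after frame capture and actuation."""
    iteration_budget_ms = 1000.0 / required_hz
    return iteration_budget_ms - frame_ms - actuation_ms


def distance_per_decision_m(speed_m_s: float, latency_ms: float) -> float:
    """Distance traveled while one decision is being computed."""
    return speed_m_s * latency_ms / 1000.0


# Assumed example: a 10 Hz loop with 15 ms capture and 5 ms actuation.
budget = inference_budget_ms(10.0, 15.0, 5.0)
# The staleness example from the text: 1 m/s with a 200 ms inference call.
stale = distance_per_decision_m(1.0, 200.0)
print(f"inference budget: {budget:.0f} ms, distance per decision: {stale:.2f} m")
```

A zero or negative budget means the required rate cannot be met with the current capture and actuation times, whatever model you choose.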
+
+## Next steps
+
+- Learn how detection and classification calls work in the [vision service](/vision/).
+- See how a feedback loop consumes sensor input at a fixed rate on a [sensor-controlled base](/reference/components/base/sensor-controlled/).
+- Explore the [components](/reference/components/) that a perception or control loop reads from and acts on.
diff --git a/docs/ai-control/integrate-an-llm.md b/docs/ai-control/integrate-an-llm.md
@@ -0,0 +1,131 @@
+---
+linkTitle: "Integrate an LLM"
+title: "Integrate an LLM with a robot"
+weight: 30
+layout: "docs"
+type: "docs"
+description: "Build a logic module that uses an LLM to turn a high-level goal into a validated sequence of robot skills, then dispatches them through the Viam APIs."
+---
+
+Give a robot a high-level goal in plain language, such as "clear the cups off the table," and have it carry out a bounded sequence of actions: drive to the table, pick up a cup, place it in the bin, repeat.
+A large language model (LLM) is well suited to decomposing that goal into steps.
+Your module calls the LLM to propose which robot skills to run and with what arguments, validates each proposed action, and then dispatches the approved actions through the Viam APIs.
+
+Viam does not host the LLM.
+Your module calls an external LLM provider (such as an OpenAI-compatible API) or a local model that you run yourself, using that provider's own SDK.
+Everything below is the code you write inside a [module](/build-modules/).
+
+{{% alert title="Safety first" color="caution" %}}
+An LLM produces unconstrained text.
+Keep a validation layer between the model's proposed action and any [Viam API](/reference/apis/) call.
+That layer admits only actions in a fixed allowlist, within fixed parameter bounds, so a malformed or out-of-range response never reaches an actuator.
+Steps 4 and 5 cover this validation and the timeouts and human-confirmation gates that back it up.
+{{% /alert %}}
+
+## Prerequisites
+
+- A machine with the components and services your robot uses (for example a [base](/reference/apis/components/base/) and an [arm](/reference/apis/components/arm/)), already [configured](/hardware/).
+- Credentials for an LLM provider, or a local model you can query.
+- Familiarity with [writing a module](/build-modules/).
+
+## Author the logic module
+
+1. **Scaffold a logic module.**
+   Generate a [generic component or service](/build-modules/) module in your language of choice.
+   A logic module holds no hardware of its own; it depends on the components and services it orchestrates and coordinates them.
+   Declare those resources as [dependencies](/build-modules/dependencies/) so your module receives clients for them at runtime.
+
+2. **Define a set of robot skills.**
+   A skill is a small function that wraps one or more [Viam API](/reference/apis/) calls into a named, self-contained action.
+   Give each skill a clear name, a short description, and a typed set of arguments.
+   Keep the surface small and the arguments bounded.
+
+   ```python {class="line-numbers linkable-line-numbers"}
+   async def drive_to(self, zone: str):
+       """Drive the base to a named zone in the workspace."""
+       pose = self.zones[zone]                     # look up a known target
+       await self.motion.move(...)                 # send the base to `pose`; see the Motion API reference
+
+   async def pick_cup(self):
+       """Close the gripper on a cup at the current pose."""
+       await self.gripper.grab()
+
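+   # Build the allowlist in your constructor, once self.zones is populated.
+   # Each key must match the skill's argument name exactly.
+   self.SKILLS = {
+       "drive_to": {"zone": list(self.zones)},     # argument name -> allowed values
+       "pick_cup": {},
+   }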
+   ```
+
+   The `SKILLS` table is the allowlist.
+   It is the single source of truth for what the robot can be asked to do and which arguments are legal.
+   For exact method signatures, see the [component and service APIs](/reference/apis/).
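
An allowlist like this can be turned into tool definitions for the LLM provider. The sketch below assumes an OpenAI-style function-calling schema and an allowlist whose keys match each skill's argument names; the `to_tool_schemas` helper, the `DESCRIPTIONS` table, and the example zones are hypothetical. Check your provider's documentation for the exact shape it expects.

```python
SKILLS = {
    "drive_to": {"zone": ["table", "bin"]},   # argument name -> allowed values (example data)
    "pick_cup": {},
}
DESCRIPTIONS = {
    "drive_to": "Drive the base to a named zone in the workspace.",
    "pick_cup": "Close the gripper on a cup at the current pose.",
}


def to_tool_schemas(skills: dict, descriptions: dict) -> list:
    """Convert the allowlist into OpenAI-style tool definitions (assumed format)."""
    tools = []
    for name, args in skills.items():
        properties = {
            arg: {"type": "string", "enum": list(allowed)}
            for arg, allowed in args.items()
        }
        tools.append({
            "type": "function",
            "function": {
                "name": name,
                "description": descriptions.get(name, ""),
                "parameters": {
                    "type": "object",
                    "properties": properties,
                    "required": list(args),
                    "additionalProperties": False,
                },
            },
        })
    return tools


skill_schemas = to_tool_schemas(SKILLS, DESCRIPTIONS)
```

Generating the schema from the allowlist keeps the two from drifting apart. The `enum` constraint narrows what the model is likely to propose, but it does not replace validation in your own code.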
+
+## Prompt the LLM to choose a skill
+
+3. **Ask the model to select a skill and arguments.**
+   Send the goal and the skill definitions to your LLM provider using its function-calling (also called tool-use) API.
+   That style constrains the response to a structured choice: a skill name plus arguments, rather than free-form prose you would have to parse.
+
+   ```python {class="line-numbers linkable-line-numbers"}
+   # llm_client.chat is a placeholder; substitute your provider's SDK call.
+   response = llm_client.chat(
+       goal=goal,
+       tools=self.skill_schemas,   # your SKILLS table as the provider's tool schema
+   )
+   proposed = response.tool_call   # e.g. {"skill": "drive_to", "args": {"zone": "table"}}
+   ```
+
+   The provider returns a proposed action.
+   Treat it as a request, not a command: nothing runs until it passes validation.
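
Providers differ in how they return a tool call. Some, including OpenAI-compatible chat APIs, return the tool name alongside the arguments as a JSON-encoded string. A minimal sketch of normalizing that into the `{"skill": ..., "args": ...}` shape follows; `parse_tool_call` is a hypothetical helper, and the input format is an assumption about your provider.

```python
import json


def parse_tool_call(name: str, arguments_json: str) -> dict:
    """Convert a provider tool call into the {"skill", "args"} shape used for validation."""
    try:
        args = json.loads(arguments_json) if arguments_json else {}
    except json.JSONDecodeError as err:
        raise ValueError(f"tool arguments are not valid JSON: {err}") from err
    if not isinstance(args, dict):
        raise ValueError("tool arguments must be a JSON object")
    return {"skill": name, "args": args}


proposed = parse_tool_call("drive_to", '{"zone": "table"}')
print(proposed)
```

Raising `ValueError` on malformed output lets the caller treat a parsing failure the same way as a rejected action.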
+
+## Validate before executing
+
+4. **Check the proposed action against the allowlist and bounds.**
+   Run this check on every proposed action, before any API call.
+   This is the guardrail that keeps an LLM from issuing an unsafe actuator command.
+
+   ```python {class="line-numbers linkable-line-numbers"}
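+   def validate(self, proposed):
+       skill = proposed.get("skill")
+       if skill not in self.SKILLS:                 # reject unknown skills
+           raise ValueError(f"unknown skill: {skill}")
+       spec = self.SKILLS[skill]                    # argument name -> allowed values
+       args = proposed.get("args", {})
+       if set(args) != set(spec):                   # reject missing or unexpected arguments
+           raise ValueError(f"{skill} expects arguments {sorted(spec)}, got {sorted(args)}")
+       for name, value in args.items():
+           if value not in spec[name]:              # reject out-of-bounds values
+               raise ValueError(f"{name}={value} out of bounds")
+       return proposed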
+   ```
+
+   Each rule maps to a specific failure it prevents:
+
+   - The **allowlist** check admits only skills you wrote and tested, so a hallucinated or misspelled action name never becomes an API call.
+   - The **parameter-bounds** check confirms every argument falls in a legal range, so an out-of-range value, such as a velocity above your safe cap, is refused before it reaches a motor.
+
+   If validation fails, log the rejected action and either stop or ask the model to try again.
+   A rejected action costs nothing; an unchecked one can move hardware.
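
The "log, then stop or retry" policy might look like the sketch below. It assumes two callables that you supply: `propose(feedback)` wraps your LLM call and `validate(proposed)` raises `ValueError` on rejection. The function name `propose_with_retries`, the logger name, and the retry cap are illustrative choices, not part of any SDK.

```python
import logging

logger = logging.getLogger("llm_planner")  # hypothetical logger name


def propose_with_retries(propose, validate, max_attempts: int = 3):
    """Ask for an action until one validates, or give up after max_attempts."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        proposed = propose(feedback)           # feedback is None on the first attempt
        try:
            return validate(proposed)
        except ValueError as err:
            logger.warning("attempt %d rejected %r: %s", attempt, proposed, err)
            feedback = f"Your last action was rejected: {err}"
    return None                                # nothing was approved; the caller should stop
```

Capping the attempts keeps a model that repeatedly proposes invalid actions from looping forever, and passing the rejection reason back gives it a chance to correct itself.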
+
+## Execute through the Viam APIs
+
+5. **Dispatch the validated action, with timeouts and optional confirmation.**
+   Only actions that pass step 4 reach the Viam APIs.
+   Two further guardrails bound what execution can do:
+
+   - **Timeouts** bound each dispatched action in time, so a stalled or long-running motion returns control to your module instead of running unattended.
+   - **Human confirmation** gates high-consequence skills, such as moving an arm near a person, on an explicit approval before the action runs.
+
+   ```python {class="line-numbers linkable-line-numbers"}
+   async def execute(self, action):
+       action = self.validate(action)                      # never skip this
+       if action["skill"] in self.CONFIRM_REQUIRED:
+           if not await self.confirm(action):              # await a person's approval
+               return
+       handler = self.skills[action["skill"]]              # skill name -> bound method, e.g. self.drive_to
+       async with asyncio.timeout(self.step_timeout_s):    # requires Python 3.11+
+           await handler(**action["args"])
+   ```
+
+   Loop steps 3 through 5 until the goal is met or a step fails, feeding the outcome of each action back to the model as context for the next choice.
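
The outer loop can be sketched as follows, under stated assumptions: `propose(history)` is your async LLM call and returns `None` when the model considers the goal met, and `execute(action)` validates and dispatches one action. Both are callables you provide, and `run_goal`, the step cap, and the history format are hypothetical. This sketch uses `asyncio.wait_for`, which also works on Python versions older than 3.11.

```python
import asyncio


async def run_goal(propose, execute, max_steps: int = 10, step_timeout_s: float = 5.0) -> list:
    """Run propose/execute until the goal is met, a step fails, or max_steps is reached."""
    history = []
    for _ in range(max_steps):
        action = await propose(history)        # the model sees every earlier outcome
        if action is None:                     # assumed signal that the goal is met
            return history
        try:
            await asyncio.wait_for(execute(action), timeout=step_timeout_s)
        except (ValueError, asyncio.TimeoutError) as err:
            outcome = f"failed: {type(err).__name__}: {err}"
            history.append({"action": action, "outcome": outcome})
            return history                     # stop on the first failed step
        history.append({"action": action, "outcome": "ok"})
    return history
```

The step cap bounds the total work one goal can trigger, independent of what the model proposes.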
+
+## Next steps
+
+- For a worked example of driving a robot from an LLM, see [Integrate Viam with ChatGPT](/tutorials/projects/integrating-viam-with-openai/).
+- To package and deploy your logic module, see [Build and deploy modules](/build-modules/).
+- To review the exact methods your skills call, see the [component and service APIs](/reference/apis/).