From c54900488ff7fbaef388cdbdd0a58d7d7b029ae3 Mon Sep 17 00:00:00 2001 From: Brandon Shrewsbury Date: Wed, 1 Jul 2026 13:02:37 -0600 Subject: [PATCH 01/10] Add AI/control, navigation, concepts, and manipulation sections Fills concept gaps surfaced by a use-case coverage analysis (Playbook 11): learned/policy control, VLA and LLM integration, simulation; localization, SLAM, base navigation, sensor fusion, multi-robot coordination; the platform mental model, confidence scores, inference latency, capture frequency; and force control and moving-object picking. 19 new pages across 4 sections. Passes prettier, markdownlint, and vale. --- docs/ai-control/_index.md | 28 +++ docs/ai-control/integrate-an-llm.md | 131 +++++++++++++ docs/ai-control/learned-and-policy-control.md | 128 +++++++++++++ docs/ai-control/run-a-vla.md | 181 ++++++++++++++++++ docs/ai-control/simulation-and-sim-to-real.md | 128 +++++++++++++ docs/concepts/_index.md | 23 +++ docs/concepts/capture-frequency.md | 83 ++++++++ docs/concepts/confidence-scores.md | 52 +++++ docs/concepts/inference-latency.md | 116 +++++++++++ docs/concepts/platform-model.md | 97 ++++++++++ docs/manipulation/_index.md | 22 +++ .../force-and-compliance-control.md | 130 +++++++++++++ .../track-and-pick-moving-objects.md | 162 ++++++++++++++++ docs/navigation/_index.md | 24 +++ docs/navigation/coordinate-a-fleet.md | 111 +++++++++++ docs/navigation/localization.md | 89 +++++++++ docs/navigation/navigate-a-mobile-base.md | 116 +++++++++++ docs/navigation/sensor-fusion.md | 85 ++++++++ docs/navigation/slam-and-mapping.md | 99 ++++++++++ 19 files changed, 1805 insertions(+) create mode 100644 docs/ai-control/_index.md create mode 100644 docs/ai-control/integrate-an-llm.md create mode 100644 docs/ai-control/learned-and-policy-control.md create mode 100644 docs/ai-control/run-a-vla.md create mode 100644 docs/ai-control/simulation-and-sim-to-real.md create mode 100644 docs/concepts/_index.md create mode 100644 
docs/concepts/capture-frequency.md create mode 100644 docs/concepts/confidence-scores.md create mode 100644 docs/concepts/inference-latency.md create mode 100644 docs/concepts/platform-model.md create mode 100644 docs/manipulation/_index.md create mode 100644 docs/manipulation/force-and-compliance-control.md create mode 100644 docs/manipulation/track-and-pick-moving-objects.md create mode 100644 docs/navigation/_index.md create mode 100644 docs/navigation/coordinate-a-fleet.md create mode 100644 docs/navigation/localization.md create mode 100644 docs/navigation/navigate-a-mobile-base.md create mode 100644 docs/navigation/sensor-fusion.md create mode 100644 docs/navigation/slam-and-mapping.md diff --git a/docs/ai-control/_index.md b/docs/ai-control/_index.md new file mode 100644 index 0000000000..19d4b52d01 --- /dev/null +++ b/docs/ai-control/_index.md @@ -0,0 +1,28 @@ +--- +linkTitle: "AI & learned control" +title: "AI and learned control" +weight: 45 +layout: "docs" +type: "docs" +no_list: true +description: "Run learned policies, vision-language-action models, and LLM-driven task planning on a machine using modules and the Viam APIs." +--- + +Classic robot control is written by hand: a PID loop, a motion planner, a +state machine. A growing class of applications instead runs a **learned +model** in the loop, a reinforcement-learning policy, a vision-language-action +(VLA) model, or a large language model that decomposes a goal into skills. + +On Viam these run the same way any custom capability does: you package the +model in a [module](/build-modules/) that implements a component or service +API, and your application talks to it through the standard APIs. This section +explains how each kind of model fits that pattern. + +- [Learned and policy-based control](learned-and-policy-control/): when a + trained policy beats a hand-written controller, and how it runs on a machine. 
+- [Run a vision-language-action model](run-a-vla/): drive a robot from a camera + frame plus a language prompt. +- [Integrate an LLM with a robot](integrate-an-llm/): use a language model to + plan tasks and dispatch robot skills, safely. +- [Simulation and sim-to-real](simulation-and-sim-to-real/): develop and + validate a policy before it touches hardware. diff --git a/docs/ai-control/integrate-an-llm.md b/docs/ai-control/integrate-an-llm.md new file mode 100644 index 0000000000..32e9f59769 --- /dev/null +++ b/docs/ai-control/integrate-an-llm.md @@ -0,0 +1,131 @@ +--- +linkTitle: "Integrate an LLM" +title: "Integrate an LLM with a robot" +weight: 30 +layout: "docs" +type: "docs" +description: "Build a logic module that uses an LLM to turn a high-level goal into a validated sequence of robot skills, then dispatches them through the Viam APIs." +--- + +Give a robot a high-level goal in plain language, such as "clear the cups off the table," and have it carry out a bounded sequence of actions: drive to the table, pick up a cup, place it in the bin, repeat. +A large language model (LLM) is well suited to decomposing that goal into steps. +Your module calls the LLM to propose which robot skills to run and with what arguments, validates each proposed action, and then dispatches the approved actions through the Viam APIs. + +Viam does not host the LLM. +Your module calls an external LLM provider (such as an OpenAI-compatible API) or a local model that you run yourself, using that provider's own SDK. +Everything below is the code you write inside a [module](/build-modules/). + +{{% alert title="Safety first" color="caution" %}} +An LLM produces unconstrained text. +Keep a validation layer between the model's proposed action and any [Viam API](/reference/apis/) call. +That layer admits only actions in a fixed allowlist, within fixed parameter bounds, so a malformed or out-of-range response never reaches an actuator. 
+Steps 4 and 5 cover this validation and the timeouts and human-confirmation gates that back it up. +{{% /alert %}} + +## Prerequisites + +- A machine with the components and services your robot uses (for example a [base](/dev/reference/apis/components/base/) and an [arm](/dev/reference/apis/components/arm/)), already [configured](/operate/get-started/supported-hardware/). +- Credentials for an LLM provider, or a local model you can query. +- Familiarity with [writing a module](/build-modules/). + +## Author the logic module + +1. **Scaffold a logic module.** + Generate a [generic component or service](/build-modules/) module in your language of choice. + A logic module holds no hardware of its own; it depends on the components and services it orchestrates and coordinates them. + Declare those resources as [dependencies](/build-modules/dependencies/) so your module receives clients for them at runtime. + +2. **Define a set of robot skills.** + A skill is a small function that wraps one or more [Viam API](/reference/apis/) calls into a named, self-contained action. + Give each skill a clear name, a short description, and a typed set of arguments. + Keep the surface small and the arguments bounded. + + ```python {class="line-numbers linkable-line-numbers"} + async def drive_to(self, zone: str): + """Drive the base to a named zone in the workspace.""" + pose = self.zones[zone] # look up a known target + await self.motion.move(...) # see the Motion API reference + + async def pick_cup(self): + """Close the gripper on a cup at the current pose.""" + await self.gripper.grab() + + # Build in the constructor, after self.zones is set; keys are argument names. + self.SKILLS = { + "drive_to": {"zone": list(self.zones)}, # allowed values + "pick_cup": {}, + } + ``` + + The `SKILLS` table is the allowlist. + It is the single source of truth for what the robot can be asked to do and which arguments are legal. + For exact method signatures, see the [component and service APIs](/reference/apis/). + +## Prompt the LLM to choose a skill + +3. 
**Ask the model to select a skill and arguments.** + Send the goal and the skill definitions to your LLM provider using its function-calling (also called tool-use) API. + That style constrains the response to a structured choice: a skill name plus arguments, rather than free-form prose you would have to parse. + + ```python {class="line-numbers linkable-line-numbers"} + response = llm_client.chat( + goal=goal, + tools=self.skill_schemas, # your SKILLS table as the provider's tool schema + ) + proposed = response.tool_call # e.g. {"skill": "drive_to", "args": {"zone": "table"}} + ``` + + The provider returns a proposed action. + Treat it as a request, not a command: nothing runs until it passes validation. + +## Validate before executing + +4. **Check the proposed action against the allowlist and bounds.** + Run this check on every proposed action, before any API call. + This is the guardrail that keeps an LLM from issuing an unsafe actuator command. + + ```python {class="line-numbers linkable-line-numbers"} + def validate(self, proposed): + skill = proposed["skill"] + if skill not in self.SKILLS: # reject unknown skills + raise ValueError(f"unknown skill: {skill}") + for name, value in proposed["args"].items(): + allowed = self.SKILLS[skill].get(name) + if allowed is not None and value not in allowed: + raise ValueError(f"{name}={value} out of bounds") + return proposed + ``` + + Each rule maps to a specific failure it prevents: + + - The **allowlist** check admits only skills you wrote and tested, so a hallucinated or misspelled action name never becomes an API call. + - The **parameter-bounds** check confirms every argument falls in a legal range, so an out-of-range value, such as a velocity above your safe cap, is refused before it reaches a motor. + + If validation fails, log the rejected action and either stop or ask the model to try again. + A rejected action costs nothing; an unchecked one can move hardware. + +## Execute through the Viam APIs + +5. 
**Dispatch the validated action, with timeouts and optional confirmation.** + Only actions that pass step 4 reach the Viam APIs. + Two further guardrails bound what execution can do: + + - **Timeouts** bound each dispatched action in time, so a stalled or long-running motion returns control to your module instead of running unattended. + - **Human confirmation** gates high-consequence skills, such as moving an arm near a person, on an explicit approval before the action runs. + + ```python {class="line-numbers linkable-line-numbers"} + import asyncio # asyncio.timeout requires Python 3.11 or later + + async def execute(self, action): + action = self.validate(action) # never skip this + if action["skill"] in self.CONFIRM_REQUIRED: + if not await self.confirm(action): # await a person's approval + return + async with asyncio.timeout(self.step_timeout_s): # bound the action + await self.skills[action["skill"]](**action["args"]) + ``` + + Loop steps 3 through 5 until the goal is met or a step fails, feeding the outcome of each action back to the model as context for the next choice. + +## Next steps + +- For a worked example of driving a robot from an LLM, see [Integrate Viam with ChatGPT](/tutorials/projects/integrating-viam-with-openai/). +- To package and deploy your logic module, see [Build and deploy modules](/build-modules/). +- To review the exact methods your skills call, see the [component and service APIs](/reference/apis/). diff --git a/docs/ai-control/learned-and-policy-control.md b/docs/ai-control/learned-and-policy-control.md new file mode 100644 index 0000000000..a9c5f2b97e --- /dev/null +++ b/docs/ai-control/learned-and-policy-control.md @@ -0,0 +1,128 @@ +--- +linkTitle: "Learned and policy-based control" +title: "Learned and policy-based control" +weight: 10 +layout: "docs" +type: "docs" +description: "Understand when a trained control policy beats a hand-written PID or motion planner, how you package a policy as a module, and the real-time constraints it must meet on hardware." 
+--- + +Consider a quadruped that needs to trot across loose gravel, or a gripper that +has to pick up a crumpled cloth from a camera image. The dynamics are hard to +write down: contact with the ground is intermittent, the cloth deforms as you +touch it, and the "right" motor command depends on subtle features of what the +sensors currently see. A hand-tuned controller can struggle here, because there +is no clean equation from sensor reading to motor command that a person can +author directly. + +A learned control policy takes a different route. Instead of encoding the rule +by hand, you train a function that maps observations to actions and let that +function drive the machine. This page explains what such a policy is, how it +relates to the built-in control tools Viam already provides, and what it takes +to run one on real hardware. + +## What a control policy is + +A control policy is a function. It reads an observation, the current state as +the machine perceives it (joint angles, an IMU reading, a camera frame, a goal), +and returns an action (target joint torques, wheel velocities, a gripper +command). At runtime the policy runs inside a loop: observe, decide, act, repeat, +usually at a fixed rate. + +Two families of methods produce these policies: + +- **Reinforcement learning (RL)** trains a policy by repeated trial against a + reward signal, most often in simulation. The policy explores actions, and + behavior that earns reward becomes more likely. RL suits problems where good + behavior is easy to score but hard to demonstrate, such as a stable gait. +- **Imitation learning** trains a policy to reproduce demonstrations, for + example teleoperated grasps recorded from an expert. It suits problems where + you can show the desired behavior more easily than you can define a reward. + +In both cases the output is the same kind of artifact: a trained model that +turns observations into actions. 
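The observe, decide, act loop described above can be sketched in a few lines. This is a minimal illustration and not Viam API code: `Observation`, `Action`, `proportional_policy`, and `run_loop` are hypothetical names, the joint dynamics are a toy model, and the hand-written proportional rule stands in for a trained model so the example runs without an ML framework.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Observation:
    joint_angle: float  # current angle, radians
    goal_angle: float   # target angle, radians


@dataclass
class Action:
    velocity: float     # commanded joint velocity, rad/s


# A policy is any function from an observation to an action.
Policy = Callable[[Observation], Action]


def proportional_policy(obs: Observation) -> Action:
    """Stand-in for a trained model (assumption: gain of 2.0 chosen for the demo)."""
    return Action(velocity=2.0 * (obs.goal_angle - obs.joint_angle))


def run_loop(policy: Policy, start: float, goal: float,
             steps: int, dt: float) -> List[float]:
    """Run observe, decide, act for a fixed number of cycles on a toy joint."""
    angle = start
    history = []
    for _ in range(steps):
        obs = Observation(joint_angle=angle, goal_angle=goal)  # observe
        action = policy(obs)                                   # decide
        angle += action.velocity * dt                          # act (toy dynamics)
        history.append(angle)
    return history


# 200 cycles at dt = 0.02 s corresponds to 4 seconds of a 50 Hz loop.
trajectory = run_loop(proportional_policy, start=0.0, goal=1.0, steps=200, dt=0.02)
print(f"final angle: {trajectory[-1]:.3f}")
```

Swapping `proportional_policy` for a function that calls a trained model leaves `run_loop` unchanged, which is the sense in which RL and imitation learning produce the same kind of artifact.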
+ +## When a learned policy is warranted, and when it is not + +Viam ships a mature classical control stack. The +[controls package](/reference/controls-package/) provides PID control for +regulating a single quantity toward a setpoint, and the motion service plans +collision-free paths for arms and mobile bases. These tools are predictable, +require no training data, and are the right default for most tasks. + +A PID loop or a motion plan is the better choice when the task has a clear model: +holding a motor at a target speed, driving to a pose in a known map, or moving an +arm through free space. These controllers are cheap to configure, easy to reason +about, and behave consistently. + +A learned policy earns its cost when the mapping from perception to action +resists hand authoring: + +- **Rich, high-dimensional observations.** Policies that act directly on camera + images (visuomotor control) can learn features that are impractical to + hand-engineer. +- **Contact-rich or deformable dynamics.** Legged locomotion, in-hand + manipulation, and grasping soft objects involve dynamics that are hard to + model in closed form. +- **Behavior that is easier to demonstrate or reward than to specify.** If you + can show the task or score it, but cannot write the rule, learning fills the + gap. + +The trade-off is real. A learned policy needs training data or a simulator, +compute to train, and careful validation, and it generalizes only as far as its +training distribution. When a classical controller already does the job, prefer +it. + +## How a policy runs on Viam + +Viam does not include a built-in reinforcement learning trainer. Training happens +in your own stack, typically a simulator plus an ML framework. What Viam provides +is the deployment and integration layer: a clean way to run your trained policy +against real hardware. + +You deploy a policy as a [custom module](/build-modules/). The module loads your +trained model and runs its own control loop: + +1. 
It reads observations through component APIs, for example camera frames from a + camera, joint positions from an arm, or orientation from a movement sensor. +2. It runs the observation through the policy to compute an action. +3. It commands components through their APIs, for example setting motor power or + arm joint positions. + +Because the module talks to hardware through the same component APIs that the +rest of Viam uses, the policy is portable across machines that expose those +components, and it composes with everything else in the configuration: data +capture, other services, and remote control. Your training pipeline stays in +your own environment; the module is the bridge that carries the result onto the +machine. + +## The real-time constraint + +A control loop runs at a rate, for example 50 Hz for a walking gait, which gives +the policy a fixed budget per cycle, 20 ms at 50 Hz. Everything in one iteration, +reading sensors, running inference, and sending commands, must fit inside that +budget. If it does not, the loop slows down or skips cycles, and a controller +that was stable in simulation can oscillate or fall over on hardware. + +Inference latency, the time to run one forward pass of the model, is usually the +largest and most variable part of that budget. It depends on model size, the +compute available on the machine, and whether the model runs on CPU, GPU, or an +accelerator. Before you commit a policy to hardware, measure its worst-case +inference time on the target device and confirm it leaves room for sensor reads +and actuation within the loop period. See +[inference latency](/concepts/inference-latency/) for how to reason about this +budget. + +This constraint often shapes the policy itself. A smaller or quantized model that +meets the loop rate can control the machine better than a larger, more accurate +model that cannot keep up, because a control policy that misses its deadline is +not really controlling in real time. 
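One way to check the budget before deployment is to time many forward passes on the target device and compare the worst case against the loop period. The sketch below is illustrative only: `measure_latency_s`, `fits_budget`, and `fake_policy` are hypothetical names, the 5 ms allowance for sensor reads and actuation is an assumed figure, and `fake_policy` is a placeholder workload where your real inference call would go.

```python
import time
from typing import Callable, List


def measure_latency_s(infer: Callable[[], object], runs: int = 200) -> List[float]:
    """Time repeated calls to `infer` and return each duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        durations.append(time.perf_counter() - start)
    return durations


def fits_budget(worst_case_s: float, loop_hz: float, io_allowance_s: float) -> bool:
    """True if worst-case inference plus sensor and actuation time fits one cycle."""
    period_s = 1.0 / loop_hz  # 0.02 s at 50 Hz
    return worst_case_s + io_allowance_s <= period_s


def fake_policy() -> int:
    """Placeholder workload; replace with one forward pass of your model."""
    return sum(i * i for i in range(1000))


durations = measure_latency_s(fake_policy)
worst = max(durations)  # judge against the worst case, not the mean
ok = fits_budget(worst, loop_hz=50.0, io_allowance_s=0.005)
print(f"worst case {worst * 1000:.2f} ms, fits 50 Hz budget: {ok}")
```

The check uses the maximum rather than the average because a loop that usually meets its deadline but occasionally misses it can still destabilize the controller.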
+ +## Next steps + +- Learn how to package and deploy code on a machine in + [Build modules](/build-modules/). +- Understand the timing budget in [inference latency](/concepts/inference-latency/). +- Review the classical baseline in the + [controls package](/reference/controls-package/) before reaching for a learned + policy. diff --git a/docs/ai-control/run-a-vla.md b/docs/ai-control/run-a-vla.md new file mode 100644 index 0000000000..885fa1da53 --- /dev/null +++ b/docs/ai-control/run-a-vla.md @@ -0,0 +1,181 @@ +--- +linkTitle: "Run a VLA model" +title: "Run a vision-language-action model" +weight: 20 +layout: "docs" +type: "docs" +description: "Build a control loop that feeds a camera frame and a language prompt to a vision-language-action model, then maps the model's output to arm, base, or gripper commands with the Viam APIs." +--- + +A vision-language-action (VLA) model takes an image and a natural-language +instruction, such as "pick up the red block," and returns an action for a robot +to take. +Viam does not ship a built-in VLA model. +Instead, you bring your own model, either an open-weights VLA you run yourself or +a hosted foundation-model API, and connect it to your hardware through Viam. + +This page shows how to assemble the control loop: capture a camera frame, send +the frame and a prompt to the model, and translate the model's output into +component commands. +This page assumes you know the [Viam basics](/build-modules/) and have a +machine with a camera and at least one actuator, such as an arm, base, or +gripper, already configured. + +A related capability is **open-vocabulary** (or **zero-shot**) detection: a +vision model detects objects named by a text prompt, such as "coffee mug," with +no task-specific training. +You can run open-vocabulary detection through the +[vision service](/reference/services/vision/) and use its bounding boxes +either as a standalone perception step or as an input to the VLA loop below. 
+ +## Prerequisites + +{{% expand "A configured machine with a camera and an actuator" %}} + +See [Supported hardware](/hardware/) to add a camera and an arm, base, or +gripper. + +{{% /expand %}} + +## Steps + +### 1. Choose where the model runs + +The model runs in one of two places, and the choice sets your control rate: + +- **On the edge**, in a module on the machine or on a nearby GPU host. Edge + inference avoids a network round trip, so it suits fast loops such as + closed-loop base or arm control. Larger VLA models need a capable local GPU. +- **In the cloud**, behind a hosted foundation-model API. A cloud API gives you + access to large models without local GPU hardware, at the cost of network + latency on every call. + +Weigh latency against model size for your target action rate: a 1 Hz "observe, +plan, act" loop tolerates cloud latency, while a 10 Hz visual servoing loop +needs edge inference. +For a fuller treatment of this tradeoff, see +[Inference latency](/concepts/inference-latency/). + +### 2. Wrap the model as a module, or call a hosted API + +Package the model so your control code can call it through a stable interface: + +- **Self-hosted model:** build a [module](/build-modules/) that runs the model. + If the model returns tensors, implement it as an + [ML model service](/train/deploy-a-model/). If it returns structured actions or + text, implement a + [generic service](/reference/services/generic/) and expose the model + through [`DoCommand`](/reference/apis/services/generic/#docommand). +- **Hosted API:** call the provider's API directly from your control code, or + wrap that call in a generic service so the rest of your system stays + provider-agnostic. + +A generic service keeps the model behind one method: + +```python {class="line-numbers linkable-line-numbers"} +from viam.services.generic import Generic + +vla = Generic.from_robot(machine, "vla-model") + +# Send an image and a prompt, receive a structured action. 
+result = await vla.do_command({ + "image": encoded_frame, # base64 or bytes, per your module's contract + "prompt": "pick up the red block", +}) +action = result["action"] # your module defines this shape +``` + +### 3. Capture a camera frame + +Read the current frame from the [camera API](/reference/apis/components/camera/): + +```python {class="line-numbers linkable-line-numbers"} +from viam.components.camera import Camera + +camera = Camera.from_robot(machine, "camera") +frame = await camera.get_image() +``` + +Encode the frame in whatever format your module or API expects, such as JPEG +bytes or a base64 string. + +### 4. Pass the frame and prompt to the model + +Send the encoded frame together with the language instruction, using the call +from step 2. +Keep the prompt specific and stable across the loop so the model's output stays +consistent: + +```python {class="line-numbers linkable-line-numbers"} +result = await vla.do_command({ + "image": encoded_frame, + "prompt": "move the gripper above the red block and grasp it", +}) +action = result["action"] +``` + +### 5. Map the model output to a component command + +Define the mapping from the model's output to a Viam API call. The shape of the +output depends on your model, so decide on a contract and translate it +explicitly. + +For an [arm](/reference/apis/components/arm/), move to a target pose: + +```python {class="line-numbers linkable-line-numbers"} +from viam.components.arm import Arm +from viam.proto.common import Pose + +arm = Arm.from_robot(machine, "arm") +p = action["pose"] # your model output +target = Pose(x=p["x"], y=p["y"], z=p["z"], + o_x=p["ox"], o_y=p["oy"], o_z=p["oz"], theta=p["theta"]) +await arm.move_to_position(target) +``` + +To move the arm while avoiding obstacles, plan the motion with the +[motion service](/reference/apis/services/motion/) instead of commanding the arm +directly. 
+ +For a [base](/reference/apis/components/base/), command a velocity: + +```python {class="line-numbers linkable-line-numbers"} +from viam.components.base import Base +from viam.proto.common import Vector3 + +base = Base.from_robot(machine, "base") +v = action["velocity"] # your model output, in mm/s and deg/s +await base.set_velocity( + linear=Vector3(x=0, y=v["forward"], z=0), + angular=Vector3(x=0, y=0, z=v["turn"]), +) +``` + +For a [gripper](/reference/apis/components/gripper/), open or grasp: + +```python {class="line-numbers linkable-line-numbers"} +from viam.components.gripper import Gripper + +gripper = Gripper.from_robot(machine, "gripper") +if action["grasp"]: + await gripper.grab() +else: + await gripper.open() +``` + +### 6. Mind the loop rate + +Run steps 3 through 5 in a loop. Match the loop period to your model's inference +latency plus the time each command takes to execute, and leave margin so +commands do not queue up. +When inference is slower than your target action rate, either move the model to +the edge, choose a smaller model, or slow the loop to match. +See [Inference latency](/concepts/inference-latency/) for how latency shapes a +control loop. 
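A fixed-rate loop with margin can be sketched as follows. This is a hedged illustration and not part of the Viam SDK: `run_fixed_rate` and `fake_step` are hypothetical names, and `fake_step` sleeps to imitate the combined time of capture, inference, and command. The loop sleeps for whatever remains of each period and, when a cycle overruns, counts it and moves on instead of trying to catch up, so commands do not queue.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def run_fixed_rate(step: Callable[[], Awaitable[None]],
                         period_s: float, cycles: int) -> int:
    """Call `step` once per period; return how many cycles overran the period."""
    overruns = 0
    for _ in range(cycles):
        start = time.monotonic()
        await step()                        # capture, infer, command
        elapsed = time.monotonic() - start
        if elapsed > period_s:
            overruns += 1                   # too slow: skip the sleep, do not catch up
        else:
            await asyncio.sleep(period_s - elapsed)
    return overruns


async def fake_step() -> None:
    """Placeholder for steps 3 through 5; sleeps to imitate inference time."""
    await asyncio.sleep(0.01)


# A 10 ms step inside a 50 ms period leaves ample margin.
overruns = asyncio.run(run_fixed_rate(fake_step, period_s=0.05, cycles=5))
print(f"overruns: {overruns}")
```

A steadily rising overrun count is the signal to move the model to the edge, choose a smaller model, or lengthen the period.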
+ +## Next steps + +- [Deploy an ML model service](/train/deploy-a-model/) +- [Run inference with the vision service](/reference/services/vision/) +- [Create a module](/build-modules/) +- [Plan motion with the motion service](/reference/apis/services/motion/) diff --git a/docs/ai-control/simulation-and-sim-to-real.md b/docs/ai-control/simulation-and-sim-to-real.md new file mode 100644 index 0000000000..fb285c4468 --- /dev/null +++ b/docs/ai-control/simulation-and-sim-to-real.md @@ -0,0 +1,128 @@ +--- +linkTitle: "Simulation and sim-to-real" +title: "Simulation and sim-to-real" +weight: 40 +layout: "docs" +type: "docs" +description: "Understand why control policies are developed and validated in simulation before hardware, the sim-to-real gap a policy must bridge, and how a validated policy deploys to a machine as a Viam module." +--- + +Suppose you are training a locomotion policy for a quadruped. Early in training +the policy is bad on purpose: it explores, which means it commands motions that +would slam legs into the ground, tip the robot over, or drive joints past their +limits. Running those first attempts on real hardware would burn out motors and +damage the frame long before the policy learned to walk. You therefore train in +simulation first, where a fall costs nothing, and only bring a policy to +hardware once it can already trot in the simulated world. + +This page explains why simulation is central to developing learned control +policies, the gap between simulated and real experience that a policy must +bridge, and how a policy validated in simulation then runs on a machine through +Viam. + +## Why simulation comes first + +A simulator is a model of the robot and its environment that runs the same +observe, decide, act loop the real machine will run, but in software. For +developing a control policy, this buys four things that hardware cannot offer at +the same time: + +- **Safety.** Exploratory and half-trained policies produce dangerous motions. 
+ In simulation a catastrophic action ends an episode instead of breaking a + motor or injuring a bystander. +- **Speed.** A physics simulator can run many times faster than real time and in + hundreds of parallel instances. A policy that would take months of wall-clock + time to train on one physical robot can train in hours across a fleet of + simulated ones. +- **Cost.** Simulated robots do not wear out, and you can run thousands of them + without buying thousands of machines. +- **Reset and reproducibility.** A simulator resets to an exact starting state on + demand, so every training episode begins from a known condition and a failure + is repeatable. Resetting real hardware to a precise pose after each attempt is + slow and imprecise. + +Simulation is also where you validate a policy before it touches hardware. You +can measure how the policy behaves across thousands of randomized situations, +check that it stays within joint and torque limits, and catch failure modes +while they are still free to fix. Common simulators for this work include +Gazebo, MuJoCo, and NVIDIA Isaac; Viam does not bundle a simulator, so you +choose the one that fits your robot and train in your own stack. + +## The sim-to-real gap + +A policy that performs well in simulation can still stumble on hardware, because +the simulated world and the physical world differ. The policy was trained on +simulated observations and its actions were interpreted by a simulated body; on +the real machine both sides of that loop change. This mismatch is the +sim-to-real gap, and it shows up in a few consistent places: + +- **Observation gap.** Real sensors are noisier and less consistent than their + simulated counterparts. A simulated camera renders a clean image; a real + camera adds motion blur, exposure changes, and lens distortion. A simulated + IMU reports near-perfect orientation; a real one drifts and jitters. 
If the + policy trained only on clean observations, real readings fall outside the + distribution it learned to handle. +- **Dynamics gap.** The simulator approximates mass, friction, joint backlash, + motor response, and contact. Real values differ from the modeled ones and vary + from unit to unit and over time as parts wear. An action that produced one + motion in simulation can produce a slightly different motion on hardware. +- **Action and latency gap.** In simulation an action often takes effect + instantly. On real hardware, sensing, inference, and communication each take + time, so the machine acts on observations that are already slightly stale, and + commands reach the actuators after a delay. A policy that assumed instant + response can become unstable when that assumption breaks. + +Analyzing where these gaps are largest for a given robot tells you what to +harden the policy against before deployment. + +## Domain randomization + +One widely used way to bridge the gap is domain randomization: rather than +training against one fixed set of simulator parameters, you vary them across +episodes. Friction coefficients, masses, sensor noise, lighting, textures, and +control latency each get sampled from a range during training. A policy exposed +to that variety learns behavior that holds across many different worlds, and the +real robot becomes just one more sample from the distribution it already handles. +The trade-off is that a policy trained to be robust to wide variation can be more +conservative than one tuned to a single ideal model, so the range is chosen to +cover reality without being needlessly pessimistic. + +## From validated policy to running machine + +Once a policy performs reliably in simulation across randomized conditions, the +artifact you carry to hardware is the trained model itself. 
Deployment on Viam +follows the same pattern as any learned controller: you package the model in a +[custom module](/build-modules/) that reads observations through component APIs, +runs the model to compute an action, and commands actuators through their APIs. +Because that module talks to hardware through the standard component APIs, the +same policy runs on any machine that exposes the required components. + +The training environment and the deployment target stay cleanly separated: your +simulator and ML framework live in your own stack, and the Viam module is the +bridge that carries the validated result onto real hardware. For the details of +what a policy is and the real-time budget it must meet on the machine, see +[Learned and policy-based control](/ai-control/learned-and-policy-control/). + +## An alternative: model-predictive control + +Learning a policy in simulation is one way to control a hard-to-model system, but +it is not the only model-based approach. Model-predictive control (MPC) keeps an +explicit model of the system dynamics and, at each control step, uses that model +to predict how candidate action sequences would play out over a short future +horizon. It then executes the first action of the sequence that best achieves the +goal, and repeats the prediction at the next step with fresh observations. + +Where a learned policy front-loads its cost into training and then runs a cheap +forward pass at runtime, MPC does its planning online: it needs a reliable +dynamics model and enough compute to solve an optimization every control cycle, +but it requires no training data and adapts its plan as conditions change. The +two approaches also combine well, for instance using a learned model of the +dynamics inside an MPC loop. MPC is a substantial topic in its own right and may +grow into its own page. + +## Next steps + +- Learn how a trained policy runs on a machine in + [Learned and policy-based control](/ai-control/learned-and-policy-control/). 
+- Learn how to package and deploy code on a machine in + [Build modules](/build-modules/). diff --git a/docs/concepts/_index.md b/docs/concepts/_index.md new file mode 100644 index 0000000000..b5336764ee --- /dev/null +++ b/docs/concepts/_index.md @@ -0,0 +1,23 @@ +--- +linkTitle: "Concepts" +title: "Core concepts" +weight: 3 +layout: "docs" +type: "docs" +no_list: true +description: "Cross-cutting ideas the rest of the docs assume: the platform model, confidence scores, inference latency, and data sampling." +--- + +Some ideas show up across the whole product. A vision page, a data page, and +a fleet page all lean on them, but none of those pages is the right place to +define them. This section is that place. Read a page here once, and the rest +of the docs read more clearly. + +- [How Viam fits together](platform-model/): machines, parts, components, + services, and modules, the vocabulary every other section uses. +- [What a confidence score is (and isn't)](confidence-scores/): why a `0.9` + is not a 90% probability, and how to pick a threshold. +- [Inference latency and loop rate](inference-latency/): why a perception or + control loop cannot run faster than the model behind it. +- [Capture frequency versus sample rate](capture-frequency/): the difference + between how often you record and how fast a sensor measures. diff --git a/docs/concepts/capture-frequency.md b/docs/concepts/capture-frequency.md new file mode 100644 index 0000000000..313e3d67a4 --- /dev/null +++ b/docs/concepts/capture-frequency.md @@ -0,0 +1,83 @@ +--- +linkTitle: "Capture frequency versus sample rate" +title: "Capture frequency versus sample rate" +weight: 40 +layout: "docs" +type: "docs" +description: "Understand how the data-capture polling frequency differs from a sensor's internal sample rate, and how to choose a capture frequency that balances fidelity against storage cost." 
+--- + +Suppose you configure data capture on a temperature sensor with `capture_frequency_hz` set to `1`, and the sensor hardware samples its internal thermistor at 100 Hz. +These two numbers describe different things, and the gap between them is where most surprises come from. +The sensor produces a fresh internal reading 100 times per second, but `viam-server` reads the sensor's API once per second and stores only that one value. +The other 99 readings in each second never reach the data pipeline. + +Knowing which number governs your stored data helps you set a frequency that captures the events you care about without paying for data you do not need. + +## Two independent rates + +The **sample rate** is a property of the hardware. +It is how often the physical device measures the world and refreshes the value it exposes through its API. +A GPS module might update its position at 10 Hz, an accelerometer might sample at 1 kHz, and a slow environmental sensor might refresh once every few seconds. +You usually cannot change this rate from your machine configuration; it is fixed by the device or its driver. + +The **capture frequency** is a property of your data-capture configuration. +The data manager service polls the component's API at the rate you set in `capture_frequency_hz` and writes each returned reading to disk for syncing to the cloud. +Setting `capture_frequency_hz` to `0.5` records one reading every two seconds; setting it to `5` records five readings per second. +This is a polling loop on top of the API, entirely separate from whatever the hardware is doing internally. + +Because the two rates are independent, your capture frequency decides the resolution of your stored history, while the sample rate decides the freshest value available at the moment `viam-server` polls. + +## Undersampling and aliasing + +When the capture frequency is much lower than the rate at which the signal changes, you record a sparse set of snapshots. 
+Between snapshots, anything can happen and go unrecorded. +A tank-level sensor polled once per minute will miss a valve that opens and closes in ten seconds, even though the hardware measured the whole event at 100 Hz. +The event was visible to the sensor and invisible to your dataset. + +A subtler failure is aliasing. +When you sample a periodic signal slower than about twice its frequency, the recorded points trace out a false, lower-frequency pattern that was never really there. +A vibration that oscillates at 10 Hz, captured at 9 Hz, can appear in your data as a slow 1 Hz drift. +The stored numbers look plausible and lead you to the wrong conclusion. +To represent a signal that oscillates at some frequency, capture at more than twice that frequency; to catch a transient event, capture often enough that at least one reading lands inside the event's shortest duration. + +## The cost of a high frequency + +Raising the capture frequency is not free. +Every stored reading consumes disk on the machine, bandwidth during sync, and storage and query cost in the cloud, and these scale linearly with frequency. +A sensor captured at 50 Hz produces fifty times the rows of the same sensor at 1 Hz, across every machine in a fleet, every hour of every day. +High frequencies can also strain the hardware: polling a device faster than it can comfortably serve readings degrades performance, which is why capture rates should stay within what the component can handle. + +The choice is therefore a trade-off. +Too low, and you alias or miss events. +Too high, and you pay for redundant readings that all report the same slowly-changing value. + +## Choosing a frequency from the event timescale + +Let the shortest event you need to observe set the rate. +Start from the timescale of what you are monitoring, then capture two or more times faster than that: + +- A room-temperature reading that drifts over minutes is well served by `0.017` Hz (once a minute) or even slower. 
+- A door-open sensor for a room that people enter every few seconds needs a reading every second or two to reliably catch each entry. +- A motor-current signal used to detect a stall that resolves in under a second needs several readings per second so that at least one lands during the stall. + +Then weigh that against cost and event rate. +If events are rare but you still want to catch them, pair a modest steady capture frequency with [edge filtering](/data/filter-at-the-edge/) so that the machine stores readings only when something interesting happens, rather than polling fast around the clock. +This keeps fidelity high during events and storage low the rest of the time. + +## Two clocks on every reading + +Each captured reading carries two timestamps, which helps you reason about when a measurement actually happened versus when it landed in the cloud. +`time_requested` records when the machine's data manager polled the component, using the machine's own clock. +`time_received` records when the Viam cloud received and stored the reading. + +The gap between them reflects buffering and sync latency, which can be seconds during normal operation or hours for a machine that syncs only when it regains connectivity. +Because `time_received` is indexed, it is the timestamp to use for time-range queries; `time_requested` is the one that tells you when the event occurred on the machine. +Keeping the two clocks distinct means intermittent connectivity never corrupts your sense of when data was actually captured. 
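
The two timestamps can be compared directly once readings are exported. The sketch below is a minimal illustration in Python, assuming each reading arrives as a dictionary with ISO 8601 timestamp strings. Only the field names `time_requested` and `time_received` come from the text above; the record layout, the values, and the helper names are hypothetical.

```python
from datetime import datetime

# Hypothetical exported readings. Only the two timestamp field names are
# taken from the page; the layout and values are made up for illustration.
readings = [
    # Polled live and synced about a second later.
    {"value": 21.9, "time_requested": "2026-07-01T12:00:00+00:00",
     "time_received": "2026-07-01T12:00:01+00:00"},
    # Polled while offline, buffered, and synced three hours later.
    {"value": 21.4, "time_requested": "2026-07-01T09:00:00+00:00",
     "time_received": "2026-07-01T12:00:05+00:00"},
]


def parse(ts):
    """Parse an ISO 8601 timestamp that carries an explicit UTC offset."""
    return datetime.fromisoformat(ts)


def sync_lag_seconds(reading):
    """Seconds between the poll on the machine and arrival in the cloud."""
    delta = parse(reading["time_received"]) - parse(reading["time_requested"])
    return delta.total_seconds()


# Order by when each measurement was taken, not by when it arrived.
by_capture = sorted(readings, key=lambda r: parse(r["time_requested"]))
by_arrival = sorted(readings, key=lambda r: parse(r["time_received"]))

for r in by_capture:
    print(r["time_requested"], r["value"], f"lag={sync_lag_seconds(r):.0f}s")
```

Sorting on `time_requested` restores the order in which the measurements happened, even though the buffered reading reached the cloud last.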
+ +## Next steps + +To configure capture and choose a frequency for each method, see: + +- [Capture data](/data/) +- [Filter at the edge](/data/filter-at-the-edge/) diff --git a/docs/concepts/confidence-scores.md b/docs/concepts/confidence-scores.md new file mode 100644 index 0000000000..b64fe6d0d2 --- /dev/null +++ b/docs/concepts/confidence-scores.md @@ -0,0 +1,52 @@ +--- +linkTitle: "Confidence scores" +title: "What a confidence score is (and isn't)" +weight: 20 +layout: "docs" +type: "docs" +description: "What the confidence value on a detection or classification measures, why it is not a calibrated probability, and how to reason about accept and reject thresholds for a quality-control task." +--- + +Point a person detector at an image and it returns something like `person: 0.82`. That `0.82` is a confidence score: a number between `0.0` and `1.0` that rides along with every result from [`GetDetections`](/reference/apis/services/vision/#getdetections) and [`GetClassifications`](/reference/apis/services/vision/#getclassifications). It is one of the most useful signals the vision service gives you, and one of the easiest to misread. This page explains what the number measures, how far you can trust it, and how to turn it into an accept or reject decision. + +## What the score measures + +A confidence score is a value the model produces alongside a label to express how strongly the input matched what that label looks like. Within a single model, it works as an ordering signal: a detection at `0.90` matched the learned pattern for its label more strongly than one at `0.55`. Sort a batch of results by confidence and the ones most likely to be correct tend toward the top. That ranking is exactly what makes the score worth acting on. + +The score is a raw model output, not a measured frequency. The model is not counting how often it has been right at `0.82` in the past; it is emitting a number that its training happened to settle on for inputs like this one. 
The value is meaningful, but it describes the strength of a match, not a track record. + +## Why it is not a calibrated probability + +It is tempting to read `person: 0.82` as "82 out of 100 detections like this one are correct." That reading assumes the score is _calibrated_: that across every detection scoring `0.80`, about 80% really are the labeled object. A calibrated probability makes that promise. A raw confidence score does not. + +In practice, model scores skew. A model may report `0.95` on inputs that are correct only 70% of the time, or cluster everything between `0.40` and `0.60` even when it is usually right. The number still ranks results usefully within that one model, but its magnitude carries no guaranteed hit rate. Treat `0.82` as "high for this model," not as an 82% chance of being correct. + +Three comparisons that feel natural also break down: + +- **Across classes.** A `0.70` on `person` and a `0.70` on `forklift` from the same model can reflect very different real-world reliability. Each label has its own score distribution, so the two numbers are not interchangeable even inside one model. +- **Across models.** A `0.80` from one model says nothing about a `0.80` from another. They were trained differently and their scores land on different scales. +- **Across versions.** Retraining or re-exporting the same model can shift the whole score distribution. A threshold that fit last month's version can behave differently after an update. + +Heuristic detectors expose a confidence value too, and it follows the same rules. A [`color_detector`](/reference/services/vision/color_detector/) computes its confidence from a rule about how much of a region falls within a target color range. That is a useful, repeatable score, but it is a measure of color match, not a probability that an object is present. Whatever produces the number, the discipline is the same: use it to rank and threshold, not as odds.
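
One way to see whether a model's scores are calibrated is to group labeled results into score bins and compare each bin's score range with how often its results were actually correct. The following is a minimal sketch in plain Python, not part of any Viam API. The function name, the bin count, and the sample data are all assumptions made for illustration.

```python
def reliability_table(results, num_bins=5):
    """Group (score, is_correct) pairs into equal-width score bins.

    Returns one (bin_low, bin_high, count, observed_accuracy) tuple per
    non-empty bin, ordered from low scores to high scores.
    """
    bins = [[0, 0] for _ in range(num_bins)]  # [count, correct] per bin
    for score, is_correct in results:
        # A score of exactly 1.0 belongs in the top bin.
        index = min(int(score * num_bins), num_bins - 1)
        bins[index][0] += 1
        bins[index][1] += int(is_correct)

    table = []
    for i, (count, correct) in enumerate(bins):
        if count:
            table.append((i / num_bins, (i + 1) / num_bins, count, correct / count))
    return table


# Hypothetical labeled sample: (confidence score, was the result correct?).
sample = [
    (0.95, True), (0.92, True), (0.91, False), (0.97, True), (0.85, False),
    (0.55, True), (0.45, True), (0.52, True), (0.48, False),
]

for low, high, count, accuracy in reliability_table(sample):
    print(f"scores {low:.1f}-{high:.1f}: {count} results, {accuracy:.0%} correct")
```

In this made-up sample the highest-scoring bin is correct less often than its scores suggest, and the mid-range bin is correct more often. A real check needs far more labeled results per bin than this toy sample provides.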
+ +## Choosing a threshold for a quality-control task + +Because the score orders results well within one model, the practical move is to pick a cutoff and act on everything above it. On the [ML model vision service](/reference/services/vision/mlmodel/), that cutoff lives in configuration as [`default_minimum_confidence`](/reference/services/vision/mlmodel/), which filters out every result below the value you set, and [`label_confidences`](/reference/services/vision/mlmodel/), which sets a separate cutoff per label. Per-label cutoffs exist precisely because scores are not comparable across classes: each label earns its own threshold. + +Consider a station that inspects parts for a defect. A detector returns `defect: ` and you reject the part when that score is at or above a threshold `T`. Where you put `T` decides which of two mistakes you make more often: + +- A **false accept** ships a defective part. The threshold was high enough that a real defect scored below it and passed. +- A **false reject** scraps a good part. The threshold was low enough that a clean part scored above it and got pulled. + +The two errors trade against each other, and the score distribution is what you are sliding along: + +- **Lower `T`** and more parts are flagged as defective: fewer defects slip through (fewer false accepts), at the cost of scrapping more good parts (more false rejects). +- **Raise `T`** and the reverse holds: less good product is wasted, but more defects escape. + +There is no single correct `T`; there is the `T` that fits the cost of each error. If a shipped defect triggers a recall or a safety incident and a scrapped part costs a few cents of material, the defect is far more expensive, so favor a lower threshold and accept a higher false-reject rate. If scrap is costly and an escaped defect is caught cheaply downstream, a higher threshold makes sense. Ground the choice in those costs rather than in the number looking "high enough."
To set `T` well, run the model over a labeled sample, look at where correct and incorrect results actually fall on the score scale, and choose the cutoff that puts the errors where they hurt least. Then revisit it whenever you retrain or re-export, because the distribution can move underneath you. + +## Next steps + +- [Detect objects](/vision/object-detection/detect/), work with detections and their confidence scores. +- [Classify images](/vision/classify/), work with classifications and their confidence scores. +- [Tune a detector](/vision/object-detection/tune/), adjust `default_minimum_confidence` and `label_confidences` in practice. diff --git a/docs/concepts/inference-latency.md b/docs/concepts/inference-latency.md new file mode 100644 index 0000000000..680af91124 --- /dev/null +++ b/docs/concepts/inference-latency.md @@ -0,0 +1,116 @@ +--- +linkTitle: "Inference latency and loop rate" +title: "Inference latency and loop rate" +weight: 30 +layout: "docs" +type: "docs" +description: "Understand why model inference latency sets the ceiling on how fast a perception or control loop can run, and estimate an achievable loop rate from model size, image resolution, and hardware." +--- + +Consider a small program that watches a camera and reacts to what it sees: + +```python +while True: + detections = await detector.get_detections_from_camera("my-camera") + steer(detections) + await asyncio.sleep(0.05) # aim for 20 Hz +``` + +The `sleep(0.05)` looks like it sets the pace: run 20 times per second. +In practice the loop rate depends far more on the first line of the loop body. +The call to [`GetDetectionsFromCamera`](/vision/) captures a frame, runs it through a machine learning model, and returns the results. +The loop awaits that call, so it cannot move on until inference finishes. +If the model takes 200 ms to produce detections, one trip through the loop takes at least 200 ms, and the loop runs at 5 Hz at best, no matter how small a number you pass to `sleep`.
+ +This page explains why inference latency sets the ceiling on loop rate, and how to estimate that ceiling before you build a real-time task. + +## Why each iteration waits for inference + +A perception or control loop does its work one iteration at a time, in order. +Each iteration acquires an input, computes on it, and acts on the result. +When the compute step is a model inference call, the loop reaches that call and waits for a return value before it can act or start the next iteration. + +The wall-clock time of one iteration is the sum of its blocking steps. +The single largest term is usually inference: + +- Acquiring a frame from a camera: a few milliseconds to tens of milliseconds. +- Running inference on that frame: milliseconds to hundreds of milliseconds. +- Acting on the result (sending a command to a motor or base): typically a few milliseconds. + +Your `sleep` only adds idle time on top of that sum. +It can slow the loop down, but it cannot speed it up past the inference call. +Because the steps run one after another, the achievable loop rate is bounded by the inverse of that sum: + +```text +max_loop_rate ≈ 1 / (frame_time + inference_time + actuation_time) +``` + +In most vision loops the inference term dominates, so `max_loop_rate ≈ 1 / inference_time`. +A 200 ms model caps the loop near 5 Hz; a 20 ms model allows up to about 50 Hz, before you add any deliberate `sleep`. + +## What determines inference time + +Inference time is a property of three things together: the model, the input, and the hardware. + +**Model size and architecture.** +A larger model with more parameters and layers performs more arithmetic per frame. +A compact detector aimed at edge devices runs much faster than a large, high-accuracy backbone. +Quantized formats such as TFLite `int8` trade a small amount of accuracy for a substantial speedup. + +**Input resolution.** +Compute grows with the number of pixels, which grows with the square of the linear resolution.
+Halving both image dimensions cuts pixel count to a quarter and often cuts inference time by a similar factor. +Feeding a model a smaller frame is one of the cheapest ways to raise loop rate. + +**Hardware.** +Where inference runs matters more than any other single factor. +As rough orders of magnitude, for a typical object detector: + +| Where inference runs | Rough per-frame latency | Rough loop ceiling | +| ---------------------------------------------------------------- | --------------------------- | ------------------ | +| TFLite on a Raspberry Pi CPU | ~150-500 ms | ~2-6 Hz | +| Same model on a coprocessor or small GPU (for example, a Jetson) | ~10-40 ms | ~25-100 Hz | +| Larger model on a desktop or server GPU | single-digit to ~20 ms | ~50-200 Hz | + +Treat these as illustrative ranges, not specifications. +Actual numbers depend on the exact model, resolution, framework, and device, and the only reliable figure is one you measure on your own hardware. +The pattern holds across cases: moving the same model from a general-purpose CPU to an accelerator built for tensor math changes latency by roughly an order of magnitude. + +## Why remote inference adds latency + +The ML model service's `Infer` method is the lower-level call that a [vision service](/vision/) detection ultimately blocks on, and you can run that model locally on the machine or call one hosted elsewhere. +Running inference on a remote or cloud server can give you access to hardware far more powerful than an edge device. +That power comes with an added cost: every frame travels to the server and every result travels back. + +Remote inference latency is the sum of the network round trip and the server-side compute: + +```text +remote_inference_time ≈ upload_time + server_inference_time + download_time +``` + +Uploading a full-resolution frame over a constrained or high-latency link can add tens to hundreds of milliseconds and can vary from frame to frame. 
+For a monitoring task that reports a status every few seconds, that overhead is comfortably absorbed. +For a control loop steering a moving base, a variable extra 100 ms per iteration both lowers the loop rate and makes its timing less predictable. + +## Sizing a real-time task + +The tolerable loop rate follows from what the loop controls. + +**Real-time control** acts on the physical world, where staleness compounds. +A base moving at 1 m/s travels 20 cm during a 200 ms inference call, so at 5 Hz every decision is based on a frame already 20 cm out of date. +For steering, obstacle avoidance, or closed-loop reaction, you generally want inference well under the physical time constant of the system, and you want that latency to be steady rather than bursty. +Viam's feedback controllers expose their own cadence directly: a [sensor-controlled base](/components/base/sensor-controlled/) accepts a `control_frequency_hz` value (default 10 Hz), and its movement sensors must report at least that fast for the loop to hold rate. + +**Monitoring and logging** consume detections rather than steering on them, so a loop running at 1 Hz, or slower, is often plenty. +Here you can favor a larger, more accurate model or a remote GPU and accept the higher per-frame latency, because no actuator is waiting on the result. + +To size a task, work backward from the required rate. +Decide the loop rate the application needs, invert it to get the latency budget per iteration, subtract the frame-capture and actuation time, and choose a model, resolution, and hardware whose measured inference time fits what remains. +If nothing fits, you have three levers: shrink or quantize the model, lower the input resolution, or move inference to faster hardware. +Because these levers trade accuracy and cost against speed, measuring inference time on the target device is the step that turns an estimate into a design you can rely on. 
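
The backward sizing procedure above is simple arithmetic, sketched here in plain Python. The function names, the capture and actuation times, and the candidate inference times are hypothetical placeholders. In practice each number should be measured on the target device.

```python
def inference_budget_ms(target_hz, frame_ms, actuation_ms):
    """Inference time left per iteration after frame capture and actuation."""
    return 1000.0 / target_hz - frame_ms - actuation_ms


def distance_per_iteration_m(speed_m_s, latency_ms):
    """How far a base travels while one iteration's latency elapses."""
    return speed_m_s * latency_ms / 1000.0


# Hypothetical measured inference times on the target device, in ms.
candidates = {"large-fp32": 210.0, "small-fp32": 95.0, "small-int8": 35.0}

# A 10 Hz loop with 15 ms of frame capture and 5 ms of actuation.
budget = inference_budget_ms(target_hz=10, frame_ms=15, actuation_ms=5)
fits = [name for name, ms in candidates.items() if ms <= budget]

print(f"inference budget: {budget:.0f} ms, candidates that fit: {fits}")
print(f"travel during a 200 ms call at 1 m/s: "
      f"{distance_per_iteration_m(1.0, 200):.2f} m")
```

If `fits` comes back empty, the three levers from the text apply: shrink or quantize the model, lower the input resolution, or move inference to faster hardware.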
+ +## Next steps + +- Learn how detection and classification calls work in the [vision service](/vision/). +- See how a feedback loop consumes sensor input at a fixed rate on a [sensor-controlled base](/components/base/sensor-controlled/). +- Explore the [components](/components/) that a perception or control loop reads from and acts on. diff --git a/docs/concepts/platform-model.md b/docs/concepts/platform-model.md new file mode 100644 index 0000000000..ba63a5c7b1 --- /dev/null +++ b/docs/concepts/platform-model.md @@ -0,0 +1,97 @@ +--- +linkTitle: "How Viam fits together" +title: "How Viam fits together" +weight: 10 +layout: "docs" +type: "docs" +description: "A tour of Viam's core vocabulary, machine, part, component, service, module, and modular resource, for developers new to robotics." +--- + +Picture a small delivery robot: two motors, a camera up front, a GPS unit, and a single-board computer that ties them together. +In Viam terms, that whole robot is a **machine**. +The software brain running on its computer is a program called `viam-server`, built on Viam's Robot Development Kit (RDK). +`viam-server` is what your code talks to, and it is the piece that turns a pile of hardware and algorithms into something you can drive from a uniform API. + +Everything else in the Viam vocabulary describes how that machine is organized and how you extend it. +Once these words click, the rest of the documentation reads much more smoothly. + +## Machines and parts + +A **machine** is the logical unit you configure and control: our delivery robot, a conveyor cell, or a camera-only sensor station. + +A **part** is one running instance of `viam-server` that belongs to a machine. +Most machines have exactly one part, so in practice "machine" and "part" often point at the same physical computer. +The distinction matters when a single machine spans more than one computer, for example, a robot arm on one board and a vision workstation on another, coordinated as one machine. 
+Each computer runs its own part, and the parts connect so that your code sees one machine with one address. +When you read about a machine "having a main part" or "sub-parts," this is the idea at work. + +## Components and services + +Inside a part, capabilities are grouped into two families. + +A **component** represents a piece of hardware you drive: a motor, a camera, a GPS movement sensor, an arm. +Each kind of component has a standard API, so every camera responds to the same image-capture method regardless of the vendor, and every motor responds to the same set-power and stop methods. +This uniformity is the point, you write against the camera API, and swapping a USB webcam for an industrial camera does not change your code. + +A **service** provides higher-level software capability that usually builds on top of components. +The vision service runs object detection on frames from a camera; the motion service plans a path and commands an arm to follow it; the navigation service drives a base toward a destination. +Services also present standard APIs, so a detector-based vision service and a segmentation-based one answer the same detection methods. + +The clean way to keep them straight: a component is a thing the machine controls, and a service is a capability the machine performs. +Both are **resources**, the general word for any configured, API-addressable element of a machine, and both are reached through `viam-server` in exactly the same style. + +## Models and APIs + +An **API** defines _what_ methods a resource answers, the camera API, the motor API, the vision service API. +A **model** is a specific _implementation_ of one of those APIs. + +The camera API is a single contract, but a Logitech webcam, a RealSense depth camera, and a simulated fake camera are three different models that all satisfy it. +When you configure a component or service, you choose a model, and behind that model sits real code that fulfills the API. 
+Models are named with a triplet like `namespace:family:name` (for example `viam:camera:webcam`), which keeps a community-contributed model distinct from Viam's own. + +## Modules and modular resources + +Viam ships with many built-in models, but the platform is designed to be extended, and this is where modules come in. + +A **module** is a packaged program that adds one or more new models to `viam-server`. +The individual capability a module contributes, the configured camera or the custom vision routine you get out of it, is a **modular resource**. +A modular resource implements a standard component or service API, so once it is running it looks and behaves like any built-in resource: the same methods, the same tooling, the same client code. + +Modules are shared through the **registry**, Viam's catalog of models. +When you add a model from the registry to a machine, `viam-server` downloads the module, launches it, and manages its lifecycle. +Your configuration names a model; the module supplies the implementation; the API guarantees that the rest of your system does not need to know which module is behind it. +This is what lets a hobbyist's sensor driver and a vendor's official one slot into the same machine interchangeably. + +## Two ways you write code + +As a developer, you meet Viam from one of two directions, and telling them apart clears up a lot of early confusion. + +When you write a **client script**, _you_ are the caller. +Your program connects to a machine, gets a handle to a resource, and calls its API methods, read this sensor, move that arm, run detections on this camera. +Control flows outward from your code into the machine, and this is how most applications, dashboards, and automations are built. +See [Control a machine](/operate/control/) for this path. + +When you **author a module**, the relationship inverts: _the platform_ is the caller. 
+You implement the methods of a component or service API, and `viam-server` invokes your code whenever a client asks that resource to do something. +Instead of calling `get-image`, you are the one who answers `get-image` when a request arrives. +Your module registers its model, and from then on it serves requests rather than sending them. +See [Build modules](/build-modules/) for this path. + +The same API sits between both roles, which is the elegant part: a client script written against the camera API works identically whether the camera is a built-in model or a modular resource you wrote yourself. +Learn one API contract and you understand both sides of it. + +## Putting it together + +Back to the delivery robot. +The machine is the robot; its one part is `viam-server` on the onboard computer. +The motors, camera, and GPS are components; a vision service and a navigation service supply the higher-level behavior. +If the stock GPS driver does not fit the specific hardware, someone publishes a module to the registry whose modular resource implements the movement sensor API, and the machine uses it exactly like a built-in one. +An operator's phone app is a client script that calls these APIs to send the robot on its way. + +With this vocabulary in place, the rest of the documentation, configuration, module development, fleet management, describes variations on these same relationships. + +## Next steps + +- [Configure and control a machine](/operate/), put components and services on a real part and drive them. +- [Build a module](/build-modules/overview/), author your own modular resource against a standard API. +- [Machine architecture reference](/operate/reference/architecture/), a closer look at how parts, resources, and `viam-server` connect. 
diff --git a/docs/manipulation/_index.md b/docs/manipulation/_index.md new file mode 100644 index 0000000000..795d0bfa8e --- /dev/null +++ b/docs/manipulation/_index.md @@ -0,0 +1,22 @@ +--- +linkTitle: "Manipulation" +title: "Advanced manipulation" +weight: 35 +layout: "docs" +type: "docs" +no_list: true +description: "Manipulation techniques beyond a single planned move: force and compliance control, and tracking and picking moving objects." +--- + +Planning a collision-free move to a pose covers many pick-and-place tasks. +Some tasks need more: a contact task that must control _force_ rather than +just position, or a pick from a _moving_ conveyor where the target won't hold +still. This section covers those techniques. For the core arm, gripper, and +motion-planning setup they build on, start with +[Motion planning](/motion-planning/). + +- [Force and compliance control](force-and-compliance-control/): tasks that + regulate contact force, insertion, tending, and accounting for a grasped + part in the planning world. +- [Track and pick moving objects](track-and-pick-moving-objects/): follow a + detection across frames and intercept it on a conveyor. diff --git a/docs/manipulation/force-and-compliance-control.md b/docs/manipulation/force-and-compliance-control.md new file mode 100644 index 0000000000..e2f7ddaa0b --- /dev/null +++ b/docs/manipulation/force-and-compliance-control.md @@ -0,0 +1,130 @@ +--- +linkTitle: "Force and compliance control" +title: "Force and compliance control" +weight: 10 +layout: "docs" +type: "docs" +description: "Why contact tasks like insertion need force feedback and compliance, and how a grasped part becomes part of the arm's collision geometry during planning." +--- + +Picture an arm pressing a round peg into a hole that is only a fraction of a +millimeter wider than the peg. You command the arm to the exact pose where the +peg should end up and move it straight down. The peg touches the rim slightly +off-center, catches, and stops. 
Position control keeps driving toward the +commanded pose, so the arm pushes harder against a wall it cannot pass through. +The peg jams, and the force at the contact point climbs until something flexes +or the motors stall. + +This is the core problem with contact tasks. When an arm moves through open air, +knowing where to go is enough. When it presses two parts together, where to go +is only half the story. The other half is how hard to push and when to stop or +adjust. Insertion tasks such as seating a connector, tending tasks such as +loading a part into a fixture, and any operation where the tool touches the +world all share this property. + +## Why position-only control struggles with contact + +A position controller has one goal: drive the joints until the tool reaches a +commanded pose. It succeeds by minimizing the gap between where the tool is and +where you told it to be. That works well in free space, where the only thing +between the current pose and the target is air. + +Contact changes the situation. Real parts have tiny misalignments, the hole is +never exactly where the model says, and surfaces have friction. A stiff +position controller treats the resulting contact force as an error to overcome, +so it commands more torque to close a gap that physical contact makes impossible +to close. The result is high contact forces, jamming, and marred parts. The +information the controller needs, how much force the contact is producing, never +enters the loop. + +## Force and torque feedback + +Force and torque feedback adds that missing information. A force/torque (F/T) +sensor, usually mounted at the wrist between the arm and the gripper, measures +the forces and torques the tool experiences: how hard it is pushing along each +axis and how much it is being twisted around each axis. In plain terms, it lets +the arm feel contact rather than only tracking position. + +With that signal available, the control goal can shift. 
Instead of only asking +"is the tool at the commanded pose," the system can also ask "is the contact +force within the range I want." For a peg insertion, a useful strategy is to +press downward with a gentle, bounded force while allowing small sideways motion +so the peg can slide until it aligns with the hole, then seat it. The force +reading tells the system when the peg has bottomed out and the task is complete. + +## Compliance: yielding to contact + +Compliance is the willingness of the arm to yield when it meets resistance, +rather than holding a commanded pose rigidly. A compliant arm behaves a little +like a spring: push on it, and it gives a controlled amount instead of fighting +back with full torque. + +Compliance is what turns force feedback into useful behavior. If the peg +contacts the rim off-center, a compliant response lets the arm move sideways in +the direction the contact pushes it, so the peg settles into the opening instead +of jamming against the edge. You choose how compliant each direction is: an +insertion often stays stiff along the insertion axis, so the arm still drives +the peg home, while staying soft in the sideways directions, so the part can +self-align. This selective softness is what makes reliable insertion and tending +possible. + +At a high level, the control idea is a loop that blends two aims: reach the +target region using position, and regulate contact using force. The F/T sensor +reports the current force, the controller compares it to the force you want, and +it adjusts the commanded motion so the actual force stays in range. The details +vary by strategy, but the shape is always feedback on force rather than position +alone. + +## Assembling force control on Viam + +Force control is an active area rather than a single turnkey primitive. The +building blocks are a force-capable arm or an arm paired with a wrist F/T +sensor, a fast feedback loop, and a control strategy tuned to the task. 
In +practice, teams implement this pattern as a custom +[module](/operate/get-started/other-hardware/) that reads the F/T sensor, +runs the force loop, and commands the arm. Treat any specific force-control API +as something you provide in your module rather than a built-in signature, and +size the approach to the hardware you actually have: a sensitive insertion needs +a genuinely force-capable arm and a responsive sensor, not just position +commands issued quickly. + +## The payload point: a grasped part joins the arm's geometry + +Contact tasks usually start by picking something up, and that changes what the +planner has to reason about. Before the grasp, the motion planner models the arm +and gripper and routes them around obstacles. The moment the gripper closes on a +part, that part rigidly extends the gripper. A connector held in the jaws sweeps +through space exactly as the gripper does, so from the planner's point of view it +is now part of the moving hardware. + +If you do not tell the planner about the held part, motion planning still avoids +collisions for the arm and gripper while treating the carried part as empty +space. The part can then clip a fixture wall or the edge of the workspace on the +way to the insertion point, even though the plan looks collision-free. + +The fix is to describe the grasped part as a geometry and attach it to the +gripper's frame in the planning world. When you call the motion service, its +[`WorldState`](/motion-planning/) carries both obstacles and transforms. Adding +a transform whose parent is the gripper frame, with a geometry sized to the held +part, places that shape in the planner's model of the scene. Because it is +parented to the gripper frame, it moves with the gripper as the arm moves, +exactly like the real part. Motion planning then routes the arm, the gripper, +and the carried part around obstacles together. 
When the gripper releases the +part, you drop that transform from `WorldState` so the planner stops carrying a +shape that is no longer attached. + +This is where the [frame system](/motion-planning/frame-system/) does the work. +Frames define how each part of the machine is positioned relative to its parent, +and a geometry parented to the gripper frame inherits the gripper's motion for +free. Getting the gripper frame right, and sizing the attached geometry to +enclose the real part with a small margin, is what lets the arm carry a part +through a cluttered cell without collisions. + +## Next steps + +- [Motion planning](/motion-planning/): how the motion service plans + collision-free paths and what `WorldState` contains. +- [Frame system](/motion-planning/frame-system/): how frames position each part + of the machine and how attached geometry moves with a frame. +- [Configure workspace obstacles](/motion-planning/obstacles/configure-workspace-obstacles/): + how to give the planner the static obstacles your contact task moves among. diff --git a/docs/manipulation/track-and-pick-moving-objects.md b/docs/manipulation/track-and-pick-moving-objects.md new file mode 100644 index 0000000000..b445ff6cce --- /dev/null +++ b/docs/manipulation/track-and-pick-moving-objects.md @@ -0,0 +1,162 @@ +--- +linkTitle: "Track and pick moving objects" +title: "Track and pick moving objects" +weight: 20 +layout: "docs" +type: "docs" +description: "Build a control loop that tracks a part moving on a conveyor, predicts where it will be, and picks it with a robot arm." +--- + +A part rides down a conveyor at a steady speed. Your arm has to reach the belt, +close the gripper on the part, and lift it away before the part passes out of +reach. Because the part keeps moving while the arm plans and travels, aiming the +arm at where the camera last saw the part places the gripper behind the target. 
+This guide shows you how to build a pick loop that measures the part's motion, +predicts where it will be at the moment of the grasp, and commands the arm to +that predicted pose. + +## Prerequisites + +- A configured [camera](/reference/components/camera/) viewing the belt +- A configured vision service detector that recognizes the part. See + [Detect objects](/vision/object-detection/). +- A configured [arm](/reference/components/arm/) and + [gripper](/reference/components/gripper/) that reach the belt +- The [motion service](/reference/apis/services/motion/) and a + [frame system](/motion-planning/frame-system/) that relates the + camera, arm, and belt in one coordinate space + +## Steps + +### 1. Detect the part + +Run the detector on the live camera feed. Each call returns the current +bounding boxes for parts in view. + +```python +from viam.services.vision import VisionClient + +detector = VisionClient.from_robot(machine, "belt-detector") +detections = await detector.get_detections_from_camera("belt-camera") +``` + +For the full parameter list and language-specific signatures, see +[`GetDetectionsFromCamera`](/reference/apis/services/vision/#getdetectionsfromcamera). +Detection alone reports what is in a single frame; it does not connect a box in +this frame to the same part in the next frame. + +### 2. Track the part across frames + +To follow one part through the stream, give each detection a persistent ID. The +[`viam:object-tracker` module](/vision/object-detection/) wraps your detector +and camera, matches detections between consecutive frames, and assigns each part +a stable track ID such as `part_0_20260701_143052`. Configure it as described in +[Track objects across frames](/vision/object-detection/), then read its +detections the same way you read any detector. + +With a stable ID you can measure motion. 
Record the part's position and the +capture time on two frames, then estimate belt velocity from the difference: + +```python +# observe() is a helper you supply, not a Viam API. It returns the part's +# world-frame position in mm and the capture time of that frame in seconds. +# Take two observations of the same track ID on different frames. +p0, t0 = observe(track_id) # ((x, y, z), timestamp_seconds) +p1, t1 = observe(track_id) + +dt = t1 - t0 # must be greater than zero +velocity = tuple((b - a) / dt for a, b in zip(p0, p1)) # mm per second +``` + +Average several frame pairs to smooth out per-frame detection noise. On a +conveyor the motion is dominated by one axis, so the velocity estimate reduces +to belt speed along that axis. + +### 3. Predict the intercept pose + +Estimate how long the pick will take from the moment you commit: the time to +plan the arm move plus the time for the arm to travel and the gripper to close. +Call this `t_pick`. Extrapolate the part's position forward by that interval to +get the intercept point: + +```python +t_pick = 0.9 # seconds: planning + arm travel + grasp, measured on your cell + +intercept = tuple(p + v * t_pick for p, v in zip(p1, velocity)) +``` + +Keep `t_pick` realistic. If the true pick takes longer than your estimate, the +part overshoots the intercept point and the gripper closes behind it. + +### 4. Plan the arm move to the predicted pose + +Hand the intercept point to the [motion service](/reference/apis/services/motion/), +which plans a collision-free path and moves the arm. Orient the gripper for a +top-down grasp on the belt. 
+ +```python +from viam.services.motion import MotionClient +from viam.proto.common import PoseInFrame, Pose + +motion_service = MotionClient.from_robot(machine, "builtin") + +x, y, z = intercept +destination = PoseInFrame( + reference_frame="world", + pose=Pose(x=x, y=y, z=z, o_x=0, o_y=0, o_z=-1, theta=0), +) +await motion_service.move(component_name="belt-arm", destination=destination) +``` + +For the move signature and options such as passing a `WorldState` of obstacles, +see [Move an arm to a pose](/motion-planning/move-an-arm/move-to-pose/). + +### 5. Time the grasp + +The arm arrives ahead of the part and the part travels into the open gripper. +Close the gripper when the part reaches the intercept point, then lift clear of +the belt: + +```python +from viam.components.gripper import Gripper + +gripper = Gripper.from_robot(machine, "belt-gripper") +await gripper.grab() +# Retract the arm to a safe pose above the belt with another motion_service.move +``` + +Wrap steps 1 through 5 in a loop so the cell processes one part per cycle. Track +whether each grasp succeeds and log the part IDs you pick so a missed part can be +retried on the next pass. + +## Diagnose the latency budget + +The maximum belt speed your cell can handle follows directly from `t_pick`. The +total pick latency is the sum of three stages: + +- **Inference:** capture a frame and run the detector and tracker on it. +- **Planning:** the motion service solves for a path to the intercept pose. +- **Arm move and grasp:** the arm travels and the gripper closes. + +During that whole interval the part keeps moving. If the part travels farther +than your prediction covers before the gripper closes, the grasp misses, so the +belt speed and the pick latency are linked: the faster the belt, the less time +you have, and the farther a prediction error carries the part off target. 
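
The link between belt speed and pick latency can be put into numbers. The sketch below is a rough model, not part of any Viam API. It assumes the part must still be inside a usable stretch of belt when the gripper closes; `reach_window_mm`, `PickLatency`, and both functions are hypothetical names, and the stage timings are placeholders chosen to sum to the 0.9 s `t_pick` used earlier.

```python
from dataclasses import dataclass


@dataclass
class PickLatency:
    """Measured time for each stage of one pick, in seconds."""

    inference_s: float  # capture a frame, run the detector and tracker
    planning_s: float  # motion service solves for a path
    move_and_grasp_s: float  # arm travel plus gripper close

    @property
    def total_s(self) -> float:
        return self.inference_s + self.planning_s + self.move_and_grasp_s


def max_belt_speed(reach_window_mm: float, latency: PickLatency) -> float:
    """Upper bound on belt speed (mm/s): the part may travel at most
    reach_window_mm during the whole pick latency (an assumed model)."""
    if latency.total_s <= 0:
        raise ValueError("total latency must be positive")
    return reach_window_mm / latency.total_s


def miss_distance(belt_speed_mm_s: float, timing_error_s: float) -> float:
    """Distance (mm) between the part and the intercept point when the real
    pick time differs from the assumed t_pick by timing_error_s."""
    return belt_speed_mm_s * abs(timing_error_s)


# Placeholder stage timings; replace them with values measured on your cell.
latency = PickLatency(inference_s=0.15, planning_s=0.25, move_and_grasp_s=0.5)
speed_limit = max_belt_speed(reach_window_mm=450.0, latency=latency)
print(f"t_pick = {latency.total_s:.2f} s, belt speed limit = {speed_limit:.0f} mm/s")
print(f"miss at 200 mm/s with a 0.1 s error = {miss_distance(200.0, 0.1):.0f} mm")
```

With these placeholder numbers the budget is 0.9 s, so a 450 mm window bounds the belt at 500 mm/s, and a 0.1 s timing error at 200 mm/s leaves the part 20 mm off target. Because the bound scales with 1 / `t_pick`, trimming the stage that dominates the budget raises it the most.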
+ +To raise the belt speed, shrink the latency budget: use a faster detector, +reduce planning time by constraining the workspace, or shorten arm travel by +starting each cycle from a pose near the belt. Measure each stage separately so +you tune the one that dominates. For how inference time enters this budget and +how to measure it, see [Inference latency](/concepts/inference-latency/). + +If picks miss intermittently, compare your assumed `t_pick` against the measured +end-to-end time under load. A budget that holds at rest often grows once the +detector, planner, and arm run concurrently, which pushes the real intercept +point past where you aimed. + +## Next steps + +- [Detect objects](/vision/object-detection/) +- [Track objects across frames](/vision/object-detection/) +- [Move an arm to a pose](/motion-planning/move-an-arm/move-to-pose/) +- [Motion service API](/reference/apis/services/motion/) diff --git a/docs/navigation/_index.md b/docs/navigation/_index.md new file mode 100644 index 0000000000..74cb87b160 --- /dev/null +++ b/docs/navigation/_index.md @@ -0,0 +1,24 @@ +--- +linkTitle: "Navigation" +title: "Navigation and localization" +weight: 40 +layout: "docs" +type: "docs" +no_list: true +description: "How a mobile robot knows where it is and drives itself to a goal: localization, SLAM, sensor fusion, base navigation, and multi-robot coordination." +--- + +Before a mobile robot can go anywhere on purpose, it has to answer one +question: _where am I?_ Everything else, planning a path, driving to a goal, +coordinating with other robots, builds on that answer. This section covers +how a machine estimates its own position and uses that estimate to move. + +- [How a robot knows where it is](localization/): odometry, GPS, and SLAM as + localization sources, and their drift and cost trade-offs. +- [SLAM and mapping](slam-and-mapping/): building a map and locating within it. 
+- [Combine sensors with sensor fusion](sensor-fusion/): why one sensor is + rarely enough, and what "fusion" does and doesn't mean today. +- [Navigate a mobile base to a goal](navigate-a-mobile-base/): drive to a map + or GPS waypoint with the motion service. +- [Coordinate a multi-robot fleet](coordinate-a-fleet/): share tasks and avoid + deadlock across machines. diff --git a/docs/navigation/coordinate-a-fleet.md b/docs/navigation/coordinate-a-fleet.md new file mode 100644 index 0000000000..7476831cc2 --- /dev/null +++ b/docs/navigation/coordinate-a-fleet.md @@ -0,0 +1,111 @@ +--- +linkTitle: "Coordinate a multi-robot fleet" +title: "Coordinate a multi-robot fleet" +weight: 50 +layout: "docs" +type: "docs" +description: "How centralized and decentralized coordination avoid deadlock among many robots, and which Viam primitives support task hand-off across machines." +--- + +Picture a warehouse with a fleet of autonomous mobile robots (AMRs) moving totes +between storage and packing. Two robots approach the same narrow aisle from opposite +ends. If both enter, neither can pass and neither can back out cleanly: the aisle is +deadlocked, and the throughput of the whole floor drops while they wait. Scale that to +fifty robots sharing intersections, charging docks, and pick faces, and coordination +becomes the hardest part of the application. A single robot that navigates well is not +enough; the fleet has to agree on who does what and who goes where. + +This page explains the coordination problem, contrasts centralized and decentralized +approaches to solving it, and maps each approach onto the Viam primitives you can build +with. It assumes you already know how to command one machine through its component and +service APIs. + +## The coordination problem + +Coordinating a fleet breaks down into three intertwined questions: + +- **Task allocation:** which robot handles which job. 
Assigning the nearest free robot + to each new pick request keeps travel time low, but a naive assignment can send three + robots to the same zone while another region sits idle. +- **Traffic and deadlock:** how robots share physical space. Aisles, doorways, and + charging docks are finite resources. When two robots each hold part of what the other + needs, such as opposite ends of a one-lane aisle, they can wait on each other + indefinitely. Avoiding this requires reserving space in advance or detecting and + resolving the standoff. +- **Shared state:** how robots agree on a common picture. A map of which zones are + occupied, which jobs are claimed, and which docks are free has to stay consistent + across machines that each see only their own surroundings. + +The design choice that shapes all three is where the decisions are made. + +## Centralized coordination + +In a centralized design, a coordinator service holds the authoritative view of the +fleet and hands out instructions. Robots report their status and requests to the +coordinator; the coordinator allocates tasks, reserves zones, and grants passage. Before +a robot enters the contested aisle, it asks the coordinator for a reservation. The +coordinator grants the aisle to one robot at a time and queues the other, so the +deadlock never forms. + +The strength of this approach is global reasoning. Because one component sees every +robot and every reservation, it can allocate tasks optimally, prevent deadlock by +construction, and give operators a single place to observe and override behavior. The +trade-offs are that the coordinator is a single point of failure, it can become a +throughput bottleneck as the fleet grows, and every robot depends on reliable +connectivity to it. Careful designs mitigate these with redundancy, regional +coordinators, and fallback behavior for when a robot loses contact. + +## Decentralized coordination + +In a decentralized design, robots negotiate locally. 
Each robot carries its own share of +the decision logic and resolves conflicts with the neighbors it can currently sense or +reach. At the aisle, the two robots exchange messages and settle who proceeds first +using an agreed rule, such as the robot with the higher-priority job or the one already +partway in. + +The strength here is resilience and scale. There is no central bottleneck, robots keep +working when connectivity to the cloud drops, and adding robots does not overload one +component. The trade-off is that local decisions can be globally suboptimal, and +guaranteeing freedom from deadlock is harder to prove when no single component sees the +whole picture. Robust decentralized systems lean on well-chosen priority rules and +protocols that provably break symmetric standoffs. + +Most production fleets blend the two: a coordinator sets high-level goals and zone +policy while robots handle immediate, latency-sensitive conflicts on their own. + +## Which Viam primitives support fleet coordination + +Viam gives you the building blocks for either approach rather than a turnkey traffic +manager. You compose the coordination layer yourself from these primitives: + +- **Each robot is an independent machine.** You drive its motion, sensing, and + navigation through the standard component and service + [APIs](/reference/apis/). Any coordinator, or any peer robot, commands a machine + through the same interfaces you already use for one robot. +- **A coordinator can run as an application or service.** Build it with the + [fleet management API](/reference/apis/) to enumerate machines, read their status, + and act on the fleet as a whole. The coordinator can live in the cloud or on a machine + on the floor. +- **Machine-to-machine communication** lets robots and coordinators talk directly. 
+ [Machine-to-machine comms](/reference/machine-to-machine-comms/) support both the + centralized pattern, where robots call a coordinator, and the decentralized pattern, + where peers negotiate. +- **Shared data in the cloud** provides common state. Robots write occupancy, claimed + jobs, and dock status to Viam's cloud data, and other machines and services read it to + form a shared picture of the fleet. +- **Scheduled jobs** run coordination logic on a cadence. Use jobs to rebalance task + allocation, expire stale zone reservations, or sweep for stuck robots without keeping + a process running continuously. + +Fleet management, the APIs, machine-to-machine comms, shared data, and jobs are the +pieces; the allocation policy, reservation scheme, and deadlock-resolution rules are +the part you design for your application. Viam supplies robust, tested primitives so +that your effort goes into the coordination logic rather than the plumbing. + +## Next steps + +- Learn how [fleet management](/fleet/) organizes and operates many machines. +- Review the [component and service APIs](/reference/apis/) you use to command each + machine. +- See [machine-to-machine communication](/reference/machine-to-machine-comms/) for + direct links between machines and coordinators. diff --git a/docs/navigation/localization.md b/docs/navigation/localization.md new file mode 100644 index 0000000000..64ac0aeb5a --- /dev/null +++ b/docs/navigation/localization.md @@ -0,0 +1,89 @@ +--- +linkTitle: "How a robot knows where it is" +title: "How a robot knows where it is" +weight: 10 +layout: "docs" +type: "docs" +description: "Understand localization: how a robot estimates its own pose from odometry, GPS, and SLAM, and how to choose the right sensors for a deployment." +--- + +A cleaning robot finishes a run and needs to return to its charging dock. +To drive back, it has to answer one question: where am I right now? 
+The dock sits at a fixed spot, but the robot has been turning and rolling around a room for an hour. +Answering that question is called **localization**: estimating the robot's **pose** (its position and orientation) within a chosen reference **frame**, such as the corner of the room or a point on the globe. + +In Viam, a pose lives in the frame system, which tracks where each part of a machine sits relative to a common origin. +Localization is the process of keeping the robot's own pose in that frame accurate as it moves. +No single sensor answers "where am I" perfectly, so it helps to understand the three common sources and what each one is good at. + +## Odometry: counting your own motion + +Wheel **odometry** estimates pose by adding up the robot's own movement. +Encoders on the wheels report how far each wheel turned; from those counts and the robot's geometry, the software integrates a running estimate of how far and which way the robot traveled. +This technique is called **dead reckoning**: you start from a known pose and accumulate motion to guess your current one. + +Odometry is cheap, works anywhere, and updates quickly. +Its weakness is **drift**. +Every wheel slip, uneven tile, or rounding error adds a small mistake to the estimate, and because dead reckoning has no outside reference to check against, those small mistakes accumulate. +After a long run the estimated pose can be meters away from the true pose, even though each individual reading looked reasonable. +Drift is why a robot that relies only on odometry gradually loses track of the dock. + +Viam exposes wheel odometry through the [movement sensor](/components/movement-sensor/) component, using the `wheeled-odometry` model, which derives velocity and position from motor encoders. + +## GPS: an absolute outdoor fix + +**GPS** takes the opposite approach. +Instead of accumulating motion, a GPS receiver reports an **absolute** position from satellite signals, expressed as latitude and longitude. 
+Because each reading is independent, GPS does not drift: an error in one reading does not corrupt the next. + +The trade-offs are environment and precision. +GPS needs a clear view of the sky, so it works outdoors but degrades or fails indoors, in tunnels, and under dense cover. +Standard GPS is accurate to a few meters, which is fine for a lawn robot crossing a yard but too coarse to dock precisely. +In Viam, GPS receivers and inertial measurement units (IMUs) are also [movement sensor](/components/movement-sensor/) models, so the same component API surfaces both absolute position and orientation. + +## SLAM: building a map while you use it + +**SLAM** (Simultaneous Localization and Mapping) works where GPS cannot reach. +The robot builds a map of its surroundings from a range sensor such as a LIDAR or depth camera, and at the same time uses that map to figure out where it sits within it. +Matching current sensor readings against the map gives an absolute pose relative to the mapped space, so SLAM corrects drift the way GPS does, but indoors. + +The cost is hardware and computation: SLAM needs a capable range sensor and more processing than reading an encoder, and its accuracy depends on the environment having enough structure to recognize. +Viam provides SLAM through the [SLAM service](/navigation/slam-and-mapping/). + +## Comparing the three sources + +| Source | Reference | Drift over time | Environment | Relative cost | +| -------- | ---------------------------- | --------------- | ------------------ | ------------------------------- | +| Odometry | Relative (integrated motion) | Accumulates | Anywhere | Low (encoders) | +| GPS | Absolute (satellite) | None | Outdoor, open sky | Low to medium (receiver) | +| SLAM | Absolute (built map) | None | Indoor, structured | Higher (range sensor + compute) | + +The key split is **relative** versus **absolute**. +Odometry is relative: it tells you how you moved but never resets, so it drifts. 
+GPS and SLAM are absolute: each fix is anchored to an outside reference, so they stay bounded but depend on their environment and cost more to run. + +## Why fuse relative and absolute sources + +The two kinds of source complement each other, which is why many deployments combine them. +Odometry updates fast and smoothly but drifts; an absolute source updates the true position but can be slow, noisy, or briefly unavailable (a GPS signal drops under a bridge, or a SLAM scan finds a bare hallway). +Sensor fusion blends them: odometry fills in the fast, in-between motion, while the absolute source periodically corrects the accumulated drift. +The result is an estimate that is both smooth and bounded, better than either source alone. +For how Viam combines readings from several movement sensors into one pose estimate, see [sensor fusion](/navigation/sensor-fusion/). + +## Choosing sources for a deployment + +Which sources a machine needs follows from where it runs and how precise it must be. + +- **Outdoor, meter-scale** (a lawn mower, a field rover): a GPS movement sensor gives absolute position; add wheel odometry or an IMU so the estimate stays smooth between GPS updates and survives brief signal loss. +- **Indoor, no GPS** (a warehouse or home robot): use SLAM with a LIDAR or depth camera for absolute indoor localization, fused with odometry for fast updates. +- **Short, controlled runs** (a robot that never strays far from a known start): odometry alone can be enough, since drift stays small over a short distance and time. +- **High precision anywhere** (docking, tight aisles): pair an absolute source with odometry, because odometry alone will not stay accurate long enough to line up. + +Start from the environment to rule sources in or out (GPS outdoors, SLAM indoors), then decide whether the required precision and run length demand an absolute source at all. 
+That decision tells you which [movement sensors](/components/movement-sensor/) or range sensors to put on the machine. + +## Next steps + +- [Movement sensor component](/components/movement-sensor/): configure GPS, IMU, and wheeled-odometry models. +- [SLAM and mapping](/navigation/slam-and-mapping/): build and use maps for indoor localization. +- [Sensor fusion](/navigation/sensor-fusion/): combine relative and absolute sources into one pose estimate. diff --git a/docs/navigation/navigate-a-mobile-base.md b/docs/navigation/navigate-a-mobile-base.md new file mode 100644 index 0000000000..a1ebee60c6 --- /dev/null +++ b/docs/navigation/navigate-a-mobile-base.md @@ -0,0 +1,116 @@ +--- +linkTitle: "Navigate a mobile base" +title: "Navigate a mobile base to a goal" +weight: 40 +layout: "docs" +type: "docs" +description: "Drive a configured mobile base to a GPS or SLAM-map waypoint with the motion service, using MoveOnGlobe or MoveOnMap." +--- + +The motion service can drive a mobile base to a destination and plan a collision-free path along the way. +For a base that knows its own pose, one call moves it to a waypoint: `MoveOnGlobe` for a geographic goal or `MoveOnMap` for a goal on a [SLAM](/operate/reference/services/slam/) map. + +This page assembles the localization and motion inputs a base needs, issues the move, and maps common failures back to the missing input. + +## Prerequisites + +Before you start, configure the following on your machine: + +- A [mobile base](/components/base/) that you can already drive with velocity or position commands. +- A localization source that reports the base's pose: + - A [movement sensor](/components/movement-sensor/) that provides GPS position, for a geographic goal. + - A [SLAM service](/operate/reference/services/slam/) that provides a map and pose, for a map goal. +- The [motion service](/reference/apis/services/motion/), which plans the path and issues drive commands to the base. + +## Steps + +### 1. 
Confirm the base has a localization source + +The motion service moves the base relative to a known pose, so the base needs a source that reports where it is. +Confirm one of the following is configured and reporting: + +- **Geographic goals:** a movement sensor returning a valid GPS fix. +- **Map goals:** a SLAM service returning a current pose on its map. + +For the localization options and how each one supplies a pose, see [Localization](/navigation/localization/). + +### 2. Set up the motion service with the base and its localization source + +Add the [motion service](/reference/apis/services/motion/) to your machine. +The motion service reads the machine's [frame system](/operate/mobility/frame-system/) to relate the base, its localization source, and any obstacles in a shared coordinate space. + +Make sure your frame system places: + +- The base as a movable component. +- The movement sensor or SLAM service on the base, so its pose reports describe the base. + +With the frame system in place, the motion service has both the pose and the geometry it needs to plan. + +### 3. Command the base to a goal + +Call the motion service from an SDK. +Use `MoveOnGlobe` for a geographic destination or `MoveOnMap` for a destination on a SLAM map. + +**Geographic goal with `MoveOnGlobe`:** + +`MoveOnGlobe` takes a base, a destination as a `GeoPoint` (latitude and longitude), the name of the GPS movement sensor, and an optional list of geographic obstacles. 
+ +```python +from viam.services.motion import MotionClient +from viam.proto.common import GeoPoint, GeoGeometry + +motion = MotionClient.from_robot(machine, "builtin") + +await motion.move_on_globe( + component_name=base_name, + destination=GeoPoint(latitude=40.7, longitude=-73.98), + movement_sensor_name=gps_name, + obstacles=[], # optional GeoGeometry obstacles +) +``` + +**Map goal with `MoveOnMap`:** + +`MoveOnMap` takes a base, a destination `Pose` on the map, the name of the SLAM service, and an optional list of obstacles. + +```python +from viam.services.motion import MotionClient +from viam.proto.common import Pose + +motion = MotionClient.from_robot(machine, "builtin") + +await motion.move_on_map( + component_name=base_name, + destination=Pose(x=1500, y=200, z=0), # millimeters on the map + slam_service_name=slam_name, + obstacles=[], # optional obstacles +) +``` + +For full parameters, obstacle geometry types, and other SDKs, see the [motion service API](/reference/apis/services/motion/). + +### 4. Verify the move + +Watch the base drive toward the goal. +In current SDK releases both calls are non-blocking: each returns an execution ID once the move starts, which you can pass to `GetPlan` or `ListPlanStatuses` to check whether the base reached the destination. +To confirm progress and final pose, read the localization source directly: + +- For a geographic goal, read the [movement sensor](/components/movement-sensor/) position. +- For a map goal, read the base pose from the [SLAM service](/operate/reference/services/slam/). + +## Troubleshooting + +If a navigation attempt fails, match the symptom to the input it depends on: + +| Symptom | Missing input | Fix | +| ---------------------------------------------------------------------------- | -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- | +| Error that the base has no pose, or the plan starts from the wrong location. | Localization source. 
| Confirm the movement sensor has a GPS fix or the SLAM service reports a pose (Step 1). | +| `MoveOnMap` reports no map, or the base leaves the mapped area. | SLAM map. | Confirm the SLAM service is running and the destination falls within its map (Step 1). | +| Base stops early or refuses to plan a path near an object. | Obstacle source. | Pass known obstacles to the `obstacles` argument, or add a [vision service](/operate/reference/services/vision/) obstacle detector (Step 3). | +| Motion service cannot relate the base and its sensor. | Frame system. | Confirm the frame system places the sensor on the base (Step 2). | + +## Next steps + +- [Localization](/navigation/localization/): compare GPS and SLAM localization sources. +- [Motion service API](/reference/apis/services/motion/): full `MoveOnGlobe` and `MoveOnMap` parameters. +- [Base component](/components/base/): tune the base that carries out the plan. diff --git a/docs/navigation/sensor-fusion.md b/docs/navigation/sensor-fusion.md new file mode 100644 index 0000000000..2cb1c0c558 --- /dev/null +++ b/docs/navigation/sensor-fusion.md @@ -0,0 +1,85 @@ +--- +linkTitle: "Sensor fusion" +title: "Combine sensors with sensor fusion" +weight: 30 +layout: "docs" +type: "docs" +description: "Understand what sensor fusion means, how it differs from Viam's merged movement sensor, and when combining an IMU with an absolute source produces a steadier pose estimate." +--- + +Picture a wheeled robot reporting where it is. +Its inertial measurement unit (IMU) updates hundreds of times a second and tracks fast, smooth motion, but its position estimate slowly slides away from the truth as small errors pile up. +A GPS receiver on the same robot reports an absolute position that never drifts, but it updates slowly and jumps around by a meter or more from one reading to the next. +Neither source alone gives you a position estimate that is both steady and correct. 
+Combine them, and you can get a pose that is smooth like the IMU and anchored like the GPS. + +That combination is the idea behind sensor fusion. + +## What sensor fusion is + +Sensor fusion takes several noisy measurements of the same thing and blends them into a single estimate that is better than any input on its own. +The classic tool for this is a Kalman filter. +At a high level, a Kalman filter keeps a running estimate of the robot's state, such as position and velocity, along with a measure of how confident it is in that estimate. +Each new sensor reading updates the estimate in proportion to how trustworthy that reading is: a precise measurement pulls the estimate strongly, a noisy one nudges it gently. +The filter also predicts how the state should change between readings using motion, so it can smooth over gaps and reject outliers. + +The result is a continuous estimate that carries information from every source at once. +A fused pose reflects the IMU's fast, fine-grained motion and the GPS's absolute anchor in the same number, weighted by how much each sensor can be trusted moment to moment. + +## How Viam's `merged` movement sensor differs + +Viam ships a [`merged` movement sensor](/components/movement-sensor/) model, and the name invites a natural assumption. +It is worth being precise, so you set the right expectations. + +The `merged` model performs **selection and aggregation**, not statistical fusion. +You configure it by property, such as `position`, `orientation`, or `angular_velocity`, and for each property you list one or more source sensors. +When your code requests a reading, `merged` returns the value from the first sensor in that property's list that answers without error. +If that sensor fails or is unavailable, `merged` falls through to the next one. + +This design does two useful things: + +- **Aggregation across sensors.** + A GPS reports position and a separate IMU reports orientation and angular velocity. 
+ `merged` presents both through one movement sensor client, so your application reads a complete pose from a single component instead of juggling several. +- **Failover within a property.** + If you list two sensors that both report angular velocity, `merged` uses the first one that responds and switches to the second only when the first errors. + +What `merged` does not do is blend two readings of the same quantity into a weighted average. +If two sensors both report position, `merged` picks one of them for each reading; it does not compute a combined position that is more accurate than either. +In short, `merged` is an excellent way to assemble a full set of readings from complementary hardware and to stay running when a sensor drops out, but it is not the Kalman-filter-style estimator described above. + +## When fusing an IMU with an absolute source helps + +True fusion earns its keep when your sources have complementary strengths and weaknesses. +The textbook pairing is a high-rate relative sensor with a low-rate absolute one: + +- An **IMU** measures acceleration and rotation at a high rate. + Integrating those measurements gives smooth, responsive short-term motion, but the estimate drifts over seconds to minutes because integration accumulates error. +- An **absolute source**, such as GPS outdoors or a localization service indoors, reports position in a fixed frame that does not drift, but updates slowly and carries per-reading noise. + +Fusing the two lets each cover the other's weak spot. +The IMU fills the gaps between slow absolute updates and keeps the pose smooth during fast maneuvers. +The absolute source corrects the IMU's accumulated drift every time it reports, pinning the estimate back to ground truth. +The fused pose is steady between updates and stays accurate over long runs, which is exactly what a navigation or motion system wants. + +Fusion is worth the effort when: + +- One sensor is fast but drifts, and another is slow but absolute. 
+- You need a continuous pose at a higher rate than your absolute source alone provides. +- Outlier rejection matters, because a filter that models expected motion can discount readings that jump implausibly. + +Fusion buys you less when a single sensor already meets your accuracy and update-rate needs, or when your sources share the same weakness, such as two receivers that both lose signal in the same tunnel. + +## Getting fusion today + +Because the built-in `merged` model selects rather than fuses, a statistically fused pose in Viam comes from one of two places: + +- A **custom module** that reads the raw sensors, runs a filter such as a Kalman or complementary filter, and presents the fused result as its own movement sensor or sensor. +- An **upstream source** that already fuses internally, such as a GPS/IMU receiver or a SLAM system that outputs a filtered pose, which you then configure as a single movement sensor. + +Either way, the fusion lives in software you choose or hardware you select, and Viam consumes the fused output like any other movement sensor. + +## Next steps + +- Learn how a machine turns sensor readings into a position estimate in [Localization](/navigation/localization/). +- See the available movement sensor models, including [`merged`](/components/movement-sensor/), to decide which sensors to combine. diff --git a/docs/navigation/slam-and-mapping.md b/docs/navigation/slam-and-mapping.md new file mode 100644 index 0000000000..e68041a32f --- /dev/null +++ b/docs/navigation/slam-and-mapping.md @@ -0,0 +1,99 @@ +--- +linkTitle: "SLAM and mapping" +title: "SLAM and mapping" +weight: 20 +layout: "docs" +type: "docs" +description: "How a mobile robot builds a map of an unknown space while tracking its own position within it, and when to map live versus localize against a prebuilt map." +--- + +A robot vacuum starts in the middle of a living room it has never seen. It has +no floor plan and no marker on a wall telling it where it stands. 
Within a few +minutes it has both: a map of the rooms it can reach and a steady sense of where +it sits on that map. It builds the map and finds itself on the map at the same +time. That combined trick is **SLAM**. + +## What SLAM produces + +SLAM stands for **Simultaneous Localization And Mapping**. The two halves name +the two outputs: + +- **A map.** A machine-readable model of the space, commonly an _occupancy grid_ + (a top-down grid where each cell is marked free, occupied, or unknown) or a + _point cloud_ (a set of 3D points sampled from surfaces the sensors saw). +- **A live pose.** The robot's current position and orientation _within that map_, + updated continuously as it moves. + +The word _simultaneous_ carries the whole idea. Mapping and localization each +depend on the other, which looks like a chicken-and-egg problem: to place a new +wall on the map, the robot needs to know where it was standing when it saw the +wall; to know where it is standing, it needs a map to compare its view against. +SLAM breaks the loop by solving both together. Each new sensor reading nudges the +map and the pose estimate at the same time, and the two converge as the robot +explores. This is what separates SLAM from plain [localization](localization/): +localization answers _where am I_ against a map that already exists, while SLAM +produces the map and the answer together. + +## What hardware SLAM requires + +SLAM works by matching what the robot senses now against what it sensed a moment +ago and against the map so far. That matching needs sensors that measure the +_shape_ of the surroundings, plus a rough guess of how the robot moved between +readings: + +- **A ranging sensor** that reports distances to surrounding surfaces. A spinning + **LIDAR** sweeps a plane and returns distance at each angle; a **depth camera** + reports distance per pixel across its field of view. Either gives the geometry + that becomes the map. 
+- **Odometry**, a rough estimate of motion from wheel encoders or an inertial + sensor. This seeds each match with a starting guess ("I probably moved forward + about 20 cm"), which the ranging data then corrects. + +A plain color camera with no depth, or a bare GPS receiver, does not supply the +per-surface geometry SLAM relies on, so a supported ranging sensor is the core +requirement. + +## Two modes: map live or localize against a prebuilt map + +SLAM on a Viam machine runs as a **SLAM service** (a module paired with a +supported ranging sensor). You configure it in one of two modes, and the right +choice depends on whether the space is already mapped and how stable it is. + +**Map live.** The robot builds a fresh map as it drives and localizes against +that growing map in real time. Choose this when the space is unknown, when it +changes often enough that a saved map would go stale, or when you are creating a +map to save and reuse later. The cost is compute and time: the robot is doing the +full simultaneous problem on every reading. + +**Localize against a prebuilt map.** You supply a map captured earlier, and the +service only estimates the pose against it; the mapping half is already done. +Choose this when the environment is stable (a warehouse, a fixed building) and you +want lower compute, faster startup, and repeatable behavior across runs. The +trade-off is that the map is a snapshot: if the space is rearranged, the robot's +matches degrade until you remap. + +A common pattern combines them: map the space live once, save that map, then run +in localize-only mode for day-to-day operation and remap when the layout changes. + +Because these are configuration choices on the SLAM service rather than separate +components, you can switch modes by changing the service configuration. 
For the +exact configuration shape, supported sensors, and available SLAM modules, see the +[navigation and SLAM reference](/reference/services/navigation/) and +[How a robot knows where it is](localization/). + +## How the map feeds navigation + +The map and pose are inputs, not the goal. Once SLAM reports where the robot is on +a map, navigation can plan a route across that map to a destination and drive the +base there, steering around the obstacles the map records. That handoff, from +"where am I on the map" to "drive me to that spot", is covered in +[Navigate a mobile base to a goal](navigate-a-mobile-base/). + +## Next steps + +- [How a robot knows where it is](localization/): how odometry, GPS, and SLAM + compare as sources of position. +- [Navigate a mobile base to a goal](navigate-a-mobile-base/): turn a map and a + pose into motion toward a destination. +- [Navigation and SLAM reference](/reference/services/navigation/): configuration + fields, supported sensors, and the API. From e521dbcb6c76771cb134424076087153a5e02947 Mon Sep 17 00:00:00 2001 From: Brandon Shrewsbury Date: Wed, 1 Jul 2026 13:18:47 -0600 Subject: [PATCH 02/10] Apply Playbook 1 accuracy fixes from source review Verified against RDK source: - MoveOnMap/MoveOnGlobe are async, return an execution ID (not a bool) - merged movement sensor selects a source per property at configure time, not per-read failover - color_detector emits a constant 1.0 confidence - built-in webcam model triplet is rdk:builtin:webcam; clarify family slot - scope the ML model service Infer note to ML-backed detectors - open-vocabulary detection is a model you deploy, not a built-in - fix SLAM and object-tracker/detect link targets --- docs/ai-control/integrate-an-llm.md | 2 +- docs/ai-control/run-a-vla.md | 8 +++++--- docs/concepts/confidence-scores.md | 2 +- docs/concepts/inference-latency.md | 2 +- docs/concepts/platform-model.md | 2 +- .../track-and-pick-moving-objects.md | 10 +++++----- 
docs/navigation/navigate-a-mobile-base.md | 6 +++--- docs/navigation/sensor-fusion.md | 17 ++++++----------- docs/navigation/slam-and-mapping.md | 2 +- 9 files changed, 24 insertions(+), 27 deletions(-) diff --git a/docs/ai-control/integrate-an-llm.md b/docs/ai-control/integrate-an-llm.md index 32e9f59769..be821e128d 100644 --- a/docs/ai-control/integrate-an-llm.md +++ b/docs/ai-control/integrate-an-llm.md @@ -24,7 +24,7 @@ Steps 4 and 5 cover this validation and the timeouts and human-confirmation gate ## Prerequisites -- A machine with the components and services your robot uses (for example a [base](/dev/reference/apis/components/base/) and an [arm](/dev/reference/apis/components/arm/)), already [configured](/operate/get-started/supported-hardware/). +- A machine with the components and services your robot uses (for example a [base](/reference/apis/components/base/) and an [arm](/reference/apis/components/arm/)), already [configured](/operate/get-started/supported-hardware/). - Credentials for an LLM provider, or a local model you can query. - Familiarity with [writing a module](/build-modules/). diff --git a/docs/ai-control/run-a-vla.md b/docs/ai-control/run-a-vla.md index 885fa1da53..08679fc601 100644 --- a/docs/ai-control/run-a-vla.md +++ b/docs/ai-control/run-a-vla.md @@ -24,9 +24,11 @@ gripper, already configured. A related capability is **open-vocabulary** (or **zero-shot**) detection: a vision model detects objects named by a text prompt, such as "coffee mug," with no task-specific training. -You can run open-vocabulary detection through the -[vision service](/reference/services/vision/) and use its bounding boxes -either as a standalone perception step or as an input to the VLA loop below. +Viam does not ship an open-vocabulary detector; as with the VLA model, you +deploy one yourself, typically as a vision-service module. 
The +[vision service](/reference/services/vision/) is then the interface that returns +its bounding boxes, which you can use either as a standalone perception step or +as an input to the VLA loop below. ## Prerequisites diff --git a/docs/concepts/confidence-scores.md b/docs/concepts/confidence-scores.md index b64fe6d0d2..c715ab7d53 100644 --- a/docs/concepts/confidence-scores.md +++ b/docs/concepts/confidence-scores.md @@ -27,7 +27,7 @@ Three comparisons that feel natural also break down: - **Across models.** A `0.80` from one model says nothing about a `0.80` from another. They were trained differently and their scores land on different scales. - **Across versions.** Retraining or re-exporting the same model can shift the whole score distribution. A threshold that fit last month's version can behave differently after an update. -Heuristic detectors expose a confidence value too, and it follows the same rules. A [`color_detector`](/reference/services/vision/color_detector/) computes its confidence from a rule about how much of a region falls within a target color range. That is a useful, repeatable score, but it is a measure of color match, not a probability that an object is present. Whatever produces the number, the discipline is the same: use it to rank and threshold, not as odds. +Heuristic detectors expose a confidence value too, and it can mean even less. The [`color_detector`](/reference/services/vision/color_detector/) assigns every region it returns a constant confidence of `1.0`: the score reports that a color-matched region was found, and carries no information for ranking one detection above another. It is the clearest reminder that a confidence field is only ever as meaningful as the model behind it. Whatever produces the number, the discipline is the same: understand what it measures before you rank or threshold on it, and never read it as odds. 
## Choosing a threshold for a quality-control task diff --git a/docs/concepts/inference-latency.md b/docs/concepts/inference-latency.md index 680af91124..cf5cafd292 100644 --- a/docs/concepts/inference-latency.md +++ b/docs/concepts/inference-latency.md @@ -78,7 +78,7 @@ The pattern holds across cases: moving the same model from a general-purpose CPU ## Why remote inference adds latency -The ML model service's `Infer` method is the lower-level call that a [vision service](/vision/) detection ultimately blocks on, and you can run that model locally on the machine or call one hosted elsewhere. +For an ML-backed [vision service](/vision/), the detection ultimately blocks on the ML model service's `Infer` method, and you can run that model locally on the machine or call one hosted elsewhere. (A heuristic detector such as `color_detector` runs no model and skips this cost.) Running inference on a remote or cloud server can give you access to hardware far more powerful than an edge device. That power comes with an added cost: every frame travels to the server and every result travels back. diff --git a/docs/concepts/platform-model.md b/docs/concepts/platform-model.md index ba63a5c7b1..9c32968541 100644 --- a/docs/concepts/platform-model.md +++ b/docs/concepts/platform-model.md @@ -47,7 +47,7 @@ A **model** is a specific _implementation_ of one of those APIs. The camera API is a single contract, but a Logitech webcam, a RealSense depth camera, and a simulated fake camera are three different models that all satisfy it. When you configure a component or service, you choose a model, and behind that model sits real code that fulfills the API. -Models are named with a triplet like `namespace:family:name` (for example `viam:camera:webcam`), which keeps a community-contributed model distinct from Viam's own. +Models are named with a triplet, `namespace:family:name`. The built-in webcam is `rdk:builtin:webcam`; a community model might be `myorg:realsense:d435`. 
The middle slot is the model's _family_, not the API it implements, and the namespace keeps a community-contributed model distinct from Viam's own. ## Modules and modular resources diff --git a/docs/manipulation/track-and-pick-moving-objects.md b/docs/manipulation/track-and-pick-moving-objects.md index b445ff6cce..8e07eaad65 100644 --- a/docs/manipulation/track-and-pick-moving-objects.md +++ b/docs/manipulation/track-and-pick-moving-objects.md @@ -19,7 +19,7 @@ that predicted pose. - A configured [camera](/reference/components/camera/) viewing the belt - A configured vision service detector that recognizes the part. See - [Detect objects](/vision/object-detection/). + [Detect objects](/vision/object-detection/detect/). - A configured [arm](/reference/components/arm/) and [gripper](/reference/components/gripper/) that reach the belt - The [motion service](/reference/apis/services/motion/) and a @@ -48,10 +48,10 @@ this frame to the same part in the next frame. ### 2. Track the part across frames To follow one part through the stream, give each detection a persistent ID. The -[`viam:object-tracker` module](/vision/object-detection/) wraps your detector +[`viam:object-tracker` module](/vision/object-detection/track/) wraps your detector and camera, matches detections between consecutive frames, and assigns each part a stable track ID such as `part_0_20260701_143052`. Configure it as described in -[Track objects across frames](/vision/object-detection/), then read its +[Track objects across frames](/vision/object-detection/track/), then read its detections the same way you read any detector. With a stable ID you can measure motion. Record the part's position and the @@ -156,7 +156,7 @@ point past where you aimed. 
## Next steps -- [Detect objects](/vision/object-detection/) -- [Track objects across frames](/vision/object-detection/) +- [Detect objects](/vision/object-detection/detect/) +- [Track objects across frames](/vision/object-detection/track/) - [Move an arm to a pose](/motion-planning/move-an-arm/move-to-pose/) - [Motion service API](/reference/apis/services/motion/) diff --git a/docs/navigation/navigate-a-mobile-base.md b/docs/navigation/navigate-a-mobile-base.md index a1ebee60c6..b96e36aeef 100644 --- a/docs/navigation/navigate-a-mobile-base.md +++ b/docs/navigation/navigate-a-mobile-base.md @@ -91,9 +91,9 @@ For full parameters, obstacle geometry types, and other SDKs, see the [motion se ### 4. Verify the move -Watch the base drive toward the goal. -Both calls return `true` when the base reaches the destination within the configured tolerance. -To confirm progress and final pose, read the localization source directly: +`MoveOnGlobe` and `MoveOnMap` run asynchronously: each returns an execution ID immediately and the base keeps driving in the background. +Track progress and completion with `GetPlan` and `ListPlanStatuses`, and stop an in-progress move with `StopPlan`. +To confirm the final pose, read the localization source directly: - For a geographic goal, read the [movement sensor](/components/movement-sensor/) position. - For a map goal, read the base pose from the [SLAM service](/operate/reference/services/slam/). diff --git a/docs/navigation/sensor-fusion.md b/docs/navigation/sensor-fusion.md index 2cb1c0c558..e1b528571b 100644 --- a/docs/navigation/sensor-fusion.md +++ b/docs/navigation/sensor-fusion.md @@ -33,20 +33,15 @@ It is worth being precise, so you set the right expectations. The `merged` model performs **selection and aggregation**, not statistical fusion. You configure it by property, such as `position`, `orientation`, or `angular_velocity`, and for each property you list one or more source sensors. 
-When your code requests a reading, `merged` returns the value from the first sensor in that property's list that answers without error. -If that sensor fails or is unavailable, `merged` falls through to the next one. +When the sensor starts, `merged` picks, for each property, the first listed sensor that reports that property, and every reading of that property then comes from the chosen sensor. -This design does two useful things: - -- **Aggregation across sensors.** - A GPS reports position and a separate IMU reports orientation and angular velocity. - `merged` presents both through one movement sensor client, so your application reads a complete pose from a single component instead of juggling several. -- **Failover within a property.** - If you list two sensors that both report angular velocity, `merged` uses the first one that responds and switches to the second only when the first errors. +This design gives you aggregation across sensors: a GPS reports position while a separate IMU reports orientation and angular velocity, and `merged` presents both through one movement sensor client. +Your application reads a complete pose from a single component instead of juggling several. What `merged` does not do is blend two readings of the same quantity into a weighted average. -If two sensors both report position, `merged` picks one of them for each reading; it does not compute a combined position that is more accurate than either. -In short, `merged` is an excellent way to assemble a full set of readings from complementary hardware and to stay running when a sensor drops out, but it is not the Kalman-filter-style estimator described above. +If two sensors both report position, `merged` chooses one at startup and reads position from it; it does not compute a combined position that is more accurate than either. 
+The listed order sets that startup preference, not a per-reading failover: if the chosen sensor errors on a given read, `merged` returns that error rather than switching to another source mid-run. +In short, `merged` is a clean way to assemble a full pose from complementary hardware, but it is not the Kalman-filter-style estimator described above. ## When fusing an IMU with an absolute source helps diff --git a/docs/navigation/slam-and-mapping.md b/docs/navigation/slam-and-mapping.md index e68041a32f..185a442475 100644 --- a/docs/navigation/slam-and-mapping.md +++ b/docs/navigation/slam-and-mapping.md @@ -78,7 +78,7 @@ in localize-only mode for day-to-day operation and remap when the layout changes Because these are configuration choices on the SLAM service rather than separate components, you can switch modes by changing the service configuration. For the exact configuration shape, supported sensors, and available SLAM modules, see the -[navigation and SLAM reference](/reference/services/navigation/) and +[SLAM service reference](/operate/reference/services/slam/) and [How a robot knows where it is](localization/). ## How the map feeds navigation From 02c6a2fb5a8572e3f78785759e530fbfc045ed09 Mon Sep 17 00:00:00 2001 From: Brandon Shrewsbury Date: Wed, 1 Jul 2026 13:27:32 -0600 Subject: [PATCH 03/10] Fix internal links to this repo's IA and pass the Hugo link checks - Repoint /operate/* and nonexistent SLAM-reference links to valid targets (/motion-planning/frame-system/, /navigation/slam-and-mapping/, registry) - Rename link text off 'How a robot knows where it is' (the render-link hook errors on link text containing 'here', which 'where' matches) - Drop the removed navigation-service tombstone link Validated: 0 broken internal links, vale + markdownlint clean, Hugo content render clean (only the PostCSS asset step needs npm, handled in CI). 
--- docs/navigation/_index.md | 2 +- docs/navigation/navigate-a-mobile-base.md | 8 ++++---- docs/navigation/slam-and-mapping.md | 12 ++++++------ 3 files changed, 11 insertions(+), 11 deletions(-) diff --git a/docs/navigation/_index.md b/docs/navigation/_index.md index 74cb87b160..9cc6940e6d 100644 --- a/docs/navigation/_index.md +++ b/docs/navigation/_index.md @@ -13,7 +13,7 @@ question: _where am I?_ Everything else, planning a path, driving to a goal, coordinating with other robots, builds on that answer. This section covers how a machine estimates its own position and uses that estimate to move. -- [How a robot knows where it is](localization/): odometry, GPS, and SLAM as +- [How a robot knows its position](localization/): odometry, GPS, and SLAM as localization sources, and their drift and cost trade-offs. - [SLAM and mapping](slam-and-mapping/): building a map and locating within it. - [Combine sensors with sensor fusion](sensor-fusion/): why one sensor is diff --git a/docs/navigation/navigate-a-mobile-base.md b/docs/navigation/navigate-a-mobile-base.md index b96e36aeef..b71a26bb45 100644 --- a/docs/navigation/navigate-a-mobile-base.md +++ b/docs/navigation/navigate-a-mobile-base.md @@ -8,7 +8,7 @@ description: "Drive a configured mobile base to a GPS or SLAM-map waypoint with --- The motion service can drive a mobile base to a destination and plan a collision-free path along the way. -For a base that knows its own pose, one call moves it to a waypoint: `MoveOnGlobe` for a geographic goal or `MoveOnMap` for a goal on a [SLAM](/operate/reference/services/slam/) map. +For a base that knows its own pose, one call moves it to a waypoint: `MoveOnGlobe` for a geographic goal or `MoveOnMap` for a goal on a [SLAM](/navigation/slam-and-mapping/) map. This page assembles the localization and motion inputs a base needs, issues the move, and maps common failures back to the missing input. 
@@ -19,7 +19,7 @@ Before you start, configure the following on your machine: - A [mobile base](/components/base/) that you can already drive with velocity or position commands. - A localization source that reports the base's pose: - A [movement sensor](/components/movement-sensor/) that provides GPS position, for a geographic goal. - - A [SLAM service](/operate/reference/services/slam/) that provides a map and pose, for a map goal. + - A [SLAM service](/navigation/slam-and-mapping/) that provides a map and pose, for a map goal. - The [motion service](/reference/apis/services/motion/), which plans the path and issues drive commands to the base. ## Steps @@ -37,7 +37,7 @@ For the localization options and how each one supplies a pose, see [Localization ### 2. Set up the motion service with the base and its localization source Add the [motion service](/reference/apis/services/motion/) to your machine. -The motion service reads the machine's [frame system](/operate/mobility/frame-system/) to relate the base, its localization source, and any obstacles in a shared coordinate space. +The motion service reads the machine's [frame system](/motion-planning/frame-system/) to relate the base, its localization source, and any obstacles in a shared coordinate space. Make sure your frame system places: @@ -96,7 +96,7 @@ Track progress and completion with `GetPlan` and `ListPlanStatuses`, and stop an To confirm the final pose, read the localization source directly: - For a geographic goal, read the [movement sensor](/components/movement-sensor/) position. -- For a map goal, read the base pose from the [SLAM service](/operate/reference/services/slam/). +- For a map goal, read the base pose from the [SLAM service](/navigation/slam-and-mapping/). 
## Troubleshooting diff --git a/docs/navigation/slam-and-mapping.md b/docs/navigation/slam-and-mapping.md index 185a442475..f3f6bfc88a 100644 --- a/docs/navigation/slam-and-mapping.md +++ b/docs/navigation/slam-and-mapping.md @@ -77,9 +77,9 @@ in localize-only mode for day-to-day operation and remap when the layout changes Because these are configuration choices on the SLAM service rather than separate components, you can switch modes by changing the service configuration. For the -exact configuration shape, supported sensors, and available SLAM modules, see the -[SLAM service reference](/operate/reference/services/slam/) and -[How a robot knows where it is](localization/). +configuration shape and supported sensors of a specific implementation, see the +SLAM modules in the [Viam Registry](https://app.viam.com/registry) and +[How a robot knows its position](localization/). ## How the map feeds navigation @@ -91,9 +91,9 @@ base there, steering around the obstacles the map records. That handoff, from ## Next steps -- [How a robot knows where it is](localization/): how odometry, GPS, and SLAM +- [How a robot knows its position](localization/): how odometry, GPS, and SLAM compare as sources of position. - [Navigate a mobile base to a goal](navigate-a-mobile-base/): turn a map and a pose into motion toward a destination. -- [Navigation and SLAM reference](/reference/services/navigation/): configuration - fields, supported sensors, and the API. +- [SLAM modules in the registry](https://app.viam.com/registry): configuration + fields and supported sensors for a specific SLAM implementation. 
From 71b7b915e3fe8757cb2fcb304ac59afaf3166fa7 Mon Sep 17 00:00:00 2001 From: Brandon Shrewsbury Date: Wed, 1 Jul 2026 13:50:25 -0600 Subject: [PATCH 04/10] Add concept-coverage README: 50 use-cases, method, gap closure Documents the use-case coverage analysis behind the four new sections: the 50 IoT/robotics use-cases, the ownership-sweep method, the before/after gap closure, and a link to the full spreadsheet. --- CONCEPT-COVERAGE-README.md | 87 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 87 insertions(+) create mode 100644 CONCEPT-COVERAGE-README.md diff --git a/CONCEPT-COVERAGE-README.md b/CONCEPT-COVERAGE-README.md new file mode 100644 index 0000000000..266868b82f --- /dev/null +++ b/CONCEPT-COVERAGE-README.md @@ -0,0 +1,87 @@ +# Concept coverage analysis: 50 IoT/robotics use-cases + +This branch adds four documentation sections that close concept gaps found by running a **use-case → concept coverage** analysis (Playbook 11) over 50 real IoT and robotics product use-cases. This README records the use-cases, the method, and the measured gap closure. + +**Spreadsheet (full matrix, per-task concepts, gaps, learning objectives, before/after):** [`playbook11-use-case-coverage.xlsx`](https://drive.google.com/drive/folders/1ErpNHevOAVc_4YwqYM_dpG960QeAaLmA) (in the shared Drive folder). + +## Method + +1. Decompose each use-case into the platform + robotics concepts a user must understand to succeed. +2. Ownership sweep: for each concept, does one findable page **own** (define) it, or is it scattered/missing/buried? (grep-grounded against the docs + RDK source). +3. Turn gaps into target pages with a Diátaxis type, write learning objectives, then draft the pages. +4. Re-run the sweep to measure gap closure. 
+ +## Gap closure (63 concepts) + +| Coverage | Before | After | +|---|---:|---:| +| OWNED | 29 | 46 | +| PARTIAL | 10 | 10 | +| BURIED | 7 | 7 | +| SCATTERED | 3 | 0 | +| MISSING | 14 | 0 | +| **No-owner gaps (MISSING+SCATTERED)** | **17** | **0** | + +The 7 BURIED concepts (kinematics, frames, motion-planning) stay deferred: they are owned inside the motion-planning section, which is under active edit. + +## New sections in this PR (19 pages) + +- **`docs/concepts/`** — platform model, confidence scores, inference latency, capture frequency vs sample rate +- **`docs/ai-control/`** — learned & policy-based control, run a VLA, integrate an LLM, simulation & sim-to-real +- **`docs/navigation/`** — localization, SLAM & mapping, navigate a mobile base, sensor fusion, coordinate a fleet +- **`docs/manipulation/`** — force & compliance control, track & pick moving objects + +## The 50 use-cases + +| # | Category | Use case | Job | +|---|---|---|---| +| 1 | VLA / foundation-model control | VLA bin picking | Prompt a manipulator in natural language to pick a named item from a mixed bin | +| 2 | VLA / foundation-model control | NL task commanding | Command an arm with 'put the red block in the box' and have it execute | +| 3 | VLA / foundation-model control | Open-vocab perception | Detect arbitrary, un-trained objects from a text prompt on a robot camera | +| 4 | VLA / foundation-model control | LLM task planner | Use an LLM to decompose a goal into robot skills and dispatch them | +| 5 | VLA / foundation-model control | VLA mobile manipulation | A mobile manipulator tidies a room from a spoken instruction | +| 6 | VLA / foundation-model control | Voice-to-action control | Drive a robot by voice command with speech + a VLA policy | +| 7 | Policy-based / learned control | RL locomotion | Deploy an RL-trained gait policy on a legged/wheeled base | +| 8 | Policy-based / learned control | Imitation assembly | Teach an assembly skill from demonstrations and replay it | +| 9 | 
Policy-based / learned control | Visuomotor grasp policy | Run a learned pixel-to-action grasping policy in a loop | +| 10 | Policy-based / learned control | Sim-to-real transfer | Train a control policy in sim and deploy it on hardware | +| 11 | Policy-based / learned control | MPC mobile base | Run model-predictive control for smooth base trajectory tracking | +| 12 | Policy-based / learned control | Force-control insertion | Adaptive force policy for peg-in-hole insertion | +| 13 | Mobile robots / AMR | Warehouse AMR | Goods-to-person transport AMR navigating a mapped warehouse | +| 14 | Mobile robots / AMR | Outdoor delivery | GPS-navigated last-yard delivery robot on sidewalks | +| 15 | Mobile robots / AMR | SLAM cleaning robot | Indoor robot builds a map and cleans coverage-complete | +| 16 | Mobile robots / AMR | Multi-robot fleet coord | Coordinate a fleet of AMRs to avoid deadlock and share tasks | +| 17 | Mobile robots / AMR | Inventory scanning rover | Autonomous rover scans shelves and reports stock | +| 18 | Mobile robots / AMR | Person-following cart | A cart follows a worker through a facility | +| 19 | Mobile robots / AMR | Field-scouting rover | Ag rover autonomously scouts rows and geo-tags findings | +| 20 | Industrial manipulation | Machine tending | Arm picks molded parts, vision QC, sorts good/reject into bins | +| 21 | Industrial manipulation | Palletizing | Arm stacks boxes onto a pallet in a computed pattern | +| 22 | Industrial manipulation | Vision-guided kitting | Assemble a kit by picking parts located by vision | +| 23 | Industrial manipulation | Conveyor tracking pick | Pick moving parts off a running conveyor | +| 24 | Industrial manipulation | Dispensing path follow | Follow a Cartesian path for glue/weld dispensing | +| 25 | Industrial manipulation | CNC loader tending | Load/unload a CNC with force-sensed insertion | +| 26 | Industrial manipulation | Random bin picking | Pick randomly-oriented parts using 3D pose estimation | +| 
27 | Fleet management / ops | Zero-touch provisioning | Provision 1,000 new devices with no per-unit manual setup | +| 28 | Fleet management / ops | Staged model rollout | Roll a new ML model to a fleet in canary then full stages | +| 29 | Fleet management / ops | Fleet config via fragments | Manage shared config across many machines with fragments | +| 30 | Fleet management / ops | RBAC for customer fleet | Grant scoped access to customers over their own machines | +| 31 | Fleet management / ops | Fleet health monitoring | Dashboard + alerts on fleet health and offline devices | +| 32 | Fleet management / ops | Remote teleop intervention | Remotely take control of a stuck robot to recover it | +| 33 | Fleet management / ops | Scheduled maintenance jobs | Run recurring jobs (calibration, logs) across a fleet | +| 34 | Fleet management / ops | White-labeled billing | Bill end customers under a partner brand for fleet usage | +| 35 | IoT sensing / monitoring | Cold-chain monitoring | Monitor temperature across assets and alert on excursions | +| 36 | IoT sensing / monitoring | Predictive maintenance | Vibration sensing to predict equipment failure | +| 37 | IoT sensing / monitoring | Air-quality network | Network of air-quality sensors reporting to the cloud | +| 38 | IoT sensing / monitoring | Smart-building occupancy | Occupancy + energy sensing for building automation | +| 39 | IoT sensing / monitoring | Utility meter aggregation | Edge-aggregate meter reads and sync upstream | +| 40 | IoT sensing / monitoring | Equipment anomaly detect | Detect leaks/anomalies on industrial equipment at the edge | +| 41 | Computer vision apps | Line defect detection | Detect product defects on a production line and reject | +| 42 | Computer vision apps | PPE compliance | Monitor a site for PPE compliance and alert | +| 43 | Computer vision apps | Access-control camera | License-plate / face access control at a gate | +| 44 | Computer vision apps | Shelf-stock analytics | Analyze 
retail shelves for out-of-stock | +| 45 | Computer vision apps | Queue/people analytics | Count people and measure queue length | +| 46 | Data / ML pipeline | Capture+sync for training | Continuously capture and sync robot data to build datasets | +| 47 | Data / ML pipeline | Auto-retraining loop | Detect model drift and retrain/redeploy automatically | +| 48 | Data / ML pipeline | Custom training script | Train a specialized model with a custom script on Viam data | +| 49 | Data / ML pipeline | Edge inference + upload | Run inference at the edge and conditionally upload hard cases | +| 50 | Data / ML pipeline | Sensor-data BI dashboard | Query and visualize fleet sensor data for business insight | From eb68fe549ad2dbb68b166e232e039808f116e646 Mon Sep 17 00:00:00 2001 From: Brandon Shrewsbury Date: Wed, 1 Jul 2026 14:58:34 -0600 Subject: [PATCH 05/10] Reorganize: place concept pages in topical homes, drop generic concepts section Per review, a concept page belongs in the section of the thing it is about, not a catch-all bucket: - platform-model -> what-is-viam/ - confidence-scores -> vision/ (ML/detector output) - capture-frequency -> data/ (capture vs sensor sample rate) - inference-latency -> ai-control/ Deletes docs/concepts/. Updates section landings, cross-links, and README. 
--- CONCEPT-COVERAGE-README.md | 135 ++++++++++-------- docs/ai-control/_index.md | 2 + .../inference-latency.md | 2 +- docs/ai-control/learned-and-policy-control.md | 4 +- docs/ai-control/run-a-vla.md | 4 +- docs/concepts/_index.md | 23 --- docs/{concepts => data}/capture-frequency.md | 2 +- .../track-and-pick-moving-objects.md | 2 +- .../{concepts => vision}/confidence-scores.md | 2 +- docs/what-is-viam/_index.md | 2 + .../platform-model.md | 2 +- 11 files changed, 85 insertions(+), 95 deletions(-) rename docs/{concepts => ai-control}/inference-latency.md (99%) delete mode 100644 docs/concepts/_index.md rename docs/{concepts => data}/capture-frequency.md (99%) rename docs/{concepts => vision}/confidence-scores.md (99%) rename docs/{concepts => what-is-viam}/platform-model.md (99%) diff --git a/CONCEPT-COVERAGE-README.md b/CONCEPT-COVERAGE-README.md index 266868b82f..a295681c53 100644 --- a/CONCEPT-COVERAGE-README.md +++ b/CONCEPT-COVERAGE-README.md @@ -1,6 +1,6 @@ # Concept coverage analysis: 50 IoT/robotics use-cases -This branch adds four documentation sections that close concept gaps found by running a **use-case → concept coverage** analysis (Playbook 11) over 50 real IoT and robotics product use-cases. This README records the use-cases, the method, and the measured gap closure. +This branch closes concept gaps found by running a **use-case → concept coverage** analysis (Playbook 11) over 50 real IoT and robotics product use-cases. It adds three new sections (AI & control, navigation, manipulation) plus individual concept pages placed in their topical homes (what-is-viam, vision, data). This README records the use-cases, the method, and the measured gap closure. **Spreadsheet (full matrix, per-task concepts, gaps, learning objectives, before/after):** [`playbook11-use-case-coverage.xlsx`](https://drive.google.com/drive/folders/1ErpNHevOAVc_4YwqYM_dpG960QeAaLmA) (in the shared Drive folder). 
@@ -13,75 +13,84 @@ This branch adds four documentation sections that close concept gaps found by ru ## Gap closure (63 concepts) -| Coverage | Before | After | -|---|---:|---:| -| OWNED | 29 | 46 | -| PARTIAL | 10 | 10 | -| BURIED | 7 | 7 | -| SCATTERED | 3 | 0 | -| MISSING | 14 | 0 | +| Coverage | Before | After | +| ------------------------------------- | -----: | ----: | +| OWNED | 29 | 46 | +| PARTIAL | 10 | 10 | +| BURIED | 7 | 7 | +| SCATTERED | 3 | 0 | +| MISSING | 14 | 0 | | **No-owner gaps (MISSING+SCATTERED)** | **17** | **0** | The 7 BURIED concepts (kinematics, frames, motion-planning) stay deferred: they are owned inside the motion-planning section, which is under active edit. -## New sections in this PR (19 pages) +## What this PR adds (19 pages) -- **`docs/concepts/`** — platform model, confidence scores, inference latency, capture frequency vs sample rate -- **`docs/ai-control/`** — learned & policy-based control, run a VLA, integrate an LLM, simulation & sim-to-real +Each concept page lives in the section of the thing it is about, not a generic bucket. 
+ +**New sections:** + +- **`docs/ai-control/`** — inference latency, learned & policy-based control, run a VLA, integrate an LLM, simulation & sim-to-real - **`docs/navigation/`** — localization, SLAM & mapping, navigate a mobile base, sensor fusion, coordinate a fleet - **`docs/manipulation/`** — force & compliance control, track & pick moving objects +**Pages added into existing sections:** + +- **`docs/what-is-viam/platform-model.md`** — how machines, parts, components, services, and modules fit together +- **`docs/vision/confidence-scores.md`** — what a confidence score is (and isn't); an ML/vision concept +- **`docs/data/capture-frequency.md`** — data-capture frequency vs sensor sample rate + ## The 50 use-cases -| # | Category | Use case | Job | -|---|---|---|---| -| 1 | VLA / foundation-model control | VLA bin picking | Prompt a manipulator in natural language to pick a named item from a mixed bin | -| 2 | VLA / foundation-model control | NL task commanding | Command an arm with 'put the red block in the box' and have it execute | -| 3 | VLA / foundation-model control | Open-vocab perception | Detect arbitrary, un-trained objects from a text prompt on a robot camera | -| 4 | VLA / foundation-model control | LLM task planner | Use an LLM to decompose a goal into robot skills and dispatch them | -| 5 | VLA / foundation-model control | VLA mobile manipulation | A mobile manipulator tidies a room from a spoken instruction | -| 6 | VLA / foundation-model control | Voice-to-action control | Drive a robot by voice command with speech + a VLA policy | -| 7 | Policy-based / learned control | RL locomotion | Deploy an RL-trained gait policy on a legged/wheeled base | -| 8 | Policy-based / learned control | Imitation assembly | Teach an assembly skill from demonstrations and replay it | -| 9 | Policy-based / learned control | Visuomotor grasp policy | Run a learned pixel-to-action grasping policy in a loop | -| 10 | Policy-based / learned control | Sim-to-real 
transfer | Train a control policy in sim and deploy it on hardware | -| 11 | Policy-based / learned control | MPC mobile base | Run model-predictive control for smooth base trajectory tracking | -| 12 | Policy-based / learned control | Force-control insertion | Adaptive force policy for peg-in-hole insertion | -| 13 | Mobile robots / AMR | Warehouse AMR | Goods-to-person transport AMR navigating a mapped warehouse | -| 14 | Mobile robots / AMR | Outdoor delivery | GPS-navigated last-yard delivery robot on sidewalks | -| 15 | Mobile robots / AMR | SLAM cleaning robot | Indoor robot builds a map and cleans coverage-complete | -| 16 | Mobile robots / AMR | Multi-robot fleet coord | Coordinate a fleet of AMRs to avoid deadlock and share tasks | -| 17 | Mobile robots / AMR | Inventory scanning rover | Autonomous rover scans shelves and reports stock | -| 18 | Mobile robots / AMR | Person-following cart | A cart follows a worker through a facility | -| 19 | Mobile robots / AMR | Field-scouting rover | Ag rover autonomously scouts rows and geo-tags findings | -| 20 | Industrial manipulation | Machine tending | Arm picks molded parts, vision QC, sorts good/reject into bins | -| 21 | Industrial manipulation | Palletizing | Arm stacks boxes onto a pallet in a computed pattern | -| 22 | Industrial manipulation | Vision-guided kitting | Assemble a kit by picking parts located by vision | -| 23 | Industrial manipulation | Conveyor tracking pick | Pick moving parts off a running conveyor | -| 24 | Industrial manipulation | Dispensing path follow | Follow a Cartesian path for glue/weld dispensing | -| 25 | Industrial manipulation | CNC loader tending | Load/unload a CNC with force-sensed insertion | -| 26 | Industrial manipulation | Random bin picking | Pick randomly-oriented parts using 3D pose estimation | -| 27 | Fleet management / ops | Zero-touch provisioning | Provision 1,000 new devices with no per-unit manual setup | -| 28 | Fleet management / ops | Staged model rollout | 
Roll a new ML model to a fleet in canary then full stages | -| 29 | Fleet management / ops | Fleet config via fragments | Manage shared config across many machines with fragments | -| 30 | Fleet management / ops | RBAC for customer fleet | Grant scoped access to customers over their own machines | -| 31 | Fleet management / ops | Fleet health monitoring | Dashboard + alerts on fleet health and offline devices | -| 32 | Fleet management / ops | Remote teleop intervention | Remotely take control of a stuck robot to recover it | -| 33 | Fleet management / ops | Scheduled maintenance jobs | Run recurring jobs (calibration, logs) across a fleet | -| 34 | Fleet management / ops | White-labeled billing | Bill end customers under a partner brand for fleet usage | -| 35 | IoT sensing / monitoring | Cold-chain monitoring | Monitor temperature across assets and alert on excursions | -| 36 | IoT sensing / monitoring | Predictive maintenance | Vibration sensing to predict equipment failure | -| 37 | IoT sensing / monitoring | Air-quality network | Network of air-quality sensors reporting to the cloud | -| 38 | IoT sensing / monitoring | Smart-building occupancy | Occupancy + energy sensing for building automation | -| 39 | IoT sensing / monitoring | Utility meter aggregation | Edge-aggregate meter reads and sync upstream | -| 40 | IoT sensing / monitoring | Equipment anomaly detect | Detect leaks/anomalies on industrial equipment at the edge | -| 41 | Computer vision apps | Line defect detection | Detect product defects on a production line and reject | -| 42 | Computer vision apps | PPE compliance | Monitor a site for PPE compliance and alert | -| 43 | Computer vision apps | Access-control camera | License-plate / face access control at a gate | -| 44 | Computer vision apps | Shelf-stock analytics | Analyze retail shelves for out-of-stock | -| 45 | Computer vision apps | Queue/people analytics | Count people and measure queue length | -| 46 | Data / ML pipeline | Capture+sync 
for training | Continuously capture and sync robot data to build datasets | -| 47 | Data / ML pipeline | Auto-retraining loop | Detect model drift and retrain/redeploy automatically | -| 48 | Data / ML pipeline | Custom training script | Train a specialized model with a custom script on Viam data | -| 49 | Data / ML pipeline | Edge inference + upload | Run inference at the edge and conditionally upload hard cases | -| 50 | Data / ML pipeline | Sensor-data BI dashboard | Query and visualize fleet sensor data for business insight | +| # | Category | Use case | Job | +| --- | ------------------------------ | -------------------------- | ------------------------------------------------------------------------------ | +| 1 | VLA / foundation-model control | VLA bin picking | Prompt a manipulator in natural language to pick a named item from a mixed bin | +| 2 | VLA / foundation-model control | NL task commanding | Command an arm with 'put the red block in the box' and have it execute | +| 3 | VLA / foundation-model control | Open-vocab perception | Detect arbitrary, un-trained objects from a text prompt on a robot camera | +| 4 | VLA / foundation-model control | LLM task planner | Use an LLM to decompose a goal into robot skills and dispatch them | +| 5 | VLA / foundation-model control | VLA mobile manipulation | A mobile manipulator tidies a room from a spoken instruction | +| 6 | VLA / foundation-model control | Voice-to-action control | Drive a robot by voice command with speech + a VLA policy | +| 7 | Policy-based / learned control | RL locomotion | Deploy an RL-trained gait policy on a legged/wheeled base | +| 8 | Policy-based / learned control | Imitation assembly | Teach an assembly skill from demonstrations and replay it | +| 9 | Policy-based / learned control | Visuomotor grasp policy | Run a learned pixel-to-action grasping policy in a loop | +| 10 | Policy-based / learned control | Sim-to-real transfer | Train a control policy in sim and deploy it on hardware 
| +| 11 | Policy-based / learned control | MPC mobile base | Run model-predictive control for smooth base trajectory tracking | +| 12 | Policy-based / learned control | Force-control insertion | Adaptive force policy for peg-in-hole insertion | +| 13 | Mobile robots / AMR | Warehouse AMR | Goods-to-person transport AMR navigating a mapped warehouse | +| 14 | Mobile robots / AMR | Outdoor delivery | GPS-navigated last-yard delivery robot on sidewalks | +| 15 | Mobile robots / AMR | SLAM cleaning robot | Indoor robot builds a map and cleans coverage-complete | +| 16 | Mobile robots / AMR | Multi-robot fleet coord | Coordinate a fleet of AMRs to avoid deadlock and share tasks | +| 17 | Mobile robots / AMR | Inventory scanning rover | Autonomous rover scans shelves and reports stock | +| 18 | Mobile robots / AMR | Person-following cart | A cart follows a worker through a facility | +| 19 | Mobile robots / AMR | Field-scouting rover | Ag rover autonomously scouts rows and geo-tags findings | +| 20 | Industrial manipulation | Machine tending | Arm picks molded parts, vision QC, sorts good/reject into bins | +| 21 | Industrial manipulation | Palletizing | Arm stacks boxes onto a pallet in a computed pattern | +| 22 | Industrial manipulation | Vision-guided kitting | Assemble a kit by picking parts located by vision | +| 23 | Industrial manipulation | Conveyor tracking pick | Pick moving parts off a running conveyor | +| 24 | Industrial manipulation | Dispensing path follow | Follow a Cartesian path for glue/weld dispensing | +| 25 | Industrial manipulation | CNC loader tending | Load/unload a CNC with force-sensed insertion | +| 26 | Industrial manipulation | Random bin picking | Pick randomly-oriented parts using 3D pose estimation | +| 27 | Fleet management / ops | Zero-touch provisioning | Provision 1,000 new devices with no per-unit manual setup | +| 28 | Fleet management / ops | Staged model rollout | Roll a new ML model to a fleet in canary then full stages | +| 29 
| Fleet management / ops | Fleet config via fragments | Manage shared config across many machines with fragments | +| 30 | Fleet management / ops | RBAC for customer fleet | Grant scoped access to customers over their own machines | +| 31 | Fleet management / ops | Fleet health monitoring | Dashboard + alerts on fleet health and offline devices | +| 32 | Fleet management / ops | Remote teleop intervention | Remotely take control of a stuck robot to recover it | +| 33 | Fleet management / ops | Scheduled maintenance jobs | Run recurring jobs (calibration, logs) across a fleet | +| 34 | Fleet management / ops | White-labeled billing | Bill end customers under a partner brand for fleet usage | +| 35 | IoT sensing / monitoring | Cold-chain monitoring | Monitor temperature across assets and alert on excursions | +| 36 | IoT sensing / monitoring | Predictive maintenance | Vibration sensing to predict equipment failure | +| 37 | IoT sensing / monitoring | Air-quality network | Network of air-quality sensors reporting to the cloud | +| 38 | IoT sensing / monitoring | Smart-building occupancy | Occupancy + energy sensing for building automation | +| 39 | IoT sensing / monitoring | Utility meter aggregation | Edge-aggregate meter reads and sync upstream | +| 40 | IoT sensing / monitoring | Equipment anomaly detect | Detect leaks/anomalies on industrial equipment at the edge | +| 41 | Computer vision apps | Line defect detection | Detect product defects on a production line and reject | +| 42 | Computer vision apps | PPE compliance | Monitor a site for PPE compliance and alert | +| 43 | Computer vision apps | Access-control camera | License-plate / face access control at a gate | +| 44 | Computer vision apps | Shelf-stock analytics | Analyze retail shelves for out-of-stock | +| 45 | Computer vision apps | Queue/people analytics | Count people and measure queue length | +| 46 | Data / ML pipeline | Capture+sync for training | Continuously capture and sync robot data to build 
datasets | +| 47 | Data / ML pipeline | Auto-retraining loop | Detect model drift and retrain/redeploy automatically | +| 48 | Data / ML pipeline | Custom training script | Train a specialized model with a custom script on Viam data | +| 49 | Data / ML pipeline | Edge inference + upload | Run inference at the edge and conditionally upload hard cases | +| 50 | Data / ML pipeline | Sensor-data BI dashboard | Query and visualize fleet sensor data for business insight | diff --git a/docs/ai-control/_index.md b/docs/ai-control/_index.md index 19d4b52d01..9d24b75832 100644 --- a/docs/ai-control/_index.md +++ b/docs/ai-control/_index.md @@ -18,6 +18,8 @@ model in a [module](/build-modules/) that implements a component or service API, and your application talks to it through the standard APIs. This section explains how each kind of model fits that pattern. +- [Inference latency and loop rate](inference-latency/): why a model in the + loop cannot run faster than its own inference time, and how to size it. - [Learned and policy-based control](learned-and-policy-control/): when a trained policy beats a hand-written controller, and how it runs on a machine. - [Run a vision-language-action model](run-a-vla/): drive a robot from a camera diff --git a/docs/concepts/inference-latency.md b/docs/ai-control/inference-latency.md similarity index 99% rename from docs/concepts/inference-latency.md rename to docs/ai-control/inference-latency.md index cf5cafd292..f40397d867 100644 --- a/docs/concepts/inference-latency.md +++ b/docs/ai-control/inference-latency.md @@ -1,7 +1,7 @@ --- linkTitle: "Inference latency and loop rate" title: "Inference latency and loop rate" -weight: 30 +weight: 5 layout: "docs" type: "docs" description: "Understand why model inference latency sets the ceiling on how fast a perception or control loop can run, and estimate an achievable loop rate from model size, image resolution, and hardware." 
diff --git a/docs/ai-control/learned-and-policy-control.md b/docs/ai-control/learned-and-policy-control.md index a9c5f2b97e..a297620023 100644 --- a/docs/ai-control/learned-and-policy-control.md +++ b/docs/ai-control/learned-and-policy-control.md @@ -110,7 +110,7 @@ compute available on the machine, and whether the model runs on CPU, GPU, or an accelerator. Before you commit a policy to hardware, measure its worst-case inference time on the target device and confirm it leaves room for sensor reads and actuation within the loop period. See -[inference latency](/concepts/inference-latency/) for how to reason about this +[inference latency](/ai-control/inference-latency/) for how to reason about this budget. This constraint often shapes the policy itself. A smaller or quantized model that @@ -122,7 +122,7 @@ not really controlling in real time. - Learn how to package and deploy code on a machine in [Build modules](/build-modules/). -- Understand the timing budget in [inference latency](/concepts/inference-latency/). +- Understand the timing budget in [inference latency](/ai-control/inference-latency/). - Review the classical baseline in the [controls package](/reference/controls-package/) before reaching for a learned policy. diff --git a/docs/ai-control/run-a-vla.md b/docs/ai-control/run-a-vla.md index 08679fc601..16f443b841 100644 --- a/docs/ai-control/run-a-vla.md +++ b/docs/ai-control/run-a-vla.md @@ -56,7 +56,7 @@ Weigh latency against model size for your target action rate: a 1 Hz "observe, plan, act" loop tolerates cloud latency, while a 10 Hz visual servoing loop needs edge inference. For a fuller treatment of this tradeoff, see -[Inference latency](/concepts/inference-latency/). +[Inference latency](/ai-control/inference-latency/). ### 2. Wrap the model as a module, or call a hosted API @@ -172,7 +172,7 @@ latency plus the time each command takes to execute, and leave margin so commands do not queue up. 
When inference is slower than your target action rate, either move the model to the edge, choose a smaller model, or slow the loop to match. -See [Inference latency](/concepts/inference-latency/) for how latency shapes a +See [Inference latency](/ai-control/inference-latency/) for how latency shapes a control loop. ## Next steps diff --git a/docs/concepts/_index.md b/docs/concepts/_index.md deleted file mode 100644 index b5336764ee..0000000000 --- a/docs/concepts/_index.md +++ /dev/null @@ -1,23 +0,0 @@ ---- -linkTitle: "Concepts" -title: "Core concepts" -weight: 3 -layout: "docs" -type: "docs" -no_list: true -description: "Cross-cutting ideas the rest of the docs assume: the platform model, confidence scores, inference latency, and data sampling." ---- - -Some ideas show up across the whole product. A vision page, a data page, and -a fleet page all lean on them, but none of those pages is the right place to -define them. This section is that place. Read a page here once, and the rest -of the docs read more clearly. - -- [How Viam fits together](platform-model/): machines, parts, components, - services, and modules, the vocabulary every other section uses. -- [What a confidence score is (and isn't)](confidence-scores/): why a `0.9` - is not a 90% probability, and how to pick a threshold. -- [Inference latency and loop rate](inference-latency/): why a perception or - control loop cannot run faster than the model behind it. -- [Capture frequency versus sample rate](capture-frequency/): the difference - between how often you record and how fast a sensor measures. 
diff --git a/docs/concepts/capture-frequency.md b/docs/data/capture-frequency.md similarity index 99% rename from docs/concepts/capture-frequency.md rename to docs/data/capture-frequency.md index 313e3d67a4..c9d5913d74 100644 --- a/docs/concepts/capture-frequency.md +++ b/docs/data/capture-frequency.md @@ -1,7 +1,7 @@ --- linkTitle: "Capture frequency versus sample rate" title: "Capture frequency versus sample rate" -weight: 40 +weight: 5 layout: "docs" type: "docs" description: "Understand how the data-capture polling frequency differs from a sensor's internal sample rate, and how to choose a capture frequency that balances fidelity against storage cost." diff --git a/docs/manipulation/track-and-pick-moving-objects.md b/docs/manipulation/track-and-pick-moving-objects.md index 8e07eaad65..ce327beced 100644 --- a/docs/manipulation/track-and-pick-moving-objects.md +++ b/docs/manipulation/track-and-pick-moving-objects.md @@ -147,7 +147,7 @@ To raise the belt speed, shrink the latency budget: use a faster detector, reduce planning time by constraining the workspace, or shorten arm travel by starting each cycle from a pose near the belt. Measure each stage separately so you tune the one that dominates. For how inference time enters this budget and -how to measure it, see [Inference latency](/concepts/inference-latency/). +how to measure it, see [Inference latency](/ai-control/inference-latency/). If picks miss intermittently, compare your assumed `t_pick` against the measured end-to-end time under load. 
A budget that holds at rest often grows once the diff --git a/docs/concepts/confidence-scores.md b/docs/vision/confidence-scores.md similarity index 99% rename from docs/concepts/confidence-scores.md rename to docs/vision/confidence-scores.md index c715ab7d53..95f1d07f36 100644 --- a/docs/concepts/confidence-scores.md +++ b/docs/vision/confidence-scores.md @@ -1,7 +1,7 @@ --- linkTitle: "Confidence scores" title: "What a confidence score is (and isn't)" -weight: 20 +weight: 7 layout: "docs" type: "docs" description: "What the confidence value on a detection or classification measures, why it is not a calibrated probability, and how to reason about accept and reject thresholds for a quality-control task." diff --git a/docs/what-is-viam/_index.md b/docs/what-is-viam/_index.md index 4577b37f4f..64b696f4d6 100644 --- a/docs/what-is-viam/_index.md +++ b/docs/what-is-viam/_index.md @@ -58,6 +58,8 @@ Define a combination of components, services, and modules once, then apply that Use fragments to configure a camera-arm combination, a camera-to-object-detection pipeline, or an entire work cell. Fragments support variable substitution and per-machine overwrites, so you can deploy the same base configuration to hundreds of machines while accommodating site-specific settings. +For a closer look at how machines, parts, components, services, and modules relate, and the difference between using the APIs and authoring a module, see [How Viam fits together](platform-model/). + ## Viam capabilities - **[Get hardware running in minutes](/hardware/):** Add a camera, motor, arm, or sensor to your configuration with a few parameters. `viam-server` pulls the driver and exposes the device through a consistent API. No writing drivers, no managing dependencies. 
diff --git a/docs/concepts/platform-model.md b/docs/what-is-viam/platform-model.md similarity index 99% rename from docs/concepts/platform-model.md rename to docs/what-is-viam/platform-model.md index 9c32968541..4c4aa799a5 100644 --- a/docs/concepts/platform-model.md +++ b/docs/what-is-viam/platform-model.md @@ -1,7 +1,7 @@ --- linkTitle: "How Viam fits together" title: "How Viam fits together" -weight: 10 +weight: 20 layout: "docs" type: "docs" description: "A tour of Viam's core vocabulary, machine, part, component, service, module, and modular resource, for developers new to robotics." From 0e21cd4b2f9c22a8648a64ee93eeabf5797d96fc Mon Sep 17 00:00:00 2001 From: Brandon Shrewsbury Date: Wed, 1 Jul 2026 15:02:54 -0600 Subject: [PATCH 06/10] Fix vale error: avoid 'via' in README use-case table (Viam.AvoidObscure) --- CONCEPT-COVERAGE-README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/CONCEPT-COVERAGE-README.md b/CONCEPT-COVERAGE-README.md index a295681c53..8c5873199e 100644 --- a/CONCEPT-COVERAGE-README.md +++ b/CONCEPT-COVERAGE-README.md @@ -72,7 +72,7 @@ Each concept page lives in the section of the thing it is about, not a generic b | 26 | Industrial manipulation | Random bin picking | Pick randomly-oriented parts using 3D pose estimation | | 27 | Fleet management / ops | Zero-touch provisioning | Provision 1,000 new devices with no per-unit manual setup | | 28 | Fleet management / ops | Staged model rollout | Roll a new ML model to a fleet in canary then full stages | -| 29 | Fleet management / ops | Fleet config via fragments | Manage shared config across many machines with fragments | +| 29 | Fleet management / ops | Fleet config with fragments | Manage shared config across many machines with fragments | | 30 | Fleet management / ops | RBAC for customer fleet | Grant scoped access to customers over their own machines | | 31 | Fleet management / ops | Fleet health monitoring | Dashboard + alerts on fleet health and offline devices | | 
32 | Fleet management / ops | Remote teleop intervention | Remotely take control of a stuck robot to recover it |

From 28519ced1a474bd44221dc29129022b49b3d658e Mon Sep 17 00:00:00 2001
From: Brandon Shrewsbury
Date: Wed, 1 Jul 2026 15:09:50 -0600
Subject: [PATCH 07/10] Fix htmltest: repoint links to this repo's IA, make sibling links absolute

htmltest found 8 broken targets (IA-reorg paths and relative sibling links that don't resolve from leaf pages):

- /operate/* and /components/base/sensor-controlled/ -> real /reference and /hardware and /build-modules targets
- slam-and-mapping relative localization//navigate links -> absolute /navigation/... targets
---
 docs/ai-control/inference-latency.md | 4 ++--
 docs/ai-control/integrate-an-llm.md | 2 +-
 docs/manipulation/force-and-compliance-control.md | 2 +-
 docs/navigation/navigate-a-mobile-base.md | 12 ++++++------
 docs/navigation/slam-and-mapping.md | 10 +++++-----
 docs/what-is-viam/platform-model.md | 6 +++---
 6 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/docs/ai-control/inference-latency.md b/docs/ai-control/inference-latency.md
index f40397d867..0fb55e4b3e 100644
--- a/docs/ai-control/inference-latency.md
+++ b/docs/ai-control/inference-latency.md
@@ -99,7 +99,7 @@ The tolerable loop rate follows from what the loop controls.
 **Real-time control** acts on the physical world, where staleness compounds.
 A base moving at 1 m/s travels 20 cm during a 200 ms inference call, so at 5 Hz every decision is based on a frame already 20 cm out of date.
 For steering, obstacle avoidance, or closed-loop reaction, you generally want inference well under the physical time constant of the system, and you want that latency to be steady rather than bursty.
-Viam's feedback controllers expose their own cadence directly: a [sensor-controlled base](/components/base/sensor-controlled/) accepts a `control_frequency_hz` value (default 10 Hz), and its movement sensors must report at least that fast for the loop to hold rate.
+Viam's feedback controllers expose their own cadence directly: a [sensor-controlled base](/reference/components/base/sensor-controlled/) accepts a `control_frequency_hz` value (default 10 Hz), and its movement sensors must report at least that fast for the loop to hold rate.
 
 **Monitoring and logging** consume detections rather than steering on them, so a loop running at 1 Hz, or slower, is often plenty.
 Here you can favor a larger, more accurate model or a remote GPU and accept the higher per-frame latency, because no actuator is waiting on the result.
@@ -112,5 +112,5 @@ Because these levers trade accuracy and cost against speed, measuring inference
 ## Next steps
 
 - Learn how detection and classification calls work in the [vision service](/vision/).
-- See how a feedback loop consumes sensor input at a fixed rate on a [sensor-controlled base](/components/base/sensor-controlled/).
+- See how a feedback loop consumes sensor input at a fixed rate on a [sensor-controlled base](/reference/components/base/sensor-controlled/).
 - Explore the [components](/components/) that a perception or control loop reads from and acts on.
diff --git a/docs/ai-control/integrate-an-llm.md b/docs/ai-control/integrate-an-llm.md
index be821e128d..29e9f1e8c9 100644
--- a/docs/ai-control/integrate-an-llm.md
+++ b/docs/ai-control/integrate-an-llm.md
@@ -24,7 +24,7 @@ Steps 4 and 5 cover this validation and the timeouts and human-confirmation gate
 
 ## Prerequisites
 
-- A machine with the components and services your robot uses (for example a [base](/reference/apis/components/base/) and an [arm](/reference/apis/components/arm/)), already [configured](/operate/get-started/supported-hardware/).
+- A machine with the components and services your robot uses (for example a [base](/reference/apis/components/base/) and an [arm](/reference/apis/components/arm/)), already [configured](/hardware/).
 - Credentials for an LLM provider, or a local model you can query.
 - Familiarity with [writing a module](/build-modules/).
 
diff --git a/docs/manipulation/force-and-compliance-control.md b/docs/manipulation/force-and-compliance-control.md
index e2f7ddaa0b..f4e5b75d4b 100644
--- a/docs/manipulation/force-and-compliance-control.md
+++ b/docs/manipulation/force-and-compliance-control.md
@@ -81,7 +81,7 @@ Force control is an active area rather than a single turnkey primitive.
 The building blocks are a force-capable arm or an arm paired with a wrist F/T
 sensor, a fast feedback loop, and a control strategy tuned to the task. In
 practice, teams implement this pattern as a custom
-[module](/operate/get-started/other-hardware/) that reads the F/T sensor,
+[module](/build-modules/write-a-driver-module/) that reads the F/T sensor,
 runs the force loop, and commands the arm. Treat any specific force-control API
 as something you provide in your module rather than a built-in signature, and
 size the approach to the hardware you actually have: a sensitive insertion needs
diff --git a/docs/navigation/navigate-a-mobile-base.md b/docs/navigation/navigate-a-mobile-base.md
index b71a26bb45..d50ae037c4 100644
--- a/docs/navigation/navigate-a-mobile-base.md
+++ b/docs/navigation/navigate-a-mobile-base.md
@@ -102,12 +102,12 @@ To confirm the final pose, read the localization source directly:
 
 If a navigation attempt fails, match the symptom to the input it depends on:
 
-| Symptom | Missing input | Fix |
-| ---------------------------------------------------------------------------- | -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
-| Error that the base has no pose, or the plan starts from the wrong location. | Localization source. | Confirm the movement sensor has a GPS fix or the SLAM service reports a pose (Step 1). |
-| `MoveOnMap` reports no map, or the base leaves the mapped area. | SLAM map. | Confirm the SLAM service is running and the destination falls within its map (Step 1). |
-| Base stops early or refuses to plan a path near an object. | Obstacle source. | Pass known obstacles to the `obstacles` argument, or add a [vision service](/operate/reference/services/vision/) obstacle detector (Step 3). |
-| Motion service cannot relate the base and its sensor. | Frame system. | Confirm the frame system places the sensor on the base (Step 2). |
+| Symptom | Missing input | Fix |
+| ---------------------------------------------------------------------------- | -------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
+| Error that the base has no pose, or the plan starts from the wrong location. | Localization source. | Confirm the movement sensor has a GPS fix or the SLAM service reports a pose (Step 1). |
+| `MoveOnMap` reports no map, or the base leaves the mapped area. | SLAM map. | Confirm the SLAM service is running and the destination falls within its map (Step 1). |
+| Base stops early or refuses to plan a path near an object. | Obstacle source. | Pass known obstacles to the `obstacles` argument, or add a [vision service](/reference/services/vision/) obstacle detector (Step 3). |
+| Motion service cannot relate the base and its sensor. | Frame system. | Confirm the frame system places the sensor on the base (Step 2). |
 
 ## Next steps
 
diff --git a/docs/navigation/slam-and-mapping.md b/docs/navigation/slam-and-mapping.md
index f3f6bfc88a..129d8e0386 100644
--- a/docs/navigation/slam-and-mapping.md
+++ b/docs/navigation/slam-and-mapping.md
@@ -30,7 +30,7 @@ wall on the map, the robot needs to know where it was standing when it saw the
 wall; to know where it is standing, it needs a map to compare its view against.
 SLAM breaks the loop by solving both together. Each new sensor reading nudges
 the map and the pose estimate at the same time, and the two converge as the robot
-explores. This is what separates SLAM from plain [localization](localization/):
+explores. This is what separates SLAM from plain [localization](/navigation/localization/):
 localization answers _where am I_ against a map that already exists, while SLAM
 produces the map and the answer together.
 
@@ -79,7 +79,7 @@ Because these are configuration choices on the SLAM service rather than separate
 components, you can switch modes by changing the service configuration. For the
 configuration shape and supported sensors of a specific implementation, see the
 SLAM modules in the [Viam Registry](https://app.viam.com/registry) and
-[How a robot knows its position](localization/).
+[How a robot knows its position](/navigation/localization/).
 
 ## How the map feeds navigation
 
@@ -87,13 +87,13 @@ The map and pose are inputs, not the goal. Once SLAM reports where the robot is
 a map, navigation can plan a route across that map to a destination and drive
 the base there, steering around the obstacles the map records. That handoff,
 from "where am I on the map" to "drive me to that spot", is covered in
-[Navigate a mobile base to a goal](navigate-a-mobile-base/).
+[Navigate a mobile base to a goal](/navigation/navigate-a-mobile-base/).
 
 ## Next steps
 
-- [How a robot knows its position](localization/): how odometry, GPS, and SLAM
+- [How a robot knows its position](/navigation/localization/): how odometry, GPS, and SLAM
   compare as sources of position.
-- [Navigate a mobile base to a goal](navigate-a-mobile-base/): turn a map and a
+- [Navigate a mobile base to a goal](/navigation/navigate-a-mobile-base/): turn a map and a
   pose into motion toward a destination.
 - [SLAM modules in the registry](https://app.viam.com/registry): configuration
   fields and supported sensors for a specific SLAM implementation.
diff --git a/docs/what-is-viam/platform-model.md b/docs/what-is-viam/platform-model.md
index 4c4aa799a5..9a8055a2db 100644
--- a/docs/what-is-viam/platform-model.md
+++ b/docs/what-is-viam/platform-model.md
@@ -69,7 +69,7 @@ As a developer, you meet Viam from one of two directions, and telling them apart
 When you write a **client script**, _you_ are the caller.
 Your program connects to a machine, gets a handle to a resource, and calls its API methods, read this sensor, move that arm, run detections on this camera.
 Control flows outward from your code into the machine, and this is how most applications, dashboards, and automations are built.
-See [Control a machine](/operate/control/) for this path.
+See [Control a machine](/hardware/) for this path.
 
 When you **author a module**, the relationship inverts: _the platform_ is the caller.
 You implement the methods of a component or service API, and `viam-server` invokes your code whenever a client asks that resource to do something.
@@ -92,6 +92,6 @@ With this vocabulary in place, the rest of the documentation, configuration, mod
 
 ## Next steps
 
-- [Configure and control a machine](/operate/), put components and services on a real part and drive them.
+- [Configure and control a machine](/hardware/), put components and services on a real part and drive them.
 - [Build a module](/build-modules/overview/), author your own modular resource against a standard API.
-- [Machine architecture reference](/operate/reference/architecture/), a closer look at how parts, resources, and `viam-server` connect.
+- [Machine architecture reference](/reference/machine-to-machine-comms/), a closer look at how parts, resources, and `viam-server` connect.
From 622da1a486bc3be48eda828fa51ebfd6ac913d83 Mon Sep 17 00:00:00 2001
From: Brandon Shrewsbury
Date: Wed, 1 Jul 2026 15:15:11 -0600
Subject: [PATCH 08/10] Fix htmltest 'directory, no index': /components/ -> /reference/components/

The /components/* alias paths render as directories without an index page. Repoint all bare /components/ links in the new pages to the real /reference/components/* pages (base, movement-sensor, and the index).
---
 docs/ai-control/inference-latency.md | 2 +-
 docs/navigation/localization.md | 8 ++++----
 docs/navigation/navigate-a-mobile-base.md | 8 ++++----
 docs/navigation/sensor-fusion.md | 4 ++--
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/docs/ai-control/inference-latency.md b/docs/ai-control/inference-latency.md
index 0fb55e4b3e..aaee98ab3f 100644
--- a/docs/ai-control/inference-latency.md
+++ b/docs/ai-control/inference-latency.md
@@ -113,4 +113,4 @@ Because these levers trade accuracy and cost against speed, measuring inference
 
 - Learn how detection and classification calls work in the [vision service](/vision/).
 - See how a feedback loop consumes sensor input at a fixed rate on a [sensor-controlled base](/reference/components/base/sensor-controlled/).
-- Explore the [components](/components/) that a perception or control loop reads from and acts on.
+- Explore the [components](/reference/components/) that a perception or control loop reads from and acts on.
diff --git a/docs/navigation/localization.md b/docs/navigation/localization.md
index 64ac0aeb5a..6d5ab858a2 100644
--- a/docs/navigation/localization.md
+++ b/docs/navigation/localization.md
@@ -28,7 +28,7 @@ Every wheel slip, uneven tile, or rounding error adds a small mistake to the est
 After a long run the estimated pose can be meters away from the true pose, even though each individual reading looked reasonable.
 Drift is why a robot that relies only on odometry gradually loses track of the dock.
 
-Viam exposes wheel odometry through the [movement sensor](/components/movement-sensor/) component, using the `wheeled-odometry` model, which derives velocity and position from motor encoders.
+Viam exposes wheel odometry through the [movement sensor](/reference/components/movement-sensor/) component, using the `wheeled-odometry` model, which derives velocity and position from motor encoders.
 
 ## GPS: an absolute outdoor fix
 
@@ -39,7 +39,7 @@ Because each reading is independent, GPS does not drift: an error in one reading
 The trade-offs are environment and precision.
 GPS needs a clear view of the sky, so it works outdoors but degrades or fails indoors, in tunnels, and under dense cover.
 Standard GPS is accurate to a few meters, which is fine for a lawn robot crossing a yard but too coarse to dock precisely.
-In Viam, GPS receivers and inertial measurement units (IMUs) are also [movement sensor](/components/movement-sensor/) models, so the same component API surfaces both absolute position and orientation.
+In Viam, GPS receivers and inertial measurement units (IMUs) are also [movement sensor](/reference/components/movement-sensor/) models, so the same component API surfaces both absolute position and orientation.
 
 ## SLAM: building a map while you use it
 
@@ -80,10 +80,10 @@ Which sources a machine needs follows from where it runs and how precise it must
 - **High precision anywhere** (docking, tight aisles): pair an absolute source with odometry, because odometry alone will not stay accurate long enough to line up.
 
 Start from the environment to rule sources in or out (GPS outdoors, SLAM indoors), then decide whether the required precision and run length demand an absolute source at all.
-That decision tells you which [movement sensors](/components/movement-sensor/) or range sensors to put on the machine.
+That decision tells you which [movement sensors](/reference/components/movement-sensor/) or range sensors to put on the machine.
 
 ## Next steps
 
-- [Movement sensor component](/components/movement-sensor/): configure GPS, IMU, and wheeled-odometry models.
+- [Movement sensor component](/reference/components/movement-sensor/): configure GPS, IMU, and wheeled-odometry models.
 - [SLAM and mapping](/navigation/slam-and-mapping/): build and use maps for indoor localization.
 - [Sensor fusion](/navigation/sensor-fusion/): combine relative and absolute sources into one pose estimate.
diff --git a/docs/navigation/navigate-a-mobile-base.md b/docs/navigation/navigate-a-mobile-base.md
index d50ae037c4..69bb125813 100644
--- a/docs/navigation/navigate-a-mobile-base.md
+++ b/docs/navigation/navigate-a-mobile-base.md
@@ -16,9 +16,9 @@ This page assembles the localization and motion inputs a base needs, issues the
 
 Before you start, configure the following on your machine:
 
-- A [mobile base](/components/base/) that you can already drive with velocity or position commands.
+- A [mobile base](/reference/components/base/) that you can already drive with velocity or position commands.
 - A localization source that reports the base's pose:
-  - A [movement sensor](/components/movement-sensor/) that provides GPS position, for a geographic goal.
+  - A [movement sensor](/reference/components/movement-sensor/) that provides GPS position, for a geographic goal.
   - A [SLAM service](/navigation/slam-and-mapping/) that provides a map and pose, for a map goal.
 - The [motion service](/reference/apis/services/motion/), which plans the path and issues drive commands to the base.
 
@@ -95,7 +95,7 @@ For full parameters, obstacle geometry types, and other SDKs, see the [motion se
 Track progress and completion with `GetPlan` and `ListPlanStatuses`, and stop an in-progress move with `StopPlan`.
 To confirm the final pose, read the localization source directly:
 
-- For a geographic goal, read the [movement sensor](/components/movement-sensor/) position.
+- For a geographic goal, read the [movement sensor](/reference/components/movement-sensor/) position.
 - For a map goal, read the base pose from the [SLAM service](/navigation/slam-and-mapping/).
 
 ## Troubleshooting
@@ -113,4 +113,4 @@ If a navigation attempt fails, match the symptom to the input it depends on:
 
 - [Localization](/navigation/localization/): compare GPS and SLAM localization sources.
 - [Motion service API](/reference/apis/services/motion/): full `MoveOnGlobe` and `MoveOnMap` parameters.
-- [Base component](/components/base/): tune the base that carries out the plan.
+- [Base component](/reference/components/base/): tune the base that carries out the plan.
diff --git a/docs/navigation/sensor-fusion.md b/docs/navigation/sensor-fusion.md
index e1b528571b..7fbbcbc444 100644
--- a/docs/navigation/sensor-fusion.md
+++ b/docs/navigation/sensor-fusion.md
@@ -28,7 +28,7 @@ A fused pose reflects the IMU's fast, fine-grained motion and the GPS's absolute
 
 ## How Viam's `merged` movement sensor differs
 
-Viam ships a [`merged` movement sensor](/components/movement-sensor/) model, and the name invites a natural assumption.
+Viam ships a [`merged` movement sensor](/reference/components/movement-sensor/) model, and the name invites a natural assumption.
 It is worth being precise, so you set the right expectations.
 
 The `merged` model performs **selection and aggregation**, not statistical fusion.
@@ -77,4 +77,4 @@ Either way, the fusion lives in software you choose or hardware you select, and
 ## Next steps
 
 - Learn how a machine turns sensor readings into a position estimate in [Localization](/navigation/localization/).
-- See the available movement sensor models, including [`merged`](/components/movement-sensor/), to decide which sensors to combine.
+- See the available movement sensor models, including [`merged`](/reference/components/movement-sensor/), to decide which sensors to combine.
From 2e2251c06d5c47698d020c98e2bc57af36ce3887 Mon Sep 17 00:00:00 2001
From: Brandon Shrewsbury
Date: Wed, 1 Jul 2026 15:22:59 -0600
Subject: [PATCH 09/10] Add aliases for moved /concepts/* URLs (fixes Netlify no-more-404)

The reorg moved the concepts/ pages; the Netlify no-more-404 plugin flagged the old URLs as 404s. Preserve them as Hugo aliases (301) to their new homes.
---
 docs/ai-control/inference-latency.md | 2 ++
 docs/data/capture-frequency.md | 2 ++
 docs/vision/confidence-scores.md | 2 ++
 docs/what-is-viam/platform-model.md | 3 +++
 4 files changed, 9 insertions(+)

diff --git a/docs/ai-control/inference-latency.md b/docs/ai-control/inference-latency.md
index aaee98ab3f..73e8d3d4ae 100644
--- a/docs/ai-control/inference-latency.md
+++ b/docs/ai-control/inference-latency.md
@@ -5,6 +5,8 @@ weight: 5
 layout: "docs"
 type: "docs"
 description: "Understand why model inference latency sets the ceiling on how fast a perception or control loop can run, and estimate an achievable loop rate from model size, image resolution, and hardware."
+aliases:
+  - "/concepts/inference-latency/"
 ---
 
 Consider a small program that watches a camera and reacts to what it sees:
diff --git a/docs/data/capture-frequency.md b/docs/data/capture-frequency.md
index c9d5913d74..9c03d95ecd 100644
--- a/docs/data/capture-frequency.md
+++ b/docs/data/capture-frequency.md
@@ -5,6 +5,8 @@ weight: 5
 layout: "docs"
 type: "docs"
 description: "Understand how the data-capture polling frequency differs from a sensor's internal sample rate, and how to choose a capture frequency that balances fidelity against storage cost."
+aliases:
+  - "/concepts/capture-frequency/"
 ---
 
 Suppose you configure data capture on a temperature sensor with `capture_frequency_hz` set to `1`, and the sensor hardware samples its internal thermistor at 100 Hz.
diff --git a/docs/vision/confidence-scores.md b/docs/vision/confidence-scores.md
index 95f1d07f36..305812204b 100644
--- a/docs/vision/confidence-scores.md
+++ b/docs/vision/confidence-scores.md
@@ -5,6 +5,8 @@ weight: 7
 layout: "docs"
 type: "docs"
 description: "What the confidence value on a detection or classification measures, why it is not a calibrated probability, and how to reason about accept and reject thresholds for a quality-control task."
+aliases:
+  - "/concepts/confidence-scores/"
 ---
 
 Point a person detector at an image and it returns something like `person: 0.82`. That `0.82` is a confidence score: a number between `0.0` and `1.0` that rides along with every result from [`GetDetections`](/reference/apis/services/vision/#getdetections) and [`GetClassifications`](/reference/apis/services/vision/#getclassifications). It is one of the most useful signals the vision service gives you, and one of the easiest to misread. This page explains what the number measures, how far you can trust it, and how to turn it into an accept or reject decision.
diff --git a/docs/what-is-viam/platform-model.md b/docs/what-is-viam/platform-model.md
index 9a8055a2db..2959fd30ad 100644
--- a/docs/what-is-viam/platform-model.md
+++ b/docs/what-is-viam/platform-model.md
@@ -5,6 +5,9 @@ weight: 20
 layout: "docs"
 type: "docs"
 description: "A tour of Viam's core vocabulary, machine, part, component, service, module, and modular resource, for developers new to robotics."
+aliases:
+  - "/concepts/platform-model/"
+  - "/concepts/"
 ---
 
 Picture a small delivery robot: two motors, a camera up front, a GPS unit, and a single-board computer that ties them together.

From 8ba0459d97d269771bfd3e022e403455d9374fba Mon Sep 17 00:00:00 2001
From: Brandon Shrewsbury
Date: Wed, 1 Jul 2026 16:36:41 -0600
Subject: [PATCH 10/10] Move concept-coverage analysis out of the docs repo

The use-case coverage analysis is methodology/run-output, not docs-site content.
It now lives in viam-code-map (playbook-11-example-run.md). This PR keeps only the documentation pages.
---
 CONCEPT-COVERAGE-README.md | 96 --------------------------------------
 1 file changed, 96 deletions(-)
 delete mode 100644 CONCEPT-COVERAGE-README.md

diff --git a/CONCEPT-COVERAGE-README.md b/CONCEPT-COVERAGE-README.md
deleted file mode 100644
index 8c5873199e..0000000000
--- a/CONCEPT-COVERAGE-README.md
+++ /dev/null
@@ -1,96 +0,0 @@
-# Concept coverage analysis: 50 IoT/robotics use-cases
-
-This branch closes concept gaps found by running a **use-case → concept coverage** analysis (Playbook 11) over 50 real IoT and robotics product use-cases. It adds three new sections (AI & control, navigation, manipulation) plus individual concept pages placed in their topical homes (what-is-viam, vision, data). This README records the use-cases, the method, and the measured gap closure.
-
-**Spreadsheet (full matrix, per-task concepts, gaps, learning objectives, before/after):** [`playbook11-use-case-coverage.xlsx`](https://drive.google.com/drive/folders/1ErpNHevOAVc_4YwqYM_dpG960QeAaLmA) (in the shared Drive folder).
-
-## Method
-
-1. Decompose each use-case into the platform + robotics concepts a user must understand to succeed.
-2. Ownership sweep: for each concept, does one findable page **own** (define) it, or is it scattered/missing/buried? (grep-grounded against the docs + RDK source).
-3. Turn gaps into target pages with a Diátaxis type, write learning objectives, then draft the pages.
-4. Re-run the sweep to measure gap closure.
-
-## Gap closure (63 concepts)
-
-| Coverage | Before | After |
-| ------------------------------------- | -----: | ----: |
-| OWNED | 29 | 46 |
-| PARTIAL | 10 | 10 |
-| BURIED | 7 | 7 |
-| SCATTERED | 3 | 0 |
-| MISSING | 14 | 0 |
-| **No-owner gaps (MISSING+SCATTERED)** | **17** | **0** |
-
-The 7 BURIED concepts (kinematics, frames, motion-planning) stay deferred: they are owned inside the motion-planning section, which is under active edit.
-
-## What this PR adds (19 pages)
-
-Each concept page lives in the section of the thing it is about, not a generic bucket.
-
-**New sections:**
-
-- **`docs/ai-control/`** — inference latency, learned & policy-based control, run a VLA, integrate an LLM, simulation & sim-to-real
-- **`docs/navigation/`** — localization, SLAM & mapping, navigate a mobile base, sensor fusion, coordinate a fleet
-- **`docs/manipulation/`** — force & compliance control, track & pick moving objects
-
-**Pages added into existing sections:**
-
-- **`docs/what-is-viam/platform-model.md`** — how machines, parts, components, services, and modules fit together
-- **`docs/vision/confidence-scores.md`** — what a confidence score is (and isn't); an ML/vision concept
-- **`docs/data/capture-frequency.md`** — data-capture frequency vs sensor sample rate
-
-## The 50 use-cases
-
-| # | Category | Use case | Job |
-| --- | ------------------------------ | -------------------------- | ------------------------------------------------------------------------------ |
-| 1 | VLA / foundation-model control | VLA bin picking | Prompt a manipulator in natural language to pick a named item from a mixed bin |
-| 2 | VLA / foundation-model control | NL task commanding | Command an arm with 'put the red block in the box' and have it execute |
-| 3 | VLA / foundation-model control | Open-vocab perception | Detect arbitrary, un-trained objects from a text prompt on a robot camera |
-| 4 | VLA / foundation-model control | LLM task planner | Use an LLM to decompose a goal into robot skills and dispatch them |
-| 5 | VLA / foundation-model control | VLA mobile manipulation | A mobile manipulator tidies a room from a spoken instruction |
-| 6 | VLA / foundation-model control | Voice-to-action control | Drive a robot by voice command with speech + a VLA policy |
-| 7 | Policy-based / learned control | RL locomotion | Deploy an RL-trained gait policy on a legged/wheeled base |
-| 8 | Policy-based / learned control | Imitation assembly | Teach an assembly skill from demonstrations and replay it |
-| 9 | Policy-based / learned control | Visuomotor grasp policy | Run a learned pixel-to-action grasping policy in a loop |
-| 10 | Policy-based / learned control | Sim-to-real transfer | Train a control policy in sim and deploy it on hardware |
-| 11 | Policy-based / learned control | MPC mobile base | Run model-predictive control for smooth base trajectory tracking |
-| 12 | Policy-based / learned control | Force-control insertion | Adaptive force policy for peg-in-hole insertion |
-| 13 | Mobile robots / AMR | Warehouse AMR | Goods-to-person transport AMR navigating a mapped warehouse |
-| 14 | Mobile robots / AMR | Outdoor delivery | GPS-navigated last-yard delivery robot on sidewalks |
-| 15 | Mobile robots / AMR | SLAM cleaning robot | Indoor robot builds a map and cleans coverage-complete |
-| 16 | Mobile robots / AMR | Multi-robot fleet coord | Coordinate a fleet of AMRs to avoid deadlock and share tasks |
-| 17 | Mobile robots / AMR | Inventory scanning rover | Autonomous rover scans shelves and reports stock |
-| 18 | Mobile robots / AMR | Person-following cart | A cart follows a worker through a facility |
-| 19 | Mobile robots / AMR | Field-scouting rover | Ag rover autonomously scouts rows and geo-tags findings |
-| 20 | Industrial manipulation | Machine tending | Arm picks molded parts, vision QC, sorts good/reject into bins |
-| 21 | Industrial manipulation | Palletizing | Arm stacks boxes onto a pallet in a computed pattern |
-| 22 | Industrial manipulation | Vision-guided kitting | Assemble a kit by picking parts located by vision |
-| 23 | Industrial manipulation | Conveyor tracking pick | Pick moving parts off a running conveyor |
-| 24 | Industrial manipulation | Dispensing path follow | Follow a Cartesian path for glue/weld dispensing |
-| 25 | Industrial manipulation | CNC loader tending | Load/unload a CNC with force-sensed insertion |
-| 26 | Industrial manipulation | Random bin picking | Pick randomly-oriented parts using 3D pose estimation |
-| 27 | Fleet management / ops | Zero-touch provisioning | Provision 1,000 new devices with no per-unit manual setup |
-| 28 | Fleet management / ops | Staged model rollout | Roll a new ML model to a fleet in canary then full stages |
-| 29 | Fleet management / ops | Fleet config with fragments | Manage shared config across many machines with fragments |
-| 30 | Fleet management / ops | RBAC for customer fleet | Grant scoped access to customers over their own machines |
-| 31 | Fleet management / ops | Fleet health monitoring | Dashboard + alerts on fleet health and offline devices |
-| 32 | Fleet management / ops | Remote teleop intervention | Remotely take control of a stuck robot to recover it |
-| 33 | Fleet management / ops | Scheduled maintenance jobs | Run recurring jobs (calibration, logs) across a fleet |
-| 34 | Fleet management / ops | White-labeled billing | Bill end customers under a partner brand for fleet usage |
-| 35 | IoT sensing / monitoring | Cold-chain monitoring | Monitor temperature across assets and alert on excursions |
-| 36 | IoT sensing / monitoring | Predictive maintenance | Vibration sensing to predict equipment failure |
-| 37 | IoT sensing / monitoring | Air-quality network | Network of air-quality sensors reporting to the cloud |
-| 38 | IoT sensing / monitoring | Smart-building occupancy | Occupancy + energy sensing for building automation |
-| 39 | IoT sensing / monitoring | Utility meter aggregation | Edge-aggregate meter reads and sync upstream |
-| 40 | IoT sensing / monitoring | Equipment anomaly detect | Detect leaks/anomalies on industrial equipment at the edge |
-| 41 | Computer vision apps | Line defect detection | Detect product defects on a production line and reject |
-| 42 | Computer vision apps | PPE compliance | Monitor a site for PPE compliance and alert |
-| 43 | Computer vision apps | Access-control camera | License-plate / face access control at a gate |
-| 44 | Computer vision apps | Shelf-stock analytics | Analyze retail shelves for out-of-stock |
-| 45 | Computer vision apps | Queue/people analytics | Count people and measure queue length |
-| 46 | Data / ML pipeline | Capture+sync for training | Continuously capture and sync robot data to build datasets |
-| 47 | Data / ML pipeline | Auto-retraining loop | Detect model drift and retrain/redeploy automatically |
-| 48 | Data / ML pipeline | Custom training script | Train a specialized model with a custom script on Viam data |
-| 49 | Data / ML pipeline | Edge inference + upload | Run inference at the edge and conditionally upload hard cases |
-| 50 | Data / ML pipeline | Sensor-data BI dashboard | Query and visualize fleet sensor data for business insight |