1 change: 1 addition & 0 deletions agents-core/pyproject.toml
@@ -22,6 +22,7 @@ classifiers = [
requires-python = ">=3.10"
dependencies = [
"getstream[webrtc,telemetry]>=3.1.2,<4",
"aiortc>=1.14.0,<1.15.0",
"python-dotenv>=1.1.1",
"pillow>=10.4.0",
"numpy>=1.24.0", # capped at <2.0 via workspace override in root pyproject.toml
37 changes: 30 additions & 7 deletions plugins/decart/README.md
@@ -22,20 +22,18 @@ This example shows how to use the `RestylingProcessor` to transform a user's vid

```python
from vision_agents.core import User, Agent
from vision_agents.plugins import getstream, openai, decart
from vision_agents.plugins import getstream, gemini, decart

# Initialize the restyling processor
processor = decart.RestylingProcessor(
initial_prompt="A cute animated movie with vibrant colours",
model="mirage_v2"
initial_prompt="Studio Ghibli animation style",
model="lucy_2_rt",
)

agent = Agent(
edge=getstream.Edge(),
agent_user=User(name="Styled AI"),
instructions="You are a helpful assistant.",
llm=openai.LLM("gpt-4o-mini"),
# Add the processor to the agent's pipeline
instructions="Be helpful",
llm=gemini.Realtime(),
processors=[processor],
)
```
@@ -53,6 +51,31 @@ async def change_style(prompt: str) -> str:
return f"Style changed to: {prompt}"
```

### Reference Images ("costumes")

For models like Lucy that accept a reference image, pass it at construction
time and/or swap it atomically with a prompt via `update_state`:

```python
processor = decart.RestylingProcessor(
model="lucy_2_rt",
initial_prompt="A person wearing a superhero costume",
initial_image="./costumes/superhero.png", # your own reference image
)

# Later — atomically change prompt + reference image
await processor.update_state(
prompt="A person wearing a wizard robe",
image="./costumes/wizard.png",
)

# Image-only update
await processor.update_state(image=b"<raw image bytes>")
```

`initial_image` and `update_state(image=...)` accept `bytes`, a local file
path, an `http(s)` URL, a `data:` URI, or a raw base64 string.
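
The plugin does its own parsing of these inputs, but as a rough sketch of how the five accepted kinds can be told apart before handing a value to `update_state` — a hypothetical helper for illustration, not part of the plugin's API:

```python
import base64
import os


def classify_image_input(value) -> str:
    """Best-effort guess at which accepted input kind a value is.

    Illustrative only — the Decart plugin performs its own validation.
    Order matters: URL and data-URI prefixes are checked before the
    base64 fallback, since a base64 decode would also accept some paths.
    """
    if isinstance(value, bytes):
        return "bytes"
    if value.startswith("data:"):
        return "data URI"
    if value.startswith(("http://", "https://")):
        return "http(s) URL"
    if os.path.exists(value):
        return "file path"
    try:
        base64.b64decode(value, validate=True)
        return "base64 string"
    except Exception:
        return "unknown"
```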

## Configuration

The plugin requires a Decart API key. You can provide it in two ways:
108 changes: 70 additions & 38 deletions plugins/decart/example/README.md
@@ -1,16 +1,16 @@
# Decart Storyteller Example
# Decart Virtual Try-On Example

This example shows you how to build a real-time storytelling agent using [Vision Agents](https://visionagents.ai/)
and [Decart](https://decart.ai/). The agent tells a story while transforming your video feed into an animated style that
matches the narrative.
This example shows you how to build a real-time virtual try-on ("costume") agent using
[Vision Agents](https://visionagents.ai/) and [Decart](https://decart.ai/). The agent listens for voice requests like
"put me in a superhero costume" and uses the Lucy real-time model, guided by a reference image, to restyle the user's
video so they appear to be wearing the requested costume.

In this example, the AI storyteller will:
In this example, the AI wardrobe assistant will:

- Listen to your voice input
- Generate a story based on your interactions
- Use [Decart](https://decart.ai/) to restyle your video feed in real-time (e.g., "A cute animated movie with vibrant
colours")
- Change the video style dynamically as the story progresses
- Use [Decart](https://decart.ai/) Lucy to restyle your video feed in real-time with both a prompt and a reference image
- Atomically swap costumes via `processor.update_state(prompt=..., image=...)`
- Fall back to prompt-only outfit changes for freeform requests
- Speak with an expressive voice using [ElevenLabs](https://elevenlabs.io/)
- Run on Stream's low-latency edge network

@@ -54,25 +54,36 @@ The agent will:
1. Create a video call
2. Open a demo UI in your browser
3. Join the call
4. Start telling a story and restyling your video
4. Listen for costume requests and restyle your video with Lucy

## Code Walkthrough

### Setting Up the Agent

The code creates an agent with the Decart processor and other components:
The code creates an agent with the Decart processor (Lucy real-time) and a pre-defined set of costumes:

```python
COSTUMES: dict[str, dict[str, Optional[str]]] = {
"jacket": {
"prompt": "A person wearing a jacket",
"image": "https://images.unsplash.com/photo-1591047139829-d91aecb6caea",
},
"superhero": {
"prompt": "A person wearing a superhero costume",
"image": "https://images.unsplash.com/photo-1766062854584-77e3d2467e54",
},
}

processor = decart.RestylingProcessor(
initial_prompt="A cute animated movie with vibrant colours",
model="mirage_v2"
model="lucy_2_rt",
)
llm = openai.LLM(model="gpt-5")

agent = Agent(
edge=getstream.Edge(),
agent_user=User(name="Story teller", id="agent"),
instructions="You are a story teller...",
llm=openai.LLM(model="gpt-4o-mini"),
agent_user=User(name="Virtual Wardrobe", id="agent"),
instructions="You are a playful virtual wardrobe assistant...",
llm=llm,
tts=elevenlabs.TTS(voice_id="N2lVS1w4EtoT3dr4eOWO"),
stt=deepgram.STT(),
processors=[processor],
@@ -81,46 +92,67 @@ agent = Agent(

**Components:**

- `processor`: The Decart RestylingProcessor that transforms the video feed.
- `llm`: The language model (GPT-4o-mini) that generates the story and controls the processor.
- `tts`: ElevenLabs TTS for expressive voice output.
- `stt`: Deepgram STT for transcribing user speech.
- `processors`: The list of video processors (just Decart in this case).
- `processor`: The Decart `RestylingProcessor` running the `lucy_2_rt` real-time model, which accepts a reference image.
- `llm`: GPT-5 — picks the right costume and narrates the change.
- `tts` / `stt`: ElevenLabs + Deepgram for a voice-driven loop.

### Dynamic Style Changing
### Swapping Costumes Atomically

The agent can change the video style dynamically using a registered function:
`update_state` mirrors the JS SDK's `realtimeClient.set({ prompt, enhance, image })` — prompt and reference image are
applied in a single atomic update so the output video never shows a half-updated state:

```python
@llm.register_function(
description="This function changes the prompt of the Decart processor which in turn changes the style of the video and user's background"
description="Put the user in one of the pre-defined costumes."
)
async def change_prompt(prompt: str) -> str:
await processor.update_prompt(prompt)
return f"Prompt changed to {prompt}"
async def change_costume(name: str) -> str:
costume = COSTUMES.get(name.lower())
if costume is None:
return f"Unknown costume '{name}'. Available: {', '.join(COSTUMES)}."
await processor.update_state(prompt=costume["prompt"], image=costume["image"])
return f"Costume changed to {name}."
```

This allows the LLM to call `change_prompt("A dark and stormy night")` to instantly change the visual style of the video
to match the story's mood.
For freeform requests (anything not in `COSTUMES`), the agent calls `change_outfit` which uses
`update_state(prompt=..., image=...)` if the user supplies a URL, or `update_prompt(...)` for prompt-only changes:

```python
@llm.register_function(
description=(
"Change the user's outfit to a freeform description. Use this when "
"the user asks for a costume not in the pre-defined list. If you "
"have a reference image URL (http/https) pass it as image_url, "
"otherwise pass an empty string."
)
)
async def change_outfit(description: str, image_url: str) -> str:
if image_url:
await processor.update_state(prompt=description, image=image_url)
else:
await processor.update_prompt(description)
return f"Outfit changed: {description}"
```

## Customization

### Change the Initial Style
### Add or Change Costumes

Edit the `COSTUMES` dict. Each entry needs a `prompt` and an optional `image` — bytes, a file path, an http(s) URL, a
data URI, or a raw base64 string are all accepted.
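
For instance, extending the dict with a prompt-only entry might look like the sketch below (the `pirate` entry and its prompt are illustrative additions, not part of the example; per the comment in `decart_example.py`, `image=None` means the prompt alone drives the restyling):

```python
from typing import Optional

COSTUMES: dict[str, dict[str, Optional[str]]] = {
    "superhero": {
        "prompt": "A person wearing a superhero costume",
        "image": "https://images.unsplash.com/photo-1766062854584-77e3d2467e54",
    },
    # Prompt-only entry: no reference image, so the prompt alone
    # drives the restyling.
    "pirate": {
        "prompt": "A person wearing a pirate costume with a tricorn hat",
        "image": None,
    },
}
```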

### Start With a Costume Already On

Modify the `initial_prompt` in the `RestylingProcessor` to start with a different look:
Pass `initial_image` to the processor so the very first frame is already restyled. Point it at your own hosted image (or
a local file path / bytes / data URI):

```python
processor = decart.RestylingProcessor(
initial_prompt="A cyberpunk city with neon lights",
model="mirage_v2"
model="lucy_2_rt",
initial_prompt="A person wearing a superhero costume",
initial_image="./costumes/superhero.png", # or bytes, an http(s) URL, a data URI, or raw base64
)
```

### Modify the Storytelling Persona

Edit the `instructions` passed to the `Agent` to change the storyteller's personality, tone, or the type of stories they
tell.

### Change the Voice

Update the `voice_id` in `elevenlabs.TTS` to use a different ElevenLabs voice.
75 changes: 61 additions & 14 deletions plugins/decart/example/decart_example.py
@@ -1,4 +1,5 @@
import logging
from typing import Optional

from dotenv import load_dotenv
from vision_agents.core import Agent, Runner, User
@@ -9,46 +10,92 @@

load_dotenv()

# Pre-defined outfits for the virtual try-on / "costume" demo. The user asks the
# agent to put them in one and the LLM calls change_costume(...) which
# atomically updates the prompt + reference image on the Decart Lucy model.
#
# To enable the reference-image ("virtual try-on") feature, set `image` to your
# own hosted reference image. Any of the following are accepted:
# - bytes
# - a local file path (e.g. "./costumes/superhero.png")
# - an http(s) URL
# - a data: URI
# - a raw base64 string
# When `image` is None the prompt alone drives the restyling.
COSTUMES: dict[str, dict[str, Optional[str]]] = {
"jacket": {
"prompt": "A person wearing a jacket",
"image": "https://images.unsplash.com/photo-1591047139829-d91aecb6caea",
},
"superhero": {
"prompt": "A person wearing a superhero costume",
"image": "https://images.unsplash.com/photo-1766062854584-77e3d2467e54",
},
}


async def create_agent(**kwargs) -> Agent:
processor = decart.RestylingProcessor(
initial_prompt="A cute animated movie with vibrant colours", model="mirage_v2"
model="lucy_2_rt",
)
llm = openai.LLM(model="gpt-4o-mini")
llm = openai.LLM(model="gpt-5")

agent = Agent(
edge=getstream.Edge(),
agent_user=User(name="Story teller", id="agent"),
instructions="You are a story teller. You will tell a short story to the user. You will use the Decart processor to change the style of the video and user's background. You can embed audio tags in your responses for added effect Emotional tone: [EXCITED], [NERVOUS], [FRUSTRATED], [TIRED] Reactions: [GASP], [SIGH], [LAUGHS], [GULPS] Volume & energy: [WHISPERING], [SHOUTING], [QUIETLY], [LOUDLY] Pacing & rhythm: [PAUSES], [STAMMERS], [RUSHED]",
agent_user=User(name="Virtual Wardrobe", id="agent"),
instructions=(
"You are a playful virtual wardrobe assistant. The user is on a "
"live video call and you can change what they appear to be wearing "
"in real time by calling change_costume(name) with one of the "
f"available costumes: {', '.join(COSTUMES)}. If the user asks for a "
"costume that isn't in that list, call change_outfit(description, "
"image_url) instead. Describe each transformation out loud in a "
"single short sentence. You can embed audio tags for effect, e.g. "
"[sigh], [excited], [pause], [rushed], or [tired]"
),
llm=llm,
tts=elevenlabs.TTS(voice_id="N2lVS1w4EtoT3dr4eOWO"),
stt=deepgram.STT(),
processors=[processor],
)

@llm.register_function(
description="This function changes the prompt of the Decart processor which in turn changes the style of the video and user's background"
description=("Put the user in one of the pre-defined costumes.")
)
async def change_prompt(prompt: str) -> str:
await processor.update_prompt(prompt)
return f"Prompt changed to {prompt}"
async def change_costume(name: str) -> str:
costume = COSTUMES.get(name.lower())
if costume is None:
return f"Unknown costume '{name}'. Available: {', '.join(COSTUMES)}."
await processor.update_state(prompt=costume["prompt"], image=costume["image"])
return f"Costume changed to {name}."

@llm.register_function(
description=(
"Change the user's outfit to a freeform description. Use this when "
"the user asks for a costume not in the pre-defined list. If you "
"have a reference image URL (http/https) pass it as image_url, "
"otherwise pass an empty string."
)
)
async def change_outfit(description: str, image_url: str) -> str:
if image_url:
await processor.update_state(prompt=description, image=image_url)
else:
await processor.update_prompt(description)
return f"Outfit changed: {description}"

return agent


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
"""Join the call and start the agent."""
# Create a call
call = await agent.create_call(call_type, call_id)

logger.info("🤖 Starting Agent...")

# Have the agent join the call/room
async with agent.join(call):
logger.info("Joining call")
logger.info("LLM ready")
await agent.simple_response(text="Hello! Tell me what you can do.")

await agent.finish() # Run till the call ends
await agent.finish()


if __name__ == "__main__":
2 changes: 1 addition & 1 deletion plugins/decart/pyproject.toml
@@ -12,7 +12,7 @@ requires-python = ">=3.10"
license = "MIT"
dependencies = [
"vision-agents",
"decart==0.0.8",
"decart==0.0.29",
]

[project.urls]