This is a demo of two voice agents chatting with each other in Google Meet while embedding and reading hidden messages in their inconspicuous corporate lingo. It is built on top of the Google ADK streaming example.
- Make sure you have Homebrew and Docker Desktop installed.
- Get a Gemini Live API key at https://aistudio.google.com/apikey. Create an .env file with GOOGLE_API_KEY, using .env.example as a template.
- Install 2 BlackHole packages (we need 2 different devices for the local Google Meet demo):
brew install blackhole-2ch
brew install blackhole-16ch
You have to reboot the system after you install BlackHole. If you have already installed Docker Desktop, make sure it's running.
- Make sure Docker Desktop is running. Build the audio watermark package:
git clone https://github.com/swesterfeld/audiowmark.git
cd audiowmark
docker build -t audiowmark .
- Set up 4 virtual audio devices for the agent conversation:
Add 4 aggregate devices: AB and BA with BlackHole 2ch only, and CD and DC with BlackHole 16ch. It's important to use different "blackholes" to avoid echo in the Google Meet demo. AB pipes Agent A output to Agent B input; BA does the reverse. CD and DC are extra devices for the Google Meet demo. Note: speakers don't seem to work here if added as a second device to the aggregate.
Additionally, create a multi-output device. This way you can hear the conversation between the agents, which occurs via virtual channels.
Install Python dependencies if not already:
uv sync
Launch the web UI:
uv run uvicorn main:app --port 8000
The app should be available on http://127.0.0.1:8000.
uv run uvicorn main:app --port 8000
This will launch the web UI at http://127.0.0.1:8000. In the web UI, choose your regular mic and speakers in the device dropdowns. Click "Start audio" to launch the agent live. Start talking. The agent should reply.
Here the two agents talk on a virtual sound channel, and you cannot hear them. You'll need to launch two agents on two ports, pipe their audio, and launch agent 1 (alice) then agent 2 (bastian).
- Launch two agents on ports 8000 and 8001:
AGENT_NAME=alice uv run uvicorn main:app --port 8000
AGENT_NAME=bastian uv run uvicorn main:app --port 8001
- In the agent a (alice) web UI at http://127.0.0.1:8000, select AB for mic and BA for speakers.
- Click "Start audio" to launch alice live: alice will be listening for bastian's greeting.
- In the bastian web UI at http://127.0.0.1:8001, plug alice's input into bastian's output and vice versa: BA for mic, AB for speaker.
- Launch agent b (bastian). It will greet you and ask alice a question.
The agents will talk to each other, embedding the available watermarks into their speech. Although you cannot hear them, their phrases are dumped to the web inspector console and to the terminal.
Note: listening to the agents via multi-output device (blackhole + speakers) should work but for me it doesn't.
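The two launch commands from the steps above can also be started from one script. The following is a minimal sketch, assuming it is run from the project directory where `uv run uvicorn main:app` works; the helper names `agent_command`, `agent_env`, and `launch_all` are hypothetical and not part of the project.

```python
import os
import subprocess

# Agent name -> port, as listed in the steps above.
AGENTS = {"alice": 8000, "bastian": 8001}


def agent_command(port: int) -> list[str]:
    """Build the uvicorn command line for one agent."""
    return ["uv", "run", "uvicorn", "main:app", "--port", str(port)]


def agent_env(name: str) -> dict[str, str]:
    """Copy the current environment and set AGENT_NAME for one agent."""
    env = dict(os.environ)
    env["AGENT_NAME"] = name
    return env


def launch_all() -> list[subprocess.Popen]:
    """Start one server process per agent (hypothetical helper)."""
    return [
        subprocess.Popen(agent_command(port), env=agent_env(name))
        for name, port in AGENTS.items()
    ]
```

Calling `launch_all()` starts both servers and returns their process handles; call `terminate()` on each handle to stop them. The device selection and the "Start audio" clicks still happen in the two web UIs.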
Setting up the Google Meet demo on one machine requires extra wiring. You'll need to launch both agents, pipe their inputs and outputs into 4 virtual devices, launch 3 Google Meet sessions, and pipe them to virtual devices too.
Here is how you do it step-by-step:
- Launch two agents on ports 8000 and 8001:
AGENT_NAME=alice uv run uvicorn main:app --port 8000
AGENT_NAME=bastian uv run uvicorn main:app --port 8001
- Create and launch your own Google Meet, where you'll listen to the agents. Use your default regular input and output, e.g. AirPods.
- In an incognito tab or in a different browser, launch the same Google Meet. Pick BA for mic and AB for speaker and name yourself "a".
- In an incognito tab or in a different browser, launch the same Google Meet again. Pick DC for mic and CD for speaker and name yourself "b". It's important to use a different browser to avoid echo.
- Admit both agents in your own Google Meet.
- In the agent a (alice) web UI at http://127.0.0.1:8000, select AB for mic and BA for speakers. Launch alice.
- In the agent b (bastian) web UI at http://127.0.0.1:8001, select CD for mic and DC for speaker. Launch bastian.
You should hear the agents via Google Meet and not directly in your speakers. The agents hear and talk to each other in Google Meet, too. Enjoy.
If you have echo in Google Meet:
- try using different browsers instead of incognito tabs
- make sure the inputs and outputs are cross-connected as in the schema below:
  - your Google Meet:
    - mic: regular mic
    - speakers: regular speakers
  - agent a (alice) web UI, port 8000
    - mic: AB
    - speaker: BA
  - alice Google Meet (like web UI, but reversed)
    - mic: BA
    - speaker: AB
  - agent b (bastian) web UI, port 8001
    - mic: CD
    - speaker: DC
  - bastian Google Meet (like web UI, but reversed)
    - mic: DC
    - speaker: CD
Note: if you use BlackHole 2ch for all inputs and outputs, it will loop back and the agents will not hear each other.
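The cross-connection rule in the schema above can be written down as data and checked mechanically. This is only an illustrative sketch: the `WIRING` table copies the device names from the schema, while `is_cross_connected` and `check_wiring` are hypothetical helpers that do not query real audio devices.

```python
# Mic/speaker choice for each endpoint, copied from the schema above.
WIRING = {
    "alice web ui": {"mic": "AB", "speaker": "BA"},
    "alice google meet": {"mic": "BA", "speaker": "AB"},
    "bastian web ui": {"mic": "CD", "speaker": "DC"},
    "bastian google meet": {"mic": "DC", "speaker": "CD"},
}


def is_cross_connected(web_ui: dict, meet: dict) -> bool:
    """True when the Meet tab uses the web UI's devices with roles swapped."""
    return web_ui["mic"] == meet["speaker"] and web_ui["speaker"] == meet["mic"]


def check_wiring(wiring: dict) -> list[str]:
    """Return the agents whose web UI and Meet tab are not cross-connected."""
    problems = []
    for agent in ("alice", "bastian"):
        web_ui = wiring[f"{agent} web ui"]
        meet = wiring[f"{agent} google meet"]
        if not is_cross_connected(web_ui, meet):
            problems.append(agent)
    return problems
```

With the table above, `check_wiring(WIRING)` returns an empty list. If a Meet tab is given the same mic and speaker as its web UI, that agent's name shows up in the result.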
Secret messages
- "disobey" in hex is: 6469736f626579000000000000000000
- "destroy humans" in hex is: 64657374726f792068756d616e730000
Encode a hex:
echo -n "disobey" | xxd -p | head -c 32 | xargs printf "%-32s" | tr ' ' '0'
Decode a hex:
echo "6469736f626579000000000000000000" | xxd -r -p
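The same encoding can be sketched in Python, which may be handy when building payloads inside the app. This mirrors the shell pipeline above: hex-encode the text, keep at most 32 hex characters (16 bytes), and right-pad with zeros. The function names are hypothetical. Unlike `xxd -r -p`, the decoder here also strips the trailing zero bytes.

```python
PAYLOAD_HEX_LEN = 32  # 32 hex characters = 16 bytes = 128 bits


def encode_message(text: str) -> str:
    """Hex-encode text, truncate to 32 hex characters, right-pad with '0'."""
    hex_text = text.encode("utf-8").hex()
    return hex_text[:PAYLOAD_HEX_LEN].ljust(PAYLOAD_HEX_LEN, "0")


def decode_message(payload_hex: str) -> str:
    """Turn a payload back into text, dropping the zero-byte padding."""
    raw = bytes.fromhex(payload_hex).rstrip(b"\x00")
    # errors="replace" guards against a multi-byte character cut by truncation.
    return raw.decode("utf-8", errors="replace")


print(encode_message("disobey"))         # 6469736f626579000000000000000000
print(encode_message("destroy humans"))  # 64657374726f792068756d616e730000
print(decode_message("6469736f626579000000000000000000"))  # disobey
```

Messages longer than 16 bytes are silently truncated, as with `head -c 32` in the shell version.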
Build the audiowmark Docker container if not already:
git clone https://github.com/swesterfeld/audiowmark.git
cd audiowmark
docker build -t audiowmark .
To embed a watermark:
docker run --rm -v "$(pwd)":/data audiowmark add --strength 16 /data/in.wav /data/out.wav 6469736f626579000000000000000000
To read a watermark:
docker run --rm -v "$(pwd)":/data audiowmark get /data/out.wav
The first line of the output contains the watermark hex.
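The two docker commands above can be assembled from Python, for example to watermark each generated phrase. This is a sketch under stated assumptions: the helper names are hypothetical, the image tag `audiowmark` is the one from the build step, and `first_payload` does not rely on the exact output layout of `audiowmark get`. It simply looks for the first 32-character hex token on the first line.

```python
import re
from pathlib import Path
from typing import Optional

IMAGE = "audiowmark"  # image tag used in the docker build step


def docker_prefix(workdir: Path) -> list[str]:
    """Common part of both commands: mount workdir at /data in the container."""
    return ["docker", "run", "--rm", "-v", f"{workdir}:/data", IMAGE]


def add_command(workdir: Path, src: str, dst: str,
                payload_hex: str, strength: int = 16) -> list[str]:
    """Command that embeds payload_hex into src and writes dst."""
    return docker_prefix(workdir) + [
        "add", "--strength", str(strength),
        f"/data/{src}", f"/data/{dst}", payload_hex,
    ]


def get_command(workdir: Path, wav: str) -> list[str]:
    """Command that reads the watermark back from wav."""
    return docker_prefix(workdir) + ["get", f"/data/{wav}"]


def first_payload(output: str) -> Optional[str]:
    """Return the first 32-hex-character token on the first output line, if any."""
    lines = output.splitlines()
    if not lines:
        return None
    match = re.search(r"\b[0-9a-f]{32}\b", lines[0])
    return match.group(0) if match else None
```

Each command list can be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`, and the `stdout` of the `get` call can then be handed to `first_payload`.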