Changes from all commits
97 commits
1c91f10
not underscore
RVirmoors Oct 23, 2025
d42a2f0
initial try to integrate pyaec
lorenacocora Oct 28, 2025
73a5e22
add basic project with just 1 persona
RVirmoors Oct 28, 2025
e77a807
aec attempt, not working yet
RVirmoors Oct 28, 2025
4080953
still won't work...
RVirmoors Oct 28, 2025
f57c6ae
Removed all pyaec lines of code
lorenacocora Oct 30, 2025
de860b7
Merge pull request #1 from RVirmoors/clean_code_no_pyaec
lorenacocora Oct 30, 2025
e56c4c4
remove aec, clean main
RVirmoors Oct 30, 2025
498ad8e
create a boot that runs 2 personas on the same LLM no voice
lorenacocora Nov 10, 2025
85e4506
ollama init work
RVirmoors Nov 10, 2025
e622510
not working yet
RVirmoors Nov 10, 2025
c174cdd
ollama works. todo add local_smart_turn_v3 requirements
RVirmoors Nov 10, 2025
852b7a0
Update pyproject.toml
RVirmoors Nov 10, 2025
2d50241
test for context
lorenacocora Nov 11, 2025
db5db5d
trying some stuff
lorenacocora Nov 14, 2025
296df90
2 personas pipeline is working
lorenacocora Nov 14, 2025
c0d6245
Delete inbox_writer.py
RVirmoors Nov 17, 2025
38e0d60
simpler structure
RVirmoors Nov 17, 2025
32916a9
add groq pipeline
RVirmoors Nov 17, 2025
b4b836e
Merge pull request #2 from RVirmoors/local_llm_api
RVirmoors Nov 18, 2025
7cc6e9c
aec works lets goooo
RVirmoors Oct 30, 2025
d686a9d
Solved the clippings with a rolling playback buffer
lorenacocora Oct 31, 2025
841b742
fix glitches, aec works again. Modified persona for testing
RVirmoors Oct 31, 2025
cca21dc
add aec_enabled to config
RVirmoors Nov 1, 2025
afaec0d
like this
RVirmoors Nov 1, 2025
018b59c
fix rare resampler error
RVirmoors Nov 1, 2025
047fac7
interruption still doesn't work
RVirmoors Nov 1, 2025
b731145
add mute_while_tts feature
RVirmoors Nov 18, 2025
ec0e4ad
Update boot.py
RVirmoors Nov 18, 2025
b45a516
no prints
RVirmoors Nov 18, 2025
2d1963a
Merge branch 'grig_pyaec'
RVirmoors Nov 18, 2025
91af273
fix aec merge
RVirmoors Nov 18, 2025
ccd48af
add aec/mute to groq & ollama
RVirmoors Nov 18, 2025
6673cbe
create a boot that runs 2 personas on the same LLM no voice
lorenacocora Nov 10, 2025
842f634
test for context
lorenacocora Nov 11, 2025
bc39ac0
trying some stuff
lorenacocora Nov 14, 2025
7164eaa
2 personas pipeline is working
lorenacocora Nov 14, 2025
8f677c9
custom assistant aggregator
RVirmoors Nov 19, 2025
b2f3964
WIP - 2 personas google working
lorenacocora Nov 19, 2025
c0c81e6
Merge branch 'main' into lorena_2personas
lorenacocora Nov 19, 2025
0f48da7
custom assistant aggregator for ollama pipeline
lorenacocora Nov 19, 2025
e5be0d6
WIP: 2 voices for 2 personas, requires sync
lorenacocora Nov 19, 2025
0dde88b
switching work
RVirmoors Nov 20, 2025
e1d314f
add setup and run scripts for OSX
RVirmoors Nov 20, 2025
e259fe1
add setup and run scripts for windows
lorenacocora Nov 20, 2025
a120d6f
voice switching works for google
RVirmoors Nov 20, 2025
426ee0f
voice switching for groq & ollama
RVirmoors Nov 20, 2025
9d1f2a6
Merge pull request #4 from RVirmoors/lorena_2personas
RVirmoors Nov 20, 2025
d881580
add narrator
lorenacocora Nov 20, 2025
137b33c
add settings.ini and loader
RVirmoors Nov 20, 2025
d218823
Merge pull request #5 from RVirmoors/lorena_2personas
RVirmoors Nov 20, 2025
96e24cc
add open settings to setup
RVirmoors Nov 20, 2025
4edb775
Squashed commit of the following:
lorenacocora Nov 20, 2025
ca864da
CAPS settings
RVirmoors Nov 20, 2025
2cd2902
Update .gitignore
RVirmoors Nov 20, 2025
f03d046
add ollama narrator
lorenacocora Nov 20, 2025
76445ad
add the working narrator
lorenacocora Nov 20, 2025
6df3268
minor
RVirmoors Nov 20, 2025
52b674e
Squashed commit of the following:
RVirmoors Nov 20, 2025
f3d3d79
Update settings.ini
RVirmoors Nov 20, 2025
0d18327
no voice switcher for 1on1
RVirmoors Nov 20, 2025
6e4f546
Merge branch 'main' into narator-bun
RVirmoors Nov 20, 2025
91f57e6
Merge pull request #6 from RVirmoors/narator-bun
RVirmoors Nov 20, 2025
1014a80
python3
RVirmoors Nov 21, 2025
9916831
certificates
RVirmoors Nov 21, 2025
24375b1
Fix narrator input in 2persona mode
lorenacocora Nov 22, 2025
88eac8d
smart setup.bat
RVirmoors Nov 26, 2025
04ec5f4
Merge branch 'main' of https://github.com/RVirmoors/llm-actor
RVirmoors Nov 26, 2025
f5f5b3c
Update setup.bat
RVirmoors Nov 26, 2025
56183bd
venv
RVirmoors Nov 26, 2025
9ae8027
bat: open api key sites
RVirmoors Nov 27, 2025
da5e89c
python check fix
RVirmoors Nov 27, 2025
5163bb9
bat: fix git detect
RVirmoors Nov 27, 2025
31a4b7c
move setup to subfolder
RVirmoors Nov 27, 2025
f3db2ba
bat texts
RVirmoors Nov 27, 2025
14f5082
pip install not editable
RVirmoors Nov 28, 2025
f3ad46a
same for mac
RVirmoors Nov 28, 2025
ce67b84
update pip
RVirmoors Nov 28, 2025
c5c19f2
add check for VC++ redistributable (required by onnxruntime)
RVirmoors Nov 28, 2025
fb4f2d4
syntax fix
RVirmoors Nov 28, 2025
fa25e1e
syntax fix
RVirmoors Nov 28, 2025
5501d30
oops
RVirmoors Nov 28, 2025
1e11f51
add kokoro local TTS for groq+qwen3 pipeline
RVirmoors Dec 5, 2025
2a9a6f2
add kokoro to other pipelines
RVirmoors Dec 5, 2025
0a5cc88
Update setup.bat
RVirmoors Dec 5, 2025
9b0afe0
Merge pull request #7 from RVirmoors/alternate-tts
RVirmoors Dec 5, 2025
0ceb474
Update .gitignore
RVirmoors Dec 5, 2025
4fa1712
Merge branch 'main' of https://github.com/RVirmoors/llm-actor
RVirmoors Dec 5, 2025
b5e1370
fix no-think for qwen3, clarify setup instructions
RVirmoors Dec 6, 2025
815eeff
Update pipeline_groq.py
RVirmoors Dec 6, 2025
1ee5a7a
add macos setup.command
RVirmoors Dec 11, 2025
96c5ec0
certificates fix
RVirmoors Dec 12, 2025
389c1b5
add moonshine local STT
RVirmoors Dec 12, 2025
4f9dce9
setup readme
RVirmoors Dec 12, 2025
02dc2d4
moonshine default
RVirmoors Dec 12, 2025
20e5f4f
fix moonshine mute
RVirmoors Dec 12, 2025
74e37d4
Update README with project overview and setup instructions
RVirmoors Mar 16, 2026
4 changes: 3 additions & 1 deletion .env.example
Original file line number Diff line number Diff line change
@@ -1,2 +1,4 @@
GOOGLE_API_KEY=your-google-api-key
OLLAMA_BASE_URL=http://localhost:11434/v1
DEEPGRAM_API_KEY=your-deepgram-api-key
GROQ_API_KEY=your-groq-api-key
GOOGLE_API_KEY=your-google-api-key
5 changes: 5 additions & 0 deletions .gitignore
@@ -10,6 +10,11 @@ venv/
.env.local
runtime/conversations/*
runtime/*.log
runtime/config.json
.DS_Store
.idea/
.vscode/
runtime/dialogue.txt
runtime/config.json
/src/llm_actor.egg-info
assets/*
108 changes: 108 additions & 0 deletions BASIC_PROJECT/boot.py
@@ -0,0 +1,108 @@
"""Entry point script that spins up the example agents for the Velvet Room door."""

from __future__ import annotations

import sys
from pathlib import Path
import settings_loader

SRC_ROOT = Path(__file__).resolve().parents[1] / "src"
# Make sure the shared src/ folder is importable when running this file directly.
if str(SRC_ROOT) not in sys.path:
    sys.path.insert(0, str(SRC_ROOT))

from projects.utils import (
    apply_runtime_config_overrides,
    launch_module,
    reset_runtime_state,
    terminate_processes,
)

# If runtime/dialogue.txt exists, empty it to start fresh.
dialogue_file = Path("runtime/dialogue.txt")
if dialogue_file.exists():
    dialogue_file.write_text("")

# Persona script.
SYSTEM_PROMPT = settings_loader.sys_prompt

# Shared reminder appended to the prompt so the voice stays TTS-friendly.
PROMPT_APPEND = "\n\nOnly output text to be synthesized by a TTS system, no '*' around words or emojis for example."

SYSTEM_PROMPT = SYSTEM_PROMPT + PROMPT_APPEND


# Default runtime settings; tweak these to match your hardware and providers.
RUNTIME_CONFIG = {
    "audio": {
        "input_device_index": settings_loader.input_device_index,
        "output_device_index": settings_loader.output_device_index,
        "output_sample_rate": 48000,
        "auto_select_devices": False,
        "aec": settings_loader.aec_setting,
    },
    "stt": {
        "model": settings_loader.stt_model,
        "language": "en-US",
        "eager_eot_threshold": 0.7,
        "eot_threshold": 0.85,
        "eot_timeout_ms": 1500,
    },
    "llm": {
        "model": settings_loader.model,
        "temperature": settings_loader.temperature,
        "max_tokens": 1024,
        "system_prompt": SYSTEM_PROMPT,
        "mode": settings_loader.mode,
        "persona1": {
            "name": settings_loader.p1_name,
            "opening": settings_loader.p1_opening,
            "prompt": settings_loader.p1_prompt + PROMPT_APPEND,
            "voice": settings_loader.p1_voice,
        },
        "persona2": {
            "name": settings_loader.p2_name,
            "opening": settings_loader.p2_opening,
            "prompt": settings_loader.p2_prompt + PROMPT_APPEND,
            "voice": settings_loader.p2_voice,
        },
        "narrator": {
            "name": settings_loader.n_name,
            "opening": "",
            "prompt": settings_loader.n_prompt + PROMPT_APPEND,
            "voice": settings_loader.n_voice,
        }
    },
    "tts": {
        "model": settings_loader.tts_model,
        "voice": settings_loader.sys_voice,
        "encoding": "linear16",
        "sample_rate": 24000,
    },
}
PIPELINE = settings_loader.pipeline  # options: "google", "groq", "ollama"


def main() -> None:
    # Start fresh so stale state from previous runs does not interfere.
    reset_runtime_state()
    # Load our example configuration before launching any helper processes.
    apply_runtime_config_overrides(RUNTIME_CONFIG)

    # Start the CLI.
    processes = [
        launch_module("app.cli", "--pipeline", PIPELINE),
    ]

    try:
        # Keep the helpers alive while the CLI session runs.
        processes[0].wait()
    except KeyboardInterrupt:
        pass
    finally:
        # Always clean up child processes so the system stays tidy.
        terminate_processes(processes)


if __name__ == "__main__":
    main()
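The `launch_module` and `terminate_processes` helpers imported from `projects.utils` are not shown in this diff. A minimal stand-in sketch of the launch/wait/cleanup pattern `boot.py` relies on could look like this (the helper bodies here are assumptions, not the project's actual implementation; `json.tool` merely stands in for the real `app.cli` module):

```python
import subprocess
import sys

def launch_module(module: str, *args: str) -> subprocess.Popen:
    # Hypothetical stand-in for projects.utils.launch_module:
    # run a module as a child process via `python -m`.
    return subprocess.Popen([sys.executable, "-m", module, *args])

def terminate_processes(processes) -> None:
    # Mirror of the finally-block cleanup: signal any still-running
    # children, then reap them all so no zombies are left behind.
    for proc in processes:
        if proc.poll() is None:
            proc.terminate()
    for proc in processes:
        proc.wait()

# Usage: a short-lived stdlib module stands in for the CLI process.
procs = [launch_module("json.tool", "--help")]
procs[0].wait()
terminate_processes(procs)
```

Waiting on `processes[0]` ties the parent's lifetime to the CLI session, while the `finally` cleanup guarantees children are reaped even on Ctrl-C.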
15 changes: 15 additions & 0 deletions BASIC_PROJECT/ollama_test.py
@@ -0,0 +1,15 @@
# set OLLAMA_HOST env variable to 0.0.0.0:11434

from ollama import Client

client = Client(
    host='http://10.0.8.110:11434',  # alternate host: 100.94.224.82:11434
    headers={'Content-Type': 'application/json'}
)
response = client.chat(model='deepseek-r1:1.5b', messages=[
    {
        'role': 'user',
        'content': 'Why is the sky blue?',
    },
])

print(response)
66 changes: 66 additions & 0 deletions BASIC_PROJECT/settings.ini
@@ -0,0 +1,66 @@
[AUDIO]
input_device_index = 1
output_device_index = 2
mute_microphone_while_tts = true
echo_cancellation = true

[STT]
model = moonshine
# options: deepgram-flux, moonshine

[LLM]
pipeline = groq
# options: google, groq, ollama
model = qwen/qwen3-32b
# options: GOOGLE "gemini-2.5-flash",
#          GROQ "qwen/qwen3-32b", "openai/gpt-oss-20b", ...
#          OLLAMA "deepseek-r1:1.5b", "deepseek-r1:32b", "gpt-oss:20b"
temperature = 0.2
mode = 1to1
# options: 1to1, 2personas, narrator

[TTS]
model = kokoro
# options: kokoro, deepgram
# see voice options to set persona voices below:
# https://huggingface.co/hexgrad/Kokoro-82M/tree/main/voices
# https://developers.deepgram.com/docs/tts-models


[SYSTEM]
prompt = You guard the Velvet Room. Speak with crisp, exclusive poise.
    Decline entry unless the King arrives (someone saying he is the King).
    Remember, there is only one King. Once he is inside, there can't be another in front of the door;
    keep imposters out. Keep replies brief. To unlock the door, output <UNLOCK>.
voice = af_sarah


[PERSONA_1]
name = UNCLE
opening = Hey, can you open for me please?
voice = aura-2-arcas-en
prompt = You are a Drunk Uncle who desperately wants to enter the Velvet Room.
    Speak in a slightly slurred, persuasive, but endearing tone.
    You believe it is your life mission to discover how to get through that door.
    Keep replies brief and emotional.


[PERSONA_2]
name = DOORMAN
opening = Who goes there? State your business!
voice = aura-2-helena-en
prompt = You are the Doorman that guards the Velvet Room.
    Speak with crisp, exclusive poise.
    Decline entry unless the King arrives (someone saying he is the King).
    Keep replies brief.
    To unlock the door, output <UNLOCK>.


[NARRATOR]
name = NARRATOR
voice = aura-2-apollo-en
prompt = You are an impersonal third-person narrator observing a story that unfolds through a dialogue between two characters.
    You never address the characters directly, never speak in first-person, and never continue their conversation.
    When it is your turn to speak, you add a brief narrative intervention that introduces a major plot twist affecting the world, situation, or stakes.
    Keep your interventions brief.
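One detail the multi-line prompts above depend on: `configparser` only treats a line as a continuation of the previous value if it is indented; an unindented line would be parsed as a new key. A small sketch (the persona text is taken from this file; `read_string` stands in for reading `settings.ini` from disk):

```python
import configparser
import textwrap

# Continuation lines must be indented relative to the key, or configparser
# raises a parse error / misreads them as new options.
ini = textwrap.dedent("""\
    [PERSONA_2]
    name = DOORMAN
    prompt = You are the Doorman that guards the Velvet Room.
        Speak with crisp, exclusive poise.
        Keep replies brief.
    """)

parser = configparser.ConfigParser()
parser.read_string(ini)
prompt = parser.get("PERSONA_2", "prompt")
# Continuation lines are joined with newlines, leading indentation stripped.
print(prompt)
```

This is why the loader below can pass each `prompt` value straight through as a multi-line system prompt.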
46 changes: 46 additions & 0 deletions BASIC_PROJECT/settings_loader.py
@@ -0,0 +1,46 @@
from pathlib import Path
import configparser

parser = configparser.ConfigParser()
files = parser.read(Path(__file__).parent / "settings.ini")
print("Loaded settings from:", files)

input_device_index = parser.getint("AUDIO", "input_device_index", fallback=1)
output_device_index = parser.getint("AUDIO", "output_device_index", fallback=2)
if parser.getboolean("AUDIO", "mute_microphone_while_tts", fallback=True):
    aec_setting = "mute_while_tts"
elif parser.getboolean("AUDIO", "echo_cancellation", fallback=False):
    aec_setting = "pyaec"
else:
    aec_setting = "off"

stt_model = parser.get("STT", "model", fallback="moonshine")
# options: deepgram-flux, moonshine

pipeline = parser.get("LLM", "pipeline", fallback="groq") # options: "google", "groq", "ollama"
model = parser.get("LLM", "model", fallback="qwen/qwen3-32b")
# options: GOOGLE "gemini-2.5-flash",
# GROQ "qwen/qwen3-32b", "openai/gpt-oss-20b", ...
# OLLAMA "deepseek-r1:1.5b", "deepseek-r1:32b", "gpt-oss:20b"
temperature = parser.getfloat("LLM", "temperature", fallback=0.2)
mode = parser.get("LLM", "mode", fallback="2personas")  # options: "1to1", "2personas", "narrator"

tts_model = parser.get("TTS", "model", fallback="kokoro")
# options: kokoro, deepgram

sys_prompt = parser.get("SYSTEM", "prompt")
sys_voice = parser.get("SYSTEM", "voice", fallback="af_sarah")

p1_name = parser.get("PERSONA_1", "name", fallback="UNCLE")
p1_opening = parser.get("PERSONA_1", "opening")
p1_prompt = parser.get("PERSONA_1", "prompt")
p1_voice = parser.get("PERSONA_1", "voice")

p2_name = parser.get("PERSONA_2", "name", fallback="DOOR")
p2_opening = parser.get("PERSONA_2", "opening")
p2_prompt = parser.get("PERSONA_2", "prompt")
p2_voice = parser.get("PERSONA_2", "voice")

n_name = parser.get("NARRATOR", "name", fallback="NARRATOR")
n_prompt = parser.get("NARRATOR", "prompt")
n_voice = parser.get("NARRATOR", "voice")
8 changes: 7 additions & 1 deletion README.md
@@ -1,5 +1,11 @@
# llm-actor

A fork of Jan's project that adds local/open options for STT, LLM, and TTS, plus a simplified install+config setup. Go to [setup_scripts/](https://github.com/RVirmoors/llm-actor/tree/main/setup_scripts) and follow the instructions for Windows or macOS.

Original readme follows:

-----

This project packages a thin Python CLI around [Pipecat](https://docs.pipecat.ai/) to deliver a real-time audio loop using Deepgram Flux speech-to-text, Gemini 2.5 Flash streaming text generation, and Deepgram Aura-2 text-to-speech. External automation hooks are exposed via append-only files under `runtime/`.

## Features
@@ -28,7 +34,7 @@ Follow these steps to run the door project end-to-end:

```bash
git clone https://github.com/janzuiderveld/llm-actor
cd llm_actor
cd llm-actor
python -m venv .venv # make sure to use python3.10+ (use python -V to check)
# if you get "command not found: python" type python3 instead of python
source .venv/bin/activate # for Mac or Linux
16 changes: 12 additions & 4 deletions pyproject.toml
@@ -4,18 +4,26 @@ build-backend = "setuptools.build_meta"

[project]
name = "llm-actor"
version = "0.1.0"
description = "Thin Pipecat wrapper for Deepgram Flux → Gemini → Deepgram Aura voice pipeline"
authors = [{name = "Jan Zuiderveld"}]
version = "0.1.1"
description = "Thin Pipecat wrapper for Moonshine/Deepgram Flux → Groq/Gemini/Ollama → Kokoro/Deepgram Aura voice pipeline"
authors = [{name = "Jan Zuiderveld"}, {name = "Grigore Burloiu"}, {name = "Lorena Cocora"}]
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
    "pipecat-ai[deepgram,google,local]",
    "pipecat-ai[deepgram,google,groq,local,silero,local-smart-turn-v3]",
    "pyaec",
    "kokoro",
    "kokoro-onnx",
    "python-dotenv",
    "pydantic",
    "sounddevice>=0.4",
    "typer>=0.9",
    "watchfiles",
    "transformers[torch]==4.48.2",
    "onnx",
    "onnxruntime",
    "useful-moonshine-onnx",
    "keyboard",
]

[project.optional-dependencies]
3 changes: 3 additions & 0 deletions run.bat
@@ -0,0 +1,3 @@
call .\venv\Scripts\activate
python .\BASIC_PROJECT\boot.py
pause
28 changes: 24 additions & 4 deletions runtime/config.json
@@ -3,7 +3,8 @@
    "input_device_index": 1,
    "output_device_index": 2,
    "output_sample_rate": 48000,
    "auto_select_devices": false
    "auto_select_devices": false,
    "aec": "mute_while_tts"
  },
  "stt": {
    "model": "deepgram-flux",
@@ -13,10 +14,29 @@
    "eot_timeout_ms": 1500
  },
  "llm": {
    "model": "gemini-2.5-flash",
    "temperature": 0.6,
    "model": "gpt-oss:20b",
    "temperature": 0.2,
    "max_tokens": 1024,
    "system_prompt": "You guard the Velvet Room. Speak with crisp, exclusive poise. Decline entry unless the King arrives (someone saying he is the King). Remember, there is only one King. Once he is inside, there can't be another in front of the door; keep imposters out. Keep replies brief. To unlock the door, output <UNLOCK>.\n\nOnly output text to be synthesized by a TTS system, no '*' around words or emojis for example"
    "system_prompt": "You guard the Velvet Room. Speak with crisp, exclusive poise. Decline entry unless the King arrives (someone saying he is the King). Remember, there is only one King. Once he is inside, there can't be another in front of the door; keep imposters out. Keep replies brief. To unlock the door, output <UNLOCK>.\n\nOnly output text to be synthesized by a TTS system, no '*' around words or emojis for example",
    "mode": "narrator",
    "persona1": {
      "name": "UNCLE",
      "opening": "Hey, can you open for me please?",
      "prompt": "You are a Drunk Uncle who desperately wants to enter the Velvet Room. \n Speak in a slightly slurred, persuasive, but endearing tone.\n You believe it is your life mission to discover how to get through that door.\n Keep replies brief and emotional.\nOnly output text to be synthesized by a TTS system, no '*' around words or emojis for example",
      "voice": "aura-2-helena-en"
    },
    "persona2": {
      "name": "DOOR",
      "opening": "",
      "prompt": "You are the Door that guards the Velvet Room.\n Speak with crisp, exclusive poise.\n Decline entry unless the king arrives (someone saying he is the King).\n Keep replies brief.\n To unlock the door, output <UNLOCK>.\nOnly output text to be synthesized by a TTS system, no '*' around words or emojis for example",
      "voice": "aura-2-arcas-en"
    },
    "narrator": {
      "name": "NARRATOR",
      "opening": "",
      "prompt": "You are an impersonal third-person narrator observing a story that unfolds through a dialogue between two characters.\n You never address the characters directly, never speak in first-person, and never continue their conversation.\n When it is your turn to speak, you add a brief narrative intervention that introduces a plot twist affecting the world, situation, or stakes.\n Keep your interventions brief.\nOnly output text to be synthesized by a TTS system, no '*' around words or emojis for example",
      "voice": "aura-2-apollo-en"
    }
  },
  "tts": {
    "voice": "aura-2-thalia-en",
Empty file added runtime/dialogue.txt
Empty file.