Pocket Garden is a gardening companion that helps gardeners understand their plot of land using local AI running on Arm devices. Users enter their location and can then ask any gardening questions, upload pictures of their plants and get personalized advice based on their location thanks to OpenEPI and OpenMeteo APIs.
Pocket Garden relies on a custom multi-agent system with 3 different AI models running locally on multiple Arm devices. In our demo we use in 2 Raspberry Pi 4 4GB (quad-core ARM Cortex-A72) and 1 Raspberry Pi 5 8GB (quad-core Arm Cortex-A76).
Our repository includes the following files:
βββ assets
βΒ Β βββ architecture.png
βΒ Β βββ pocket-garden-logo.png
βΒ Β βββ raspberries.png
βΒ Β βββ screenshot-mobile.png
βΒ Β βββ screenshot-raspberrypios.png
βΒ Β βββ screenshot-ubuntu.png
βΒ Β βββ ui.png
βΒ Β βββ video-thumbnail.png
βββ config.py
βββ configs
βΒ Β βββ config.yaml
βββ LICENSE
βββ multi_agent_system.py
βββ README.md
βββ requirements.txt
βββ tools.py
βββ user_interface.pyOur system is based on 2 main components: a multi-agent system composed of 3 agents and a user interface which allows to interact with the multi-agent system.
Our multi-agent system is implemented behind a minimal OpenAI compatible multimodal (text/image) server, which communicates with 3 llama.cpp llama-server instances, each one running on a different Arm-based Raspberry Pi.
As shown in the diagram above, the orchestrator calls the search and vision agents. These 2 agents are run in parallel and their output is used to engineer the context for the main agent, which generates a response for the user. Only the main agent sees the entire conversation. The 2 agents perform specific tasks to assist the main agent, as they have specific functionalities not available to the main agent, respectively access to images and to APIs (OpenEPI API to get the soil type and the OpenMeteo API to obtain weather forecasts).
We have determined which model to use for each agent depending on their quality, speed, context length and modalities they support. The main agent needs to handle long multi-turn conversations with the user, so we chose Llama 3.2 3B, a long-context text-only LLM. The vision agent needs to support images in input so we use Gemma 3 4B. For the search agent we use Qwen 3 4B. Each model runs on a separate Arm device: a Raspberry PI 5 8GB and 2 Raspberry PI 4 4GB. For more constrained hardware resources we would recommend Llama 3.2 1B, MobileVLM 1.7B and Qwen 3 1.7B. To save RAM, we also recommend using llama-server with the --ctx-size SIZE option to limit the context size.
We chose 4-bit integer quantized version of the models mentioned above as they strike a good balance between speed and quality compared to the original unquantized models.
Our user interface is based on Gradio and enables to connect to our multi-agent system. We use Folium to generate maps.
Pocket Garden has been developed with Ubuntu 22.04.4 and Raspberry Pi OS 2025-10-01. We have tested our project with Python 3.10 on Ubuntu and Python 3.13 on Raspberry Pi OS.
- Install Python dependencies (necessary for multi-agent system and user interface but not model inference):
On Ubuntu:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt --ignore-requires-pythonOn Raspberry Pi OS:
python3 -m venv venv
source venv/bin/activate
./venv/bin/pip install -r requirements.txt --ignore-requires-python- Install llama.cpp inference engine on Raspberry Pi OS:
sudo apt install build-essential cmake curl libcurl4-openssl-dev
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config ReleaseIf necessary official information to build and install llama.cpp can be found at this link.
-
Download models:
-
Llama 3.2 3B: https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_0.gguf?download=true
-
Gemma 3 4B: https://huggingface.co/ggml-org/gemma-3-4b-it-GGUF/resolve/main/gemma-3-4b-it-Q4_K_M.gguf?download=true https://huggingface.co/ggml-org/gemma-3-4b-it-GGUF/resolve/main/mmproj-model-f16.gguf?download=true
-
Qwen 3 4B: https://huggingface.co/ggml-org/Qwen3-4B-GGUF/resolve/main/Qwen3-4B-Q4_K_M.gguf?download=true
-
-
Launch llama.cpp llama-server and port forwarding:
Each Raspberry Pi runs its own llama.cpp instance with a different model. For example, to serve a multimodal Gemma 3 4B, run the following command line in a terminal:
./build/bin/llama-server -m ~/Downloads/gemma-3-4b-it-Q4_K_M.gguf --mmproj ~/Downloads/mmproj-model-f16.ggufTo connect the machine running the multi-agent system to the different servers running model inference we use SSH port forwarding. For each Raspberry Pi, we forward the remote port used by llama.cpp to our local machine with SSH like so (use your own LOCAL_PORT, HOST_PORT, USERNAME and HOST_IP):
ssh -L LOCAL_PORT:localhost:HOST_PORT USERNAME@HOST_IPThe next section describes complete instructions to run all the components necessary for our project.
To run our demo we need to run 3 instances of llama.cpp, our multi-agent system and our user interface.
We provide the configuration file we used in configs/config.yaml:
host: 0.0.0.0
port: 8001
main_agent:
model: Llama-3.2-3B-Instruct-Q4_0.gguf
host: localhost
port: 8002
vision_agent:
model: gemma-3-4b-it-Q4_K_M.gguf
host: localhost
port: 8003
search_agent:
model: Qwen3-4B-Q4_K_M.gguf
host: localhost
port: 8004The first 2 lines specify host and port used for the multi-agent system and the user interface to communicate. The rest of the file configures each agent independently.
By default llama.cpp llama-server uses port 8080, which we forward to a local port on the machine running the multi-agent system. We use 3 Raspberry Pis so we run 3 commands, each one in a different terminal:
ssh -L 8002:localhost:8080 USERNAME@RASPBERRYPI0_IPssh -L 8003:localhost:8080 USERNAME@RASPBERRYPI1_IPssh -L 8004:localhost:8080 USERNAME@RASPBERRYPI2_IPOn each Raspberry Pi we run a llama.cpp llama-server instance to serve a different model (in a tmux session):
./build/bin/llama-server -m ~/Downloads/Llama-3.2-3B-Instruct-Q4_0.gguf./build/bin/llama-server -m ~/Downloads/gemma-3-4b-it-Q4_K_M.gguf --mmproj ~/Downloads/mmproj-model-f16.gguf./build/bin/llama-server -m ~/Downloads/Qwen3-4B-Q4_K_M.ggufTo run our multi-agent system and user interface:
On Ubuntu:
python3 multi_agent_system.py --config configs/config.yamlpython3 user_interface.py --config configs/config.yamlOn Raspberry Pi OS:
./venv/bin/python3 multi_agent_system.py --config configs/config.yaml./venv/bin/python3 user_interface.py --config configs/config.yamlThe last command will output the port to use to connect to the UI in a local web browser. By default the port is 7860. Once the user interface is launched, it should look like the following screenshot:
Once you specify your location in the sidebar, click on "Run" (the side bar will automatically collapse and activate the rest of the interface) and then interact with Pocket Garden. It should look like the following:
The first screenshot shows the multi-agent system and the UI running on Raspberry Pi OS, while the second screenshot is on Ubuntu and additionally shows the 3 llama.cpp llama-server instances running on the 3 Raspberry Pis (via SSH, on the right side of the UI) to serve the 3 models used by our multi-agent system.
-
We provide a configuration file for the setup we used for the demonstration, i.e. 3 Raspberry Pi 4+ with at 4+GB. Feel free to modify or create new config files to fit your needs (speed, memory,...). For reference, Llama.cpp provides a list of supported models and even more models can be found on GGML HuggingFace. For example, you can use MobileVLM 1.7B instead of Gemma 3 4B, or Qwen 3 0.6B instead of Qwen 3 4B. Do not forget to launch llama.cpp with the new models.
-
Our user interface is compatible with mobile devices too. Opened on an Android smartphone (here a Google Pixel 6) our UI should look like this:
-
Amandine Flachs
-
Alexandre Borghi
Pocket Garden has an Apache 2.0 license, as found in the LICENSE file.




