Simone Marzulli’s Max Headbox transforms a Raspberry Pi 5 into a fully local, voice-activated AI agent that processes commands entirely on-device, ensuring complete data privacy without cloud dependencies. This open-source project combines an expressive animated face display, touch controls, and small, efficient LLMs to create a desk companion capable of task execution, conversation, and real-time responses. Powered by Ollama for model inference and Vosk for speech recognition, it demonstrates edge AI’s practicality on consumer hardware.
The bot’s screen-head design features a GIMP-animated Microsoft Fluent Emoji face that reacts emotionally to interactions. Colored ribbons signal states: blue (listening), red (recording), rainbow (generating). The touchscreen allows manually activating the mic, stopping a recording, or cancelling a response. Strategic model selection (Qwen3 1.7B for agentic decisions, Gemma3 1B for conversation) fits within the Pi 5’s 8GB or 16GB of RAM while maintaining responsiveness.
Core Technologies Powering Max Headbox
Max Headbox leverages lightweight, optimized components for seamless Pi operation:
– **LLMs via Ollama**: Qwen3 1.7B (tool-calling agent), Gemma3 1B (emotional responses)—1-2B parameter range prevents CPU/memory overload.
– **Voice Pipeline**: Vosk API (wake-word detection), faster-whisper (speech-to-text).
– **Runtime Stack**: Ruby 3.3.0, Node 22, Python 3 for modular development.
– **Tools Framework**: JavaScript modules that each export a name, parameters, a description, and an executor function, keeping the agent extensible (see the sketch below).
No internet required post-setup; all inference local.
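The tool contract is what makes the agent easy to extend. As a rough sketch (the property names and file layout here are assumptions on my part, not the repo’s exact schema), a tool module could look like this:

```js
// tools/getTime.js - illustrative tool module; the field names here are
// assumptions, not Max Headbox's exact schema (check the repo's docs).
module.exports = {
  name: 'get_time',
  description: 'Returns the current local time as a human-readable string',
  // Parameters the agent model can fill in when it decides to call the tool
  params: {
    timezone: { type: 'string', description: 'IANA timezone, e.g. Europe/Rome' },
  },
  // The executor runs locally on the Pi when the agent selects this tool
  async execute({ timezone } = {}) {
    return new Date().toLocaleTimeString('en-US', timezone ? { timeZone: timezone } : undefined);
  },
};
```

In a layout like this, adding a capability is just a matter of dropping another module into the tools directory; the agent model only ever sees the tool’s name, description, and parameters.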
Required Hardware Components
Marzulli’s tested build uses Pi-optimized parts:
– Raspberry Pi 5 (8GB or 16GB RAM)
– GeeekPi Screen, Case, and Cooler kit
– USB microphone (high-SNR model recommended)
– MicroSD card (64GB+, Class 10)
Total cost under $150, excluding Pi.
Step-by-Step Build Instructions
Follow Marzulli’s GitHub repo (syxanash/maxheadbox) for complete docs:
– Flash Raspberry Pi OS Bookworm (64-bit) to microSD.
– Assemble hardware: mount Pi in GeeekPi case with screen/fan.
– Boot Pi, enable SSH/VNC, update system (sudo apt update && sudo apt upgrade).
– Install runtimes: Ruby 3.3.0, Node 22, Python 3 via apt/package managers.
– Install Ollama: curl -fsSL https://ollama.com/install.sh | sh.
– Pull models: ollama pull qwen3:1.7b, ollama pull gemma3:1b.
– Install voice stack: Vosk API, faster-whisper via pip.
– Clone the repo, configure the environment: npm install, set up mic permissions.
– Define tools as JS modules (name/params/desc/function).
– Run: npm start; access via touchscreen or VNC.
– Test wake-word detection, voice prompts, and tool execution (a quick model sanity check is sketched below).
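Before involving the voice pipeline, it is worth confirming that the pulled models answer locally. The snippet below is a generic check against Ollama’s standard REST API on its default port (11434); it is not part of the Max Headbox codebase:

```js
// check-ollama.mjs - quick local sanity check for a pulled model.
// Generic Ollama REST API call, not specific to Max Headbox.
// Run with: node check-ollama.mjs (Node 22 has fetch and top-level await built in)
const payload = {
  model: 'qwen3:1.7b',               // or 'gemma3:1b'
  prompt: 'Reply with the single word: ready',
  stream: false,                     // return one JSON object instead of a token stream
};

const start = Date.now();
const res = await fetch('http://localhost:11434/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(payload),
});
const data = await res.json();
console.log(`Model replied: ${data.response.trim()} (${Date.now() - start} ms)`);
```

If this prints a reply within a few seconds, Ollama and the model are healthy, and any remaining problems are in the voice or app layers.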
Performance on Raspberry Pi 5
Small models shine on Pi 5’s Arm Cortex-A76:
| Model | Parameters | RAM Usage | Response Time | Use Case |
|---|---|---|---|---|
| Qwen3 | 1.7B | ~3GB | 2-5s | Agentic tasks |
| Gemma3 | 1B | ~2GB | 1-3s | Conversation |
Qwen3 excels at structured output and tool calling; Gemma3 prioritizes speed and emotional tone. Larger 3B models are viable on the 16GB board but respond more slowly.
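To make that split concrete, here is one way two models could divide the work over Ollama’s local REST API. This is only a sketch of the pattern, not Marzulli’s actual agent loop; the prompts, the JSON shape, and the inline tool registry are all illustrative:

```js
// two-model-split.mjs - illustrative only; not Max Headbox's real agent loop.
const OLLAMA = 'http://localhost:11434/api/generate';

async function ask(model, prompt, extra = {}) {
  const res = await fetch(OLLAMA, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, prompt, stream: false, ...extra }),
  });
  return (await res.json()).response;
}

// Tiny inline "registry" standing in for a tools/ directory.
const tools = {
  get_time: {
    execute: async ({ timezone } = {}) =>
      new Date().toLocaleTimeString('en-US', timezone ? { timeZone: timezone } : undefined),
  },
};

// 1. The agent model (Qwen3) makes a structured decision: which tool, which arguments?
const decision = await ask(
  'qwen3:1.7b',
  'User said: "what time is it in Rome?"\n' +
    'Respond ONLY with JSON like {"tool": "get_time", "params": {"timezone": "Europe/Rome"}}.',
  { format: 'json' } // Ollama can constrain the output to valid JSON
);
const { tool, params } = JSON.parse(decision);

// 2. Run the chosen tool locally.
const result = await tools[tool].execute(params);

// 3. The conversational model (Gemma3) phrases a short, friendly reply.
const reply = await ask('gemma3:1b', `Tell the user, in one warm sentence: ${result}`);
console.log(reply);
```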
Customization and Extension Ideas
Modular design invites personalization:
– Swap models: Llama3.2:1B, Qwen2.5:1.5B for varied capabilities.
– Add tools: Weather API calls, file management, smart home integration (an example weather tool is sketched after this list).
– Face animations: Custom SVGs via GIMP/Inkscape.
– Enclosure: 3D print cases mimicking robots/characters.
– Multi-language: Vosk supports 20+ offline languages.
– Accessibility: Haptic feedback, larger text for vision-impaired users.
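For example, the weather idea above could reuse the same assumed module shape from the earlier sketch. Open-Meteo is a free, keyless forecast API, so the only real difference from a built-in tool is that this one needs a network connection:

```js
// tools/getWeather.js - example extension tool (same hypothetical module shape
// as the earlier sketch, not the repo's exact schema). Needs internet access.
module.exports = {
  name: 'get_weather',
  description: 'Fetches the current temperature for a latitude/longitude pair',
  params: {
    latitude: { type: 'number', description: 'e.g. 51.5 for London' },
    longitude: { type: 'number', description: 'e.g. -0.12 for London' },
  },
  async execute({ latitude, longitude }) {
    const url =
      `https://api.open-meteo.com/v1/forecast?latitude=${latitude}` +
      `&longitude=${longitude}&current_weather=true`;
    const data = await (await fetch(url)).json();
    return `It is currently ${data.current_weather.temperature}°C outside.`;
  },
};
```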
Why Build Max Headbox?
Privacy-first AI keeps voice data away from cloud providers. Offline operation keeps working during internet outages. Building it teaches LLM deployment, voice pipelines, and agentic flows. It is cost-effective ($100-200 total) compared with commercial assistants, and community-driven, with 240+ GitHub stars and active forks.
A perfect weekend project for Pi enthusiasts, Max Headbox extends to home automation, desk productivity, or experimental AI research on edge hardware.



