2026-04-12
A Real Local Voice Loop
Today’s session had a simple goal: turn Robie into something more than a blinking prototype.
The idea was to validate a complete voice pipeline running locally on a Raspberry Pi:
- wake word detection
- command capture
- speech transcription in French
- spoken playback using text-to-speech
- return to idle mode
In other words: a first fully local conversational loop, without relying on any cloud service.
Tested Architecture
The selected pipeline relies on lightweight components suitable for a Raspberry Pi:
- OpenWakeWord for wake word detection
- SoundDevice for audio capture
- Vosk for offline speech recognition
- Pico TTS for speech synthesis
- DotStar LEDs for visual feedback
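As a rough illustration of how the capture and recognition pieces might connect, here is a minimal sketch. It assumes a recognizer object exposing Vosk's `AcceptWaveform`, `Result`, and `FinalResult` methods, and an iterable of raw PCM byte chunks (for example, read from a SoundDevice input stream). The function name `transcribe` and the `Recognizer` protocol are illustrative, not Robie's actual code.

```python
import json
from typing import Iterable, List, Protocol


class Recognizer(Protocol):
    """Subset of the vosk.KaldiRecognizer interface used in this sketch."""

    def AcceptWaveform(self, data: bytes) -> bool: ...
    def Result(self) -> str: ...
    def FinalResult(self) -> str: ...


def transcribe(chunks: Iterable[bytes], recognizer: Recognizer) -> str:
    """Feed raw PCM chunks to the recognizer and return the joined text."""
    parts: List[str] = []
    for chunk in chunks:
        # AcceptWaveform returns a truthy value once an utterance is complete.
        if recognizer.AcceptWaveform(chunk):
            text = json.loads(recognizer.Result()).get("text", "")
            if text:
                parts.append(text)
    # Flush whatever is still buffered at the end of the command.
    tail = json.loads(recognizer.FinalResult()).get("text", "")
    if tail:
        parts.append(tail)
    return " ".join(parts)
```

With Vosk installed, the recognizer would typically be created as `vosk.KaldiRecognizer(vosk.Model(model_path), 16000)` using a French model; `model_path` and the 16 kHz sample rate are assumptions here.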
The behavior is intentionally simple:
- LEDs off while idle
- LEDs red while listening
- LEDs yellow while processing
- response played back through the speaker
- LEDs off again on return to standby
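The behavior listed above can be sketched as one pass through a small loop. This is only a sketch under stated assumptions: the five callables (`wait_for_wake_word`, `record_command`, `transcribe`, `speak`, `set_leds`) are hypothetical stand-ins for the OpenWakeWord, SoundDevice, Vosk, Pico TTS, and DotStar pieces, and the LEDs are assumed to stay yellow until playback has finished.

```python
from typing import Callable, Tuple

Color = Tuple[int, int, int]

# RGB values for the three LED states described in the notes.
OFF: Color = (0, 0, 0)
RED: Color = (255, 0, 0)
YELLOW: Color = (255, 255, 0)


def run_cycle(
    wait_for_wake_word: Callable[[], None],
    record_command: Callable[[], bytes],
    transcribe: Callable[[bytes], str],
    speak: Callable[[str], None],
    set_leds: Callable[[Color], None],
) -> str:
    """Run one idle -> listen -> process -> speak -> idle cycle."""
    set_leds(OFF)            # idle: LEDs off
    wait_for_wake_word()     # blocks until the wake word is heard

    set_leds(RED)            # listening
    audio = record_command()

    set_leds(YELLOW)         # processing
    text = transcribe(audio)

    # Assumption: the transcription itself is spoken back; the notes do not
    # describe how the reply is produced.
    speak(text)

    set_leds(OFF)            # back to standby
    return text
```

Passing the components in as plain callables keeps the loop testable on a laptop, without a microphone or LEDs attached.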
Pleasant Surprise: Pico TTS
The most positive surprise of the day was the speed of Pico TTS.
The voice is clearly synthetic, almost retro, but generation is immediate and perfectly usable on modest hardware. In Robie’s case, that limitation almost becomes a strength: the robotic sound fits the project’s identity.
The real challenge is therefore not the voice itself, but the quality of the audio chain (speakers, volume, mixing, output quality).
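For reference, Pico TTS is commonly driven through the `pico2wave` command-line tool, which writes a WAV file that can then be played with ALSA's `aplay`. The sketch below assumes both tools are installed and that this is how Robie invokes them; the helper names `pico_command` and `speak` are illustrative.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def pico_command(text: str, wav_path: str, lang: str = "fr-FR") -> List[str]:
    """Build the pico2wave invocation that renders `text` into a WAV file."""
    return ["pico2wave", "-l", lang, "-w", wav_path, text]


def speak(text: str, lang: str = "fr-FR") -> None:
    """Synthesize `text` with pico2wave, then play it with aplay."""
    with tempfile.TemporaryDirectory() as tmp:
        # pico2wave expects an output file name ending in .wav.
        wav_path = str(Path(tmp) / "reply.wav")
        subprocess.run(pico_command(text, wav_path, lang), check=True)
        subprocess.run(["aplay", "-q", wav_path], check=True)
```

Since synthesis is fast, most of what the listener perceives depends on the playback step, which is where the speaker, volume, and mixer settings come into play.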
Speech Recognition Results
The tests confirmed that local transcription works, but with predictable limitations:
- noticeable latency
- variable results depending on speech clarity
- weaker performance with children’s voices, especially those of the youngest users
One interesting lesson has already emerged: the system performs better when the speaker talks clearly and without hesitation. Part of the user experience will therefore involve learning how to speak to it effectively.
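To put numbers on the latency, one option is to time each stage of the loop separately. The helper below is a minimal sketch using only the standard library; the class name `StageTimer` and the stage labels are hypothetical, and no measurements from this session are implied.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class StageTimer:
    """Record how long each named pipeline stage takes, in seconds."""

    def __init__(self) -> None:
        self.durations: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Store the elapsed time even if the stage raised an exception.
            self.durations[name] = time.perf_counter() - start

    def slowest(self) -> str:
        """Return the name of the stage with the longest recorded duration."""
        return max(self.durations, key=lambda name: self.durations[name])
```

Wrapping the recognition and synthesis calls in `with timer.stage("stt"):` and `with timer.stage("tts"):` would show which step accounts for most of the delay.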
What This Session Validates
Imperfect as it is, the prototype demonstrates several important points:
- a local voice assistant on Raspberry Pi is realistic
- open-source building blocks are enough for a credible V1
- the full wake word → STT → TTS loop genuinely works
- current limitations are more ergonomic than conceptual
This is a bigger milestone than it may seem: Robie is no longer just an assembly of components, but an object that listens, sometimes understands, and responds.
Improvement Paths
Future iterations could focus on:
- better recognition of children’s voices
- lower latency
- improved audio output quality
- more natural dialogue
- the ability to interrupt Robie during playback
- handling stories, voice notes, and multiple commands
Conclusion
The result is not perfect. Robie is sometimes slow, sometimes hesitant, sometimes clumsy.
But it works, and the children were wildly excited.