Whisper Speech-to-Text

This guide walks you through building and running the Whisper speech-to-text reference on Avocado OS. The app records audio from a USB microphone on a Raspberry Pi 5, transcribes it locally using the Whisper tiny model, and logs each result to the systemd journal as a JSON line.

Prerequisites

  • macOS 10.12+ or Linux (Ubuntu 22.04+, Fedora 39+)
  • Docker Desktop installed and running
  • The latest version of the Avocado CLI
  • Raspberry Pi 5 (or Raspberry Pi 4)
  • USB microphone
  • SD card

Initialize

Initialize a new project from the python-whisper reference. Without a --target option, the project targets the Raspberry Pi 5:

avocado init --reference python-whisper python-whisper
cd python-whisper

To use a Raspberry Pi 4 instead:

avocado init --reference python-whisper --target raspberrypi4 python-whisper
cd python-whisper

Install

Install the SDK toolchain, extension dependencies, and runtime packages:

avocado install -f

This pulls the SDK container image and installs nativesdk-uv for pip package compilation.

Build

Build the extensions and assemble the runtime image:

avocado build

The build step runs app-compile.sh inside the SDK container. The script uses uv pip install to download openai-whisper, numpy, and sounddevice, then downloads the Whisper tiny model (~75 MB). Next, app-install.sh copies the packages to /usr/lib/app/packages/ and the model to /usr/lib/app/model/ in the extension sysroot.

Deploy

SD card

Insert the SD card into your computer and provision it:

avocado provision -r dev --profile sd

Insert the SD card into the Raspberry Pi, connect the USB microphone, and apply power.

Verify

SSH into the Pi or connect via serial console. Log in as root with an empty password. The app service starts automatically on boot.

Check that the service is running:

systemctl status app

Watch transcriptions in real time:

journalctl -u app -f

You should see output like:

whisper app starting on avocado-raspberrypi5
model: tiny (from /usr/lib/app/model)
record: 5s at 16000Hz
Loading Whisper model...
Model loaded.
Using audio device: plughw:1,0

Recording 5s of audio...
Transcribing...
{"device": "avocado-raspberrypi5", "timestamp": 1711234567, "text": "hello world this is a test", "language": "en", "audio_seconds": 5, "inference_seconds": 3.42}
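
Each transcription is logged as one JSON object per line, so captured journal text is straightforward to post-process. The following is a minimal sketch, not part of the reference, that picks the JSON records out of journal lines and compares inference time with clip length. The function names parse_transcription and realtime_factor are hypothetical; the field names come from the sample output above.

```python
import json


def parse_transcription(line):
    """Return the transcription record found in a journal line, or None.

    Assumes the record is the JSON object shown in the sample output. Any
    prefix that journalctl adds before the first "{" (timestamp, host,
    unit name) is skipped.
    """
    start = line.find("{")
    if start == -1:
        return None  # status lines such as "Transcribing..." carry no JSON
    try:
        record = json.loads(line[start:])
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or "text" not in record:
        return None
    return record


def realtime_factor(record):
    """Inference time divided by audio length; below 1.0 is faster than real time."""
    return record["inference_seconds"] / record["audio_seconds"]


sample = (
    '{"device": "avocado-raspberrypi5", "timestamp": 1711234567, '
    '"text": "hello world this is a test", "language": "en", '
    '"audio_seconds": 5, "inference_seconds": 3.42}'
)
record = parse_transcription(sample)
print(record["text"], round(realtime_factor(record), 3))
```

To feed it real data, one option is to pipe journalctl -u app -o cat into a script that calls parse_transcription on each line; the -o cat output mode prints only the message text.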

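Test the microphone manually

If no transcriptions appear, check that the microphone is detected and can record. The commands below assume the microphone is card 1, device 0 (plughw:1,0), as in the sample output; substitute the numbers that arecord -l reports: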

arecord -l # list capture devices
arecord -D plughw:1,0 -f S16_LE -r 16000 -c 1 -d 5 /tmp/test.wav # record 5 seconds
aplay /tmp/test.wav # play it back
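
To confirm that the test recording has the format requested from arecord above (16 kHz, mono, 16-bit), the WAV header can be inspected with Python's standard wave module. This is a minimal sketch: describe_wav and matches_arecord_settings are hypothetical helpers, and it assumes a Python interpreter is available wherever you run it.

```python
import wave


def describe_wav(path):
    """Read the WAV header and return its basic audio parameters."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),  # 2 bytes = 16-bit (S16_LE)
            "sample_rate": rate,
            "seconds": frames / rate,
        }


def matches_arecord_settings(info):
    """True if the file matches -f S16_LE -r 16000 -c 1 from the command above."""
    return (
        info["channels"] == 1
        and info["sample_width_bytes"] == 2
        and info["sample_rate"] == 16000
    )
```

For the recording made above, matches_arecord_settings(describe_wav("/tmp/test.wav")) should return True, and the "seconds" value should be close to 5.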

Customize

Change the model size

Edit app/overlay/usr/local/bin/app.py:

MODEL_SIZE = "base" # use a larger model for better accuracy
RECORD_SECONDS = 10 # record longer clips

Available models:

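| Model | Size | RPi 5 speed (5 s audio) | Accuracy |
|-------|------|-------------------------|----------|
| tiny | 75 MB | ~3 s | Good for clear speech |
| base | 140 MB | ~6 s | Better accuracy |
| small | 460 MB | ~20 s | Much better accuracy |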
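
To show where MODEL_SIZE and RECORD_SECONDS take effect, here is a minimal sketch of a record-and-transcribe loop that emits JSON lines like the sample output. It is not the reference's actual app.py: the structure, the helper names build_record, run_once, and main, the MODEL_DIR constant, and the use of sounddevice.rec for capture are all assumptions. It also assumes the model file in /usr/lib/app/model is laid out the way whisper.load_model expects for its download_root argument.

```python
import json
import socket
import time

MODEL_SIZE = "tiny"        # "tiny", "base", or "small"
RECORD_SECONDS = 5         # clip length in seconds
SAMPLE_RATE = 16000        # Whisper expects 16 kHz mono audio
MODEL_DIR = "/usr/lib/app/model"  # assumed to match the install path above


def build_record(text, language, audio_seconds, inference_seconds,
                 device=None, timestamp=None):
    """Serialize one transcription using the field names from the sample output."""
    return json.dumps({
        "device": device or socket.gethostname(),
        "timestamp": int(time.time()) if timestamp is None else timestamp,
        "text": text.strip(),
        "language": language,
        "audio_seconds": audio_seconds,
        "inference_seconds": round(inference_seconds, 2),
    })


def run_once(model):
    """Record one clip, transcribe it, and return the JSON line."""
    import sounddevice as sd  # imported here so build_record works without it

    frames = int(RECORD_SECONDS * SAMPLE_RATE)
    audio = sd.rec(frames, samplerate=SAMPLE_RATE, channels=1, dtype="float32")
    sd.wait()  # block until the recording is complete

    started = time.monotonic()
    # fp16=False requests full precision; half precision is not supported on CPU
    result = model.transcribe(audio.flatten(), fp16=False)
    elapsed = time.monotonic() - started
    return build_record(result["text"], result["language"], RECORD_SECONDS, elapsed)


def main():
    """Load the model once, then record and transcribe until stopped."""
    import whisper

    model = whisper.load_model(MODEL_SIZE, download_root=MODEL_DIR)
    while True:
        print(run_once(model), flush=True)
```

In this sketch recording and transcription alternate, so a slower model lengthens the gap between clips: with the figures in the table, a 5-second clip on the small model would be followed by roughly 20 seconds in which nothing is recorded.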

Rebuild after changes

After any change, rebuild and reprovision:

avocado build
avocado provision -r dev --profile sd