NVIDIA GStreamer YOLO Object Detection

This guide walks you through building and running the YOLO object detection reference on Avocado OS. The app captures video from a USB camera on the Jetson Orin Nano, runs GPU-accelerated object detection, and serves the annotated feed as an MJPEG stream.

Prerequisites

macOS 10.12+ or Linux (Ubuntu 22.04+, Fedora 39+)
Docker Desktop installed and running
The latest version of the Avocado CLI
An NVIDIA Jetson Orin board: Orin Nano Developer Kit (default) or AGX Orin Developer Kit
USB (UVC) camera (e.g., Opal Tadpole)
SD card or USB cable for flashing

To target the AGX Orin, pass --target jetson-agx-orin-devkit to the avocado commands below. The committed TensorRT engine is built for the Orin Nano, so on the AGX the on-device fallback compiles a matching engine on first boot (a one-time, few-minute cost, then cached) — the dashboard comes up automatically once it finishes.

Initialize

Clone the reference or initialize a new project from it:

avocado init --reference nvidia-gstreamer-yolo nvidia-gstreamer-yolo
cd nvidia-gstreamer-yolo

Install

Install the SDK toolchain, extension dependencies, and runtime packages:

avocado install -f

This pulls the SDK container image and resolves the runtime packages across the extensions. The dependency layer (vision-runtime) includes TensorRT (python3-tensorrt, tensorrt-core), the CUDA Python bindings (python3-cuda), cuDNN, OpenCV (used for image pre/post-processing only), the NVIDIA GStreamer plugins, the UVC camera driver, and Python + PyGObject + Flask — all from the feed, no vendored pip packages.

Extension layout

The pipeline is split across four extensions so that an OTA update re-ships only what changed; see the Extension layout table in the README. In short: vision-runtime (dependencies), vision-models (the ONNX model), vision-engines (the prebuilt TensorRT engine), and vision-app (the application code, plus its tunables in app.service).

Build

Build the extensions and assemble the runtime image:

avocado build

The build composes each extension. Each one ships as a plain overlay — the YOLO11n model from vision-models/overlay/usr/lib/app/models/, the prebuilt TensorRT engine from vision-engines/overlay/usr/lib/app/engines/, and the application code from vision-app/overlay/. There is no pip/compile step — Flask and the rest come from the feed via vision-runtime.

Deploy

Flash the image to an SD card or eMMC, connect a USB camera, and boot the Jetson board:

avocado provision -r dev

Verify

Log in as root with an empty password. The reference ships a prebuilt FP16 TensorRT engine (committed in the vision-engines overlay and shipped read-only to /usr/lib/app/engines/), so the app starts detecting immediately — no on-device compile wait.

yolo-engine-build.service still runs at boot, but it is a near-instant no-op while the prebuilt engine is present. It compiles an engine only when none exists for the active model, for example after you swap to yolo11s. If the embedded engine fails to load after a TensorRT version bump in the feed, the app rebuilds it on-device automatically. That rebuild is a one-time cost of a few minutes, and the result is cached under /var/lib/app.
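
The lookup described above can be sketched roughly as follows. This is an illustration, not the reference's actual code: the function names `engine_filename` and `resolve_engine` are hypothetical, and the order in which the two directories are checked is an assumption. Only the two directories and the `<model>.onnx.fp16.engine` naming come from this guide.

```python
from pathlib import Path
from typing import Optional

# Directories named in this guide.
PREBUILT_DIR = Path("/usr/lib/app/engines")  # read-only, shipped by vision-engines
CACHE_DIR = Path("/var/lib/app")             # writable, holds on-device builds


def engine_filename(model_path: str, precision: str = "fp16") -> str:
    """Derive the engine filename from the model filename.

    Example: yolo11n.onnx -> yolo11n.onnx.fp16.engine
    """
    return f"{Path(model_path).name}.{precision}.engine"


def resolve_engine(model_path: str,
                   prebuilt_dir: Path = PREBUILT_DIR,
                   cache_dir: Path = CACHE_DIR) -> Optional[Path]:
    """Return the first existing engine for this model, or None.

    None means no engine matches the active model, which is the case
    where an on-device build would be needed. The search order here
    (prebuilt first, then cache) is an assumption.
    """
    name = engine_filename(model_path)
    for directory in (prebuilt_dir, cache_dir):
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None
```

Because the filename embeds the model name, switching to `yolo11s.onnx` makes the lookup miss both directories until an engine for that model has been built or committed.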

Open your browser to http://<device-ip>:5000 to view the dashboard.

systemctl status yolo-engine-build   # engine build unit (near-instant no-op when a prebuilt engine is present)
systemctl status app
journalctl -u app -f

You should see output like:

app starting
  device: avocado-jetson-orin-nano-devkit
  camera: /dev/video0 (1280x720@30fps)
  model: /usr/lib/app/models/yolo11n.onnx
  tensorrt: 10.3.0
  loading TensorRT engine: /usr/lib/app/engines/yolo11n.onnx.fp16.engine
  TensorRT engine active (warmup: 42ms, in=(1, 3, 640, 640) out=(1, 84, 8400))
  detector ready — backend=TensorRT target=CUDA
  trying pipeline 1/4 [nvidia-mjpeg-decode]...
  pipeline started: [nvidia-mjpeg-decode] (GPU=True)

If TensorRT can't initialize for any reason, the app falls back to CPU inference via OpenCV DNN (backend=OpenCV target=CPU — much slower) so the dashboard still works.
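
A fallback chain of this kind can be sketched as below. This is a minimal sketch under stated assumptions, not the app's implementation: `select_backend`, `load_tensorrt`, and `load_opencv_dnn` are hypothetical names, and the real app may decide differently about when to fall back. The TensorRT and OpenCV calls used are part of those libraries' Python APIs.

```python
def select_backend(loaders):
    """Try each (name, loader) pair in order and return the first that works.

    `loaders` is a list of (backend_name, zero-argument callable) pairs.
    Any exception from a loader is treated as "backend unavailable".
    """
    errors = []
    for name, loader in loaders:
        try:
            return name, loader()
        except Exception as exc:  # import errors, missing files, bad engines
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no inference backend available: " + "; ".join(errors))


def load_tensorrt(engine_path):
    """Hypothetical loader: deserialize a prebuilt TensorRT engine."""
    import tensorrt as trt  # raises ImportError if TensorRT is not installed
    runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
    with open(engine_path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    if engine is None:
        raise RuntimeError("engine failed to deserialize")
    # Return the runtime too so it stays alive as long as the engine.
    return runtime, engine


def load_opencv_dnn(model_path):
    """Hypothetical loader: CPU inference through OpenCV DNN."""
    import cv2
    return cv2.dnn.readNetFromONNX(model_path)


# Example wiring (paths taken from the log output in this guide):
# name, backend = select_backend([
#     ("TensorRT", lambda: load_tensorrt("/usr/lib/app/engines/yolo11n.onnx.fp16.engine")),
#     ("OpenCV", lambda: load_opencv_dnn("/usr/lib/app/models/yolo11n.onnx")),
# ])
```

Keeping the imports inside the loaders means a missing TensorRT package surfaces as an ordinary loader failure rather than crashing the process at startup.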

The dashboard shows a live annotated video feed, detected objects with confidence scores, inference FPS, and device metrics.

API Endpoints

GET /api/stats — device metrics, model status, and current detections
GET /api/detections — current detections only
GET /stream — live MJPEG stream with bounding boxes
GET / — web dashboard
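
As a rough illustration of consuming these endpoints from another machine, the sketch below uses only the Python standard library. The helper names (`endpoint_url`, `fetch_json`, `stream_chunks`, `iter_jpeg_frames`) are hypothetical, and the JSON field names are not documented here, so the example does not assume any. Frame splitting scans for JPEG start and end markers, a common shortcut that ignores the multipart headers and can misfire on JPEGs with embedded thumbnails.

```python
import json
from urllib.request import urlopen


def endpoint_url(host, path, port=5000):
    """Build a URL for one of the app's endpoints."""
    return f"http://{host}:{port}/{path.lstrip('/')}"


def fetch_json(host, path, timeout=5.0):
    """GET a JSON endpoint such as /api/stats or /api/detections."""
    with urlopen(endpoint_url(host, path), timeout=timeout) as resp:
        return json.load(resp)


def stream_chunks(host, chunk_size=4096, timeout=5.0):
    """Yield raw byte chunks from the /stream endpoint until it closes."""
    with urlopen(endpoint_url(host, "/stream"), timeout=timeout) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                return
            yield chunk


def iter_jpeg_frames(chunks):
    """Split a byte stream into JPEG frames using SOI/EOI markers."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        while True:
            start = buf.find(b"\xff\xd8")  # start-of-image marker
            if start < 0:
                buf = buf[-1:]  # keep one byte in case a marker is split
                break
            end = buf.find(b"\xff\xd9", start + 2)  # end-of-image marker
            if end < 0:
                break  # frame incomplete, wait for more data
            yield buf[start:end + 2]
            buf = buf[end + 2:]
```

For example, `fetch_json("<device-ip>", "/api/stats")` returns the parsed response, and `next(iter_jpeg_frames(stream_chunks("<device-ip>")))` returns the bytes of one annotated frame that can be written to a `.jpg` file.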

Customize

Configure your camera

Check what your camera supports and adjust the settings:

v4l2-ctl --list-formats-ext -d /dev/video0

Uncomment and edit the camera tunables in vision-app/overlay/usr/lib/systemd/system/app.service:

Environment=CAMERA_DEVICE=/dev/video0
Environment=CAMERA_WIDTH=1280
Environment=CAMERA_HEIGHT=720
Environment=CAMERA_FRAMERATE=30
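
On the application side, tunables like these are typically read from the process environment at startup. The sketch below shows one way to do that; it is not the app's actual code. `CameraConfig` and `load_camera_config` are hypothetical names, and the fallback defaults are assumed from the sample log output in this guide (/dev/video0 at 1280x720, 30 fps).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraConfig:
    device: str
    width: int
    height: int
    framerate: int


def load_camera_config(env=None):
    """Read the camera tunables from the environment.

    `env` defaults to os.environ; pass a dict to test without touching
    the real environment. Defaults are assumptions, not confirmed values.
    """
    env = os.environ if env is None else env
    return CameraConfig(
        device=env.get("CAMERA_DEVICE", "/dev/video0"),
        width=int(env.get("CAMERA_WIDTH", "1280")),
        height=int(env.get("CAMERA_HEIGHT", "720")),
        framerate=int(env.get("CAMERA_FRAMERATE", "30")),
    )
```

Pick values that match a mode your camera actually reports in the `v4l2-ctl --list-formats-ext` output.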

For a quick test without rebuilding, run systemctl edit app on the device to add a drop-in override, then run systemctl restart app.

Change the model

Replace yolo11n.onnx with a different YOLO11 variant to trade speed for accuracy:

Model	Size	Speed	Accuracy
`yolo11n.onnx`	11 MB	Fastest	Good
`yolo11s.onnx`	37 MB	Fast	Better
`yolo11m.onnx`	77 MB	Medium	High

Export a new model to ONNX (TensorRT 10 handles recent opsets — opset 17 is a safe default):

uv run --with ultralytics python3 -c "
from ultralytics import YOLO
model = YOLO('yolo11s.pt')
model.export(format='onnx', opset=17)
"
cp yolo11s.onnx vision-models/overlay/usr/lib/app/models/

Point the app at the new file via a MODEL_PATH Environment line in app.service (or just replace yolo11n.onnx). The engine filename is derived from the model name, so a new model has no prebuilt engine and triggers a fresh on-device build automatically — or commit a prebuilt one into vision-engines/overlay/usr/lib/app/engines/ to keep first boot instant (see Regenerating the engine). To force a rebuild after re-exporting the same filename, delete the cached engine: rm /var/lib/app/*.engine.

Adjust detection sensitivity

Uncomment CONFIDENCE_THRESHOLD / NMS_THRESHOLD in app.service:

# Lower values yield more detections (default 0.5)
Environment=CONFIDENCE_THRESHOLD=0.3
# Non-maximum suppression threshold
Environment=NMS_THRESHOLD=0.45
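
To illustrate how the two thresholds interact, here is a minimal pure-Python sketch of confidence filtering followed by greedy non-maximum suppression. It is not the app's implementation: the function names, the `(box, score, class_id)` tuple layout, the per-class suppression, and the 0.45 default for the NMS threshold are all assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets, conf_threshold=0.5, nms_threshold=0.45):
    """Apply the confidence threshold, then greedy per-class NMS.

    dets: list of (box, score, class_id) tuples.
    A detection is dropped if its score is below conf_threshold, or if it
    overlaps an already-kept detection of the same class with an IoU
    above nms_threshold.
    """
    candidates = sorted(
        (d for d in dets if d[1] >= conf_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for det in candidates:
        if all(det[2] != k[2] or iou(det[0], k[0]) <= nms_threshold for k in kept):
            kept.append(det)
    return kept
```

Lowering the confidence threshold admits weaker detections, while raising the NMS threshold lets more overlapping boxes of the same class survive.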

Regenerating the engine

A TensorRT engine is GPU- and TRT-version-specific, so it's built on matching hardware (the Orin Nano) and committed into the vision-engines overlay. To regenerate it (after swapping the model, or on a TensorRT bump), let the device build one and copy it back into the reference:

scp root@<device-ip>:/var/lib/app/<model>.onnx.fp16.engine \
    vision-engines/overlay/usr/lib/app/engines/

The next avocado build ships it directly via the overlay. If you skip this, the on-device fallback still produces a working engine at first boot — it just costs a few minutes once.

Rebuild after changes

After any change, rebuild and reprovision:

avocado build
avocado provision -r dev
