Model Guide
Supported models, ONNX export, and optimization for YOLOs-CPP.
Supported Models
YOLOE (Open-Vocabulary Detection & Segmentation)
YOLOE (arXiv 2503.07465) is a promptable YOLO variant for
detection and instance segmentation. In Python, you can choose classes with set_classes()
before inference, use prompt-free checkpoints with a large fixed vocabulary, or (in some setups)
visual prompts.
YOLOE and ONNX runtime (Ultralytics)
Ultralytics documents ONNX export for deployment in other applications and devices; ONNX Runtime (including C and C++) is a normal path—this project is not Python-only at inference time.
The important limitation for exported ONNX:
- Text-prompt YOLOE: Ultralytics shows
model.set_classes([...])beforemodel.export(format="onnx"). That bakes the chosen class list into the graph weights / head for that export. A plain.onnxfile does not accept arbitrary new text prompts at runtime the way interactive Pythonset_classes()does. In C++, the strings you pass are labels for the fixed output channels (same count and order as export). - Prompt-free YOLOE-PF (
-pfcheckpoints): These ship with a large built-in vocabulary and do not need prompts at inference. Ultralytics documents this as a fixed predefined list (e.g. 4,585 classes in current docs—confirm against your Ultralytics version). That is broad “open-set” behavior inside that list, not unlimited free-form phrases at ORT runtime. - Visual prompting: Ultralytics states that visual prompting at inference currently requires the Python API. After you fix classes from a visual prompt, you can export to ONNX and deploy that exported model in C++ like any other text-prompt export.
Summary: ONNX deployment is not Python-only; authoring dynamic prompts (beyond relabeling fixed channels) is mostly a Python/export-time concern unless your ONNX explicitly includes a runtime text path matching Ultralytics’ export.
Prompt Modes
| Mode | How it works | When to use |
|---|---|---|
| Text prompt | Class list set in Python before export(format="onnx") — baked into that ONNX |
Fixed custom list, fastest ORT inference |
| Visual prompt | Inference-time visual prompt: Python API; then export retains fixed classes | Prototype in Python, ship an exported ONNX |
| Prompt-free | Large fixed built-in vocabulary in -pf checkpoints (Ultralytics: e.g. thousands of classes) |
Discovery within that vocabulary; no per-query text in ORT |
Available Models
| Model | Backbone | Params | LVIS mAP† | Speed (T4)† | CPU 640×640‡ | ONNX Size |
|---|---|---|---|---|---|---|
| YOLOE-26N-seg | YOLO26 | 3.2M | — | ~8ms (125 FPS) | 97ms (10 FPS) | 10.6 MB |
| YOLOE-26S-seg | YOLO26 | 11M | 29.9% | 6.2ms (161 FPS) | 256ms (3 FPS) | 39.9 MB |
| YOLOE-26M-seg | YOLO26 | 21M | 33.1% | 6.2ms (161 FPS) | ~600ms | ~80 MB |
| YOLOE-26L-seg | YOLO26 | 32.3M | 36.8% | 6.2ms (161 FPS) | ~900ms | ~130 MB |
| YOLOE-26X-seg | YOLO26 | 53M | 38.0% | ~8ms | — | — |
| YOLOE-11S-seg | YOLO11 | 10M | — | 6.2ms | — | — |
| YOLOE-11M-seg | YOLO11 | 20M | — | 6.2ms | — | — |
| YOLOE-11L-seg | YOLO11 | 26.2M | 35.2% | 6.2ms (161 FPS) | — | — |
† Paper numbers (Ultralytics, T4 GPU, full LVIS evaluation).
‡ Measured on Intel i7-1185G7 CPU, ONNX Runtime 1.20.1, 6-class text-prompt export, 200 iterations.
See benchmarks/results/BENCHMARK_REPORT.md for full statistics.
CPU Benchmark — YOLOE vs Closed-Set Baseline
Measured: i7-1185G7, ORT 1.20.1 CPU, 640×640, 200 iters, 30 warmup, dog.jpg 768×576
| Model | Type | Classes | FPS | Avg(ms) | P50(ms) | P90(ms) | P99(ms) | σ(ms) | Mem |
|---|---|---|---|---|---|---|---|---|---|
| yoloe-26n-seg | Open-vocab | 6 | 10 | 97.0 | 92.3 | 118.9 | 181.0 | 23.5 | 218 MB |
| yoloe-26s-seg | Open-vocab | 6 | 3 | 255.6 | 246.1 | 301.0 | 367.4 | 44.0 | 346 MB |
| yolo26n-seg (baseline) | Closed-set | 80 | 9 | 110.6 | 104.9 | 147.8 | 291.4 | 37.5 | 219 MB |
Finding: YOLOE-26n-seg with 6 exported classes is faster and more consistent than the 80-class YOLO26n-seg on CPU (+11% FPS, −12% avg latency, −37% P99). The YOLOE architecture overhead is zero at inference — RepRTA/SAVPE modules are fully re-parameterised into the standard YOLO head on export. Speed advantage grows as the target vocabulary shrinks: fewer classes = fewer postprocessing channels.
All -pf prompt-free variants are also available (e.g. yoloe-26s-seg-pf.pt).
Exporting to ONNX
pip install -U ultralytics
python scripts/export_yoloe_onnx.py
To export a custom class list (same count and order as in C++), use:
python scripts/export_yoloe_classes.py --out models/my-yoloe.onnx dog person car bus bicycle motorcycle truck
Or manually:
from ultralytics import YOLOE
# Text-prompt export (classes baked in)
model = YOLOE("yoloe-26s-seg.pt")
model.set_classes(["person", "car", "bus"])
model.export(format="onnx", nms=False) # nms=False — C++ handles NMS
# Prompt-free export (large fixed vocab — match labels file line count to this export)
model_pf = YOLOE("yoloe-26s-seg-pf.pt")
model_pf.export(format="onnx", nms=False)
Note: Always export with
nms=False. YOLOs-CPP uses class-agnostic NMS (enabled by default inYOLOESegDetector) which prevents duplicate detections across YOLOE's large vocabulary.
ONNX filenames (consistent with scripts/export_yoloe_onnx.py)
| Artifact | When |
|---|---|
yoloe-26s-seg-text.onnx, yoloe-11s-seg-text.onnx, yoloe-26l-seg-text.onnx |
Text-prompt exports from the script (renamed after model.export()) |
yoloe-26s-seg-pf.onnx |
Prompt-free export |
yoloe-26n-seg.onnx (or any name) |
Manual export keeps Ultralytics’ default stem unless you rename |
Use the same class list in C++ as in model.set_classes([...]) for that ONNX.
C++ Usage
#include "yolos/tasks/yoloe.hpp"
using namespace yolos::yoloe;
// ── Text-prompt mode (classes baked in at export time) ──────────────────────
YOLOESegDetector det(
"models/yoloe-26s-seg-text.onnx",
{"person", "car", "bus", "bicycle", "motorcycle", "truck"}
/* useGPU = true by default */
);
auto segs = det.segment(frame, 0.35f, 0.45f);
det.drawSegmentations(frame, segs, 0.45f);
// Same API with a nano export (e.g. manual: yoloe-26n-seg.onnx)
// YOLOESegDetector det_n("models/yoloe-26n-seg.onnx", { ... }, useGPU);
// ── Prompt-free mode (large fixed vocabulary — labels file must match export) ─
YOLOESegDetector det_pf(
"models/yoloe-26s-seg-pf.onnx",
"models/yoloe_pf.names" // one class name per line; line count = ONNX class count
);
auto segs_pf = det_pf.segment(frame);
// ── Relabel fixed channels (same count as export — not new concepts) ─────────
det.setClasses({"person", "car", "bus", "bicycle", "motorcycle", "truck"});
// ── Detection only (no masks): export a YOLOE *detection* checkpoint to ONNX, then:
// YOLOEDetector det_only("models/my_yoloe_det.onnx", {"person", "car"});
// auto dets = det_only.detect(frame);
// ── Factory helper ──────────────────────────────────────────────────────────
auto p = createYOLOESegDetector(
"models/yoloe-26l-seg-text.onnx",
{"person", "bus"},
/*useGPU=*/true,
/*agnosticNms=*/true
);
Running the Demo
Use image_yoloe_seg for still images (image → image) and video_yoloe_seg for videos (MP4 → MP4).
# Text-prompt mode — single image
./image_yoloe_seg data/dog.jpg out.jpg \
models/yoloe-26n-seg.onnx \
"person,car,bus,bicycle,motorcycle,truck" 0
# Text-prompt mode — video
./video_yoloe_seg data/Transmission.mp4 out.mp4 \
models/yoloe-26n-seg.onnx \
"person,car,bus,bicycle,motorcycle,truck" 1
# Prompt-free mode — video (labels file must match PF ONNX class count)
./video_yoloe_seg data/Transmission.mp4 out.mp4 \
models/yoloe-26s-seg-pf.onnx \
models/yoloe_pf.names 1
Running Benchmarks
# Quick single-model benchmark
./benchmarks/build/yolo_unified_benchmark \
image yoloe-26n-seg yoloe-seg \
models/yoloe-26n-seg.onnx \
"person,car,bus,bicycle,motorcycle,truck" \
data/dog.jpg --iterations=200 --warmup=30 --json
# Comprehensive suite — all models including YOLOE, sorted by FPS
./benchmarks/build/yolo_unified_benchmark \
comprehensive data/dog.jpg models --iterations=150 --warmup=20
# GPU benchmark
./benchmarks/build/yolo_unified_benchmark \
comprehensive data/dog.jpg models --gpu --iterations=500 --warmup=50
See benchmarks/results/BENCHMARK_REPORT.md for the
full report including latency percentiles, memory usage, and comparative analysis.
Label Files
| File | Classes | Use Case |
|---|---|---|
coco.names |
80 | Closed-set detection |
yoloe_pf.names (example name) |
Must match your PF ONNX | Prompt-free YOLOE — one line per output class |
For prompt-free models, the line count must equal the number of classes in that ONNX (see
Ultralytics docs / ONNX metadata for your checkpoint). A LVIS category list is one way to
obtain names when your export is LVIS-aligned (~1203 categories); Ultralytics YOLOE-PF docs describe
a larger fixed vocabulary (e.g. 4,585 classes)—use the list that matches your exported
yoloe-*-pf.onnx, not a generic file from another model.
Example (LVIS categories — only if compatible with your export):
from ultralytics.data.dataset import LVISDataset
names = LVISDataset.get_lvis_categories()
with open("models/lvis.names", "w") as f:
f.write("\n".join(names))
Detection
| Model | Params | mAP | Speed (GPU) |
|---|---|---|---|
| YOLOv5n | 1.9M | 28.0 | 6.3ms |
| YOLOv8n | 3.2M | 37.3 | 6.2ms |
| YOLOv11n | 2.6M | 39.5 | 6.5ms |
| YOLO26n | 2.5M | 40.2 | 7.1ms |
Segmentation
| Model | Params | mAP | Speed (GPU) |
|---|---|---|---|
| YOLOv8n-seg | 3.4M | 36.7 | 8.4ms |
| YOLOv11n-seg | 2.9M | 38.9 | 8.1ms |
| YOLO26n-seg | 2.8M | 39.4 | 8.8ms |
Pose Estimation
| Model | Params | mAP | Speed (GPU) |
|---|---|---|---|
| YOLOv8n-pose | 3.3M | 50.4 | 5.9ms |
| YOLOv11n-pose | 2.9M | 52.1 | 5.7ms |
| YOLO26n-pose | 2.8M | 53.0 | 6.2ms |
OBB (Oriented Bounding Boxes)
| Model | Params | Dataset |
|---|---|---|
| YOLOv8n-obb | 3.1M | DOTA |
| YOLOv11n-obb | 2.7M | DOTA |
| YOLO26n-obb | 2.6M | DOTA |
Classification
| Model | Params | Top-1 Acc |
|---|---|---|
| YOLOv8n-cls | 2.7M | 66.6% |
| YOLOv11n-cls | 1.6M | 70.0% |
| YOLO26n-cls | 1.5M | 71.2% |
Exporting to ONNX
Using Ultralytics
from ultralytics import YOLO
# Load model
model = YOLO("yolo11n.pt")
# Export to ONNX
model.export(
format="onnx",
imgsz=640,
opset=12, # ONNX opset version
simplify=False,
half=False,
dynamic=False,
nms=False # NMS is done in C++
)
Export Options
| Option | Value | Notes |
|---|---|---|
opset |
12-17 | Use 12 for max compatibility |
imgsz |
640 | Match your inference resolution |
half |
False | FP32 for accuracy (FP16 optional) |
dynamic |
False | Static shapes for best performance |
nms |
False | C++ handles NMS |
Batch Export Script
python models/export_onnx.py
Label Files
| File | Classes | Use Case |
|---|---|---|
coco.names |
80 | General detection |
Dota.names |
15 | Aerial/satellite OBB |
imagenet_classes.txt |
1000 | Classification |
Model Paths
// Detection
"models/yolo11n.onnx"
// Segmentation
"models/yolo11n-seg.onnx"
// Pose
"models/yolo11n-pose.onnx"
// OBB
"models/yolo11n-obb.onnx"
// Classification
"models/yolo11n-cls.onnx"
Quantization
Quantized models offer: - 2-4x smaller file size - 1.5-2x faster CPU inference - Slight accuracy loss (~1-2% mAP)
Quantize with ONNX
from onnxruntime.quantization import quantize_dynamic
quantize_dynamic(
"model.onnx",
"model_int8.onnx"
)
See quantized_models/yolos_quantization.py for examples.
Custom Models
To use custom-trained models:
- Train with Ultralytics
- Export to ONNX with compatible settings
- Create matching label file
- Load in YOLOs-CPP
yolos::det::YOLODetector detector(
"custom_model.onnx",
"custom_labels.txt",
true
);