Model Guide

Supported models, ONNX export, and optimization for YOLOs-CPP.

Supported Models

YOLOE (Open-Vocabulary Detection & Segmentation)

YOLOE (arXiv 2503.07465) is a promptable YOLO variant for detection and instance segmentation. In Python, you can choose classes with set_classes() before inference, use prompt-free checkpoints with a large fixed vocabulary, or (in some setups) visual prompts.

YOLOE and ONNX runtime (Ultralytics)

Ultralytics documents ONNX export for deployment in other applications and devices; ONNX Runtime (including C and C++) is a normal path—this project is not Python-only at inference time.

The important limitation for exported ONNX:

Text-prompt YOLOE: Ultralytics shows model.set_classes([...]) before model.export(format="onnx"). That bakes the chosen class list into the graph weights / head for that export. A plain .onnx file does not accept arbitrary new text prompts at runtime the way interactive Python set_classes() does. In C++, the strings you pass are labels for the fixed output channels (same count and order as export).
Prompt-free YOLOE-PF (-pf checkpoints): These ship with a large built-in vocabulary and do not need prompts at inference. Ultralytics documents this as a fixed predefined list (e.g. 4,585 classes in current docs—confirm against your Ultralytics version). That is broad “open-set” behavior inside that list, not unlimited free-form phrases at ORT runtime.
Visual prompting: Ultralytics states that visual prompting at inference currently requires the Python API. After you fix classes from a visual prompt, you can export to ONNX and deploy that exported model in C++ like any other text-prompt export.

Summary: ONNX deployment is not Python-only; authoring dynamic prompts (beyond relabeling fixed channels) is mostly a Python/export-time concern unless your ONNX explicitly includes a runtime text path matching Ultralytics’ export.

Prompt Modes

Mode	How it works	When to use
Text prompt	Class list set in Python before `export(format="onnx")` — baked into that ONNX	Fixed custom list, fastest ORT inference
Visual prompt	Inference-time visual prompt: Python API; then export retains fixed classes	Prototype in Python, ship an exported ONNX
Prompt-free	Large fixed built-in vocabulary in `-pf` checkpoints (Ultralytics: e.g. thousands of classes)	Discovery within that vocabulary; no per-query text in ORT

Available Models

Model	Backbone	Params	LVIS mAP†	Speed (T4)†	CPU 640×640‡	ONNX Size
YOLOE-26N-seg	YOLO26	3.2M	—	~8ms (125 FPS)	97ms (10 FPS)	10.6 MB
YOLOE-26S-seg	YOLO26	11M	29.9%	6.2ms (161 FPS)	256ms (3 FPS)	39.9 MB
YOLOE-26M-seg	YOLO26	21M	33.1%	6.2ms (161 FPS)	~600ms	~80 MB
YOLOE-26L-seg	YOLO26	32.3M	36.8%	6.2ms (161 FPS)	~900ms	~130 MB
YOLOE-26X-seg	YOLO26	53M	38.0%	~8ms	—	—
YOLOE-11S-seg	YOLO11	10M	—	6.2ms	—	—
YOLOE-11M-seg	YOLO11	20M	—	6.2ms	—	—
YOLOE-11L-seg	YOLO11	26.2M	35.2%	6.2ms (161 FPS)	—	—

† Paper numbers (Ultralytics, T4 GPU, full LVIS evaluation).
‡ Measured on Intel i7-1185G7 CPU, ONNX Runtime 1.20.1, 6-class text-prompt export, 200 iterations.
See benchmarks/results/BENCHMARK_REPORT.md for full statistics.

CPU Benchmark — YOLOE vs Closed-Set Baseline

Measured: i7-1185G7, ORT 1.20.1 CPU, 640×640, 200 iters, 30 warmup, dog.jpg 768×576

Model	Type	Classes	FPS	Avg(ms)	P50(ms)	P90(ms)	P99(ms)	σ(ms)	Mem
yoloe-26n-seg	Open-vocab	6	10	97.0	92.3	118.9	181.0	23.5	218 MB
yoloe-26s-seg	Open-vocab	6	3	255.6	246.1	301.0	367.4	44.0	346 MB
yolo26n-seg (baseline)	Closed-set	80	9	110.6	104.9	147.8	291.4	37.5	219 MB

Finding: YOLOE-26n-seg with 6 exported classes is faster and more consistent than the 80-class YOLO26n-seg on CPU (+11% FPS, −12% avg latency, −37% P99). The YOLOE architecture overhead is zero at inference — RepRTA/SAVPE modules are fully re-parameterised into the standard YOLO head on export. Speed advantage grows as the target vocabulary shrinks: fewer classes = fewer postprocessing channels.

All -pf prompt-free variants are also available (e.g. yoloe-26s-seg-pf.pt).

Exporting to ONNX

pip install -U ultralytics
python scripts/export_yoloe_onnx.py

To export a custom class list (same count and order as in C++), use:

python scripts/export_yoloe_classes.py --out models/my-yoloe.onnx dog person car bus bicycle motorcycle truck

Or manually:

from ultralytics import YOLOE

# Text-prompt export (classes baked in)
model = YOLOE("yoloe-26s-seg.pt")
model.set_classes(["person", "car", "bus"])
model.export(format="onnx", nms=False)   # nms=False — C++ handles NMS

# Prompt-free export (large fixed vocab — match labels file line count to this export)
model_pf = YOLOE("yoloe-26s-seg-pf.pt")
model_pf.export(format="onnx", nms=False)

Note: Always export with nms=False. YOLOs-CPP uses class-agnostic NMS (enabled by default in YOLOESegDetector) which prevents duplicate detections across YOLOE's large vocabulary.

ONNX filenames (consistent with `scripts/export_yoloe_onnx.py`)

Artifact	When
`yoloe-26s-seg-text.onnx`, `yoloe-11s-seg-text.onnx`, `yoloe-26l-seg-text.onnx`	Text-prompt exports from the script (renamed after `model.export()`)
`yoloe-26s-seg-pf.onnx`	Prompt-free export
`yoloe-26n-seg.onnx` (or any name)	Manual export keeps Ultralytics’ default stem unless you rename

Use the same class list in C++ as in model.set_classes([...]) for that ONNX.

C++ Usage

#include "yolos/tasks/yoloe.hpp"
using namespace yolos::yoloe;

// ── Text-prompt mode (classes baked in at export time) ──────────────────────
YOLOESegDetector det(
    "models/yoloe-26s-seg-text.onnx",
    {"person", "car", "bus", "bicycle", "motorcycle", "truck"}
    /* useGPU = true by default */
);
auto segs = det.segment(frame, 0.35f, 0.45f);
det.drawSegmentations(frame, segs, 0.45f);

// Same API with a nano export (e.g. manual: yoloe-26n-seg.onnx)
// YOLOESegDetector det_n("models/yoloe-26n-seg.onnx", { ... }, useGPU);

// ── Prompt-free mode (large fixed vocabulary — labels file must match export) ─
YOLOESegDetector det_pf(
    "models/yoloe-26s-seg-pf.onnx",
    "models/yoloe_pf.names"   // one class name per line; line count = ONNX class count
);
auto segs_pf = det_pf.segment(frame);

// ── Relabel fixed channels (same count as export — not new concepts) ─────────
det.setClasses({"person", "car", "bus", "bicycle", "motorcycle", "truck"});

// ── Detection only (no masks): export a YOLOE *detection* checkpoint to ONNX, then:
// YOLOEDetector det_only("models/my_yoloe_det.onnx", {"person", "car"});
// auto dets = det_only.detect(frame);

// ── Factory helper ──────────────────────────────────────────────────────────
auto p = createYOLOESegDetector(
    "models/yoloe-26l-seg-text.onnx",
    {"person", "bus"},
    /*useGPU=*/true,
    /*agnosticNms=*/true
);

Running the Demo

Use image_yoloe_seg for still images (image → image) and video_yoloe_seg for videos (MP4 → MP4).

# Text-prompt mode — single image
./image_yoloe_seg data/dog.jpg out.jpg \
    models/yoloe-26n-seg.onnx \
    "person,car,bus,bicycle,motorcycle,truck" 0

# Text-prompt mode — video
./video_yoloe_seg data/Transmission.mp4 out.mp4 \
    models/yoloe-26n-seg.onnx \
    "person,car,bus,bicycle,motorcycle,truck" 1

# Prompt-free mode — video (labels file must match PF ONNX class count)
./video_yoloe_seg data/Transmission.mp4 out.mp4 \
    models/yoloe-26s-seg-pf.onnx \
    models/yoloe_pf.names 1

Running Benchmarks

# Quick single-model benchmark
./benchmarks/build/yolo_unified_benchmark \
    image yoloe-26n-seg yoloe-seg \
    models/yoloe-26n-seg.onnx \
    "person,car,bus,bicycle,motorcycle,truck" \
    data/dog.jpg --iterations=200 --warmup=30 --json

# Comprehensive suite — all models including YOLOE, sorted by FPS
./benchmarks/build/yolo_unified_benchmark \
    comprehensive data/dog.jpg models --iterations=150 --warmup=20

# GPU benchmark
./benchmarks/build/yolo_unified_benchmark \
    comprehensive data/dog.jpg models --gpu --iterations=500 --warmup=50

See benchmarks/results/BENCHMARK_REPORT.md for the full report including latency percentiles, memory usage, and comparative analysis.

Label Files

File	Classes	Use Case
`coco.names`	80	Closed-set detection
`yoloe_pf.names` (example name)	Must match your PF ONNX	Prompt-free YOLOE — one line per output class

For prompt-free models, the line count must equal the number of classes in that ONNX (see Ultralytics docs / ONNX metadata for your checkpoint). A LVIS category list is one way to obtain names when your export is LVIS-aligned (~1203 categories); Ultralytics YOLOE-PF docs describe a larger fixed vocabulary (e.g. 4,585 classes)—use the list that matches your exported yoloe-*-pf.onnx, not a generic file from another model.

Example (LVIS categories — only if compatible with your export):

from ultralytics.data.dataset import LVISDataset
names = LVISDataset.get_lvis_categories()
with open("models/lvis.names", "w") as f:
    f.write("\n".join(names))

Detection

Model	Params	mAP	Speed (GPU)
YOLOv5n	1.9M	28.0	6.3ms
YOLOv8n	3.2M	37.3	6.2ms
YOLOv11n	2.6M	39.5	6.5ms
YOLO26n	2.5M	40.2	7.1ms

Segmentation

Model	Params	mAP	Speed (GPU)
YOLOv8n-seg	3.4M	36.7	8.4ms
YOLOv11n-seg	2.9M	38.9	8.1ms
YOLO26n-seg	2.8M	39.4	8.8ms

Pose Estimation

Model	Params	mAP	Speed (GPU)
YOLOv8n-pose	3.3M	50.4	5.9ms
YOLOv11n-pose	2.9M	52.1	5.7ms
YOLO26n-pose	2.8M	53.0	6.2ms

OBB (Oriented Bounding Boxes)

Model	Params	Dataset
YOLOv8n-obb	3.1M	DOTA
YOLOv11n-obb	2.7M	DOTA
YOLO26n-obb	2.6M	DOTA

Classification

Model	Params	Top-1 Acc
YOLOv8n-cls	2.7M	66.6%
YOLOv11n-cls	1.6M	70.0%
YOLO26n-cls	1.5M	71.2%

Exporting to ONNX

Using Ultralytics

from ultralytics import YOLO

# Load model
model = YOLO("yolo11n.pt")

# Export to ONNX
model.export(
    format="onnx",
    imgsz=640,
    opset=12,        # ONNX opset version
    simplify=False,
    half=False,
    dynamic=False,
    nms=False        # NMS is done in C++
)

Export Options

Option	Value	Notes
`opset`	12-17	Use 12 for max compatibility
`imgsz`	640	Match your inference resolution
`half`	False	FP32 for accuracy (FP16 optional)
`dynamic`	False	Static shapes for best performance
`nms`	False	C++ handles NMS

Batch Export Script

python models/export_onnx.py

Label Files

File	Classes	Use Case
`coco.names`	80	General detection
`Dota.names`	15	Aerial/satellite OBB
`imagenet_classes.txt`	1000	Classification

Model Paths

// Detection
"models/yolo11n.onnx"

// Segmentation
"models/yolo11n-seg.onnx"

// Pose
"models/yolo11n-pose.onnx"

// OBB
"models/yolo11n-obb.onnx"

// Classification
"models/yolo11n-cls.onnx"

Quantization

Quantized models offer: - 2-4x smaller file size - 1.5-2x faster CPU inference - Slight accuracy loss (~1-2% mAP)

Quantize with ONNX

from onnxruntime.quantization import quantize_dynamic

quantize_dynamic(
    "model.onnx",
    "model_int8.onnx"
)

See quantized_models/yolos_quantization.py for examples.

Custom Models

To use custom-trained models:

Train with Ultralytics
Export to ONNX with compatible settings
Create matching label file
Load in YOLOs-CPP

yolos::det::YOLODetector detector(
    "custom_model.onnx",
    "custom_labels.txt",
    true
);

Model Guide

Supported Models

YOLOE (Open-Vocabulary Detection & Segmentation)

YOLOE and ONNX runtime (Ultralytics)

Prompt Modes

Available Models

CPU Benchmark — YOLOE vs Closed-Set Baseline

Exporting to ONNX

ONNX filenames (consistent with scripts/export_yoloe_onnx.py)

C++ Usage

Running the Demo

Running Benchmarks

Label Files

Detection

Segmentation

Pose Estimation

OBB (Oriented Bounding Boxes)

Classification

Exporting to ONNX

Using Ultralytics

Export Options

Batch Export Script

Label Files

Model Paths

Quantization

Quantize with ONNX

Custom Models

ONNX filenames (consistent with `scripts/export_yoloe_onnx.py`)