Whisper: The default. Best accuracy, widest language support. Runs on the Metal GPU with 10+ models from 75 MB to 2.9 GB.
NVIDIA: Fastest option. Runs on the Neural Engine so your GPU stays free. Supports CTC vocabulary boosting, where your dictionary entries bias the acoustic decoder directly.
Apple Speech: Apple's native framework. Available on macOS Tahoe and later with system-level optimization.
Each model downloads once and is cached locally. Switch between models whenever you want.
| Model | Size | Speed | Notes |
|---|---|---|---|
| Tiny | 75 MB | Fastest | Quick, lower accuracy |
| Base | 142 MB | Fast | Good for simple dictation |
| Small | 466 MB | Medium | Balanced |
| Medium | 1.5 GB | Slow | High accuracy |
| Large V3 | 2.9 GB | Slowest | Maximum accuracy |
| Large V3 Turbo | 1.5 GB | Fast | 8x faster than Large V3 |
| Large V3 Turbo Q5 | 547 MB | Fast | Default: best balance of speed, size, accuracy |
| Large V3 Q5 | 1.1 GB | Medium | Quantized, smaller file |
| Distil Large V3 | 756 MB | Very Fast | 6x faster than Large V3 |
| Distil Small (EN) | 166 MB | Very Fast | English only |
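
As a rough illustration, the table can be treated as a small catalog and filtered by a download budget. This is only a sketch: `Model`, `CATALOG`, and `largest_within` are hypothetical names, not part of Whisperer, and the sizes are copied from the table with 1 GB taken as 1000 MB. File size is used here as a stand-in for "biggest model that fits", which is not the same as most accurate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    size_mb: int                # download size from the table, in MB
    english_only: bool = False  # only set where the table says so


# Hypothetical catalog mirroring the table above.
CATALOG = [
    Model("Tiny", 75),
    Model("Base", 142),
    Model("Small", 466),
    Model("Medium", 1500),
    Model("Large V3", 2900),
    Model("Large V3 Turbo", 1500),
    Model("Large V3 Turbo Q5", 547),
    Model("Large V3 Q5", 1100),
    Model("Distil Large V3", 756),
    Model("Distil Small (EN)", 166, english_only=True),
]


def largest_within(budget_mb: int, multilingual: bool = False) -> Optional[Model]:
    """Return the largest model that fits the budget, or None if none fits."""
    candidates = [
        m for m in CATALOG
        if m.size_mb <= budget_mb and not (multilingual and m.english_only)
    ]
    return max(candidates, key=lambda m: m.size_mb, default=None)
```

With a 600 MB budget this picks Large V3 Turbo Q5, which matches the default in the table.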
Start with Whisper. It's the default for a reason: best accuracy and language coverage. Want raw speed and mostly dictate in English? Try NVIDIA. Apple Speech needs macOS Tahoe or later.
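
The advice above boils down to a small decision rule. The sketch below is illustrative only: the function names are hypothetical, and it assumes macOS Tahoe corresponds to major version 26.

```python
TAHOE_MAJOR = 26  # assumption: macOS Tahoe reports major version 26


def available_backends(macos_major: int) -> list[str]:
    """List the backends usable on a given macOS major version."""
    backends = ["Whisper", "NVIDIA"]
    if macos_major >= TAHOE_MAJOR:
        backends.append("Apple Speech")  # needs macOS Tahoe or later
    return backends


def suggest_backend(mostly_english: bool, wants_speed: bool) -> str:
    """Whisper by default; NVIDIA when speed matters and dictation is mostly English."""
    if mostly_english and wants_speed:
        return "NVIDIA"
    return "Whisper"
```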
Large V3 Turbo Q5 (547 MB) hits the sweet spot for most people. Smaller models (Tiny, Base) run faster on older hardware. Large V3 (2.9 GB) gives you maximum accuracy when audio quality is rough.
Yes. Change the backend or model size and the new one loads while the old one unloads. No restart.
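
One way to picture a restart-free switch is to start loading the new model and unloading the old one at the same time, then point the app at the new model once it is ready. This is a minimal sketch under that assumption, not a description of how Whisperer does it; `FakeModel` and `Transcriber` are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeModel:
    """Stand-in for a real speech model; load/unload only flip a flag."""

    def __init__(self, name: str):
        self.name = name
        self.loaded = False

    def load(self) -> "FakeModel":
        self.loaded = True
        return self

    def unload(self) -> None:
        self.loaded = False


class Transcriber:
    def __init__(self, model: FakeModel):
        self.active = model.load()

    def switch_to(self, new_model: FakeModel) -> FakeModel:
        """Load the new model and unload the old one concurrently."""
        old = self.active
        with ThreadPoolExecutor(max_workers=2) as pool:
            loading = pool.submit(new_model.load)
            unloading = pool.submit(old.unload)
            self.active = loading.result()  # swap once the new model is ready
            unloading.result()              # surface any unload error
        return self.active
```

A real implementation would also need to pause or queue transcription while the swap is in progress, since the old model is being released during that window.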
Tiny uses ~100 MB. Large V3 uses ~3 GB. The default (Large V3 Turbo Q5) sits around 600 MB. Whisperer checks memory before loading and warns you if there's not enough.
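
The pre-load check can be sketched as a simple comparison between a model's approximate footprint and the memory currently free. The figures below come from the paragraph above; `APPROX_RAM_MB` and `check_memory` are hypothetical names, and the free-memory value is passed in rather than read from the system.

```python
# Approximate memory footprints in MB, taken from the figures quoted above.
APPROX_RAM_MB = {
    "Tiny": 100,
    "Large V3 Turbo Q5": 600,
    "Large V3": 3000,
}


def check_memory(model: str, available_mb: int) -> tuple[bool, str]:
    """Return (ok, message) for loading `model` with `available_mb` free."""
    needed = APPROX_RAM_MB[model]
    if available_mb < needed:
        return False, (
            f"{model} needs about {needed} MB "
            f"but only {available_mb} MB is free"
        )
    return True, f"{model} fits ({needed} MB needed, {available_mb} MB free)"
```

For example, `check_memory("Large V3", 2048)` reports a shortfall, while `check_memory("Tiny", 2048)` passes.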