Whisperer supports three transcription backends. Each runs on different hardware and has different tradeoffs. Here's the quick comparison.
At a glance
| | Whisper | NVIDIA | Apple Speech |
|---|---|---|---|
| Runs on | Metal GPU | Neural Engine | System ML |
| Languages | 99+ | 25 | Whatever Apple supports |
| Speed | Fast | Fastest | Depends |
| Best for | Accuracy, language coverage | Raw speed | Zero setup |
Whisper: The Default
The default engine. It runs on the Apple Silicon GPU via Metal.
What's good:
- 99+ languages
- 10+ model sizes (75 MB to 2.9 GB)
- Large V3 Turbo Q5 (547 MB) is a solid default: fast, accurate, reasonable size
- Quantized models (Q5) cut file size by ~60% without much accuracy loss
When to use: Start here. Works well for most people.
Recommended model: Large V3 Turbo Q5 (547 MB). Fast, accurate, not huge.
NVIDIA: The Fast One
The NVIDIA engine runs on the Apple Neural Engine (ANE) via Core ML. The ANE is a separate compute unit from the GPU, so the GPU stays free for other work.
What's good:
- Fastest option
- CTC vocabulary boosting: your dictionary entries bias the decoder directly
- Leaves GPU available for video editing, ML, etc.
Limitations:
- 25 languages (not 99+)
- Fewer model options
When to use: You want speed, you work in a supported language, and you have specialized terminology that benefits from vocabulary boosting.
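To make "bias the decoder directly" concrete, here is a deliberately simplified sketch of token-level boosting in a greedy CTC decoder. It is not Whisperer's or NVIDIA's implementation: the function name, the `boost` value, and the per-frame score dictionaries are all assumptions for illustration, and real systems typically boost whole words or phrases rather than single tokens.

```python
BLANK = "_"  # CTC blank symbol (hypothetical choice for this sketch)


def greedy_ctc_decode(frames, boosted_tokens=frozenset(), boost=0.0):
    """Greedy CTC decode with an optional per-token score bonus.

    frames: list of {token: log_prob} dicts, one per audio frame.
    boosted_tokens: tokens that occur in the user's dictionary entries.
    boost: bonus added to the log-probability of each boosted token.
    """
    best_path = []
    for scores in frames:
        # Bias the scores before picking the best token for this frame.
        biased = {
            tok: lp + (boost if tok in boosted_tokens else 0.0)
            for tok, lp in scores.items()
        }
        best_path.append(max(biased, key=biased.get))

    # Standard CTC collapse: merge repeated tokens, then drop blanks.
    out = []
    prev = None
    for tok in best_path:
        if tok != prev and tok != BLANK:
            out.append(tok)
        prev = tok
    return "".join(out)


frames = [
    {"_": -3.0, "c": -0.6, "k": -0.9},  # "c" narrowly beats "k"
    {"_": -0.1, "c": -3.0, "k": -3.0},
    {"_": -2.0, "a": -0.2},
]
plain = greedy_ctc_decode(frames)
boosted = greedy_ctc_decode(frames, boosted_tokens={"k"}, boost=0.5)
```

In this toy input the bonus tips the first frame from "c" to "k", which is the effect a dictionary entry is meant to have on a close call.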
Apple Speech: The Native Option
macOS 26+ only. Uses Apple's built-in SpeechAnalyzer API from the Speech framework. No models to download.
When to use: You're on macOS 26 or later and want zero setup or a lightweight option.
How they work together
All three share the same interface:
- Switch engines without restarting
- Dictionary, filler removal, and AI post-processing work identically
- Same 16 kHz mono Float32 audio input
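For a sense of what the shared input format means in practice, here is a minimal sketch that turns interleaved 16-bit PCM into 16 kHz mono 32-bit floats. The helper name `to_engine_input` is hypothetical, and the linear-interpolation resampler is an assumption chosen for brevity. It has no anti-aliasing filter, so a real pipeline would use a proper converter instead.

```python
from array import array

TARGET_RATE = 16_000  # Hz, the shared input rate listed above


def to_engine_input(pcm16, channels, source_rate):
    """Convert interleaved int16 PCM to 16 kHz mono float32 samples.

    Hypothetical helper for illustration only: it downmixes by averaging
    channels and resamples by linear interpolation.
    """
    if channels < 1 or source_rate <= 0:
        raise ValueError("channels and source_rate must be positive")
    if len(pcm16) % channels:
        raise ValueError("sample count is not a multiple of channel count")

    # 1. Downmix to mono and scale int16 values into roughly [-1.0, 1.0).
    frames = len(pcm16) // channels
    mono = [
        sum(pcm16[i * channels + c] for c in range(channels)) / channels / 32768.0
        for i in range(frames)
    ]

    # 2. Resample to 16 kHz by interpolating between neighbouring samples.
    out_len = frames * TARGET_RATE // source_rate
    out = array("f")  # typecode "f" stores single-precision floats
    for n in range(out_len):
        pos = n * source_rate / TARGET_RATE
        i = int(pos)
        frac = pos - i
        nxt = mono[min(i + 1, frames - 1)]
        out.append(mono[i] * (1.0 - frac) + nxt * frac)
    return out


# One second of constant-level stereo audio at 48 kHz.
samples = to_engine_input([16384, 16384] * 48_000, channels=2, source_rate=48_000)
```

Because every engine takes the same format, a conversion step like this only has to happen once, before the audio reaches whichever backend is selected.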
Live Preview
The live preview uses a separate lightweight model on the Neural Engine. So if you're using Whisper (GPU), the preview runs on the ANE with no contention. If you're using NVIDIA (ANE), the EOU (end-of-utterance) model shares the ANE with it, but it's optimized to stay out of the way.
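The contention rule above can be written down as a small lookup. This is only a sketch: the engine keys and the function name are hypothetical, and Apple Speech is left as unknown because the hardware it runs on is not stated here.

```python
# Compute unit per engine, taken from the comparison table. Apple Speech is
# None because its hardware is not specified in this document.
ENGINE_UNIT = {"whisper": "gpu", "nvidia": "ane", "apple-speech": None}
PREVIEW_UNIT = "ane"  # the live preview model runs on the Neural Engine


def preview_shares_unit(engine):
    """Return True or False when known, or None if the engine's unit is unstated."""
    unit = ENGINE_UNIT[engine]
    if unit is None:
        return None
    return unit == PREVIEW_UNIT
```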
Recommendation
Start with Whisper (Large V3 Turbo Q5). If you need more speed and work in one of NVIDIA's 25 supported languages, try NVIDIA. You can always switch back.
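The same advice as a decision function, for anyone who prefers to read it as code. This is a sketch under stated assumptions: the function, its parameters, and the engine keys are hypothetical, and the set of NVIDIA-supported languages is passed in by the caller because the actual list is not given here.

```python
def recommend_engine(language, nvidia_languages, needs_speed=False,
                     wants_zero_setup=False, macos_major=15):
    """Pick an engine following the guidance in this comparison.

    nvidia_languages: caller-supplied set of language codes the NVIDIA
    engine supports (placeholder values below, not the real list).
    """
    # Apple Speech needs macOS 26+ and no model download.
    if wants_zero_setup and macos_major >= 26:
        return "apple-speech"
    # NVIDIA is the fastest option, but only for its supported languages.
    if needs_speed and language in nvidia_languages:
        return "nvidia"
    # Whisper is the default and covers the most languages.
    return "whisper"


supported = {"en", "de"}  # placeholder set for illustration only
choice = recommend_engine("en", supported, needs_speed=True)
```

Whisper is the fall-through case on purpose: it matches "start here" and is the only engine with no language or OS precondition in this comparison.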
More details: best Whisper model for dictation, offline transcription engines.