Whisperer runs three different transcription engines. They use different hardware and have different strengths. Here's how to pick.
Overview#
| | Whisper | NVIDIA | Apple Speech |
|---|---|---|---|
| Library | Local Whisper | CoreML | SpeechAnalyzer |
| Hardware | Metal GPU | Apple Neural Engine | System ML |
| Languages | 99+ | 25 | System languages |
| Models | 10+ (75MB–2.9GB) | Optimized | Built-in |
| Accuracy | Highest | Very high | Good |
| Speed | Fast (GPU) | Fastest (ANE) | Fast |
| Min macOS | 14+ | 14+ (Apple Silicon) | 26+ (Tahoe) |
Whisper — The Default#
OpenAI's open-source model, running locally. No Python, no dependencies.
How it works#
Processes audio after you release the record button. Runs on Apple Silicon's Metal GPU.
Model sizes#
| Model | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| Tiny | 75MB | Fastest | Basic | Quick testing |
| Base | 142MB | Very fast | Good | Low-storage devices |
| Small | 466MB | Fast | High | Daily dictation |
| Medium | 1.5GB | Moderate | Very high | Accuracy-focused |
| Large V3 | 2.9GB | Slower | Highest | Maximum accuracy |
| Large V3 Turbo | ~1.5GB | Fast | Very high | Best balanced |
Start here: Large V3 Turbo (~1.5GB). Accuracy close to Large V3, at much higher speed. Good default.
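If disk space is the constraint, the table can be read as a simple lookup: take the most accurate model that fits, and break ties on speed. A minimal sketch of that reasoning follows. The numeric ranks are just an ordinal encoding of the table's labels (an assumption for illustration), and `best_model` is a hypothetical helper, not part of Whisperer.

```python
# (name, size in MB, accuracy rank, speed rank); higher rank is better.
# Ranks encode the table's labels in order, e.g. Basic=1 ... Highest=5
# and Slower=1 ... Fastest=5. They are illustrative, not measured values.
MODELS = [
    ("Tiny", 75, 1, 5),
    ("Base", 142, 2, 4),
    ("Small", 466, 3, 3),
    ("Medium", 1500, 4, 2),
    ("Large V3", 2900, 5, 1),
    ("Large V3 Turbo", 1500, 4, 3),  # size is approximate (~1.5GB)
]


def best_model(disk_budget_mb):
    """Return the most accurate model that fits, preferring speed on ties."""
    candidates = [m for m in MODELS if m[1] <= disk_budget_mb]
    if not candidates:
        return None
    name, _size, _accuracy, _speed = max(candidates, key=lambda m: (m[2], m[3]))
    return name


if __name__ == "__main__":
    for budget in (100, 500, 2000, 5000):
        print(f"{budget} MB budget: {best_model(budget)}")
```

With a 2GB budget, Medium and Large V3 Turbo tie on accuracy, and the speed tiebreak picks Turbo, which matches the recommendation above.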
When to use it#
- High accuracy matters
- Non-English (99+ languages)
- You want control over the speed/accuracy tradeoff
- File transcription
NVIDIA — The fast one#
CTC-based speech recognition running on Apple Silicon's Neural Engine via CoreML. Fastest option, and has a feature the others don't: vocabulary boosting.
How it works#
NVIDIA runs on the Neural Engine, which is separate from the GPU, so it can run alongside GPU work without contention. The Neural Engine also powers Whisperer's live preview (~300ms latency).
Vocabulary boosting#
This is why NVIDIA matters. Your personal dictionary entries bias the CTC decoder at the acoustic level:
- Project terms recognized more accurately
- Names, acronyms, jargon decoded correctly
- Happens during decoding, not as post-processing cleanup
Only NVIDIA supports this. Whisper and Apple Speech use attention-based architectures that can't do direct decoder biasing. If you have lots of custom terminology, NVIDIA with a configured dictionary is worth trying.
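To make "biasing during decoding" concrete, here is a toy sketch of the idea. It is not Whisperer's or NVIDIA's implementation: production decoders typically apply the bonus inside a beam search, while this version uses a greedy per-frame loop for brevity. The frame scores, the `bonus` value, and the function name are all made up for illustration.

```python
BLANK = "_"  # CTC blank symbol (toy choice)


def ctc_greedy_decode(frames, boost_terms=(), bonus=0.0):
    """Greedy CTC decode of a single word, optionally favouring dictionary terms.

    frames: list of {token: log_prob} dicts, one per audio frame.
    boost_terms: personal-dictionary entries to favour.
    bonus: added to a token's log-prob when choosing it keeps the decoded
           text on track to spell one of the boost terms.
    """
    text = ""
    prev = BLANK
    for frame in frames:
        def score(token):
            if token == BLANK:
                return frame[token]
            # A repeated token emits nothing new under CTC's collapse rule.
            result = text if token == prev else text + token
            on_track = any(term.startswith(result) for term in boost_terms)
            return frame[token] + (bonus if on_track else 0.0)

        best = max(frame, key=score)
        if best != BLANK and best != prev:
            text += best  # new character: not a blank, not a repeat
        prev = best
    return text


# Hypothetical acoustic scores: the first sound is ambiguous between "c" and "k".
FRAMES = [
    {"c": -0.6, "k": -0.9, "_": -3.0},
    {"c": -0.7, "k": -0.8, "_": -2.5},
    {"u": -0.1, "_": -3.0},
    {"_": -0.2, "u": -2.0},
    {"b": -0.2, "p": -1.8},
    {"e": -0.1, "_": -2.5},
]

if __name__ == "__main__":
    print(ctc_greedy_decode(FRAMES))                  # no dictionary
    print(ctc_greedy_decode(FRAMES, ["kube"], 0.5))   # dictionary contains "kube"
```

Without a dictionary the ambiguous first frames resolve to "cube". With "kube" in the dictionary, the bonus tips the same frames toward "kube" while the audio is still being decoded, which is the difference from fixing the text afterwards.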
When to use NVIDIA#
- Speed is priority
- You have a big personal dictionary
- You want GPU free for video/ML work
- You work in one of its 25 supported languages
Apple Speech — The native option#
macOS's built-in SpeechAnalyzer framework. Only on macOS 26 (Tahoe) and later. Zero setup.
How it works#
Uses the system speech recognition pipeline. No models to download. Apple handles everything.
When to use it#
- You're on macOS 26+
- You don't want to download models
- You're low on disk space
- Accuracy isn't critical
Actual benchmarks#
Accuracy (English)#
| Engine | Short Phrases | Long Dictation | Technical Terms | Code Terms |
|---|---|---|---|---|
| Whisper (Large V3 Turbo) | Excellent | Excellent | Very good | Good |
| NVIDIA (v3) | Excellent | Very good | Excellent* | Good |
| Apple Speech | Good | Good | Fair | Poor |
*With vocabulary boosting enabled and dictionary configured.
Speed (Apple M1 Pro, typical 10-second recording)#
| Engine | Model | Processing Time | Real-Time Factor |
|---|---|---|---|
| NVIDIA v3 | Default | ~0.8s | 0.08x |
| Whisper | Large V3 Turbo | ~2.5s | 0.25x |
| Whisper | Small | ~1.2s | 0.12x |
| Whisper | Large V3 | ~5.0s | 0.50x |
| Apple Speech | Built-in | ~1.5s | 0.15x |
On this benchmark, NVIDIA is about 3x faster than Whisper Large V3 Turbo (~0.8s vs ~2.5s) for typical dictation. If speed is your priority and you don't need 99+ languages, NVIDIA is the clear winner.
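The Real-Time Factor column is processing time divided by audio length, so lower means faster. A minimal sketch that recomputes it from the approximate timings in the speed table (the function and variable names are illustrative, not part of Whisperer):

```python
# Approximate processing times in seconds from the speed table,
# all for a 10-second recording on an M1 Pro.
AUDIO_SECONDS = 10.0
PROCESSING_SECONDS = {
    "NVIDIA v3": 0.8,
    "Whisper Large V3 Turbo": 2.5,
    "Whisper Small": 1.2,
    "Whisper Large V3": 5.0,
    "Apple Speech": 1.5,
}


def real_time_factor(processing_s, audio_s):
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    return processing_s / audio_s


def speedup(baseline, candidate):
    """How many times faster `candidate` is than `baseline`."""
    return PROCESSING_SECONDS[baseline] / PROCESSING_SECONDS[candidate]


if __name__ == "__main__":
    for name, seconds in PROCESSING_SECONDS.items():
        print(f"{name}: RTF {real_time_factor(seconds, AUDIO_SECONDS):.2f}x")
    ratio = speedup("Whisper Large V3 Turbo", "NVIDIA v3")
    print(f"NVIDIA v3 vs Large V3 Turbo: {ratio:.1f}x faster")
```

Because RTF scales with recording length only if processing time does, treat these figures as a guide for short dictation rather than a prediction for long files.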
Resource usage#
| Engine | Hardware | GPU Free? | ANE Free? | Battery |
|---|---|---|---|---|
| Whisper | Metal GPU | No | Yes | Moderate |
| NVIDIA | Neural Engine | Yes | No | Low |
| Apple Speech | System ML | Varies | Varies | Low |
NVIDIA leaves the GPU free, which matters if you're editing video or training models while dictating.
Which one?#
- Default choice: Whisper Large V3 Turbo. Good accuracy, reasonable speed, 99+ languages. Start here.
- Speed priority: NVIDIA. 2–3x faster, GPU stays free. Good if you work in English with lots of custom terms.
- No setup: Apple Speech. No downloads, works immediately. Fine for casual use on macOS 26+.
- Max accuracy: Whisper Large V3 (2.9GB). Most accurate. Use for file transcription or when errors cost you.
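The guide above can be sketched as a small chooser. The order in which the rules are checked is an assumption made for this example (the guide itself does not rank them), and `Needs` and `recommend_engine` are hypothetical names, not a Whisperer API.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    macos_major: int              # e.g. 14 or 26
    speed_first: bool = False     # latency matters most
    max_accuracy: bool = False    # errors are costly
    no_downloads: bool = False    # avoid downloading models


def recommend_engine(needs):
    """Map the 'Which one?' guide to a single recommendation."""
    # Apple Speech is only available on macOS 26 (Tahoe) and later.
    if needs.no_downloads and needs.macos_major >= 26:
        return "Apple Speech"
    if needs.max_accuracy:
        return "Whisper Large V3"
    if needs.speed_first:
        return "NVIDIA"
    return "Whisper Large V3 Turbo"


if __name__ == "__main__":
    print(recommend_engine(Needs(macos_major=14)))
    print(recommend_engine(Needs(macos_major=14, speed_first=True)))
    print(recommend_engine(Needs(macos_major=26, no_downloads=True)))
```

Asking for no downloads on macOS 14 falls through to the default, since Apple Speech is not available there.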
Switching engines#
Go to Settings → Engine and pick one. No restart needed.
You can use different engines for different work:
- NVIDIA for live dictation (fastest)
- Whisper Large V3 for file transcription (most accurate)
- Apple Speech when battery is low