Whisperer supports three transcription backends. Each runs on different hardware and has different tradeoffs. Here's the quick comparison.

At a glance

	Whisper	NVIDIA	Apple Speech
Runs on	Metal GPU	Neural Engine	System ML
Languages	99+	25	Whatever Apple supports
Speed	Fast	Fastest	Depends
Best for	Accuracy, language coverage	Raw speed	Zero setup

Whisper: The Default

Whisper is the default backend. It runs on the Apple Silicon GPU via Metal.

What's good:

99+ languages
10+ model sizes (75 MB to 2.9 GB)
Large V3 Turbo Q5 (547 MB) is a solid default: fast, accurate, reasonable size
Quantized models (Q5) cut file size by ~60% without much accuracy loss
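
As a rough check on the ~60% figure, the sketch below estimates the size reduction from bits per weight. It assumes the quantized files use a ggml-style Q5_0 layout (32 weights stored in a 22-byte block, or 5.5 bits per weight) and that the unquantized baseline is 16-bit floats. Neither assumption is stated in this comparison, and real files usually keep some tensors unquantized, so actual savings land a little below this upper bound.

```python
# Back-of-the-envelope estimate. The block layout is an assumption
# (ggml-style Q5_0), not something documented for Whisperer.

def bits_per_weight(block_bytes: int, weights_per_block: int) -> float:
    """Average storage cost of one weight, in bits."""
    return block_bytes * 8 / weights_per_block


def size_reduction(quant_bits: float, baseline_bits: float = 16.0) -> float:
    """Fraction of file size saved relative to the baseline precision."""
    return 1.0 - quant_bits / baseline_bits


# Assumed Q5_0 block: 2-byte scale + 4 bytes of high bits + 16 bytes of low nibbles.
q5_bits = bits_per_weight(block_bytes=22, weights_per_block=32)
saving = size_reduction(q5_bits)

print(f"{q5_bits} bits/weight, about {saving:.0%} smaller than 16-bit")
```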

When to use: Start here. Works well for most people.

Recommended model: Large V3 Turbo Q5 (547 MB). Fast, accurate, not huge.

NVIDIA: The Fast One

The NVIDIA backend runs on the Apple Neural Engine (ANE) via Core ML. The ANE is a separate compute unit from the GPU, so the GPU stays free for other work.

What's good:

Fastest option
CTC vocabulary boosting: your dictionary entries bias the decoder directly
Leaves GPU available for video editing, ML, etc.
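
To make vocabulary boosting concrete, here is a toy sketch of the general idea: add a bonus to the log-probabilities of tokens that appear in dictionary entries, then decode with CTC. It is a simplification for illustration only, not Whisperer's or NVIDIA's implementation. Production systems typically boost whole phrases during beam search rather than single characters, and every name here (`boost_tokens`, `ctc_greedy_decode`, the five-symbol vocabulary, the bonus value) is hypothetical.

```python
import math

BLANK = "_"  # CTC blank symbol in this toy vocabulary
VOCAB = [BLANK, "a", "c", "k", "t"]


def boost_tokens(frames, dictionary, bonus):
    """Add `bonus` to the log-prob of every token used by a dictionary entry."""
    boosted = {ch for word in dictionary for ch in word}
    return [
        [lp + bonus if VOCAB[i] in boosted else lp for i, lp in enumerate(frame)]
        for frame in frames
    ]


def ctc_greedy_decode(frames):
    """Pick the best token per frame, collapse repeats, then drop blanks."""
    best = [VOCAB[max(range(len(f)), key=f.__getitem__)] for f in frames]
    collapsed = [t for i, t in enumerate(best) if i == 0 or t != best[i - 1]]
    return "".join(t for t in collapsed if t != BLANK)


def log_probs(*probs):
    """Convert per-token probabilities (ordered like VOCAB) to log-probs."""
    return [math.log(p) for p in probs]


# Three made-up frames; in the middle one "k" narrowly beats "a".
frames = [
    log_probs(0.05, 0.05, 0.80, 0.05, 0.05),
    log_probs(0.05, 0.40, 0.05, 0.45, 0.05),
    log_probs(0.05, 0.05, 0.05, 0.05, 0.80),
]

plain = ctc_greedy_decode(frames)
biased = ctc_greedy_decode(boost_tokens(frames, dictionary=["cat"], bonus=0.5))
print(plain, biased)
```

Without the boost the decoder outputs "ckt"; with "cat" in the dictionary, the bonus tips the ambiguous middle frame toward "a" and the output becomes "cat".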

Limitations:

25 languages (not 99+)
Fewer model options

When to use: You want speed, you work in a supported language, and you have specialized terminology that benefits from vocabulary boosting.

Apple Speech: The Native Option

Requires macOS 26 or later. Uses Apple's built-in SpeechAnalyzer API (part of the Speech framework). No models to download.

When to use: You're on macOS 26 or later and want zero setup or a lightweight option.

How they work together

All three share the same interface:

Switch engines without restarting
Dictionary, filler removal, AI post-processing work identically
Same 16 kHz mono Float32 audio input
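
A minimal sketch of what a shared interface could look like: every engine accepts the same audio format and returns plain text, so the active engine can be swapped at runtime and post-processing stays engine-agnostic. The class and method names (`Engine`, `Transcriber`, `set_engine`) and the filler list are hypothetical, and the engines are stubs that return canned text rather than real transcriptions.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence

SAMPLE_RATE_HZ = 16_000  # shared input format: 16 kHz, mono, Float32 samples


@dataclass
class Audio:
    samples: Sequence[float]  # mono samples in [-1.0, 1.0]
    sample_rate_hz: int


class Engine(Protocol):
    """What every backend must provide (hypothetical interface)."""
    name: str

    def transcribe(self, audio: Audio) -> str: ...


@dataclass
class StubEngine:
    """Stand-in for a real backend; always returns fixed text."""
    name: str
    canned_text: str

    def transcribe(self, audio: Audio) -> str:
        return self.canned_text


FILLERS = {"um", "uh"}  # hypothetical filler list


def remove_fillers(text: str) -> str:
    return " ".join(w for w in text.split() if w.lower().strip(",.") not in FILLERS)


class Transcriber:
    def __init__(self, engine: Engine) -> None:
        self.engine = engine

    def set_engine(self, engine: Engine) -> None:
        # Swapping is only a reference change, so no restart is needed.
        self.engine = engine

    def run(self, audio: Audio) -> str:
        if audio.sample_rate_hz != SAMPLE_RATE_HZ:
            raise ValueError(f"expected {SAMPLE_RATE_HZ} Hz audio")
        # Post-processing is applied the same way whichever engine is active.
        return remove_fillers(self.engine.transcribe(audio))


audio = Audio(samples=[0.0] * SAMPLE_RATE_HZ, sample_rate_hz=SAMPLE_RATE_HZ)
transcriber = Transcriber(StubEngine("whisper", "um hello world"))
first = transcriber.run(audio)
transcriber.set_engine(StubEngine("nvidia", "hello uh world"))
second = transcriber.run(audio)
print(first, "|", second)
```

Keeping the format check and post-processing in the wrapper rather than in each engine is one way to guarantee they behave identically across backends.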

Live Preview

The live preview uses a separate lightweight model that runs on the Neural Engine. With Whisper (GPU), the preview runs on the ANE with no contention. With NVIDIA (ANE), the preview's end-of-utterance (EOU) model shares the ANE with the main model, but it is optimized to stay out of the way.

Recommendation

Start with Whisper (Large V3 Turbo Q5). If you need more speed and work in one of the 25 languages NVIDIA supports, try NVIDIA. You can always switch back.
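
The recommendation can be written as a small decision function. This is only a sketch: the function name and parameters are hypothetical, and whether a language is among the 25 that NVIDIA supports is passed in as a flag because the language list is not given here.

```python
def recommend_backend(needs_more_speed: bool, language_supported_by_nvidia: bool) -> str:
    """Whisper first; NVIDIA only when speed matters and the language is supported."""
    if needs_more_speed and language_supported_by_nvidia:
        return "NVIDIA"
    return "Whisper (Large V3 Turbo Q5)"


print(recommend_backend(needs_more_speed=False, language_supported_by_nvidia=True))
print(recommend_backend(needs_more_speed=True, language_supported_by_nvidia=True))
print(recommend_backend(needs_more_speed=True, language_supported_by_nvidia=False))
```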

More details: best Whisper model for dictation, offline transcription engines.

Whisper vs. NVIDIA vs. Apple Speech — Which Transcription Backend Is Best?