Whisperer runs three different transcription engines. They use different hardware and have different strengths. Here's how to pick.

Overview

Feature	Whisper	NVIDIA	Apple Speech
Library	Local Whisper	CoreML	SpeechAnalyzer
Hardware	Metal GPU	Apple Neural Engine	System ML
Languages	99+	25	System languages
Models	10+ (75MB–2.9GB)	Optimized	Built-in
Accuracy	Highest	Very high	Good
Speed	Fast (GPU)	Fastest (ANE)	Fast
Min macOS	14+	14+ (Apple Silicon)	26+ (Tahoe)

Whisper — The default

OpenAI's open-source model, running locally. No Python, no dependencies.

How it works

Whisper processes audio after you release the record button. It runs on the Apple Silicon GPU through Metal.

Model sizes

Model	Size	Speed	Accuracy	Best For
Tiny	75MB	Fastest	Basic	Quick testing
Base	142MB	Very fast	Good	Low-storage devices
Small	466MB	Fast	High	Daily dictation
Medium	1.5GB	Moderate	Very high	Accuracy-focused
Large V3	2.9GB	Slower	Highest	Maximum accuracy
Large V3 Turbo	~1.5GB	Fast	Very high	Best balanced

Tip

Start here: Large V3 Turbo (~1.5GB). Near-Large V3 accuracy at about half the processing time in the benchmarks below (~2.5s vs ~5.0s for a 10-second recording). Good default.
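
If disk space is the constraint, the table above can be turned into a quick filter. This is only a sketch: the sizes are copied from the table and converted to megabytes assuming 1GB = 1000MB, the "~1.5GB" figure is treated as exactly 1500MB, and `MODELS_MB` and `models_that_fit` are hypothetical names, not part of Whisperer.

```python
# Approximate download sizes in MB, copied from the model table above.
# Assumption: 1 GB is treated as 1000 MB and "~1.5GB" as 1500 MB.
MODELS_MB = {
    "Tiny": 75,
    "Base": 142,
    "Small": 466,
    "Medium": 1500,
    "Large V3": 2900,
    "Large V3 Turbo": 1500,
}


def models_that_fit(free_mb):
    """Return the models whose size does not exceed the free space, in table order."""
    return [name for name, size in MODELS_MB.items() if size <= free_mb]


print(models_that_fit(500))   # ['Tiny', 'Base', 'Small']
print(models_that_fit(2000))  # everything except Large V3
```

With about 2GB free, Large V3 Turbo still fits, which matches the tip above.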

When to use it

High accuracy matters
Non-English (99+ languages)
You want control over the speed/accuracy tradeoff
File transcription

NVIDIA — The fast one

CTC-based speech recognition running on Apple Silicon's Neural Engine via CoreML. Fastest option, and has a feature the others don't: vocabulary boosting.

How it works

The NVIDIA engine runs on the Neural Engine, which is separate from the GPU, so it can run alongside GPU work without competing for it. The Neural Engine also powers Whisperer's live preview (~300ms latency).

Vocabulary boosting

This is why NVIDIA matters. Your personal dictionary entries bias the CTC decoder at the acoustic level:

Project terms recognized more accurately
Names, acronyms, jargon decoded correctly
Happens during decoding, not as post-processing cleanup

Info

Only NVIDIA supports this. Whisper and Apple Speech use attention-based architectures that can't do direct decoder biasing. If you have lots of custom terminology, NVIDIA with a configured dictionary is worth trying.
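
To make "during decoding, not as post-processing cleanup" concrete, here is a toy sketch in Python. It is not Whisperer's implementation and not a real CTC decoder: it is a simplified word-level beam search in which dictionary terms get a score bonus while hypotheses are being extended. The function name `decode_with_boost`, the `boost` value, and the candidate probabilities are all made up for illustration.

```python
import math


def decode_with_boost(steps, boosted_terms, boost=2.0, beam_width=3):
    """Toy beam search over word candidates (illustrative only).

    steps: list of dicts mapping a candidate word to its acoustic log-probability.
    boosted_terms: set of lowercase dictionary words. Their score gets `boost`
    added while the beam is extended, so the bias acts during decoding.
    """
    beams = [([], 0.0)]  # each beam is (words so far, cumulative score)
    for candidates in steps:
        extended = []
        for words, score in beams:
            for word, logp in candidates.items():
                bonus = boost if word.lower() in boosted_terms else 0.0
                extended.append((words + [word], score + logp + bonus))
        # Prune to the best hypotheses before the next step.
        extended.sort(key=lambda beam: beam[1], reverse=True)
        beams = extended[:beam_width]
    return " ".join(beams[0][0])


# Hypothetical acoustic scores: the model slightly prefers the wrong words.
steps = [
    {"deploy": math.log(0.9), "the ploy": math.log(0.1)},
    {"to": math.log(1.0)},
    {"cooper netties": math.log(0.6), "kubernetes": math.log(0.4)},
]

plain = decode_with_boost(steps, boosted_terms=set())
boosted = decode_with_boost(steps, boosted_terms={"kubernetes"})
print(plain)    # deploy to cooper netties
print(boosted)  # deploy to kubernetes
```

The point of the sketch is the placement of the bonus: it changes which hypotheses survive pruning, rather than rewriting the final text afterwards.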

When to use NVIDIA

Speed is priority
You have a big personal dictionary
You want GPU free for video/ML work
You work in one of its 25 supported languages

Apple Speech — The native option

macOS's built-in SpeechAnalyzer framework. Only on macOS 26 (Tahoe) and later. Zero setup.

How it works

Uses the system speech recognition pipeline. No models to download. Apple handles everything.

When to use it

You're on macOS 26+
You don't want to download models
You're low on disk space
Accuracy isn't critical

Actual benchmarks

Accuracy (English)

Engine	Short Phrases	Long Dictation	Technical Terms	Code Terms
Whisper (Large V3 Turbo)	Excellent	Excellent	Very good	Good
NVIDIA (v3)	Excellent	Very good	Excellent*	Good
Apple Speech	Good	Good	Fair	Poor

*With vocabulary boosting enabled and dictionary configured.

Speed (Apple M1 Pro, typical 10-second recording)

Engine	Model	Processing Time	Real-Time Factor
NVIDIA v3	Default	~0.8s	0.08x
Whisper	Large V3 Turbo	~2.5s	0.25x
Whisper	Small	~1.2s	0.12x
Whisper	Large V3	~5.0s	0.50x
Apple Speech	Built-in	~1.5s	0.15x

Tip

NVIDIA is 2–3x faster than Whisper Large V3 Turbo for typical dictation (about 3x in the 10-second benchmark above: ~0.8s vs ~2.5s). If speed is your priority and you don't need 99+ languages, NVIDIA is the clear winner.
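
The Real-Time Factor column in the speed table is processing time divided by audio length, so lower is faster. A short Python sketch reproduces the column and the speedup from the table's own numbers. The names `PROCESSING_SECONDS`, `real_time_factor`, and `speedup` are just for illustration.

```python
# Processing times in seconds for a 10-second recording, copied from the speed table.
AUDIO_SECONDS = 10.0
PROCESSING_SECONDS = {
    ("NVIDIA v3", "Default"): 0.8,
    ("Whisper", "Large V3 Turbo"): 2.5,
    ("Whisper", "Small"): 1.2,
    ("Whisper", "Large V3"): 5.0,
    ("Apple Speech", "Built-in"): 1.5,
}


def real_time_factor(processing_seconds, audio_seconds=AUDIO_SECONDS):
    """Processing time divided by audio duration. Lower means faster."""
    return processing_seconds / audio_seconds


def speedup(slower, faster):
    """How many times faster `faster` is than `slower`, using the table values."""
    return PROCESSING_SECONDS[slower] / PROCESSING_SECONDS[faster]


rtf = {key: round(real_time_factor(t), 2) for key, t in PROCESSING_SECONDS.items()}
for (engine, model), value in rtf.items():
    print(f"{engine} ({model}): {value:.2f}x")

ratio = speedup(("Whisper", "Large V3 Turbo"), ("NVIDIA v3", "Default"))
print(f"NVIDIA vs Large V3 Turbo: {ratio:.1f}x faster")  # 3.1x faster
```

A factor of 0.08x means a 10-second clip is transcribed in under a second. Anything below 1.0x is faster than real time.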

Resource usage

Engine	Hardware	GPU Free?	ANE Free?	Battery
Whisper	Metal GPU	No	Yes	Moderate
NVIDIA	Neural Engine	Yes	No	Low
Apple Speech	System ML	Varies	Varies	Low

NVIDIA leaves the GPU free, which matters if you're editing video or training models while dictating.

Which one?

Default choice

Whisper Large V3 Turbo. Good accuracy, reasonable speed, 99+ languages. Start here.

Speed priority

NVIDIA. 2–3x faster than Whisper Large V3 Turbo, and the GPU stays free. Good if you work in English with lots of custom terms.

No setup

Apple Speech. No downloads, works immediately. Fine for casual use on macOS 26+.

Max accuracy

Whisper Large V3 (2.9GB). Most accurate. Use for file transcription or when errors cost you.
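
The four recommendations above can be written down as a small lookup. This is a sketch of the article's advice, not a Whisperer API: `RECOMMENDATIONS`, `MIN_MACOS`, and `recommend` are hypothetical names, and the fallback to the default engine on older macOS versions is an assumption added here.

```python
# Priority -> (engine, model), following the four recommendations above.
RECOMMENDATIONS = {
    "default": ("Whisper", "Large V3 Turbo"),
    "speed": ("NVIDIA", "Default"),
    "no_setup": ("Apple Speech", "Built-in"),
    "max_accuracy": ("Whisper", "Large V3"),
}

# Minimum macOS major version per engine, from the overview table.
MIN_MACOS = {"Whisper": 14, "NVIDIA": 14, "Apple Speech": 26}


def recommend(priority="default", macos_major=14):
    """Return the (engine, model) pair suggested for a priority."""
    engine, model = RECOMMENDATIONS[priority]
    if macos_major < MIN_MACOS[engine]:
        # Assumption: fall back to the default choice when the OS is too old.
        engine, model = RECOMMENDATIONS["default"]
    return engine, model


print(recommend("speed"))                     # ('NVIDIA', 'Default')
print(recommend("no_setup", macos_major=26))  # ('Apple Speech', 'Built-in')
print(recommend("no_setup", macos_major=15))  # ('Whisper', 'Large V3 Turbo')
```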

Switching engines

Open Settings, choose Engine, and pick one. No restart needed.

You can use different engines for different work:

NVIDIA for live dictation (fastest)
Whisper Large V3 for file transcription (most accurate)
Apple Speech when battery is low

More: Offline Transcription, Best Whisper Model, Whisper vs NVIDIA. Features, pricing.


Whisper vs NVIDIA vs Apple Speech: Which Transcription Engine to Use
