March 28, 2026 · 5 min read

Whisper vs NVIDIA vs Apple Speech: Which Transcription Engine to Use

Whisperer runs three different transcription engines. They use different hardware and have different strengths. Here's how to pick.

Overview

| | Whisper | NVIDIA | Apple Speech |
|---|---|---|---|
| Library | Local Whisper | CoreML | SpeechAnalyzer |
| Hardware | Metal GPU | Apple Neural Engine | System ML |
| Languages | 99+ | 25 | System languages |
| Models | 10+ (75MB–2.9GB) | Optimized | Built-in |
| Accuracy | Highest | Very high | Good |
| Speed | Fast (GPU) | Fastest (ANE) | Fast |
| Min macOS | 14+ | 14+ (Apple Silicon) | 26+ (Tahoe) |

Whisper: The default

OpenAI's open-source model, running locally. No Python, no dependencies.

How it works

Processes audio after you release the record button. Runs on Apple Silicon's Metal GPU.

Model sizes

| Model | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| Tiny | 75MB | Fastest | Basic | Quick testing |
| Base | 142MB | Very fast | Good | Low-storage devices |
| Small | 466MB | Fast | High | Daily dictation |
| Medium | 1.5GB | Moderate | Very high | Accuracy-focused |
| Large V3 | 2.9GB | Slower | Highest | Maximum accuracy |
| Large V3 Turbo | ~1.5GB | Fast | Very high | Best balanced |

Tip: Start with Large V3 Turbo (~1.5GB). Near-Large accuracy, much faster. Good default.

When to use it

  • High accuracy matters
  • Non-English (99+ languages)
  • You want control over the speed/accuracy tradeoff
  • File transcription

NVIDIA: The fast one

CTC-based speech recognition running on Apple Silicon's Neural Engine via CoreML. Fastest option, and has a feature the others don't: vocabulary boosting.

How it works

The NVIDIA engine runs on the Neural Engine, which is separate from the GPU, so it can run alongside GPU work without contention. The Neural Engine also powers Whisperer's live preview (~300ms latency).

Vocabulary boosting

This is why NVIDIA matters. Your personal dictionary entries bias the CTC decoder at the acoustic level:

  • Project terms recognized more accurately
  • Names, acronyms, jargon decoded correctly
  • Happens during decoding, not as post-processing cleanup

Info: Only NVIDIA supports this. Whisper and Apple Speech use attention-based architectures that can't do direct decoder biasing. If you have lots of custom terminology, NVIDIA with a configured dictionary is worth trying.
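
To make "biasing during decoding" concrete, here is a toy sketch. It is not Whisperer's decoder and not a real CTC implementation: a real CTC decoder works on per-frame token probabilities and collapses blanks and repeats, while this stand-in runs a small beam search over word slots. The names `decode`, `LATTICE`, `DICTIONARY`, and `boost`, and all probabilities, are made up for illustration. The point it shows is that the bonus for dictionary terms is added while hypotheses are being scored and pruned, so it changes which hypothesis survives rather than patching the text afterwards.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical decoder output: one dict per word slot, mapping a candidate
# word to its log-probability. Real CTC output is per-frame, not per-word.
Lattice = List[Dict[str, float]]


def decode(lattice: Lattice, dictionary: Set[str],
           boost: float = 0.0, beam_width: int = 2) -> str:
    """Beam search over word slots; dictionary words get `boost` added to their score."""
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]
    for slot in lattice:
        expanded: List[Tuple[float, List[str]]] = []
        for score, words in beams:
            for word, log_prob in slot.items():
                bonus = boost if word.lower() in dictionary else 0.0
                expanded.append((score + log_prob + bonus, words + [word]))
        # Prune here: the bonus influences which hypotheses survive this step.
        expanded.sort(key=lambda item: item[0], reverse=True)
        beams = expanded[:beam_width]
    return " ".join(beams[0][1])


# Made-up example: the acoustically likelier reading of the last slot is wrong.
LATTICE: Lattice = [
    {"deploy": math.log(0.9), "the ploy": math.log(0.1)},
    {"to": math.log(0.95), "two": math.log(0.05)},
    {"cooper netties": math.log(0.6), "Kubernetes": math.log(0.4)},
]
DICTIONARY = {"kubernetes"}  # stands in for personal dictionary entries

if __name__ == "__main__":
    print(decode(LATTICE, DICTIONARY))             # no boost
    print(decode(LATTICE, DICTIONARY, boost=1.0))  # dictionary term boosted
```

Without a boost the sketch picks the acoustically likelier "cooper netties"; with `boost=1.0` the dictionary entry wins. The size of the bonus is a tuning choice: too small and it has no effect, too large and dictionary words start appearing where they were not spoken.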

When to use NVIDIA

  • Speed is priority
  • You have a big personal dictionary
  • You want GPU free for video/ML work
  • You work in one of its 25 supported languages

Apple Speech: The native option

macOS's built-in SpeechAnalyzer framework. Only on macOS 26 (Tahoe) and later. Zero setup.

How it works

Uses the system speech recognition pipeline. No models to download. Apple handles everything.

When to use it

  • You're on macOS 26+
  • You don't want to download models
  • You're low on disk space
  • Accuracy isn't critical

Actual benchmarks

Accuracy (English)

| Engine | Short Phrases | Long Dictation | Technical Terms | Code Terms |
|---|---|---|---|---|
| Whisper (Large V3 Turbo) | Excellent | Excellent | Very good | Good |
| NVIDIA (v3) | Excellent | Very good | Excellent* | Good |
| Apple Speech | Good | Good | Fair | Poor |

*With vocabulary boosting enabled and dictionary configured.

Speed (Apple M1 Pro, typical 10-second recording)

| Engine | Model | Processing Time | Real-Time Factor |
|---|---|---|---|
| NVIDIA v3 | Default | ~0.8s | 0.08x |
| Whisper | Large V3 Turbo | ~2.5s | 0.25x |
| Whisper | Small | ~1.2s | 0.12x |
| Whisper | Large V3 | ~5.0s | 0.50x |
| Apple Speech | Built-in | ~1.5s | 0.15x |
Tip: On this benchmark, NVIDIA is roughly 3x faster than Whisper Large V3 Turbo (~0.8s vs ~2.5s for a 10-second recording). If speed is your priority and you don't need 99+ languages, NVIDIA is the clear winner.
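
The real-time factor column is just processing time divided by audio length, so anything below 1.0x is faster than real time. A minimal sketch of that arithmetic, using the processing times from the speed table (the function and variable names here are illustrative, not part of Whisperer):

```python
# Processing times copied from the speed table (Apple M1 Pro, 10-second recording).
AUDIO_SECONDS = 10.0
PROCESSING_SECONDS = {
    "NVIDIA v3": 0.8,
    "Whisper Small": 1.2,
    "Apple Speech": 1.5,
    "Whisper Large V3 Turbo": 2.5,
    "Whisper Large V3": 5.0,
}


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio length; below 1.0 is faster than real time."""
    return processing_seconds / audio_seconds


def speedup(baseline: str, candidate: str) -> float:
    """How many times faster `candidate` is than `baseline` on the same recording."""
    return PROCESSING_SECONDS[baseline] / PROCESSING_SECONDS[candidate]


if __name__ == "__main__":
    for engine, seconds in PROCESSING_SECONDS.items():
        rtf = real_time_factor(seconds, AUDIO_SECONDS)
        print(f"{engine:<24} {seconds:>4.1f}s  RTF {rtf:.2f}x")
    ratio = speedup("Whisper Large V3 Turbo", "NVIDIA v3")
    print(f"NVIDIA v3 vs Whisper Large V3 Turbo: {ratio:.1f}x faster")
```

Since the table's times are approximate, treat the ratios as ballpark figures; longer recordings or different hardware may shift them.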

Resource usage

| Engine | Hardware | GPU Free? | ANE Free? | Battery |
|---|---|---|---|---|
| Whisper | Metal GPU | No | Yes | Moderate |
| NVIDIA | Neural Engine | Yes | No | Low |
| Apple Speech | System ML | Varies | Varies | Low |

NVIDIA leaves the GPU free, which matters if you're editing video or training models while dictating.

Which one?

Default choice

Whisper Large V3 Turbo. Good accuracy, reasonable speed, 99+ languages. Start here.

Speed priority

NVIDIA. Roughly 3x faster than Whisper Large V3 Turbo in the M1 Pro benchmark, and the GPU stays free. Good if you work in English with lots of custom terms.

No setup

Apple Speech. No downloads, works immediately. Fine for casual use on macOS 26+.

Max accuracy

Whisper Large V3 (2.9GB). Most accurate. Use for file transcription or when errors cost you.
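
The four recommendations above can be written down as a small decision helper. This is only a sketch: the `Needs` fields, the `pick_engine` name, and especially the order in which the rules are checked are assumptions made for illustration, not Whisperer settings or logic.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    """Hypothetical description of what a user cares about."""
    macos_major: int                  # e.g. 15, or 26 for Tahoe
    max_accuracy: bool = False        # errors are costly, or file transcription
    no_downloads: bool = False        # no model downloads, minimal disk use
    speed_first: bool = False         # lowest latency matters most
    large_dictionary: bool = False    # lots of custom terms to boost
    language_in_nvidia_set: bool = True  # one of NVIDIA's 25 languages


def pick_engine(needs: Needs) -> str:
    """Return an engine name following the guide; rule order is an assumption."""
    if needs.max_accuracy:
        return "Whisper Large V3"
    if needs.no_downloads and needs.macos_major >= 26:
        return "Apple Speech"  # SpeechAnalyzer requires macOS 26+
    if (needs.speed_first or needs.large_dictionary) and needs.language_in_nvidia_set:
        return "NVIDIA"
    return "Whisper Large V3 Turbo"  # the default choice


if __name__ == "__main__":
    print(pick_engine(Needs(macos_major=15)))
    print(pick_engine(Needs(macos_major=15, speed_first=True)))
    print(pick_engine(Needs(macos_major=26, no_downloads=True)))
    print(pick_engine(Needs(macos_major=15, max_accuracy=True)))
```

Anything that matches no rule falls through to Whisper Large V3 Turbo, which mirrors the "start here" advice.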

Switching engines

Go to Settings > Engine and pick one. No restart needed.

You can use different engines for different work:

  • NVIDIA for live dictation (fastest)
  • Whisper Large V3 for file transcription (most accurate)
  • Apple Speech when battery is low

More: Offline Transcription, Best Whisper Model, Whisper vs NVIDIA. Features, pricing.
