March 28, 2026 · 2 min read

Whisper vs. NVIDIA vs. Apple Speech — Which Transcription Backend Is Best?

Whisperer supports three transcription backends. Each runs on different hardware and has different tradeoffs. Here's the quick comparison.

At a glance

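|           | Whisper                     | NVIDIA        | Apple Speech            |
|-----------|-----------------------------|---------------|-------------------------|
| Runs on   | Metal GPU                   | Neural Engine | System ML               |
| Languages | 99+                         | 25            | Whatever Apple supports |
| Speed     | Fast                        | Fastest       | Depends                 |
| Best for  | Accuracy, language coverage | Raw speed     | Zero setup              |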

Whisper: The Default

Whisper is the default backend. It runs on the Apple Silicon GPU through Metal.

What's good:

  • 99+ languages
  • 10+ model sizes (75 MB to 2.9 GB)
  • Large V3 Turbo Q5 (547 MB) is a solid default: fast, accurate, reasonable size
  • Quantized models (Q5) cut file size by ~60% without much accuracy loss
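The ~60% figure is roughly what bits-per-weight arithmetic predicts. The sketch below assumes "Q5" means a ggml-style Q5_0 layout (32 weights per block, 5 bits each, plus one fp16 scale) and that the unquantized baseline is fp16. The article confirms neither, so treat the numbers as an illustration rather than a description of the shipped files.

```python
# Assumption: ggml-style Q5_0 block layout; the article only says "Q5".
BLOCK_WEIGHTS = 32   # weights stored per quantization block
QUANT_BITS = 5       # bits stored per weight
SCALE_BYTES = 2      # one fp16 scale factor per block


def bits_per_weight(block_weights=BLOCK_WEIGHTS, quant_bits=QUANT_BITS,
                    scale_bytes=SCALE_BYTES):
    """Effective storage cost per weight, including the per-block scale."""
    block_bytes = block_weights * quant_bits / 8 + scale_bytes
    return block_bytes * 8 / block_weights


def size_reduction(baseline_bits=16.0):
    """Fraction of space saved relative to an fp16 baseline (assumed)."""
    return 1.0 - bits_per_weight() / baseline_bits


if __name__ == "__main__":
    print(f"{bits_per_weight():.2f} bits per weight")
    print(f"{size_reduction():.1%} smaller than fp16")
```

Under these assumptions the quantized tensors cost 5.5 bits per weight, a saving of about 65%. Real files typically shrink a little less, because some tensors are kept at higher precision.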

When to use: Start here. Works well for most people.

Recommended model: Large V3 Turbo Q5 (547 MB). Fast, accurate, not huge.

NVIDIA: The Fast One

The NVIDIA backend runs on Apple's Neural Engine (ANE) via Core ML. The ANE is a separate compute unit from the GPU on the same Apple Silicon chip, so the GPU stays free for other work.

What's good:

  • Fastest option
  • CTC vocabulary boosting: your dictionary entries bias the decoder directly
  • Leaves the GPU available for video editing, ML, etc.

Limitations:

  • 25 languages (not 99+)
  • Fewer model options

When to use: You want speed, you work in a supported language, and you have specialized terminology that benefits from vocabulary boosting.
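To give a feel for what vocabulary boosting does, here is a deliberately simplified stand-in: n-best rescoring with a fixed bonus per dictionary term. This is not the app's decoder. Real CTC boosting usually adjusts scores during the beam search itself, and the function names, the `bonus` value, and the example scores are all hypothetical.

```python
def boost_score(text, log_prob, dictionary, bonus=2.0):
    """Add a fixed bonus to a hypothesis score for each dictionary term it contains."""
    words = text.lower().split()
    hits = sum(1 for term in dictionary if term.lower() in words)
    return log_prob + bonus * hits


def pick_best(hypotheses, dictionary, bonus=2.0):
    """Return the text of the best hypothesis after boosting.

    hypotheses: list of (text, log_prob) pairs, e.g. a decoder's n-best list.
    """
    best = max(hypotheses, key=lambda h: boost_score(h[0], h[1], dictionary, bonus))
    return best[0]


if __name__ == "__main__":
    # Made-up scores: the acoustically likelier hypothesis mishears the drug name.
    nbest = [
        ("the patient takes met for men", -4.1),
        ("the patient takes metformin", -4.9),
    ]
    print(pick_best(nbest, set()))          # no dictionary: the first hypothesis wins
    print(pick_best(nbest, {"metformin"}))  # boosted: the second hypothesis wins
```

The point of the sketch is the direction of the effect: a dictionary entry can tip the balance toward a specialized term that would otherwise lose to a more common-sounding phrase.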

Apple Speech: The Native Option

macOS 26+ only. Uses Apple's built-in SpeechAnalyzer API from the Speech framework. No models to download.

When to use: You're on macOS 26 or later and want zero setup or a lightweight option.

How they work together

All three share the same interface:

  • Switch engines without restarting
  • Dictionary, filler removal, AI post-processing work identically
  • Same 16 kHz mono Float32 audio input
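For readers wondering what "16 kHz mono Float32" means in practice, the sketch below converts interleaved 16-bit PCM into that shape using only the standard library. It is an illustration, not the app's audio pipeline: the function names are made up, and the linear-interpolation resampler has no anti-aliasing filter, so a production pipeline would use a proper resampler.

```python
from array import array

TARGET_RATE = 16_000  # sample rate all three backends expect, per the list above


def to_mono_float32(pcm16, channels):
    """Average interleaved int16 channels into one float channel in [-1.0, 1.0)."""
    frames = len(pcm16) // channels
    mono = array("f")
    for i in range(frames):
        frame = pcm16[i * channels:(i + 1) * channels]
        mono.append(sum(frame) / (channels * 32768.0))
    return mono


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Resample by linear interpolation (sketch only: no anti-aliasing filter)."""
    if src_rate == dst_rate or len(samples) == 0:
        return array("f", samples)
    out_len = len(samples) * dst_rate // src_rate
    step = src_rate / dst_rate
    last = len(samples) - 1
    out = array("f")
    for n in range(out_len):
        pos = n * step
        i = min(int(pos), last)
        frac = pos - i
        nxt = samples[min(i + 1, last)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out


def prepare(pcm16, channels, src_rate):
    """Interleaved int16 PCM at any rate -> 16 kHz mono float32 samples."""
    return resample_linear(to_mono_float32(pcm16, channels), src_rate)


if __name__ == "__main__":
    one_second_stereo_48k = [1000, 3000] * 48_000
    audio = prepare(one_second_stereo_48k, channels=2, src_rate=48_000)
    print(len(audio), audio.typecode)
```

Because every backend takes the same input shape, a conversion like this only needs to happen once, before the engine choice matters.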

Live Preview

The live preview uses a separate lightweight model that runs on the Neural Engine. With Whisper (GPU), the preview runs on the ANE, so the two don't compete for the same hardware. With NVIDIA (ANE), the preview's EOU (end-of-utterance) model shares the Neural Engine with the main model, but it's optimized to stay out of the way.
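The hardware split can be written down as a small lookup. This is only a restatement of the comparison table and the paragraph above; the names are invented for illustration, and since the article does not say which hardware "System ML" maps to, that case is left as unknown.

```python
from enum import Enum


class Unit(Enum):
    GPU = "Metal GPU"
    ANE = "Neural Engine"
    SYSTEM = "System ML"  # the article does not say which hardware this uses


# Where each backend runs, taken from the comparison table.
BACKEND_UNIT = {
    "whisper": Unit.GPU,
    "nvidia": Unit.ANE,
    "apple_speech": Unit.SYSTEM,
}

# The live preview model runs on the Neural Engine.
PREVIEW_UNIT = Unit.ANE


def preview_shares_unit(backend):
    """True or False when the article says so, None when it does not."""
    unit = BACKEND_UNIT[backend]
    if unit is Unit.SYSTEM:
        return None
    return unit is PREVIEW_UNIT


if __name__ == "__main__":
    for name in BACKEND_UNIT:
        print(name, preview_shares_unit(name))
```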

Recommendation

Start with Whisper (Large V3 Turbo Q5). If you need more speed and work in one of NVIDIA's 25 supported languages, try NVIDIA. You can always switch back.
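Written as a decision rule, the advice in this post might look like the sketch below. The function and parameter names are hypothetical, and `nvidia_languages` has to be supplied by the caller because the article does not list the 25 supported languages.

```python
def choose_backend(language, need_max_speed=False, want_zero_setup=False,
                   macos_major=15, nvidia_languages=frozenset()):
    """Pick a backend name following the article's recommendations.

    nvidia_languages: language codes the NVIDIA backend supports
    (hypothetical input; the article gives only the count, 25).
    """
    # Apple Speech needs macOS 26 or later and is the zero-setup option.
    if want_zero_setup and macos_major >= 26:
        return "apple_speech"
    # NVIDIA is the fastest, but only for its supported languages.
    if need_max_speed and language in nvidia_languages:
        return "nvidia"
    # Otherwise start with Whisper, the default.
    return "whisper"


if __name__ == "__main__":
    supported = {"en", "de"}  # placeholder set, not the real list
    print(choose_backend("en"))
    print(choose_backend("en", need_max_speed=True, nvidia_languages=supported))
    print(choose_backend("th", need_max_speed=True, nvidia_languages=supported))
```

Whisper is the fallback in every branch, which matches the "start here" advice: the other two backends are opt-in choices for specific needs.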

More details: best Whisper model for dictation, offline transcription engines.
