Three engines, no cloud

Whisper on Metal GPU. NVIDIA on Neural Engine. Apple Speech. Pick from 10+ model sizes based on how much speed vs accuracy you need. All local.

Transcription Backends

Whisper

Default

The default. Best accuracy, widest language support. Runs on the Metal GPU, with 10+ models ranging from 75 MB to 2.9 GB.

Engine: Local Whisper
Hardware: Metal GPU
Languages: 99+ languages

NVIDIA

Fastest

Fastest option. Runs on the Neural Engine so your GPU stays free. Supports CTC vocabulary boosting where your dictionary entries bias the acoustic decoder directly.

Engine: CoreML
Hardware: Apple Neural Engine
Languages: 25 languages

Apple Speech

Native

Apple's native framework. Available on macOS Tahoe and later with system-level optimization.

Engine: SpeechAnalyzer (macOS 26+)
Hardware: System ML
Languages: System languages

Whisper Model Comparison

Download once, cached locally. Switch between models whenever you want.

Model	Size	Speed	Notes
Tiny	75 MB	Fastest	Quick, lower accuracy
Base	142 MB	Fast	Good for simple dictation
Small	466 MB	Medium	Balanced
Medium	1.5 GB	Slow	High accuracy
Large V3	2.9 GB	Slowest	Maximum accuracy
Large V3 Turbo	1.5 GB	Fast	8x faster than Large V3
Large V3 Turbo Q5	547 MB	Fast	Default: best balance of speed, size, and accuracy
Large V3 Q5	1.1 GB	Medium	Quantized, smaller file
Distil Large V3	756 MB	Very Fast	6x faster than Large V3
Distil Small (EN)	166 MB	Very Fast	English only
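
As a rough illustration of how the table can drive a choice, the sketch below encodes the listed models and filters them by a download-size budget. The names `WhisperModel`, `MODELS`, and `models_within_budget` are made up for this example and are not part of the app; GB figures are converted at 1 GB = 1000 MB, which is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WhisperModel:
    name: str
    size_mb: int  # download size from the comparison table
    english_only: bool = False


# Sizes copied from the table above; GB values assume 1 GB = 1000 MB.
MODELS = [
    WhisperModel("Tiny", 75),
    WhisperModel("Base", 142),
    WhisperModel("Small", 466),
    WhisperModel("Medium", 1500),
    WhisperModel("Large V3", 2900),
    WhisperModel("Large V3 Turbo", 1500),
    WhisperModel("Large V3 Turbo Q5", 547),
    WhisperModel("Large V3 Q5", 1100),
    WhisperModel("Distil Large V3", 756),
    WhisperModel("Distil Small (EN)", 166, english_only=True),
]


def models_within_budget(max_size_mb: int, need_multilingual: bool = True) -> List[WhisperModel]:
    """Return the models that fit the size budget, in table order.

    English-only models are skipped when multilingual support is required.
    """
    return [
        m for m in MODELS
        if m.size_mb <= max_size_mb and not (need_multilingual and m.english_only)
    ]
```

For example, `models_within_budget(600)` keeps Tiny, Base, Small, and the default Large V3 Turbo Q5, and leaves out the English-only Distil Small.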

Engineering Under the Hood

Model stays loaded in memory. Recording starts instantly with no delay.

P-core pinning on Apple Silicon. Only performance cores run inference; E-cores stay out of the way.

GPU warm-up on load. A silent transcription compiles Metal shaders so your first real recording doesn't stall.

Greedy decoding at temperature 0.0. Per-chunk latency becomes predictable.

Memory check before loading. You get a warning if there's not enough RAM.

Hot-swap models without restarting.
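
To make the greedy-decoding point above concrete, here is a minimal sketch of token selection at temperature 0.0 versus sampling at a higher temperature. It is a generic illustration, not the app's decoder; `pick_token` and `softmax` are hypothetical helper names.

```python
import math
import random
from typing import List, Optional


def softmax(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities, scaled by a positive temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits: List[float], temperature: float = 0.0,
               rng: Optional[random.Random] = None) -> int:
    """Pick the next token index from a list of logits."""
    if temperature == 0.0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature > 0: draw a token at random, weighted by probability.
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```

With temperature 0.0 the same logits always yield the same token, so each chunk takes a single deterministic decoding pass. That is one plausible reason per-chunk latency stays predictable.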

Frequently Asked Questions

Which backend should I pick?

Start with Whisper. It's the default for a reason: best accuracy and language coverage. Want raw speed and mostly dictate in English? Try NVIDIA. Apple Speech requires macOS Tahoe (macOS 26) or later.
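
The advice above can be written down as a small decision rule. This is only a sketch of the guidance in this answer; `pick_backend` and its parameters are hypothetical, and the `prefers_native` flag is an assumption added to give Apple Speech a path.

```python
def pick_backend(macos_major: int, wants_speed: bool = False,
                 mostly_english: bool = False, prefers_native: bool = False) -> str:
    """Suggest a backend name following the FAQ guidance."""
    # Apple Speech is only available on macOS 26 (Tahoe) or later.
    if prefers_native and macos_major >= 26:
        return "Apple Speech"
    # NVIDIA is suggested for raw speed when dictation is mostly English.
    if wants_speed and mostly_english:
        return "NVIDIA"
    # Otherwise fall back to the default.
    return "Whisper"
```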

Which Whisper model is best?

Large V3 Turbo Q5 (547 MB) hits the sweet spot for most people. Smaller models (Tiny, Base) run faster on older hardware. Large V3 (2.9 GB) gives you maximum accuracy when audio quality is rough.

Can I switch backends without restarting?

Yes. Change the backend or model size and the new one loads while the old one unloads. No restart.
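
One way such a swap could be structured is sketched below: load the replacement first, then switch the active reference under a lock so the old model can be released. This is an assumed design, not the app's actual code; `ModelSlot` and the `loader` callback are hypothetical.

```python
import threading
from typing import Any, Callable, Optional, Tuple


class ModelSlot:
    """Holds the active model and lets callers replace it at runtime."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader  # hypothetical function: model name -> loaded model
        self._lock = threading.Lock()
        self._name: Optional[str] = None
        self._model: Any = None

    def swap(self, name: str) -> Optional[str]:
        """Load `name`, make it active, and return the previous model's name."""
        # Load outside the lock so the current model stays usable meanwhile.
        # If loading raises, the active model is left untouched.
        new_model = self._loader(name)
        with self._lock:
            previous = self._name
            self._name, self._model = name, new_model
        # The old model object is no longer referenced by the slot.
        return previous

    def current(self) -> Tuple[Optional[str], Any]:
        """Return the active (name, model) pair."""
        with self._lock:
            return self._name, self._model
```

Loading before switching means both models are in memory for a moment, which is one reason a memory check before loading is useful.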

How much RAM do models need?

Tiny uses ~100 MB. Large V3 uses ~3 GB. The default (Large V3 Turbo Q5) sits around 600 MB. Whisperer checks memory before loading and warns you if there's not enough.
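
A pre-load check along these lines might look like the sketch below. The RAM figures are the approximate ones quoted in this answer (with 3 GB taken as 3000 MB); `check_memory`, `APPROX_RAM_MB`, and the `headroom_mb` parameter are hypothetical, and the caller is assumed to supply the available memory from the operating system.

```python
from typing import Optional

# Approximate RAM use per model in MB, taken from the figures above.
APPROX_RAM_MB = {
    "Tiny": 100,
    "Large V3 Turbo Q5": 600,
    "Large V3": 3000,
}


def check_memory(model: str, available_mb: int, headroom_mb: int = 0) -> Optional[str]:
    """Return a warning string if the model may not fit, else None."""
    needed = APPROX_RAM_MB[model] + headroom_mb
    if available_mb >= needed:
        return None
    return (f"Not enough RAM for {model}: about {needed} MB needed, "
            f"{available_mb} MB available.")
```

For example, `check_memory("Large V3", 2000)` returns a warning, while `check_memory("Tiny", 2000)` returns `None`.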

Related

Best Whisper Model for Dictation

Whisper vs NVIDIA vs Apple Speech

Live Preview Engine

100+ Languages

Pick your engine, run it locally

All three backends run on your Mac. No server, no subscription.

Try it.

Pay once. Keep it forever. Nothing goes to the cloud.

Free trial included. Pro Pack $14.99 lifetime.
