Three engines, no cloud

Whisper on the Metal GPU. NVIDIA on the Neural Engine. Apple's native Speech framework. Pick from 10+ Whisper model sizes based on how much speed vs. accuracy you need. All local.

Transcription Backends

Whisper

Default

The default. Best accuracy, widest language support. Runs on the Metal GPU with 10+ model sizes from 75 MB to 2.9 GB.

Engine
Local Whisper
Hardware
Metal GPU
Languages
99+ languages

NVIDIA

Fastest

Fastest option. Runs on the Neural Engine, so your GPU stays free. Supports CTC vocabulary boosting: your dictionary entries bias the acoustic decoder directly.
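The app's actual decoder isn't shown here, but the boosting idea can be sketched in a few lines: before the per-frame argmax of a greedy CTC decode, add a constant to the logits of tokens that appear in your dictionary, nudging the decoder toward them. The function name, boost constant, and toy vocabulary below are illustrative assumptions.

```python
import numpy as np

def ctc_greedy_decode(logits, vocab, boost_ids=(), boost=2.0):
    """Greedy CTC decode with a simple vocabulary boost.

    logits: (frames, vocab_size) array of per-frame scores.
    boost_ids: token ids from the user's dictionary; their logits
    get a constant added before the argmax, biasing the decoder
    toward those tokens (an illustrative boosting scheme).
    """
    biased = logits.copy()
    for t in boost_ids:
        biased[:, t] += boost
    # Per-frame argmax, then standard CTC collapse:
    # merge repeated ids, drop blanks (id 0).
    ids = biased.argmax(axis=1)
    out, prev = [], None
    for i in ids:
        if i != prev and i != 0:
            out.append(vocab[i])
        prev = i
    return "".join(out)
```

With a near-tie between two tokens, a small boost is enough to flip the decode toward the dictionary entry, which is the effect vocabulary boosting relies on.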

Engine
CoreML
Hardware
Apple Neural Engine
Languages
25 languages

Apple Speech

Native

Apple's native framework. Available on macOS Tahoe (macOS 26) and later with system-level optimization.

Engine
SpeechAnalyzer (macOS 26+)
Hardware
System ML
Languages
System languages

Whisper Model Comparison

Download once, cached locally. Switch between models whenever you want.

Model | Size | Speed | Notes
------|------|-------|------
Tiny | 75 MB | Fastest | Quick, lower accuracy
Base | 142 MB | Fast | Good for simple dictation
Small | 466 MB | Medium | Balanced
Medium | 1.5 GB | Slow | High accuracy
Large V3 | 2.9 GB | Slowest | Maximum accuracy
Large V3 Turbo | 1.5 GB | Fast | 8x faster than Large V3
Large V3 Turbo Q5 | 547 MB | Fast | Default; best balance of speed, size, and accuracy
Large V3 Q5 | 1.1 GB | Medium | Quantized, smaller file
Distil Large V3 | 756 MB | Very Fast | 6x faster than Large V3
Distil Small (EN) | 166 MB | Very Fast | English only

Engineering Under the Hood

Model stays loaded in memory. Recording starts instantly with no delay.
P-core pinning on Apple Silicon. Only performance cores run inference; E-cores stay out of the way.
GPU warm-up on load. A silent transcription compiles Metal shaders so your first real recording doesn't stall.
Greedy decoding at temperature 0.0. Per-chunk latency becomes predictable.
Memory check before loading. You get a warning if there's not enough RAM.
Hot-swap models without restarting.
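The greedy-decoding point above can be made concrete: at temperature 0.0, sampling collapses to a plain argmax, so every decode step does a fixed amount of work and never needs a retry, which is what makes per-chunk latency predictable. This is a generic sketch of that trade-off, not the app's code; the function name is an assumption.

```python
import numpy as np

def decode_step(logits, temperature=0.0):
    """One decode step over a vocabulary's logits.

    At temperature 0.0 this is a deterministic argmax: fixed cost,
    same token every time. At temperature > 0 it falls back to
    softmax sampling, which is non-deterministic.
    """
    if temperature == 0.0:
        return int(np.argmax(logits))
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return int(np.random.choice(len(logits), p=probs))
```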

Frequently Asked Questions

Which backend should I pick?

Start with Whisper. It's the default for a reason: best accuracy and language coverage. Want raw speed and mostly dictate in English? Try NVIDIA. Apple Speech needs macOS Tahoe or later.

Which Whisper model is best?

Large V3 Turbo Q5 (547 MB) hits the sweet spot for most people. Smaller models (Tiny, Base) run faster on older hardware. Large V3 (2.9 GB) gives you maximum accuracy when audio quality is rough.

Can I switch backends without restarting?

Yes. Change the backend or model size and the new one loads while the old one unloads. No restart.

How much RAM do models need?

Tiny uses ~100 MB. Large V3 uses ~3 GB. The default (Large V3 Turbo Q5) sits around 600 MB. Whisperer checks memory before loading and warns you if there's not enough.
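A pre-load memory check like the one described reduces to a simple comparison: the model's approximate working set, plus some headroom, must fit in currently free RAM. A minimal sketch using the figures quoted above; the table of sizes, the function name, and the 1.2x headroom factor are illustrative assumptions, not the app's internals.

```python
# Approximate working-set sizes (MB) from the figures above.
MODEL_RAM_MB = {
    "tiny": 100,
    "large-v3": 3000,
    "large-v3-turbo-q5": 600,  # the default model
}

def can_load(model: str, available_mb: float, headroom: float = 1.2) -> bool:
    """Return True if the model plus a safety headroom fits in
    the RAM currently available; otherwise the caller would warn
    the user instead of loading."""
    return available_mb >= MODEL_RAM_MB[model] * headroom
```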

Pick your engine, run it locally

All three backends run on your Mac. No server, no subscription.

Try it.

Pay once. Keep it forever. Nothing goes to the cloud.

Free trial included. Pro Pack $14.99 lifetime.