Three transcription engines, zero cloud

    Whisper on Metal GPU, Parakeet on Neural Engine, or Apple Speech. Choose from 10+ model sizes to match your hardware and accuracy needs. Everything runs locally on your Mac.

    Transcription Backends

    Whisper

    Default

    The default backend. Best accuracy and broadest language coverage. Uses Apple Silicon Metal GPU acceleration for fast inference. Supports 10+ model sizes from Tiny (75MB) to Large V3 (2.9GB).

    Engine
    whisper.cpp (vendored C library)
    Hardware
    Metal GPU
    Languages
    99+ languages

    Parakeet

    Fastest

    Fastest backend on Apple Silicon. Runs on the dedicated Neural Engine, leaving the GPU free for other tasks. Features CTC vocabulary boosting — your dictionary entries and prompt words bias the decoder at the acoustic level.

    Engine
    FluidAudio (CoreML)
    Hardware
    Apple Neural Engine
    Languages
    25 languages (v3), English-only (v2)
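
    The vocabulary-boosting idea above can be illustrated with a minimal conceptual sketch (this is not Whisperer's or FluidAudio's actual code, and the function name, bonus value, and token setup are illustrative): during greedy CTC decoding, tokens that appear in the user's dictionary receive a log-probability bonus before the argmax, nudging the decoder toward them at the acoustic level.

    ```python
    BLANK = 0  # CTC blank token id (illustrative convention)

    def boosted_greedy_ctc(log_probs, boosted_tokens, bonus=1.5):
        """Greedy CTC decode with a log-prob bonus for boosted tokens.

        log_probs: per-frame lists of token log-probabilities.
        boosted_tokens: set of token ids to bias toward.
        """
        path = []
        for frame in log_probs:
            scores = [
                lp + (bonus if tok in boosted_tokens else 0.0)
                for tok, lp in enumerate(frame)
            ]
            path.append(max(range(len(scores)), key=scores.__getitem__))
        # Standard CTC collapse: merge repeated tokens, then drop blanks.
        out, prev = [], None
        for tok in path:
            if tok != prev and tok != BLANK:
                out.append(tok)
            prev = tok
        return out
    ```

    With a large enough bonus, a dictionary token that narrowly loses the per-frame argmax wins it instead, which is the effect described above for dictionary entries and prompt words.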

    Apple Speech

    Native

    Native Apple framework available on macOS Tahoe and later. Uses Apple's built-in speech recognition models with system-level optimization.

    Engine
    SpeechAnalyzer (macOS 26+)
    Hardware
    System ML
    Languages
    System languages

    Whisper Model Comparison

    All models are downloaded once and cached locally. Switch between them at any time.

    Model | Size | Speed | Notes
    Tiny | 75 MB | Fastest | Quick, lower accuracy
    Base | 142 MB | Fast | Good for simple dictation
    Small | 466 MB | Medium | Balanced
    Medium | 1.5 GB | Slow | High accuracy
    Large V3 | 2.9 GB | Slowest | Maximum accuracy
    Large V3 Turbo | 1.5 GB | Fast | 8x faster than Large V3
    Large V3 Turbo Q5 | 547 MB | Fast | Default: best balance of speed, size, and accuracy
    Large V3 Q5 | 1.1 GB | Medium | Quantized, smaller file
    Distil Large V3 | 756 MB | Very Fast | 6x faster than Large V3
    Distil Small (EN) | 166 MB | Very Fast | English only

    Engineering Under the Hood

    Models stay pre-loaded in memory, so recording starts instantly with no load delay
    P-core thread pinning on Apple Silicon — only performance cores used, E-cores excluded to prevent straggler effects
    GPU warm-up on model load — silent transcription compiles Metal shaders so first real recording has no stall
    Deterministic greedy decoding (temperature=0.0) makes per-chunk latency predictable
    Memory safety check before model load — warns if insufficient system memory
    Hot-swap models without restarting the app
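
    The memory-safety check can be sketched conceptually (the function name, footprint figures, and 512 MB headroom are illustrative assumptions, not Whisperer's actual implementation): before loading, compare the model's approximate memory footprint plus a safety margin against available system memory.

    ```python
    # Approximate resident footprints, loosely based on the figures cited
    # elsewhere on this page (~100 MB Tiny, ~600 MB default, ~3 GB Large V3).
    MODEL_FOOTPRINT_MB = {
        "tiny": 100,
        "large-v3-turbo-q5": 600,
        "large-v3": 3000,
    }

    def can_load(model: str, available_mb: int, headroom_mb: int = 512) -> bool:
        """True if the model fits in available memory with headroom to spare."""
        return MODEL_FOOTPRINT_MB[model] + headroom_mb <= available_mb
    ```

    A check like this is what lets the app warn before a load rather than fail mid-load: the decision is made while the previous model is still resident, which also makes hot-swapping safe.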

    Frequently Asked Questions

    Which transcription backend should I choose?

    Start with Whisper (the default). It offers the best accuracy and language coverage. If you want maximum speed and primarily dictate in English, try Parakeet. Apple Speech requires macOS Tahoe or later.

    Which Whisper model is best?

    The default Large V3 Turbo Q5 (547 MB) offers the best balance of speed, accuracy, and file size. Use smaller models (Tiny, Base) for speed on older hardware. Use Large V3 (2.9 GB) for maximum accuracy in challenging audio.

    Can I switch between backends without restarting?

    Yes. Whisperer supports hot-swapping — change your transcription backend or model size, and the new engine loads while the old one is released. No app restart needed.

    How much RAM do the models need?

    Memory use ranges from ~100 MB (Tiny) to ~3 GB (Large V3). Whisperer checks available system memory before loading a model and warns you if there isn't enough. The default model (Large V3 Turbo Q5) uses about 600 MB.

    On-device transcription, your choice of engine

    Download Whisperer free. All transcription engines run locally on your Mac.

    Ready to ditch typing?

    Join developers and power users who dictate faster than they type. One-time purchase. No subscription. No cloud.

    Free trial included. Pro Pack $14.99 lifetime.