
Voice Notes on Mac: Siri Dictation vs WhisperKit vs Cloud Services

Compare voice-to-text options for Mac: Siri Dictation, WhisperKit (local), and cloud APIs. We test accuracy, privacy, speed, and offline support.

SlashNote Team

Sometimes the fastest way to capture an idea is to say it. But voice-to-text on Mac is not one-size-fits-all. Some options send your audio to the cloud. Others process everything on your device. The differences matter — especially for privacy and reliability.

Here is how the three main approaches compare.

The Three Approaches

| Feature | Siri Dictation | WhisperKit (Local) | Cloud APIs |
| --- | --- | --- | --- |
| Processing | Apple servers (partial on-device) | 100% on-device | Third-party servers |
| Internet required | Yes (mostly) | No | Yes |
| Privacy | Apple’s servers | Nothing leaves your Mac | Varies by provider |
| Languages | 60+ | 100+ | 100+ |
| Accuracy | Very good | Very good | Excellent |
| Speed | Near real-time | Near real-time | Depends on network |
| Cost | Free (built-in) | Free (open-source) | Pay per minute |
| Custom vocabulary | Limited | No | Some providers |
| Works offline | Limited languages | Yes, fully | No |
| Used in | All macOS apps | SlashNote | Various apps |

Siri Dictation: The Built-In Option

How it works

Siri Dictation is built into macOS. Press the dictation shortcut (by default, the Fn key twice, or the dedicated microphone key on newer keyboards) and start speaking. The system transcribes your speech and types it wherever your cursor is.

On newer Macs with Apple Silicon, some processing happens on-device for supported languages. But for most use cases, audio is still sent to Apple’s servers for processing.

Accuracy

Siri Dictation has improved significantly over the years. For everyday speech in common languages (English, Spanish, Mandarin, etc.), accuracy is very good — typically 95%+ for clear speech.

It handles punctuation commands well: saying “period,” “comma,” “new line,” or “question mark” inserts the correct punctuation.
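The command behavior above can be sketched as a simple post-processing step. This is purely illustrative, not Apple’s implementation: a toy mapping from spoken commands to symbols.

```python
# Toy model of spoken punctuation commands (NOT Apple's implementation).
PUNCTUATION_COMMANDS = {
    "period": ".",
    "comma": ",",
    "question mark": "?",
    "new line": "\n",
}

def apply_punctuation_commands(transcript: str) -> str:
    """Replace spoken punctuation commands with their symbols."""
    result = transcript
    # Replace longer commands first so "question mark" wins over any shorter overlap.
    for command in sorted(PUNCTUATION_COMMANDS, key=len, reverse=True):
        result = result.replace(" " + command, PUNCTUATION_COMMANDS[command])
    return result

# apply_punctuation_commands("hello world period how are you question mark")
# -> "hello world. how are you?"
```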

Strengths:

  • Excellent for common languages
  • Good punctuation handling
  • Continuous dictation (keeps listening)
  • Works in every text field on macOS

Weaknesses:

  • Technical jargon and proper nouns can be hit-or-miss
  • Background noise reduces accuracy more than local models
  • Occasional network lag causes delayed transcription

Privacy

Audio data goes to Apple’s servers for processing. Apple states it does not associate dictation data with your Apple ID after 6 months and deletes the data within 2 years.

With on-device dictation (Apple Silicon, supported languages), data stays local. But this isn’t available for all languages, and the system may still fall back to server processing.

Best for

  • Quick dictation in any app (emails, messages, documents)
  • Users who don’t want to install anything
  • Casual note-taking with punctuation commands

WhisperKit: 100% On-Device

How it works

WhisperKit is an open-source speech-to-text engine based on OpenAI’s Whisper model, optimized to run on Apple Neural Engine. It processes audio entirely on your Mac — no network connection needed, no data sent anywhere.

SlashNote uses WhisperKit for all voice features. You hold a modifier key (Cmd for raw transcription, Ctrl for AI-processed), speak, and release. The text appears in your note.

Accuracy

WhisperKit’s accuracy is comparable to Siri Dictation for most languages and often better for technical vocabulary. Because Whisper was trained on a massive multilingual dataset, it handles accents, code-switching (mixing languages), and domain-specific terms well.

Strengths:

  • Strong with technical jargon and mixed-language speech
  • Consistent accuracy regardless of network conditions
  • Handles accents and dialects well
  • Automatic language detection across 100+ languages

Weaknesses:

  • No real-time streaming (processes after you finish speaking)
  • No punctuation commands (relies on natural speech patterns)
  • First inference may take a moment as the model loads

Privacy

This is WhisperKit’s strongest feature. Zero audio data leaves your device. Ever.

  • No network requests during processing
  • No audio stored after transcription
  • No accounts, no API keys, no telemetry
  • Runs on Apple Neural Engine — fast, efficient, completely local

For anyone handling sensitive information — legal notes, medical dictation, private thoughts — this level of privacy is not available from cloud solutions.

Two modes in SlashNote

Voice Note (Hold Cmd): Raw transcription. You speak, WhisperKit converts to text, the text appears in your note. Simple, fast.

AI Voice Note (Hold Ctrl): You speak, WhisperKit converts to text, then AI processes the text into a structured note. Stream-of-consciousness input, organized output. The AI step uses your chosen provider (cloud or Ollama for fully local).
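The difference between the two modes is essentially one extra pipeline stage. A minimal sketch of that architecture, with hypothetical stubs standing in for WhisperKit and the AI provider (the real SlashNote internals are not public):

```python
from typing import Callable

Transcriber = Callable[[bytes], str]
Structurer = Callable[[str], str]

def voice_note(transcribe: Transcriber, audio: bytes) -> str:
    """Voice Note (hold Cmd): raw transcription, nothing else."""
    return transcribe(audio)

def ai_voice_note(transcribe: Transcriber, structure: Structurer, audio: bytes) -> str:
    """AI Voice Note (hold Ctrl): transcription, then a structuring pass."""
    return structure(transcribe(audio))

# Hypothetical stubs standing in for WhisperKit and the AI step:
fake_transcribe: Transcriber = lambda audio: "buy milk and call the dentist"
fake_structure: Structurer = lambda text: "\n".join(
    "- " + item for item in text.split(" and ")
)
```

The design point is that the structuring stage is swappable: the same transcript can be sent to a cloud provider or to a local model such as Ollama.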

Best for

  • Privacy-sensitive voice notes
  • Offline use (airplane, spotty WiFi, no internet)
  • Technical dictation (code terms, product names)
  • Multilingual users who switch between languages

Cloud APIs: Maximum Power

How it works

Cloud speech-to-text APIs — OpenAI Whisper API, Google Cloud Speech-to-Text, Amazon Transcribe, Microsoft Azure Speech — send your audio to remote servers for processing. They return text with high accuracy and often include features like speaker diarization and custom vocabularies.

These APIs are typically used by apps rather than end users directly. If your note-taking app offers cloud-based voice input, it uses one of these services under the hood.

Accuracy

Cloud APIs generally offer the highest accuracy because they run the largest models on powerful server hardware.

OpenAI’s Whisper API and Google’s Speech-to-Text consistently top benchmarks:

  • 97%+ accuracy (word error rate under 3%) for clean speech
  • Strong performance on noisy audio
  • Excellent multilingual support
  • Speaker diarization (who said what)
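Benchmark figures like these are usually reported as word error rate (WER): the word-level edit distance between the reference transcript and the hypothesis, divided by the number of reference words. A minimal sketch:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution in a four-word reference:
# word_error_rate("press the dictation key", "press a dictation key") -> 0.25
```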

Strengths:

  • Highest raw accuracy
  • Best handling of noisy environments
  • Speaker identification
  • Custom vocabulary support (some providers)
  • Real-time streaming capability

Weaknesses:

  • Requires internet connection
  • Audio sent to third-party servers
  • Cost per minute of audio
  • Latency depends on network conditions
  • Rate limits on API usage

Privacy

This is where cloud APIs fall short. Your audio — your actual voice — is sent to servers operated by OpenAI, Google, Amazon, or Microsoft.

Each provider’s privacy policy is different:

  • OpenAI Whisper API: Does not use API data for training by default. Audio is retained for 30 days for abuse monitoring.
  • Google Cloud Speech: Data processed and deleted. May be used for service improvement unless opted out.
  • Amazon Transcribe: Processed data may be used to improve the service. Can opt out.
  • Microsoft Azure: Data retention varies by configuration.

For non-sensitive content, this may be acceptable. For private notes, medical dictation, or legal work, cloud processing introduces unnecessary risk.

Cost

Cloud APIs charge per minute of audio:

  • OpenAI Whisper: ~$0.006 per minute
  • Google Speech-to-Text: ~$0.006-0.024 per minute (depending on model)
  • Amazon Transcribe: ~$0.024 per minute
  • Azure Speech: ~$0.016 per minute

For occasional use, costs are minimal. For heavy dictation (hours per day), they add up.
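To see how per-minute pricing scales, here is a quick estimator using the approximate rates listed above (for ranged pricing it takes the low end; always check each provider’s current pricing):

```python
# Approximate per-minute rates (USD) from the list above; verify current pricing.
RATES_PER_MINUTE = {
    "OpenAI Whisper": 0.006,
    "Google Speech-to-Text": 0.006,   # low end of the quoted range
    "Amazon Transcribe": 0.024,
    "Azure Speech": 0.016,
}

def monthly_cost(provider: str, minutes_per_day: float, days: int = 30) -> float:
    """Estimated monthly bill for a given daily dictation volume."""
    return RATES_PER_MINUTE[provider] * minutes_per_day * days

# Two hours of dictation per day:
# monthly_cost("OpenAI Whisper", 120)    -> 21.6
# monthly_cost("Amazon Transcribe", 120) -> 86.4
```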

Best for

  • Meeting transcription with multiple speakers
  • Professional transcription services
  • Apps that need maximum accuracy in noisy environments
  • Use cases where privacy is not a primary concern

Head-to-Head Comparison

Accuracy test

For a simple test — reading a paragraph of mixed technical and everyday English in a quiet room — all three approaches score within 2-3% of each other. The differences emerge in edge cases:

| Scenario | Siri | WhisperKit | Cloud API |
| --- | --- | --- | --- |
| Quiet room, clear speech | Excellent | Excellent | Excellent |
| Background noise (cafe) | Good | Good | Very good |
| Technical jargon (programming) | Fair | Good | Very good |
| Mixed languages | Fair | Very good | Very good |
| Heavy accent | Good | Good | Very good |
| Offline | Limited | Full support | Not available |

Speed test

| Metric | Siri | WhisperKit | Cloud API |
| --- | --- | --- | --- |
| Start-to-text (10 sec audio) | ~1-2 sec | ~2-3 sec | ~2-5 sec |
| Start-to-text (60 sec audio) | ~2-3 sec | ~5-8 sec | ~5-10 sec |
| First-time load | Instant | ~3-5 sec | Instant |
| Requires internet | Mostly yes | No | Yes |

Siri is fastest for short dictation because it streams in real-time. WhisperKit processes after you finish speaking but is consistent. Cloud APIs depend on network conditions.

Privacy scorecard

| Criterion | Siri | WhisperKit | Cloud API |
| --- | --- | --- | --- |
| Audio stays on device | Partial | Always | Never |
| No account required | Apple ID | No account | API key |
| Works without internet | Limited | Always | Never |
| Provider can hear your audio | Yes (mostly) | Never | Yes |
| Data retention | Up to 2 years | None | 30 days+ |
| Suitable for sensitive content | Depends | Yes | No |

Which Should You Use?

Use Siri Dictation if:

  • You just need quick dictation in any app
  • You don’t want to install anything extra
  • Punctuation commands (“period”, “new paragraph”) are important to your workflow
  • You primarily use one language

Use WhisperKit (via SlashNote) if:

  • Privacy matters — no audio should leave your device
  • You work offline regularly or on unreliable networks
  • You dictate technical content (code, product names, jargon)
  • You switch between languages
  • You want voice input integrated with AI note processing

Use Cloud APIs if:

  • You need the absolute highest accuracy
  • You transcribe meetings with multiple speakers
  • You need custom vocabulary for specialized domains
  • Privacy is not a concern for the content being transcribed

The Bigger Picture

Voice input is becoming a standard feature in productivity tools. The question is not whether to use it, but how.

For most note-taking — capturing ideas, recording thoughts, quick reminders — any of these three approaches works. The difference is where your voice data goes.

If you value privacy and want voice notes that stay entirely on your Mac, WhisperKit through SlashNote gives you that with no setup, no accounts, and no compromises on accuracy.

Download SlashNote — voice notes that never leave your Mac

Download for Free

Available on the Mac App Store for macOS 15+