Setup & how it works
Just follow the first-run wizard. No fiddly configuration.
Download & open
Download the .dmg, open it, and move the app to Applications. The wizard opens on first launch.
Grant permissions
Allow microphone and calendar access. Screen-recording permission is not needed.
Prepare transcription (auto)
Python and ffmpeg are bundled. The first run prepares the transcription libraries automatically (internet required).
Recommended model auto-downloads
The app downloads the recommended model for your language and shows a progress bar. It is ready to use as soon as the download finishes.
Speaker labels (optional)
Want “who said what”? Add a free Hugging Face access token (optional).
Choosing a model
Accuracy and speed come down to the model. Just pick the best one your Mac can run. Models download automatically on first use, and you can change them anytime under Settings → Transcription.
Short answer
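Japanese: Kotoba Whisper v2.0. Japanese-specialized, more accurate than large-v3, lighter, and runs on almost any Mac.
Japanese and English: Kotoba Whisper Bilingual v1.0. Accurate in both Japanese and English.
Everything else: Whisper large-v3. Top multilingual accuracy. Pick the variant for your Mac below.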
The best model your Mac can run
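Apple Silicon Macs run Whisper models on the GPU with the mlx engine; Intel Macs run them on the CPU with the "faster" engine (an engine name, not a speed claim). The Japanese Kotoba models run on the CPU and are lightweight, so they work on nearly any Mac.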
| Your Mac | Japanese | Multilingual |
|---|---|---|
| Apple Silicon · 16GB+ (M1 Pro/Max, M2/M3/M4, etc.) | Kotoba Whisper v2.0 | large-v3 (mlx / GPU) |
| Apple Silicon · 8GB (base M1/M2/M3) | Kotoba Whisper v2.0 | medium (mlx / GPU) |
| Intel · 16GB+ | Kotoba Whisper v2.0 | large-v3 (faster engine / CPU; slow) |
| Intel · 8GB | Kotoba Whisper v2.0 | medium (faster engine / CPU) |
| Lightweight / draft use | small | small |
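
The table above boils down to a small decision rule. Below is a minimal Python sketch of it, not the app's actual code. The function names and the chip labels `"apple_silicon"` and `"intel"` are hypothetical, and treating any Mac with less than 16GB like the 8GB row is an assumption the table does not state.

```python
def engine_for(chip: str) -> str:
    """Engine for the Whisper models: mlx (GPU) on Apple Silicon, faster (CPU) on Intel."""
    engines = {"apple_silicon": "mlx", "intel": "faster"}  # hypothetical chip labels
    if chip not in engines:
        raise ValueError(f"unknown chip: {chip!r}")
    return engines[chip]


def recommend_model(chip: str, ram_gb: float, japanese: bool, draft: bool = False) -> str:
    """Return the model name the table recommends for this Mac."""
    engine_for(chip)  # only used here to reject unknown chip labels
    if draft:
        # "Lightweight / draft use" row: small in both columns.
        return "Whisper small"
    if japanese:
        # Every hardware row recommends Kotoba Whisper v2.0 for Japanese.
        return "Kotoba Whisper v2.0"
    # Multilingual column: large-v3 at 16GB+, medium otherwise.
    # Assumption: anything under 16GB follows the 8GB row.
    return "Whisper large-v3" if ram_gb >= 16 else "Whisper medium"


if __name__ == "__main__":
    print(recommend_model("apple_silicon", 16, japanese=False), engine_for("apple_silicon"))
    print(recommend_model("intel", 8, japanese=True))
```

In a real script the chip label could be derived from `platform.machine()`, which reports `arm64` for a native Python on Apple Silicon; that detection step is left out here to keep the sketch a pure function.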
When a meeting ends, transcription starts automatically. It can take a little while, because your Mac is doing the work. Processing time depends on your Mac and the model you picked above (lighter models are faster; higher-accuracy ones take longer). You’ll get a notification when the transcript is ready.
All models
| Model | Languages | Accuracy | Speed | Size | Min RAM | Notes |
|---|---|---|---|---|---|---|
| Kotoba Whisper v2.0 | Japanese | ◎ | ◎ | ~1.5GB | 4GB+ | Japanese-specialized. More accurate and faster than large-v3; the best pick for Japanese. |
| Kotoba Whisper Bilingual v1.0 | Japanese · English | ◎ | ◎ | ~1.5GB | 4GB+ | Handles both Japanese and English. |
| Whisper large-v3 | Multilingual | ◎ | △ | ~3GB | 8GB+ (16GB+ recommended) | Top multilingual accuracy, but heavy. Uses the mlx engine on Apple Silicon and the faster engine on Intel. |
| Whisper medium | Multilingual | ○ | ○ | ~1.5GB | 4GB+ | Balanced multilingual (default). |
| Whisper small | Multilingual | △ | ◎ | ~0.5GB | 2GB+ | Lightest & fastest. For low-spec Macs or drafts. |
| Distil-Whisper large-v3 | English | ◎ | ○ | ~1.5GB | 4GB+ | English-specialized. large-v3-class accuracy, fast & light. |
Accuracy and speed are relative guides (◎ > ○ > △). Accuracy is within the model’s target language; speed depends on your Mac, model size and engine (mlx / faster).
The app automatically warns you if a model won’t run on your Mac (not enough RAM/disk, or Apple Silicon-only). When in doubt: Kotoba Whisper v2.0 for Japanese, large-v3 for everything else.
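
The compatibility warning can be pictured as a check against the Size and Min RAM columns of the table above. This is a rough sketch under stated assumptions, not the app's implementation. `ModelSpec` and `warnings_for` are hypothetical names, the disk check simply compares free space with the approximate download size, and the `apple_silicon_only` flag is left off for every listed model because the page does not say which variants it applies to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    size_gb: float                    # approximate download size ("Size" column)
    min_ram_gb: int                   # "Min RAM" column
    apple_silicon_only: bool = False  # not specified per model on the page


# Values copied from the "All models" table.
MODELS = {
    spec.name: spec
    for spec in [
        ModelSpec("Kotoba Whisper v2.0", 1.5, 4),
        ModelSpec("Kotoba Whisper Bilingual v1.0", 1.5, 4),
        ModelSpec("Whisper large-v3", 3.0, 8),
        ModelSpec("Whisper medium", 1.5, 4),
        ModelSpec("Whisper small", 0.5, 2),
        ModelSpec("Distil-Whisper large-v3", 1.5, 4),
    ]
}


def warnings_for(spec: ModelSpec, ram_gb: float, free_disk_gb: float,
                 is_apple_silicon: bool) -> list[str]:
    """Return the reasons a model would not run; an empty list means no warning."""
    problems = []
    if ram_gb < spec.min_ram_gb:
        problems.append("not enough RAM")
    if free_disk_gb < spec.size_gb:  # assumption: no extra headroom is required
        problems.append("not enough disk")
    if spec.apple_silicon_only and not is_apple_silicon:
        problems.append("Apple Silicon only")
    return problems


if __name__ == "__main__":
    print(warnings_for(MODELS["Whisper large-v3"], ram_gb=4, free_disk_gb=10,
                       is_apple_silicon=True))
```

Returning a list of reasons rather than a single yes/no lets one warning name every problem at once, for example both RAM and disk.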