Setup & how it works

Just follow the first-run wizard. No fiddly configuration.

Download & open

Download the .dmg, open it, and drag the app to Applications. The wizard opens on first launch.

Grant permissions

Allow microphone and calendar access. Screen-recording permission is not needed.

Prepare transcription (auto)

Python and ffmpeg are bundled. The first run prepares the transcription libraries automatically (internet required).

Recommended model auto-downloads

The recommended model for your language downloads with a progress bar. The app is ready as soon as the download finishes.

Speaker labels (optional)

Want “who said what”? Add a free HuggingFace token (optional).

First-run wizard

Onboarding wizard (real UI)

How to get the speaker-label token

Browse the FAQ

Choosing a model

Accuracy and speed come down to the model. Just pick the best one your Mac can run. Models download automatically on first use, and you can change them anytime under Settings → Transcription.

Short answer

Japanese meetings

Kotoba Whisper v2.0

Japanese-specialized: more accurate than large-v3 on Japanese, lighter, and runs on almost any Mac.

Mixed Japanese + English

Kotoba Whisper Bilingual v1.0

Accurate in both Japanese and English.

Other languages / multilingual

Whisper large-v3

Top multilingual accuracy. Pick the variant for your Mac below.

The best model your Mac can run

Apple Silicon Macs run Whisper models on the GPU with the mlx engine; Intel Macs run them on the CPU with the “faster” engine. The Japanese Kotoba models run on the CPU and are lightweight, so they work on nearly any Mac.

Your Mac	Japanese	Multilingual
Apple Silicon · 16GB+ (M1 Pro/Max, M2/M3/M4, etc.)	Kotoba Whisper v2.0	large-v3 (mlx / GPU)
Apple Silicon · 8GB (base M1/M2/M3)	Kotoba Whisper v2.0	medium (mlx / GPU)
Intel · 16GB+	Kotoba Whisper v2.0	large-v3 (faster engine / CPU; slow)
Intel · 8GB	Kotoba Whisper v2.0	medium (faster engine / CPU)
Lightweight / draft use	small	small
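
The table above can be read as a small decision rule. The sketch below is only an illustration of that rule, not the app's actual logic: the `Mac` type, the function names, and the `"ja"` language code are hypothetical, and treating every Mac with less than 16GB as the "8GB" row is an assumption, since the table does not cover other RAM sizes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mac:
    """Hypothetical description of a Mac; not an API of the app."""
    apple_silicon: bool
    ram_gb: int


def recommend_model(mac: Mac, language: str, draft: bool = False) -> str:
    """Return the model the table suggests for this Mac and language."""
    if draft:
        # "Lightweight / draft use" row: small in both columns.
        return "small"
    if language == "ja":
        # Japanese column: the same pick on every row.
        return "Kotoba Whisper v2.0"
    # Multilingual column: large-v3 with 16GB+, otherwise medium.
    # Assumption: anything under 16GB is treated like the 8GB rows.
    return "large-v3" if mac.ram_gb >= 16 else "medium"


def whisper_engine(mac: Mac) -> str:
    """Engine the table lists for Whisper models on this kind of Mac."""
    return "mlx" if mac.apple_silicon else "faster"


if __name__ == "__main__":
    base_m1 = Mac(apple_silicon=True, ram_gb=8)
    print(recommend_model(base_m1, "en"), whisper_engine(base_m1))
```

For example, a base 8GB M1 used for English meetings maps to medium on the mlx engine, while any Mac used for Japanese meetings maps to Kotoba Whisper v2.0.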

When a meeting ends, transcription starts automatically. It can take a little while — that’s your Mac doing the work. Processing time depends on your Mac and the model you picked above (lighter models are faster; higher-accuracy ones take longer). You’ll get a notification when it’s ready.

All models

Model	Languages	Accuracy	Speed	Size	Min RAM	Notes
Kotoba Whisper v2.0	Japanese	◎	◎	~1.5GB	4GB+	Japanese-specialized. More accurate on Japanese and faster than large-v3; the best pick for Japanese.
Kotoba Whisper Bilingual v1.0	Japanese · English	◎	◎	~1.5GB	4GB+	Handles both Japanese and English.
Whisper large-v3	Multilingual	◎	△	~3GB	8GB+ (16GB+ recommended)	Top multilingual accuracy, but heavy. Runs on the mlx engine on Apple Silicon and on the “faster” engine on Intel.
Whisper medium	Multilingual	○	○	~1.5GB	4GB+	Balanced multilingual (default).
Whisper small	Multilingual	△	◎	~0.5GB	2GB+	Lightest & fastest. For low-spec Macs or drafts.
Distil-Whisper large-v3	English	◎	○	~1.5GB	4GB+	English-specialized. large-v3-class accuracy, fast & light.

Accuracy and speed are relative guides (◎ > ○ > △). Accuracy is within the model’s target language; speed depends on your Mac, model size and engine (mlx / faster).

The app automatically warns you if a model won’t run on your Mac (not enough RAM/disk, or Apple Silicon-only). When in doubt: Kotoba Whisper v2.0 for Japanese, large-v3 for everything else.
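
To make the RAM and disk part of that warning concrete, here is a minimal sketch that checks a model against the Size and Min RAM columns of the "All models" table. It is an assumption about how such a check could look, not the app's implementation: `ModelSpec`, `MODELS`, and `check_model` are hypothetical names, the sizes are the approximate figures from the table, and the Apple Silicon-only case is left out because the page does not say which models it applies to.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelSpec:
    """One row of the "All models" table (sizes are approximate)."""
    name: str
    size_gb: float
    min_ram_gb: int


# Values copied from the Size and Min RAM columns above.
MODELS = {
    spec.name: spec
    for spec in [
        ModelSpec("Kotoba Whisper v2.0", 1.5, 4),
        ModelSpec("Kotoba Whisper Bilingual v1.0", 1.5, 4),
        ModelSpec("Whisper large-v3", 3.0, 8),
        ModelSpec("Whisper medium", 1.5, 4),
        ModelSpec("Whisper small", 0.5, 2),
        ModelSpec("Distil-Whisper large-v3", 1.5, 4),
    ]
}


def check_model(spec: ModelSpec, ram_gb: float, free_disk_gb: float) -> List[str]:
    """Return a list of warnings; an empty list means no problem was found."""
    warnings = []
    if ram_gb < spec.min_ram_gb:
        warnings.append(f"{spec.name} needs at least {spec.min_ram_gb}GB of RAM")
    if free_disk_gb < spec.size_gb:
        warnings.append(f"{spec.name} needs about {spec.size_gb}GB of free disk")
    return warnings


if __name__ == "__main__":
    for message in check_model(MODELS["Whisper large-v3"], ram_gb=4, free_disk_gb=1):
        print(message)
```

A real check would likely also leave headroom for the rest of the system and for the recording itself; this sketch compares only against the table's minimums.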

Try it free

First month free. Install and follow the first-run wizard — that’s it.

Download .dmg