If you’re transcribing Japanese meetings, the short answer is this: choose Kotoba Whisper v2.0 for Japanese-centric work (light, fast, and strong in Japanese) and Whisper large-v3 for multilingual needs (a top-tier general-purpose model, but heavier). With OffReco, you can run either one fully on-device. This article lays out how the two differ and how they compare on accuracy, speed, and memory, with sources.
What’s actually different
In short, it’s the difference between a general-purpose multilingual model and a lightweight, Japanese-specialized one.
- Whisper large-v3: the top model in OpenAI’s Whisper series. A general-purpose model that covers many languages with a single network — highly accurate, but with a large parameter count, so it’s heavier to run.
- Kotoba Whisper v2.0: a Japanese-specialized model distilled from large-v3. Per the HuggingFace model card, the student keeps large-v3’s full encoder, slims the decoder down to two layers, and is retrained on Japanese data.
In other words, Kotoba v2.0 is positioned to inherit large-v3’s Japanese ability while being lighter and faster.
Accuracy, speed, and memory compared
Every figure below comes only from the HuggingFace model card.
| Aspect | Kotoba Whisper v2.0 | Whisper large-v3 |
|---|---|---|
| Type | Japanese-specialized (distilled from large-v3) | General-purpose multilingual |
| Japanese error rate (CER) | Reported on par with or better than large-v3 (e.g., CER 9.2 on CommonVoice 8 Japanese) | Baseline for the comparison (highly accurate general-purpose model) |
| Speed | Reported ~6.3x faster than large-v3 | Baseline for the comparison (larger and heavier to run) |
| Inference library | faster-whisper weights also provided | Runs on common Whisper implementations |
The key point: according to the model card, Kotoba v2.0 reaches a Japanese character error rate (CER) on par with or better than large-v3 (for example, CER 9.2 on CommonVoice 8 Japanese) while being about 6.3x faster. It also ships weights for the high-speed faster-whisper library. Memory is compared only qualitatively here: because large-v3 is larger, it tends to be heavier to run and to need more memory, a trend that’s part of why a lighter distilled model exists in the first place.
Note we deliberately don’t add any figures beyond these. Real-world accuracy and speed vary with your recording environment and hardware, so the surest test is your own meetings.
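To run that test on your own meetings, you need a reference transcript you have corrected by hand plus the output of each model. The sketch below shows one way to compute a character error rate (CER) and a speed ratio in plain Python. The function names and the normalization (NFKC plus whitespace removal) are our assumptions for illustration. The model card’s evaluation applies its own text normalization, so numbers from this sketch are not directly comparable to the published ones.

```python
import unicodedata


def normalize(text: str) -> str:
    """Assumed normalization: NFKC, then drop all whitespace."""
    return "".join(unicodedata.normalize("NFKC", text).split())


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent, relative to the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate ran than the baseline."""
    return baseline_seconds / candidate_seconds
```

For example, `cer("今日は会議です", "今日は会議でした")` counts one substitution and one insertion against seven reference characters, which is roughly 28.6%. Time both models on the same recording and the same machine before passing the two durations to `speedup`.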
Which should you choose?
It comes down to your use case.
- Mostly Japanese meetings: Kotoba Whisper v2.0 is the strong default. It’s accurate in Japanese yet light and fast, and tends to run at practical speeds on many Macs without a dedicated GPU.
- You need multilingual support (English or other languages mixed in): Whisper large-v3 fits, covering a wide range of languages with one model.
- Limited hardware / you just want it fast: the lightweight Kotoba v2.0 is the safe pick. With faster-whisper weights available, it’s easy to get usable speed.
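If you want to try the faster-whisper route outside OffReco, a minimal sketch might look like the following. It assumes the third-party `faster-whisper` package is installed and that the weights are published under the Hugging Face ID `kotoba-tech/kotoba-whisper-v2.0-faster`. The helper names (`format_timestamp`, `format_segments`, `transcribe_japanese`) are ours, and this is not how OffReco itself is implemented.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def format_segments(segments) -> list[str]:
    """Turn segments (objects with .start, .end, .text) into printable lines."""
    return [
        f"[{format_timestamp(s.start)} -> {format_timestamp(s.end)}] {s.text.strip()}"
        for s in segments
    ]


def transcribe_japanese(
    audio_path: str,
    model_id: str = "kotoba-tech/kotoba-whisper-v2.0-faster",  # assumed model ID
) -> list[str]:
    # Imported lazily so the helpers above work without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_id)  # downloads the weights on first use
    segments, _info = model.transcribe(
        audio_path,
        language="ja",                     # skip language auto-detection
        condition_on_previous_text=False,  # decode each window independently
    )
    # `segments` is a generator; transcription runs while it is consumed.
    return format_segments(segments)
```

Calling `transcribe_japanese("meeting.wav")` (a placeholder file name) returns one timestamped line per segment. Only the first call needs a network connection, to fetch the weights.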
If you’re unsure, a practical order is to start with Kotoba v2.0 for Japanese meeting notes and switch to large-v3 once multilingual needs come up.
How this works in OffReco
What sets OffReco apart is that it runs both models fully on-device, and lets you choose.
- Both run locally: on Apple Silicon, inference runs on the GPU (mlx); on Intel Macs, it runs on the CPU (faster-whisper). You can switch models in the setup screen (see: how to choose a model).
- Audio and transcript stay on your Mac: recording, transcription, and speaker separation all happen on the machine, so neither the audio nor the transcript leaves it. Transcription works in airplane mode (only the first-run model download needs a connection).
- Fully automatic, low barrier: it auto-detects meetings and transcribes when you stop recording. The first month is free, then ¥200/month or ¥2,000/year, on macOS 14.2 or later.
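The Apple Silicon / Intel split described in the first point above could be expressed as a small dispatch on the CPU architecture. This is a hypothetical sketch, not OffReco’s actual code: `pick_backend`, `current_backend`, and the returned labels are names we made up, and the logic relies only on Python’s standard `platform` module.

```python
import platform


def pick_backend(system: str, machine: str) -> str:
    """Map an OS name and CPU architecture to an inference backend label."""
    if system != "Darwin":
        raise ValueError(f"unsupported OS: {system}")
    if machine == "arm64":
        return "mlx"             # Apple Silicon: GPU inference
    if machine == "x86_64":
        return "faster-whisper"  # Intel Mac: CPU inference
    raise ValueError(f"unsupported architecture: {machine}")


def current_backend() -> str:
    """Pick the backend for the machine this script is running on."""
    return pick_backend(platform.system(), platform.machine())
```

Keeping the decision in a pure function such as `pick_backend` makes it easy to check both branches on a single machine.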
To recap: choose Kotoba v2.0 when your work is Japanese-centric and you want something light and fast, and large-v3 when you need multilingual support. Either way you can use it without uploading to the cloud, so download it and compare on your own meetings. Related reading: Choosing a transcription app that’s strong in Japanese (what Kotoba Whisper is) and running Whisper locally with no setup.