If you’re transcribing Japanese meetings, the answer is simple: use Kotoba Whisper v2.0 for Japanese-centric work (light, fast, and strong in Japanese), and Whisper large-v3 for multilingual needs (a top-tier general-purpose model, but heavier). With OffReco, you can run either one fully on-device. This article lays out how the two differ and compares them on accuracy, speed, and memory, with sources.

What’s actually different

In short, it’s the difference between a general-purpose multilingual model and a lightweight, Japanese-specialized one.

Whisper large-v3: the top model in OpenAI’s Whisper series. A general-purpose model that covers many languages with a single network — highly accurate, but with a large parameter count, so it’s heavier to run.
Kotoba Whisper v2.0: a Japanese-specialized model distilled from large-v3. Per the HuggingFace model card, the student model keeps large-v3’s full encoder, slims the decoder down to two layers, and is retrained on Japanese data.

Kotoba v2.0 is therefore positioned as a model that inherits large-v3’s Japanese ability while being lighter and faster.

Accuracy, speed, and memory compared

Every figure below comes only from the HuggingFace model card.

Aspect	Kotoba Whisper v2.0	Whisper large-v3
Type	Japanese-specialized (distilled from large-v3)	General-purpose multilingual
Japanese error rate	Reported on par with or better than large-v3 (e.g., CER 9.2 on CommonVoice 8 Japanese)	Highly accurate general-purpose model
Speed	Reported ~6.3x faster than large-v3	Larger and heavier to run
Inference library	faster-whisper weights also provided	Runs on common Whisper implementations

The key point: according to the model card, Kotoba v2.0 reaches a Japanese character error rate (CER) on par with or better than large-v3 (for example, CER 9.2 on CommonVoice 8 Japanese) while being about 6.3x faster. It also ships weights for the faster-whisper inference library. Because large-v3 is larger, it tends to be heavier to run and to need more memory. That qualitative trend is part of why a lighter distilled model exists in the first place.

Note that we deliberately don’t add any figures beyond these. Real-world accuracy and speed vary with your recording environment and hardware, so the surest test is to try both models on your own meetings.
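If you want to put a number on accuracy for your own recordings, CER can be computed by hand-correcting a short excerpt and comparing it with each model’s output. The following is a minimal sketch, not the evaluation code behind the model card: the function names and the whitespace-only normalization are assumptions made for illustration, and published scores may use a different normalization, so your numbers will not be directly comparable to the figures above.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference char missing from output
                curr[j - 1] + 1,     # insertion: extra char in output
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    # Assumption: only whitespace is removed before scoring.
    # Punctuation and number formatting are left untouched.
    return "".join(text.split())


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: a corrected excerpt versus a model's output.
reference = "今日は会議です"
hypothesis = "今日は会議でした"
print(f"CER: {cer(reference, hypothesis):.3f}")
```

The function returns a fraction; multiply by 100 to get a percentage-style figure. Scoring the same excerpt with both models gives a like-for-like comparison on your own audio.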

Which should you choose?

It comes down to your use case.

Mostly Japanese meetings: Kotoba Whisper v2.0 is the strong default. It’s accurate in Japanese yet light and fast, and tends to run at practical speeds on many Macs without a dedicated GPU.
You need multilingual support (English or other languages mixed in): Whisper large-v3 fits, covering a wide range of languages with one model.
Limited hardware / you just want it fast: the lightweight Kotoba v2.0 is the safe pick. With faster-whisper weights available, it’s easy to get usable speed.

If you’re unsure, a practical order is to start with Kotoba v2.0 for Japanese meeting notes and switch to large-v3 once multilingual needs come up.
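Whether a model runs at a practical speed on your machine can be checked by measuring its real-time factor: processing time divided by audio length, where a value below 1 means faster than real time. The sketch below is one way to do that under stated assumptions. The helper names are hypothetical, and the optional `faster_whisper_transcriber` helper assumes the third-party faster-whisper package is installed. It is not how OffReco measures or runs anything internally.

```python
import time
from typing import Callable


def real_time_factor(
    transcribe: Callable[[str], str],
    audio_path: str,
    audio_seconds: float,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = clock()
    transcribe(audio_path)
    elapsed = clock() - start
    return elapsed / audio_seconds


def speedup(rtf_baseline: float, rtf_candidate: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return rtf_baseline / rtf_candidate


def faster_whisper_transcriber(model_id: str, language: str = "ja") -> Callable[[str], str]:
    """Build a transcribe(path) function backed by faster-whisper.

    Assumption: faster-whisper is installed. The model is loaded here,
    so load time is excluded from the measurement.
    """
    from faster_whisper import WhisperModel  # imported lazily; third-party

    model = WhisperModel(model_id)

    def transcribe(path: str) -> str:
        segments, _info = model.transcribe(path, language=language)
        # segments is a generator; joining it is what actually runs decoding.
        return "".join(segment.text for segment in segments)

    return transcribe
```

To compare the two models, you could build one transcriber with the Kotoba faster-whisper weights (published as `kotoba-tech/kotoba-whisper-v2.0-faster`) and one with `large-v3`, call `real_time_factor` on the same recording for each, and pass the two results to `speedup`. The ratio you get depends on your hardware and audio, so it will not necessarily match the model card’s figure.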

How this works in OffReco

What sets OffReco apart is that it runs both models fully on-device, and lets you choose.

Both run locally: Apple Silicon Macs run inference on the GPU (mlx), and Intel Macs run it on the CPU (faster-whisper). You can switch models in the setup screen (see “How to choose a model”).
Audio and transcript stay on your Mac: recording, transcription, and speaker separation all happen on the machine, so neither the audio nor the transcript leaves it. Transcription works in airplane mode (only the first-run model download needs a connection).
Fully automatic, low barrier: it auto-detects meetings and transcribes when you stop recording. The first month is free, then ¥200/month or ¥2,000/year. It requires macOS 14.2 or later.
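The split between Apple Silicon and Intel described above can be pictured as a simple dispatch on CPU architecture. The sketch below is purely illustrative and is not OffReco’s actual code: the function names and returned labels are hypothetical, and it assumes `platform.machine()` reports `arm64` on Apple Silicon and `x86_64` on Intel Macs.

```python
import platform


def pick_backend(machine: str) -> str:
    """Map a CPU architecture string to an inference backend label."""
    arch = machine.lower()
    if arch in ("arm64", "aarch64"):
        return "mlx"             # Apple Silicon: GPU inference
    if arch in ("x86_64", "amd64"):
        return "faster-whisper"  # Intel: CPU inference
    raise ValueError(f"unsupported architecture: {machine}")


def detect_backend() -> str:
    """Pick a backend for the machine this script is running on."""
    return pick_backend(platform.machine())
```

Keeping the architecture check in a pure function makes the mapping easy to test without owning both kinds of Mac.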

Choose Kotoba v2.0 when your work is Japanese-centric and you want something light and fast; choose large-v3 when you need multilingual support. Either way, nothing is uploaded to the cloud, so download OffReco and compare the two on your own meetings. Related reading: “Choosing a transcription app that’s strong in Japanese (what Kotoba Whisper is)” and “Running Whisper locally with no setup.”

Kotoba Whisper v2.0 vs Whisper large-v3: Japanese accuracy and speed compared
