My faucet was leaking. I messaged a repair guy on WhatsApp. He sent back a voice message: brand name, model number, discontinued parts, dimensions, labor cost. Two minutes of dense, technical Portuguese.

There was no way I could reliably capture all of that just by listening. So I turned on WhatsApp's built-in transcription.

It gave me this: "…infelizmente as torneiras da seleção importadas… não é _____ _____ dificulta… __ __ _____ _____…"

Not exactly useful.

The real problem

The bad transcript was frustrating. But the bigger issue was the design assumption behind it.

To use WhatsApp's transcription feature, I had to choose one transcript language upfront. Not per conversation. Not per message. One setting, applied everywhere.

I set it to Portuguese, because most of the voice messages I receive in Brazil are in Portuguese. But that single, global choice is exactly the problem.

My life doesn't run on one language. I think in Japanese. I work in English. I live in Portuguese. I get by in French and Spanish. The product assumed I lived in one.

This isn't a niche problem

Millions of people live in a second language. Millions more work across languages every day. And supporting them is not technically impossible.

ChatGPT can pick up English or German without asking me to change a setting. Granola handles mid-meeting language switches. Whisper transcribed the same WhatsApp audio accurately, with no language setting at all.
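That last point is easy to check yourself. Here is a minimal sketch using the open-source whisper Python package; the model size and file name are placeholders, and you need ffmpeg installed so it can decode a WhatsApp voice note (typically an .opus/.ogg file).

```python
# Minimal sketch: transcription with no language setting at all.
# Model size and file name are placeholders, not the exact ones I used.
import whisper

# Any non-".en" model (tiny, base, small, medium, large) is multilingual.
model = whisper.load_model("small")

# No `language` argument: Whisper detects the spoken language from the
# audio itself and transcribes in that language.
result = model.transcribe("voice_message.ogg")

print(result["language"])  # detected language code, e.g. "pt"
print(result["text"])      # the transcript
```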

The technology exists. The design assumption does not match reality.

Where localization actually breaks

Multilingual support is not just about translation. It's about voice input, transcription, search, settings, and help docs — everywhere language meets real life.

If the AI speaks many languages but the microphone assumes one, the UX breaks right there.