Voice messages

Posted on Sun 05 December 2021 in blog • 4 min read

Lately, I’ve noticed that when people share one of my articles on asynchronous communications on Twitter (particularly any from the Getting out of Meeting Hell series, or the one on meetings that should have been an email), a brand account replies to plug its service. That service recommends that synchronous meetings be replaced by “asynchronous meetings” based on voice messages.

I’d like to point out that I consider that an utterly terrible idea.

Let me explain why.

Voice is slow

First, voice messages suffer from the exact same drawback that meetings do: they are incredibly slow. Most of us speak at a rate of approximately 4 syllables per second.¹ In English, that translates to about 120-140 words per minute. That means that as a listener, you absorb the content of a voice message at the same rate. You might make that a little more efficient by increasing playback speed, but that’s only feasible up to about a 25% speed increase, which lands you at roughly 150-175 words per minute.

In contrast, unless you are dyslexic (I’ll get to that in a bit) you can read at roughly 240 words per minute.

In other words, conveying a certain amount of information by voice takes nearly twice as long as doing the same in writing. And that’s if your verbal expression is perfect, which it never is — any voice message will come with its fair share of filler words (“uh”, “um”, “y’know”) and incomplete sentences.
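To put rough numbers on that comparison, here is a minimal sketch that plugs the rates quoted above into a simple words-divided-by-rate calculation. The 600-word message length and the function name are assumptions chosen purely for illustration, and the model ignores filler words, pauses, and re-listening, all of which would make voice look worse.

```python
# Rates quoted in the text; the message length is a hypothetical example.
SPEECH_WPM_RANGE = (120, 140)   # typical spoken English, words per minute
PLAYBACK_SPEEDUP = 1.25         # the 25% playback speed increase
READING_WPM = 240               # silent reading rate, words per minute
MESSAGE_WORDS = 600             # assumed length of the message


def minutes_to_consume(words: int, words_per_minute: float) -> float:
    """Return the minutes needed to take in `words` at the given rate."""
    return words / words_per_minute


read_minutes = minutes_to_consume(MESSAGE_WORDS, READING_WPM)

for speech_wpm in SPEECH_WPM_RANGE:
    listen_minutes = minutes_to_consume(MESSAGE_WORDS, speech_wpm)
    sped_up_minutes = minutes_to_consume(
        MESSAGE_WORDS, speech_wpm * PLAYBACK_SPEEDUP
    )
    print(
        f"spoken at {speech_wpm} wpm: "
        f"listening {listen_minutes:.1f} min, "
        f"at 1.25x {sped_up_minutes:.1f} min, "
        f"reading {read_minutes:.1f} min"
    )
```

Under these assumptions, a message that takes 2.5 minutes to read takes between about 4.3 and 5 minutes to listen to at normal speed, and still between about 3.4 and 4 minutes with faster playback.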

Add to that the occasional slurred word or phrase, or idioms unfamiliar to the recipient of the message. If you come across something that’s unclear while reading, it takes you fractions of a second to re-read a sentence, and maybe a few seconds to re-read from the beginning of a paragraph. But in the listening case, it may take you upward of a minute to go back and re-listen to a passage you missed or didn’t understand. (Anyone who both reads books and listens to audiobooks will relate to this.)

Voice is more difficult to follow and retrieve

Secondly, voice messages are usually much more difficult to understand for a recipient listening in their second or third language, particularly if the other person is a native speaker using an accent that is unfamiliar to them — say, a French person listening to heavily Scots-accented English or a pronounced Australian twang. Written messages might still have their ambiguities — as an example, the word “doubt” meaning “question”, a common substitution in Indian English, frequently confuses non-Indian English speakers — but those are far fewer and easier to resolve for a reader.

Furthermore, until speech recognition is perfect and automatic transcription is thus exquisitely faithful, your voice messages aren’t searchable. You could say that they’re practically half-off-the-record. Good luck trying to come back to an important bit of information that someone conveyed via a voice message that you have only a vague recollection of. Or, worse, trying to establish the context in which a decision was made, and having to piece it together from multiple voice messages.

Voice doesn’t convey as much nuance as you think

Thirdly, the notion that you ought to be using voice messages to add “nuance” when you can’t convey such nuance in your writing strikes me as patently ludicrous.

When you need to convey emotion or feeling or nuance to a greater extent than you would be able to in writing, that is absolutely a situation in which you should meet with a person face-to-face, one-on-one, and where that isn’t possible, get on a video call. At that point, when a written message won’t cut it, a voice message absolutely won’t.

One good use?

Now, there may be one good use of voice messages that I can think of: they may work for people with dyslexia, for whom consuming a lot of writing may cause cognitive overload. In that case, voice messages might be a workable alternative. If so, then that would make the option to communicate via voice message a very valid accessibility consideration. That said, I’ve talked to people who are dyslexic and who said that voice messages are not an adequate substitute for interactive verbal communication for them. That’s of course highly anecdotal, though, and shouldn’t dismiss the idea outright.

However, even if voice messages are a good thing for people with dyslexia, I think that as screen readers continuously improve, generating speech from text may be a preferable option. That’s because it retains the searchability advantage for everyone, and the efficiency advantage for non-dyslexics, while also accommodating people with dyslexia. But I want to re-emphasize that that’s just a hunch, and I may well be completely wrong. If you’re dyslexic and have thoughts on this, I’d love to hear from you! Please find me on Twitter or Mastodon.

¹ Fun useless fact: the rate of 4 syllables per second is practically universal across spoken languages. How many words a native speaker of a particular language speaks in a minute depends largely on the average number of syllables per word in that language.