Siri, Alexa, Assistant, Cortana: your voice, their control, who wins?
In March 2017, the entrepreneur Elon Musk announced investment in Neuralink, a company exploring brain implants that will allow humans to communicate more directly with artificial intelligence. This needs to happen, he says, in order for the two to co-exist successfully. This is Musk at his most typically ambitious: any such technology is decades away.
Some of the tech industry’s other great minds and biggest cash reserves are being deployed on an existing problem of mankind-machine communication: voice recognition. On smartphones, in cars and in homes, the technology is widely available to consumers, but only a third of smartphone users regularly instruct their devices via speech.
However, usage is increasing at a great rate, not only for engaging the help of digital assistants, but also in the field of security. Using your voice instead of other biometric measures or passwords might one day become the safest form of online identification.
The rewards of success in the voice tech sector could be immense. According to Grand View Research, a US consultancy, the global voice recognition market will be worth $127.58bn (£101bn) by 2024. The US firm Nuance Communications, one of the world’s leading voice tech companies, announced Q1 2017 revenue of $487.7m (£388.9m).
Tech giants such as Google, Amazon, Apple and Microsoft have introduced increasingly sophisticated software and home assistant products that use AI and speech recognition technology to decode voices, cut out the keyboard and, they hope, create a seamless understanding between mankind and machine.
Getting voice recognition right is one of AI’s thorniest problems. It is a deep learning challenge that entails building software to listen to, process and decode language, itself one of the human brain’s most sophisticated functions. Beyond the ability to recognise a word that could, thanks to regional accents and dialects, be pronounced dozens of different ways, there are also issues of background noise, of context and of multiple voices in a conversation. Then factor in the number of languages. The BBC World Service broadcasts in 40 languages; of approximately 6,500 living languages, about 100 are spoken by 7m people or more.
In October 2016, Microsoft announced that it had a speech-recognition system that could understand recorded phone conversations on random subjects with the same error rate (5.9 per cent) as a human transcribing them. It was only able to do so using six neural networks, computer systems that mimic the brain. So six computer brains are currently on a par with one human’s: there’s some way to go before a one-on-one match-up.
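An error rate of this kind is conventionally measured as word error rate: the number of word substitutions, insertions and deletions needed to turn the machine's transcript into a reference transcript, divided by the number of words in the reference. The Python below is a minimal sketch of that calculation, not Microsoft's evaluation code; the function name `word_error_rate` and the simple lower-casing and whitespace splitting are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative sketch only: real evaluations normalise punctuation,
    numerals and hesitations before comparing transcripts.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    spoken = "Alexa ordered me a dollhouse"
    heard = "Alexa order me a dollhouse"
    print(f"{word_error_rate(spoken, heard):.1%}")
```

On that measure, a single misheard word in a five-word sentence is a 20 per cent error rate, which gives a sense of how few mistakes a 5.9 per cent system can afford.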
In the home, the battle for voice recognition supremacy is currently billed as one between Amazon Echo (using Alexa) and Google Home (with Assistant), with the latter launching in the UK on April 6, 2017. Apple is rumoured to have a Siri-for-home product launching in 2018. Microsoft is yet to announce a competing device.
Both devices allow users to speak to control appliances, lighting, heating and web-connected TVs; to play music; to maintain a basic appointments calendar; and to listen to news. Only with Echo can you shop (through Amazon, of course, although Home will offer shopping in the future) and get UK train times. Home has the advantage in answering questions, thanks to its links with the Google search engine.
In January 2017, a TV news presenter in San Diego said, live on air after a report of a six-year-old girl ordering toys and cookies via her parents’ Echo, “I love that little girl saying ‘Alexa ordered me a dollhouse’”. Echo devices listening to that TV broadcast automatically ordered dollhouses.
This highlights two leading concerns with voice-command devices. Firstly, the aforementioned teething problems with the tech: the word “ordered” was misinterpreted as “order”. Secondly, and more profoundly for many potential users, there is the issue of privacy: when and to what exactly is your voice-control tech listening, and what becomes of the data that creates? All devices have settings allowing the level of interaction to be set, but voice tech is a key element of the debate surrounding online privacy and the use of big data.
There can be no doubt that voice-recognition tech will become increasingly popular in the home and also on the roads, with the rise of driverless cars powered by the dialogue between the vehicle and its designated controller. Early adopters will chuckle at the hiccups and glitches, but perfection may not be far away. Microsoft’s six-brain system was expected by its creators to achieve a near-human level of understanding after two or three years of operation. In the end, it took less than one.