The Economic Times | How voice is becoming the fastest way to go online

The following article, an industry feature written by Shelley Singh, Technology Editor, appeared in The Economic Times. It is an in-depth look at the need for, and growth of, voice user interfaces (VUI) across sectors. Mr. Ramesh Subramanian was featured on behalf of Infogain.

Did you know Sundar Pichai still has panipuri cravings? Students of the Brihanmumbai municipal school in Andheri’s DN Nagar are now privy to that savoury secret after the Google boss’ visit here last week.

And when a student asked Pichai what it takes to be an engineer, the boy-next-door-turned-Silicon Valley pin-up said, “Do you have a radio and TV at home? When it gets old, just learn to break that apart.” It was the perfect photo-op. But Pichai also used the visit to see how Bolo, a reading app powered by Google AI for text-to-speech and speech recognition, works on the ground. He even tweeted, “Had the chance to visit some students today who are learning to read using Bolo, excited for all the great books they’ll discover.”

Far removed from that suburban Mumbai school, Sid Chatterjee of Austin, Texas, also relies on voice when he is on the road and has to deal with email. He dictates to his Samsung S8, which has a voice mode in English and other languages, including Bengali. “It’s my default option and has 98% accuracy,” says Chatterjee, chief technology officer of Pune-headquartered IT services company Persistent Systems. “Talking to a device via voice interface is a very liberating feeling compared to figuring out a vernacular keyboard.”

No doubt, voice is a welcome step in the continuing evolution of human-machine interaction, even for hardcore techies like Prasad Joshi, vice-president, emerging technologies, Infosys. His mother tongue is Marathi, but he has never found it easy to use a Marathi keyboard. “Now, I can talk to my device in Marathi and send voice messages or write mails,” says Joshi, who is also based in the US.

Apart from using voice in their personal lives, both of these geeks have seen customer demand for voice-activated products rise significantly in recent months.

Voice is the most natural, intuitive means of interaction. It is basic communication. Yet it has been almost the last interface to reach our devices. Using computers and digital devices always meant familiarity with QWERTY keyboards, basic commands, touchscreens, web interfaces and so on.

Besides, the accuracy of earlier generations of voice technology left a lot to be desired, and users had no option but to use keyboards, virtual or physical, which were always a challenge for older users and others with low literacy levels. As Umesh Sachdev, co-founder of Uniphore, a natural language processing startup, says, “Voice is quicker compared to typing text: 150 words spoken compared to 40 words typed per minute.” For those who frequently seek help from strangers at post offices and ATMs to fill in forms or withdraw money, voice is becoming a game changer.

Google ran extensive pilots before launching Bolo on March 6. The app, which reads out books to children to improve their Hindi and English skills, was tested in 200 villages in Uttar Pradesh’s Unnao district before launch. During the trial, 64% of the participating children showed improvement in reading proficiency in just three months, with Diya, the in-app interactive reading buddy, telling stories and even correcting their pronunciation.

By 2021, Google expects India to have 735 million internet users. Of these, 563 million will access the internet in local languages. For many of them, voice will be an easy way to go online.

In terms of projects, “voice counts higher than blockchain or cybersecurity,” says Joshi. Customers are seeking voice activation within apps. Over the last 12-15 months, voice interfaces have got better at understanding users and at handling multiple languages and dialects, and they have become more accurate. According to Gartner, by 2020, almost one-third of interactions will be through conversations with smart machines.

For the 500 million people already online, voice will be an add-on and the most natural way to interact, while for those not yet online because of low literacy or difficulty using keyboards, voice will help them leapfrog into that world. Subho Ray, president, Internet and Mobile Association of India (IAMAI), says, “Voice as input will make a big difference for users who are not keyboard savvy.”

Reliance Jio Infocomm is already seeing a preference for voice interfaces among first-time data users. Beyond playing music, helping with search, dimming lights or operating locks, voice is reaching the corner office: Infosys’ Joshi sees even CXOs using voice commands to fetch market reports or get real-time updates from their sales teams.

Banks, too, are experimenting with voice biometrics. Instead of users remembering multiple passwords, their voice could help complete transactions or at least authenticate users when they reach out to customer service. For example, Yes Bank has clocked 5.7 million customer interactions via its voice bot. Ritesh Pai, group president and chief digital officer, Yes Bank, says, “Besides serving as a means to input information, voice doubles up as a strong biometric authentication factor.” For Standard Chartered Bank users, their voice is their password. “It’s a lifesaver,” says Subhasree Basu, a Mumbai-based entrepreneur. “My husband is a digital Luddite and honestly, these days, you need a personal assistant just to remember all your passwords.”

SoftBank-backed PolicyBazaar is working on models where people can say, “Give me car insurance options within this premium,” and get options.

Amazon’s Alexa can order products if you have enough Amazon Pay balance. In January, Alexa started voice bookings from PVR, KFC and Hungama Music. “By 2022, 80% of our interaction with audio visual devices will be non-touch,” predicts Sumit Chauhan, vice-president, lifestyle audio at Harman India, a manufacturer of speakers.

Bengaluru-based Uniphore believes retail consumer banking can be driven via voice-enabled apps for tasks including balance checking, funds transfers and bill payments. IndusInd Bank is on Alexa, which helps the bank’s customers complete simple tasks via voice commands. Damon Xi, general manager, UCWeb India & Indonesia, Alibaba Digital Media & Entertainment Group, says, “Voice input makes it convenient for users to get information on the search engine. UC has tied with Google to realise voice search on its platform. It’s a big benefit for users, especially for diversified local language searches.”

Persistent’s Chatterjee cites the case of a colleague’s husband who is “writing” a book in Bengali using voice. Barbara Cartland, who dictated almost 100 of her 700-plus romance novels to an army of typists, would turn in her grave.

“Voice opens up a new market,” says PN Sudarshan, partner, technology, Deloitte India, a consultancy. For instance, farmers can get prices in local markets by asking their phones and artisans — who might find it challenging to use keyboards — can easily ask devices and learn about exhibitions and markets they can go to.

Ramesh Subramanian, chief technology officer, Infogain, a mid-tier tech services company, points out that voice could be a boon for, say, a mechanic repairing an automobile. “A voice-based application used on the mobile will make it easy for him to source parts, talk to the car owner and even update insurance without wasting time in typing out information.” Similarly, surgeons, radiologists, nurses and managers, among a host of others, could benefit from voice interfaces.

Interacting with devices has never been as easy or accurate as it is now. Much of this is down to the improved ability of machines to understand the human voice. Vladimir Vuskovic, a Mountain View, California-based product manager at Google, points out that in 2013, machines misrecognised about five of every 20 words. Now, they correctly understand 19. The intent of the user might still be difficult to comprehend, but the technology is improving through machine learning and natural language processing. Basically, machines are getting better at processing data. Joshi describes it as a confluence of technologies: AI, machine learning and advanced networks. Voice interfaces will improve further with 5G networks, which promise speeds more than 20 times those of 4G and far lower latency (from 50 milliseconds on 4G to 1 millisecond on 5G).

AI-powered voice assistants (such as Google Assistant, Alexa and Siri) interpret natural language to complete an electronic task. The core components of a voice device are the automatic speech recognition (ASR) engine and the natural language processing (NLP) engine. ASR converts speech signals into text, which is then fed to the NLP engine. This, in turn, uses natural language understanding to build a meaningful representation of the spoken words. The application’s response is then converted back to speech using a text-to-speech (TTS) converter.
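
The ASR, NLP and TTS flow described above can be sketched as a chain of stages. The Python below is a minimal, hypothetical illustration, not how Google, Amazon or Apple actually build their assistants: the stage names (`asr`, `nlu`, `fulfil`, `tts`) and the two intents are assumptions, and the ASR and TTS stages are placeholders that simply pass UTF-8 text through, since real engines work on audio with trained models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Intent:
    """Structured meaning extracted from an utterance (hypothetical shape)."""
    name: str
    slots: Dict[str, str] = field(default_factory=dict)


def asr(audio: bytes) -> str:
    # Placeholder ASR: a real engine maps acoustic signals to words.
    # Here the "audio" is assumed to already carry UTF-8 text, for illustration only.
    return audio.decode("utf-8").strip().lower()


def nlu(text: str) -> Intent:
    # Toy keyword rules standing in for a trained natural language understanding model.
    if "balance" in text:
        return Intent("check_balance")
    if text.startswith("play "):
        return Intent("play_music", {"item": text[len("play "):]})
    return Intent("unknown")


def fulfil(intent: Intent) -> str:
    # The application acts on the intent and produces a text reply (hypothetical handlers).
    handlers: Dict[str, Callable[[Intent], str]] = {
        "check_balance": lambda i: "Here is your account balance.",
        "play_music": lambda i: f"Playing {i.slots['item']}.",
    }
    handler = handlers.get(intent.name)
    return handler(intent) if handler else "Sorry, I did not understand that."


def tts(text: str) -> bytes:
    # Placeholder TTS: a real converter synthesises an audio waveform from the text.
    return text.encode("utf-8")


def handle_utterance(audio: bytes) -> bytes:
    # Speech -> text -> intent -> reply text -> speech.
    return tts(fulfil(nlu(asr(audio))))
```

For example, `handle_utterance(b"Play the news")` returns `b"Playing the news."`. Keeping each stage a separate function mirrors the point that these are distinct engines: a better recogniser or a new language model could be swapped in without touching the rest of the chain.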

“Algorithms and engines that power speech recognition have improved significantly,” says Mahesh Makhija, leader, emerging technologies, EY India. “Machines can work with large data sets and the number of voice queries is of the order of hundreds of millions per month.” Makhija is referring to queries on assistants such as Siri, voice bots and devices like Alexa, Google Home and others.

Alongside, noise reduction and cancellation techniques have improved the clarity of communication with machines. “Till now, humans were changing their behaviour to adjust to computers (by learning to type, etc). Now, it’s the other way round,” adds Dilip RS, country manager, India, Alexa Skills Kit (ASK), Amazon.

Amazon claims more than 40,000 developers in India are building Alexa skills (its term for voice apps). Voice assistants are priced at Rs 2,500-18,000, but Sudarshan of Deloitte expects prices to drop by at least 50% within a year as volumes increase and more brands, including local manufacturers, offer low-cost assistants.

Voice computing will do what Indic language keyboards failed to do. Those keyboards required a degree of literacy, and even people familiar with Indic languages never found them comfortable. Voice commands are much faster and easier to give. Daan van Esch, technical program manager, Google, explains, “Indian languages are hard to input in a phone as most have a complicated script. The most natural way for anyone to interact with a device is to talk to it (like with a human). That’s why voice searches are increasing.”

Voice also opens up conversational commerce, though initially only for routine kitchen consumables or standard household products such as razor blades, rather than apparel, where buyers would want to see a wider selection. “The conversational aspect of voice still needs to be developed further to support more complex questions or ongoing/flowing interactions,” says Annette Jump, senior director, Gartner.

The inability of voice assistants to follow a line of thought is a limitation, though one that will be overcome with time. Jump points out three areas of development that will improve the user experience: fast internet connectivity to process questions, support for languages and specific dialects, and the availability of relatively inexpensive virtual personal assistant (VPA)-enabled speakers. “One other hindrance to overcome is individual ‘shyness’ to talk loudly to a device and learning that some information a virtual assistant delivers might not be 100% correct,” adds Jump.

Indians often mix languages in conversation, which can confuse voice assistants. Google, Amazon, Microsoft and others are working on overcoming these challenges. Also, many users who are already online tend to use voice assistants less over time. Van Esch says, “People familiar with keyboards won’t switch entirely to voice, though while interacting on mobile devices, people will move to voice inputs in private situations.”

Anku Jain, managing director, MediaTek India (a maker of chips), believes voice ushers in an era of “calm technology”. “Gadgets will become less technical and yet more intelligent. User-friendly voice assistants will become more interactive, just like a friend/assistant. This is not only good for people on the move but also for those with accessibility or readability problems.”

Of course, as with any disruptive technology, there will be concerns, notably security (the fear that someone could record a user’s voice and use it to transfer money out of their bank account). But for the internet have-nots, the couch potatoes and the people who love chatting, it’s a resounding “yes” for voice input.