A standalone email interview conducted by The Economic Times with Ramesh Subramanian on VUI.
Voice, the most natural way to communicate, will change the way we interact with devices. What keyboards and touch screens could not do to make technology inclusive, voice will. Users won’t need to figure out complicated scripts; they will fetch information or complete transactions simply by speaking. In an interview, Ramesh Subramanian, chief technology officer (CTO), Infogain Corporation, a tech services firm, says it’s still early days for voice as an input, yet it holds immense potential to impact our daily lives. Edited excerpts:
There are two broad reasons driving the use of voice. The first is ease of use: we are used to talking to communicate. So, if one wants a very convenient way to use any system, voice-based systems could be the answer. People will get very comfortable with it. The second is that there are situations where conventional means of communication would waste time.
For example, a mechanic repairing an automobile may be busy with a machine and need information on a certain part. Rather than going through conventional means, they could be better served by a voice interface. Without leaving their work station they can get information, use that information, and interact with other entities to compare details or file an insurance claim.
Conventionally, the mechanic would have to use a laptop to find and interpret data. Instead, a voice-based application on a mobile phone, an Alexa-like interface or even an Echo would make it easy for them to get the data, saving time.
Another example is where you are temporarily unable to use your hands to access the information you need, like a radiologist or a surgeon in the middle of planning a procedure, for whom typing out a report would waste time. In such cases, devices that can take voice as an input, interpret it and come back with the required information would fit very well.
Absolutely. VUI is not codified yet and lacks established frameworks, but it is clear that interfacing with a VUI is different from interfacing with a GUI or the other formats we are used to in current systems.
Voice has different requirements. Information may be requested non-sequentially, and the user may not follow the guidance script provided; there could be diversions. The inquiry could change, or rather evolve, while it is being made. It may lead to different add-on queries.
But information usually has some protocol, compliance or security requirement that must be met before it is divulged. One common use case we have experienced is the use of voice in call centres. It has now become possible to identify a caller based on speech patterns and phraseology, bypassing the need to validate the caller by asking security questions (like date of birth or pet’s name).
The system should also be able to tell whether I am in a public place and restrict the disclosure of sensitive information accordingly. These are all emerging standards, still evolving, which take what is observed in the traditional world and apply it to the voice world. Compliance conditions are being added and removed as well. This is an exploratory phase for VUI standards.
Yes. We believe this to be a tremendous growth area. In fact, according to a Gartner report, by 2020, 30% of interactions with technology will be through ‘conversations’ with smart machines.
Any system that we develop has to support internationalisation and has to implement disability access standards.
We have developed some interesting use cases (such as those mentioned in the earlier answer) for retail and business use. There has been growth in the use of VUI for call centre optimisation (already voice based), aimed at making call centres less impersonal and reducing the bounce rate. Voice can give this level of customer intimacy. However, voice is only the front end of the system; a responsive and intelligent system uses artificial intelligence and machine learning in the back end, enabling easier interactions with context awareness.
One of the key reasons for using voice is to enable easier interaction with a system that behaves less mechanically and understands context, resolution and the tenor of the spoken voice. This will give rise to voice analytics, using natural language processing (NLP), which will help convert big data into coherent information.