A recent study shows that a third of the entire US population already uses virtual assistants (VAs) in some form (source). Another study, from 2017, found that 98% of respondents believed they would be using voice-controlled assistants five or more years into the future (source). Naturally, we have become more familiar and comfortable with these voice-controlled services over time: Google Assistant, Alexa, Siri, Cortana, Viv, Watson, Bixby, Mycroft. The list is long and still growing.
However, many associate VAs with trivial tasks, such as setting a timer for the pizza in the oven, playing your favorite playlist or answering questions you would otherwise search for online, or simply with their inability to understand your question.
Further, the way these applications are designed has made them the subject of an ongoing debate about personal data privacy. Today, VAs work by recording the user’s command and sending the voice recording to a cloud service that identifies the user’s intent.
The way users express themselves can vary greatly in language, accent, context and tone. Interpreting user intent across such wide variation requires more computational power than is typically available in phones and smart speakers, hence the need for cloud processing.
The users’ voice recordings (i.e., the data) are often stored by default in the vendor’s own cloud storage, giving the vendor access to huge amounts of personal data that its users might want to keep private. Numerous privacy breaches have done little to reassure the public. For instance, there have been reported incidents where VAs accidentally recorded private conversations and then mistakenly forwarded the recordings to people in the user’s contact list (source). User data has also been sent to other users (source) or been listened to by human employees of the cloud provider (source).
Most innovations face challenges at some point, but fortunately many of them manage to overcome these obstacles. VAs’ ability to both understand and speak languages has improved greatly over the years as requirements and expectations have increased.
Recent advancements in deep learning have had an impact on the field of virtual assistants. For instance, the WaveNet architecture developed by Google has made synthesized voices sound considerably more natural (source). This architecture was an important contribution to the creation of Google Duplex, which can phone people and perform narrow tasks such as booking an appointment at the hairdresser. It can do so without the person at the other end noticing that he or she is talking to a machine. Even though this might seem like a small task, it truly is a tremendous achievement!
Human-to-human communication can be very complex. We often correct or rephrase ourselves mid-sentence, frequently use filler words like “uhm”, and the pace and tone of our speech can vary greatly. Add local variations in how we express ourselves, such as dialects and sociolects, and the complexity of ordinary conversation becomes clear.
Consequently, interpreting human intent from natural language is a challenging task for a machine. The usual workaround has been for humans to be very specific when communicating with machines rather than using natural language. Following advancements in language understanding and speech synthesis, human-to-machine communication can increasingly use natural language without confusing the machine.
Another result of the advancements in AI is the development of semantic search. Instead of just matching your search terms with similar hits online, semantic search can analyze the full context of your query, such as the emotions behind it, and combine it with known user data to enrich the result. Semantic search combined with natural language communication will give users a more personalized experience when interacting with voice assistants. With these rapid technological developments, our interactions with VAs have the potential to be as natural as a conversation with another person.
In the future, we can also expect to see VAs that are able to process user commands locally, on the device, rather than transmitting the data to a data center. This approach is called Edge AI, an application of edge computing. It is the type of technology that Tesla and other car manufacturers use to make their cars aware of their surroundings and to make decisions based on them. Self-driving cars need to make decisions quickly and cannot rely on cloud services that might lag. Therefore, the vehicles must have substantial data processing capabilities on board. In many ways, this is the heart of future autonomous vehicles.
By removing the need for online processing of users’ voice recordings, virtual assistants with on-device processing will shorten response times. Removing this step will also increase user privacy, since personal data no longer needs to be processed in the cloud. Consequently, a wide range of new opportunities could open up for VAs. For example, a virtual assistant could give elderly users information about their medications and tell them whether they have already taken them.
Further, VAs with on-device processing can be trained to detect other sounds as well, such as a person falling in their home. This way, the device can assist you in ways beyond responding to voice commands. For instance, it can call healthcare personnel if a person does not respond after a fall, or function as a house alarm if it detects a window breaking.
A feature that has already been developed but has yet to be fully utilized in virtual assistants is biometric authentication. Biometric authentication can be based on facial recognition, voice recognition, fingerprint recognition or other personal traits. With this feature built into a VA, users could choose whether the assistant should respond only to them or whether others should be granted access rights as well.
This means that nobody else can access personal data through the device, whether it is people trying to get hold of your credit card information or simply your children wanting to pay for games with your device. Edge computing will open up a whole new range of applications and opportunities for voice assistants. The combination of uniquely identifying users and processing on-device will make it possible for these devices to handle highly sensitive data, such as health-related data, while still maintaining user privacy. This way, a voice assistant could, for example, assist patients with self-care in their own homes.
Today we might view virtual assistants as simple and immature applications, but I am confident that they will become a crucial part of our future lives. The user experience will improve considerably, making interactions richer and more natural, and on-device processing will open up a whole new range of opportunities. All in all, virtual assistants will grow into more complex ecosystems that can support you in many areas of everyday life.
I am particularly excited about the opportunities VAs bring to people who feel less confident using computers. I have seen first-hand how exciting and empowering interacting with a virtual assistant can be for them. The inclusive design of voice-controlled devices holds great promise in areas where users are typically less tech-savvy. I therefore believe that areas such as home care and elderly care are likely to be transformed by this technology.
Currently, the dLab, TietoEVRY Industry Software’s own innovation and design unit, is running the Florence project. Florence is a prototype voice assistant for clients in elderly care, and the dLab is now testing how users perceive her and how the technology can be used securely.
Even has a passion for driving digital change and transformation. With a broad industry background, he develops business opportunities from early-stage concepts in line with Lean Startup principles. He strongly believes that we have only seen the first indications of what digital technology makes possible.