With all the excitement around virtual assistants such as Amazon Echo’s Alexa, Apple’s Siri, Microsoft’s Cortana, and Google Assistant, it’s easy to forget that the interfaces between humans and machines have come a very long way in helping us collaborate and enabling us to be more productive. We are at the start of a new interface between humans and AI.
The Advent of the GUI
Before voice assistants and browsers, though, users interacted with machines via command lines and scripts. Scripting is a direct way to interact with a machine, but it has a steep learning curve and leaves little room for error.
Since adoption was limited to the technically adventurous, developers and researchers created a new solution for users: the Graphical User Interface, or GUI. GUIs had a long road to their current, essential role.
The first GUIs appeared in the 1960s with the engineering-oriented Sketchpad. Xerox, a company better known for copiers, created one of the first modern GUIs in the 1970s. Things really took off with the Apple Lisa and the graphical shells later layered on top of DOS, such as Microsoft Windows. Today, the majority of users around the world interact with their machines, whether by touch screen or by mouse, through GUIs.
Budding Human-Computer Interaction
The best designs are often the simplest and most seamless interfaces. What may seem obvious to many users is often wickedly difficult to define and implement from a design perspective. This led to some significant failures in the computer-human interaction design process.
Microsoft Bob was one attempt to re-create the real world on a machine so that the interaction between humans and computers would feel as seamless as possible. It didn’t work. Microsoft tried again with Clippy, the interactive paperclip that attempted to help users get more out of Microsoft Office. It also proved to be a failure.
Natural Language Interface and the Creation of Siri
We then saw a void of commercially available virtual assistants for about a decade. During this time, the technology underpinning speech-to-text and voice recognition advanced rapidly.
The Natural Language Interface (or NLI), the next step beyond the GUI, quickly expanded in academic and scientific research. SRI built one of the first commercially viable virtual assistants, which Apple then acquired and introduced to the world as Siri on the iPhone 4S in 2011.
Siri was certainly an advance in the human-computer interface, but again it was not without faults and limitations. The market has continued to evolve and expand rapidly to include not just Apple’s Siri, but also Amazon’s Alexa on the Echo, Microsoft’s Cortana, and Google Assistant.
Natural Language Interface solutions are usually connected to the cloud, because that is where computing at scale is best accomplished. The device captures a request, the cloud service interprets it and acts on it, and the result is returned to the end user. This architecture has led to appliances, lights, and TVs being incorporated into NLI solutions, and open APIs leave room for further growth along this path in the home and office.
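To make that round trip concrete, here is a minimal, purely illustrative sketch in Python of how a cloud service might map an already-transcribed voice request to a device action through an open API and hand a spoken reply back to the assistant. The names here (VoiceRequest, SmartLightAPI, handle_intent) are invented for illustration and do not correspond to any vendor’s actual SDK.

```python
# Illustrative sketch of the cloud round trip described above (hypothetical names).
from dataclasses import dataclass


@dataclass
class VoiceRequest:
    intent: str   # e.g. "turn_on_lights", produced by speech recognition upstream
    slots: dict   # extracted parameters, e.g. {"room": "kitchen"}


class SmartLightAPI:
    """Stand-in for an open home-automation API exposed by a device maker."""

    def turn_on(self, room: str) -> bool:
        print(f"[device] lights on in {room}")
        return True


def handle_intent(request: VoiceRequest, lights: SmartLightAPI) -> str:
    """Interpret the request, act on it, and return the text to be spoken back."""
    if request.intent == "turn_on_lights":
        room = request.slots.get("room", "living room")
        ok = lights.turn_on(room)
        return f"Okay, the {room} lights are on." if ok else "Sorry, I couldn't reach the lights."
    return "Sorry, I didn't understand that."


if __name__ == "__main__":
    # Example: the assistant heard "turn on the kitchen lights".
    print(handle_intent(VoiceRequest("turn_on_lights", {"room": "kitchen"}), SmartLightAPI()))
```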
What originally started as simple tasks, such as asking about the weather, getting flight information, or setting reminders, has expanded rapidly into many other areas. Humans, after all, enjoy conversation as part of natural language, so one-off commands are increasingly not the only thing machines can complete during an interaction. Functionality has grown to include:
Text-to-speech while driving
Understanding a person’s unique voice in a loud or crowded place
Buying things hands-free
Anticipating the repeated actions of users
Jokes, and more
All of this functionality comes together to help us collaborate better on the go, optimize energy use in the home and office, assist people with disabilities, and establish identity more securely via another element that makes us unique: our voice.
Automation and the Question of Privacy
End users naturally have concerns about privacy and security, and about what it means when devices actively and constantly listen to us, know our locations, and begin to understand the context around these data points. This has led to pushback from end users, privacy advocates, police forces, and governments seeking to clarify what is done with the uploaded data, how it is stored, and what legal rights exist over it.
While these are excellent questions, the themes echo what has happened elsewhere in software and technology. A solid parallel for this kind of automation is the Automated Teller Machine, or ATM. Customers used to have to wait in line for even the most basic account transactions.
In the ’70s and ’80s, ATMs began handling these simpler transactions so customers no longer had to wait for a human teller. Many people worried that ATMs would replace entire groups of people in the banking industry, and they had privacy concerns about the data entered into an ATM. In the end, these concerns proved largely unfounded. Instead, ATMs freed human bank tellers to provide better and more complex services when customers needed them.
Eventually, AI in the guise of NLIs and smart speakers will take on many of the small to moderately complex tasks we currently do manually. As long as these solutions let users safely enhance their lives and work, we will see NLI integration happen more and more regularly.
Formerly a Solutions Engineer at AvePoint, Bryan worked with organizations of all sizes to implement effective, business-focused governance, GDPR and regulatory compliance, proactive training, and solution deployments, balancing customization and available technology to meet business needs.