
  • Matthew Kaufman

"Hey Siri, How do you understand me?"


We take it for granted, but just how impressive are our digital assistants?

Whether it’s Apple’s Siri, Microsoft’s Cortana, Amazon’s Alexa, Google’s Assistant, or some other entry, almost all of us have come into contact with some sort of speech-understanding A.I. that helps us with the mundane things in life. But how exactly do these things work? How much do they really understand or know? Allow me to answer that for you.


When you call up your digital ally and give it a command, the first thing that happens is that a microphone records what you say. The device then sends that recording to a server in the cloud, which converts what you just said into written text. If the device can’t reach the server, it does its best to convert the recorded audio into text on its own. This step is very important to understand, because it’s where most of the problems end up arising.
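To make that flow a bit more concrete, here’s a minimal Python sketch of the “try the cloud first, fall back to the device” step. The names `transcribe` and `NetworkUnavailable`, along with the fake recognizers, are hypothetical stand-ins: real assistants don’t publish their internals, and an actual speech recognizer is vastly more involved than a function that returns a string.

```python
from typing import Callable


class NetworkUnavailable(Exception):
    """Hypothetical error a cloud client raises when the server can't be reached."""


def transcribe(
    audio: bytes,
    cloud_asr: Callable[[bytes], str],
    local_asr: Callable[[bytes], str],
) -> str:
    """Try the cloud speech recognizer first; fall back to the on-device one."""
    try:
        return cloud_asr(audio)
    except NetworkUnavailable:
        # No connection: do the best we can on the device itself.
        return local_asr(audio)


# Fake recognizers standing in for real speech-to-text services (hypothetical).
def fake_cloud(audio: bytes) -> str:
    return "play sympathy for the devil"


def offline_cloud(audio: bytes) -> str:
    raise NetworkUnavailable("no connection")


def fake_local(audio: bytes) -> str:
    return "play sympathy for the devil (on-device)"


print(transcribe(b"\x00\x01", fake_cloud, fake_local))     # cloud answers
print(transcribe(b"\x00\x01", offline_cloud, fake_local))  # falls back to the device
```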




If you’ve ever used an old version of Windows Media Player with the visualizations turned on, or have ever worked with audio, you’ll probably recognize why. When the microphone records you, it captures everything it hears as sound waves. Unlike the human ear, which naturally distinguishes the cat meowing from the humans talking, the computer has no idea which waves belong to people speaking. So it first has to do a lot of guesswork to figure out where the words actually are. Only then can it do its best to transform that sound wave into written text. But even with just a single voice to work with, that can still be very difficult. Setting aside multiple languages, people still talk funny from time to time, and our regional accents and dialects can make us hard for the computer to understand.
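For a rough feel of that guesswork, here’s a crude sketch of one of the simplest tricks: chop the recording into short frames and flag the ones loud enough that they might contain speech (a basic energy-threshold voice activity detector). The function names, frame size, and threshold are illustrative assumptions; real assistants rely on trained models rather than a single loudness cutoff.

```python
import math
from typing import List, Optional, Sequence, Tuple


def frame_energy(frame: Sequence[float]) -> float:
    """Root-mean-square (RMS) amplitude of one frame of audio samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_loud_regions(
    samples: Sequence[float], frame_size: int, threshold: float
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of runs of frames whose RMS exceeds threshold."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i in range(0, len(samples), frame_size):
        loud = frame_energy(samples[i : i + frame_size]) > threshold
        if loud and start is None:
            start = i  # a possibly-speech region begins
        elif not loud and start is not None:
            regions.append((start, i))  # the region ended at this frame
            start = None
    if start is not None:
        regions.append((start, len(samples)))  # still loud at the end of the clip
    return regions


# Synthetic clip at 8000 samples/second: near-silence, a 440 Hz tone, near-silence.
samples = (
    [0.01] * 400
    + [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
    + [0.01] * 400
)
print(find_loud_regions(samples, frame_size=100, threshold=0.1))  # → [(400, 1200)]
```

Notice that a loud enough meow would pass this test just as easily as a person talking, which is exactly why this first step involves so much guesswork.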


Assuming we’ve gotten past this point, the computer now has a written transcript of what we said. Even if the text has been transcribed perfectly, the computer still has a tough task in front of it. This is where a massive field of computer science research called Natural Language Understanding begins. The field focuses on taking the natural language of human beings and getting a computer to fully understand what we’re talking about. The difficult part is that there is so much the computer must try to understand. Think of it this way: we’re trying to take a glorified abacus and give it the brain of a human old enough to hold a real conversation. An incredibly daunting task indeed.


Now, let’s say that we have accomplished this, and the computer has slain the Goliath-level task of understanding what was spoken to it. This is a large supposition, but let us continue onward. From here, the computer faces a very human problem: what do I do with this knowledge now? The commands we give to computers aren’t always very straightforward. Say you’ve asked your A.I. buddy to “Play ‘Sympathy for the Devil’ by The Rolling Stones”. An excellent song choice, but still an incredibly vague request.

The computer must now make dozens of split-second decisions about how you want that played. Do you want to use Spotify, Amazon Music, YouTube, etc.? Do you want to hear the “Remastered” version, the “Original” version, the “Live” version, or something else? Which speakers should it play through? Even human beings have difficulties with this step when communicating with OTHER HUMANS. How many times have you asked someone to hand you something and they had to ask for further directions?
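Here’s a toy Python sketch of that decision-making: it pulls the song and artist out of a “play X by Y” command and fills in every detail you didn’t mention with a stored default. The `PlayRequest` class, the `DEFAULTS` preferences, and the regular expression are hypothetical illustrations, not how any particular assistant actually works.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlayRequest:
    song: str
    artist: Optional[str]
    service: str  # e.g. "Spotify", "Amazon Music", "YouTube"
    version: str  # e.g. "Original", "Remastered", "Live"
    speaker: str  # which device should play it


# Hypothetical user preferences the assistant falls back on
# whenever the spoken command leaves a detail unspecified.
DEFAULTS = {"service": "Spotify", "version": "Original", "speaker": "this device"}

# "play <song>" optionally followed by "by <artist>", case-insensitive.
PLAY_PATTERN = re.compile(r"^play\s+(?P<song>.+?)(?:\s+by\s+(?P<artist>.+))?$", re.IGNORECASE)


def parse_play_command(text: str) -> Optional[PlayRequest]:
    match = PLAY_PATTERN.match(text.strip())
    if match is None:
        return None  # not a "play" command at all
    return PlayRequest(
        song=match.group("song").strip("'\""),  # drop surrounding quotes
        artist=match.group("artist"),
        service=DEFAULTS["service"],
        version=DEFAULTS["version"],
        speaker=DEFAULTS["speaker"],
    )


request = parse_play_command("Play 'Sympathy for the Devil' by The Rolling Stones")
print(request)
```

Even this toy shows how fragile things get: it would split “Play Stand by Me” into a song called “Stand” by an artist called “Me”.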


With all of this explained, perhaps you’re starting to see how difficult this must be for the little piece of metal, glass, and silicon in your pocket. Now I’d like to put it all in a bit more perspective. How long would a human being even be willing to wait for a computer to figure all of this out before just doing the task themselves? Not only are we asking Miss Alexa or Miss Siri to take on a near-impossible task, but we get upset with them if they have the audacity to take more than a few moments to complete it. We users truly are demanding creatures, are we not? So next time you’re having that A.I. pal schedule your dentist appointment, take a moment to appreciate how fascinating your little buddy truly is.

