Tuesday, April 16, 2013

Using SL4A and Android Speech Recognition for Home Automation

My latest project has been experimenting with SL4A (Scripting Layer for Android) and Python on my Galaxy Note II. I started with the included saychat.py sample and built a simple script that kicks off Android speech recognition. It takes the text result returned from Google and sends it over IM to our HA server. The HA server does some basic natural language processing on the text, extracting commands and performing the operations if any valid ones are found. It then returns a response over IM to the phone with the result of the command(s). Back on the phone, the Python script has been waiting for this confirmation and uses TTS to read it back. The cycle repeats until the user says "goodbye" or two consecutive recognition attempts return nothing.
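
On the phone side, the whole exchange boils down to a short listen/send/speak loop. Here's a minimal sketch of that loop using SL4A's Python API - the send_to_ha() and wait_for_reply() helpers are hypothetical stand-ins for the IM plumbing, not my actual code:

    import android

    droid = android.Android()

    def send_to_ha(text):
        # Placeholder: in the real script this sends the text to the HA server over IM.
        pass

    def wait_for_reply():
        # Placeholder: in the real script this blocks until the HA server's IM reply arrives.
        return "OK"

    def listen_once():
        # recognizeSpeech() pops up Android's recognizer; .result holds the recognized text (or None).
        return droid.recognizeSpeech("Speak a command").result

    def main():
        misses = 0
        while True:
            text = listen_once()
            if not text:
                misses += 1
                if misses >= 2:              # two consecutive empty results ends the session
                    break
                continue
            misses = 0
            if text.strip().lower() == "goodbye":
                droid.ttsSpeak("Goodbye")
                break
            send_to_ha(text)                 # recognized text goes to the HA server over IM
            droid.ttsSpeak(wait_for_reply()) # read the server's confirmation back

    main()

Here's a short YouTube video of it in action: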



From the video, you can see I've tried to parse the speech so that the commands and devices can be found even when a command is phrased differently. I used three different phrases:

  • "Can you turn on the kitchen light and dining room light?"
  • "Can you turn off the lights in the kitchen?"
  • "Turn off the dining room light."

I was trying to avoid having only simplistic commands like the last one. The first phrase demonstrates the ability to speak a command for multiple devices, and the ability to preface the command with "Can you" or pretty much anything, like "The dog wants you to" ;) The second shows that it's not restricted to parsing "kitchen light" together. The last is a typical HA VR command. My parser can also decode multiple commands in one utterance, such as "Turn off the kitchen light, the guest bath fan and living room light and turn on the back floods." The only challenge is saying everything you want to say without much of a pause; otherwise recognition stops and a partial command will be sent.
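
To give a feel for the kind of parsing involved, here's a simplified stand-alone sketch - not my actual parser - that pulls multiple on/off commands and device names out of a free-form sentence. The device list and action words are just examples:

    import re

    DEVICES = ["kitchen light", "dining room light", "guest bath fan",
               "living room light", "back floods"]
    ACTIONS = {"turn on": "on", "turn off": "off"}

    def parse_commands(text):
        """Return a list of (device, state) pairs found in free-form text."""
        text = text.lower().replace("lights", "light")
        commands = []
        # Split at each action phrase so "turn off X ... and turn on Y" yields
        # two clauses, each paired with its own action.
        pieces = re.split(r"(turn on|turn off)", text)
        for action, clause in zip(pieces[1::2], pieces[2::2]):
            for device in DEVICES:
                # Loose match: every word of the device name must appear somewhere in
                # the clause, so "the light in the kitchen" still finds "kitchen light".
                if all(word in clause for word in device.split()):
                    commands.append((device, ACTIONS[action]))
        return commands

    print(parse_commands("Can you turn off the lights in the kitchen?"))
    # [('kitchen light', 'off')]
    print(parse_commands("Turn off the kitchen light, the guest bath fan and "
                         "living room light and turn on the back floods."))
    # [('kitchen light', 'off'), ('guest bath fan', 'off'),
    #  ('living room light', 'off'), ('back floods', 'on')]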

A few advantages of using this setup:

  • Google's speech recognition in the cloud is probably the best, most up-to-date system available. They started building it up with the now-closed GOOG-411 service. Further fine-tuning comes from the millions of voicemails their Google Voice service transcribes. Their Chrome browser also uses their speech recognition and, of course, so do the millions of Android users. All of this input goes into tuning their accuracy, and what you end up with is one of the best-performing speech recognition systems around. If you're using Microsoft's Windows VR, you're probably getting something that's updated every few years with each OS release - if you upgrade. With HAL, you're getting a 1990s VR engine; I'm not even sure that gets updated anymore.
  • Google's free form speech recognition allows the most flexibility in speaking commands. Granted, that makes the parsing more difficult, but it allows a system that can more accurately respond to the different ways different people phrase commands. Most speech recognition engines I've worked with require you to pre-program canned phrases in order to recognize commands. If you deviate just a little from what's programmed, good luck getting your command recognized.
  • By using Jabber IM as the transport mechanism for the recognized commands, the same system that works at home works when you're away. You just turn on your mobile data - there are no VPN or SSH tunnels to set up every time you want to speak a command. You get one level of security for free, since your home's IM client must have pre-approved other users (added them to its roster) before it will accept their messages. Another level can be added at the scripting layer of your HA software by limiting which commands each IM user can issue. For extra security, you can even encode or encrypt the text being sent over IM, but if you're using Google Talk servers, your communication is already wrapped in SSL.
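
As an example of that scripting-layer check, the HA side can compare the sender's JID against a small permission table before acting on anything. This is purely an illustrative sketch - the JIDs and command categories are made up:

    # Pre-approved IM users and the command categories each one may issue.
    ALLOWED = {
        "phone@example.com": {"lights", "hvac", "locks"},   # full control
        "guest@example.com": {"lights"},                    # lights only
    }

    def authorized(sender_jid, category):
        """Only act on a command if the pre-approved sender may issue that category."""
        bare_jid = sender_jid.split("/")[0].lower()          # strip the XMPP resource
        return category in ALLOWED.get(bare_jid, set())

    print(authorized("phone@example.com/GalaxyNote", "locks"))   # True
    print(authorized("guest@example.com/Tablet", "locks"))       # False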

A few more details. Using SL4A, I cannot control the default speech recognition sounds, which can get annoying after a while. I'm using Nova Launcher as my launcher instead of TouchWiz; Nova Launcher lets you remap the home key behavior on the home screen, so when pressed, instead of showing the zoomed-out view of all my screens, it kicks off the script. Also, my HA device database is stored in MySQL, which allows for powerful searches and easy matching of what's spoken to actual devices - even when the device name isn't exactly the same as what was spoken. I've been using the MySQL setup, IM interface and command parsing for many years now (although the parsing was more primitive), so integration was extremely simple. At some point, I would like to implement NLTK, the Natural Language Toolkit, for more complex language processing.
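
For the curious, the device matching is conceptually along these lines. The schema, credentials and the MySQLdb driver here are assumptions for illustration, not my actual setup:

    import MySQLdb

    STOPWORDS = {"the", "a", "in", "my", "and"}

    def find_devices(spoken_phrase):
        """Return (id, name) rows whose name contains every meaningful spoken word."""
        words = [w for w in spoken_phrase.lower().replace("lights", "light").split()
                 if w not in STOPWORDS]
        if not words:
            return []
        db = MySQLdb.connect(host="localhost", user="ha", passwd="secret", db="home")
        cur = db.cursor()
        # Build "name LIKE %s AND name LIKE %s ..." so word order doesn't matter:
        # "light in the kitchen" and "kitchen light" both find the same row.
        clause = " AND ".join(["name LIKE %s"] * len(words))
        cur.execute("SELECT id, name FROM devices WHERE " + clause,
                    tuple("%" + w + "%" for w in words))
        rows = cur.fetchall()
        db.close()
        return rows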


2 comments:

    1. Great demo. We'd love to talk to you. We're chip guys too. Glen www.myube.co

      1. Hi Glen,

        I sent you an email at the "info" address embedded in your web form. Hope it finds its way to you.
