Voice Assistant for Desktop:

Introduction

A virtual assistant, also called an AI assistant or digital assistant, is an application program that understands natural language voice commands and completes tasks for the user.

To build any voice-based assistant, you need two main functions: one to listen to your commands and another to respond to them. Along with these two core functions, you need the customized instructions that you will feed your assistant.

The first step is to install and import all the necessary libraries. Use pip install to install the libraries before importing them. Following are some of the key libraries used in this program:

  • The SpeechRecognition library allows Python to access audio from your system’s microphone, transcribe the audio, and save it.
  • gTTS, Google’s text-to-speech package, converts text to speech. The response from the look-up function that you write to fetch the answer to a question is converted to an audio phrase by gTTS. This package interfaces with Google Translate’s API.
  • The playsound package is used to give voice to the answer. It allows Python to play MP3 files.
  • The webbrowser module provides a high-level interface for displaying web-based pages to users. Selenium is another option for displaying web pages. However, to use it you need to install and provide the browser-specific web driver.
  • The wikipedia package is used to fetch a variety of information from the Wikipedia website.

How are we going to build this?

It's basically very simple. We only need to create three functions:
  • The first function, recognize_voice(), will be responsible for capturing our voice (which we input through the microphone), recognizing it, and returning the "text" version of it.
  • Then we will take that "text" version of our voice and give it to another function called reply(), which will be responsible for replying to us and doing all sorts of other things (like searching Google, telling the current time, etc.).
  • Finally, a function called speak() will take whatever text we give it and convert it into speech.
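
The way these three functions fit together can be sketched as follows. This is only an illustration of the wiring, not the final program: run_once() and the stand-in functions are hypothetical names introduced here so the flow can be tried without a microphone or speakers.

```python
def run_once(recognize_voice, reply):
    """Run one listen-and-respond cycle and return the recognized text."""
    text = recognize_voice()      # speech -> text
    if text:                      # skip the reply when nothing was understood
        reply(text)               # decides what to do and calls speak() itself
    return text


# Stand-ins (assumptions for this sketch only).
spoken = []

def speak(text):
    spoken.append(text)           # the real speak() would say this out loud

def fake_recognize_voice():
    return "hello"

def simple_reply(text):
    speak("You said: " + text)

run_once(fake_recognize_voice, simple_reply)
print(spoken)  # ['You said: hello']
```

In the real program, the stand-ins would be replaced by functions that use the microphone and the text-to-speech engine.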

Requirements

  • You should have Python 3.3 or a higher version installed on your computer.
  • You should have venv installed. If you are using Python 3.3 or newer, then venv is already included in the Python standard library and requires no additional installation.
  • You should have a microphone (your laptop's built-in one or the one on your earphones will do the job).
  • You need an Internet connection.
  • Finally, you should have a modern code editor like Visual Studio Code.

With these things in place, let's get started.

Initial Setup

  • First, create a folder named voice_assistant anywhere on your computer.
  • Then open it inside Visual Studio Code.

Now let's make a new virtual environment using venv and activate it. To do that:

  • Open Terminal > New Terminal.
  • Then type:
    • python3 -m venv venv

    This command will create a virtual environment named venv for us.

    • To activate it, if you are on Windows, type the following:
      • venv\Scripts\activate.bat

    • On macOS or Linux, type the following instead:
      • source venv/bin/activate

    Your terminal prompt should now be prefixed with (venv).

    Note: Virtual environments like venv help us to keep all the dependencies related to the current project in its own environment isolated from the main computer. That's one of the main reasons why we are using it.
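
    If you want to double-check from Python that the virtual environment is really active, a small sketch like the following can help. The helper name in_virtualenv() is made up for this example; it relies on the fact that, inside a venv, sys.prefix differs from sys.base_prefix.

```python
import sys

def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```

    Run it with the same terminal in which you activated venv; if it reports False, the activation step did not take effect.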

    • Finally, create a new file named "main.py" directly inside the voice_assistant folder. It should now appear in the Visual Studio Code file explorer.

That's it, now let's install those required modules.

Installing the requirements

For recognizing our voice and converting it into text, we need an additional module called SpeechRecognition, so let's install it. Type the following command in the terminal:

  •    pip install SpeechRecognition

Now, if you are using the microphone as the input source, as we are in our case, you also need to install the PyAudio package.

The process for installing PyAudio will vary depending on your operating system.

  •  pip install pyaudio

If you get any errors installing PyAudio on Windows, then refer to this StackOverflow solution. If you are on a different operating system, then try to Google the error. If you still get those errors, then feel free to comment below.

Once you’ve got PyAudio installed, you can test the installation from the terminal by typing this:

  • python -m speech_recognition

Now to give the program the ability to talk, we have to install the pyttsx3 module:

  • pip install pyttsx3

pyttsx3 is a Text to Speech (TTS) library for Python 2 and 3. It works without an internet connection or delay. It also supports multiple TTS engines, including Sapi5, nsss, and espeak.
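
To give an idea of how little code speaking takes, here is a minimal sketch of a speak() helper. It assumes an engine object created with pyttsx3.init() and uses only the engine's say() and runAndWait() methods. Passing the engine in as a parameter is a choice made for this example so it can be tried with a stand-in object; the version in the final program may look different.

```python
def speak(engine, text):
    """Queue `text` on a pyttsx3-style engine and block until it is spoken."""
    engine.say(text)       # queue the phrase
    engine.runAndWait()    # process the queue; returns when speech is done


# With pyttsx3 installed, it could be called like this:
#   import pyttsx3
#   speak(pyttsx3.init(), "Hello, I am your assistant.")
```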

That's it, we have installed and set up all the prerequisites. Now it's time to write the program itself, so let's do that.
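
Before moving on, you may want to confirm that Python can actually find the three packages. The following is a small optional sketch, not part of the assistant itself; missing_modules() and REQUIRED_MODULES are names invented for this check, and the list contains the import names (which differ from the pip names).

```python
from importlib.util import find_spec

# Import names (not pip names) of the packages installed above.
REQUIRED_MODULES = ["speech_recognition", "pyaudio", "pyttsx3"]

def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required modules are installed.")
```

If anything is reported as not installed, make sure the virtual environment is active and run the matching pip install command again.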

recognize_voice()

First of all, let's add all the necessary imports.

Type the following code inside the main.py file:

# all our imports
import speech_recognition as sr
from time import sleep
from datetime import datetime
import webbrowser
import pyttsx3


  • First, we are importing the speech_recognition module as sr.
  • Then we are importing the sleep() function from the time module. We will use this in a bit to make a fake delay.
  • Then for knowing the current date and time, we need that datetime module.
  • Then to open up a browser and do a google search, we need the help of the webbrowser module.
  • Then as I said earlier, to convert text to speech, we need pyttsx3.
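
To make the roles of datetime and webbrowser more concrete, here is a rough sketch of how they might be used later on. The helper names (current_time_text, google_search_url, search_google) and the exact wording of the spoken sentence are assumptions for this example, not necessarily what the final program uses.

```python
from datetime import datetime
from urllib.parse import quote_plus
import webbrowser

def current_time_text(now=None):
    """Return a sentence with the time, ready for the assistant to speak."""
    now = now or datetime.now()
    return "The current time is " + now.strftime("%I:%M %p")

def google_search_url(query):
    """Build a Google search URL with the query safely URL-encoded."""
    return "https://www.google.com/search?q=" + quote_plus(query)

def search_google(query):
    """Open the search results in the default browser."""
    return webbrowser.open(google_search_url(query))

print(current_time_text())
print(google_search_url("python voice assistant"))
# https://www.google.com/search?q=python+voice+assistant
```

Building the URL in its own function keeps the encoding step easy to check without opening a browser window every time.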

All of the magic in SpeechRecognition happens with the Recognizer class. So let's instantiate it next:

# make an instance of Recognizer class
r = sr.Recognizer()


Now configure pyttsx3:

# confs for pyttsx3
engine = pyttsx3.init()


  • pyttsx3 will be responsible for generating the computer voice. To see/hack the gender, age, speed, etc. of the generated computer voice, read this description.
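
As a rough illustration of that kind of tweaking, the sketch below adjusts the speaking rate, the volume, and the selected voice through the engine's getProperty() and setProperty() methods. The function name configure_voice() and the default values are assumptions chosen for this example, and the voices that are available depend on your operating system.

```python
def configure_voice(engine, rate=150, volume=1.0, voice_index=0):
    """Apply speaking rate, volume, and voice to a pyttsx3-style engine."""
    engine.setProperty("rate", rate)        # words per minute
    engine.setProperty("volume", volume)    # 0.0 (silent) to 1.0 (full)
    voices = engine.getProperty("voices")   # voices installed on this system
    if voices:
        # Fall back to the first voice if the requested index does not exist.
        index = voice_index if 0 <= voice_index < len(voices) else 0
        engine.setProperty("voice", voices[index].id)
    return engine


# With the engine created above, it could be called like this:
#   configure_voice(engine, rate=170, voice_index=1)
```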

Now let's create that recognize_voice() function. This recognize_voice() function will do the following:

  • Listen to our microphone.
  • Recognize our voice with the help of the recognize_google() function.
  • Convert it into text format.
  • Return that text version of our voice.

Create the recognize_voice() function like below:
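
One way it could look is sketched here. Treat it as a minimal sketch rather than the definitive version: the parameters (recognizer, source, recoverable_errors), the 0.5 second noise calibration, and the lower-casing of the result are all assumptions made for this example. The recognizer and the exception types are passed in so the function can be tried with stand-in objects.

```python
def recognize_voice(recognizer, source, recoverable_errors=()):
    """Listen on `source` and return the recognized text in lower case.

    Returns an empty string when one of `recoverable_errors` is raised,
    for example when the speech could not be understood.
    """
    # Sample the background noise briefly so quiet speech is still detected.
    recognizer.adjust_for_ambient_noise(source, duration=0.5)
    audio = recognizer.listen(source)              # record until a pause
    try:
        text = recognizer.recognize_google(audio)  # uses Google's web speech API
    except recoverable_errors:
        return ""
    return text.lower()


# With SpeechRecognition and PyAudio installed, it could be called like this:
#   with sr.Microphone() as source:
#       text = recognize_voice(r, source, (sr.UnknownValueError, sr.RequestError))
```

Here sr.UnknownValueError is raised when the speech is unintelligible and sr.RequestError when the recognition service cannot be reached, which is why an Internet connection is listed in the requirements.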
