Serena AI Assistant

A personal voice-activated assistant with customizable voice options and multiple capabilities

Serena AI Assistant Demo

Project Overview

Serena is a desktop-based AI assistant with voice recognition capabilities that can perform various tasks through voice commands. The application uses a combination of speech recognition, text-to-speech, and natural language processing to understand and respond to user requests. Serena can perform web searches, control system functions, open applications, control media playback, and provide information on a variety of topics.

Key Technologies
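
- Python 3 with Tkinter for the desktop GUI
- SpeechRecognition (Google Speech Recognition) for voice input
- gTTS (Google Text-to-Speech) and pyttsx3 for voice output
- OpenAI GPT-3.5 for command classification and information retrieval
- keyboard for media keys and voice typing
- Pillow (PIL) for the animated GIF in the interface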

Project Architecture

The project consists of three main files:

1. main.py

The entry point of the application. It creates the Tkinter root window, instantiates SerenaAssistant, and starts the main event loop.

import tkinter as tk
from serena_assistant import SerenaAssistant

def main():
    root = tk.Tk()
    app = SerenaAssistant(root)
    root.mainloop()

if __name__ == "__main__":
    main()
        

2. serena_assistant.py

This file contains the SerenaAssistant class, which handles the assistant's core functionality: the GUI, voice recognition, command processing, and response generation.

3. utils.py

Contains utility classes, particularly the TextToSpeech class that handles different text-to-speech options (Google TTS and local pyttsx3).

Core Features

Voice Recognition

Serena listens for voice commands through the device's microphone and converts speech to text using Google's speech recognition service.

Multi-Language Support

The assistant can speak in both English (Indian accent) and Hindi, allowing users to switch between languages with simple voice commands.

Customizable Voice

Users can choose between online (Google TTS) and local (pyttsx3) voice engines for the assistant's responses.

Web Searches

Serena can perform web searches by opening the default browser with relevant search queries.

System Control

The assistant can control system functions like shutdown and restart operations.

Application Management

Serena can open applications on request, making it easier to access frequently used programs.

Media Controls

Control media playback with voice commands for play, pause, next track, and previous track.

Voice Typing

The assistant can listen and type what the user says, acting as a voice-to-text tool.

Information Retrieval

Leveraging OpenAI's GPT-3.5 model, Serena can provide information on a wide range of topics.

Detailed Component Analysis

SerenaAssistant Class

The SerenaAssistant class is the heart of the application, coordinating all components and features:

Initialization

The initialization process sets up all necessary components:

def __init__(self, root):
    self.root = root
    self.setup_window()
    self.setup_voice()
    self.setup_recognizer()
    self.setup_openai()
    self.setup_personality()
    self.create_gui()
        

Each setup method handles a specific aspect of the assistant's functionality:
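
- setup_window: configures the main application window (title, size, styling)
- setup_voice: initializes the text-to-speech engine
- setup_recognizer: creates the speech recognition instance
- setup_openai: sets up the OpenAI API client
- setup_personality: loads the assistant's response phrases (for example, the error_responses used when speech isn't understood)
- create_gui: builds the Tkinter interface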

Voice Recognition System

Voice recognition is handled by a dedicated listening thread that captures audio input, processes it, and converts it to text:

def listen_loop(self):
    while self.listening:
        try:
            with sr.Microphone() as source:
                # Briefly sample background noise so the energy threshold adapts
                self.recognizer.adjust_for_ambient_noise(source, duration=0.5)
                self.log_message("Listening...")
                try:
                    audio = self.recognizer.listen(source, timeout=5, phrase_time_limit=5)
                    # Send the captured audio to Google's speech recognition service
                    command = self.recognizer.recognize_google(audio).lower()
                    if command:
                        self.log_message(f"You: {command}")
                        self.process_command(command)
                except sr.WaitTimeoutError:
                    # No speech within the timeout window; keep listening
                    continue
                except sr.UnknownValueError:
                    # Speech was detected but could not be transcribed
                    self.log_message(random.choice(self.error_responses))
                except Exception as e:
                    self.log_message(f"Error: {str(e)}")
        except Exception as e:
            self.log_message(f"Listening error: {str(e)}")
            time.sleep(1)
        
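The loop runs on a background thread so the Tkinter main loop stays responsive. The toggle_listening method wired to the GUI button is not shown above; a minimal sketch of how it might start and stop this thread (assuming import threading at module level) is:

def toggle_listening(self):
    # Rough sketch - the actual method in serena_assistant.py may differ.
    if not self.listening:
        self.listening = True
        self.start_button.config(text="Stop Listening")
        # Run the blocking listen loop on a daemon thread so the GUI stays responsive.
        threading.Thread(target=self.listen_loop, daemon=True).start()
    else:
        # listen_loop checks this flag and exits after the current iteration.
        self.listening = False
        self.start_button.config(text="Start Listening")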

Command Processing

The command processing system uses OpenAI's GPT-3.5 model to understand user intents and categorize them:

def process_command(self, command):
    try:
        # Handle language switching commands directly
        if any(phrase in command.lower() for phrase in ["switch to hindi", "speak in hindi", "use hindi"]):
            self.tts.switch_to_hindi()
            self.say("अब मैं हिंदी में बोलूंगी")  # "Now I will speak in Hindi"
            return
        # Other command handling...
        
        # Use OpenAI to classify the command
        response = self.client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are Serena, a helpful female voice assistant. Analyze the command and respond with a JSON object containing: category (web_search/system_control/application/media_control/voice_typing/information), action, and parameters."},
                {"role": "user", "content": command}
            ]
        )
        result = json.loads(response.choices[0].message.content)
        category = result.get('category')
        parameters = result.get('parameters', {})
        
        # Execute the appropriate action based on category
        if category == 'web_search':
            self.web_search(command)
        elif category == 'system_control':
            self.system_control(command)
        # Other categories...
        
    except Exception as e:
        self.log_message(f"Command processing error: {str(e)}")
        self.say("I encountered an error processing that command. Please try again.")
        
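For illustration (not taken from the project code), a spoken command like "search for python tutorials" is expected to come back from the model as a JSON object along these lines, which process_command then routes to the matching handler:

# Hypothetical example of a classified command (illustrative only)
result = {
    "category": "web_search",
    "action": "search",
    "parameters": {"query": "python tutorials"}
}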

Text-to-Speech System

The TextToSpeech class in utils.py handles voice output with support for multiple engines and languages:

class TextToSpeech:
    def __init__(self, rate=150, volume=1.0, use_indian_english=True):
        self.use_gtts = True  # prefer the online Google TTS engine by default
        self.use_indian_english = use_indian_english
        # Local offline engine (pyttsx3) as the alternative voice
        self.engine = pyttsx3.init()
        self.engine.setProperty('rate', rate)      # speaking speed (words per minute)
        self.engine.setProperty('volume', volume)  # 0.0 to 1.0
        voices = self.engine.getProperty('voices')
        if len(voices) > 1:
            # The second installed voice is often a female voice on Windows
            self.engine.setProperty('voice', voices[1].id)
        # Serialize speech so overlapping responses don't talk over each other
        self.speak_lock = threading.Lock()
        language = "Indian English" if use_indian_english else "Hindi"
        print(f"Text-to-speech initialized using {language}")
        

This class supports:
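
- Online speech synthesis via Google Text-to-Speech (gTTS)
- Offline speech synthesis via the local pyttsx3 engine
- Indian English and Hindi output, switchable at runtime
- Thread-safe speaking through speak_lock, so responses don't overlap

The speak method itself is not shown above. As a rough sketch, assuming gTTS output is saved to a temporary MP3 and played back with the playsound package (the project's actual playback mechanism may differ), it could look like:

def speak(self, text):
    # Rough sketch - assumes gTTS, playsound and os are imported in utils.py.
    with self.speak_lock:  # one utterance at a time
        if self.use_gtts:
            # Online engine: the 'co.in' top-level domain gives an Indian English accent.
            lang = 'en' if self.use_indian_english else 'hi'
            tts = gTTS(text=text, lang=lang, tld='co.in')
            tts.save("speech.mp3")
            playsound("speech.mp3")
            os.remove("speech.mp3")
        else:
            # Offline fallback: local pyttsx3 engine.
            self.engine.say(text)
            self.engine.runAndWait()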

Graphical User Interface

The GUI is built using Tkinter and features:

def create_gui(self):
    self.main_frame = ttk.Frame(self.root, style='Custom.TFrame', padding="20")
    self.main_frame.grid(row=0, column=0, sticky="nsew")
    # Configure grid weights...
    
    # Status label
    self.status_var = tk.StringVar(value="Ready")
    self.status_label = ttk.Label(
        self.main_frame, 
        textvariable=self.status_var,
        style='Status.TLabel'
    )
    self.status_label.grid(row=0, column=0, pady=(0, 10), sticky="ew")
    
    # History text area
    self.history_frame = ttk.Frame(self.main_frame)
    self.history_frame.grid(row=1, column=0, sticky="nsew")
    # Configure text area...
    
    # Button to control listening
    self.button_frame = ttk.Frame(self.main_frame)
    self.button_frame.grid(row=2, column=0, pady=(10, 0), sticky="ew")
    self.start_button = ttk.Button(
        self.button_frame,
        text="Start Listening",
        command=self.toggle_listening,
        style='Custom.TButton'
    )
    self.start_button.grid(row=0, column=1)
    
    # Animated GIF
    self.gif_label = tk.Label(self.main_frame)
    self.gif_label.grid(row=3, column=0, pady=10)
    self.gif_path = "gif/3.gif"
    self.gif = Image.open(self.gif_path)
    self.gif_frames = [ImageTk.PhotoImage(img) for img in ImageSequence.Iterator(self.gif)]
    # Animation setup...
        
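The "# Animation setup..." step typically cycles through the preloaded frames using Tkinter's after scheduler. A minimal sketch of such an animation method (names and timing are illustrative; the project's implementation may differ):

def animate_gif(self, frame_index=0):
    # Illustrative sketch of a Tkinter GIF animation loop.
    frame = self.gif_frames[frame_index]
    self.gif_label.configure(image=frame)
    next_index = (frame_index + 1) % len(self.gif_frames)
    # Re-schedule on the Tkinter event loop (roughly 50 ms per frame).
    self.root.after(50, self.animate_gif, next_index)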

Functional Capabilities

Web Search

Serena can search the web by extracting search terms from the command and opening a browser:

def web_search(self, query):
    search_terms = query.replace('search', '').replace('google', '').strip()
    self.say(f"Searching for {search_terms}")
    webbrowser.open(f"https://www.google.com/search?q={search_terms}")
        
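The search terms are placed into the URL as-is; browsers generally tolerate spaces, but if special characters cause trouble the query can be percent-encoded with the standard library. A small illustrative variant (not the project code):

from urllib.parse import quote_plus

def build_search_url(search_terms):
    # Percent-encode spaces and special characters so the URL is always valid.
    return f"https://www.google.com/search?q={quote_plus(search_terms)}"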

System Control

The assistant can manage system operations like shutdown and restart:

def system_control(self, command):
    # Check the cancel phrase first so it isn't swallowed by the plain 'shutdown' match below.
    if 'cancel shutdown' in command:
        self.say("Canceling shutdown...")
        # Windows: /a aborts a pending shutdown; Linux/macOS: -c cancels a scheduled one.
        os.system('shutdown /a' if platform.system() == "Windows" else 'shutdown -c')
    elif 'shutdown' in command:
        self.say("Preparing to shut down the computer...")
        # Windows: /s = shut down, /t 60 = 60-second delay; Linux/macOS: halt in 1 minute.
        os.system('shutdown /s /t 60' if platform.system() == "Windows" else 'shutdown -h +1')
    elif 'restart' in command:
        self.say("Preparing to restart the computer...")
        # Windows: /r = restart; Linux/macOS: reboot in 1 minute.
        os.system('shutdown /r /t 60' if platform.system() == "Windows" else 'shutdown -r +1')
        

Application Management

Serena can open applications across different operating systems:

def open_application(self, app_name):
    try:
        if platform.system() == "Windows":
            os.startfile(app_name)
        else:
            subprocess.Popen([app_name])
        self.say(f"Opening {app_name}")
    except Exception as e:
        self.say(f"Sorry, I couldn't open {app_name}")
        
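In practice a spoken name such as "chrome" usually has to be mapped to an actual executable or path before it can be launched. A small hypothetical mapping (not part of the project code) might look like:

# Hypothetical lookup from spoken names to launch targets (Windows examples).
APP_ALIASES = {
    "notepad": "notepad.exe",
    "calculator": "calc.exe",
    "paint": "mspaint.exe",
}

def resolve_app_name(spoken_name):
    # Fall back to the raw spoken name if no alias is defined.
    return APP_ALIASES.get(spoken_name.lower().strip(), spoken_name)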

Media Control

The assistant can control media playback using keyboard shortcuts:

def media_control(self, command):
    # The strings below are the keyboard library's names for the hardware media keys.
    if 'play' in command or 'pause' in command:
        keyboard.press_and_release('play/pause media')  # single toggle key for play/pause
    elif 'next' in command:
        keyboard.press_and_release('next track')
    elif 'previous' in command:
        keyboard.press_and_release('previous track')
        

Voice Typing

Serena can listen to what the user says and type it using the keyboard module:

def voice_typing(self):
    self.say("Voice typing mode enabled. Speak your text.")
    try:
        with sr.Microphone() as source:
            audio = self.recognizer.listen(source, timeout=10)
            text = self.recognizer.recognize_google(audio)
            keyboard.write(text)
            self.say("Text typed successfully")
    except Exception as e:
        self.say("Sorry, I couldn't type that.")
        

Information Retrieval

The assistant can provide information on various topics using OpenAI's GPT model:

def get_information(self, query):
    try:
        response = self.client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are Serena, a female personal friendly and interactive AI assistant. Respond to the following user input in a conversational and engaging manner. Provide comprehensive and very short but concise information about the query."},
                {"role": "user", "content": query}
            ]
        )
        answer = response.choices[0].message.content
        self.say(answer)
    except Exception as e:
        self.say("I'm sorry, I couldn't find that information.")
        

Getting Started

Requirements
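
- Python 3.x
- A working microphone and speakers
- An OpenAI API key (for command classification and information retrieval)
- An internet connection (Google speech recognition, gTTS, and OpenAI are online services)
- Python packages used in the code: SpeechRecognition, PyAudio (for microphone access), pyttsx3, gTTS, openai, keyboard, Pillow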

Example Voice Commands
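
Because commands are classified by GPT-3.5, the exact phrasing is flexible. Examples based on the handlers above:

- "Search for Python tutorials" – opens a Google search in the browser
- "Switch to Hindi" / "Speak in Hindi" – changes the spoken language
- "Shutdown the computer" / "Restart the computer" / "Cancel shutdown" – system control
- "Open notepad" – launches an application (example name)
- "Play", "Pause", "Next track", "Previous track" – media playback
- A voice-typing request (for example, "start voice typing") dictates text into the focused window
- "What is machine learning?" – answered via GPT-3.5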