Serena AI Assistant

A personal voice-activated assistant with customizable voice options and multiple capabilities

Serena AI Assistant Demo

Project Overview

Serena is a desktop-based AI assistant with voice recognition capabilities that can perform various tasks through voice commands. The application uses a combination of speech recognition, text-to-speech, and natural language processing to understand and respond to user requests. Serena can perform web searches, control system functions, open applications, control media playback, and provide information on a variety of topics.

Key Technologies
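
- Python 3 with Tkinter for the desktop GUI
- SpeechRecognition (Google Speech Recognition) for voice input
- gTTS (Google Text-to-Speech) and pyttsx3 for voice output
- OpenAI GPT-3.5 for command classification and information retrieval
- keyboard for media keys and voice typing
- Pillow (PIL) for the animated GIF in the interface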

Project Architecture

The project consists of three main files:

1. main.py

The entry point of the application. It creates the Tkinter root window, instantiates SerenaAssistant, and starts the main event loop.

import tkinter as tk
from serena_assistant import SerenaAssistant

def main():
    root = tk.Tk()
    app = SerenaAssistant(root)
    root.mainloop()

if __name__ == "__main__":
    main()
        

2. serena_assistant.py

This file contains the SerenaAssistant class, which handles the assistant's core functionality: the GUI, voice recognition, command processing, and response generation.

3. utils.py

Contains utility classes, particularly the TextToSpeech class that handles different text-to-speech options (Google TTS and local pyttsx3).

Core Features

Voice Recognition

Serena listens for voice commands through the device's microphone and converts speech to text using Google's speech recognition service.

Multi-Language Support

The assistant can speak in both English (Indian accent) and Hindi, allowing users to switch between languages with simple voice commands.

Customizable Voice

Users can choose between online (Google TTS) and local (pyttsx3) voice engines for the assistant's responses.

Web Searches

Serena can perform web searches by opening the default browser with relevant search queries.

System Control

The assistant can control system functions like shutdown and restart operations.

Application Management

Serena can open applications on request, making it easier to access frequently used programs.

Media Controls

Control media playback with voice commands for play, pause, next track, and previous track.

Voice Typing

The assistant can listen and type what the user says, acting as a voice-to-text tool.

Information Retrieval

Leveraging OpenAI's GPT-3.5 model, Serena can provide information on a wide range of topics.

Detailed Component Analysis

SerenaAssistant Class

The SerenaAssistant class is the heart of the application, coordinating all components and features:

Initialization

The initialization process sets up all necessary components:

def __init__(self, root):
    self.root = root
    self.setup_window()
    self.setup_voice()
    self.setup_recognizer()
    self.setup_openai()
    self.setup_personality()
    self.create_gui()
        

Each setup method handles a specific aspect of the assistant's functionality:
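
- setup_window: configures the main application window (title, size, styling)
- setup_voice: initializes the text-to-speech engine
- setup_recognizer: creates the speech recognition instance
- setup_openai: sets up the OpenAI API client
- setup_personality: loads the assistant's response phrases (for example, the error_responses used when speech isn't understood)
- create_gui: builds the Tkinter interface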

Voice Recognition System

Voice recognition is handled by a dedicated listening thread that captures audio input, processes it, and converts it to text:

def listen_loop(self):
    while self.listening:
        try:
            with sr.Microphone() as source:
                # Briefly sample background noise so the energy threshold adapts
                self.recognizer.adjust_for_ambient_noise(source, duration=0.5)
                self.log_message("Listening...")
                try:
                    audio = self.recognizer.listen(source, timeout=5, phrase_time_limit=5)
                    # Send the captured audio to Google's speech recognition service
                    command = self.recognizer.recognize_google(audio).lower()
                    if command:
                        self.log_message(f"You: {command}")
                        self.process_command(command)
                except sr.WaitTimeoutError:
                    # No speech within the timeout window; keep listening
                    continue
                except sr.UnknownValueError:
                    # Speech was detected but could not be transcribed
                    self.log_message(random.choice(self.error_responses))
                except Exception as e:
                    self.log_message(f"Error: {str(e)}")
        except Exception as e:
            self.log_message(f"Listening error: {str(e)}")
            time.sleep(1)
        
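The loop runs on a background thread so the Tkinter main loop stays responsive. The toggle_listening method wired to the GUI button is not shown above; a minimal sketch of how it might start and stop this thread (assuming import threading at module level) is:

def toggle_listening(self):
    # Rough sketch - the actual method in serena_assistant.py may differ.
    if not self.listening:
        self.listening = True
        self.start_button.config(text="Stop Listening")
        # Run the blocking listen loop on a daemon thread so the GUI stays responsive.
        threading.Thread(target=self.listen_loop, daemon=True).start()
    else:
        # listen_loop checks this flag and exits after the current iteration.
        self.listening = False
        self.start_button.config(text="Start Listening")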

Command Processing

The command processing system uses OpenAI's GPT-3.5 model to understand user intents and categorize them:

def process_command(self, command):
    try:
        # Handle language switching commands directly
        if any(phrase in command.lower() for phrase in ["switch to hindi", "speak in hindi", "use hindi"]):
            self.tts.switch_to_hindi()
            self.say("अब मैं हिंदी में बोलूंगी")  # "Now I will speak in Hindi"
            return
        # Other command handling...
        
        # Use OpenAI to classify the command
        response = self.client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are Serena, a helpful female voice assistant. Analyze the command and respond with a JSON object containing: category (web_search/system_control/application/media_control/voice_typing/information), action, and parameters."},
                {"role": "user", "content": command}
            ]
        )
        result = json.loads(response.choices[0].message.content)
        category = result.get('category')
        parameters = result.get('parameters', {})
        
        # Execute the appropriate action based on category
        if category == 'web_search':
            self.web_search(command)
        elif category == 'system_control':
            self.system_control(command)
        # Other categories...
        
    except Exception as e:
        self.log_message(f"Command processing error: {str(e)}")
        self.say("I encountered an error processing that command. Please try again.")
        
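For illustration (not taken from the project code), a spoken command like "search for python tutorials" is expected to come back from the model as a JSON object along these lines, which process_command then routes to the matching handler:

# Hypothetical example of a classified command (illustrative only)
result = {
    "category": "web_search",
    "action": "search",
    "parameters": {"query": "python tutorials"}
}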

Text-to-Speech System

The TextToSpeech class in utils.py handles voice output with support for multiple engines and languages:

class TextToSpeech:
    def __init__(self, rate=150, volume=1.0, use_indian_english=True):
        self.use_gtts = True  # prefer the online Google TTS engine by default
        self.use_indian_english = use_indian_english
        # Local offline engine (pyttsx3) as the alternative voice
        self.engine = pyttsx3.init()
        self.engine.setProperty('rate', rate)      # speaking speed (words per minute)
        self.engine.setProperty('volume', volume)  # 0.0 to 1.0
        voices = self.engine.getProperty('voices')
        if len(voices) > 1:
            # The second installed voice is often a female voice on Windows
            self.engine.setProperty('voice', voices[1].id)
        # Serialize speech so overlapping responses don't talk over each other
        self.speak_lock = threading.Lock()
        language = "Indian English" if use_indian_english else "Hindi"
        print(f"Text-to-speech initialized using {language}")
        

This class supports:
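
- Online speech synthesis via Google Text-to-Speech (gTTS)
- Offline speech synthesis via the local pyttsx3 engine
- Indian English and Hindi output, switchable at runtime
- Thread-safe speaking through speak_lock, so responses don't overlap

The speak method itself is not shown above. As a rough sketch, assuming gTTS output is saved to a temporary MP3 and played back with the playsound package (the project's actual playback mechanism may differ), it could look like:

def speak(self, text):
    # Rough sketch - assumes gTTS, playsound and os are imported in utils.py.
    with self.speak_lock:  # one utterance at a time
        if self.use_gtts:
            # Online engine: the 'co.in' top-level domain gives an Indian English accent.
            lang = 'en' if self.use_indian_english else 'hi'
            tts = gTTS(text=text, lang=lang, tld='co.in')
            tts.save("speech.mp3")
            playsound("speech.mp3")
            os.remove("speech.mp3")
        else:
            # Offline fallback: local pyttsx3 engine.
            self.engine.say(text)
            self.engine.runAndWait()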

Graphical User Interface

The GUI is built using Tkinter and features:

def create_gui(self):
    self.main_frame = ttk.Frame(self.root, style='Custom.TFrame', padding="20")
    self.main_frame.grid(row=0, column=0, sticky="nsew")
    # Configure grid weights...
    
    # Status label
    self.status_var = tk.StringVar(value="Ready")
    self.status_label = ttk.Label(
        self.main_frame, 
        textvariable=self.status_var,
        style='Status.TLabel'
    )
    self.status_label.grid(row=0, column=0, pady=(0, 10), sticky="ew")
    
    # History text area
    self.history_frame = ttk.Frame(self.main_frame)
    self.history_frame.grid(row=1, column=0, sticky="nsew")
    # Configure text area...
    
    # Button to control listening
    self.button_frame = ttk.Frame(self.main_frame)
    self.button_frame.grid(row=2, column=0, pady=(10, 0), sticky="ew")
    self.start_button = ttk.Button(
        self.button_frame,
        text="Start Listening",
        command=self.toggle_listening,
        style='Custom.TButton'
    )
    self.start_button.grid(row=0, column=1)
    
    # Animated GIF
    self.gif_label = tk.Label(self.main_frame)
    self.gif_label.grid(row=3, column=0, pady=10)
    self.gif_path = "gif/3.gif"
    self.gif = Image.open(self.gif_path)
    self.gif_frames = [ImageTk.PhotoImage(img) for img in ImageSequence.Iterator(self.gif)]
    # Animation setup...
        
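The "# Animation setup..." step typically cycles through the preloaded frames using Tkinter's after scheduler. A minimal sketch of such an animation method (names and timing are illustrative; the project's implementation may differ):

def animate_gif(self, frame_index=0):
    # Illustrative sketch of a Tkinter GIF animation loop.
    frame = self.gif_frames[frame_index]
    self.gif_label.configure(image=frame)
    next_index = (frame_index + 1) % len(self.gif_frames)
    # Re-schedule on the Tkinter event loop (roughly 50 ms per frame).
    self.root.after(50, self.animate_gif, next_index)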

Functional Capabilities

Web Search

Serena can search the web by extracting search terms from the command and opening a browser:

def web_search(self, query):
    search_terms = query.replace('search', '').replace('google', '').strip()
    self.say(f"Searching for {search_terms}")
    webbrowser.open(f"https://www.google.com/search?q={search_terms}")
        
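The search terms are placed into the URL as-is; browsers generally tolerate spaces, but if special characters cause trouble the query can be percent-encoded with the standard library. A small illustrative variant (not the project code):

from urllib.parse import quote_plus

def build_search_url(search_terms):
    # Percent-encode spaces and special characters so the URL is always valid.
    return f"https://www.google.com/search?q={quote_plus(search_terms)}"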

System Control

The assistant can manage system operations like shutdown and restart:

def system_control(self, command):
    # Check the cancel phrase first so it isn't swallowed by the plain 'shutdown' match below.
    if 'cancel shutdown' in command:
        self.say("Canceling shutdown...")
        # Windows: /a aborts a pending shutdown; Linux/macOS: -c cancels a scheduled one.
        os.system('shutdown /a' if platform.system() == "Windows" else 'shutdown -c')
    elif 'shutdown' in command:
        self.say("Preparing to shut down the computer...")
        # Windows: /s = shut down, /t 60 = 60-second delay; Linux/macOS: halt in 1 minute.
        os.system('shutdown /s /t 60' if platform.system() == "Windows" else 'shutdown -h +1')
    elif 'restart' in command:
        self.say("Preparing to restart the computer...")
        # Windows: /r = restart; Linux/macOS: reboot in 1 minute.
        os.system('shutdown /r /t 60' if platform.system() == "Windows" else 'shutdown -r +1')
        

Application Management

Serena can open applications across different operating systems:

def open_application(self, app_name):
    try:
        if platform.system() == "Windows":
            os.startfile(app_name)
        else:
            subprocess.Popen([app_name])
        self.say(f"Opening {app_name}")
    except Exception as e:
        self.say(f"Sorry, I couldn't open {app_name}")
        
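In practice a spoken name such as "chrome" usually has to be mapped to an actual executable or path before it can be launched. A small hypothetical mapping (not part of the project code) might look like:

# Hypothetical lookup from spoken names to launch targets (Windows examples).
APP_ALIASES = {
    "notepad": "notepad.exe",
    "calculator": "calc.exe",
    "paint": "mspaint.exe",
}

def resolve_app_name(spoken_name):
    # Fall back to the raw spoken name if no alias is defined.
    return APP_ALIASES.get(spoken_name.lower().strip(), spoken_name)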

Media Control

The assistant can control media playback using keyboard shortcuts:

def media_control(self, command):
    # The strings below are the keyboard library's names for the hardware media keys.
    if 'play' in command or 'pause' in command:
        keyboard.press_and_release('play/pause media')  # single toggle key for play/pause
    elif 'next' in command:
        keyboard.press_and_release('next track')
    elif 'previous' in command:
        keyboard.press_and_release('previous track')
        

Voice Typing

Serena can listen to what the user says and type it using the keyboard module:

def voice_typing(self):
    self.say("Voice typing mode enabled. Speak your text.")
    try:
        with sr.Microphone() as source:
            audio = self.recognizer.listen(source, timeout=10)
            text = self.recognizer.recognize_google(audio)
            keyboard.write(text)
            self.say("Text typed successfully")
    except Exception as e:
        self.say("Sorry, I couldn't type that.")
        

Information Retrieval

The assistant can provide information on various topics using OpenAI's GPT model:

def get_information(self, query):
    try:
        response = self.client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are Serena, a female personal friendly and interactive AI assistant. Respond to the following user input in a conversational and engaging manner. Provide comprehensive and very short but concise information about the query."},
                {"role": "user", "content": query}
            ]
        )
        answer = response.choices[0].message.content
        self.say(answer)
    except Exception as e:
        self.say("I'm sorry, I couldn't find that information.")
        

Getting Started

Requirements
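
- Python 3.x
- A working microphone and speakers
- An OpenAI API key (for command classification and information retrieval)
- An internet connection (Google speech recognition, gTTS, and OpenAI are online services)
- Python packages used in the code: SpeechRecognition, PyAudio (for microphone access), pyttsx3, gTTS, openai, keyboard, Pillow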

Example Voice Commands
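
Because commands are classified by GPT-3.5, the exact phrasing is flexible. Examples based on the handlers above:

- "Search for Python tutorials" – opens a Google search in the browser
- "Switch to Hindi" / "Speak in Hindi" – changes the spoken language
- "Shutdown the computer" / "Restart the computer" / "Cancel shutdown" – system control
- "Open notepad" – launches an application (example name)
- "Play", "Pause", "Next track", "Previous track" – media playback
- A voice-typing request (for example, "start voice typing") dictates text into the focused window
- "What is machine learning?" – answered via GPT-3.5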