A personal voice-activated assistant with customizable voice options and multiple capabilities
Serena is a desktop-based AI assistant with voice recognition capabilities that can perform various tasks through voice commands. The application uses a combination of speech recognition, text-to-speech, and natural language processing to understand and respond to user requests. Serena can perform web searches, control system functions, open applications, control media playback, and provide information on a variety of topics.
The project consists of three main files:
This is the entry point of the application that initializes the Tkinter window and starts the Serena Assistant.
```python
import tkinter as tk

from serena_assistant import SerenaAssistant


def main():
    root = tk.Tk()
    app = SerenaAssistant(root)
    root.mainloop()


if __name__ == "__main__":
    main()
```
This is the main class that handles the assistant's functionality, including the GUI, voice recognition, command processing, and response generation.
Contains utility classes, particularly the TextToSpeech class that handles different text-to-speech options (Google TTS and local pyttsx3).
Serena listens for voice commands through the device's microphone and converts speech to text using Google's speech recognition service.
The assistant can speak in both English (Indian accent) and Hindi, allowing users to switch between languages with simple voice commands.
Users can choose between online (Google TTS) and local (pyttsx3) voice engines for the assistant's responses.
Serena can perform web searches by opening the default browser with relevant search queries.
The assistant can control system functions like shutdown and restart operations.
Serena can open applications on request, making it easier to access frequently used programs.
Control media playback with voice commands for play, pause, next track, and previous track.
The assistant can listen and type what the user says, acting as a voice-to-text tool.
Leveraging OpenAI's GPT-3.5 model, Serena can provide information on a wide range of topics.
The SerenaAssistant class is the heart of the application, coordinating all components and features:
The initialization process sets up all necessary components:
```python
def __init__(self, root):
    self.root = root
    self.setup_window()
    self.setup_voice()
    self.setup_recognizer()
    self.setup_openai()
    self.setup_personality()
    self.create_gui()
```
Each setup method handles a specific aspect of the assistant's functionality:
- `setup_window()`: Configures the application window and styles
- `setup_voice()`: Initializes the text-to-speech engine
- `setup_recognizer()`: Configures the speech recognition component
- `setup_openai()`: Sets up the OpenAI API client for natural language processing
- `setup_personality()`: Defines the assistant's responses for various situations
- `create_gui()`: Constructs the graphical user interface

Voice recognition is handled by a dedicated listening thread that captures audio input, processes it, and converts it to text:
```python
def listen_loop(self):
    while self.listening:
        try:
            with sr.Microphone() as source:
                self.recognizer.adjust_for_ambient_noise(source, duration=0.5)
                self.log_message("Listening...")
                try:
                    audio = self.recognizer.listen(source, timeout=5, phrase_time_limit=5)
                    command = self.recognizer.recognize_google(audio).lower()
                    if command:
                        self.log_message(f"You: {command}")
                        self.process_command(command)
                except sr.WaitTimeoutError:
                    continue
                except sr.UnknownValueError:
                    self.log_message(random.choice(self.error_responses))
                except Exception as e:
                    self.log_message(f"Error: {str(e)}")
        except Exception as e:
            self.log_message(f"Listening error: {str(e)}")
            time.sleep(1)
The command processing system uses OpenAI's GPT-3.5 model to understand user intents and categorize them:
```python
def process_command(self, command):
    try:
        # Handle language switching commands directly
        if any(phrase in command.lower() for phrase in ["switch to hindi", "speak in hindi", "use hindi"]):
            self.tts.switch_to_hindi()
            self.say("अब मैं हिंदी में बोलूंगी")
            return
        # Other command handling...

        # Use OpenAI to classify the command
        response = self.client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are Serena, a helpful female voice assistant. Analyze the command and respond with a JSON object containing: category (web_search/system_control/application/media_control/voice_typing/information), action, and parameters."},
                {"role": "user", "content": command}
            ]
        )
        result = json.loads(response.choices[0].message.content)
        category = result.get('category')
        parameters = result.get('parameters', {})

        # Execute the appropriate action based on category
        if category == 'web_search':
            self.web_search(command)
        elif category == 'system_control':
            self.system_control(command)
        # Other categories...
    except Exception as e:
        self.log_message(f"Command processing error: {str(e)}")
        self.say("I encountered an error processing that command. Please try again.")
```
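The model's reply is a JSON string, so the if/elif chain above can also be written as a dispatch table keyed by the category names from the system prompt. A small sketch of that routing step (the handler stubs and the `route` helper are illustrative, not part of the source):

```python
import json


# Stub handlers standing in for the real SerenaAssistant methods.
def web_search(cmd):
    return f"searching: {cmd}"


def system_control(cmd):
    return f"system: {cmd}"


def get_information(cmd):
    return f"info: {cmd}"


# Category names match those requested in the classification prompt.
DISPATCH = {
    "web_search": web_search,
    "system_control": system_control,
    "information": get_information,
}


def route(raw_response: str, command: str) -> str:
    """Parse the model's JSON reply and call the matching handler.

    Unknown categories fall back to the information handler rather
    than raising, since the model's output is not guaranteed.
    """
    result = json.loads(raw_response)
    handler = DISPATCH.get(result.get("category"), get_information)
    return handler(command)


# Example reply shaped like the one the system prompt requests.
reply = '{"category": "web_search", "action": "search", "parameters": {"query": "python tutorials"}}'
print(route(reply, "search for python tutorials"))
# → searching: search for python tutorials
```

A fallback handler matters here because `gpt-3.5-turbo` without a constrained response format can return a category outside the expected set.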
The TextToSpeech class in utils.py handles voice output with support for multiple engines and languages:
```python
class TextToSpeech:
    def __init__(self, rate=150, volume=1.0, use_indian_english=True):
        self.use_gtts = True
        self.use_indian_english = use_indian_english
        self.engine = pyttsx3.init()
        self.engine.setProperty('rate', rate)
        self.engine.setProperty('volume', volume)
        voices = self.engine.getProperty('voices')
        if len(voices) > 1:
            self.engine.setProperty('voice', voices[1].id)
        self.speak_lock = threading.Lock()
        language = "Indian English" if use_indian_english else "Hindi"
        print(f"Text-to-speech initialized using {language}")
```
This class supports:

- Online speech via Google TTS (gTTS) and offline speech via the local pyttsx3 engine
- Indian-accented English and Hindi output, switchable at runtime
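The language flag above has to be translated into gTTS parameters somewhere: gTTS takes a `lang` code and a `tld` that selects the regional Google Translate host (the `co.in` domain yields an Indian accent). A minimal sketch of that selection logic (the `gtts_params` helper name and return shape are assumptions, not taken from utils.py):

```python
def gtts_params(use_indian_english: bool) -> dict:
    """Pick gTTS keyword arguments for the configured language.

    `lang` is the language code; `tld` selects the regional host
    that determines the accent of the synthesized voice.
    """
    if use_indian_english:
        return {"lang": "en", "tld": "co.in"}
    return {"lang": "hi", "tld": "com"}


print(gtts_params(True))
# → {'lang': 'en', 'tld': 'co.in'}
```

These are the keyword arguments one would pass to the `gTTS(...)` constructor when the online engine is selected; the local pyttsx3 path instead relies on whichever voices the OS provides.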
The GUI is built using Tkinter and features:

- A status label showing the assistant's current state
- A text area displaying the conversation history
- A button to start and stop listening
- An animated GIF for visual feedback
```python
def create_gui(self):
    self.main_frame = ttk.Frame(self.root, style='Custom.TFrame', padding="20")
    self.main_frame.grid(row=0, column=0, sticky="nsew")
    # Configure grid weights...

    # Status label
    self.status_var = tk.StringVar(value="Ready")
    self.status_label = ttk.Label(
        self.main_frame,
        textvariable=self.status_var,
        style='Status.TLabel'
    )
    self.status_label.grid(row=0, column=0, pady=(0, 10), sticky="ew")

    # History text area
    self.history_frame = ttk.Frame(self.main_frame)
    self.history_frame.grid(row=1, column=0, sticky="nsew")
    # Configure text area...

    # Button to control listening
    self.button_frame = ttk.Frame(self.main_frame)
    self.button_frame.grid(row=2, column=0, pady=(10, 0), sticky="ew")
    self.start_button = ttk.Button(
        self.button_frame,
        text="Start Listening",
        command=self.toggle_listening,
        style='Custom.TButton'
    )
    self.start_button.grid(row=0, column=1)

    # Animated GIF
    self.gif_label = tk.Label(self.main_frame)
    self.gif_label.grid(row=3, column=0, pady=10)
    self.gif_path = "gif/3.gif"
    self.gif = Image.open(self.gif_path)
    self.gif_frames = [ImageTk.PhotoImage(img) for img in ImageSequence.Iterator(self.gif)]
    # Animation setup...
```
Serena can search the web by extracting search terms from the command and opening a browser:
```python
def web_search(self, query):
    search_terms = query.replace('search', '').replace('google', '').strip()
    self.say(f"Searching for {search_terms}")
    # Encode the terms (requires `import urllib.parse` at the top of the
    # file) so spaces and special characters survive in the URL.
    webbrowser.open(f"https://www.google.com/search?q={urllib.parse.quote_plus(search_terms)}")
```
The assistant can manage system operations like shutdown and restart:
```python
def system_control(self, command):
    # Check 'cancel shutdown' first: that phrase also contains 'shutdown',
    # so the plain shutdown branch would otherwise shadow it.
    if 'cancel shutdown' in command:
        self.say("Canceling shutdown...")
        os.system('shutdown /a' if platform.system() == "Windows" else 'shutdown -c')
    elif 'shutdown' in command:
        self.say("Preparing to shut down the computer...")
        os.system('shutdown /s /t 60' if platform.system() == "Windows" else 'shutdown -h +1')
    elif 'restart' in command:
        self.say("Preparing to restart the computer...")
        os.system('shutdown /r /t 60' if platform.system() == "Windows" else 'shutdown -r +1')
```
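The per-platform command strings above can be isolated into a pure helper that only builds the command, which makes the branch logic testable without touching the OS. A sketch (the `build_shutdown_command` name is an assumption, not in the source):

```python
def build_shutdown_command(action: str, system: str) -> str:
    """Return the OS shutdown command for a given action.

    action: 'shutdown', 'restart', or 'cancel'
    system: the value of platform.system(), e.g. 'Windows' or 'Linux'
    """
    windows = system == "Windows"
    commands = {
        # Windows delays are in seconds (/t 60); Unix delays in minutes (+1)
        "shutdown": "shutdown /s /t 60" if windows else "shutdown -h +1",
        "restart": "shutdown /r /t 60" if windows else "shutdown -r +1",
        "cancel": "shutdown /a" if windows else "shutdown -c",
    }
    return commands[action]


print(build_shutdown_command("restart", "Windows"))
# → shutdown /r /t 60
```

The real method would then reduce to matching the spoken command to an action and passing the result to `os.system`.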
Serena can open applications across different operating systems:
```python
def open_application(self, app_name):
    try:
        if platform.system() == "Windows":
            os.startfile(app_name)
        else:
            subprocess.Popen([app_name])
        self.say(f"Opening {app_name}")
    except Exception as e:
        self.say(f"Sorry, I couldn't open {app_name}")
```
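Both `os.startfile` and `subprocess.Popen` expect an actual file or executable name, so spoken names like "calculator" usually need a lookup table per platform. A hedged sketch of such a mapping (the alias table and entries are illustrative assumptions, not from the source):

```python
import platform

# Illustrative mapping from spoken names to launch targets; a real
# deployment would maintain its own table for the host machine.
APP_ALIASES = {
    "Windows": {"notepad": "notepad.exe", "calculator": "calc.exe"},
    "Linux": {"notepad": "gedit", "calculator": "gnome-calculator"},
    "Darwin": {"notepad": "TextEdit", "calculator": "Calculator"},
}


def resolve_app(spoken_name: str, system: str = None) -> str:
    """Return the launch target for a spoken app name.

    Falls back to the spoken name itself when no alias is known,
    matching the current behavior of open_application.
    """
    system = system or platform.system()
    return APP_ALIASES.get(system, {}).get(spoken_name.lower(), spoken_name)


print(resolve_app("notepad", system="Windows"))
# → notepad.exe
```

The fallback keeps the assistant usable for apps not in the table, at the cost of relying on the spoken name matching an executable on `PATH`.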
The assistant can control media playback using keyboard shortcuts:
```python
def media_control(self, command):
    if 'play' in command or 'pause' in command:
        keyboard.press_and_release('play/pause media')
    elif 'next' in command:
        keyboard.press_and_release('next track')
    elif 'previous' in command:
        keyboard.press_and_release('previous track')
```
Serena can listen to what the user says and type it using the keyboard module:
```python
def voice_typing(self):
    self.say("Voice typing mode enabled. Speak your text.")
    try:
        with sr.Microphone() as source:
            audio = self.recognizer.listen(source, timeout=10)
            text = self.recognizer.recognize_google(audio)
            keyboard.write(text)
            self.say("Text typed successfully")
    except Exception as e:
        self.say("Sorry, I couldn't type that.")
```
The assistant can provide information on various topics using OpenAI's GPT model:
```python
def get_information(self, query):
    try:
        response = self.client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are Serena, a female personal friendly and interactive AI assistant. Respond to the following user input in a conversational and engaging manner. Provide comprehensive and very short but concise information about the query."},
                {"role": "user", "content": query}
            ]
        )
        answer = response.choices[0].message.content
        self.say(answer)
    except Exception as e:
        self.say("I'm sorry, I couldn't find that information.")
```