Introduction to TTS
TTS (Text To Speech) is a speech synthesis technology that lets a machine read input text aloud.
TTS is divided into two stages, speech processing and speech synthesis: the machine first parses the input text and then synthesizes speech from it based on a speech library. Many TTS interfaces are available today, such as the speech synthesis interface of Baidu Intelligent Cloud. Microsoft also provides a TTS interface in Windows, which can be called to perform speech synthesis offline.
In this article, we will write a small speech synthesis tool using the pyttsx3 library as a demonstration.
pyttsx3 official documentation:
The source code for this article has been uploaded to GitHub:
/XMNHCAS/SpeechSynthesisTool
Install the required packages
Installation of PyQt5 and its GUI design tools
# Install PyQt5
pip install PyQt5
# Install the PyQt5 designer
pip install PyQt5Designer
The editor used in this article is VSCode rather than PyCharm, so the way the PyQt5 tools are set up may differ; configure them according to your own environment.
Install pyttsx3
pip install pyttsx3
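After running the two install steps, a quick check like the one below can confirm that both packages are visible to the interpreter. This is only a sketch; the helper name missing_packages is made up for this illustration and is not part of either library.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names used in this article. PyQt5Designer only ships the
    # designer tool, so it is not checked here.
    missing = missing_packages(["PyQt5", "pyttsx3"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("PyQt5 and pyttsx3 are importable.")
```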
UI interface
You can refer to the following figure to design a simple GUI. As this article is mainly a functional example, the aesthetics of the interface are not considered.
The interface should have a text input box for the text to be converted to speech and a play button that triggers playback. The speech rate, volume and language can be selected as needed.
Using PyQt5's design tools, the following UI (XML) code can be generated from the GUI designed above:
<?xml version="1.0" encoding="UTF-8"?>
<ui version="4.0">
 <class>Form</class>
 <widget class="QWidget" name="Form">
  <property name="geometry">
   <rect>
    <x>0</x>
    <y>0</y>
    <width>313</width>
    <height>284</height>
   </rect>
  </property>
  <property name="windowTitle">
   <string>Speech synthesizer</string>
  </property>
  <property name="windowIcon">
   <iconset>
    <normaloff></normaloff></iconset>
  </property>
  <widget class="QWidget" name="verticalLayoutWidget">
   <property name="geometry">
    <rect>
     <x>10</x>
     <y>10</y>
     <width>291</width>
     <height>261</height>
    </rect>
   </property>
   <layout class="QVBoxLayout" name="verticalLayout">
    <property name="spacing">
     <number>20</number>
    </property>
    <item>
     <layout class="QHBoxLayout" name="horizontalLayout_2">
      <item>
       <widget class="QLabel" name="label">
        <property name="text">
         <string>Broadcast text</string>
        </property>
        <property name="alignment">
         <set>Qt::AlignJustify|Qt::AlignTop</set>
        </property>
       </widget>
      </item>
      <item>
       <widget class="QTextEdit" name="tbx_text"/>
      </item>
     </layout>
    </item>
    <item>
     <layout class="QHBoxLayout" name="horizontalLayout_4">
      <item>
       <widget class="QLabel" name="label_3">
        <property name="text">
         <string>Speed of speech</string>
        </property>
       </widget>
      </item>
      <item>
       <widget class="QSlider" name="slider_rate">
        <property name="maximum">
         <number>300</number>
        </property>
        <property name="orientation">
         <enum>Qt::Horizontal</enum>
        </property>
       </widget>
      </item>
      <item>
       <widget class="QLabel" name="label_rate">
        <property name="minimumSize">
         <size>
          <width>30</width>
          <height>0</height>
         </size>
        </property>
        <property name="text">
         <string>0</string>
        </property>
        <property name="alignment">
         <set>Qt::AlignCenter</set>
        </property>
       </widget>
      </item>
     </layout>
    </item>
    <item>
     <layout class="QHBoxLayout" name="horizontalLayout_3">
      <item>
       <widget class="QLabel" name="label_2">
        <property name="text">
         <string>Volume</string>
        </property>
       </widget>
      </item>
      <item>
       <widget class="QSlider" name="slider_volumn">
        <property name="maximum">
         <number>100</number>
        </property>
        <property name="orientation">
         <enum>Qt::Horizontal</enum>
        </property>
       </widget>
      </item>
      <item>
       <widget class="QLabel" name="label_volumn">
        <property name="minimumSize">
         <size>
          <width>30</width>
          <height>0</height>
         </size>
        </property>
        <property name="text">
         <string>0</string>
        </property>
        <property name="alignment">
         <set>Qt::AlignCenter</set>
        </property>
       </widget>
      </item>
     </layout>
    </item>
    <item>
     <layout class="QHBoxLayout" name="horizontalLayout">
      <item>
       <widget class="QLabel" name="label_4">
        <property name="text">
         <string>Select Language</string>
        </property>
       </widget>
      </item>
      <item>
       <widget class="QRadioButton" name="rbtn_zh">
        <property name="text">
         <string>Chinese</string>
        </property>
        <property name="checked">
         <bool>true</bool>
        </property>
       </widget>
      </item>
      <item>
       <widget class="QRadioButton" name="rbtn_en">
        <property name="text">
         <string>English</string>
        </property>
       </widget>
      </item>
     </layout>
    </item>
    <item>
     <layout class="QHBoxLayout" name="horizontalLayout_5">
      <item>
       <widget class="QLabel" name="label_5">
        <property name="minimumSize">
         <size>
          <width>60</width>
          <height>0</height>
         </size>
        </property>
        <property name="text">
         <string/>
        </property>
       </widget>
      </item>
      <item>
       <widget class="QPushButton" name="btn_play">
        <property name="minimumSize">
         <size>
          <width>0</width>
          <height>30</height>
         </size>
        </property>
        <property name="text">
         <string>Play</string>
        </property>
       </widget>
      </item>
     </layout>
    </item>
   </layout>
  </widget>
 </widget>
 <resources/>
 <connections/>
</ui>
Finally, using PyQt5's interface tools again, you can generate the following form class based on the above UI code:
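# -*- coding: utf-8 -*-

# Form implementation generated from reading ui file 'd:\Program\VSCode\Python\TTS_PyQT\tts_form.ui'
#
# Created by: PyQt5 UI code generator 5.15.7
#
# WARNING: Any manual changes made to this file will be lost when pyuic5 is
# run again. Do not edit this file unless you know what you are doing.


from PyQt5 import QtCore, QtGui, QtWidgets


class Ui_Form(object):
    def setupUi(self, Form):
        Form.setObjectName("Form")
        Form.resize(313, 284)
        # The icon path is incomplete here; point it at your own .ico file
        icon = QtGui.QIcon()
        icon.addPixmap(QtGui.QPixmap("./"), QtGui.QIcon.Normal, QtGui.QIcon.Off)
        Form.setWindowIcon(icon)
        self.verticalLayoutWidget = QtWidgets.QWidget(Form)
        self.verticalLayoutWidget.setGeometry(QtCore.QRect(10, 10, 291, 261))
        self.verticalLayoutWidget.setObjectName("verticalLayoutWidget")
        self.verticalLayout = QtWidgets.QVBoxLayout(self.verticalLayoutWidget)
        self.verticalLayout.setContentsMargins(0, 0, 0, 0)
        self.verticalLayout.setSpacing(20)
        self.verticalLayout.setObjectName("verticalLayout")
        self.horizontalLayout_2 = QtWidgets.QHBoxLayout()
        self.horizontalLayout_2.setObjectName("horizontalLayout_2")
        self.label = QtWidgets.QLabel(self.verticalLayoutWidget)
        self.label.setAlignment(QtCore.Qt.AlignJustify | QtCore.Qt.AlignTop)
        self.label.setObjectName("label")
        self.horizontalLayout_2.addWidget(self.label)
        self.tbx_text = QtWidgets.QTextEdit(self.verticalLayoutWidget)
        self.tbx_text.setObjectName("tbx_text")
        self.horizontalLayout_2.addWidget(self.tbx_text)
        self.verticalLayout.addLayout(self.horizontalLayout_2)
        self.horizontalLayout_4 = QtWidgets.QHBoxLayout()
        self.horizontalLayout_4.setObjectName("horizontalLayout_4")
        self.label_3 = QtWidgets.QLabel(self.verticalLayoutWidget)
        self.label_3.setObjectName("label_3")
        self.horizontalLayout_4.addWidget(self.label_3)
        self.slider_rate = QtWidgets.QSlider(self.verticalLayoutWidget)
        self.slider_rate.setMaximum(300)
        self.slider_rate.setOrientation(QtCore.Qt.Horizontal)
        self.slider_rate.setObjectName("slider_rate")
        self.horizontalLayout_4.addWidget(self.slider_rate)
        self.label_rate = QtWidgets.QLabel(self.verticalLayoutWidget)
        self.label_rate.setMinimumSize(QtCore.QSize(30, 0))
        self.label_rate.setAlignment(QtCore.Qt.AlignCenter)
        self.label_rate.setObjectName("label_rate")
        self.horizontalLayout_4.addWidget(self.label_rate)
        self.verticalLayout.addLayout(self.horizontalLayout_4)
        self.horizontalLayout_3 = QtWidgets.QHBoxLayout()
        self.horizontalLayout_3.setObjectName("horizontalLayout_3")
        self.label_2 = QtWidgets.QLabel(self.verticalLayoutWidget)
        self.label_2.setObjectName("label_2")
        self.horizontalLayout_3.addWidget(self.label_2)
        self.slider_volumn = QtWidgets.QSlider(self.verticalLayoutWidget)
        self.slider_volumn.setMaximum(100)
        self.slider_volumn.setOrientation(QtCore.Qt.Horizontal)
        self.slider_volumn.setObjectName("slider_volumn")
        self.horizontalLayout_3.addWidget(self.slider_volumn)
        self.label_volumn = QtWidgets.QLabel(self.verticalLayoutWidget)
        self.label_volumn.setMinimumSize(QtCore.QSize(30, 0))
        self.label_volumn.setAlignment(QtCore.Qt.AlignCenter)
        self.label_volumn.setObjectName("label_volumn")
        self.horizontalLayout_3.addWidget(self.label_volumn)
        self.verticalLayout.addLayout(self.horizontalLayout_3)
        self.horizontalLayout = QtWidgets.QHBoxLayout()
        self.horizontalLayout.setObjectName("horizontalLayout")
        self.label_4 = QtWidgets.QLabel(self.verticalLayoutWidget)
        self.label_4.setObjectName("label_4")
        self.horizontalLayout.addWidget(self.label_4)
        self.rbtn_zh = QtWidgets.QRadioButton(self.verticalLayoutWidget)
        self.rbtn_zh.setChecked(True)
        self.rbtn_zh.setObjectName("rbtn_zh")
        self.horizontalLayout.addWidget(self.rbtn_zh)
        self.rbtn_en = QtWidgets.QRadioButton(self.verticalLayoutWidget)
        self.rbtn_en.setObjectName("rbtn_en")
        self.horizontalLayout.addWidget(self.rbtn_en)
        self.verticalLayout.addLayout(self.horizontalLayout)
        self.horizontalLayout_5 = QtWidgets.QHBoxLayout()
        self.horizontalLayout_5.setObjectName("horizontalLayout_5")
        self.label_5 = QtWidgets.QLabel(self.verticalLayoutWidget)
        self.label_5.setMinimumSize(QtCore.QSize(60, 0))
        self.label_5.setText("")
        self.label_5.setObjectName("label_5")
        self.horizontalLayout_5.addWidget(self.label_5)
        self.btn_play = QtWidgets.QPushButton(self.verticalLayoutWidget)
        self.btn_play.setMinimumSize(QtCore.QSize(0, 30))
        self.btn_play.setObjectName("btn_play")
        self.horizontalLayout_5.addWidget(self.btn_play)
        self.verticalLayout.addLayout(self.horizontalLayout_5)

        self.retranslateUi(Form)
        QtCore.QMetaObject.connectSlotsByName(Form)

    def retranslateUi(self, Form):
        _translate = QtCore.QCoreApplication.translate
        Form.setWindowTitle(_translate("Form", "Speech synthesizer"))
        self.label.setText(_translate("Form", "Broadcast text"))
        self.label_3.setText(_translate("Form", "Speed of speech"))
        self.label_rate.setText(_translate("Form", "0"))
        self.label_2.setText(_translate("Form", "Volume"))
        self.label_volumn.setText(_translate("Form", "0"))
        self.label_4.setText(_translate("Form", "Select Language"))
        self.rbtn_zh.setText(_translate("Form", "Chinese"))
        self.rbtn_en.setText(_translate("Form", "English"))
        self.btn_play.setText(_translate("Form", "Play"))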
If you copy this code directly, the window icon may be missing. Adjust the icon path to match your own setup and add the .ico file you want to use.
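The generation step can also be scripted. The sketch below assumes the command-line tool pyuic5 (installed with PyQt5) is on PATH, and it takes the file names tts_form.ui and Ui_tts_form.py from the generated header and the import used in this article; the two function names are made up for this illustration.

```python
import subprocess


def build_pyuic5_command(ui_path, output_path):
    """Build the pyuic5 command that converts a .ui file into a Python module."""
    return ["pyuic5", "-o", output_path, ui_path]


def generate_form_class(ui_path="tts_form.ui", output_path="Ui_tts_form.py"):
    """Run pyuic5; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_pyuic5_command(ui_path, output_path), check=True)
```

Calling generate_form_class() in the directory that contains tts_form.ui should regenerate Ui_tts_form.py, overwriting any manual changes, as the warning in the generated header says.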
Functional code
Voice tool class
First we need to initialize and get the speech engine object for speech synthesis.
# tts object
engine = pyttsx3.init()
We can modify the properties of the object for speech synthesis through the setProperty method of that object:
Property name | Description |
rate | Integer speech rate in words per minute |
volume | Volume, range [0.0, 1.0] |
voice | String identifier of the voice to use |
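As a rough sketch of how the properties in the table might be applied, the helper below converts slider-style values (rate in words per minute, volume as a percentage) into the ranges listed above. The names to_engine_properties and apply_properties are made up for this illustration, and engine is assumed to be the object returned by pyttsx3.init().

```python
def to_engine_properties(rate_wpm, volume_percent):
    """Convert UI values into pyttsx3 property values.

    rate_wpm: speech rate in words per minute, kept as a non-negative integer.
    volume_percent: volume on a 0-100 scale, mapped to [0.0, 1.0].
    """
    rate = max(0, int(rate_wpm))
    volume = min(100, max(0, volume_percent)) / 100
    return {"rate": rate, "volume": volume}


def apply_properties(engine, rate_wpm, volume_percent):
    """Apply the converted values through the engine's setProperty method."""
    for name, value in to_engine_properties(rate_wpm, volume_percent).items():
        engine.setProperty(name, value)
```

With pyttsx3 installed, apply_properties(pyttsx3.init(), 150, 80) would request a rate of 150 words per minute and a volume of 0.8.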
The code for the voice tool class is as follows; the comments explain what each part does:
import pyttsx3


class VoiceEngine():
    '''
    TTS voice tool class
    '''

    def __init__(self):
        '''
        Initialization
        '''
        # tts object
        self.__engine = pyttsx3.init()
        # Speech rate in words per minute
        self.__rate = 150
        # Volume as a percentage (0-100); scaled to [0.0, 1.0] in Say
        self.__volume = 100
        # Voice ID, 0 for Chinese, 1 for English
        # (the order depends on the voices installed on the system)
        self.__voice = 0

    @property
    def Rate(self):
        '''
        Speech rate property
        '''
        return self.__rate

    @Rate.setter
    def Rate(self, value):
        self.__rate = value

    @property
    def Volume(self):
        '''
        Volume property
        '''
        return self.__volume

    @Volume.setter
    def Volume(self, value):
        self.__volume = value

    @property
    def VoiceID(self):
        '''
        Voice ID: 0 -- Chinese; 1 -- English
        '''
        return self.__voice

    @VoiceID.setter
    def VoiceID(self, value):
        self.__voice = value

    def Say(self, text):
        '''
        Play the voice
        '''
        self.__engine.setProperty('rate', self.__rate)
        # pyttsx3 expects a volume in [0.0, 1.0], so scale the percentage
        self.__engine.setProperty('volume', self.__volume / 100)

        # Get the list of available voices and select one by its id
        voices = self.__engine.getProperty('voices')
        self.__engine.setProperty('voice', voices[self.__voice].id)

        # Save voice files
        # self.__engine.save_to_file(text, 'test.mp3')

        self.__engine.say(text)
        self.__engine.runAndWait()
        self.__engine.stop()
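Because the class above selects a voice by list index, which voice sits at index 0 or 1 depends on what is installed. One possible alternative, sketched here under the assumption that each voice object exposes id and name attributes (as pyttsx3 voice objects do), is to search the names for a keyword. The function pick_voice_id is made up for this illustration, and voice names vary from system to system.

```python
def pick_voice_id(voices, keyword, default_index=0):
    """Return the id of the first voice whose name contains keyword.

    voices: objects with `id` and `name` attributes, such as the list
            returned by engine.getProperty('voices').
    Falls back to voices[default_index].id when nothing matches.
    """
    needle = keyword.lower()
    for voice in voices:
        if needle in (voice.name or "").lower():
            return voice.id
    return voices[default_index].id
```

Inside Say, the index lookup could then be replaced by something like pick_voice_id(voices, 'English'), with the keyword chosen to match the voice names on your machine.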
Form class
We can create a form class that inherits from the Ui_Form class just generated, register callback functions for the slider drag and click events of the form, and create an instance of the voice tool class to carry out the voice actions required when those events are triggered.
import sys
import _thread as th
from PyQt5.QtWidgets import QMainWindow, QApplication
from Ui_tts_form import Ui_Form


class MainWindow(QMainWindow, Ui_Form):
    '''
    Form class
    '''

    def __init__(self, parent=None):
        '''
        Initialize the form
        '''
        super(MainWindow, self).__init__(parent)
        self.setupUi(self)

        # Get an instance of the tts tool class
        # NOTE: the attribute name "engine" and the radio button signal
        # used below are reconstructed assumptions
        self.engine = VoiceEngine()

        self.__isPlaying = False

        # Set initial text
        self.tbx_text.setText('The moonlight in front of the bed is suspected to be like frost on the ground. \n Raise your head to look at the moon, lower your head to think of your hometown.')

        # Bind the slider values to their labels for display
        self.slider_rate.valueChanged.connect(self.setRateTextValue)
        self.slider_volumn.valueChanged.connect(self.setVolumnTextValue)

        # Set the initial values of the sliders
        self.slider_rate.setValue(self.engine.Rate)
        self.slider_volumn.setValue(self.engine.Volume)

        # RadioButton selection events
        self.rbtn_zh.clicked.connect(self.onSelectVoice_zh)
        self.rbtn_en.clicked.connect(self.onSelectVoice_en)

        # Play button click event
        self.btn_play.clicked.connect(self.onPlayButtonClick)

    def setRateTextValue(self):
        '''
        Update the speech rate label and the engine rate
        '''
        value = self.slider_rate.value()
        self.label_rate.setText(str(value))
        self.engine.Rate = value

    def setVolumnTextValue(self):
        '''
        Update the volume label and the engine volume
        '''
        value = self.slider_volumn.value()
        self.label_volumn.setText(str(value / 100))
        self.engine.Volume = value

    def onSelectVoice_zh(self):
        '''
        Switch to the Chinese voice and its default playback text
        '''
        self.tbx_text.setText('The moonlight in front of the bed is suspected to be like frost on the ground. \n Raise your head to look at the moon, lower your head to think of your hometown.')
        self.engine.VoiceID = 0

    def onSelectVoice_en(self):
        '''
        Switch to the English voice and its default playback text
        '''
        self.tbx_text.setText('Hello World')
        self.engine.VoiceID = 1

    def playVoice(self):
        '''
        Play
        '''
        if self.__isPlaying is not True:
            self.__isPlaying = True
            text = self.tbx_text.toPlainText()
            self.engine.Say(text)
            self.__isPlaying = False

    def onPlayButtonClick(self):
        '''
        Play button click event
        Starts a new thread to play the voice so that the form
        does not appear frozen during playback
        '''
        th.start_new_thread(self.playVoice, ())
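The playVoice method above uses a boolean flag to ignore clicks while speech is playing. A variation on the same idea, shown here only as a sketch, is a non-blocking threading.Lock, which also releases itself if playback raises an exception. The class name PlaybackGuard and its methods are made up for this illustration; speak stands for any blocking callable such as the Say method of VoiceEngine.

```python
import threading


class PlaybackGuard:
    """Allow only one playback at a time using a non-blocking lock."""

    def __init__(self, speak):
        # speak: any callable that takes the text and blocks until done
        self._speak = speak
        self._lock = threading.Lock()

    def play(self, text):
        """Run speak(text) unless a playback is already in progress.

        Returns True if playback ran, False if the request was skipped.
        """
        if not self._lock.acquire(blocking=False):
            return False
        try:
            self._speak(text)
            return True
        finally:
            # Always release, even if speak() raised an exception
            self._lock.release()

    def play_in_background(self, text):
        """Start playback on a daemon thread so the GUI stays responsive."""
        worker = threading.Thread(target=self.play, args=(text,), daemon=True)
        worker.start()
        return worker
```

In the form class, onPlayButtonClick could then call something like guard.play_in_background(self.tbx_text.toPlainText()), where guard wraps the Say method of the voice tool instance.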
Full Code
import sys
import _thread as th
from PyQt5.QtWidgets import QMainWindow, QApplication
from Ui_tts_form import Ui_Form
import pyttsx3


class VoiceEngine():
    '''
    TTS voice tool class
    '''

    def __init__(self):
        '''
        Initialization
        '''
        # tts object
        self.__engine = pyttsx3.init()
        # Speech rate in words per minute
        self.__rate = 150
        # Volume as a percentage (0-100); scaled to [0.0, 1.0] in Say
        self.__volume = 100
        # Voice ID, 0 for Chinese, 1 for English
        # (the order depends on the voices installed on the system)
        self.__voice = 0

    @property
    def Rate(self):
        '''
        Speech rate property
        '''
        return self.__rate

    @Rate.setter
    def Rate(self, value):
        self.__rate = value

    @property
    def Volume(self):
        '''
        Volume property
        '''
        return self.__volume

    @Volume.setter
    def Volume(self, value):
        self.__volume = value

    @property
    def VoiceID(self):
        '''
        Voice ID: 0 -- Chinese; 1 -- English
        '''
        return self.__voice

    @VoiceID.setter
    def VoiceID(self, value):
        self.__voice = value

    def Say(self, text):
        '''
        Play the voice
        '''
        self.__engine.setProperty('rate', self.__rate)
        # pyttsx3 expects a volume in [0.0, 1.0], so scale the percentage
        self.__engine.setProperty('volume', self.__volume / 100)

        # Get the list of available voices and select one by its id
        voices = self.__engine.getProperty('voices')
        self.__engine.setProperty('voice', voices[self.__voice].id)

        # Save voice files
        # self.__engine.save_to_file(text, 'test.mp3')

        self.__engine.say(text)
        self.__engine.runAndWait()
        self.__engine.stop()


class MainWindow(QMainWindow, Ui_Form):
    '''
    Form class
    '''

    def __init__(self, parent=None):
        '''
        Initialize the form
        '''
        super(MainWindow, self).__init__(parent)
        self.setupUi(self)

        # Get an instance of the tts tool class
        # NOTE: the attribute name "engine" and the radio button signal
        # used below are reconstructed assumptions
        self.engine = VoiceEngine()

        self.__isPlaying = False

        # Set initial text
        self.tbx_text.setText('The moonlight in front of the bed is suspected to be like frost on the ground. \n Raise your head to look at the moon, lower your head to think of your hometown.')

        # Bind the slider values to their labels for display
        self.slider_rate.valueChanged.connect(self.setRateTextValue)
        self.slider_volumn.valueChanged.connect(self.setVolumnTextValue)

        # Set the initial values of the sliders
        self.slider_rate.setValue(self.engine.Rate)
        self.slider_volumn.setValue(self.engine.Volume)

        # RadioButton selection events
        self.rbtn_zh.clicked.connect(self.onSelectVoice_zh)
        self.rbtn_en.clicked.connect(self.onSelectVoice_en)

        # Play button click event
        self.btn_play.clicked.connect(self.onPlayButtonClick)

    def setRateTextValue(self):
        '''
        Update the speech rate label and the engine rate
        '''
        value = self.slider_rate.value()
        self.label_rate.setText(str(value))
        self.engine.Rate = value

    def setVolumnTextValue(self):
        '''
        Update the volume label and the engine volume
        '''
        value = self.slider_volumn.value()
        self.label_volumn.setText(str(value / 100))
        self.engine.Volume = value

    def onSelectVoice_zh(self):
        '''
        Switch to the Chinese voice and its default playback text
        '''
        self.tbx_text.setText('The moonlight in front of the bed is suspected to be like frost on the ground. \n Raise your head to look at the moon, lower your head to think of your hometown.')
        self.engine.VoiceID = 0

    def onSelectVoice_en(self):
        '''
        Switch to the English voice and its default playback text
        '''
        self.tbx_text.setText('Hello World')
        self.engine.VoiceID = 1

    def playVoice(self):
        '''
        Play
        '''
        if self.__isPlaying is not True:
            self.__isPlaying = True
            text = self.tbx_text.toPlainText()
            self.engine.Say(text)
            self.__isPlaying = False

    def onPlayButtonClick(self):
        '''
        Play button click event
        Starts a new thread to play the voice so that the form
        does not appear frozen during playback
        '''
        th.start_new_thread(self.playVoice, ())


if __name__ == "__main__":
    # Main entry point
    app = QApplication(sys.argv)
    form = MainWindow()
    form.show()
    sys.exit(app.exec_())