Python speech synthesis project (PyQt5+pyttsx3)

Introduction to TTS

TTS (Text To Speech) is a speech synthesis technology that lets a machine read input text aloud.

TTS is divided into two stages, text processing and speech synthesis: the machine first analyzes the input text, and then synthesizes speech from a voice library. Many TTS interfaces are now available, such as Baidu Intelligent Cloud's speech synthesis interface. Microsoft also provides a TTS interface in Windows, which can be called to perform speech synthesis offline.

In this article, we will write a small speech synthesis tool using the pyttsx3 library as a demonstration.

pyttsx3 official documentation:

The source code for this article has been uploaded to GitHub:

/XMNHCAS/SpeechSynthesisTool

Install the required packages

Installation of PyQt5 and its GUI design tools

# Install PyQt5
pip install PyQt5
 
# Install the PyQt5 designer
pip install PyQt5Designer

This article uses VS Code rather than PyCharm as the editor, so the way the PyQt5 tools are launched may differ from your setup; configure them to suit your own environment.

Install pyttsx3

pip install pyttsx3
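
Once both installs have finished, a quick check can confirm that the packages are visible to the interpreter you plan to run the tool with. The following is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical and not part of either package.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["PyQt5", "pyttsx3"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("PyQt5 and pyttsx3 are both importable.")
```

If a package is reported as missing, the pip command was probably run for a different Python installation than the one the editor is using.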

User interface

You can design a simple GUI along the lines of the following figure. Since this article is mainly a functional example, the appearance of the interface is not a concern.

The interface needs a text input box for the text to be converted to speech and a play button to trigger playback. The speech rate, volume and language can be selected as needed.

Using PyQt5's designer tool, the GUI configured above is saved as the following UI (XML) code:

<?xml version="1.0" encoding="UTF-8"?>
<ui version="4.0">
 <class>Form</class>
 <widget class="QWidget" name="Form">
  <property name="geometry">
   <rect>
    <x>0</x>
    <y>0</y>
    <width>313</width>
    <height>284</height>
   </rect>
  </property>
  <property name="windowTitle">
   <string>speech synthesizer</string>
  </property>
  <property name="windowIcon">
   <iconset>
    <normaloff></normaloff></iconset>
  </property>
  <widget class="QWidget" name="verticalLayoutWidget">
   <property name="geometry">
    <rect>
     <x>10</x>
     <y>10</y>
     <width>291</width>
     <height>261</height>
    </rect>
   </property>
   <layout class="QVBoxLayout" name="verticalLayout">
    <property name="spacing">
     <number>20</number>
    </property>
    <item>
     <layout class="QHBoxLayout" name="horizontalLayout_2">
      <item>
       <widget class="QLabel" name="label">
        <property name="text">
         <string>Broadcast Text</string>
        </property>
        <property name="alignment">
         <set>Qt::AlignJustify|Qt::AlignTop</set>
        </property>
       </widget>
      </item>
      <item>
       <widget class="QTextEdit" name="tbx_text"/>
      </item>
     </layout>
    </item>
    <item>
     <layout class="QHBoxLayout" name="horizontalLayout_4">
      <item>
       <widget class="QLabel" name="label_3">
        <property name="text">
         <string>speed of speech</string>
        </property>
       </widget>
      </item>
      <item>
       <widget class="QSlider" name="slider_rate">
        <property name="maximum">
         <number>300</number>
        </property>
        <property name="orientation">
         <enum>Qt::Horizontal</enum>
        </property>
       </widget>
      </item>
      <item>
       <widget class="QLabel" name="label_rate">
        <property name="minimumSize">
         <size>
          <width>30</width>
          <height>0</height>
         </size>
        </property>
        <property name="text">
         <string>0</string>
        </property>
        <property name="alignment">
         <set>Qt::AlignCenter</set>
        </property>
       </widget>
      </item>
     </layout>
    </item>
    <item>
     <layout class="QHBoxLayout" name="horizontalLayout_3">
      <item>
       <widget class="QLabel" name="label_2">
        <property name="text">
         <string>Volume</string>
        </property>
       </widget>
      </item>
      <item>
       <widget class="QSlider" name="slider_volumn">
        <property name="maximum">
         <number>100</number>
        </property>
        <property name="orientation">
         <enum>Qt::Horizontal</enum>
        </property>
       </widget>
      </item>
      <item>
       <widget class="QLabel" name="label_volumn">
        <property name="minimumSize">
         <size>
          <width>30</width>
          <height>0</height>
         </size>
        </property>
        <property name="text">
         <string>0</string>
        </property>
        <property name="alignment">
         <set>Qt::AlignCenter</set>
        </property>
       </widget>
      </item>
     </layout>
    </item>
    <item>
     <layout class="QHBoxLayout" name="horizontalLayout">
      <item>
       <widget class="QLabel" name="label_4">
        <property name="text">
         <string>Select Language</string>
        </property>
       </widget>
      </item>
      <item>
       <widget class="QRadioButton" name="rbtn_zh">
        <property name="text">
         <string>Chinese</string>
        </property>
        <property name="checked">
         <bool>true</bool>
        </property>
       </widget>
      </item>
      <item>
       <widget class="QRadioButton" name="rbtn_en">
        <property name="text">
         <string>English</string>
        </property>
       </widget>
      </item>
     </layout>
    </item>
    <item>
     <layout class="QHBoxLayout" name="horizontalLayout_5">
      <item>
       <widget class="QLabel" name="label_5">
        <property name="minimumSize">
         <size>
          <width>60</width>
          <height>0</height>
         </size>
        </property>
        <property name="text">
         <string/>
        </property>
       </widget>
      </item>
      <item>
       <widget class="QPushButton" name="btn_play">
        <property name="minimumSize">
         <size>
          <width>0</width>
          <height>30</height>
         </size>
        </property>
        <property name="text">
         <string>Play</string>
        </property>
       </widget>
      </item>
     </layout>
    </item>
   </layout>
  </widget>
 </widget>
 <resources/>
 <connections/>
</ui>
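
The object names in this file (tbx_text, slider_rate, btn_play and so on) are the names the Python code refers to later, so it can be useful to list them. The following is a small sketch that does this with the standard library's XML parser; it embeds a shortened fragment of the UI file for illustration, and the helper name list_widgets is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for tts_form.ui, keeping only a few widgets
UI_FRAGMENT = """
<ui version="4.0">
 <class>Form</class>
 <widget class="QWidget" name="Form">
  <widget class="QTextEdit" name="tbx_text"/>
  <widget class="QSlider" name="slider_rate"/>
  <widget class="QPushButton" name="btn_play"/>
 </widget>
</ui>
"""


def list_widgets(ui_xml):
    """Return (object name, widget class) pairs in document order."""
    root = ET.fromstring(ui_xml.strip())
    return [(w.get("name"), w.get("class")) for w in root.iter("widget")]


for name, cls in list_widgets(UI_FRAGMENT):
    print(f"{name}: {cls}")
```

For the real file, the same function could be given the contents of tts_form.ui read from disk.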

Finally, using PyQt5's pyuic5 tool, you can generate the following form class from the UI code above:

# -*- coding: utf-8 -*-
 
# Form implementation generated from reading ui file 'd:\Program\VSCode\Python\TTS_PyQT\tts_form.ui'
#
# Created by: PyQt5 UI code generator 5.15.7
#
# WARNING: Any manual changes made to this file will be lost when pyuic5 is
# run again.  Do not edit this file unless you know what you are doing.
 
from PyQt5 import QtCore, QtGui, QtWidgets
 
 
class Ui_Form(object):
 
    def setupUi(self, Form):
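        Form.setObjectName("Form")
        Form.resize(313, 284)
        icon = QtGui.QIcon()
        # Replace "./" with the path to your own icon file
        icon.addPixmap(
            QtGui.QPixmap("./"),
            QtGui.QIcon.Normal, QtGui.QIcon.Off)
        Form.setWindowIcon(icon)
        self.verticalLayoutWidget = QtWidgets.QWidget(Form)
        self.verticalLayoutWidget.setGeometry(QtCore.QRect(10, 10, 291, 261))
        self.verticalLayoutWidget.setObjectName("verticalLayoutWidget")
        self.verticalLayout = QtWidgets.QVBoxLayout(self.verticalLayoutWidget)
        self.verticalLayout.setContentsMargins(0, 0, 0, 0)
        self.verticalLayout.setSpacing(20)
        self.verticalLayout.setObjectName("verticalLayout")
        self.horizontalLayout_2 = QtWidgets.QHBoxLayout()
        self.horizontalLayout_2.setObjectName("horizontalLayout_2")
        self.label = QtWidgets.QLabel(self.verticalLayoutWidget)
        self.label.setAlignment(QtCore.Qt.AlignJustify | QtCore.Qt.AlignTop)
        self.label.setObjectName("label")
        self.horizontalLayout_2.addWidget(self.label)
        self.tbx_text = QtWidgets.QTextEdit(self.verticalLayoutWidget)
        self.tbx_text.setObjectName("tbx_text")
        self.horizontalLayout_2.addWidget(self.tbx_text)
        self.verticalLayout.addLayout(self.horizontalLayout_2)
        self.horizontalLayout_4 = QtWidgets.QHBoxLayout()
        self.horizontalLayout_4.setObjectName("horizontalLayout_4")
        self.label_3 = QtWidgets.QLabel(self.verticalLayoutWidget)
        self.label_3.setObjectName("label_3")
        self.horizontalLayout_4.addWidget(self.label_3)
        self.slider_rate = QtWidgets.QSlider(self.verticalLayoutWidget)
        self.slider_rate.setMaximum(300)
        self.slider_rate.setOrientation(QtCore.Qt.Horizontal)
        self.slider_rate.setObjectName("slider_rate")
        self.horizontalLayout_4.addWidget(self.slider_rate)
        self.label_rate = QtWidgets.QLabel(self.verticalLayoutWidget)
        self.label_rate.setMinimumSize(QtCore.QSize(30, 0))
        self.label_rate.setAlignment(QtCore.Qt.AlignCenter)
        self.label_rate.setObjectName("label_rate")
        self.horizontalLayout_4.addWidget(self.label_rate)
        self.verticalLayout.addLayout(self.horizontalLayout_4)
        self.horizontalLayout_3 = QtWidgets.QHBoxLayout()
        self.horizontalLayout_3.setObjectName("horizontalLayout_3")
        self.label_2 = QtWidgets.QLabel(self.verticalLayoutWidget)
        self.label_2.setObjectName("label_2")
        self.horizontalLayout_3.addWidget(self.label_2)
        self.slider_volumn = QtWidgets.QSlider(self.verticalLayoutWidget)
        self.slider_volumn.setMaximum(100)
        self.slider_volumn.setOrientation(QtCore.Qt.Horizontal)
        self.slider_volumn.setObjectName("slider_volumn")
        self.horizontalLayout_3.addWidget(self.slider_volumn)
        self.label_volumn = QtWidgets.QLabel(self.verticalLayoutWidget)
        self.label_volumn.setMinimumSize(QtCore.QSize(30, 0))
        self.label_volumn.setAlignment(QtCore.Qt.AlignCenter)
        self.label_volumn.setObjectName("label_volumn")
        self.horizontalLayout_3.addWidget(self.label_volumn)
        self.verticalLayout.addLayout(self.horizontalLayout_3)
        self.horizontalLayout = QtWidgets.QHBoxLayout()
        self.horizontalLayout.setObjectName("horizontalLayout")
        self.label_4 = QtWidgets.QLabel(self.verticalLayoutWidget)
        self.label_4.setObjectName("label_4")
        self.horizontalLayout.addWidget(self.label_4)
        self.rbtn_zh = QtWidgets.QRadioButton(self.verticalLayoutWidget)
        self.rbtn_zh.setChecked(True)
        self.rbtn_zh.setObjectName("rbtn_zh")
        self.horizontalLayout.addWidget(self.rbtn_zh)
        self.rbtn_en = QtWidgets.QRadioButton(self.verticalLayoutWidget)
        self.rbtn_en.setObjectName("rbtn_en")
        self.horizontalLayout.addWidget(self.rbtn_en)
        self.verticalLayout.addLayout(self.horizontalLayout)
        self.horizontalLayout_5 = QtWidgets.QHBoxLayout()
        self.horizontalLayout_5.setObjectName("horizontalLayout_5")
        self.label_5 = QtWidgets.QLabel(self.verticalLayoutWidget)
        self.label_5.setMinimumSize(QtCore.QSize(60, 0))
        self.label_5.setText("")
        self.label_5.setObjectName("label_5")
        self.horizontalLayout_5.addWidget(self.label_5)
        self.btn_play = QtWidgets.QPushButton(self.verticalLayoutWidget)
        self.btn_play.setMinimumSize(QtCore.QSize(0, 30))
        self.btn_play.setObjectName("btn_play")
        self.horizontalLayout_5.addWidget(self.btn_play)
        self.verticalLayout.addLayout(self.horizontalLayout_5)

        self.retranslateUi(Form)
        QtCore.QMetaObject.connectSlotsByName(Form)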
 
    def retranslateUi(self, Form):
        _translate = QtCore.QCoreApplication.translate
        Form.setWindowTitle(_translate("Form", "Speech synthesizer"))
        self.label.setText(_translate("Form", "Broadcast text"))
        self.label_3.setText(_translate("Form", "Speed of speech"))
        self.label_rate.setText(_translate("Form", "0"))
        self.label_2.setText(_translate("Form", "Volume"))
        self.label_volumn.setText(_translate("Form", "0"))
        self.label_4.setText(_translate("Form", "Select Language"))
        self.rbtn_zh.setText(_translate("Form", "Chinese"))
        self.rbtn_en.setText(_translate("Form", "English"))
        self.btn_play.setText(_translate("Form", "Play"))
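
A form class like the one above is produced by running pyuic5 on the .ui file. As a minimal sketch, assuming the file names tts_form.ui and Ui_tts_form.py used in this article and that PyQt5 is installed for the current interpreter, the same step could be scripted through the PyQt5.uic.pyuic module. The helper names pyuic_command and generate_form_class are hypothetical.

```python
import subprocess
import sys


def pyuic_command(ui_path, py_path):
    """Build the command line that converts a .ui file into a Python module."""
    # Equivalent to running: pyuic5 <ui_path> -o <py_path>
    return [sys.executable, "-m", "PyQt5.uic.pyuic", ui_path, "-o", py_path]


def generate_form_class(ui_path="tts_form.ui", py_path="Ui_tts_form.py"):
    """Run the conversion; raises CalledProcessError if pyuic reports a failure."""
    subprocess.run(pyuic_command(ui_path, py_path), check=True)


if __name__ == "__main__":
    print(" ".join(pyuic_command("tts_form.ui", "Ui_tts_form.py")))
```

Because the generated file is overwritten on every run, any custom behaviour belongs in a subclass rather than in the generated module itself.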

If you copy this code directly, the window icon may be missing. Adjust the icon path to suit your setup and add the .ico file you want to use.

Functional code

Voice tool class

First, we initialize the speech engine object that performs the synthesis.

# TTS engine object
engine = pyttsx3.init()

We can modify the engine's speech properties through its setProperty method:

Property name	Description
rate	Integer speech rate in words per minute
volume	Volume as a float in the range [0.0, 1.0]
voice	String identifier of the voice to use
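
Since the volume slider in the UI runs from 0 to 100 while the volume property expects a value between 0.0 and 1.0, the slider value needs to be scaled before it reaches setProperty. A minimal sketch of that conversion follows; the function name percent_to_volume is hypothetical, and clamping out-of-range input is a design choice made here for illustration.

```python
def percent_to_volume(percent):
    """Map a 0-100 slider value onto the 0.0-1.0 range, clamping out-of-range input."""
    clamped = max(0, min(100, percent))
    return clamped / 100


for value in (0, 50, 100, 150):
    print(value, "->", percent_to_volume(value))
```

The result could then be passed as the second argument of setProperty('volume', ...).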

The code for the voice tool class is as follows; the comments explain what each part does:

import pyttsx3
 
 
class VoiceEngine():
    '''
    TTS voice tool class
    '''

    def __init__(self):
        '''
        Initialization
        '''
        # TTS engine object
        self.__engine = pyttsx3.init()
        # Speech rate in words per minute
        self.__rate = 150
        # Volume as a percentage (0-100)
        self.__volume = 100
        # Voice index: 0 for Chinese, 1 for English (depends on the voices installed on the system)
        self.__voice = 0
 
    @property
    def Rate(self):
        '''
        The speed of speech attribute
        '''
        return self.__rate
 
    @Rate.setter
    def Rate(self, value):
        self.__rate = value
 
    @property
    def Volume(self):
        '''
        Volume Properties
        '''
        return self.__volume
 
    @Volume.setter
    def Volume(self, value):
        self.__volume = value
 
    @property
    def VoiceID(self):
        '''
        Voice ID: 0 -- Chinese; 1 -- English
        '''
 
        return self.__voice
 
    @VoiceID.setter
    def VoiceID(self, value):
        self.__voice = value
 
    def Say(self, text):
        '''
        Play the given text as speech
        '''
        self.__engine.setProperty('rate', self.__rate)
        # pyttsx3 expects a volume in [0.0, 1.0], so scale the 0-100 percentage
        self.__engine.setProperty('volume', self.__volume / 100)

        # Get the list of available voices and select one by index
        voices = self.__engine.getProperty('voices')
        self.__engine.setProperty('voice', voices[self.__voice].id)

        # Save the speech to a file instead
        # self.__engine.save_to_file(text, 'test.mp3')

        self.__engine.say(text)
        self.__engine.runAndWait()
        self.__engine.stop()
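
The class above selects a voice by its position in the list, which only works if the installed voices happen to be in that order. As an alternative sketch, a voice could be looked up by a keyword in its name or identifier. This assumes the voice objects expose id and name attributes, as the entries returned by getProperty('voices') do; the helper find_voice_id and the sample voices below are hypothetical stand-ins so that the example runs without a speech engine.

```python
from types import SimpleNamespace


def find_voice_id(voices, keyword):
    """Return the id of the first voice whose id or name contains keyword, else None."""
    needle = keyword.lower()
    for voice in voices:
        haystack = f"{voice.id} {voice.name or ''}".lower()
        if needle in haystack:
            return voice.id
    return None


# Stand-in voice objects for illustration only
sample_voices = [
    SimpleNamespace(id="voice-zh", name="Example Chinese Voice"),
    SimpleNamespace(id="voice-en", name="Example English Voice"),
]

print(find_voice_id(sample_voices, "English"))
```

With a real engine, the returned id would be passed to setProperty('voice', ...) only when it is not None, falling back to the default voice otherwise.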

Form class

We can create a form class that inherits from the generated Ui_Form class, register callbacks for the sliders' value changes and for the radio button and play button clicks, and create an instance of the voice tool class so that those callbacks can perform the corresponding voice actions.

import sys
import _thread as th
from PyQt5.QtWidgets import QMainWindow, QApplication
from Ui_tts_form import Ui_Form


class MainWindow(QMainWindow, Ui_Form):
    '''
    Form class
    '''

    def __init__(self, parent=None):
        '''
        Initialize the form
        '''
        super(MainWindow, self).__init__(parent)
        self.setupUi(self)

        # Create an instance of the TTS tool class
        self.engine = VoiceEngine()
        self.__isPlaying = False

        # Set initial text
        self.tbx_text.setText('The moonlight in front of the bed is suspected to be like frost on the ground. \n Raise your head to look at the moon, lower your head to think of your hometown.')

        # Show each slider's value in its label
        self.slider_rate.valueChanged.connect(self.setRateTextValue)
        self.slider_volumn.valueChanged.connect(self.setVolumnTextValue)

        # Set the initial slider values
        self.slider_rate.setValue(self.engine.Rate)
        self.slider_volumn.setValue(self.engine.Volume)

        # Radio button selection events
        self.rbtn_zh.clicked.connect(self.onSelectVoice_zh)
        self.rbtn_en.clicked.connect(self.onSelectVoice_en)

        # Play button click event
        self.btn_play.clicked.connect(self.onPlayButtonClick)

    def setRateTextValue(self):
        '''
        Update the speech rate label and the engine's rate
        '''
        value = self.slider_rate.value()
        self.label_rate.setText(str(value))
        self.engine.Rate = value

    def setVolumnTextValue(self):
        '''
        Update the volume label and the engine's volume
        '''
        value = self.slider_volumn.value()
        self.label_volumn.setText(str(value / 100))
        self.engine.Volume = value

    def onSelectVoice_zh(self):
        '''
        Switch to the Chinese voice and its default playback text
        '''
        self.tbx_text.setText('The moonlight in front of the bed is suspected to be like frost on the ground. \n Raise your head to look at the moon, lower your head to think of your hometown.')
        self.engine.VoiceID = 0

    def onSelectVoice_en(self):
        '''
        Switch to the English voice and its default playback text
        '''
        self.tbx_text.setText('Hello World')
        self.engine.VoiceID = 1

    def playVoice(self):
        '''
        Play the text in the input box, unless playback is already running
        '''

        if self.__isPlaying is not True:
            self.__isPlaying = True
            text = self.tbx_text.toPlainText()
            self.engine.Say(text)
            self.__isPlaying = False

    def onPlayButtonClick(self):
        '''
        Play button click event
        Starts a new thread for playback so that the form does not freeze while the voice is playing
        '''
        th.start_new_thread(self.playVoice, ())
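
The pattern used here, running playback in a worker thread and ignoring further requests while one is in progress, can also be written with the higher-level threading module. The following is a sketch rather than the article's implementation: it swaps the boolean flag for a lock so that the check and the update happen in one step, and the class name PlaybackGuard and its speak parameter are hypothetical.

```python
import threading


class PlaybackGuard:
    """Run a blocking speak function in a worker thread, one call at a time."""

    def __init__(self, speak):
        self._speak = speak            # any callable that blocks until speech ends
        self._lock = threading.Lock()  # held for the duration of one playback

    def play_async(self, text):
        """Start playback and return the worker thread, or None if already playing."""
        if not self._lock.acquire(blocking=False):
            return None
        worker = threading.Thread(target=self._run, args=(text,), daemon=True)
        worker.start()
        return worker

    def _run(self, text):
        try:
            self._speak(text)
        finally:
            # Release even if speak raises, so later requests are not blocked
            self._lock.release()
```

In the form class, an instance could be created with the voice tool's Say method as the speak argument, and the play button's click handler would then call play_async with the text from the input box.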

Full Code

import sys
import _thread as th
from PyQt5.QtWidgets import QMainWindow, QApplication
from Ui_tts_form import Ui_Form
import pyttsx3


class VoiceEngine():
    '''
    TTS voice tool class
    '''

    def __init__(self):
        '''
        Initialization
        '''
        # TTS engine object
        self.__engine = pyttsx3.init()
        # Speech rate in words per minute
        self.__rate = 150
        # Volume as a percentage (0-100)
        self.__volume = 100
        # Voice index: 0 for Chinese, 1 for English (depends on the voices installed on the system)
        self.__voice = 0

    @property
    def Rate(self):
        '''
        Speech rate property
        '''
        return self.__rate

    @Rate.setter
    def Rate(self, value):
        self.__rate = value

    @property
    def Volume(self):
        '''
        Volume property
        '''
        return self.__volume

    @Volume.setter
    def Volume(self, value):
        self.__volume = value

    @property
    def VoiceID(self):
        '''
        Voice ID: 0 -- Chinese; 1 -- English
        '''

        return self.__voice

    @VoiceID.setter
    def VoiceID(self, value):
        self.__voice = value

    def Say(self, text):
        '''
        Play the given text as speech
        '''
        self.__engine.setProperty('rate', self.__rate)
        # pyttsx3 expects a volume in [0.0, 1.0], so scale the 0-100 percentage
        self.__engine.setProperty('volume', self.__volume / 100)

        # Get the list of available voices and select one by index
        voices = self.__engine.getProperty('voices')
        self.__engine.setProperty('voice', voices[self.__voice].id)

        # Save the speech to a file instead
        # self.__engine.save_to_file(text, 'test.mp3')

        self.__engine.say(text)
        self.__engine.runAndWait()
        self.__engine.stop()


class MainWindow(QMainWindow, Ui_Form):
    '''
    Form class
    '''

    def __init__(self, parent=None):
        '''
        Initialize the form
        '''
        super(MainWindow, self).__init__(parent)
        self.setupUi(self)

        # Create an instance of the TTS tool class
        self.engine = VoiceEngine()
        self.__isPlaying = False

        # Set initial text
        self.tbx_text.setText('The moonlight in front of the bed is suspected to be like frost on the ground. \n Raise your head to look at the moon, lower your head to think of your hometown.')

        # Show each slider's value in its label
        self.slider_rate.valueChanged.connect(self.setRateTextValue)
        self.slider_volumn.valueChanged.connect(self.setVolumnTextValue)

        # Set the initial slider values
        self.slider_rate.setValue(self.engine.Rate)
        self.slider_volumn.setValue(self.engine.Volume)

        # Radio button selection events
        self.rbtn_zh.clicked.connect(self.onSelectVoice_zh)
        self.rbtn_en.clicked.connect(self.onSelectVoice_en)

        # Play button click event
        self.btn_play.clicked.connect(self.onPlayButtonClick)

    def setRateTextValue(self):
        '''
        Update the speech rate label and the engine's rate
        '''
        value = self.slider_rate.value()
        self.label_rate.setText(str(value))
        self.engine.Rate = value

    def setVolumnTextValue(self):
        '''
        Update the volume label and the engine's volume
        '''
        value = self.slider_volumn.value()
        self.label_volumn.setText(str(value / 100))
        self.engine.Volume = value

    def onSelectVoice_zh(self):
        '''
        Switch to the Chinese voice and its default playback text
        '''
        self.tbx_text.setText('The moonlight in front of the bed is suspected to be like frost on the ground. \n Raise your head to look at the moon, lower your head to think of your hometown.')
        self.engine.VoiceID = 0

    def onSelectVoice_en(self):
        '''
        Switch to the English voice and its default playback text
        '''
        self.tbx_text.setText('Hello World')
        self.engine.VoiceID = 1

    def playVoice(self):
        '''
        Play the text in the input box, unless playback is already running
        '''

        if self.__isPlaying is not True:
            self.__isPlaying = True
            text = self.tbx_text.toPlainText()
            self.engine.Say(text)
            self.__isPlaying = False

    def onPlayButtonClick(self):
        '''
        Play button click event
        Starts a new thread for playback so that the form does not freeze while the voice is playing
        '''
        th.start_new_thread(self.playVoice, ())


if __name__ == "__main__":
    # Program entry point
    app = QApplication(sys.argv)
    form = MainWindow()
    form.show()
    sys.exit(app.exec_())

That concludes this article on the Python speech synthesis project (PyQt5 + pyttsx3).