edge-tts is a Python library that uses Microsoft Edge's online text-to-speech (TTS) service. It supports many languages and voices, produces high-quality, natural-sounding speech, and returns it as MP3 audio. It needs no API key: a few simple calls synthesize text on a desktop machine or a server, although the machine must be online because synthesis happens in the cloud. This makes it a good fit for voice assistants, educational applications, audio content production, and similar scenarios.
Installation and Environment Setup
First, make sure the edge-tts library is installed:
pip install edge-tts
Once installation is complete, you can start writing speech-synthesis code.
Text to Speech
This section implements a basic function: it takes text, synthesizes speech with the chosen voice, and saves the result as an .mp3 file. Running the example produces weather.mp3, which contains speech synthesized with a Chinese voice.
import asyncio

import edge_tts


def generate_audio(text: str, voice: str, output_file: str) -> None:
    """
    Synthesize text with the given voice and save it as an audio file.

    :param text: The text to synthesize
    :param voice: The voice to use, such as 'zh-CN-XiaoyiNeural'
    :param output_file: The output audio file name
    """

    async def generate_audio_async() -> None:
        """Generate the speech asynchronously."""
        communicate = edge_tts.Communicate(text, voice)
        await communicate.save(output_file)

    # Run the coroutine to completion from synchronous code
    asyncio.run(generate_audio_async())


# Sample call
generate_audio("The weather is good today, it's suitable for going out to play.", "zh-CN-XiaoyiNeural", "weather.mp3")
- generate_audio(): the main function. It takes the text, the voice, and the output file name as parameters.
- generate_audio_async(): an inner asynchronous function that performs the synthesis.
- asyncio.run(): runs the asynchronous code from the synchronous main function.
Finding Voices
This section shows how to find the voices that meet certain criteria and print them for the user, without taking any further action. Only matching voices are listed, with the name, gender, and language of each.
import asyncio
from typing import Optional

from edge_tts import VoicesManager


async def print_available_voices(language: str = "zh", gender: Optional[str] = None) -> None:
    """
    Asynchronously find and print the voices that meet certain criteria.

    :param language: Language code of the voice, such as "zh" for Chinese
        (VoicesManager matches the part of the locale before the first "-")
    :param gender: Optional; "Male" or "Female". Not filtered by default
    """
    # Fetch all available voices asynchronously
    voices = await VoicesManager.create()
    # Filter voices by language
    filtered_voices = voices.find(Language=language)
    if gender:
        filtered_voices = [voice for voice in filtered_voices if voice["Gender"] == gender]
    # Print the matching voices; ShortName is the value Communicate expects
    if filtered_voices:
        print("Voices that meet the criteria:")
        for voice in filtered_voices:
            print(f"Voice name: {voice['ShortName']}, gender: {voice['Gender']}, locale: {voice['Locale']}")
    else:
        print(f"No matching voice was found: language={language}, gender={gender}")


# Sample call
async def main() -> None:
    await print_available_voices(language="zh", gender="Female")


# Run the asynchronous example
if __name__ == "__main__":
    asyncio.run(main())
- print_available_voices(): an asynchronous function. It awaits VoicesManager.create() to fetch the voice list, calls voices.find() to filter by language, and then filters the result by gender if one was given.
Adjusting Voice Parameters
Besides choosing different voices, edge-tts lets you adjust the speaking rate, pitch, and volume during synthesis. The rate, pitch, and volume parameters of the Communicate class control these effects.
import edge_tts


def generate_audio_with_custom_params(
    text: str,
    output_file: str,
    rate: str = "+0%",
    pitch: str = "+0Hz",
    volume: str = "+0%",
) -> None:
    """
    Generate audio with custom voice parameters

    :param text: The text to synthesize
    :param output_file: The output audio file name
    :param rate: Speaking-rate adjustment (default "+0%", the standard rate)
    :param pitch: Pitch adjustment (default "+0Hz", the standard pitch)
    :param volume: Volume adjustment (default "+0%", the standard volume)
    """
    # Use Xiaoyi's neural Chinese voice
    voice = "zh-CN-XiaoyiNeural"
    # Create a Communicate object with the custom parameters
    communicate = edge_tts.Communicate(text, voice, rate=rate, pitch=pitch, volume=volume)
    # Synthesize and save the audio (synchronous counterpart of save())
    communicate.save_sync(output_file)
    print(f"Audio generated. Rate: {rate}, pitch: {pitch}, volume: {volume}.")


# Sample call (edge-tts returns MP3 data, so use an .mp3 extension)
generate_audio_with_custom_params(
    "Welcome to experience custom voice synthesis!",
    "custom_param_audio.mp3",
    rate="+50%",
    pitch="+10Hz",
    volume="-20%",
)
- rate (speaking rate): adjusts the speed of speech, as a percentage written with an explicit sign. The default value "+0%" means the standard speaking rate.
- pitch: adjusts the pitch, in Hz. The default value "+0Hz" means the standard pitch.
- volume: adjusts the volume, as a percentage written with an explicit sign. The default value "+0%" means the standard volume.
Generating Audio and Subtitles
Some applications need subtitles as well as audio, processed either synchronously or asynchronously. This section shows how to generate audio and subtitle files with edge-tts in both styles. Running the code writes each audio file together with its matching subtitle file.
import asyncio

import edge_tts


def process_audio_and_subtitles_sync(text: str, voice: str, output_file: str, srt_file: str) -> None:
    """
    Generate audio synchronously and build subtitles as the audio streams in

    :param text: The text to synthesize
    :param voice: The voice to use
    :param output_file: The output audio file name
    :param srt_file: The output subtitle file name
    """
    communicate = edge_tts.Communicate(text, voice)
    submaker = edge_tts.SubMaker()
    # Stream the audio synchronously, collecting subtitle timing along the way
    with open(output_file, "wb") as audio_file:
        for chunk in communicate.stream_sync():
            if chunk["type"] == "audio":
                audio_file.write(chunk["data"])  # Write audio data
            elif chunk["type"] in ("WordBoundary", "SentenceBoundary"):
                # Boundary events carry subtitle timing; depending on the
                # edge-tts version these may be word or sentence boundaries
                submaker.feed(chunk)
    # Save the subtitle file
    with open(srt_file, "w", encoding="utf-8") as subtitle_file:
        subtitle_file.write(submaker.get_srt())


async def process_audio_and_subtitles_async(text: str, voice: str, output_file: str, srt_file: str) -> None:
    """
    Generate audio asynchronously and build subtitles as the audio streams in

    :param text: The text to synthesize
    :param voice: The voice to use
    :param output_file: The output audio file name
    :param srt_file: The output subtitle file name
    """
    # Run the synchronous version in a worker thread so the event loop is not blocked
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, process_audio_and_subtitles_sync, text, voice, output_file, srt_file)


# Sample synchronous call
process_audio_and_subtitles_sync("Welcome to use Python for voice synthesis!", "zh-CN-XiaoyiNeural", "audio_sync.mp3", "audio_sync.srt")
# Sample asynchronous call
asyncio.run(process_audio_and_subtitles_async("This is an example of testing voice and subtitle generation.", "zh-CN-XiaoyiNeural", "audio_async.mp3", "audio_async.srt"))
- process_audio_and_subtitles_sync: generates the audio synchronously and builds subtitles (SRT format) as the data arrives.
- It calls communicate.stream_sync() to get the audio stream and handles each "audio" chunk and each boundary (timing) event.
- process_audio_and_subtitles_async: uses loop.run_in_executor to run the synchronous process_audio_and_subtitles_sync in a worker thread, so the blocking work does not stall the event loop.
Summary
This tutorial showed how to use the edge-tts library for text-to-speech conversion. The functions above covered:
- Basic text-to-speech
- Finding voices and adjusting voice parameters
- Streaming audio and generating subtitles