SoFunction
Updated on 2025-05-06

Convenient speech synthesis in Python with edge-tts

edge-tts is a powerful Python library that uses Microsoft Edge's online text-to-speech (TTS) service. It supports many languages and voice options and can generate high-quality, natural-sounding speech. It supports several audio formats, including MP3, WAV, and OGG, and is suitable for applications that convert text to speech locally or on a server. It can be used through simple API calls, making it a good fit for scenarios such as voice assistants, educational applications, and audio content production.

Installation and Environment Settings

First, make sure you have installed the edge-tts library:

pip install edge-tts

After the installation is complete, you can start building speech-synthesis features.

Text to speech

In this chapter, we show how to implement a basic function: pass in text, generate speech, and save it as an audio file. The function uses a fixed voice and saves the speech as an .mp3 file. Running it produces weather.mp3, an audio file containing the synthesized Chinese speech.

import asyncio
import edge_tts

def generate_audio(text: str, voice: str, output_file: str) -> None:
    """
     Pass in text, voice and output file names, generate voice and save as audio files
     :param text: Chinese text that needs to be synthesized
     :param voice: The type of voice used, such as 'zh-CN-XiaoyiNeural'
     :param output_file: The output audio file name
     """
    async def generate_audio_async() -> None:
        """Async-generated voice"""
        communicate = edge_tts.Communicate(text, voice)
        await (output_file)

    # Asynchronous execution to generate audio    (generate_audio_async())

# Sample callgenerate_audio("The weather is good today, it's suitable for going out to play.", "zh-CN-XiaoyiNeural", "weather.mp3")
  • generate_audio(): the main function; it receives the text, voice, and output file name as parameters.
  • The inner asynchronous function generate_audio_async() performs the speech synthesis.
  • asyncio.run() is used to run the asynchronous code.
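The wrapper pattern above (a synchronous function that drives an inner coroutine with asyncio.run()) can be sketched without any network call. The stand-in coroutine below is purely illustrative and does not contact the TTS service:

```python
import asyncio

def run_task(name: str) -> str:
    """Wrap an asynchronous task in a plain synchronous function,
    mirroring how generate_audio() wraps generate_audio_async()."""
    async def task_async() -> str:
        # Stand-in for the real work (e.g. awaiting communicate.save(...))
        await asyncio.sleep(0)
        return f"done: {name}"

    # asyncio.run() creates an event loop, runs the coroutine to
    # completion, and closes the loop
    return asyncio.run(task_async())

print(run_task("weather.mp3"))  # → done: weather.mp3
```

Note that asyncio.run() cannot be called from inside an already-running event loop; in that case, await the coroutine directly instead.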

Finding voices

In this chapter, we show how to find voices that meet certain criteria and print the matching list to the user, without taking any further action. The method only lists the qualifying voices, printing the name, gender, and language of each one.

import asyncio
import edge_tts
from edge_tts import VoicesManager

async def print_available_voices(language: str = "zh", gender: str = None) -> None:
    """
     Asynchronously find and print a list of voices that meet certain criteria.
     :param language: the language of the voices, e.g. "zh" for Chinese
     :param gender: Optional parameter, select the gender of the voice ("Male" or "Female"), not specified by default
     """
    # Get all available voices asynchronously
    voices = await VoicesManager.create()

    # Filter voices by language
    filtered_voices = voices.find(Language=language)
    if gender:
        filtered_voices = [voice for voice in filtered_voices if voice["Gender"] == gender]
    
    # Print the qualifying voices
    if filtered_voices:
        print("Voices that meet the criteria:")
        for voice in filtered_voices:
            print(f"Voice Name: {voice['Name']}, gender: {voice['Gender']}, language: {voice['Language']}")
    else:
        print(f"No matching voice was found:language={language}, gender={gender}")

# Sample call
async def main():
    await print_available_voices(language="zh", gender="Female")

# Run the asynchronous example
if __name__ == "__main__":
    asyncio.run(main())
  • print_available_voices(): this function is asynchronous; it awaits VoicesManager.create() to get the voice list, then uses find() to filter the voices by language, and finally filters by gender.
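The filtering step can be illustrated without calling the service. The sketch below applies the same language/gender filter to a small hand-written sample list shaped roughly like the dictionaries the voice listing returns; the sample entries and the helper function are illustrative, not part of edge-tts:

```python
def filter_voices(voices, language="zh", gender=None):
    """Filter a list of voice dicts by locale prefix and optional gender."""
    result = [v for v in voices if v["Locale"].startswith(language)]
    if gender:
        result = [v for v in result if v["Gender"] == gender]
    return result

# Hypothetical sample entries in the shape the voice listing returns
sample_voices = [
    {"Name": "zh-CN-XiaoyiNeural", "Gender": "Female", "Locale": "zh-CN"},
    {"Name": "zh-CN-YunxiNeural", "Gender": "Male", "Locale": "zh-CN"},
    {"Name": "en-US-AriaNeural", "Gender": "Female", "Locale": "en-US"},
]

for voice in filter_voices(sample_voices, language="zh", gender="Female"):
    print(f"Voice Name: {voice['Name']}, gender: {voice['Gender']}")
```

With the sample data above, only zh-CN-XiaoyiNeural survives both filters.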

Change voice parameters

In addition to choosing different voices, edge-tts also lets you adjust the volume, speaking rate, pitch, and other parameters during synthesis. Through the rate, pitch, and volume parameters of the Communicate class, you can dynamically control the generated speech.

import edge_tts

def generate_audio_with_custom_params(text: str, output_file: str, rate: str = "+0%", pitch: str = "+0Hz", volume: str = "+0%") -> None:
    """
     Generate audio with custom voice parameters
     :param text: Chinese text that needs to be synthesized
     :param output_file: The output audio file name
     :param rate: Speech speed adjustment (default is "+0%", indicating the standard speaking speed)
     :param pitch: Tone adjustment (default is "+0Hz", indicating a standard tone)
     :param volume: Volume adjustment (default is "+0%", indicating the standard volume)
     """
    # Select a Chinese voice; here we use the Xiaoyi neural voice
    voice = "zh-CN-XiaoyiNeural"

    # Create a voice object with edge_tts.Communicate, passing in the custom parameters
    communicate = edge_tts.Communicate(text, voice, rate=rate, pitch=pitch, volume=volume)

    # Save the generated audio file
    communicate.save_sync(output_file)
    print(f"Audio generated. Rate: {rate}, pitch: {pitch}, volume: {volume}.")

# Sample call
generate_audio_with_custom_params(
    "Welcome to experience custom voice synthesis!", 
    "custom_param_audio.wav", 
    rate="+50%", 
    pitch="+10Hz", 
    volume="-20%"
)
  • rate (speaking rate): controls the rate adjustment. The default is "+0%", the standard speaking rate.
  • pitch: controls the pitch adjustment, in Hz. The default is "+0Hz", the standard pitch.
  • volume: controls the volume adjustment, as a percentage. The default is "+0%", the standard volume.
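Because rate, pitch, and volume are passed as signed strings, it can be convenient to build them from numbers. The helpers below are a hypothetical convenience, not part of the edge-tts API:

```python
def fmt_percent(value: int) -> str:
    """Format a signed percentage string for rate/volume, e.g. 25 -> '+25%'."""
    return f"{value:+d}%"

def fmt_pitch(value: int) -> str:
    """Format a signed Hz offset string for pitch, e.g. -10 -> '-10Hz'."""
    return f"{value:+d}Hz"

print(fmt_percent(50), fmt_pitch(10), fmt_percent(-20))  # → +50% +10Hz -20%
```

These strings can then be passed straight to the rate, pitch, and volume parameters shown above.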

Generate audio and subtitles

In some applications you may need to generate audio and subtitles together, choosing a synchronous or asynchronous approach as needed. This chapter shows how to generate audio and subtitle files both synchronously and asynchronously with edge-tts. After execution, an audio file and the corresponding subtitle file are produced.

import asyncio
import edge_tts

def process_audio_and_subtitles_sync(text: str, voice: str, output_file: str, srt_file: str) -> None:
    """
     Generate audio synchronously and generate subtitles in real time
     :param text: Chinese text that needs to be synthesized
     :param voice: The type of voice used
     :param output_file: The output audio file name
     :param srt_file: The output subtitle file name
     """
    communicate = edge_tts.Communicate(text, voice)
    submaker = edge_tts.SubMaker()

    # Stream the audio synchronously and build subtitles in real time
    with open(output_file, "wb") as audio_file:
        for chunk in communicate.stream_sync():
            if chunk["type"] == "audio":
                audio_file.write(chunk["data"])  # Write audio data
            elif chunk["type"] == "WordBoundary":
                submaker.feed(chunk)  # Feed subtitle timing data

    # Save the subtitle file
    with open(srt_file, "w", encoding="utf-8") as subtitle_file:
        subtitle_file.write(submaker.get_srt())

async def process_audio_and_subtitles_async(text: str, voice: str, output_file: str, srt_file: str) -> None:
    """
     Generate audio asynchronously and generate subtitles in real time
     :param text: Chinese text that needs to be synthesized
     :param voice: The type of voice used
     :param output_file: The output audio file name
     :param srt_file: The output subtitle file name
     """
    # Run the synchronous version in an executor from asynchronous code
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, process_audio_and_subtitles_sync, text, voice, output_file, srt_file)

# Sample call (synchronous)
process_audio_and_subtitles_sync("Welcome to use Python for voice synthesis!", "zh-CN-XiaoyiNeural", "audio_sync.mp3", "audio_sync.srt")

# Asynchronous call
asyncio.run(process_audio_and_subtitles_async("This is an example of testing voice and subtitle generation.", "zh-CN-XiaoyiNeural", "audio_async.mp3", "audio_async.srt"))
  • process_audio_and_subtitles_sync: synchronously generates the audio data and builds the subtitles (SRT format) in real time.
  • It uses communicate.stream_sync() to get the audio data stream and handles each "audio" and "WordBoundary" chunk.
  • process_audio_and_subtitles_async: calls the synchronous version via loop.run_in_executor() so it can run from asynchronous code without blocking the event loop.
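To see roughly what happens to the WordBoundary chunks, here is a simplified, hypothetical sketch of turning them into SRT text. It assumes each chunk carries an offset and duration in 100-nanosecond ticks plus the spoken text; this mirrors the general idea, not the library's exact internals or output:

```python
def ticks_to_srt(ticks: int) -> str:
    """Convert 100-nanosecond ticks to an SRT timestamp HH:MM:SS,mmm."""
    ms = ticks // 10_000                # 10,000 ticks per millisecond
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def boundaries_to_srt(chunks) -> str:
    """Build a minimal SRT document from WordBoundary-style dicts."""
    entries = []
    for i, c in enumerate(chunks, start=1):
        start = ticks_to_srt(c["offset"])
        end = ticks_to_srt(c["offset"] + c["duration"])
        entries.append(f"{i}\n{start} --> {end}\n{c['text']}\n")
    return "\n".join(entries)

print(boundaries_to_srt([{"offset": 0, "duration": 5_000_000, "text": "hello"}]))
```

Each SRT entry is a sequence number, a start/end timestamp pair, and the text, separated by blank lines.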

Summary

Through this tutorial, you learned how to use the edge-tts library to implement text-to-speech conversion. You implemented the following features with different functions:

  • Basic text-to-speech conversion
  • Dynamically selecting a voice for synthesis
  • Adjusting rate, pitch, and volume
  • Generating audio streams and subtitles

This concludes the article on convenient speech synthesis with edge-tts in Python. For more on Python and edge-tts, please search for my previous articles or continue browsing the related articles below. I hope you will continue to support me!