SoFunction
Updated on 2024-11-15

A detailed guide to using pydub, a Python audio processing library

1. Installation

Install pydub with pip. You also need the ffmpeg dependency; installing ffmpeg with conda is recommended, because then you do not need to configure the environment yourself:

pip install pydub
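Before going further, it can help to confirm that a converter is actually reachable. The following is a minimal sketch using only the standard library; the helper name find_audio_backend is hypothetical, and it assumes the converter is found through the PATH environment variable. pydub can open and save WAV files in pure Python, but other formats such as mp3 need ffmpeg (or libav's avconv).

```python
import shutil


def find_audio_backend():
    """Return the path of the first converter found on PATH, or None."""
    for name in ("ffmpeg", "avconv"):
        path = shutil.which(name)
        if path:
            return path
    return None


backend = find_audio_backend()
if backend is None:
    print("No ffmpeg/avconv found on PATH; install ffmpeg before handling non-WAV formats.")
else:
    print("Using converter:", backend)
```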

2. Importing and reading audio files

from pydub import AudioSegment
audio = AudioSegment.from_file("path/to/file")

3. Audio playback

from pydub.playback import play
play(audio)

4. Audio duration

duration = audio.duration_seconds # in seconds
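A duration in seconds is awkward to read for long files. As a small sketch, the hypothetical helper below formats a length given in milliseconds, which is the unit pydub uses for len(audio) and for slicing. It is plain Python and does not depend on pydub.

```python
def format_duration(milliseconds):
    """Format a length in milliseconds as H:MM:SS or M:SS."""
    total_seconds = int(milliseconds // 1000)
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"


print(format_duration(185_000))    # 3 minutes 5 seconds
print(format_duration(3_723_000))  # 1 hour 2 minutes 3 seconds
```

With a loaded file, format_duration(len(audio)) would give the same information as audio.duration_seconds in a more readable form.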

5. Audio cutting

# Slice indices are in milliseconds

# The first 10 seconds
audio = audio[:10000]

# The last 10 seconds
audio = audio[-10000:]

# From the 10th second to the 20th second
audio = audio[10000:20000]

# From the 10th second to the end
audio = audio[10000:]

# From the start to the 10th second
audio = audio[:10000]

6. Audio merging

audio1 = AudioSegment.from_file("path/to/file1")
audio2 = AudioSegment.from_file("path/to/file2")
audio_combined = audio1 + audio2

7. Audio conversion

audio.export("path/to/new/file", format="mp3")

8. Volume adjustment

# 10 dB increase
louder_audio = audio + 10

# Decrease by 10 decibels
quieter_audio = audio - 10
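The numbers added or subtracted above are gain changes in decibels, not linear factors. For intuition, the sketch below applies the standard decibel relationship for amplitude (ratio = 10^(dB/20)); the function name is hypothetical and this is not taken from pydub's source.

```python
def db_to_amplitude_ratio(db):
    """Convert a gain in decibels to a linear amplitude ratio."""
    return 10 ** (db / 20)


for gain in (10, -10, 6, 20):
    print(f"{gain:+d} dB -> amplitude x {db_to_amplitude_ratio(gain):.3f}")
```

So a 10 dB increase scales the amplitude by roughly 3.16, and a 10 dB decrease scales it by roughly 0.316.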

9. Splitting audio equally

# Split into equal parts of roughly three minutes each
number = 1  # Fallback if no split count lands in the 2.8-3.3 minute range
for i in range(1, 1000):
    if 3.3 >= (audio.duration_seconds / (60 * i)) >= 2.8:
        number = i
        break
# Slicing with a step (in milliseconds) yields consecutive chunks of that length
chunks = audio[::int(audio.duration_seconds / number * 1000 + 1)]

# Save split audio
for i, chunk in enumerate(chunks):
    chunk.export("path/to/new/file{}.wav".format(i), format="wav")
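An alternative way to pick the chunk length is to round the total duration divided by the target length. This always yields a value, including for inputs where no split count lands in the 2.8 to 3.3 minute range (a 3.5-minute file, for example). The sketch below is plain Python; both helper names are hypothetical, and split_evenly assumes only that its argument is non-empty and supports len() and slicing.

```python
import math


def chunk_length_ms(total_ms, target_minutes=3.0):
    """Pick a chunk length (ms) that splits total_ms into near-equal parts close to target_minutes."""
    target_ms = target_minutes * 60 * 1000
    number = max(1, round(total_ms / target_ms))
    return math.ceil(total_ms / number)


def split_evenly(audio, chunk_ms):
    """Cut any sliceable object with len() into consecutive pieces of chunk_ms."""
    return [audio[i:i + chunk_ms] for i in range(0, len(audio), chunk_ms)]


print(chunk_length_ms(600_000))   # 10-minute input
print(chunk_length_ms(210_000))   # 3.5-minute input
print([len(piece) for piece in split_evenly(list(range(10)), 4)])
```

Since an AudioSegment reports its length in milliseconds and can be sliced by milliseconds, split_evenly(audio, chunk_length_ms(len(audio))) should apply to it as well.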

10. Complete code

Below is a complete example that trims the beginning and end of the audio, then splits it into segments of a suitable length and saves them.

from pydub import AudioSegment

# Reading audio files
audio = AudioSegment.from_file("path/to/file")

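# Print the audio duration in minutes
print('Audio duration (minutes):', audio.duration_seconds / 60)

# Trim the beginning and the end
start = int(input('Seconds to cut from the start (enter 0 for none): ')) * 1000
end = int(input('Seconds to cut from the end (enter 0 for none): ')) * 1000
# Use an explicit end index: audio[start:-end] would be empty when end is 0
audio = audio[start:len(audio) - end]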

# Calculate the proper split length
number = 1  # Fallback if no split count lands in the 2.8-3.3 minute range
for i in range(1, 1000):
    if 3.3 >= (audio.duration_seconds / (60 * i)) >= 2.8:
        number = i
        break
chunks = audio[::int(audio.duration_seconds / number * 1000 + 1)]

# Save split audio
for i, chunk in enumerate(chunks):
    print('Duration after splitting (minutes):', chunk.duration_seconds / 60)
    chunk.export("path/to/new/file{}.wav".format(i), format="wav")

These are the main points of pydub and a complete example. With pydub, we can easily process and convert audio, making our audio processing more efficient and convenient.
In addition, some other use cases for pydub are listed below.

Application Cases

1. Convert audio files to a specified format

from pydub import AudioSegment

# Reading audio files
audio = AudioSegment.from_file("path/to/file")

# Convert to mp3 and save
audio.export("path/to/new/file.mp3", format="mp3")

2. Merge multiple audio files into a single file

from pydub import AudioSegment

# Reading audio files
audio1 = AudioSegment.from_file("path/to/file1")
audio2 = AudioSegment.from_file("path/to/file2")

# Merge audio files and save
combined_audio = audio1 + audio2
combined_audio.export("path/to/new/file", format="wav")

3. Ringtone production

from pydub import AudioSegment

# Reading audio files
audio = AudioSegment.from_file("path/to/file")

# Cut and save
start = 10000
end = 15000
ringtone = audio[start:end]
ringtone.export("path/to/new/file", format="mp3")

4. Adjusting audio volume

from pydub import AudioSegment

# Reading audio files
audio = AudioSegment.from_file("path/to/file")

# 10 dB increase
louder_audio = audio + 10

# Decrease by 10 decibels
quieter_audio = audio - 10

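# Save the adjusted audio (use a different path for each file so the second export does not overwrite the first)
louder_audio.export("path/to/new/louder_file", format="wav")
quieter_audio.export("path/to/new/quieter_file", format="wav")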

Case: Splitting a recording into songs by detecting silence

from pydub import AudioSegment
from pydub.silence import split_on_silence

# Reading audio files
audio = AudioSegment.from_file("audio.mp3", format="mp3")

# Setting the segmentation parameters
min_silence_len = 700  # Minimum silence length, in milliseconds
silence_thresh = -10  # Silence threshold in dBFS (audio quieter than this counts as silence); the loop below tries a range of values
keep_silence = 600  # Amount of silence to keep at the edges of each segment, in milliseconds

# Estimate the number of songs, assuming roughly three minutes per song
num_segments = int(audio.duration_seconds / 60 / 3)

# Split the audio file, trying thresholds from -10 to -1 dBFS
for i in range(-10, 0):
    segments = split_on_silence(audio, min_silence_len=min_silence_len, silence_thresh=i, keep_silence=keep_silence)
    if len(segments) <= num_segments:
        print(f"Split succeeded: {len(segments)} segments in total")
        break
    else:
        print(f"Threshold {i} produced {len(segments)} segments, still trying.")

First, we use the AudioSegment.from_file() method to read the audio file and set the segmentation parameters min_silence_len, silence_thresh and keep_silence, which are the minimum silence length, the silence threshold and the length of silence to keep, respectively. The silence threshold has a large effect on the result. In the range of thresholds tried here, a smaller threshold tends to produce more segments, at the risk of splitting inside a song; a larger threshold tends to produce fewer segments, at the risk of missing the boundary between two songs.

Then we calculate num_segments, the number of segments we expect the audio file to be split into. Here we assume that each song is about three minutes long and divide the total duration by that length.

Finally, we use the split_on_silence() function to split the audio file, adjusting the silence threshold in a loop until the number of segments is as expected. If the split succeeds, the loop exits; otherwise, it tries the next threshold.
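The search loop can also be written as a small function that reports failure explicitly when no threshold works. The sketch below is an illustration under stated assumptions: find_threshold and fake_split are hypothetical names, and fake_split is a stand-in that only pretends a higher threshold yields fewer segments, so the example runs without any audio file.

```python
def find_threshold(split_fn, thresholds, max_segments):
    """Try each threshold in order; return (threshold, segments) for the first
    one that yields at most max_segments segments, or (None, []) if none does."""
    for threshold in thresholds:
        segments = split_fn(threshold)
        if len(segments) <= max_segments:
            return threshold, segments
    return None, []


# Stand-in splitter for demonstration only: pretends a higher threshold gives fewer segments.
def fake_split(threshold):
    return ["segment"] * (-threshold)


threshold, segments = find_threshold(fake_split, range(-10, 0), max_segments=4)
print(threshold, len(segments))
```

With real audio, split_fn could be lambda t: split_on_silence(audio, min_silence_len=700, silence_thresh=t, keep_silence=600), and a returned threshold of None signals that none of the tried values met the target.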

All in all, pydub is a useful audio processing library that makes processing, converting and merging audio straightforward. It also covers a range of practical scenarios, such as making ringtones and adjusting volume. Keep in mind that audio format compatibility needs attention when using pydub.

In addition, audio can be encoded, decoded, mixed and resampled with pydub. Here are some examples of common operations.

Codecs, mixing, resampling

1. Codecs

from pydub import AudioSegment

# Reading audio files
audio = AudioSegment.from_file("path/to/file")

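# "Encode": convert to 16 kHz, 2 bytes (16 bits) per sample, mono
encoded_audio = audio.set_frame_rate(16000).set_sample_width(2).set_channels(1)

# "Decode": convert to 44.1 kHz, 4 bytes (32 bits) per sample, stereo
decoded_audio = encoded_audio.set_frame_rate(44100).set_sample_width(4).set_channels(2)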
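The three settings above (frame rate, sample width in bytes, channel count) determine how much raw PCM data one second of audio occupies. The following sketch does that arithmetic; the function name is hypothetical and it assumes uncompressed PCM, so it says nothing about the size of an exported mp3.

```python
def pcm_size_bytes(frame_rate, sample_width, channels, seconds):
    """Size of uncompressed PCM audio: frames/s x bytes/sample x channels x seconds."""
    return int(frame_rate * sample_width * channels * seconds)


small = pcm_size_bytes(16000, 2, 1, 60)   # 16 kHz, 16-bit, mono, one minute
large = pcm_size_bytes(44100, 4, 2, 60)   # 44.1 kHz, 32-bit, stereo, one minute
print(small, large, round(large / small, 3))
```

One minute at the first setting is about 1.9 MB of raw samples, against about 21 MB at the second, which is worth keeping in mind for long recordings held in memory.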

2. Mixing

from pydub import AudioSegment

# Reading audio files
audio1 = AudioSegment.from_file("path/to/file1")
audio2 = AudioSegment.from_file("path/to/file2")

# Mixing: overlay audio2 on top of audio1
mixed_audio = audio1.overlay(audio2)

# Save the audio after mixing
mixed_audio.export("path/to/new/file", format="wav")

3. Resampling

from pydub import AudioSegment

# Reading audio files
audio = AudioSegment.from_file("path/to/file")

# Resampling to 44100 Hz
resampled_audio = audio.set_frame_rate(44100)

# Save resampled audio
resampled_audio.export("path/to/new/file", format="wav")

With pydub, we can easily perform encoding and decoding, mixing, resampling and other operations, which further expands its application scenarios. When mixing, it is good practice to make sure the two audio files have the same sampling rate, sample width and number of channels.
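To see why matching formats matters for mixing, it helps to look at what mixing means at the sample level. The sketch below is a conceptual illustration only, not pydub's implementation. It assumes both inputs are lists of 16-bit integer samples at the same sampling rate and channel count, adds them position by position, and clips the sum to the 16-bit range. The name mix_samples is hypothetical, and keeping the length of the first input is a choice made for this sketch.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix_samples(base, other):
    """Add `other` onto `base` sample by sample, clipping to the 16-bit range.
    The result keeps the length of `base`."""
    mixed = []
    for index, sample in enumerate(base):
        extra = other[index] if index < len(other) else 0
        mixed.append(max(INT16_MIN, min(INT16_MAX, sample + extra)))
    return mixed


print(mix_samples([1000, 30000, -30000], [2000, 30000, -30000]))
print(mix_samples([1, 2, 3], [10]))
```

If the two inputs had different sampling rates, position N in each list would correspond to a different moment in time, so the sum would no longer line up.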

Finally, here is a summary of the pros and cons of pydub.

Pros:

Lightweight: pydub is a lightweight audio processing library that is easy to install and simple to use.

Feature-rich: pydub provides rich audio processing functions, including cutting, merging, converting, adjusting volume, codecs, mixing, resampling and so on.

Wide range of applications: pydub has a wide range of application scenarios, including audio processing, ringtone production, audio format conversion, speech recognition and so on.

Drawbacks:

Limited format compatibility: pydub does not support every audio format, so some audio must be converted to a supported format before processing.

Mediocre performance: pydub can be slow when dealing with large files and may need considerable time and computing resources.

No streaming support: pydub reads the whole audio file into memory, which leads to a large memory footprint for big files.

In summary, pydub is a feature-rich, widely used audio processing library. When using pydub, you need to pay attention to audio format compatibility issues, and pay attention to performance and memory usage when processing large files. If you need to handle more complex audio tasks, you can consider using other more specialized audio processing libraries.

Summary

This post describes how to split an audio file into multiple small segments using Python's pydub library. We first read the audio file, then set the segmentation parameters and calculate the expected number of segments. Finally, a loop keeps adjusting the silence threshold until the number of segments is as expected. This approach can be used to prepare audio for processing and analysis, such as speech recognition or music recommendation.

Note that the quality of the segmentation depends on the segmentation parameters, which need to be adjusted for each recording. The resulting segments may still contain wrong splits or missed boundaries, so they should be checked and corrected afterwards.
