SoFunction
Updated on 2025-04-26

Filling null values with the median in pandas: an implementation example

Handling missing data is a common and important step in data analysis and machine learning. Missing data can degrade model performance, so null values need to be dealt with appropriately. This article explains how to fill null values in a pandas DataFrame with the median.

What is median filling?

Median filling is a simple and effective way to fill missing values in a dataset. The median is the middle value of the data once it is sorted. Unlike the mean, the median is barely affected by extreme values, so median filling is usually more robust than mean filling when outliers are present.
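
To make the difference concrete, here is a small sketch comparing the two statistics on made-up numbers. The values (including the outlier 1000) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical values; 1000 is a deliberate outlier.
values = pd.Series([10, 12, 11, 13, 1000])

mean_value = values.mean()      # pulled far upward by the outlier
median_value = values.median()  # stays at the middle of the sorted data

print("mean:", mean_value)
print("median:", median_value)
```

With these numbers the mean is 209.2 while the median is 12.0, so a gap filled with the mean would land far from the typical values.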

Why choose median filling?

Robustness: The median is hardly affected by outliers and reflects the central tendency of the data more accurately.
Simple: Both implementation and understanding are simple.
Universality: Applicable to filling most numerical data.
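
Because the median is only defined for numerical data, a DataFrame that mixes numeric and text columns needs a little care. The sketch below assumes a hypothetical DataFrame with one numeric and one text column; `numeric_only=True` restricts the median to numeric columns, and `fillna()` then fills only the columns that appear in the resulting Series.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type data: one numeric column, one text column.
mixed = pd.DataFrame({
    "score": [1.0, np.nan, 3.0, 10.0],
    "label": ["a", None, "c", "d"],
})

# Compute medians for numeric columns only; "label" is skipped.
numeric_medians = mixed.median(numeric_only=True)

# Only columns present in numeric_medians are filled.
mixed_filled = mixed.fillna(numeric_medians)
print(mixed_filled)
```

The missing entry in the text column is left as it is; it would need a different strategy, such as filling with the most frequent value.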

Sample data

First, we create an example DataFrame with some null values.

import pandas as pd
import numpy as np

# Create sample data; np.nan marks the missing values
data = {
    'A': [1, 2, np.nan, 4, 5],
    'B': [np.nan, 2, 3, 4, np.nan],
    'C': [1, np.nan, np.nan, 4, 5]
}
df = pd.DataFrame(data)

print("Raw Data:")
print(df)

Fill in null values with the median

Next, we use the fillna() method provided by pandas to fill in the null values. First calculate the median of each column, then use these medians to fill the null values of the corresponding columns.

# Calculate the median of each column (NaN values are skipped by default)
median_values = df.median()

# Fill each column's null values with that column's median
df_filled = df.fillna(median_values)

print("\nFilled with median:")
print(df_filled)
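
If only one column should be filled, the same idea can be applied to a single Series. This is a minimal sketch that rebuilds a two-column version of the sample data so that it runs on its own; the choice of column 'A' is arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [np.nan, 2, 3, 4, np.nan],
})

# Fill only column 'A' with its own median; column 'B' keeps its null values.
df['A'] = df['A'].fillna(df['A'].median())
print(df)
```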

Results Analysis

After the fill operation, each null value in the DataFrame has been replaced by the median of its column: 3.0 for columns A and B, and 4.0 for column C.
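
One way to confirm the result is to count the null values per column before and after filling. The sketch below rebuilds the sample data so that it is self-contained. It also shows that `fillna()` returns a new DataFrame and leaves the original unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [np.nan, 2, 3, 4, np.nan],
    'C': [1, np.nan, np.nan, 4, 5],
})
df_filled = df.fillna(df.median())

# Count null values per column before and after filling.
nulls_before = df.isna().sum()
nulls_after = df_filled.isna().sum()

print("Null values before filling:")
print(nulls_before)
print("\nNull values after filling:")
print(nulls_after)
```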

Complete code

Here is a complete code example, from creating the data to filling the null values with medians:

import pandas as pd
import numpy as np

# Create sample data; np.nan marks the missing values
data = {
    'A': [1, 2, np.nan, 4, 5],
    'B': [np.nan, 2, 3, 4, np.nan],
    'C': [1, np.nan, np.nan, 4, 5]
}
df = pd.DataFrame(data)

print("Raw Data:")
print(df)

# Calculate the median of each column (NaN values are skipped by default)
median_values = df.median()

# Fill each column's null values with that column's median
df_filled = df.fillna(median_values)

print("\nFilled with median:")
print(df_filled)

Output

Raw Data:
     A    B    C
0  1.0  NaN  1.0
1  2.0  2.0  NaN
2  NaN  3.0  NaN
3  4.0  4.0  4.0
4  5.0  NaN  5.0

Filled with median:
     A    B    C
0  1.0  3.0  1.0
1  2.0  2.0  4.0
2  3.0  3.0  4.0
3  4.0  4.0  4.0
4  5.0  3.0  5.0

Summary

Median filling is a simple and effective way to deal with missing data, and it is more robust than mean filling when the data contains outliers. In practice, the appropriate filling method depends on the characteristics of the data and the specific requirements. Hopefully this article helps you better understand and use median filling in pandas.
