SoFunction
Updated on 2024-11-16

Common descriptive statistics methods for Python DataFrames, explained in detail

Common descriptive statistics methods for DataFrame

sum() sum

Use the sum() method to sum the values of a DataFrame object.
In the example, **pd.set_option('display.unicode.east_asian_width', True)** aligns the displayed DataFrame values with the column names.
sum() has an axis parameter that defaults to 0, which sums down each column and returns one total per column.

  • Set axis to 1 to sum across each row instead.
  • The skipna parameter can also be set. It defaults to True, which skips missing values. With False, missing values are included, so any row or column that contains a missing value gets NaN as its result.
  • skipna expects a Boolean value. Other types such as 1 or 0 have been interpreted by their truth value, but recent pandas versions may reject them, so passing True or False explicitly is safer.
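The effect of axis and skipna can be checked on a small table. The following is a minimal sketch; the frame `demo` and its column names are made up for illustration and are not part of the article's data.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2 table with one missing value
demo = pd.DataFrame({'a': [1.0, 2.0], 'b': [10.0, np.nan]})

col_totals = demo.sum()                      # axis=0: one total per column
row_totals = demo.sum(axis=1)                # axis=1: one total per row, NaN skipped
row_strict = demo.sum(axis=1, skipna=False)  # NaN propagates into the row total

print(col_totals)
print(row_totals)
print(row_strict)
```

With skipna=False, only the second row becomes NaN, because it is the only row that contains a missing value.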

In the example below, each row is summed and the totals are added to the data as a new column.

import pandas as pd
data = [[110, 105, 99], [105, 88, 115], [109, 120, 130]]
index = [1, 2, 3]
columns = ['Languages', 'Math', 'English']
# Align displayed values with the column names
pd.set_option('display.unicode.east_asian_width', True)
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df)
print("================================")
# Add a column holding the sum of each row
df['Overall performance'] = df.sum(axis=1, skipna=True)
print(df)

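Running the program prints the original table, followed by the same table with an added 'Overall performance' column holding each row's total: 314, 308, and 359.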

mean() average

Here each column of the generated data is averaged and added as a new row to the original data.

As the example shows, when the original data contains a null value, mean() leaves that value out of both the numerator and the denominator. In other words, mean() returns the mean of the non-null data.

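import pandas as pd
# The last row has no English score, so pandas fills that cell with NaN
data = [[110, 105, 99], [105, 88, 115], [109, 120, 130], [112, 115]]
index = [1, 2, 3, 4]
columns = ['Languages', 'Math', 'English']
pd.set_option('display.unicode.east_asian_width', True)
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df)
print("================================")
# Mean of each column (NaN values are skipped)
new = df.mean()
# Add one row of data (average of language, math and English, ignoring indexes)
# DataFrame.append() was removed in pandas 2.0; on newer versions use
# pd.concat([df, new.to_frame().T], ignore_index=True) instead.
df = df.append(new, ignore_index=True)
print(df)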
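The handling of null values can be verified on a smaller case. This sketch uses a hypothetical Series that mirrors the English column above (three scores and one missing value); the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Three scores plus one missing value
scores = pd.Series([99, 115, 130, np.nan])

mean_default = scores.mean()             # NaN skipped: (99 + 115 + 130) / 3
mean_strict = scores.mean(skipna=False)  # NaN kept: the result is NaN

print(mean_default)
print(mean_strict)
```

The default result divides by 3, not 4, which confirms that the missing entry is excluded from the count as well as from the sum.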


About DataFrame's append() method

To add a row to a DataFrame, you can use the append() method. Set ignore_index=True to ignore the existing index.

When the appended object is a Series, ignore_index must be set to True unless the Series has a name attribute. When appending multiple rows, setting ignore_index to True avoids duplicate index values. Note that DataFrame's append() method was deprecated in pandas 1.4 and removed in pandas 2.0; use pd.concat() instead.
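On pandas versions where append() is not available, the same row can be added with pd.concat(). A minimal sketch, using a made-up two-column table rather than the article's data:

```python
import pandas as pd

df = pd.DataFrame({'Math': [105, 88, 120], 'English': [99, 115, 130]})

# Column means as a Series, turned into a one-row DataFrame
new_row = df.mean().to_frame().T

# Concatenate along rows; ignore_index=True renumbers the index from 0
result = pd.concat([df, new_row], ignore_index=True)
print(result)
```

Converting the Series with to_frame().T is one way to make it a row; without that step, concat would treat the Series as a column of values instead.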

max() max value & min() min value

import pandas as pd
data = [[110, 105, 99], [105, 88, 115], [109, 120, 130]]
index = [1, 2, 3]
columns = ['Languages', 'Math', 'English']
pd.set_option('display.unicode.east_asian_width', True)
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df)
print("================================")
# Maximum of each column
df_max = df.max()
print(df_max)
print("================================")
# Minimum of each column
df_min = df.min()
print(df_min)


median()

import pandas as pd
data = [[110, 120, 110], [130, 130, 131], [115, 120, 130]]
columns = ['Languages', 'Math', 'English']
df = pd.DataFrame(data=data, columns=columns)
print(df)
print("================================")
# Median of each column
print(df.median())


mode() Mode

import pandas as pd
data = [[110, 120, 110], [130, 130, 130], [130, 120, 130]]
columns = ['Languages', 'Math', 'English']
df = pd.DataFrame(data=data, columns=columns)
print(df)
# Mode of the scores in each of the three subjects
print(df.mode())
# Mode of each row
print(df.mode(axis=1))
# Mode of "Math"
print(df['Math'].mode())


var() variance

import pandas as pd
data = [[110, 113, 102, 105, 108], [118, 98, 119, 85, 118]]
index = ['Blackie', 'White']
columns = ['Physics 1', 'Physics 2', 'Physics 3', 'Physics 4', 'Physics 5']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df)
print("========================================")
# Variance of each student's five scores (one value per row)
print(df.var(axis=1))
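By default, var() computes the sample variance, dividing by n - 1 (ddof=1); passing ddof=0 gives the population variance. The sketch below recomputes the first row of the example by hand. The variable names are illustrative only.

```python
import pandas as pd

# Blackie's five scores from the example above
row = pd.Series([110, 113, 102, 105, 108])

n = len(row)
mean = row.sum() / n
squared_dev = ((row - mean) ** 2).sum()

sample_var = squared_dev / (n - 1)   # matches var() with its default ddof=1
population_var = squared_dev / n     # matches var(ddof=0)

print(sample_var, row.var())
print(population_var, row.var(ddof=0))
```

Both students in the example have the same mean (107.6), so the variance is what distinguishes how consistent their scores are.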


std() standard deviation

import pandas as pd
data = [[110, 120, 110], [130, 130, 130], [130, 120, 130]]
columns = ['Languages', 'Math', 'English']
df = pd.DataFrame(data=data, columns=columns)
print(df)
print("=============================")
# Standard deviation of each column
print(df.std())


quantile() quantile

Take the 35% quantile as an example.

import pandas as pd
# Create DataFrame data (math scores)
data = [120, 89, 98, 78, 65, 102, 112, 56, 79, 45]
columns = ['Math']
df = pd.DataFrame(data=data, columns=columns)
print(df)
print("============================")
# Calculate the 35% quantile
x = df['Math'].quantile(0.35)
# Print the students at or below the 35% quantile (those eliminated)
print(df[df['Math'] <= x])
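By default, quantile() interpolates linearly between the two nearest sorted values. The sketch below reproduces the 35% quantile of the example data by hand; the helper variables are illustrative and not part of pandas.

```python
import math
import pandas as pd

scores = pd.Series([120, 89, 98, 78, 65, 102, 112, 56, 79, 45])
q = 0.35

ordered = sorted(scores)
pos = q * (len(ordered) - 1)   # fractional position in the sorted data, about 3.15
lower = math.floor(pos)        # index of the sorted value just below that position
frac = pos - lower             # how far to move toward the next sorted value

# Linear interpolation between the two neighbouring sorted values (78 and 79)
manual = ordered[lower] + frac * (ordered[lower + 1] - ordered[lower])

print(manual)
print(scores.quantile(q))
```

The threshold falls between 78 and 79, so the filter in the example keeps the four scores 78, 65, 56, and 45.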


The quantile() method can also be used with other data types, such as Timestamp and Timedelta.

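import pandas as pd
pd.set_option('display.unicode.east_asian_width', True)
df = pd.DataFrame({'A': [1, 2],
                   'B': [pd.Timestamp('2019'),
                         pd.Timestamp('2020')],
                   'C': [pd.Timedelta('1 days'),
                         pd.Timedelta('2 days')]})
print(df)
print("==============================")
# numeric_only=False includes the datetime and timedelta columns
print(df.quantile(0.5, numeric_only=False))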

