Common descriptive statistics methods for pandas DataFrame
sum() sum
Use the sum() method to sum the values of a DataFrame.
In the example, **pd.set_option('display.unicode.east_asian_width', True)** aligns the displayed DataFrame values with the column names when they contain East Asian (wide) characters.
sum() has an axis parameter that defaults to 0, which sums each column.
- Set axis to 1 to sum each row.
- The skipna parameter can also be set. It defaults to True, which skips missing values when summing. With False, missing values are included, so any row or column that contains one gets NaN as its result.
- skipna expects a Boolean. Some pandas versions also interpret other values (such as 1 or 0) as Booleans, but passing True or False explicitly is safer.
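The effect of axis and skipna can be seen in a minimal sketch. The two-row table below is made up for illustration (it is not the article's data), with one English score deliberately missing.

```python
import numpy as np
import pandas as pd

# Hypothetical scores; the second student has no English score
df = pd.DataFrame(
    {"Math": [105, 88], "English": [99.0, np.nan]},
    index=[1, 2],
)

col_totals = df.sum()                      # axis=0 (default): one total per column
row_totals = df.sum(axis=1)                # axis=1: one total per row, NaN skipped
row_strict = df.sum(axis=1, skipna=False)  # NaN propagates into the row total

print(col_totals)
print(row_totals)
print(row_strict)
```

With skipna=False only the second row's total becomes NaN, because that is the only row containing a missing value.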
Here each row of the example data is summed, and the totals are added to the data as a new column.
import pandas as pd

data = [[110, 105, 99], [105, 88, 115], [109, 120, 130]]
index = [1, 2, 3]
columns = ['Languages', 'Math', 'English']
# Align values with column names that contain wide characters
pd.set_option('display.unicode.east_asian_width', True)
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df)
print("================================")
# Add a column holding each row's total
df['Overall performance'] = df.sum(axis=1, skipna=True)
print(df)
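Running the program prints the original table, then the same table with the new 'Overall performance' column, whose values are 314, 308, and 359.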
mean() average
Here each column of the example data is averaged, and the averages are added to the original data as a new row.
The fourth row of the example has only two scores, so its English value is missing (NaN). When the data contains a missing value, mean() leaves it out of both the numerator and the denominator. In other words, mean() returns the mean of the non-null values.
import pandas as pd

data = [[110, 105, 99], [105, 88, 115], [109, 120, 130], [112, 115]]
index = [1, 2, 3, 4]
columns = ['Languages', 'Math', 'English']
pd.set_option('display.unicode.east_asian_width', True)
# The short fourth row leaves 'English' as NaN
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df)
print("================================")
# Column averages (Languages, Math, English)
new = df.mean()
# Add the averages as one new row, ignoring the existing index.
# On pandas < 2.0 this was df.append(new, ignore_index=True).
df = pd.concat([df, new.to_frame().T], ignore_index=True)
print(df)
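To check the claim that missing values drop out of both the numerator and the denominator, the sketch below recomputes the English average by hand. It reuses the English scores from the example; the variable names are only illustrative.

```python
import pandas as pd

# English scores from the example; the fourth student has none
english = pd.Series([99, 115, 130, None], index=[1, 2, 3, 4], name="English")

skipped = english.mean()                   # NaN excluded from sum and count
by_hand = english.sum() / english.count()  # 344 / 3, count() ignores NaN
strict = english.mean(skipna=False)        # NaN propagates into the result

print(skipped, by_hand, strict)
```

The first two values agree, while skipna=False returns NaN as soon as one value is missing.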
About DataFrame's append() method
In pandas versions before 2.0, a row can be added to a DataFrame with the append() method. Setting ignore_index=True discards the existing index labels and renumbers the rows.
When the appended object is a Series, ignore_index must be set to True unless the Series has a name attribute. When appending multiple rows, setting ignore_index to True also avoids duplicate index values. DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; use pandas.concat() instead.
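As a rough sketch of how the two append() cases described above map onto pandas.concat(), the example below adds a row of column averages twice: once with the rows renumbered, and once with the Series name used as the row label. The two-row table and the label "Average" are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[110, 105, 99], [105, 88, 115]],
    index=[1, 2],
    columns=["Languages", "Math", "English"],
)

# Row to add, as a Series indexed by column name
new = df.mean()

# Counterpart of df.append(new, ignore_index=True): rows renumbered 0..n-1
renumbered = pd.concat([df, new.to_frame().T], ignore_index=True)

# Counterpart of appending a named Series: the name becomes the row label
new.name = "Average"
labelled = pd.concat([df, new.to_frame().T])

print(renumbered)
print(labelled)
```

Converting the Series with to_frame().T turns it into a one-row DataFrame, so concat() stacks it under the existing rows instead of treating it as a column.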
max() maximum and min() minimum
import pandas as pd

data = [[110, 105, 99], [105, 88, 115], [109, 120, 130]]
index = [1, 2, 3]
columns = ['Languages', 'Math', 'English']
pd.set_option('display.unicode.east_asian_width', True)
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df)
print("================================")
# Highest score in each subject
df_max = df.max()
print(df_max)
print("================================")
# Lowest score in each subject
df_min = df.min()
print(df_min)
median() median
import pandas as pd

data = [[110, 120, 110], [130, 130, 131], [115, 120, 130]]
columns = ['Languages', 'Math', 'English']
df = pd.DataFrame(data=data, columns=columns)
print(df)
print("================================")
# Median score in each subject
print(df.median())
mode() mode
import pandas as pd

data = [[110, 120, 110], [130, 130, 130], [130, 120, 130]]
columns = ['Languages', 'Math', 'English']
df = pd.DataFrame(data=data, columns=columns)
print(df)
# Mode of each of the three subjects
print(df.mode())
# Mode of each row
print(df.mode(axis=1))
# Mode of the 'Math' column
print(df['Math'].mode())
var() variance
import pandas as pd

data = [[110, 113, 102, 105, 108], [118, 98, 119, 85, 118]]
index = ['Blackie', 'White']
columns = ['Physics 1', 'Physics 2', 'Physics 3', 'Physics 4', 'Physics 5']
df = pd.DataFrame(data=data, index=index, columns=columns)
print(df)
print("========================================")
# Variance of each student's five scores (row-wise)
print(df.var(axis=1))
std() standard deviation
import pandas as pd

data = [[110, 120, 110], [130, 130, 130], [130, 120, 130]]
columns = ['Languages', 'Math', 'English']
df = pd.DataFrame(data=data, columns=columns)
print(df)
print("=============================")
# Standard deviation of each subject
print(df.std())
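Both var() and std() default to ddof=1 in pandas, meaning they compute the sample variance and sample standard deviation (dividing by n - 1 rather than n). The sketch below illustrates this with Blackie's five physics scores from the variance example; the hand calculation is included only as a cross-check.

```python
import pandas as pd

# Blackie's five physics scores from the variance example
scores = pd.Series([110, 113, 102, 105, 108])

sample_var = scores.var()            # ddof=1 (default): divide by n - 1
population_var = scores.var(ddof=0)  # divide by n instead
sample_std = scores.std()            # square root of the sample variance

# The same sample variance written out by hand
by_hand = ((scores - scores.mean()) ** 2).sum() / (len(scores) - 1)

print(sample_var, population_var, sample_std, by_hand)
```

Pass ddof=0 when the data is the whole population rather than a sample.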
quantile() quantile
Take the 35% quantile (the 35th percentile) as an example: students whose score is at or below it are eliminated.
import pandas as pd

# Create DataFrame data (math scores)
data = [120, 89, 98, 78, 65, 102, 112, 56, 79, 45]
columns = ['Math']
df = pd.DataFrame(data=data, columns=columns)
print(df)
print("============================")
# Calculate the 35% quantile
x = df['Math'].quantile(0.35)
# Output the eliminated students (scores at or below the cutoff)
print(df[df['Math'] <= x])
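By default, quantile() uses linear interpolation between the two nearest sorted values. The sketch below reproduces the 35% cutoff for the math scores by hand under that default; the helper variable names are illustrative only.

```python
import pandas as pd

scores = pd.Series([120, 89, 98, 78, 65, 102, 112, 56, 79, 45], name="Math")

q = 0.35
ordered = scores.sort_values().reset_index(drop=True)

# Fractional position of the quantile among the n sorted values (0-based)
pos = q * (len(ordered) - 1)  # 0.35 * 9, about 3.15
lower = int(pos)              # position 3, the fourth-smallest score (78)
frac = pos - lower            # about 0.15

# Interpolate between the two neighbouring sorted values (78 and 79)
by_hand = ordered.iloc[lower] + frac * (ordered.iloc[lower + 1] - ordered.iloc[lower])

print(by_hand, scores.quantile(q))  # both approximately 78.15
```

With a cutoff of about 78.15, the four scores at or below it (78, 65, 56, and 45) are the ones selected by the filter in the example.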
The quantile() method can also be used with other data types, such as Timestamp and Timedelta.
import pandas as pd

pd.set_option('display.unicode.east_asian_width', True)
df = pd.DataFrame({'A': [1, 2],
                   'B': [pd.Timestamp('2019'), pd.Timestamp('2020')],
                   'C': [pd.Timedelta('1 days'), pd.Timedelta('2 days')]})
print(df)
print("==============================")
# numeric_only=False includes the datetime and timedelta columns
print(df.quantile(0.5, numeric_only=False))