Pandas import
Pandas is a third-party library for Python that provides high-performance, easy-to-use data types and analysis tools. It is built on NumPy and is often used together with NumPy and Matplotlib. It provides two main data types: Series and DataFrame.
import pandas as pd
Pandas vs numpy
The Series type of Pandas
A Series consists of a set of data and an index associated with that data.
Creation of the Series type in Pandas
The Series type can be created from the following types:
Python list: the index has the same length as the list
Scalar value: the index determines the size of the Series
Python dictionary: the keys become the index, and passing index selects entries from the dictionary
ndarray: both the index and the data can be created from an ndarray
Other functions, such as range()
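The creation routes listed above can be sketched as follows. This is a minimal illustration only; the variable names and sample values are made up for the example.

```python
import numpy as np
import pandas as pd

# From a Python list: a default integer index 0..n-1 is generated.
s_list = pd.Series([9, 8, 7, 6])

# From a list with an explicit index of the same length.
s_labeled = pd.Series([9, 8, 7, 6], index=["a", "b", "c", "d"])

# From a scalar: the index decides how many times the value is repeated.
s_scalar = pd.Series(25, index=["a", "b", "c"])

# From a dictionary: keys become the index; passing index selects and
# orders entries, and a key that is absent ("d") yields NaN.
s_dict = pd.Series({"a": 9, "b": 8, "c": 7}, index=["c", "a", "d"])

# From an ndarray, with an index that is also built from an ndarray.
s_array = pd.Series(np.arange(5), index=np.arange(9, 4, -1))

# From range().
s_range = pd.Series(range(3))
```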
Basic operations on the Series type of Pandas
The Series type contains both index and values:
.index: gets the index
.values: gets the data
A Series supports operations similar to those of both an ndarray and a dictionary.
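As a rough sketch of that dual behaviour, the example below uses a made-up Series and shows the index and values attributes, a few ndarray-style operations, and a few dictionary-style operations.

```python
import pandas as pd

s = pd.Series([9, 8, 7, 6], index=["a", "b", "c", "d"])

index = s.index      # the labels
values = s.values    # the data as a NumPy array

# ndarray-like behaviour
first_two = s.iloc[:2]    # positional slicing
doubled = s * 2           # element-wise arithmetic
above_seven = s[s > 7]    # Boolean mask

# dictionary-like behaviour
has_c = "c" in s              # membership is tested against the index
value_b = s["b"]              # lookup by label
fallback = s.get("z", 100)    # default when the label is missing
```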
DataFrame type for pandas
The DataFrame type consists of a set of columns that share the same index.
DataFrame is a tabular data type in which each column can hold a different value type.
DataFrame has both row and column indexes.
DataFrames are often used to express two-dimensional data, but can express multidimensional data.
DataFrame is a two-dimensional "labeled" array.
Basic DataFrame operations are similar to those of Series and are based on the row and column indexes.
DataFrame type creation for pandas
The DataFrame type can be created from the following types:
Two-dimensional ndarray object
Dictionary of one-dimensional ndarrays, lists, dictionaries, tuples, or Series
Another DataFrame
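A short sketch of these creation routes, using invented column names and values:

```python
import numpy as np
import pandas as pd

# From a two-dimensional ndarray: default integer row and column labels.
df_array = pd.DataFrame(np.arange(10).reshape(2, 5))

# From a dictionary of lists: keys become column names.
df_lists = pd.DataFrame(
    {"one": [1, 2, 3, 4], "two": [9, 8, 7, 6]},
    index=["a", "b", "c", "d"],
)

# From a dictionary of Series: the indexes are aligned, and positions
# missing from a Series ("d" in column "one") become NaN.
df_series = pd.DataFrame({
    "one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
    "two": pd.Series([9, 8, 7, 6], index=["a", "b", "c", "d"]),
})

# From another DataFrame.
df_copy = pd.DataFrame(df_lists)
```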
Basic operations on Pandas' DataFrame type
pandas index operations
pandas reindexing
reindex() can change or reorder Series and DataFrame indexes.
Arguments for reindex(index=None, columns=None, ...)
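A minimal sketch of reindex() on a made-up DataFrame follows. It uses the index and columns arguments named above, plus fill_value, which is a standard reindex() parameter not mentioned in the text.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=["a", "b", "c"])

# Reorder the rows.
reordered = df.reindex(index=["c", "a", "b"])

# Reorder the columns.
swapped = df.reindex(columns=["y", "x"])

# Asking for a label that does not exist adds a row of NaN ...
extended = df.reindex(index=["a", "b", "c", "d"])

# ... unless a fill value is supplied.
filled = df.reindex(index=["a", "b", "c", "d"], fill_value=0)
```

reindex() returns a new object, so df itself keeps its original order.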
pandas delete index
drop() deletes the specified row or column labels from a Series or DataFrame and, by default, returns a new object.
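For illustration, drop() might be used like this on small, invented objects:

```python
import pandas as pd

s = pd.Series([9, 8, 7, 6], index=["a", "b", "c", "d"])
s_dropped = s.drop(["b", "c"])        # remove two labels from a Series

df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=["a", "b", "c"])
rows_dropped = df.drop("b")           # axis=0 (rows) is the default
cols_dropped = df.drop("y", axis=1)   # axis=1 removes a column
```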
pandas data operations
Arithmetic operations align on the row and column indexes: the operands are padded to a common shape first, and then the operation is performed. The result is usually floating-point, because the NaN used for padding is a float.
Entries that are missing during alignment are filled with NaN (null).
Operations between two-dimensional and one-dimensional objects, and between one-dimensional and zero-dimensional objects, are broadcast.
Binary operations using the + - * / operators produce a new object.
Arithmetic operations
Operations between objects of different dimensions are broadcast. A one-dimensional Series takes part along axis 1 by default, which means its labels are matched against the column labels. Use the method forms, such as add() or sub(), with axis=0 to match the Series against the row index instead.
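The alignment and broadcasting rules can be sketched with two small DataFrames of different shapes. The data here is invented purely for the example.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(12).reshape(3, 4))
b = pd.DataFrame(np.arange(20).reshape(4, 5))

# Aligned on both indexes: the result is 4x5, with NaN wherever
# a has no matching entry.
total = a + b

# The method form accepts fill_value to substitute for missing entries.
filled = a.add(b, fill_value=0)

# 2-D with 1-D: by default the Series labels are matched against the
# column labels, so the Series is subtracted from every row.
row = pd.Series(np.arange(4))
by_columns = a - row

# With axis=0 the Series labels are matched against the row index,
# so the Series is subtracted from every column.
col = pd.Series(np.arange(3))
by_rows = a.sub(col, axis=0)
```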
Pandas Data Analysis
pandas import and export data
Import data
pd.read_csv(filename): import data from CSV file
pd.read_table(filename): Import data from a delimited text file.
pd.read_excel(filename): import data from Excel file
pd.read_sql(query, connection_object): import data from a SQL table or database
pd.read_json(json_string): import data from JSON format string
pd.read_html(url): parses a URL, string or HTML file and extracts the tables it contains.
pd.read_clipboard(): get the content from your clipboard and pass it to read_table()
pd.DataFrame(dict): import data from a dictionary object, where each key is a column name and each value is the column's data
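A small sketch of three of these import routes follows. To keep it self-contained, in-memory text buffers stand in for real files, and the sample data is made up.

```python
import io

import pandas as pd

# read_csv accepts a path or any file-like object.
csv_text = "name,score\nAda,90\nLin,85\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# read_table uses a tab as the default delimiter.
tsv_text = "name\tscore\nAda\t90\nLin\t85\n"
df_tsv = pd.read_table(io.StringIO(tsv_text))

# A dictionary maps column names to column data.
df_dict = pd.DataFrame({"name": ["Ada", "Lin"], "score": [90, 85]})
```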
Export data
df.to_csv(filename): export data to CSV file
df.to_excel(filename): export data to Excel file
df.to_sql(table_name, connection_object): export data to SQL table
df.to_json(filename): export data to a text file in JSON format
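As a hedged illustration of exporting, the sketch below relies on the fact that to_csv() and to_json() return a string when no file name is given, which avoids writing to disk. The data is invented.

```python
import io
import json

import pandas as pd

df = pd.DataFrame({"name": ["Ada", "Lin"], "score": [90, 85]})

# Without a path, to_csv() returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)

# Likewise for to_json(); orient="records" gives one object per row.
json_text = df.to_json(orient="records")
records = json.loads(json_text)

# Round trip: the exported CSV can be read back with read_csv().
restored = pd.read_csv(io.StringIO(csv_text))
```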
Viewing and examining data with Pandas
df.head(n): view the first n rows of the DataFrame object
df.tail(n): view the last n rows of the DataFrame object
df.shape: view the number of rows and columns
df.info(): view index, data type and memory information
df.describe(): view summary statistics for numeric columns
s.value_counts(dropna=False): view unique values and counts for Series objects
df.apply(pd.Series.value_counts): view unique values and counts for each column in the DataFrame object
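The inspection calls above can be tried on a small, made-up DataFrame. This is only a sketch; the column names are assumptions for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ada", "Lin", "Ada", "Kim"],
    "score": [90.0, 85.0, None, 70.0],
})

top = df.head(2)          # first 2 rows
bottom = df.tail(1)       # last row
shape = df.shape          # (rows, columns); an attribute, not a method
df.info()                 # prints index, dtype and memory information
summary = df.describe()   # statistics for the numeric column only

# Unique values and their counts for one Series.
counts = df["name"].value_counts(dropna=False)

# Value counts for every column of an all-numeric DataFrame.
votes = pd.DataFrame({"q1": [1, 2, 2], "q2": [2, 2, 3]})
per_column = votes.apply(pd.Series.value_counts)
```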
Pandas data selection
df[col]: select a column by name and return it as a Series
df[[col1, col2]]: return multiple columns as a DataFrame
s.iloc[0]: select data by position
s.loc['index_one']: select data by index label
df.iloc[0,:]: return the first row
df.iloc[0,0]: return the first element of the first column
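A brief sketch of label-based and position-based selection, on a DataFrame invented for the example:

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 3], "y": [4, 5, 6]},
    index=["index_one", "index_two", "index_three"],
)

column = df["x"]           # one column, returned as a Series
subset = df[["x", "y"]]    # several columns, returned as a DataFrame

s = df["y"]
by_position = s.iloc[0]          # first element by position
by_label = s.loc["index_one"]    # same element by index label

first_row = df.iloc[0, :]    # first row as a Series
first_cell = df.iloc[0, 0]   # first row, first column
```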
pandas data cleanup
df.columns = ['a','b','c']: rename columns
df.isnull(): checks for null values in the DataFrame object and returns a Boolean DataFrame
df.notnull(): checks for non-null values in the DataFrame object and returns a Boolean DataFrame
df.dropna(): remove all rows containing null values
df.dropna(axis=1): remove all columns containing null values
df.dropna(thresh=n): remove all rows with fewer than n non-null values
df.fillna(x): replace all null values in the DataFrame object with x
s.astype(float): change the data type of the Series to float
s.replace(1,'one'): replace all values equal to 1 with 'one'
s.replace([1,3],['one','three']): replace 1 with 'one' and 3 with 'three'
df.rename(columns=lambda x: x + 1): rename columns in bulk (the labels must support + 1, for example integer labels)
df.rename(columns={'old_name': 'new_name'}): selectively rename columns
df.set_index('column_one'): use column_one as the index
df.rename(index=lambda x: x + 1): rename index labels in bulk
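The cleanup steps above might look like this on a small DataFrame with deliberately placed gaps. The data and names are invented for the sketch.

```python
import pandas as pd

df = pd.DataFrame({
    0: [1.0, None, 3.0],
    1: [None, None, 6.0],
    2: [7.0, 8.0, 9.0],
})
df.columns = ["a", "b", "c"]            # rename all columns at once

null_mask = df.isnull()                 # True where a value is missing
rows_kept = df.dropna()                 # only rows with no nulls
cols_kept = df.dropna(axis=1)           # only columns with no nulls
thresh_rows = df.dropna(thresh=2)       # rows with at least 2 non-null values
filled = df.fillna(0)                   # nulls replaced by 0

s = pd.Series([1, 2, 3])
as_float = s.astype(float)
replaced = s.replace([1, 3], ["one", "three"])

renamed = df.rename(columns={"a": "alpha"})    # rename selected columns
shifted = df.rename(index=lambda x: x + 1)     # integer row labels + 1
indexed = df.set_index("c")                    # column "c" becomes the index
```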
Pandas Data Processing
df[df[col] > 0.5]: selects rows where the value of the col column is greater than 0.5
df.sort_values(col1): sort data by column col1, default ascending order
df.sort_values(col2, ascending=False): sort data by column col2 in descending order
df.sort_values([col1,col2], ascending=[True,False]): sort data by column col1 in ascending order, then by col2 in descending order
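Filtering and sorting can be sketched as follows, using a made-up DataFrame whose columns are named col1 and col2 to match the list above.

```python
import pandas as pd

df = pd.DataFrame({"col1": [0.2, 0.9, 0.6, 0.6], "col2": [1, 2, 3, 4]})

# Boolean filter: keep rows where col1 exceeds 0.5.
filtered = df[df["col1"] > 0.5]

ascending = df.sort_values("col1")                     # ascending by default
descending = df.sort_values("col2", ascending=False)

# col1 ascending; ties in col1 are ordered by col2 descending.
mixed = df.sort_values(["col1", "col2"], ascending=[True, False])
```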
df.groupby(col): return a GroupBy object grouped by column col
df.groupby([col1,col2]): return a GroupBy object grouped by multiple columns
df.groupby(col1)[col2].mean(): return the mean value of column col2 after grouping by column col1
df.pivot_table(index=col1, values=[col2,col3], aggfunc='max'): create a pivot table grouped by column col1 and calculate the maximum values of col2 and col3
df.groupby(col1).agg('mean'): return the average of all columns after grouping by column col1
df.apply(func): apply the function func to each column in the DataFrame
df.apply(func, axis=1): apply the function func to each row in the DataFrame
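A rough sketch of grouping, pivoting and apply() follows. The column names (team, year, points, fouls) and the lambda functions are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "year": [1, 2, 1, 2],
    "points": [10, 20, 30, 50],
    "fouls": [1, 3, 2, 4],
})

grouped = df.groupby("team")                         # a GroupBy object
mean_points = df.groupby("team")["points"].mean()    # mean of one column
team_means = df.groupby("team").agg("mean")          # mean of every column

# Maximum of two columns per team.
pivot = df.pivot_table(index="team", values=["points", "fouls"], aggfunc="max")

numeric = df[["points", "fouls"]]
# Applied to each column: the range of that column.
col_range = numeric.apply(lambda col: col.max() - col.min())
# Applied to each row (axis=1): the sum of the two values in that row.
row_total = numeric.apply(lambda row: row["points"] + row["fouls"], axis=1)
```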
Pandas Data Merge
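pd.concat([df1, df2]): add the rows from df2 to the end of df1 (the older df1.append(df2) was removed in pandas 2.0)
pd.concat([df1, df2],axis=1): add the columns from df2 to the end of df1
df1.join(df2,on=col1,how='inner'): perform an SQL-style inner join, matching column col1 of df1 against the index of df2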
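The three merge operations can be sketched on small, invented DataFrames. Note that join() with on= matches a column of the left DataFrame against the index of the right one.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["k1", "k2", "k3"], "x": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["k4"], "x": [4]})

# Stack rows; ignore_index renumbers the result 0..n-1.
stacked = pd.concat([df1, df2], ignore_index=True)

# Place columns side by side, aligned on the row index.
extra = pd.DataFrame({"y": [10, 20, 30]})
side_by_side = pd.concat([df1, extra], axis=1)

# Inner join: df1["key"] is matched against the index of lookup,
# so only k1 and k3 survive.
lookup = pd.DataFrame({"z": [100, 300]}, index=["k1", "k3"])
joined = df1.join(lookup, on="key", how="inner")
```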
Pandas statistics
df.describe(): view summary statistics for numeric columns
df.mean(): return the average value of each column
df.corr(): return the correlation coefficients between columns
df.count(): return the number of non-null values in each column
df.max(): return the maximum value of each column
df.min(): return the minimum value of each column
df.median(): return the median of each column
df.std(): return the standard deviation of each column
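A minimal sketch of these statistics on a made-up numeric DataFrame with one missing value, which the methods skip by default:

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, None],
})

summary = df.describe()
means = df.mean()        # NaN values are skipped by default
counts = df.count()      # non-null values per column
maxima = df.max()
minima = df.min()
medians = df.median()
stds = df.std()          # sample standard deviation (ddof=1)
corr = df.corr()         # pairwise correlation between columns
```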