Preparation (importing libraries, importing data)

    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    import seaborn as sns

    sns.set_style("darkgrid")

    list_csv = ['Amazon_top_selling_book.csv', 'breast_cancer_wisconsin.csv', '', '',
                'netflix_titles.csv', '', '', '']
    dic_path = r'C:\Users\pandas\Desktop\task\228datasets\datasets'
    # Load the Netflix titles dataset (entry 4 in the list)
    part_data = pd.read_csv(dic_path + '\\' + list_csv[4])
    part_data
| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | NaN | United States | September 25, 2021 | 2020 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm... |
| 1 | s2 | TV Show | Blood & Water | NaN | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t... |
| 2 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |
| 3 | s4 | TV Show | Jailbirds New Orleans | NaN | NaN | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Docuseries, Reality TV | Feuds, flirtations and toilet talk go down amo... |
| 4 | s5 | TV Show | Kota Factory | NaN | Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... | India | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, Romantic TV Shows, TV ... | In a city of coaching centers known to train I... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
8807 rows × 12 columns
Checking the data for missing values
Hint: This function is used to detect missing values in any DataFrame.
    def missing_values_table(df):
        # Number of missing values per column
        mis_val = df.isnull().sum()
        # Share of missing values per column, in percent
        mis_val_percent = 100 * df.isnull().sum() / len(df)
        # Combine both results into one table
        mis_val_table = pd.concat([mis_val, mis_val_percent], axis=1)
        mis_val_table_ren_columns = mis_val_table.rename(
            columns={0: 'Missing Values', 1: '% of Total Values'})
        # Keep only columns with missing values, sorted by percentage
        mis_val_table_ren_columns = mis_val_table_ren_columns[
            mis_val_table_ren_columns.iloc[:, 1] != 0].sort_values(
            '% of Total Values', ascending=False).round(1)
        print("Your selected dataframe has " + str(df.shape[1]) + " columns.\n"
              "There are " + str(mis_val_table_ren_columns.shape[0]) +
              " columns that have missing values.")
        return mis_val_table_ren_columns

    missing_values_table(part_data)
Your selected dataframe has 12 columns.
There are 6 columns that have missing values.
| | Missing Values | % of Total Values |
|---|---|---|
| director | 2634 | 29.9 |
| country | 831 | 9.4 |
| cast | 825 | 9.4 |
| date_added | 10 | 0.1 |
| rating | 4 | 0.0 |
| duration | 3 | 0.0 |
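For readers who do not have the Netflix CSV at hand, the same kind of summary can be sketched on a small hand-made frame. The `toy` DataFrame and its values are invented for illustration, and `isna().mean()` is used here as a compact way to get the fraction of missing values per column; this is an alternative sketch, not the function from the article.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from the article's dataset.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["X", np.nan, np.nan, "Y"],
    "country": ["US", "IN", np.nan, "FR"],
})

# Count and percentage of missing values per column.
summary = pd.DataFrame({
    "Missing Values": toy.isna().sum(),
    "% of Total Values": (100 * toy.isna().mean()).round(1),
})

# Keep only columns that actually have gaps, largest share first.
summary = summary[summary["Missing Values"] > 0].sort_values(
    "% of Total Values", ascending=False
)
print(summary)
```

Here `director` (2 of 4 values missing) is listed before `country` (1 of 4), and `title` is left out because it has no gaps.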
DataFrame.drop(labels=None, axis=0, index=None, columns=None, inplace=False)
Parameter Description:
- labels: the row or column labels to delete, given as a single label or a list.
- axis: defaults to 0, which deletes rows; specify axis=1 to delete columns.
- index: directly specifies the rows to delete.
- columns: directly specifies the columns to delete.
- inplace=False: the default; the original data is left unchanged and a new DataFrame with the deletion applied is returned.
- inplace=True: the deletion is performed directly on the original data and the call returns None, so the deleted data cannot be recovered from that object.
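The parameters above can be tried out on a small frame. This is a minimal sketch on invented data (the `toy` frame, its labels, and the variable names are hypothetical), meant only to show how `labels`/`axis`, `index`/`columns`, and `inplace` behave.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
toy = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]},
    index=["r1", "r2", "r3"],
)

# labels + axis: axis=0 drops rows, axis=1 drops columns.
no_r1 = toy.drop(["r1"], axis=0)
no_b = toy.drop(["b"], axis=1)

# index= and columns= name the axis directly, so axis is not needed.
trimmed = toy.drop(index=["r2"], columns=["c"])

# inplace=False (the default) leaves the original untouched.
print(toy.shape)

# inplace=True modifies the object itself and returns None.
work = toy.copy()
result = work.drop(columns=["a"], inplace=True)
print(result)
print(list(work.columns))
```

Because `inplace=True` returns `None`, writing `df = df.drop(..., inplace=True)` would replace the DataFrame with `None`; assigning the result of the default call is usually the safer habit.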
Method 1: delete specified rows or columns
labels+axis
    # Drop the 'director' column and re-check the missing values
    demo = part_data.drop(['director'], axis=1)
    missing_values_table(demo)
Your selected dataframe has 11 columns.
There are 5 columns that have missing values.
| | Missing Values | % of Total Values |
|---|---|---|
| country | 831 | 9.4 |
| cast | 825 | 9.4 |
| date_added | 10 | 0.1 |
| rating | 4 | 0.0 |
| duration | 3 | 0.0 |
Method 2: use a boolean condition to delete the rows that satisfy it
    df = df.drop(df[<boolean condition>].index)
    # Delete rows whose release_year is before 2009
    demo = part_data.drop(part_data[part_data["release_year"] < 2009].index)
    demo.shape
(7624, 12)
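The condition-based deletion can be checked on a small frame as well. The sketch below uses invented data (the `toy` frame and its years are hypothetical) and compares dropping by index with plain boolean indexing, which keeps the rows where the condition is false.

```python
import pandas as pd

# Hypothetical toy data; the years are invented for illustration.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "release_year": [2005, 2009, 2015, 1999],
})

# Boolean mask: True for the rows we want to delete.
too_old = toy["release_year"] < 2009

# Option 1: look up the index labels of the matching rows and drop them.
dropped = toy.drop(toy[too_old].index)

# Option 2: keep the rows where the mask is False.
kept = toy[~too_old]

print(dropped.shape)
print(dropped.equals(kept))
```

With a unique index the two options give the same result. If the index contains duplicate labels, `drop` removes every row that shares a matching label, so boolean indexing may be the more predictable choice in that case.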