SoFunction
Updated on 2024-11-16

Deleting rows and columns with pandas during data cleaning: a practical example

Preparation (importing libraries and data)

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
sns.set_style("darkgrid")
list_csv = ['Amazon_top_selling_book.csv','breast_cancer_wisconsin.csv','','','netflix_titles.csv','',
           '','']
dic_path = r'C:\Users\pandas\Desktop\task\228datasets\datasets'
part_data = pd.read_csv(dic_path+'\\'+list_csv[4])
part_data
  | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description
0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | NaN | United States | September 25, 2021 | 2020 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm...
1 | s2 | TV Show | Blood & Water | NaN | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t...
2 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor...
3 | s4 | TV Show | Jailbirds New Orleans | NaN | NaN | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Docuseries, Reality TV | Feuds, flirtations and toilet talk go down amo...
4 | s5 | TV Show | Kota Factory | NaN | Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... | India | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, Romantic TV Shows, TV ... | In a city of coaching centers known to train I...
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...

8807 rows × 12 columns

Checking the state of the data

Hint: This function is used to detect missing values in any DataFrame.

def missing_values_table(df):
    # Number of missing values in each column
    mis_val = df.isnull().sum()
    # Percentage of rows that are missing in each column
    mis_val_percent = 100 * df.isnull().sum() / len(df)
    # Combine counts and percentages into one table
    mis_val_table = pd.concat([mis_val, mis_val_percent], axis=1)
    mis_val_table_ren_columns = mis_val_table.rename(
        columns={0: 'Missing Values', 1: '% of Total Values'})
    # Keep only columns with missing values, sorted by percentage (descending)
    mis_val_table_ren_columns = mis_val_table_ren_columns[
        mis_val_table_ren_columns.iloc[:, 1] != 0].sort_values(
        '% of Total Values', ascending=False).round(1)
    print("Your selected dataframe has " + str(df.shape[1]) + " columns.\n"
          "There are " + str(mis_val_table_ren_columns.shape[0]) +
          " columns that have missing values.")
    return mis_val_table_ren_columns

missing_values_table(part_data)

Your selected dataframe has 12 columns.
There are 6 columns that have missing values.

            Missing Values  % of Total Values
director              2634               29.9
country                831                9.4
cast                   825                9.4
date_added              10                0.1
rating                   4                0.0
duration                 3                0.0
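To make clear what the function computes, here is a minimal sketch on a small made-up DataFrame. The `toy` frame and its values are illustrative assumptions, not part of the Netflix data. `isnull()` marks each missing cell, and summing per column gives the counts reported in the table.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["X", np.nan, np.nan, "Y"],
    "country": ["US", "IN", np.nan, "FR"],
})

# Count missing cells per column, then express them as a share of all rows
missing_count = toy.isnull().sum()
missing_percent = 100 * missing_count / len(toy)

# Put both series side by side, as the function above does
summary = pd.concat([missing_count, missing_percent], axis=1)
summary.columns = ["Missing Values", "% of Total Values"]
print(summary)
```

Here `director` has 2 missing values out of 4 rows (50%), `country` has 1 (25%), and `title` has none.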

DataFrame.drop(labels=None, axis=0, index=None, columns=None, inplace=False)

Parameter Description:

  • labels: the row or column labels to delete, given as a single label or a list.
  • axis: defaults to 0, which deletes rows; specify axis=1 to delete columns.
  • index: directly specifies the rows to delete (equivalent to labels with axis=0).
  • columns: directly specifies the columns to delete (equivalent to labels with axis=1).
  • inplace=False (the default): the original data is not changed; a new DataFrame is returned with the deletion applied.
  • inplace=True: the deletion is performed directly on the original data, and the call returns None.
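The difference between the two `inplace` settings can be checked on a small made-up DataFrame. The `toy` frame below is an illustrative assumption; the sketch also exercises the `index` and `columns` keywords.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]},
    index=["r0", "r1", "r2"],
)

# inplace=False (default): a new DataFrame is returned, toy is unchanged
without_b = toy.drop(columns=["b"])
without_r0 = toy.drop(index=["r0"])

# inplace=True: toy itself is modified and the call returns None
result = toy.drop(columns=["c"], inplace=True)

print(list(without_b.columns))
print(list(without_r0.index))
print(result, list(toy.columns))
```

Because the in-place call returns `None`, assigning its result back (for example `toy = toy.drop(..., inplace=True)`) would discard the DataFrame.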

Method 1: Delete specified rows or columns

labels+axis

demo = part_data.drop(['director'], axis=1)
missing_values_table(demo)

Your selected dataframe has 11 columns.
There are 5 columns that have missing values.

            Missing Values  % of Total Values
country                831                9.4
cast                   825                9.4
date_added              10                0.1
rating                   4                0.0
duration                 3                0.0
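The same labels+axis form accepts several labels at once and, with axis=0, removes rows by index label. A minimal sketch on made-up data (the `toy` frame is an illustrative assumption) also compares it with the `columns` keyword:

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "director": ["X", None, "Y"],
    "cast": [None, "P", "Q"],
})

# labels + axis=1 removes columns; several names can be listed at once
by_axis = toy.drop(["director", "cast"], axis=1)

# The columns keyword expresses the same deletion
by_keyword = toy.drop(columns=["director", "cast"])

# labels + axis=0 (the default) removes rows by index label
no_first_row = toy.drop([0], axis=0)

print(by_axis.equals(by_keyword))
print(no_first_row.index.tolist())
```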

Method 2: Use a boolean condition to delete the rows that satisfy it

df = df.drop(df[<some boolean condition>].index)

# Delete rows whose release_year is before 2009
demo = part_data.drop(part_data[part_data["release_year"]<2009].index)
demo.shape

(7624, 12)
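An alternative that is often used for the same task is to keep the rows where the condition is false, using boolean indexing instead of drop. The sketch below compares the two on made-up data (the `toy` frame and its years are illustrative assumptions).

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "release_year": [2005, 2009, 2015, 1999],
})

# True for the rows that should be deleted
mask = toy["release_year"] < 2009

# Method from this article: find the index labels of matching rows and drop them
dropped = toy.drop(toy[mask].index)

# Alternative: keep only the rows where the condition is False
kept = toy[~mask]

print(dropped.equals(kept))
print(kept.shape)
```

The two results match here because the index labels are unique. Since drop works by label, a DataFrame with duplicate index labels would lose every row sharing a dropped label, so boolean indexing may be the safer choice in that case.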
