SoFunction
Updated on 2024-11-20

Python Pandas DataFrame.drop_duplicates(): Removing Duplicate Values in Detail

Syntax

df.drop_duplicates(subset = None,
                   keep = 'first', 
                   inplace = False, 
                   ignore_index = False)

Parameters

1. subset: a column label or sequence of labels. Only these columns are compared when identifying duplicate rows; defaults to all columns.

2. keep: determines which of the duplicate rows to retain, with the following options:

first: keep the first occurrence of each duplicate (default)

last: keep the last occurrence of each duplicate

False: drop all duplicates

3. inplace: whether to modify the DataFrame in place instead of returning a new one; defaults to False

4. ignore_index: if True, the rows of the result are relabeled with the default index (0, 1, ..., n - 1); defaults to False
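The less obvious parameters (keep=False, inplace and ignore_index) can be sketched as follows. The small frame and the variable names here are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 1 are identical
demo = pd.DataFrame({'A': ['x', 'x', 'z'], 'B': [1, 1, 2]})

# keep=False drops every row that has a duplicate, leaving only unique rows
only_unique = demo.drop_duplicates(keep=False)

# ignore_index=True relabels the surviving rows 0, 1, ..., n - 1
reindexed = demo.drop_duplicates(keep='last', ignore_index=True)

# inplace=True modifies the frame itself and returns None
in_place = demo.copy()
result = in_place.drop_duplicates(inplace=True)

print(only_unique.index.tolist())   # [2]
print(reindexed.index.tolist())     # [0, 1]
print(result, len(in_place))        # None 2
```

Because inplace=True returns None, assigning its result back to the original variable would discard the data, so it is usually safer to use the returned copy.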

# delete duplicate values DataFrame.drop_duplicates()
import pandas as pd
 
df = pd.DataFrame([['x','x',1],['x','x',1],['z','x',2]], columns = ['A','B','C'])
 
# Delete duplicate rows
res1 = df.drop_duplicates()
 
# Delete rows with duplicate values in the specified column
res2 = df.drop_duplicates(subset = ['A'])
 
# Retain the last
res3 = df.drop_duplicates(subset = ['A'], keep = 'last')

Results

df

res1

res2

res3
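To check which rows survive each call, the example can be rerun with the row labels printed. This is a minimal sketch that rebuilds the same frame; the loop and the printed format are only for illustration.

```python
import pandas as pd

df = pd.DataFrame([['x', 'x', 1], ['x', 'x', 1], ['z', 'x', 2]],
                  columns=['A', 'B', 'C'])

res1 = df.drop_duplicates()                           # compare all columns
res2 = df.drop_duplicates(subset=['A'])               # compare column A only
res3 = df.drop_duplicates(subset=['A'], keep='last')  # keep the last match

# Print the original index labels of the rows that remain
for name, frame in [('res1', res1), ('res2', res2), ('res3', res3)]:
    print(name, frame.index.tolist())
# res1 [0, 2]
# res2 [0, 2]
# res3 [1, 2]
```

Note that the original index labels are preserved, so res3 starts at 1 unless ignore_index=True is passed.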

Extension: Identify Duplicate Values

import pandas as pd
 
df = pd.DataFrame({
    'studentID':['A001','A002','A003','A004','A005','A006','A006'],
    'score':[100,93,94,96,93,95,95]})
 
# Identify duplicate rows
duplicate_value = df[df.duplicated()]

df

As shown above, there are two records with studentID 'A006'. The duplicated() method identifies such rows: it returns a Boolean Series in which True marks a row that repeats an earlier row and False marks every other row. Indexing df with that Series selects only the duplicate rows.

duplicate_value
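By default duplicated() flags only the second and later occurrences of a row. The following sketch, which reuses the student data above, shows how the keep and subset parameters change the mask; the variable names are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'studentID': ['A001', 'A002', 'A003', 'A004', 'A005', 'A006', 'A006'],
    'score': [100, 93, 94, 96, 93, 95, 95]})

# Default (keep='first'): only the repeated occurrence is flagged
mask_default = df.duplicated()

# keep=False: every member of a duplicate group is flagged
mask_all = df.duplicated(keep=False)

# subset: compare only the score column, so repeated scores count as duplicates
mask_score = df.duplicated(subset=['score'])

print(df[mask_default].index.tolist())  # [6]
print(df[mask_all].index.tolist())      # [5, 6]
print(df[mask_score].index.tolist())    # [4, 6]
```

The score 93 appears twice, but under different studentIDs, so those rows are duplicates only when the comparison is limited to the score column.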

Summary

That concludes this introduction to removing duplicate values with Python Pandas DataFrame.drop_duplicates(). For more on the topic, please search my previous posts or browse the related articles. Thank you for your support!