Syntax
df.drop_duplicates(subset = None, keep = 'first', inplace = False, ignore_index = False)
Parameters
1. subset: a column label or sequence of column labels; only these columns are compared when identifying duplicate rows. Defaults to all columns.
2. keep: determines which duplicates (if any) to keep. Options:
first: keep the first occurrence of each duplicate (default)
last: keep the last occurrence of each duplicate
False: drop every duplicate, keeping none of the copies
3. inplace: whether to modify the DataFrame in place (True, the method then returns None) or return a new DataFrame (False, default)
4. ignore_index: if True, the resulting index is relabeled 0, 1, ..., n - 1
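The example that follows only exercises subset and keep='last'. The sketch below, built on a small hypothetical frame, shows the remaining options: keep=False, ignore_index=True and inplace=True. It assumes pandas 1.0 or later, where ignore_index is available.

```python
import pandas as pd

# Hypothetical illustrative data: rows 0 and 1 are identical
df = pd.DataFrame({'A': ['x', 'x', 'z'], 'B': [1, 1, 2]})

# keep=False drops every row that has a duplicate, so only row 2 survives
no_dups = df.drop_duplicates(keep=False)
print(no_dups.index.tolist())    # [2]

# ignore_index=True relabels the surviving rows (originally 1 and 2) as 0, 1
relabeled = df.drop_duplicates(keep='last', ignore_index=True)
print(relabeled.index.tolist())  # [0, 1]

# inplace=True modifies df itself and returns None
result = df.drop_duplicates(inplace=True)
print(result)                    # None
print(df.index.tolist())         # [0, 2]
```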
# Delete duplicate values with DataFrame.drop_duplicates()
import pandas as pd

df = pd.DataFrame([['x', 'x', 1], ['x', 'x', 1], ['z', 'x', 2]], columns=['A', 'B', 'C'])

# Delete duplicate rows (all columns are compared)
res1 = df.drop_duplicates()
# Compare only column A when identifying duplicates
res2 = df.drop_duplicates(subset=['A'])
# Compare only column A and retain the last occurrence
res3 = df.drop_duplicates(subset=['A'], keep='last')
Results
[Figures: df, res1, res2, res3]
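The index labels of each result show which original rows each call keeps. This sketch rebuilds the example above and prints them; it also checks that drop_duplicates() keeps exactly the rows that duplicated() does not flag, which is how the two methods relate.

```python
import pandas as pd

df = pd.DataFrame([['x', 'x', 1], ['x', 'x', 1], ['z', 'x', 2]], columns=['A', 'B', 'C'])

res1 = df.drop_duplicates()
res2 = df.drop_duplicates(subset=['A'])
res3 = df.drop_duplicates(subset=['A'], keep='last')

# The original index labels show which rows survived each call
for name, res in [('res1', res1), ('res2', res2), ('res3', res3)]:
    print(name, res.index.tolist())
# res1 [0, 2]
# res2 [0, 2]
# res3 [1, 2]

# Dropping duplicates equals keeping the rows not flagged by duplicated()
print(res1.equals(df[~df.duplicated()]))  # True
```

Here res1 and res2 happen to coincide because rows 0 and 1 are identical in every column, not just in column A.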
Extension: Identify Duplicate Values
import pandas as pd

df = pd.DataFrame({
    'studentID': ['A001', 'A002', 'A003', 'A004', 'A005', 'A006', 'A006'],
    'score': [100, 93, 94, 96, 93, 95, 95]})

# Recognize duplicate values: True marks a row that repeats an earlier row
duplicate_value = df.duplicated()
[Figure: df]
In df there are two records with studentID 'A006' (both with score 95). The duplicated() method identifies duplicate rows and returns a Boolean Series: True marks a row that repeats an earlier row, False marks a row that does not.
[Figure: duplicate_value]
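duplicated() accepts the same subset and keep parameters as drop_duplicates(). A short sketch on the student data shows how the Boolean result can be used to pull out or count the duplicate rows, and how keep and subset change what gets flagged.

```python
import pandas as pd

df = pd.DataFrame({
    'studentID': ['A001', 'A002', 'A003', 'A004', 'A005', 'A006', 'A006'],
    'score': [100, 93, 94, 96, 93, 95, 95]})

mask = df.duplicated()  # default keep='first'
print(mask.tolist())
# [False, False, False, False, False, False, True]

print(df[mask])         # the repeated A006 row (index 6)
print(mask.sum())       # 1 duplicate row

# keep=False flags every copy, so both A006 rows are marked
print(df.duplicated(keep=False).tolist())
# [False, False, False, False, False, True, True]

# subset restricts the comparison: scores 93 and 95 each appear twice
print(df.duplicated(subset=['score']).tolist())
# [False, False, False, False, True, False, True]
```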
Summary
This article showed how to remove duplicate values from a DataFrame with Python Pandas DataFrame.drop_duplicates(), and how to identify them with duplicated().