Python pandas methods for handling missing values explained (dropna, drop, fillna)

There are three common ways to handle missing values:

Option 1: Remove the samples (rows) that contain missing values
Option 2: Remove the columns (features) that contain missing values
Option 3: Fill in missing values with some value (0, the mean, the median, etc.)
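The snippet below is a minimal sketch of the three options side by side. It applies each option to a tiny, made-up DataFrame and compares the shapes of the results. The column names a and b are hypothetical. Option 3 uses a constant 0 here; the mean or median work the same way.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "b" has one missing value (in row 1).
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, np.nan, 6.0]})

# Option 1: drop the rows (samples) that contain a missing value.
rows_dropped = df.dropna()

# Option 2: drop the columns (features) that contain a missing value.
cols_dropped = df.dropna(axis=1)

# Option 3: keep every row and column, filling the gap with 0.
filled = df.fillna(0)

print(rows_dropped.shape)  # (2, 2)
print(cols_dropped.shape)  # (3, 1)
print(filled.shape)        # (3, 2)
```

Option 1 loses a sample and option 2 loses a feature. Option 3 keeps all the data but invents a value.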

Both dropna and fillna are available on DataFrame and Series. Here we mainly discuss the DataFrame versions.

For option 1:

Usage: DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Parameter description:

axis:
- axis=0: remove rows containing missing values
- axis=1: remove columns containing missing values
how: works together with axis
- how='any': delete the row or column if any of its values is missing
- how='all': delete the row or column only if all of its values are missing
thresh: keep a row or column only if it has at least thresh non-missing values; otherwise it is deleted.
For example, with axis=0 and thresh=10, any row with fewer than 10 non-missing values is deleted.
subset: list of labels
Which columns (when axis=0) to check for missing values
inplace: whether to modify the original data. If True, the data is changed in place and None is returned; otherwise a new copy with the missing values removed is returned.
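thresh and subset are the least obvious parameters. The sketch below uses hypothetical data to show what each one checks. thresh=k keeps a row when it has at least k non-missing values, and you can reproduce that filter by hand with notna().sum(axis=1). subset limits the check to the listed columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data with 3, 2 and 1 non-missing values in rows 0, 1, 2.
df = pd.DataFrame({
    "x": [1.0, np.nan, np.nan],
    "y": [2.0, 5.0, np.nan],
    "z": [3.0, 6.0, 9.0],
})

k = 2
# thresh=k keeps a row when it has at least k non-missing values.
kept = df.dropna(thresh=k)

# The same filter written by hand: count the non-missing values per row.
manual = df[df.notna().sum(axis=1) >= k]

# subset restricts the check to column "x" only.
only_x = df.dropna(subset=["x"])

print(kept.index.tolist())    # [0, 1]
print(kept.equals(manual))    # True
print(only_x.index.tolist())  # [0]
```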

When calling it, it is recommended to write out the parameters explicitly, even the ones left at their defaults, so the call is easy to understand at a glance.

Examples:

    >>> import numpy as np
    >>> import pandas as pd
    >>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
    ...                    "toy": [np.nan, 'Batmobile', 'Bullwhip'],
    ...                    "born": [pd.NaT, pd.Timestamp("1940-04-25"),
    ...                             pd.NaT]})
    >>> df
           name        toy       born
    0    Alfred        NaN        NaT
    1    Batman  Batmobile 1940-04-25
    2  Catwoman   Bullwhip        NaT

    # Drop the rows where at least one element is missing.
    >>> df.dropna()
         name        toy       born
    1  Batman  Batmobile 1940-04-25

    # Drop the columns where at least one element is missing.
    >>> df.dropna(axis='columns')
           name
    0    Alfred
    1    Batman
    2  Catwoman

    # Drop the rows where all elements are missing.
    >>> df.dropna(how='all')
           name        toy       born
    0    Alfred        NaN        NaT
    1    Batman  Batmobile 1940-04-25
    2  Catwoman   Bullwhip        NaT

    # Keep only the rows with at least 2 non-NA values.
    >>> df.dropna(thresh=2)
           name        toy       born
    1    Batman  Batmobile 1940-04-25
    2  Catwoman   Bullwhip        NaT

    # Define in which columns to look for missing values.
    >>> df.dropna(subset=['name', 'born'])
         name        toy       born
    1  Batman  Batmobile 1940-04-25

    # Keep the DataFrame with valid entries in the same variable.
    >>> df.dropna(inplace=True)
    >>> df
         name        toy       born
    1  Batman  Batmobile 1940-04-25
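The examples above mostly drop rows. The sketch below drops columns instead and shows how `how` interacts with axis=1. It also shows what inplace=True returns. The data is hypothetical, and the column names full, partial and empty are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "empty" is entirely missing, "partial" only partly.
df = pd.DataFrame({
    "full": [1, 2, 3],
    "partial": [1.0, np.nan, 3.0],
    "empty": [np.nan, np.nan, np.nan],
})

# how='all' with axis=1: drop only the columns in which every value is missing.
print(df.dropna(axis=1, how="all").columns.tolist())  # ['full', 'partial']

# how='any' with axis=1: drop every column with at least one missing value.
print(df.dropna(axis=1, how="any").columns.tolist())  # ['full']

# inplace=True modifies df itself and returns None.
result = df.dropna(axis=1, how="all", inplace=True)
print(result)               # None
print(df.columns.tolist())  # ['full', 'partial']
```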

For option 2:

You can use either dropna (with axis=1) or drop.
Usage: DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

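labels: list of row or column labels to delete
axis: 0 for rows, 1 for columns
index / columns: an alternative to labels plus axis; columns=['B', 'C'] is equivalent to labels=['B', 'C'], axis=1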
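Option 2 can be written either way. The sketch below uses a small made-up DataFrame. dropna(axis=1) removes the columns that contain missing values directly. With drop, you first collect those column labels (here with isna().any()) and then drop them by name. Both approaches give the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical data: columns "b" and "c" each contain a missing value.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [np.nan, 5.0, 6.0],
    "c": [7.0, 8.0, np.nan],
})

# With dropna: remove every column that has at least one missing value.
via_dropna = df.dropna(axis=1)

# With drop: find the columns that contain missing values, then drop them by label.
missing_cols = df.columns[df.isna().any()]
via_drop = df.drop(columns=missing_cols)

print(missing_cols.tolist())        # ['b', 'c']
print(via_dropna.equals(via_drop))  # True
```

The drop route is handy when you want to inspect or filter the list of columns before removing them.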

    >>> import numpy as np
    >>> import pandas as pd
    >>> df = pd.DataFrame(np.arange(12).reshape(3, 4),
    ...                   columns=['A', 'B', 'C', 'D'])
    >>> df
       A  B   C   D
    0  0  1   2   3
    1  4  5   6   7
    2  8  9  10  11

    # Delete columns
    >>> df.drop(['B', 'C'], axis=1)
       A   D
    0  0   3
    1  4   7
    2  8  11
    >>> df.drop(columns=['B', 'C'])
       A   D
    0  0   3
    1  4   7
    2  8  11

    # Delete rows (by index label)
    >>> df.drop([0, 1])
       A  B   C   D
    2  8  9  10  11
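The examples do not use the index or errors parameters from the signature. The short sketch below uses the same 3x4 frame. index= is the row counterpart of columns=. errors controls what happens when a label does not exist.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=["A", "B", "C", "D"])

# index= is the row counterpart of columns=: drop the row with label 1.
no_row_1 = df.drop(index=[1])

# errors='raise' (the default) raises KeyError when a label is not found.
try:
    df.drop(columns=["Z"])
    raised = False
except KeyError:
    raised = True

# errors='ignore' silently skips labels that do not exist.
ignored = df.drop(columns=["B", "Z"], errors="ignore")

print(no_row_1.index.tolist())   # [0, 2]
print(raised)                    # True
print(ignored.columns.tolist())  # ['A', 'C', 'D']
```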

For option 3:

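Usage: DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None, **kwargs)

value: scalar, dict, Series, or DataFrame
The value used to fill the holes. A dict (or Series) lets you give each column its own fill value, keyed by column label.
method: {'backfill', 'bfill', 'pad', 'ffill', None}, default None
By default the fill runs down each column:
- ffill / pad: fill a missing value with the previous valid value
- backfill / bfill: fill a missing value with the next valid value
Since pandas 2.1 the method argument is deprecated in favor of DataFrame.ffill() and DataFrame.bfill().
limit: the maximum number of missing values to fill. With method, it caps how many consecutive NaNs are filled; without method, it caps how many NaNs are filled in each column. Rarely needed.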
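Because value may also be a Series, filling each column with its own mean or median (option 3) takes a single call. The sketch below uses hypothetical data. A column with no valid values has a NaN mean, so that column stays missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data; column "c" is missing everywhere.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 10.0, 20.0],
    "c": [np.nan, np.nan, np.nan],
})

# The Series of column means is matched to the columns by label,
# so each column is filled with its own mean.
means = df.mean()
filled = df.fillna(means)

print(means.tolist())            # [2.0, 15.0, nan]
print(filled["a"].tolist())      # [1.0, 2.0, 3.0]
print(filled["b"].tolist())      # [15.0, 10.0, 20.0]
# The mean of an all-missing column is itself NaN, so "c" stays missing.
print(filled["c"].isna().all())  # True
```

Swapping df.mean() for df.median() gives the median variant.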

    >>> import numpy as np
    >>> import pandas as pd
    >>> df = pd.DataFrame([[np.nan, 2, np.nan, 0],
    ...                    [3, 4, np.nan, 1],
    ...                    [np.nan, np.nan, np.nan, 5],
    ...                    [np.nan, 3, np.nan, 4]],
    ...                   columns=list('ABCD'))
    >>> df
         A    B   C  D
    0  NaN  2.0 NaN  0
    1  3.0  4.0 NaN  1
    2  NaN  NaN NaN  5
    3  NaN  3.0 NaN  4

    # Use 0 to replace all missing values
    >>> df.fillna(0)
         A    B    C  D
    0  0.0  2.0  0.0  0
    1  3.0  4.0  0.0  1
    2  0.0  0.0  0.0  5
    3  0.0  3.0  0.0  4

    # Fill missing values with the previous (ffill) or next (bfill) value in each column
    >>> df.fillna(method='ffill')
         A    B   C  D
    0  NaN  2.0 NaN  0
    1  3.0  4.0 NaN  1
    2  3.0  4.0 NaN  5
    3  3.0  3.0 NaN  4

    >>> df.fillna(method='bfill')
         A    B   C  D
    0  3.0  2.0 NaN  0
    1  3.0  4.0 NaN  1
    2  NaN  3.0 NaN  5
    3  NaN  3.0 NaN  4

    # Use a different fill value for each column: replace all NaN elements
    # in columns 'A', 'B', 'C', and 'D' with 0, 1, 2, and 3 respectively.
    >>> values = {'A': 0, 'B': 1, 'C': 2, 'D': 3}
    >>> df.fillna(value=values)
         A    B    C  D
    0  0.0  2.0  2.0  0
    1  3.0  4.0  2.0  1
    2  0.0  1.0  2.0  5
    3  0.0  3.0  2.0  4

    # Replace only the first missing value in each column
    >>> df.fillna(value=values, limit=1)
         A    B    C  D
    0  0.0  2.0  2.0  0
    1  3.0  4.0  NaN  1
    2  NaN  1.0  NaN  5
    3  NaN  3.0  NaN  4
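The method argument of fillna is deprecated in recent pandas releases. The forward and backward fills above can also be written with DataFrame.ffill() and DataFrame.bfill(). The sketch below uses the same df. Its limit argument caps how many consecutive NaNs are filled in each gap.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5],
                   [np.nan, 3, np.nan, 4]],
                  columns=list("ABCD"))

# Dedicated methods for the same fills as fillna(method='ffill' / 'bfill').
forward = df.ffill()
backward = df.bfill()

# limit=1: at most one consecutive NaN is filled in each gap.
forward_1 = df.ffill(limit=1)

print(forward["A"].tolist())    # [nan, 3.0, 3.0, 3.0]
print(forward_1["A"].tolist())  # [nan, 3.0, 3.0, nan]
print(backward["B"].tolist())   # [2.0, 4.0, 3.0, 3.0]
```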

House price analysis:

In this problem, only the total_bedrooms column has missing values. Using the three options above, the processing code is:

    # Option 1: remove the rows that contain missing values (returns a new DataFrame)
    housing.dropna(subset=["total_bedrooms"])

    # Option 2: remove the column "total_bedrooms" from the data (returns a new DataFrame)
    housing.drop("total_bedrooms", axis=1)

    # Option 3: fill in missing values with the median of "total_bedrooms"
    median = housing["total_bedrooms"].median()
    housing["total_bedrooms"] = housing["total_bedrooms"].fillna(median)
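By default, dropna, drop and fillna all return new objects, so their results are lost unless you assign them. The minimal sketch below runs on a made-up stand-in for the housing data. Only the column name total_bedrooms comes from the original problem; total_rooms and all the values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the housing data.
housing = pd.DataFrame({
    "total_rooms": [1000.0, 2000.0, 1500.0, 1200.0],
    "total_bedrooms": [200.0, np.nan, 300.0, np.nan],
})

# Options 1 and 2 return new DataFrames; housing itself is unchanged.
opt1 = housing.dropna(subset=["total_bedrooms"])
opt2 = housing.drop("total_bedrooms", axis=1)

# Option 3: fillna also returns a new Series, so assign it back.
median = housing["total_bedrooms"].median()
housing["total_bedrooms"] = housing["total_bedrooms"].fillna(median)

print(len(opt1), opt2.columns.tolist())        # 2 ['total_rooms']
print(median)                                  # 250.0
print(housing["total_bedrooms"].isna().sum())  # 0
```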

Scikit-learn also provides a class for handling missing values: Imputer in older releases, replaced by SimpleImputer in sklearn.impute since version 0.20. A tutorial on how to use it is here: https:///article/
