SoFunction
Updated on 2024-11-14

Python pandas DataFrame: selecting and slicing rows and columns

In SQL, SELECT picks columns by name. Pandas is more flexible: it can select by column name and also by position, that is, by the ordinal number of a row or column (note that row and column positions in pandas start at 0). The relevant indexers are:

1) loc, selects by label: rows by index label, columns by column label;

2) iloc, selects rows and columns by integer position;

3) at, quickly accesses a single element of the DataFrame by row index label and column label;

4) iat, like at, except that it locates the element by integer position;

5) ix, a hybrid of loc and iloc that accepts both labels and positions (deprecated in pandas 0.20.0 and removed in 1.0.0).
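With the default 0..n-1 index, row labels and row positions coincide, which can hide the difference between the label-based and the position-based indexers. The following is a minimal sketch, assuming a small hypothetical DataFrame named `scores` with string row labels (it is not part of the example in this article), where the two notions are clearly distinct:

```python
import pandas as pd

# Hypothetical data with string row labels, so labels and positions differ.
scores = pd.DataFrame(
    {"math": [90, 75, 60], "art": [70, 85, 95]},
    index=["ann", "bob", "cid"],
    columns=["math", "art"],
)

# Label-based: row label "bob", column label "art".
by_label = scores.loc["bob", "art"]
# Position-based: second row, second column.
by_position = scores.iloc[1, 1]
# Scalar accessors: the same lookups, restricted to a single cell.
fast_label = scores.at["bob", "art"]
fast_position = scores.iat[1, 1]

print(by_label, by_position, fast_label, fast_position)
```

All four lookups address the same cell; only the way the cell is named differs.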

A worked example

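import pandas as pd
import numpy as np


# column order is fixed explicitly so that positions match the output below
df = pd.DataFrame({'total_bill': [16.99, 10.34, 23.68, 23.68, 24.59],
                   'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
                   'sex': ['Female', 'Male', 'Male', 'Male', 'Female']},
                  columns=['sex', 'tip', 'total_bill'])
# data type of each column
print(df.dtypes)
# row index
print(df.index)
# column labels
print(df.columns)
# all rows as a 2-D array (one inner array per row)
print(df.values)
print(df)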
sex            object
tip           float64
total_bill    float64
dtype: object
RangeIndex(start=0, stop=5, step=1)
Index([u'sex', u'tip', u'total_bill'], dtype='object')
[['Female' 1.01 16.99]
 ['Male' 1.66 10.34]
 ['Male' 3.5 23.68]
 ['Male' 3.31 23.68]
 ['Female' 3.61 24.59]]
      sex   tip  total_bill
0  Female  1.01       16.99
1    Male  1.66       10.34
2    Male  3.50       23.68
3    Male  3.31       23.68
4  Female  3.61       24.59
print(df.loc[1:3, ['total_bill', 'tip']])
print(df.loc[1:3, 'tip':'total_bill'])
print(df.iloc[1:3, [1, 2]])
print(df.iloc[1:3, 1:3])
   total_bill   tip
1       10.34  1.66
2       23.68  3.50
3       23.68  3.31
    tip  total_bill
1  1.66       10.34
2  3.50       23.68
3  3.31       23.68
    tip  total_bill
1  1.66       10.34
2  3.50       23.68
    tip  total_bill
1  1.66       10.34
2  3.50       23.68
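The outputs above differ in row count because the two indexers treat slices differently: a label slice passed to .loc includes both endpoints, while a positional slice passed to .iloc excludes the stop position, like ordinary Python slicing. A small check, rebuilding the same data (the explicit `columns=` order is an assumption made here so that positions match the printed output):

```python
import pandas as pd

df = pd.DataFrame(
    {"total_bill": [16.99, 10.34, 23.68, 23.68, 24.59],
     "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
     "sex": ["Female", "Male", "Male", "Male", "Female"]},
    columns=["sex", "tip", "total_bill"],  # assumed order, matching the output shown
)

label_rows = df.loc[1:3]      # labels 1, 2, 3 (stop label included)
position_rows = df.iloc[1:3]  # positions 1, 2 (stop position excluded)

print(label_rows.index.tolist())     # [1, 2, 3]
print(position_rows.index.tolist())  # [1, 2]
```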

Common pitfalls:

print(df.loc[1:3, [2, 3]])  # fails: .loc accepts only column labels, not positions
KeyError: 'None of [[2, 3]] are in the [columns]'
print(df.loc[[2, 3]])  # with .loc the column argument can be omitted; this selects rows
    sex   tip  total_bill
2  Male  3.50       23.68
3  Male  3.31       23.68
print(df.iloc[1:3])  # with .iloc the column argument can be omitted; this selects rows
    sex   tip  total_bill
1  Male  1.66       10.34
2  Male  3.50       23.68
print(df.iloc[1:3, 'tip':'total_bill'])  # fails: .iloc accepts only positions, not labels
TypeError: cannot do slice indexing on <class ''> with these indexers [tip] of <type 'str'>
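When the column labels are known but a positional indexer is wanted, one option is to translate the labels into positions first. The sketch below assumes the same data with columns ordered sex, tip, total_bill; it uses `Index.get_indexer` for a list of labels and `Index.get_loc` for the two ends of a label range.

```python
import pandas as pd

df = pd.DataFrame(
    {"total_bill": [16.99, 10.34, 23.68, 23.68, 24.59],
     "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
     "sex": ["Female", "Male", "Male", "Male", "Female"]},
    columns=["sex", "tip", "total_bill"],  # assumed order
)

# A list of labels -> an array of positions that .iloc accepts.
col_positions = df.columns.get_indexer(["tip", "total_bill"])
picked = df.iloc[1:3, col_positions]

# A label range -> a positional slice; add 1 because the .iloc stop is exclusive.
start = df.columns.get_loc("tip")
stop = df.columns.get_loc("total_bill") + 1
sliced = df.iloc[1:3, start:stop]

print(picked)
print(sliced)
```

Both results contain rows 1 and 2 with the columns tip and total_bill.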

print(df.at[3, 'tip'])
print(df.iat[3, 1])
# .ix was removed in pandas 1.0; the .ix lines below need an older version
print(df.ix[1:3, [1, 2]])
print(df.ix[1:3, ['total_bill', 'tip']])
3.31
3.31
    tip  total_bill
1  1.66       10.34
2  3.50       23.68
3  3.31       23.68
   total_bill   tip
1       10.34  1.66
2       23.68  3.50
3       23.68  3.31
print(df.ix[[1, 2]])  # row selection
    sex   tip  total_bill
1  Male  1.66       10.34
2  Male  3.50       23.68
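Because .ix (item 5 in the list of indexers) is not available from pandas 1.0 onward, mixed selections such as "rows by label 1 to 3, columns at positions 1 and 2" need another spelling on current versions. One possible rewrite, assuming the same data with columns ordered sex, tip, total_bill, uses .loc throughout and converts positional columns to labels by indexing `df.columns`:

```python
import pandas as pd

df = pd.DataFrame(
    {"total_bill": [16.99, 10.34, 23.68, 23.68, 24.59],
     "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
     "sex": ["Female", "Male", "Male", "Male", "Female"]},
    columns=["sex", "tip", "total_bill"],  # assumed order
)

# In place of df.ix[1:3, [1, 2]]: row labels 1..3, columns at positions 1 and 2.
mixed = df.loc[1:3, df.columns[[1, 2]]]

# In place of df.ix[1:3, ['total_bill', 'tip']]: purely label-based.
labels_only = df.loc[1:3, ["total_bill", "tip"]]

# In place of df.ix[[1, 2]]: rows only.
rows_only = df.loc[[1, 2]]

print(mixed)
print(labels_only)
print(rows_only)
```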
print(df[1:3])
print(df[['total_bill', 'tip']])
# print(df[1:2, ['total_bill', 'tip']])  # TypeError: unhashable type
    sex   tip  total_bill
1  Male  1.66       10.34
2  Male  3.50       23.68
   total_bill   tip
0       16.99  1.01
1       10.34  1.66
2       23.68  3.50
3       23.68  3.31
4       24.59  3.61
print(df[1:3, 1:2])
TypeError: unhashable type
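Plain df[] takes a single key, so a row slice and a column selection cannot be combined inside one pair of brackets. Two workable spellings, sketched under the assumption of the same data and column order as above, are to chain two df[] steps or to pass both axes to .iloc:

```python
import pandas as pd

df = pd.DataFrame(
    {"total_bill": [16.99, 10.34, 23.68, 23.68, 24.59],
     "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
     "sex": ["Female", "Male", "Male", "Male", "Female"]},
    columns=["sex", "tip", "total_bill"],  # assumed order
)

# Chained: slice rows first, then pick columns by name.
chained = df[1:3][["tip"]]

# Single step: rows at positions 1 and 2, column at position 1 ('tip').
single = df.iloc[1:3, 1:2]

print(chained.equals(single))

# The two-axis form inside df[] fails; the exception class varies by pandas version.
raised = None
try:
    df[1:3, 1:2]
except Exception as exc:
    raised = type(exc).__name__
print(raised)
```

The chained form is fine for reading data; for assignment, a single .loc or .iloc call is the safer choice.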

Summary

1) With .loc, .iloc and .ix, passing only the first argument, e.g. .loc[[1, 2]], .iloc[2:3] or .ix[2], selects rows.

2) .loc and .at select columns by label only, not by position.

3) .iloc and .iat select columns by position only, not by label.

4) df[] can select rows or columns, but not both at once; rows are selected with a slice, and columns only by name.
