In SQL, SELECT picks columns by name. pandas is more flexible: besides selecting by column name, it can also select by position, that is, by the numeric position of a row or column (note that row and column positions in pandas start from 0). The relevant indexers are:
1) loc: selects by label, using the row index label and the column label;
2) iloc: selects by integer row/column position;
3) at: quickly locates a single element of the DataFrame by row index label and column label;
4) iat: like at, but locates the element by position;
5) ix: a hybrid of loc and iloc that accepts both labels and positions (deprecated in pandas 0.20 and removed in pandas 1.0).
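To make the label/position distinction concrete, here is a minimal sketch. The string row labels "a", "b", "c" and the three-row data are assumptions chosen for illustration, so that a label can never be mistaken for a position.

```python
import pandas as pd

# Hypothetical data: string row labels make labels and positions visibly different.
df = pd.DataFrame(
    {"tip": [1.01, 1.66, 3.50], "total_bill": [16.99, 10.34, 23.68]},
    index=["a", "b", "c"],
    columns=["tip", "total_bill"],  # fix the column order explicitly
)

by_label = df.loc["b", "tip"]     # row label "b", column label "tip"
by_position = df.iloc[1, 0]       # second row, first column
fast_label = df.at["b", "tip"]    # scalar access by label
fast_position = df.iat[1, 0]      # scalar access by position

# All four expressions address the same cell.
print(by_label, by_position, fast_label, fast_position)
```

All four accessors reach the same cell here; they differ only in whether the row and column are named by label or by position.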
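A worked example:

    import pandas as pd
    import numpy as np

    # Note: the output shown was produced with an older pandas that sorts dict keys,
    # so the columns come out in the order sex, tip, total_bill.
    df = pd.DataFrame({'total_bill': [16.99, 10.34, 23.68, 23.68, 24.59],
                       'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
                       'sex': ['Female', 'Male', 'Male', 'Male', 'Female']})

    # data type of each column
    print(df.dtypes)
    # row index
    print(df.index)
    # column labels, returned as a pandas Index
    print(df.columns)
    # the rows as a 2-D array (array of arrays)
    print(df.values)
    print(df)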
Output:

    sex            object
    tip           float64
    total_bill    float64
    dtype: object
    RangeIndex(start=0, stop=5, step=1)
    Index([u'sex', u'tip', u'total_bill'], dtype='object')
    [['Female' 1.01 16.99]
     ['Male' 1.66 10.34]
     ['Male' 3.5 23.68]
     ['Male' 3.31 23.68]
     ['Female' 3.61 24.59]]
          sex   tip  total_bill
    0  Female  1.01       16.99
    1    Male  1.66       10.34
    2    Male  3.50       23.68
    3    Male  3.31       23.68
    4  Female  3.61       24.59
    # .loc: rows by index label, columns by name
    print(df.loc[1:3, ['total_bill', 'tip']])
    print(df.loc[1:3, 'tip': 'total_bill'])
    # .iloc: rows and columns by position
    print(df.iloc[1:3, [1, 2]])
    print(df.iloc[1:3, 1: 3])

Output:

       total_bill   tip
    1       10.34  1.66
    2       23.68  3.50
    3       23.68  3.31
        tip  total_bill
    1  1.66       10.34
    2  3.50       23.68
    3  3.31       23.68
        tip  total_bill
    1  1.66       10.34
    2  3.50       23.68
        tip  total_bill
    1  1.66       10.34
    2  3.50       23.68
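The output above shows that the same slice 1:3 returns three rows with .loc but only two with .iloc: a label slice includes its end label, while a position slice excludes its end position. A small sketch of that difference follows. Passing columns= to fix the column order is an assumption added here so that the positions match the output above on any pandas version.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "sex": ["Female", "Male", "Male", "Male", "Female"],
        "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
        "total_bill": [16.99, 10.34, 23.68, 23.68, 24.59],
    },
    columns=["sex", "tip", "total_bill"],  # assumed fixed order: sex=0, tip=1, total_bill=2
)

# Label slice: rows labelled 1, 2 and 3; columns from 'tip' through 'total_bill'.
label_rows = df.loc[1:3, "tip":"total_bill"]

# Position slice: rows at positions 1 and 2; columns at positions 1 and 2.
position_rows = df.iloc[1:3, 1:3]

print(list(label_rows.index))     # row labels selected by .loc
print(list(position_rows.index))  # row labels selected by .iloc
```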
Incorrect usage:

    # .loc supports only column names, so positions 2 and 3 are not found
    print(df.loc[1:3, [2, 3]])

Output:

    KeyError: 'None of [[2, 3]] are in the [columns]'
    # with .loc the column indexer can be omitted; the result is a row selection
    print(df.loc[[2, 3]])

Output:

        sex   tip  total_bill
    2  Male  3.50       23.68
    3  Male  3.31       23.68
    # with .iloc the column indexer can be omitted; the result is a row selection
    print(df.iloc[1:3])

Output:

        sex   tip  total_bill
    1  Male  1.66       10.34
    2  Male  3.50       23.68
    # .iloc supports only positions, so a slice of column names fails
    print(df.iloc[1:3, 'tip': 'total_bill'])

Output:

    TypeError: cannot do slice indexing on <class ''> with these indexers [tip] of <type 'str'>
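The valid and invalid calls above can be checked side by side. The sketch below is only illustrative: try_select is a hypothetical helper introduced here, not part of pandas, and the exact error messages vary between pandas versions, so only the exception types are recorded.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "sex": ["Female", "Male", "Male", "Male", "Female"],
        "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
        "total_bill": [16.99, 10.34, 23.68, 23.68, 24.59],
    },
    columns=["sex", "tip", "total_bill"],  # assumed fixed column order
)


def try_select(selector):
    """Hypothetical helper: run a selection and return 'ok' or the exception type name."""
    try:
        selector()
        return "ok"
    except (KeyError, TypeError) as exc:
        return type(exc).__name__


# .loc given column positions instead of names
loc_with_positions = try_select(lambda: df.loc[1:3, [2, 3]])
# .iloc given column names instead of positions
iloc_with_labels = try_select(lambda: df.iloc[1:3, "tip":"total_bill"])
# Omitting the column indexer is fine for both accessors
loc_rows_only = try_select(lambda: df.loc[[2, 3]])
iloc_rows_only = try_select(lambda: df.iloc[1:3])

print(loc_with_positions, iloc_with_labels, loc_rows_only, iloc_rows_only)
```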
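    # .at: one element by row label and column name
    print(df.at[3, 'tip'])
    # .iat: one element by row and column position
    print(df.iat[3, 1])
    # .ix accepts both positions and labels (requires pandas older than 1.0)
    print(df.ix[1:3, [1, 2]])
    print(df.ix[1:3, ['total_bill', 'tip']])

Output:

    3.31
    3.31
        tip  total_bill
    1  1.66       10.34
    2  3.50       23.68
    3  3.31       23.68
       total_bill   tip
    1       10.34  1.66
    2       23.68  3.50
    3       23.68  3.31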
    # .ix with only the first indexer selects rows
    print(df.ix[[1, 2]])

Output:

        sex   tip  total_bill
    1  Male  1.66       10.34
    2  Male  3.50       23.68
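On pandas versions where .ix is no longer available, one way to mix labels and positions is to translate one into the other first. The sketch below is a possible approach, not the article's own method: it uses Index.get_loc to turn column names into positions for .iloc, and indexes df.columns by position to get names for .loc. The fixed column order is an assumption.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "sex": ["Female", "Male", "Male", "Male", "Female"],
        "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
        "total_bill": [16.99, 10.34, 23.68, 23.68, 24.59],
    },
    columns=["sex", "tip", "total_bill"],  # assumed fixed column order
)

# Rows by position, columns by name: convert the names to positions for .iloc.
col_positions = [df.columns.get_loc(name) for name in ["total_bill", "tip"]]
rows_by_position = df.iloc[1:3, col_positions]

# Rows by label, columns by position: convert the positions to names for .loc.
col_names = df.columns[[1, 2]]
rows_by_label = df.loc[1:3, col_names]

print(rows_by_position)
print(rows_by_label)
```

Keeping each call purely positional or purely label-based avoids the ambiguity that .ix had when the index itself contained integers.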
    # df[] with a slice selects rows; with a list of names it selects columns
    print(df[1: 3])
    print(df[['total_bill', 'tip']])
    # print(df[1:2, ['total_bill', 'tip']])  # TypeError: unhashable type

Output:

        sex   tip  total_bill
    1  Male  1.66       10.34
    2  Male  3.50       23.68
       total_bill   tip
    0       16.99  1.01
    1       10.34  1.66
    2       23.68  3.50
    3       23.68  3.31
    4       24.59  3.61
    # df[] cannot take a row indexer and a column indexer at the same time
    print(df[1:3, 1:2])

Output:

    TypeError: unhashable type
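Since df[] cannot take rows and columns in one call, the same result needs either two df[] steps or a single .loc call. A minimal sketch, assuming the same data as above with an explicitly fixed column order:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "sex": ["Female", "Male", "Male", "Male", "Female"],
        "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
        "total_bill": [16.99, 10.34, 23.68, 23.68, 24.59],
    },
    columns=["sex", "tip", "total_bill"],  # assumed fixed column order
)

# Two df[] steps: slice the rows first, then pick the columns by name.
rows_then_cols = df[1:3][["total_bill", "tip"]]

# One .loc call: row labels 1 through 2 (end included) and the same columns.
one_step = df.loc[1:2, ["total_bill", "tip"]]

print(rows_then_cols)
print(rows_then_cols.equals(one_step))
```

The single .loc call is usually the safer form when the result will be assigned to, because chained df[] steps may operate on a temporary copy.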
Summary
1) With .loc, .iloc and .ix, passing only the first indexer, e.g. .loc[[1, 2]], .iloc[2:3] or .ix[2], performs a row selection.
2) .loc and .at select columns by column name only, not by position.
3) .iloc and .iat select columns by position only, not by column name.
4) df[] can select either rows or columns, but not both at once, and its column selection accepts only column names.
That covers row and column selection and slicing for pandas DataFrames. I hope it serves as a useful reference.