Hierarchical indexing is an important feature of pandas that lets you have multiple (two or more) index levels on a single axis.
To see it in action, create a Series whose index is a list of lists (or arrays):
import numpy as np
from pandas import Series, DataFrame

data=Series(np.random.randn(10), index=[['a','a','a','b','b','b','c','c','d','d'], [1,2,3,1,2,3,1,2,2,3]])
data
Out[6]:
a  1   -2.842857
   2    0.376199
   3   -0.512978
b  1    0.225243
   2   -1.242407
   3   -0.663188
c  1   -0.149269
   2   -1.079174
d  2   -0.952380
   3   -1.113689
dtype: float64
This is the formatted output of a Series with a MultiIndex as its index. The gaps in the outer index column mean "use the label directly above".

data.index
Out[7]:
MultiIndex(levels=[['a', 'b', 'c', 'd'], [1, 2, 3]],
           labels=[[0, 0, 0, 1, 1, 1, 2, 2, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 1, 2]])
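The structure of a MultiIndex can also be inspected programmatically. The following is a minimal sketch, assuming fixed values (np.arange) instead of the random values used in the article so that the result is reproducible; the variable names outer, inner and idx are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# Same index as the article's example; the values are fixed rather than
# random (an assumption of this sketch) so the result is reproducible.
outer = ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd']
inner = [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]
data = pd.Series(np.arange(10.0), index=[outer, inner])

idx = data.index
print(type(idx).__name__)              # class of the index object
print(idx.nlevels)                     # number of levels on this axis
print(list(idx.levels[0]))             # unique labels of the outer level
print(list(idx.levels[1]))             # unique labels of the inner level
print(list(idx.get_level_values(0)))   # outer label of every row
```

The levels attribute holds each level's unique labels, while get_level_values expands one level back to a label per row.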
For a hierarchically indexed object, selecting a subset of the data is simple:

data['b']
Out[8]:
1    0.225243
2   -1.242407
3   -0.663188
dtype: float64

data['b':'c']
Out[10]:
b  1    0.225243
   2   -1.242407
   3   -0.663188
c  1   -0.149269
   2   -1.079174
dtype: float64

data.ix[['b','d']]
__main__:1: DeprecationWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing
See the documentation here:
/pandas-docs/stable/#ix-indexer-is-deprecated
Out[11]:
b  1    0.225243
   2   -1.242407
   3   -0.663188
d  2   -0.952380
   3   -1.113689
dtype: float64
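Since the session above triggers a deprecation warning for .ix, the same three selections can be written with .loc. This is a sketch under the assumption of fixed values (np.arange) in place of the article's random ones; the names first, span and picked are illustrative.

```python
import numpy as np
import pandas as pd

# Fixed values instead of random ones (an assumption of this sketch).
data = pd.Series(
    np.arange(10.0),
    index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'],
           [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]],
)

first = data['b']               # all rows under the outer label 'b'
span = data.loc['b':'c']        # label slice on the outer level, ends inclusive
picked = data.loc[['b', 'd']]   # a list of outer labels, replacing .ix

print(first)
print(span)
print(picked)
```

Selecting a single outer label drops that level from the result, while slices and lists keep both levels.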
You can even select on the inner level:

data[:,2]
Out[12]:
a    0.376199
b   -1.242407
c   -1.079174
d   -0.952380
dtype: float64
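Inner-level selection can also be spelled explicitly. The sketch below, which assumes fixed values in place of the article's random ones, compares .loc with a slice on the outer level against the xs method; by_loc and by_xs are illustrative names.

```python
import numpy as np
import pandas as pd

# Fixed values instead of random ones (an assumption of this sketch).
data = pd.Series(
    np.arange(10.0),
    index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'],
           [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]],
)

# Take every outer label, but only the rows whose inner label is 2.
by_loc = data.loc[:, 2]

# xs ("cross-section") names the level explicitly and drops it from the result.
by_xs = data.xs(2, level=1)

print(by_loc)
print(by_xs)
```

Both forms return a Series indexed by the outer labels only.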
Hierarchical indexes play an important role in reshaping data and in group-based operations.
For example, this data can be rearranged into a DataFrame with the unstack method:
data.unstack()
Out[13]:
          1         2         3
a -2.842857  0.376199 -0.512978
b  0.225243 -1.242407 -0.663188
c -0.149269 -1.079174       NaN
d       NaN -0.952380 -1.113689

The inverse of unstack is stack:

data.unstack().stack()
Out[14]:
a  1   -2.842857
   2    0.376199
   3   -0.512978
b  1    0.225243
   2   -1.242407
   3   -0.663188
c  1   -0.149269
   2   -1.079174
d  2   -0.952380
   3   -1.113689
dtype: float64
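The round trip between the long and wide forms can be checked directly. This is a sketch assuming fixed values instead of the article's random ones; the explicit dropna() call is a precaution added here, because whether stack itself discards the NaN cells may depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Fixed values instead of random ones (an assumption of this sketch).
data = pd.Series(
    np.arange(10.0),
    index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'],
           [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]],
)

# The inner level becomes the columns; missing (outer, inner) pairs are NaN.
wide = data.unstack()
print(wide)

# Back to the long form. dropna() removes the cells that were only created
# as padding by unstack, in case stack() kept them.
long_again = wide.stack().dropna()
print(long_again)
```

The pairs ('c', 3) and ('d', 1) do not exist in the original Series, which is why they show up as NaN in the wide form.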
For a DataFrame, either axis can have a hierarchical index:

frame=DataFrame(np.arange(12).reshape((4,3)), index=[['a','a','b','b'],[1,2,1,2]], columns=[['Ohio','Ohio','Colorado'], ['Green','Red','Green']])
frame
Out[16]:
     Ohio     Colorado
    Green Red    Green
a 1     0   1        2
  2     3   4        5
b 1     6   7        8
  2     9  10       11
Each level can have a name. If names are specified, they are displayed in the console output (do not confuse index names with axis labels!):

frame.index.names=['key1','key2']
frame.columns.names=['state','color']
frame
Out[22]:
state      Ohio     Colorado
color     Green Red    Green
key1 key2
a    1        0   1        2
     2        3   4        5
b    1        6   7        8
     2        9  10       11
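Level names can be assigned after construction, as above, or supplied when the index is built. The sketch below shows both; the use of MultiIndex.from_arrays and the variable name cols are additions made here for illustration and do not appear in the article.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
    columns=[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']],
)

# Option 1: name the levels after the fact.
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']

# Option 2 (illustrative): build a named MultiIndex up front so it can be reused.
cols = pd.MultiIndex.from_arrays(
    [['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']],
    names=['state', 'color'],
)

print(frame)
print(cols)
```

Building the index separately is handy when several objects should share the same hierarchical columns.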
With partial column indexing, groups of columns can be selected just as easily:

frame['Ohio']
Out[23]:
color      Green  Red
key1 key2
a    1         0    1
     2         3    4
b    1         6    7
     2         9   10
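Beyond selecting a whole outer group, a single column or an inner-level group can be picked as well. This is a minimal sketch; the tuple key and the xs call are additions made here, and ohio, ohio_red and green are illustrative names.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
    columns=[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']],
)
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']

ohio = frame['Ohio']                  # outer label: a DataFrame keyed by color
ohio_red = frame[('Ohio', 'Red')]     # full tuple: a single column as a Series
green = frame.xs('Green', axis=1, level='color')  # inner label across states

print(ohio)
print(ohio_red)
print(green)
```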
Reordering and sorting levels
Sometimes it is necessary to reorder the levels on an axis, or to sort the data by the values in one specific level. swaplevel accepts two level numbers or names and returns a new object with the levels interchanged (the data itself is unchanged):
frame.swaplevel('key1','key2')
Out[24]:
state      Ohio     Colorado
color     Green Red    Green
key2 key1
1    a        0   1        2
2    a        3   4        5
1    b        6   7        8
2    b        9  10       11
sortlevel sorts the data by the values in a single level. When swapping levels, it is common to also call sortlevel so that the result is sorted as well. Note that sortlevel is deprecated in favor of sort_index(level=...), as the warning in the output below shows:

frame.swaplevel(0,1)
Out[27]:
state      Ohio     Colorado
color     Green Red    Green
key2 key1
1    a        0   1        2
2    a        3   4        5
1    b        6   7        8
2    b        9  10       11

# Swap levels 0 and 1 (i.e. key1 and key2),
# then sort the rows (axis=0) by the new level 0
frame.swaplevel(0,1).sortlevel(0)
__main__:1: FutureWarning: sortlevel is deprecated, use sort_index(level= ...)
Out[28]:
state      Ohio     Colorado
color     Green Red    Green
key2 key1
1    a        0   1        2
     b        6   7        8
2    a        3   4        5
     b        9  10       11
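Following the hint in the FutureWarning, the swap-then-sort step can be written with sort_index(level=...). This is a minimal sketch of that replacement; swapped and ordered are illustrative names.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
    columns=[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']],
)
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']

# Swap the two row levels; the rows keep their original order.
swapped = frame.swaplevel('key1', 'key2')

# Sort the rows by the new outermost level (key2); the remaining level is
# used to break ties because sort_remaining defaults to True.
ordered = swapped.sort_index(level=0)

print(swapped)
print(ordered)
```

Neither call modifies frame; each returns a new object.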
Summary statistics by level
Many descriptive and summary statistics on DataFrame and Series accept a level option, which specifies the level to aggregate by on a particular axis. For example, the rows or the columns can be summed by level:
frame.sum(level='key2')
Out[29]:
state  Ohio     Colorado
color Green Red    Green
key2
1         6   8       10
2        12  14       16

frame.sum(level='color',axis=1)
Out[30]:
color      Green  Red
key1 key2
a    1         2    1
     2         8    4
b    1        14    7
     2        20   10
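The same per-level sums can be expressed with groupby, which is useful because the level argument of sum is, to my knowledge, no longer available in recent pandas releases. The sketch below is one way to do it; transposing before grouping on the column level is a workaround chosen here, not something the article prescribes.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
    columns=[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']],
)
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']

# Sum rows that share the same key2 label.
by_key2 = frame.groupby(level='key2').sum()

# Sum columns that share the same color label: transpose so the column
# levels become row levels, group, then transpose back.
by_color = frame.T.groupby(level='color').sum().T

print(by_key2)
print(by_color)
```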
Using a DataFrame's columns as an index
It is common to want to use one or more columns of a DataFrame as the row index, or to turn the row index into columns of the DataFrame.
frame=DataFrame({'a':range(7),'b':range(7,0,-1), 'c':['one','one','one','two','two','two','two'], 'd':[0,1,2,0,1,2,3]})
frame
Out[32]:
   a  b    c  d
0  0  7  one  0
1  1  6  one  1
2  2  5  one  2
3  3  4  two  0
4  4  3  two  1
5  5  2  two  2
6  6  1  two  3
A DataFrame's set_index function creates a new DataFrame that uses one or more of its columns as the row index:

frame2=frame.set_index(['c','d'])
frame2
Out[34]:
       a  b
c   d
one 0  0  7
    1  1  6
    2  2  5
two 0  3  4
    1  4  3
    2  5  2
    3  6  1
By default, those columns are removed from the DataFrame, but it is possible to keep them there:
frame.set_index(['c','d'],drop=False)
Out[35]:
       a  b    c  d
c   d
one 0  0  7  one  0
    1  1  6  one  1
    2  2  5  one  2
two 0  3  4  two  0
    1  4  3  two  1
    2  5  2  two  2
    3  6  1  two  3
reset_index does the opposite of set_index: the levels of the hierarchical index are moved back into columns:

frame2.reset_index()
Out[36]:
     c  d  a  b
0  one  0  0  7
1  one  1  1  6
2  one  2  2  5
3  two  0  3  4
4  two  1  4  3
5  two  2  5  2
6  two  3  6  1
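That set_index and reset_index undo each other can be verified with a short round trip. This is a minimal sketch; kept and restored are illustrative names, and the column reordering at the end is only there to compare against the original layout.

```python
import pandas as pd

frame = pd.DataFrame({
    'a': range(7),
    'b': range(7, 0, -1),
    'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'],
    'd': [0, 1, 2, 0, 1, 2, 3],
})

frame2 = frame.set_index(['c', 'd'])             # c and d leave the columns
kept = frame.set_index(['c', 'd'], drop=False)   # c and d stay as columns too
restored = frame2.reset_index()                  # index levels become columns again

# reset_index puts the former index levels first, so reorder before comparing.
same_values = restored[['a', 'b', 'c', 'd']].values.tolist() == frame.values.tolist()
print(same_values)
```

Only the column order differs after the round trip, because reset_index places the former index levels at the front.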