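Creating categories
Creating with Series
Passing dtype="category" when creating a Series produces categorical data. A category has two parts: the categories themselves (the distinct literal values) and a flag that says whether they are ordered. All sessions below assume import numpy as np and import pandas as pd.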
In [1]: s = pd.Series(["a", "b", "c", "a"], dtype="category")

In [2]: s
Out[2]:
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): ['a', 'b', 'c']
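The sketch below makes the two parts concrete on the same data. It prints the categories and the integer code that pandas stores for each row, read through the s.cat.codes accessor. The exact integer dtype of the codes is an implementation detail, so only the values matter here.

```python
import pandas as pd

# Each row is stored as a small integer code that points into the categories
s = pd.Series(["a", "b", "c", "a"], dtype="category")

categories = list(s.cat.categories)  # the distinct labels, stored once
codes = s.cat.codes.tolist()         # one integer code per row

print(categories)  # ['a', 'b', 'c']
print(codes)       # [0, 1, 2, 0]
```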
An existing DataFrame column can also be converted to a category with astype:
In [3]: df = pd.DataFrame({"A": ["a", "b", "c", "a"]})

In [4]: df["B"] = df["A"].astype("category")

In [5]: df["B"]
Out[5]:
0    a
1    b
2    c
3    a
Name: B, dtype: category
Categories (3, object): ['a', 'b', 'c']
You can also create a pandas.Categorical object first and pass it to Series. Values that are not among the given categories become NaN:
In [10]: raw_cat = pd.Categorical(
   ....:     ["a", "b", "c", "a"], categories=["b", "c", "d"], ordered=False
   ....: )
   ....:

In [11]: s = pd.Series(raw_cat)

In [12]: s
Out[12]:
0    NaN
1      b
2      c
3    NaN
dtype: category
Categories (3, object): ['b', 'c', 'd']
Creating with DataFrame
You can also pass in dtype="category" when creating a DataFrame:
In [17]: df = pd.DataFrame({"A": list("abca"), "B": list("bccd")}, dtype="category")

In [18]: df.dtypes
Out[18]:
A    category
B    category
dtype: object
Columns A and B are now both categorical. Each column's categories are inferred separately from its own values:
In [19]: df["A"]
Out[19]:
0    a
1    b
2    c
3    a
Name: A, dtype: category
Categories (3, object): ['a', 'b', 'c']

In [20]: df["B"]
Out[20]:
0    b
1    c
2    c
3    d
Name: B, dtype: category
Categories (3, object): ['b', 'c', 'd']
Or use df.astype("category") to convert every column of an existing DataFrame to category:
In [21]: df = pd.DataFrame({"A": list("abca"), "B": list("bccd")})

In [22]: df_cat = df.astype("category")

In [23]: df_cat.dtypes
Out[23]:
A    category
B    category
dtype: object
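Memory is a common reason to convert columns of repeated strings. The sketch below uses made-up data: three labels repeated 10,000 times. It compares memory_usage(deep=True) before and after astype("category"). The exact byte counts depend on the pandas version, so only the comparison is meaningful.

```python
import pandas as pd

# Hypothetical data: 30,000 rows drawn from only three distinct labels
df = pd.DataFrame({"A": ["low", "medium", "high"] * 10_000})
df_cat = df.astype("category")

# deep=True counts the string payloads, not just the pointers
obj_bytes = df["A"].memory_usage(deep=True)
cat_bytes = df_cat["A"].memory_usage(deep=True)

print(obj_bytes, cat_bytes)
print(cat_bytes < obj_bytes)
```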
Controlling the categories
By default, a category created by passing dtype="category" uses two defaults: the categories are inferred from the data, and they are unordered.
You can explicitly create a CategoricalDtype to change both defaults:
In [26]: from pandas.api.types import CategoricalDtype

In [27]: s = pd.Series(["a", "b", "c", "a"])

In [28]: cat_type = CategoricalDtype(categories=["b", "c", "d"], ordered=True)

In [29]: s_cat = s.astype(cat_type)

In [30]: s_cat
Out[30]:
0    NaN
1      b
2      c
3    NaN
dtype: category
Categories (3, object): ['b' < 'c' < 'd']
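The sketch below puts the defaults and an explicit CategoricalDtype side by side on the same data. For each one it prints the categories and the ordered flag.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series(["a", "b", "c", "a"])

# Defaults: categories inferred from the data, no order
default_cat = s.astype("category")
print(list(default_cat.cat.categories), default_cat.cat.ordered)  # ['a', 'b', 'c'] False

# Explicit dtype: fixed categories and an order
cat_type = CategoricalDtype(categories=["b", "c", "d"], ordered=True)
explicit_cat = s.astype(cat_type)
print(list(explicit_cat.cat.categories), explicit_cat.cat.ordered)  # ['b', 'c', 'd'] True

# The converted Series carries exactly the dtype that was requested
print(explicit_cat.dtype == cat_type)  # True
```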
The same CategoricalDtype can also be applied to a whole DataFrame, so that every column shares the same categories:
In [31]: from pandas.api.types import CategoricalDtype

In [32]: df = pd.DataFrame({"A": list("abca"), "B": list("bccd")})

In [33]: cat_type = CategoricalDtype(categories=list("abcd"), ordered=True)

In [34]: df_cat = df.astype(cat_type)

In [35]: df_cat["A"]
Out[35]:
0    a
1    b
2    c
3    a
Name: A, dtype: category
Categories (4, object): ['a' < 'b' < 'c' < 'd']

In [36]: df_cat["B"]
Out[36]:
0    b
1    c
2    c
3    d
Name: B, dtype: category
Categories (4, object): ['a' < 'b' < 'c' < 'd']
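A shared dtype has a practical effect. Columns with identical, ordered categories can be compared element-wise. Columns whose categories were inferred separately cannot, and pandas raises a TypeError. A minimal sketch on the same data:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({"A": list("abca"), "B": list("bccd")})

# One ordered dtype shared by both columns
cat_type = CategoricalDtype(categories=list("abcd"), ordered=True)
df_cat = df.astype(cat_type)
less = (df_cat["A"] < df_cat["B"]).tolist()
print(less)  # [True, True, False, True]

# Per-column inference gives different, unordered categories, so this fails
inferred = df.astype("category")
raised = False
try:
    inferred["A"] < inferred["B"]
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```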
Converting back to the original type
Use Series.astype(original_dtype) or np.asarray(categorical) to convert a category back to its original type:
In [39]: s = pd.Series(["a", "b", "c", "a"])

In [40]: s
Out[40]:
0    a
1    b
2    c
3    a
dtype: object

In [41]: s2 = s.astype("category")

In [42]: s2
Out[42]:
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): ['a', 'b', 'c']

In [43]: s2.astype(str)
Out[43]:
0    a
1    b
2    c
3    a
dtype: object

In [44]: np.asarray(s2)
Out[44]: array(['a', 'b', 'c', 'a'], dtype=object)
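Casting with the original Series' own dtype undoes the conversion without hard-coding the type. A minimal sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", "c", "a"])
s2 = s.astype("category")

# Cast back using whatever dtype the original Series had
restored = s2.astype(s.dtype)
print(restored.dtype == s.dtype, restored.tolist() == s.tolist())  # True True

# np.asarray materialises the labels as a plain NumPy array
arr = np.asarray(s2)
print(arr.tolist())  # ['a', 'b', 'c', 'a']
```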
Working with categories
Getting category attributes
Categorical data has two attributes, categories and ordered, which can be read through s.cat.categories and s.cat.ordered:
In [57]: s = pd.Series(["a", "b", "c", "a"], dtype="category")

In [58]: s.cat.categories
Out[58]: Index(['a', 'b', 'c'], dtype='object')

In [59]: s.cat.ordered
Out[59]: False
Setting the order of the categories:
In [60]: s = pd.Series(pd.Categorical(["a", "b", "c", "a"], categories=["c", "b", "a"]))

In [61]: s.cat.categories
Out[61]: Index(['c', 'b', 'a'], dtype='object')

In [62]: s.cat.ordered
Out[62]: False
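The order of the categories list also determines the integer codes and the sort order, even though this categorical is unordered. A small sketch:

```python
import pandas as pd

s = pd.Series(pd.Categorical(["a", "b", "c", "a"], categories=["c", "b", "a"]))

# Codes are positions in the categories list, not alphabetical ranks
codes = s.cat.codes.tolist()
print(codes)  # [2, 1, 0, 2]

# sort_values sorts by those positions, so "c" comes first
sorted_values = s.sort_values().tolist()
print(sorted_values)  # ['c', 'b', 'a', 'a']
```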
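Renaming categories
You can rename categories by assigning new values to s.cat.categories. Newer pandas versions no longer support this assignment, so rename_categories is the more portable choice.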
In [67]: s = pd.Series(["a", "b", "c", "a"], dtype="category")

In [68]: s
Out[68]:
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): ['a', 'b', 'c']

In [69]: s.cat.categories = ["Group %s" % g for g in s.cat.categories]

In [70]: s
Out[70]:
0    Group a
1    Group b
2    Group c
3    Group a
dtype: category
Categories (3, object): ['Group a', 'Group b', 'Group c']
The same effect can be achieved using rename_categories:
In [71]: s = s.cat.rename_categories([1, 2, 3])

In [72]: s
Out[72]:
0    1
1    2
2    3
3    1
dtype: category
Categories (3, int64): [1, 2, 3]
Or use a dictionary object:
# You can also pass a dict-like object to map the renaming
In [73]: s = s.cat.rename_categories({1: "x", 2: "y", 3: "z"})

In [74]: s
Out[74]:
0    x
1    y
2    z
3    x
dtype: category
Categories (3, object): ['x', 'y', 'z']
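rename_categories also accepts a callable, which it applies to every category. A minimal sketch, using str.upper as an arbitrary example function:

```python
import pandas as pd

s = pd.Series(["x", "y", "z", "x"], dtype="category")

# The function is called once per category, not once per row
upper = s.cat.rename_categories(str.upper)

print(list(upper.cat.categories))  # ['X', 'Y', 'Z']
print(upper.tolist())              # ['X', 'Y', 'Z', 'X']
```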
Adding categories with add_categories
You can use add_categories to add categories.
In [77]: s = s.cat.add_categories([4])

In [78]: s.cat.categories
Out[78]: Index(['x', 'y', 'z', 4], dtype='object')

In [79]: s
Out[79]:
0    x
1    y
2    z
3    x
dtype: category
Categories (4, object): ['x', 'y', 'z', 4]
Removing categories with remove_categories
In [80]: s = s.cat.remove_categories([4])

In [81]: s
Out[81]:
0    x
1    y
2    z
3    x
dtype: category
Categories (3, object): ['x', 'y', 'z']
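Removing a category that is still in use does not drop any rows. Those values become NaN instead. A minimal sketch:

```python
import pandas as pd

s = pd.Series(["x", "y", "z", "x"], dtype="category")

# "x" is used in rows 0 and 3; removing it leaves NaN in those rows
trimmed = s.cat.remove_categories(["x"])

print(list(trimmed.cat.categories))  # ['y', 'z']
print(trimmed.isna().tolist())       # [True, False, False, True]
```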
Removing unused categories
In [82]: s = pd.Series(pd.Categorical(["a", "b", "a"], categories=["a", "b", "c", "d"]))

In [83]: s
Out[83]:
0    a
1    b
2    a
dtype: category
Categories (4, object): ['a', 'b', 'c', 'd']

In [84]: s.cat.remove_unused_categories()
Out[84]:
0    a
1    b
2    a
dtype: category
Categories (2, object): ['a', 'b']
Setting categories
Use set_categories() to add and remove categories in one step. Values whose label is not in the new categories become NaN:
In [85]: s = pd.Series(["one", "two", "four", "-"], dtype="category")

In [86]: s
Out[86]:
0     one
1     two
2    four
3       -
dtype: category
Categories (4, object): ['-', 'four', 'one', 'two']

In [87]: s = s.cat.set_categories(["one", "two", "three", "four"])

In [88]: s
Out[88]:
0     one
1     two
2    four
3     NaN
dtype: category
Categories (4, object): ['one', 'two', 'three', 'four']
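set_categories also takes an ordered argument, so a single call can replace the categories, reorder them and mark them as ordered. A sketch reusing the data above:

```python
import pandas as pd

s = pd.Series(["one", "two", "four", "-"], dtype="category")

# Drops "-", adds "three", fixes the order and marks the categorical as ordered
s = s.cat.set_categories(["one", "two", "three", "four"], ordered=True)

print(list(s.cat.categories), s.cat.ordered)  # ['one', 'two', 'three', 'four'] True
print(s.max())  # four (the NaN left by "-" is skipped)
```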
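Sorting and order
Sorting always follows the order of the categories. If the categorical is created with ordered=True, that order is also meaningful for operations such as min() and max():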
In [91]: s = pd.Series(["a", "b", "c", "a"]).astype(CategoricalDtype(ordered=True))

In [92]: s.sort_values(inplace=True)

In [93]: s
Out[93]:
0    a
3    a
1    b
2    c
dtype: category
Categories (3, object): ['a' < 'b' < 'c']

In [94]: s.min(), s.max()
Out[94]: ('a', 'c')
Use as_ordered() or as_unordered() to switch an existing categorical between ordered and unordered:
In [95]: s.cat.as_ordered()
Out[95]:
0    a
3    a
1    b
2    c
dtype: category
Categories (3, object): ['a' < 'b' < 'c']

In [96]: s.cat.as_unordered()
Out[96]:
0    a
3    a
1    b
2    c
dtype: category
Categories (3, object): ['a', 'b', 'c']
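The practical difference shows up in min() and max(). On an unordered categorical they raise a TypeError, and as_ordered() makes them available. A minimal sketch:

```python
import pandas as pd

unordered = pd.Series(["a", "b", "c", "a"], dtype="category")

# min() needs an order, so an unordered categorical refuses it
raised = False
try:
    unordered.min()
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# as_ordered() returns an ordered copy; min/max follow the category order
ordered = unordered.cat.as_ordered()
print(ordered.min(), ordered.max())  # a c
```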
Reordering categories
Existing categories can be reordered using Categorical.reorder_categories():
In [103]: s = pd.Series([1, 2, 3, 1], dtype="category")

In [104]: s = s.cat.reorder_categories([2, 3, 1], ordered=True)

In [105]: s
Out[105]:
0    1
1    2
2    3
3    1
dtype: category
Categories (3, int64): [2 < 3 < 1]
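After reordering, sorting and max() follow the order 2 < 3 < 1 rather than numeric order. A short sketch on the same data:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 1], dtype="category")
s = s.cat.reorder_categories([2, 3, 1], ordered=True)

# Sorting uses the category order, not the numeric value
sorted_values = s.sort_values().tolist()
print(sorted_values)  # [2, 3, 1, 1]

# 1 is now the "largest" category
print(s.max())  # 1
```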
Sorting by multiple columns
sort_values can sort by several columns. A categorical column sorts by its category order (here e < a < b), not alphabetically:
In [109]: dfs = pd.DataFrame(
   .....:     {
   .....:         "A": pd.Categorical(
   .....:             list("bbeebbaa"),
   .....:             categories=["e", "a", "b"],
   .....:             ordered=True,
   .....:         ),
   .....:         "B": [1, 2, 1, 2, 2, 1, 2, 1],
   .....:     }
   .....: )
   .....:

In [110]: dfs.sort_values(by=["A", "B"])
Out[110]:
   A  B
2  e  1
3  e  2
7  a  1
6  a  2
0  b  1
5  b  1
1  b  2
4  b  2
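A typical use is sorting labels whose natural order is not alphabetical. The sketch below uses hypothetical clothing-size data, where alphabetical sorting would give L, M, S, XL:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical data; the size order is defined explicitly
df = pd.DataFrame({"size": ["M", "XL", "S", "L", "S"], "qty": [3, 1, 5, 2, 4]})
size_type = CategoricalDtype(["S", "M", "L", "XL"], ordered=True)
df["size"] = df["size"].astype(size_type)

# Sort by size (category order), then by quantity within each size
result = df.sort_values(by=["size", "qty"])
print(result)
```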
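Comparison operations
Categorical data supports the comparison operators ==, !=, >, >=, < and <=. The ordering comparisons (>, >=, <, <=) require ordered=True, and two categoricals can only be compared when they have the same categories. Below, cat and cat_base share the categories [3, 2, 1], so cat > cat_base works. cat_base2 infers its categories from its own data, so comparing it with cat raises a TypeError.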
In [113]: cat = pd.Series([1, 2, 3]).astype(CategoricalDtype([3, 2, 1], ordered=True))

In [114]: cat_base = pd.Series([2, 2, 2]).astype(CategoricalDtype([3, 2, 1], ordered=True))

In [115]: cat_base2 = pd.Series([2, 2, 2]).astype(CategoricalDtype(ordered=True))

In [119]: cat > cat_base
Out[119]:
0     True
1    False
2    False
dtype: bool

In [120]: cat > 2
Out[120]:
0     True
1    False
2    False
dtype: bool
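Comparing against a scalar is a convenient way to filter rows by a threshold level. The sketch uses hypothetical task data. Plain string comparison would rank "high" below "medium", but the categorical comparison uses the declared order:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical priority labels with an explicit order
priority = CategoricalDtype(["low", "medium", "high"], ordered=True)
tasks = pd.DataFrame(
    {"task": ["t1", "t2", "t3", "t4"], "priority": ["high", "low", "medium", "high"]}
)
tasks["priority"] = tasks["priority"].astype(priority)

# Keeps every task whose priority is medium or above in the category order
urgent = tasks[tasks["priority"] >= "medium"]
print(urgent["task"].tolist())  # ['t1', 't3', 't4']
```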
Other operations
A categorical Series is still a Series, so most Series operations work on it. Several of them also report categories that never occur in the data.
value_counts() lists every category, with a count of 0 for unused ones:
In [131]: s = pd.Series(pd.Categorical(["a", "b", "c", "c"], categories=["c", "a", "b", "d"]))

In [132]: s.value_counts()
Out[132]:
c    2
a    1
b    1
d    0
dtype: int64
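DataFrame.sum() also shows unused categories. This session uses the level argument of sum(), which was removed in pandas 2.0. On newer versions, a groupby on the transposed frame, for example df.T.groupby(level=1, observed=False).sum(), gives the same totals in transposed form: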
In [133]: columns = pd.Categorical(
   .....:     ["One", "One", "Two"], categories=["One", "Two", "Three"], ordered=True
   .....: )
   .....:

In [134]: df = pd.DataFrame(
   .....:     data=[[1, 2, 3], [4, 5, 6]],
   .....:     columns=pd.MultiIndex.from_arrays([["A", "B", "B"], columns]),
   .....: )
   .....:

In [135]: df.sum(axis=1, level=1)
Out[135]:
   One  Two  Three
0    3    3      0
1    9    6      0
groupby() also shows unused categories, with NaN as their result:
In [136]: cats = pd.Categorical(
   .....:     ["a", "b", "b", "b", "c", "c", "c"], categories=["a", "b", "c", "d"]
   .....: )
   .....:

In [137]: df = pd.DataFrame({"cats": cats, "values": [1, 2, 2, 2, 3, 4, 5]})

In [138]: df.groupby("cats").mean()
Out[138]:
      values
cats
a        1.0
b        2.0
c        4.0
d        NaN

In [139]: cats2 = pd.Categorical(["a", "a", "b", "b"], categories=["a", "b", "c"])

In [140]: df2 = pd.DataFrame(
   .....:     {
   .....:         "cats": cats2,
   .....:         "B": ["c", "d", "c", "d"],
   .....:         "values": [1, 2, 3, 4],
   .....:     }
   .....: )
   .....:

In [141]: df2.groupby(["cats", "B"]).mean()
Out[141]:
        values
cats B
a    c     1.0
     d     2.0
b    c     3.0
     d     4.0
c    c     NaN
     d     NaN
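The observed argument of groupby controls whether unused categories appear in the result. Recent pandas versions warn that its default is changing, so passing it explicitly is the safer habit. A sketch using the same data as In [136] and In [137]:

```python
import pandas as pd

cats = pd.Categorical(["a", "b", "b", "b", "c", "c", "c"], categories=["a", "b", "c", "d"])
df = pd.DataFrame({"cats": cats, "values": [1, 2, 2, 2, 3, 4, 5]})

# observed=True keeps only categories that actually occur in the data
observed = df.groupby("cats", observed=True)["values"].mean()

# observed=False also lists the unused category "d", with a NaN mean
full = df.groupby("cats", observed=False)["values"].mean()

print(observed.index.tolist(), full.index.tolist())  # ['a', 'b', 'c'] ['a', 'b', 'c', 'd']
```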
Pivot tables:
In [142]: raw_cat = pd.Categorical(["a", "a", "b", "b"], categories=["a", "b", "c"])

In [143]: df = pd.DataFrame({"A": raw_cat, "B": ["c", "d", "c", "d"], "values": [1, 2, 3, 4]})

In [144]: pd.pivot_table(df, values="values", index=["A", "B"])
Out[144]:
     values
A B
a c       1
  d       2
b c       3
  d       4