1. Converting an index from a groupby operation to a column
The groupby method is used often. For example, after adding a grouping column named team, we can group the rows by it and take the mean of each group.
>>> df0["team"] = ["X", "X", "Y", "Y", "Y"]
>>> df0
          A         B         C team
0  0.548012  0.288583  0.734276    X
1  0.342895  0.207917  0.995485    X
2  0.378794  0.160913  0.971951    Y
3  0.039738  0.008414  0.226510    Y
4  0.581093  0.750331  0.133022    Y
>>> df0.groupby("team").mean()
             A         B         C
team
X     0.445453  0.248250  0.864881
Y     0.333208  0.306553  0.443828
By default, groupby turns the grouping column into the index of the result. In many cases we don't want that, because later calculations or conditional logic may still need the column. We therefore need a way to group without the grouping column becoming the index.
There are two ways to do this. The first is to call reset_index on the result; the second is to pass as_index=False to groupby. Personally, I prefer the second, which takes only two method calls and is more concise.
>>> df0.groupby("team").mean().reset_index()
  team         A         B         C
0    X  0.445453  0.248250  0.864881
1    Y  0.333208  0.306553  0.443828
>>> df0.groupby("team", as_index=False).mean()
  team         A         B         C
0    X  0.445453  0.248250  0.864881
1    Y  0.333208  0.306553  0.443828
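To make the comparison concrete, here is a minimal, self-contained sketch. The DataFrame and its values are made up for illustration (they are not the df0 from the session above); only groupby, mean, reset_index and as_index come from the text.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({
    "A": [1.0, 3.0, 2.0, 4.0, 6.0],
    "B": [10.0, 20.0, 30.0, 40.0, 50.0],
    "team": ["X", "X", "Y", "Y", "Y"],
})

# Default behaviour: the grouping column becomes the index of the result.
indexed = df.groupby("team").mean()

# Option 1: move the index back into a regular column afterwards.
via_reset = df.groupby("team").mean().reset_index()

# Option 2: never turn the grouping column into the index.
via_flag = df.groupby("team", as_index=False).mean()

print(indexed.index.name)       # team
print(list(via_reset.columns))  # ['team', 'A', 'B']
print(list(via_flag.columns))   # ['team', 'A', 'B']
print(via_flag["A"].tolist())   # [2.0, 4.0]
```

Both options end up with team as an ordinary column and a fresh 0-based index, so the choice is mostly a matter of taste.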
2. Setting the index on an existing DataFrame
Of course, if the data has already been read, or has gone through some processing steps, we can set the index manually with set_index.
>>> df = pd.read_csv("", parse_dates=["date"])
>>> df.set_index("date")
            temperature  humidity
date
2021-07-01           95        50
2021-07-02           94        55
2021-07-03           94        56
There are two things to keep in mind here:
- The set_index method returns a new DataFrame by default. To change the index of df in place, set inplace=True.
df.set_index("date", inplace=True)
- If you want to keep the column that becomes the index, set drop=False.
df.set_index("date", drop=False)
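The two points above can be checked with a small sketch. The DataFrame here is built by hand to mirror the temperature example (an assumption, since the original CSV file is not available), and the variable names are only for illustration.

```python
import pandas as pd

# Hand-built stand-in for the temperature data shown above.
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-07-01", "2021-07-02", "2021-07-03"]),
    "temperature": [95, 94, 94],
    "humidity": [50, 55, 56],
})

# set_index returns a new DataFrame; df itself is left untouched.
indexed = df.set_index("date")
print(list(df.columns))       # ['date', 'temperature', 'humidity']
print(list(indexed.columns))  # ['temperature', 'humidity']

# drop=False keeps "date" as a regular column in addition to the index.
kept = df.set_index("date", drop=False)
print(list(kept.columns))     # ['date', 'temperature', 'humidity']
print(kept.index.name)        # date
```

Reassigning the result, as in df = df.set_index("date"), is a common alternative to inplace=True.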
3. Resetting indexes after some operations
When working with a DataFrame, certain operations (e.g., dropping rows or selecting rows by index) keep only a subset of the original index, so the default integer index is no longer consecutive. To regenerate a consecutive index, use the reset_index method.
>>> import numpy as np
>>> import pandas as pd
>>> df0 = pd.DataFrame(np.random.rand(5, 3), columns=list("ABC"))
>>> df0
          A         B         C
0  0.548012  0.288583  0.734276
1  0.342895  0.207917  0.995485
2  0.378794  0.160913  0.971951
3  0.039738  0.008414  0.226510
4  0.581093  0.750331  0.133022
>>> df1 = df0[df0.index % 2 == 0]
>>> df1
          A         B         C
0  0.548012  0.288583  0.734276
2  0.378794  0.160913  0.971951
4  0.581093  0.750331  0.133022
>>> df1.reset_index(drop=True)
          A         B         C
0  0.548012  0.288583  0.734276
1  0.378794  0.160913  0.971951
2  0.581093  0.750331  0.133022
Normally we don't need to keep the old index, so the drop parameter can be set to True. Similarly, to reset the index in place, set the inplace parameter to True; otherwise a new DataFrame is returned.
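As a rough illustration of what drop changes, the sketch below uses a small made-up DataFrame (not the random df0 above). Without drop=True, reset_index stores the old index in a new column named "index".

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"A": [10, 20, 30, 40, 50]})

# Keeping every other row leaves gaps in the index.
subset = df[df.index % 2 == 0]
print(list(subset.index))    # [0, 2, 4]

# Default: the old index is preserved as a column called "index".
kept = subset.reset_index()
print(list(kept.columns))    # ['index', 'A']
print(list(kept["index"]))   # [0, 2, 4]

# drop=True: the old index is discarded and a fresh 0..n-1 index is built.
dropped = subset.reset_index(drop=True)
print(list(dropped.columns)) # ['A']
print(list(dropped.index))   # [0, 1, 2]
```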
4. Reset index after sorting
The same problem comes up with the sort_values method: by default each row keeps its original index label, so after sorting the index is out of order. If we don't want the index to follow the sort order, we can set the ignore_index parameter to True in sort_values.
>>> df0.sort_values("A")
          A         B         C team
3  0.039738  0.008414  0.226510    Y
1  0.342895  0.207917  0.995485    X
2  0.378794  0.160913  0.971951    Y
0  0.548012  0.288583  0.734276    X
4  0.581093  0.750331  0.133022    Y
>>> df0.sort_values("A", ignore_index=True)
          A         B         C team
0  0.039738  0.008414  0.226510    Y
1  0.342895  0.207917  0.995485    X
2  0.378794  0.160913  0.971951    Y
3  0.548012  0.288583  0.734276    X
4  0.581093  0.750331  0.133022    Y
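Here is a minimal sketch of the same idea on a tiny made-up DataFrame, so the index labels are easy to follow by eye. The ignore_index parameter of sort_values is available in pandas 1.0 and later.

```python
import pandas as pd

# Made-up values chosen so the sorted order is easy to verify.
df = pd.DataFrame({"A": [0.5, 0.3, 0.4, 0.1]})

# Default: each row keeps its original label, so the index looks shuffled.
sorted_default = df.sort_values("A")
print(list(sorted_default.index))  # [3, 1, 2, 0]

# ignore_index=True relabels the sorted rows 0..n-1.
sorted_fresh = df.sort_values("A", ignore_index=True)
print(list(sorted_fresh.index))    # [0, 1, 2, 3]
print(sorted_fresh["A"].tolist())  # [0.1, 0.3, 0.4, 0.5]
```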
5. Reset the index after deleting duplicates
Removing duplicates, like sorting, leaves the index non-consecutive by default. Likewise, setting the ignore_index parameter to True in the drop_duplicates method takes care of it.
>>> df0
          A         B         C team
0  0.548012  0.288583  0.734276    X
1  0.342895  0.207917  0.995485    X
2  0.378794  0.160913  0.971951    Y
3  0.039738  0.008414  0.226510    Y
4  0.581093  0.750331  0.133022    Y
>>> df0.drop_duplicates("team", ignore_index=True)
          A         B         C team
0  0.548012  0.288583  0.734276    X
1  0.378794  0.160913  0.971951    Y
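For comparison, the sketch below shows the default behaviour next to ignore_index=True. The data is made up for illustration; the score column is a hypothetical name.

```python
import pandas as pd

# Made-up data; "score" is a hypothetical column for illustration.
df = pd.DataFrame({
    "team": ["X", "X", "Y", "Y", "Y"],
    "score": [1, 2, 3, 4, 5],
})

# Default: the first row of each team is kept, with its original label.
default = df.drop_duplicates("team")
print(list(default.index))  # [0, 2]

# ignore_index=True relabels the surviving rows 0..n-1.
fresh = df.drop_duplicates("team", ignore_index=True)
print(list(fresh.index))    # [0, 1]
print(list(fresh["score"])) # [1, 3]
```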
6. Direct assignment of indexes
Sometimes we have a DataFrame and want to build its index from a different data source or a separate operation. In this case, the new index can be assigned directly to the index attribute of the existing DataFrame.
>>> better_index = ["X1", "X2", "Y1", "Y2", "Y3"]
>>> df0.index = better_index
>>> df0
           A         B         C team
X1  0.548012  0.288583  0.734276    X
X2  0.342895  0.207917  0.995485    X
Y1  0.378794  0.160913  0.971951    Y
Y2  0.039738  0.008414  0.226510    Y
Y3  0.581093  0.750331  0.133022    Y
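One thing worth knowing about direct assignment is that the new index must have exactly as many labels as the DataFrame has rows. A minimal sketch on made-up data (the variable mismatch_error is only there for illustration):

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"A": [1, 2, 3]})

# Assigning a list of the right length replaces the index in place.
df.index = ["a", "b", "c"]
print(list(df.index))    # ['a', 'b', 'c']
print(df.loc["b", "A"])  # 2

# A list of the wrong length is rejected with a ValueError.
mismatch_error = None
try:
    df.index = ["a", "b"]
except ValueError as exc:
    mismatch_error = exc
    print("rejected:", exc)
```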
7. Ignore indexes when writing CSV files
By default, a DataFrame exported to a CSV file includes its index, which starts from 0. If we don't want the index in the exported file, we can set the index parameter to False in the to_csv method.
>>> df0.to_csv("exported_file.csv", index=False)
With index=False, the index column is not included in the exported CSV file.
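To see the difference without touching the file system, the sketch below calls to_csv with no path, in which case pandas returns the CSV text as a string. The data is made up for illustration.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# With no path argument, to_csv returns the CSV content as a string.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index.splitlines())     # [',A,B', '0,1,3', '1,2,4']
print(without_index.splitlines())  # ['A,B', '1,3', '2,4']
```

The leading empty header cell and the 0, 1 column in the first output are the index; both disappear with index=False.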
In fact, many pandas methods have index-related parameters, but we tend to focus on the data and overlook the index, which can cause errors further down the line. The operations above are among the most frequent index settings; getting into the habit of setting the index deliberately will save a lot of time.
8. Specify the index column when reading
In many cases, our data source is a CSV file. Suppose we have a file that contains the following data.
date,temperature,humidity
07/01/21,95,50
07/02/21,94,55
07/03/21,94,56
By default, pandas creates an integer index starting at 0, as follows:
>>> pd.read_csv("", parse_dates=["date"])
        date  temperature  humidity
0 2021-07-01           95        50
1 2021-07-02           94        55
2 2021-07-03           94        56
However, we can simplify the import by setting the index_col parameter to a column name, which specifies the index column directly at read time.
>>> pd.read_csv("", parse_dates=["date"], index_col="date")
            temperature  humidity
date
2021-07-01           95        50
2021-07-02           94        55
2021-07-03           94        56
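Since the file name is not given above, the following sketch reads the same rows from an in-memory string through io.StringIO instead of a real file. That substitution is an assumption made so the example can run on its own.

```python
import io

import pandas as pd

# The same rows as the sample file, held in memory instead of on disk.
csv_text = (
    "date,temperature,humidity\n"
    "07/01/21,95,50\n"
    "07/02/21,94,55\n"
    "07/03/21,94,56\n"
)

# Default: "date" is a regular column and the index is 0, 1, 2.
default = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(list(default.index))    # [0, 1, 2]

# index_col="date" makes the date column the index while reading.
indexed = pd.read_csv(
    io.StringIO(csv_text), parse_dates=["date"], index_col="date"
)
print(indexed.index.name)     # date
print(list(indexed.columns))  # ['temperature', 'humidity']
```

This gives the same result as reading first and calling set_index("date") afterwards, in a single step.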