Removing duplicate columns with pandas

Data preparation

Suppose we currently have two data tables:

① A table of three people, each with an id and a few other attribute columns.

import pandas as pd
import numpy as np
# 3 rows x 4 columns of random integers in [1, 20)
data = pd.DataFrame(np.random.randint(low=1, high=20, size=(3, 4)))
data['id'] = range(1, 4)
print(data)  # the leftmost 0 1 2 is the row index

② The second table is the app activity log of the same 3 users. A user can have several app activity records.

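# 7 log records with one random integer column 'hhh' in [1, 9)
sample = pd.DataFrame(np.random.randint(low=1, high=9, size=(7, 1)), columns=['hhh'])
sample['id'] = [1, 1, 2, 2, 3, 3, 3]
print(sample)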

Description of the problem

① First, count the number of app records for each user. In the table above, for example, user id 1 has 2 records and user id 3 has 3 records.

# Count the records per user; the result is indexed by id
s = sample.groupby('id').count()
print(s)
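Because the article's hhh column is random, here is a minimal sketch with fixed, made-up hhh values so the counts can be checked. It counts the records per user with groupby and, for comparison, with value_counts on the id column.

```python
import pandas as pd

# Fixed stand-in for the random 'sample' table (hhh values are illustrative only)
sample = pd.DataFrame({'hhh': [5, 2, 7, 1, 3, 8, 4],
                       'id': [1, 1, 2, 2, 3, 3, 3]})

# groupby(...).count() returns a DataFrame indexed by id;
# each remaining column holds the number of non-null values per group
s = sample.groupby('id').count()
print(s)

# value_counts on the id column gives the same numbers as a Series
counts = sample['id'].value_counts().sort_index()
print(counts)
```

Note that count() skips missing values, while size() counts rows regardless of NaN; with no missing data, as here, they agree.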

② At this point, s is a DataFrame indexed by id, and its hhh column holds the number of records for each user. Because we will merge on the id column later, we need to turn id from an index back into a regular column.

s = s.reset_index()  # id becomes a column; the index is now 0, 1, 2
print(s)
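A small sketch of what reset_index() changes, again using fixed illustrative values instead of random ones: before the call id is only the index, and afterwards it is an ordinary column with a default 0..n-1 index.

```python
import pandas as pd

# Illustrative fixed data (the article uses random hhh values)
sample = pd.DataFrame({'hhh': [5, 2, 7, 1, 3, 8, 4],
                       'id': [1, 1, 2, 2, 3, 3, 3]})
s = sample.groupby('id').count()

# Before: 'id' is the index name, not a column
print('id' in s.columns)   # False

# reset_index() moves the 'id' index into a regular column
# and replaces it with a default RangeIndex
s = s.reset_index()
print(s.columns.tolist())  # ['id', 'hhh']
print(s.index.tolist())    # [0, 1, 2]
```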

③ Now merge s with the first table (data). We do not want the id column to appear twice. More generally, s and data may share not only id but several other columns as well, so how can we make sure the merged result contains no duplicate columns?
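To make the problem concrete, here is a sketch with small fixed tables; the attribute columns a and b are hypothetical stand-ins for the random columns of data. An index-based merge keeps both id columns and renames them with pandas' default suffixes, and a side-by-side concat produces two columns with the identical name id.

```python
import pandas as pd

# Illustrative fixed data: hypothetical attribute columns a, b plus the shared 'id'
data = pd.DataFrame({'a': [10, 11, 12], 'b': [20, 21, 22], 'id': [1, 2, 3]})
s = pd.DataFrame({'id': [1, 2, 3], 'hhh': [2, 2, 3]})

# Merging on the index keeps both 'id' columns; pandas resolves the
# name clash with the default suffixes ('_x', '_y')
merged = pd.merge(data, s, left_index=True, right_index=True)
print(merged.columns.tolist())  # ['a', 'b', 'id_x', 'id_y', 'hhh']

# Concatenating side by side gives two columns literally named 'id'
side_by_side = pd.concat([data, s], axis=1)
print(side_by_side.columns.tolist())  # ['a', 'b', 'id', 'id', 'hhh']
```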

Solution

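The first idea is to delete the column by name, with DataFrame.drop('column name', axis=1) or del DataFrame['column name'].

But if the merged table contains several columns with the same name, these methods remove every one of them, including the copy we want to keep, so they do not meet our requirement.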
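A short demonstration of that pitfall on fixed illustrative data. It also shows one common alternative that is not from the original article: Index.duplicated() marks every repeat of a column name after its first occurrence, so selecting the unmarked columns keeps exactly one copy of each name. This deduplicates by name only and does not check that the repeated columns hold the same values.

```python
import pandas as pd

# Illustrative fixed data; 'a' is a hypothetical attribute column
data = pd.DataFrame({'a': [10, 11, 12], 'id': [1, 2, 3]})
s = pd.DataFrame({'id': [1, 2, 3], 'hhh': [2, 2, 3]})
df = pd.concat([data, s], axis=1)  # columns: a, id, id, hhh

# drop() by name removes *every* column called 'id'
dropped = df.drop(columns='id')
print(dropped.columns.tolist())  # ['a', 'hhh']

# duplicated() is True for repeats after the first occurrence,
# so negating it keeps the first 'id' and discards the second
deduped = df.loc[:, ~df.columns.duplicated()]
print(deduped.columns.tolist())  # ['a', 'id', 'hhh']
```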

The following approach is based on a reference answer:

# pandas 0.15+: find the columns of s that data does not already have, then merge only those
cols_to_use = s.columns.difference(data.columns)
# rows are paired by index position, which is correct here because both tables are ordered by id
result = pd.merge(data, s[cols_to_use], left_index=True, right_index=True, how='outer')
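Here is a minimal end-to-end sketch of the approach above, assuming fixed illustrative values and hypothetical attribute columns a and b in place of the random ones. It also shows, as an alternative that is not part of the referenced answer, a merge on the id key itself: a key passed through on= appears only once in the result, and rows are matched by id value rather than by position.

```python
import pandas as pd

# Illustrative fixed tables (the article uses random values)
data = pd.DataFrame({'a': [10, 11, 12], 'b': [20, 21, 22], 'id': [1, 2, 3]})
sample = pd.DataFrame({'hhh': [5, 2, 7, 1, 3, 8, 4],
                       'id': [1, 1, 2, 2, 3, 3, 3]})
s = sample.groupby('id').count().reset_index()  # columns: id, hhh

# Keep only the columns of s that data does not already have
cols_to_use = s.columns.difference(data.columns)  # Index(['hhh'])

# Index-based merge: row 0 of data pairs with row 0 of s, and so on.
# This is only correct because both tables happen to be sorted by id.
result = pd.merge(data, s[cols_to_use], left_index=True, right_index=True, how='outer')
print(result)

# Alternative (not the referenced answer): join on the key column.
# 'id' appears once, and rows are matched by id value, not position.
by_key = pd.merge(data, s, on='id', how='outer')
print(by_key)
```

The key-based merge is the safer choice when the two tables are not guaranteed to be in the same row order; the index-based version relies on that ordering.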
