Merging two dataframes that have no common column amounts to computing their Cartesian product row by row: every row of one table is paired with every row of the other.
The final result is as follows
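As a small illustration of what such a result looks like, the sketch below builds two made-up tables (the names `left` and `right` and their columns are assumptions, not the tables used in this article) and pairs every row of one with every row of the other. It uses the built-in `how="cross"` option of `merge`, which as far as I know needs pandas 1.2 or later, purely to show the shape of the output.

```python
import pandas as pd

# Hypothetical example tables with no column in common.
left = pd.DataFrame({"name": ["a", "b"]})
right = pd.DataFrame({"year": [2020, 2021, 2022]})

# Every row of `left` is paired with every row of `right`: 2 * 3 = 6 rows.
result = left.merge(right, how="cross")
print(result)
```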
The following code is adapted from other people's code:
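import pandas as pd

def cartesian_df(A, B):
    # Start from an empty table that has the columns of A followed by those of B.
    new_df = pd.DataFrame(columns=list(A) + list(B))
    for _, A_row in A.iterrows():
        for _, B_row in B.iterrows():
            # Join the two rows into one and add it to the result.
            # pd.concat stands in for append, which pandas 2.0 removed.
            row = pd.concat([A_row, B_row])
            new_df = pd.concat([new_df, row.to_frame().T], ignore_index=True)
    return new_df
# This approach raises an error if the two tables have duplicate column names.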
The idea of this code is to loop over every pair of rows from the two tables. It runs slowly: the complexity should be O(m*n), where m is the number of rows in table A and n is the number of rows in table B.
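To make the O(m*n) estimate concrete, here is a minimal sketch that only counts the row pairs the nested loop has to visit. The sizes 8 and 142 are the ones from the timing test in this article; `count_row_pairs` is a hypothetical helper name.

```python
from itertools import product

def count_row_pairs(m, n):
    """Count the (A_row, B_row) pairs a nested loop over m and n rows visits."""
    return sum(1 for _ in product(range(m), range(n)))

pairs = count_row_pairs(8, 142)
print(pairs)  # 1136, i.e. 8 * 142
```

Each of those pairs also appends to a growing table, and every append copies what is already there, which is likely part of why the loop version feels slower than the pair count alone suggests.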
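I optimized the code above because the tables I was merging had many rows and it took too long.
The idea is to let the dataframe merge function do the pairing. First, loop to copy table A once for each row of table B, adding the loop counter as a column. Then give table B its row number as a matching column and merge the two tables on that column. The Python-level loop now runs only n times (n is the number of rows in table B), so the looping cost should be O(n), although the result still has m*n rows. The code is as follows: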
import pandas as pd

def cartesian_df(df_a, df_b):
    """Find the Cartesian product of two dataframes."""
    # Copy df_a n times, tagging each copy with its copy number.
    # assign() returns a new table, so the caller's df_a is not modified.
    copies = [df_a.assign(merge_index=i) for i in range(df_b.shape[0])]
    new_df_a = pd.concat(copies, ignore_index=True)
    # Give df_b a merge_index column equal to its row number.
    df_b = df_b.reset_index(drop=True)
    df_b['merge_index'] = df_b.index
    # Merge on the copy number, then drop the helper column.
    new_df = pd.merge(new_df_a, df_b, on=['merge_index'], how='left').drop(['merge_index'], axis=1)
    return new_df
# Neither of the two original tables may have a column named 'merge_index'.
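The restriction on the column name 'merge_index' can be checked before merging. The sketch below is one possible guard; the function name `check_helper_column_free` and the error message are my own assumptions, not part of the original code.

```python
import pandas as pd

def check_helper_column_free(df_a, df_b, helper="merge_index"):
    """Raise ValueError if either table already has a column named `helper`."""
    for label, df in (("df_a", df_a), ("df_b", df_b)):
        if helper in df.columns:
            raise ValueError(f"{label} already has a column named {helper!r}")

# Hypothetical tables: the second one collides with the helper column name.
ok = pd.DataFrame({"x": [1, 2]})
bad = pd.DataFrame({"merge_index": [0, 1]})

check_helper_column_free(ok, ok)  # passes silently
try:
    check_helper_column_free(ok, bad)
except ValueError as err:
    print(err)
```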
Tested with one table of 8 rows and one table of 142 rows, the pre-optimization method took 5.560689926147461 seconds.
The optimized method took 0.1296539306640625 seconds (with the 142-row table as table B).
By the reasoning above, passing the table with fewer rows as table B should be faster. Test time: 0.021603107452392578 seconds (with the 8-row table as table B).
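A rough way to repeat this comparison is sketched below. The tables are made-up stand-ins with 8 and 142 rows (not the tables measured above), `cartesian_by_merge_index` is a compact restatement of the optimized idea, and `time.perf_counter` is my choice of timer. The absolute numbers will differ by machine and pandas version.

```python
import time
import pandas as pd

def cartesian_by_merge_index(df_a, df_b):
    """Copy df_a once per row of df_b, then merge on the copy number."""
    copies = [df_a.assign(merge_index=i) for i in range(len(df_b))]
    stacked = pd.concat(copies, ignore_index=True)
    keyed_b = df_b.reset_index(drop=True)
    keyed_b["merge_index"] = keyed_b.index
    merged = stacked.merge(keyed_b, on="merge_index", how="left")
    return merged.drop(columns="merge_index")

def timed(func, *args):
    """Return (result, elapsed seconds) for one call of func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

# Stand-in tables with the same row counts as in the test above.
small = pd.DataFrame({"s": range(8)})
large = pd.DataFrame({"l": range(142)})

res_large_b, t_large_b = timed(cartesian_by_merge_index, small, large)
res_small_b, t_small_b = timed(cartesian_by_merge_index, large, small)

print(f"142-row table as B: {t_large_b:.4f} s, {len(res_large_b)} rows")
print(f"8-row table as B:   {t_small_b:.4f} s, {len(res_small_b)} rows")
```

Both calls produce the same number of rows. Only the number of copies made in the loop changes, which is where the difference in time should come from.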
This speed meets my expectations: there is basically no noticeable wait, so the optimization is done.