Often encountered in the work of the need for data output to excel, and the need to merge some of these cells, such as the following table form, the need to merge the value of column A, B, C, the corresponding cells of columns
The to_excel method in pandas can only merge indexes, and xlsxwriter, although there is a merge_range method, but it is just a basic method, every time you need to write a cumbersome test to finally tune, and can not be reused well. So I want to write a method by myself, combining dataframe and merge_range. roughly the idea is:
1, define a MY_DataFrame class, inheriting the DataFrame class, so that you can make good use of many of the features of pandas, without having to reorganize their own data structure.
2, define a my_mergewr_excel method, the parameters are: the path to the output excel, used to determine whether the merger of the key_cols list, used to specify which columns on the cells need to be merged list
3. Encapsulate MY_DataFrame as a My_Module module for reuse.
The algorithm for merging is as follows:
1, according to the given parameters of the [key columns], group counting and sorting, add CN and RN two auxiliary columns
2, to determine the CN is greater than 1, the group needs to be merged, otherwise the group (rows) do not need to be merged (CN = 1 indicates that this group of data rows is unique, no need to merge)
3, corresponding to the need to merge the group, determine whether the current column is in the given parameters [merge column], is to write excel cells with the merger, otherwise it is an ordinary write excel cells.
4, in the need to merge the columns, if for the RN = 1 for the call merge_range, a one-time write want to write CN cells, if the RN & gt; 1 to skip the cell, because in the RN = 1, has been merged to write the cell, if then repeat the call erge_range, open the excel document will be reported as an error.
Explain with a diagram as follows:
The specific code is as follows:
# -*- coding: utf-8 -*- """ Created on 20170301 @author: ARK-Z """ import xlsxwriter import pandas as pd class My_DataFrame(): def __init__(self, data=None, index=None, columns=None, dtype=None, copy=False): .__init__(self, data, index, columns, dtype, copy) def my_mergewr_excel(self,path,key_cols=[],merge_cols=[]): # sheet_name='Sheet1', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True, encoding=None, inf_rep='inf', verbose=True): self_copy=My_DataFrame(self,copy=True) line_cn=self_copy. cols=list(self_copy.) if all([v in cols for i,v in enumerate(key_cols)])==False: #Checks if each element of key_cols contains the same column as the object. print("key_cols is not completely include object's columns") return False if all([v in cols for i,v in enumerate(merge_cols)])==False: # Check if each element of merge_cols contains the same columns as the object. print("merge_cols is not completely include object's columns") return False wb2007 = (path) worksheet2007 = wb2007.add_worksheet() format_top = wb2007.add_format({'border':1,'bold':True,'text_wrap':True}) format_other = wb2007.add_format({'border':1,'valign':'vcenter'}) for i,value in enumerate(cols): # Write table header #print(value) (0,i,value,format_top) #merge_cols=['B','A','C'] #key_cols=['A','B'] if key_cols ==[]: #If the key_cols parameter is not passed, no merge is required. self_copy['RN']=1 self_copy['CN']=1 else: self_copy['RN']=self_copy.groupby(key_cols,as_index=False).rank(method='first').ix[:,0] #key_cols as the basis for whether to merge or not self_copy['CN']=self_copy.groupby(key_cols,as_index=False).rank(method='max').ix[:,0] #print(self) for i in range(line_cn): if self_copy.ix[i,'CN']>1: #print('This row has cells that need to be merged') for j,col in enumerate(cols): #print(self_copy.ix[i,col]) if col in (merge_cols): # Which columns need to be merged if self_copy.ix[i,'RN']==1: # Merge writes the first cell, the next first will no longer be written worksheet2007.merge_range(i+1,j,i+int(self_copy.ix[i,'CN']),j, self_copy.ix[i,col],format_other) ## Merge cells, according to LINE_SET[7] to determine the need to merge several #(i+1,j,[i,col]) else: pass #(i+1,j,[i,j]) else: (i+1,j,self_copy.ix[i,col],format_other) #print(',') else: #print('There are no cells in this row that need to be merged') for j,col in enumerate(cols): #print([i,col]) (i+1,j,self_copy.ix[i,col],format_other) () self_copy.drop('CN', axis=1) self_copy.drop('RN', axis=1)
Calling code:
import My_Module DF=My_DataFrame({'A':[1,2,2,2,3,3],'B':[1,1,1,1,1,1],'C':[1,1,1,1,1,1],'D':[1,1,1,1,1,1]}) DF Out[120]: A B C D 0 1 1 1 1 1 2 1 1 1 2 2 1 1 1 3 2 1 1 1 4 3 1 1 1 5 3 1 1 1 DF.my_mergewr_excel('000_2.xlsx',['A'],['B','C'])
The effect is as follows:
You can also set up to merge columns A and B:
DF.my_mergewr_excel('000_2.xlsx',['A'],['A','B'])
The effect is as follows:
This is the whole content of this article.