SoFunction
Updated on 2024-12-13

Tips for exporting large amounts of data to Excel with Python

(1) Problem description: Excel files often present data better than plain text files, but how do you actually export data to Excel from Python? And what do you do when the amount of data to export is large?

This article addresses both questions.

(2) The specific steps are as follows:

1. The first step is to install openpyxl.

You can simply run pip install openpyxl, but the version you get may differ between machines: on my Windows machine it installed version 2.2.6, while on CentOS it automatically installed a newer release, reported as 4.1 (thanks to Hai for the reminder).

The code ran without problems on Windows but failed on CentOS with an error saying that ew = ExcelWriter(workbook=wb) was missing a required argument. So I pinned version 2.2.6 on the 237 server as well, and the problem went away.

pip install openpyxl==2.2.6
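
Because the failure above came from two machines silently running different openpyxl releases, a script can check the installed version before doing any work. The following is only a minimal sketch in Python 3 (the article's own listing targets Python 2); the helper names installed_version and matches_pin are hypothetical and not part of openpyxl.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(installed, pinned):
    """True only when a version is installed and equals the pinned one exactly."""
    return installed is not None and installed == pinned


if __name__ == "__main__":
    found = installed_version("openpyxl")
    if not matches_pin(found, "2.2.6"):
        # Warn instead of failing later with a confusing ExcelWriter error.
        print("openpyxl %s found, but this script was written against 2.2.6" % found)
```

Running such a check at the top of the export script turns a confusing runtime error into an explicit message about the version mismatch.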

2. The second step is simply the code itself, given in section (4). Note that it contains two implementations, one based on xlwt and one based on openpyxl.

(3) Further reading: after going through the material available online, where opinions vary, I summarize the situation as follows:

There are two groups of Python libraries for working with Excel: xlrd, xlwt and xlutils on one side, and openpyxl on the other.

The former group (xlrd, xlwt) is relatively old and is aimed at the xls format of Excel 97-2003. xlwt does not support the Excel 2007 format at all, and an xls worksheet can hold at most 256 columns and 65536 rows.
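
To make the 65536-row limit concrete: staying with xls means splitting a large result set across several sheets. The sketch below (Python 3, standard library only) shows one way to cut an iterable of records into sheet-sized chunks; split_for_xls and the constants are hypothetical names introduced for illustration, and one row per sheet is assumed to be reserved for the header.

```python
from itertools import islice

XLS_MAX_ROWS = 65536  # row limit of an xls worksheet
XLS_MAX_COLS = 256    # column limit of an xls worksheet


def split_for_xls(records, header_rows=1):
    """Yield lists of records, each small enough to fit on one xls sheet."""
    per_sheet = XLS_MAX_ROWS - header_rows
    iterator = iter(records)
    while True:
        # Take at most `per_sheet` records without loading everything at once.
        chunk = list(islice(iterator, per_sheet))
        if not chunk:
            return
        yield chunk
```

Each chunk would then be written to its own sheet, which works but leaves the reader with data scattered over many sheets.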

So when you need to export a large amount of data to Excel, you have the following three options.

(1) Change the storage format, such as saving as a CSV file

(2) Use openpyxl, since it supports the Excel 2007+ xlsx/xlsm formats

(3) win32 COM (Windows only)
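
For comparison, option (1) needs nothing beyond the standard library. The following is a rough Python 3 sketch that streams tab-separated text lines (the same input shape the article's listing reads from its txt file) into CSV; write_records_as_csv is a hypothetical helper name.

```python
import csv
import io


def write_records_as_csv(lines, header, out):
    """Write `header`, then each tab-separated line, to `out` as CSV rows.

    Returns the number of data rows written.
    """
    writer = csv.writer(out)
    writer.writerow(header)
    count = 0
    for line in lines:
        # One record per line; fields are separated by tabs.
        writer.writerow(line.rstrip("\n").split("\t"))
        count += 1
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    write_records_as_csv(["1\tAnn\t555\n"], ["id", "name", "phone"], buf)
    print(buf.getvalue())
```

When writing to a real file, opening it with newline="" and encoding="utf-8-sig" is a common choice, since the byte order mark helps Excel recognise the file as UTF-8.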

We decided to face the difficulty head on and chose the second option, so that the data could be presented better to the product team and the users.

P.S. Luckily, after some searching I found openpyxl. It supports Excel 2007+ files, is actively maintained, and has clear, readable documentation; with the Tutorial and the API documentation you can get started very quickly.
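
For readers on a current openpyxl release and Python 3, the library's write-only mode is worth knowing about for large exports, because rows are streamed to the file instead of being kept as cell objects in memory. The sketch below is not the author's method (the listing in this article targets openpyxl 2.2.6 on Python 2); export_rows and read_rows are hypothetical helper names.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def export_rows(rows, header, path, sheet_title="data"):
    """Write `header` and `rows` to an xlsx file at `path`; return the row count."""
    wb = Workbook(write_only=True)   # write-only workbooks start with no sheets
    ws = wb.create_sheet(title=sheet_title)
    ws.append(list(header))
    count = 0
    for row in rows:                 # `rows` may be a generator
        ws.append(list(row))
        count += 1
    wb.save(path)
    return count


def read_rows(path):
    """Read every row of the first sheet back as a list of lists."""
    wb = load_workbook(path)
    ws = wb.worksheets[0]
    return [list(r) for r in ws.iter_rows(values_only=True)]


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
    written = export_rows(([i, "name%d" % i] for i in range(10)), ["id", "name"], demo_path)
    print(written, read_rows(demo_path)[:2])
```

Passing a generator keeps memory use flat on the producing side as well, which matters most when the records come from a database cursor or a large text file.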

(4) Without further ado, let's go straight to the code.

# coding:utf-8
'''
# I hope it's helpful # # Please ask more questions #
create by yaoyz
date: 2017/01/24
'''
import xlrd
import xlwt
# Workbook class
from openpyxl.workbook import Workbook
# ExcelWriter wraps openpyxl's Excel writing functionality
from openpyxl.writer.excel import ExcelWriter
# Helper that converts a column number to its column letter
from openpyxl.utils import get_column_letter
from openpyxl.reader.excel import load_workbook
class HandleExcel():
 '''Excel related operation classes'''
 def __init__(self):
  self.head_row_labels = [u'Student ID',u'Student Name',u'Contacts',u'Knowledge Point ID',u'Knowledge point name']
 """
  function:
   Read every record from a txt file and store it in a list
  Param:
   filename: name of the file to read
  Return:
   res_list: list of the records read
 """
 def read_from_file(self,filename):
  res_list=[]
  file_obj=open(filename,"r")
  for line in file_obj.readlines():
   res_list.append(line)
  file_obj.close()
  return res_list
 """
  function:
   Read every record from an *.xlsx file and return them in data_dic
  Param:
   excel_name: name of the file to read
  Return:
   data_dic: dict of the records read, keyed by row number starting at 1
 """
 def read_excel_with_openpyxl(self, excel_name=""):
  # Read excel 2007 files
  wb = load_workbook(filename=excel_name)
  # Show how many tables
  print "Worksheet range(s):" , wb.get_named_ranges()
  print "Worksheet name(s):" , wb.get_sheet_names()
  # Take the first table
  sheetnames = wb.get_sheet_names()
  ws = wb.get_sheet_by_name(sheetnames[0])
  # Display table name, number of table rows, number of table columns
  print "Work Sheet Title:" ,ws.title
  print "Work Sheet Rows:" ,ws.get_highest_row()
  print "Work Sheet Cols:" ,ws.get_highest_column()
  # Get the number of rows and columns of the incoming excel table.
  row_num=ws.get_highest_row()
  col_num=ws.get_highest_column()
  print "row_num: ",row_num," col_num: ",col_num
  # Create dictionaries to store data
  data_dic = {}
  sign=1
  # Storing data in a dictionary
  for row in ws.rows:
   temp_list=[]
   # print "row",row
   for cell in row:
     print cell.value,
     temp_list.append(cell.value)
   print ""
   data_dic[sign]=temp_list
   sign+=1
  print data_dic
  return data_dic
 """
  function:
   Write the header row and every record to an *.xlsx file
  Param:
   records: list containing every record to save
   head_row: list of column titles for the first row
   save_excel_name: file name to save as
  Return:
   None
 """
 def write_to_excel_with_openpyxl(self,records,head_row,save_excel_name=""):
  # Create a new workbook
  wb = Workbook()
  # Create a new excelWriter
  ew = ExcelWriter(workbook=wb)
  # Set the file output path and name
  dest_filename = save_excel_name.decode('utf-8')
  # The first sheet is ws
  ws = wb.worksheets[0]
  # Set the title of ws
  ws.title = "range names"
  # Write the first row, the header row
  for h_x in range(1,len(head_row)+1):
   h_col=get_column_letter(h_x)
   #print h_col
   ws.cell('%s%s' % (h_col, 1)).value = '%s' % (head_row[h_x-1])
  # Write the second and subsequent rows
  i = 2
  for record in records:
   record_list=str(record).strip().split("\t")
   for x in range(1,len(record_list)+1):
    col = get_column_letter(x)
    ws.cell('%s%s' % (col, i)).value = '%s' % (record_list[x-1].decode('utf-8'))
   i += 1
  # Write the file
  ew.save(filename=dest_filename)
 """
  function:
   Test outputting Excel content
   Read the Excel file
  Param.
   excel_name: name of the Excel file to be read out
  Return.
   None
 """
 def read_excel(self,excel_name):
  workbook=xlrd.open_workbook(excel_name)
  # Get the names of all sheets
  print workbook.sheet_names() # [u'sheet1', u'sheet2']
  sheet2_name = workbook.sheet_names()[1]
  # Get sheet content by sheet index or name
  sheet2 = workbook.sheet_by_index(1) # The sheet index starts at 0
  sheet2 = workbook.sheet_by_name('Sheet1')
  # Name of sheet, number of rows, number of columns
  print sheet2.name,sheet2.nrows,sheet2.ncols
  # Get the values of a whole row and a whole column (as lists)
  rows = sheet2.row_values(3) # Get the fourth row
  cols = sheet2.col_values(2) # Get the contents of the third column
  print rows
  print cols
  # Get cell contents
  print sheet2.cell(1,0).value
  print sheet2.cell_value(1,0)
  print sheet2.row(1)[0].value
  # Get the data type of the cell contents
  print sheet2.cell(1,0).ctype
  # By name
  return workbook.sheet_by_name(u'Sheet1')
 """
  function:
   Set the cell style
  Param:
   name: name of the font
   height: height of the font
   bold: whether the font is bold
  Return:
   style: the configured style object
 """
 def set_style(self,name,height,bold=False):
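  style = xlwt.XFStyle() # Initialize the style
  font = xlwt.Font() # Create a font for the style
  font.name = name # 'Times New Roman'
  font.bold = bold
  font.color_index = 4
  font.height = height
  borders= xlwt.Borders()
  borders.left= 6
  borders.right= 6
  borders.top= 6
  borders.bottom= 6
  style.font = font
  style.borders = borders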
  return style
 """
  function:
   Write the records read from the txt file to an Excel (xls) file, applying cell styles
  Param:
   dataset: the result data to save, stored as a list
   save_excel_name: file name to save as
   head_row: list of column titles for the first row
  Return:
   None; the result is saved as an Excel file
 """
 def write_to_excel(self, dataset,save_excel_name,head_row):
  f = xlwt.Workbook() # Create the workbook
  # Create the first sheet.
  # sheet1
  count=1
  sheet1 = f.add_sheet(u'sheet1', cell_overwrite_ok=True) # Create the sheet
  # First row: the header
  for p in range(len(head_row)):
    sheet1.write(0,p,head_row[p],self.set_style('Times New Roman',250,True))
  default=self.set_style('Times New Roman',200,False) # define the style outside the loop so it is created only once
  for line in dataset:
   row_list=str(line).strip("\n").split("\t")
   for pp in range(len(row_list)):
    sheet1.write(count,pp,row_list[pp].decode('utf-8'),default)
   count+=1
  f.save(save_excel_name) # Save the file
 def run_main_save_to_excel_with_openpyxl(self):
  print "Test read/write 2007 and later excel file xlsx to facilitate writing more data to the file"
  print "1. Read txt file into memory and store it as a list object."
  dataset_list=self.read_from_file("test_excel.txt")
  '''test use openpyxl to handle EXCEL 2007'''
  print "2. Writing documents to Excel tables"
  head_row_label=self.head_row_labels
  save_name="test_openpyxl.xlsx"
  self.write_to_excel_with_openpyxl(dataset_list,head_row_label,save_name)
  print "3. Tasks completed, saved from txt format file to Excel file"
 def run_main_save_to_excel_with_xlwt(self):
  print " 4. read the txt file into memory and store it as a list object "
  dataset_list=self.read_from_file("test_excel.txt")
  '''test use xlwt to handle EXCEL 97-2003'''
  print " 5. Write the document to an Excel spreadsheet."
  head_row_label=self.head_row_labels
  save_name="test_xlwt.xls"
  self.write_to_excel(dataset_list,save_name,head_row_label)
  print "6. Tasks completed, saved from txt format file to Excel file"
if __name__ == '__main__':
 print "create handle Excel Object"
 obj_handle_excel=HandleExcel()
 # Write data to a file using openpyxl and xlwt, respectively
 obj_handle_excel.run_main_save_to_excel_with_openpyxl()
 obj_handle_excel.run_main_save_to_excel_with_xlwt()
 '''Test reading out files, note that openpyxl cannot read xls files, xlrd cannot read xlsx format files'''
 #obj_handle_excel.read_excel_with_openpyxl("") # Wrong way to write it
 #obj_handle_excel.read_excel_with_openpyxl("") # Wrong way to write it
 obj_handle_excel.read_excel("")
 obj_handle_excel.read_excel_with_openpyxl("")
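
The openpyxl writer in the listing builds cell coordinates such as A1 or AB7 by passing a 1-based column number to get_column_letter. To show what that conversion involves, here is a rough pure-Python equivalent (Python 3). It is an illustration only, not openpyxl's implementation, and column_letter is a hypothetical name.

```python
def column_letter(index):
    """Convert a 1-based column index to Excel letters (1 -> 'A', 27 -> 'AA')."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = []
    while index > 0:
        # Column letters form a base-26 system without a zero digit,
        # hence the shift by one before each division.
        index, remainder = divmod(index - 1, 26)
        letters.append(chr(ord("A") + remainder))
    return "".join(reversed(letters))
```

With this numbering, column 256, the last column an xls sheet can hold, is IV.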

That is everything I have to share about exporting large amounts of data to Excel with Python. I hope it serves as a useful reference.