Introduction
In daily office work, we often need to adjust the formatting of Word documents and update their content. Python and its rich ecosystem of third-party libraries make it straightforward for developers to automate such tasks. This article shows how to use the python-docx library to copy the content and styles of a Word document, and how to use this technique to automate updates to the document's content.
Environment preparation
First, make sure the python-docx library is installed. If it is not, you can install it with the following command:
pip install python-docx
Implementing the main functions
- Copying run styles: the copy_paragraph_style function copies the character formatting of each run in an old paragraph (bold, italic, underline, font settings, and so on) to the corresponding run in the newly created paragraph.
- Identifying page breaks: the is_page_break function detects whether a document element contains a page break, which is important for keeping the layout of the new document consistent with the original.
- Cloning paragraphs and tables: the clone_paragraph and clone_table functions create new paragraphs and tables from those in the old document while keeping the original style settings.
- Copying cell borders: so that a generated table looks the same as the original, the copy_cell_borders function copies the border settings of each cell.
- Copying the whole document: finally, the clone_document function copies the content and styles of the entire document into a new Word document.
Sample code
Here is a simplified version of the core code, showing how to walk through the content of an old document and build a new document:
```python
from docx import Document
from docx.oxml.ns import qn
# Assume the other required imports and helper functions are included here


def clone_document(old_doc_path, new_doc_path, out_text_list):
    try:
        # Load the old document and create a new one
        old_doc = Document(old_doc_path)
        new_doc = Document()

        # Copy the main content, walking the body elements in document order
        elements = old_doc.element.body
        para_index = 0
        table_index = 0
        index = 0
        while index < len(elements):
            element = elements[index]
            if element.tag == qn('w:p'):
                # Process paragraphs...
                para_index += 1
            elif element.tag == qn('w:tbl'):
                # Process tables...
                table_index += 1
            index += 1

        # Save the new document
        new_doc.save(new_doc_path)
        print(f"The document has been saved to: {new_doc_path}")
    except Exception as e:
        print(f"An error occurred while copying the document: {e}")
```
Conclusion
With the approach above, we can efficiently copy the content and styles of a Word document, which provides an effective basis for automating document processing. Depending on your needs, you can extend and improve this basic framework further, for example by supporting more styles or optimizing performance.
I hope this article serves as a useful reference and helps you process Word documents more efficiently in your daily work.
Code for generating the document (filling the template)
````python
from docx import Document
from docx.enum.text import WD_BREAK
from docx.oxml import OxmlElement
from docx.oxml.ns import qn

from copy_word_only_text_json import clone_document as gen_to_list


def copy_paragraph_style(run_from, run_to):
    """Copy the character formatting of one run to another"""
    run_to.bold = run_from.bold
    run_to.italic = run_from.italic
    run_to.underline = run_from.underline
    # Font-level formatting (the exact attribute list is an assumption)
    run_to.font.size = run_from.font.size
    run_to.font.name = run_from.font.name
    run_to.font.color.rgb = run_from.font.color.rgb
    run_to.font.all_caps = run_from.font.all_caps
    run_to.font.strike = run_from.font.strike
    run_to.font.shadow = run_from.font.shadow


def is_page_break(element):
    """Return True if a body element is a paragraph that contains a page break"""
    # <w:br> lives inside a run (<w:r>), so search all descendants, not only direct children.
    # A table needs no look-ahead: the paragraph that follows it is checked on its own.
    if element.tag == qn('w:p'):
        for br in element.iter(qn('w:br')):
            if br.get(qn('w:type')) == 'page':
                return True
    return False


def clone_paragraph(old_para, new_doc, out_text_list):
    """Create a new paragraph from an old one, taking its text from out_text_list"""
    new_para = new_doc.add_paragraph()
    if old_para.style:
        new_para.style = old_para.style
    for old_run in old_para.runs:
        # One replacement text per run, consumed in extraction order
        new_run = new_para.add_run(out_text_list.pop(0))
        copy_paragraph_style(old_run, new_run)
    new_para.alignment = old_para.alignment
    return new_para


def copy_cell_borders(old_cell, new_cell):
    """Copy the border style of a cell"""
    old_tc = old_cell._tc
    new_tc = new_cell._tc
    # Only this cell's own borders, not those of a table nested inside it
    old_borders = old_tc.xpath('./w:tcPr/w:tcBorders')
    if old_borders:
        old_border = old_borders[0]
        new_border = OxmlElement('w:tcBorders')
        border_types = ['top', 'left', 'bottom', 'right', 'insideH', 'insideV']
        for border_type in border_types:
            old_element = old_border.find(f'.//w:{border_type}', namespaces={
                'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
            })
            if old_element is not None:
                new_element = OxmlElement(f'w:{border_type}')
                # Copy every attribute (w:val, w:sz, w:space, w:color, ...)
                for attr, value in old_element.attrib.items():
                    new_element.set(attr, value)
                new_border.append(new_element)
        tc_pr = new_tc.get_or_add_tcPr()
        tc_pr.append(new_border)


def clone_table(old_table, new_doc, out_text_list):
    """Create a new table from an old table"""
    new_table = new_doc.add_table(rows=len(old_table.rows), cols=len(old_table.columns))
    if old_table.style:
        new_table.style = old_table.style
    for i, old_row in enumerate(old_table.rows):
        for j, old_cell in enumerate(old_row.cells):
            new_cell = new_table.cell(i, j)
            # Remove the empty default paragraph of the new cell
            for paragraph in new_cell.paragraphs:
                new_cell._element.remove(paragraph._element)
            for old_paragraph in old_cell.paragraphs:
                new_paragraph = new_cell.add_paragraph()
                for old_run in old_paragraph.runs:
                    new_run = new_paragraph.add_run(out_text_list.pop(0))
                    copy_paragraph_style(old_run, new_run)
                new_paragraph.alignment = old_paragraph.alignment
            copy_cell_borders(old_cell, new_cell)
    # Copy column widths where they are set
    for i, col in enumerate(old_table.columns):
        if col.width is not None:
            new_table.columns[i].width = col.width
    return new_table


def clone_document(old_doc_path, new_doc_path, out_text_list):
    try:
        old_doc = Document(old_doc_path)
        new_doc = Document()

        # # Copy section breaks and header/footer
        # for old_section in old_doc.sections:
        #     new_section = new_doc.add_section(start_type=old_section.start_type)
        #     new_section.left_margin = old_section.left_margin
        #     new_section.right_margin = old_section.right_margin
        #     # Other section attributes...
        #
        #     # Header
        #     for para in old_section.header.paragraphs:
        #         new_para = new_section.header.add_paragraph()
        #         for run in para.runs:
        #             new_run = new_para.add_run(run.text)
        #             copy_paragraph_style(run, new_run)
        #         new_para.alignment = para.alignment
        #
        #     # Footer
        #     for para in old_section.footer.paragraphs:
        #         new_para = new_section.footer.add_paragraph()
        #         for run in para.runs:
        #             new_run = new_para.add_run(run.text)
        #             copy_paragraph_style(run, new_run)
        #         new_para.alignment = para.alignment

        # Copy the main content, walking the body elements in document order
        elements = old_doc.element.body
        old_paragraphs = old_doc.paragraphs  # build these lists once, not on every iteration
        old_tables = old_doc.tables
        para_index = 0
        table_index = 0
        index = 0
        while index < len(elements):
            element = elements[index]
            if element.tag == qn('w:p'):
                clone_paragraph(old_paragraphs[para_index], new_doc, out_text_list)
                para_index += 1
            elif element.tag == qn('w:tbl'):
                clone_table(old_tables[table_index], new_doc, out_text_list)
                table_index += 1
            elif element.tag == qn('w:br') and element.get(qn('w:type')) == 'page':
                if index > 0:
                    new_doc.add_paragraph().add_run().add_break(WD_BREAK.PAGE)
            index += 1

            # If the next element contains a page break, add one to the new document.
            # The element itself is not skipped, so para_index stays in step with the body.
            if index < len(elements) and is_page_break(elements[index]):
                new_doc.add_paragraph().add_run().add_break(WD_BREAK.PAGE)

        if new_doc_path:
            new_doc.save(new_doc_path)
            print(f"The document has been saved to: {new_doc_path}")
        else:
            return out_text_list
    except Exception as e:
        print(f"An error occurred while copying the document: {e}")


# User example
if __name__ == "__main__":
    out = gen_to_list('.docx', '')
    if out:
        print("Document content:\n", out, """Please modify the document content as the user requests, without changing the order or the number of items. Finally, output the content list in the given JSON:
```json
{"Output":[]}
```
User input: Please polish""")
        # Placeholders: send the prompt to an LLM, parse the "Output" list from its reply,
        # and pass that list (instead of `out`) to clone_document below
        print("Request llm")
        print("Extract json")
        print("Fill in template")
        out = clone_document('.docx', 'only_text.docx', out)
````
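The script above only prints placeholders for the "Request llm" and "Extract json" steps. The following is a minimal sketch of the extraction step, assuming the model answers with a fenced json block shaped like the `{"Output":[]}` template in the prompt; `extract_output_list` is a hypothetical name. It also checks the item count, because clone_document pops exactly one text per run:

```python
import json
import re

# A fenced json block; `{3} stands for three backticks
JSON_FENCE = re.compile(r"`{3}json\s*(\{.*?\})\s*`{3}", re.DOTALL)


def extract_output_list(reply, expected_len):
    """Return the "Output" list from an LLM reply, checking its length."""
    match = JSON_FENCE.search(reply)
    # Fall back to parsing the whole reply if it has no fenced block
    payload = match.group(1) if match else reply
    data = json.loads(payload)
    output = data.get("Output") if isinstance(data, dict) else None
    if not isinstance(output, list):
        raise ValueError('reply has no "Output" list')
    if len(output) != expected_len:
        # clone_document pops one text per run, so the counts must match
        raise ValueError(f"expected {expected_len} items, got {len(output)}")
    return [str(item) for item in output]


if __name__ == "__main__":
    fence = "`" * 3
    reply = "Polished text:\n" + fence + 'json\n{"Output": ["Hello", "World"]}\n' + fence
    print(extract_output_list(reply, 2))  # ['Hello', 'World']
```

With a real model reply in `reply`, the fill step would then become `clone_document('.docx', 'only_text.docx', extract_output_list(reply, len(out)))`.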
Code for generating the text list (the module imported above as copy_word_only_text_json)
````python
from docx import Document
from docx.enum.text import WD_BREAK
from docx.oxml import OxmlElement
from docx.oxml.ns import qn


def copy_paragraph_style(run_from, run_to):
    """Copy the character formatting of one run to another"""
    run_to.bold = run_from.bold
    run_to.italic = run_from.italic
    run_to.underline = run_from.underline
    # Font-level formatting (the exact attribute list is an assumption)
    run_to.font.size = run_from.font.size
    run_to.font.name = run_from.font.name
    run_to.font.color.rgb = run_from.font.color.rgb
    run_to.font.all_caps = run_from.font.all_caps
    run_to.font.strike = run_from.font.strike
    run_to.font.shadow = run_from.font.shadow


def is_page_break(element):
    """Return True if a body element is a paragraph that contains a page break"""
    # <w:br> lives inside a run (<w:r>), so search all descendants, not only direct children.
    # A table needs no look-ahead: the paragraph that follows it is checked on its own.
    if element.tag == qn('w:p'):
        for br in element.iter(qn('w:br')):
            if br.get(qn('w:type')) == 'page':
                return True
    return False


def clone_paragraph(old_para, new_doc, out_text_list):
    """Create a new paragraph from an old one and record the text of each run"""
    new_para = new_doc.add_paragraph()
    if old_para.style:
        new_para.style = old_para.style
    for old_run in old_para.runs:
        out_text_list.append(old_run.text)
        new_run = new_para.add_run(old_run.text)
        copy_paragraph_style(old_run, new_run)
    new_para.alignment = old_para.alignment
    return new_para


def copy_cell_borders(old_cell, new_cell):
    """Copy the border style of a cell"""
    old_tc = old_cell._tc
    new_tc = new_cell._tc
    # Only this cell's own borders, not those of a table nested inside it
    old_borders = old_tc.xpath('./w:tcPr/w:tcBorders')
    if old_borders:
        old_border = old_borders[0]
        new_border = OxmlElement('w:tcBorders')
        border_types = ['top', 'left', 'bottom', 'right', 'insideH', 'insideV']
        for border_type in border_types:
            old_element = old_border.find(f'.//w:{border_type}', namespaces={
                'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
            })
            if old_element is not None:
                new_element = OxmlElement(f'w:{border_type}')
                # Copy every attribute (w:val, w:sz, w:space, w:color, ...)
                for attr, value in old_element.attrib.items():
                    new_element.set(attr, value)
                new_border.append(new_element)
        tc_pr = new_tc.get_or_add_tcPr()
        tc_pr.append(new_border)


def clone_table(old_table, new_doc, out_text_list):
    """Create a new table from an old table and record the text of each run"""
    new_table = new_doc.add_table(rows=len(old_table.rows), cols=len(old_table.columns))
    if old_table.style:
        new_table.style = old_table.style
    for i, old_row in enumerate(old_table.rows):
        for j, old_cell in enumerate(old_row.cells):
            new_cell = new_table.cell(i, j)
            # Remove the empty default paragraph of the new cell
            for paragraph in new_cell.paragraphs:
                new_cell._element.remove(paragraph._element)
            for old_paragraph in old_cell.paragraphs:
                new_paragraph = new_cell.add_paragraph()
                for old_run in old_paragraph.runs:
                    out_text_list.append(old_run.text)
                    new_run = new_paragraph.add_run(old_run.text)
                    copy_paragraph_style(old_run, new_run)
                new_paragraph.alignment = old_paragraph.alignment
            copy_cell_borders(old_cell, new_cell)
    # Copy column widths where they are set
    for i, col in enumerate(old_table.columns):
        if col.width is not None:
            new_table.columns[i].width = col.width
    return new_table


def clone_document(old_doc_path, new_doc_path):
    out_text_list = []
    try:
        old_doc = Document(old_doc_path)
        new_doc = Document()

        # # Copy section breaks and header/footer
        # for old_section in old_doc.sections:
        #     new_section = new_doc.add_section(start_type=old_section.start_type)
        #     new_section.left_margin = old_section.left_margin
        #     new_section.right_margin = old_section.right_margin
        #     # Other section attributes...
        #
        #     # Header
        #     for para in old_section.header.paragraphs:
        #         new_para = new_section.header.add_paragraph()
        #         for run in para.runs:
        #             new_run = new_para.add_run(run.text)
        #             copy_paragraph_style(run, new_run)
        #         new_para.alignment = para.alignment
        #
        #     # Footer
        #     for para in old_section.footer.paragraphs:
        #         new_para = new_section.footer.add_paragraph()
        #         for run in para.runs:
        #             new_run = new_para.add_run(run.text)
        #             copy_paragraph_style(run, new_run)
        #         new_para.alignment = para.alignment

        # Copy the main content, walking the body elements in document order
        elements = old_doc.element.body
        old_paragraphs = old_doc.paragraphs  # build these lists once, not on every iteration
        old_tables = old_doc.tables
        para_index = 0
        table_index = 0
        index = 0
        while index < len(elements):
            element = elements[index]
            if element.tag == qn('w:p'):
                clone_paragraph(old_paragraphs[para_index], new_doc, out_text_list)
                para_index += 1
            elif element.tag == qn('w:tbl'):
                clone_table(old_tables[table_index], new_doc, out_text_list)
                table_index += 1
            elif element.tag == qn('w:br') and element.get(qn('w:type')) == 'page':
                if index > 0:
                    new_doc.add_paragraph().add_run().add_break(WD_BREAK.PAGE)
            index += 1

            # If the next element contains a page break, add one to the new document.
            # The element itself is not skipped, so para_index stays in step with the body.
            if index < len(elements) and is_page_break(elements[index]):
                new_doc.add_paragraph().add_run().add_break(WD_BREAK.PAGE)

        if new_doc_path:
            new_doc.save(new_doc_path)
            print(f"The document has been saved to: {new_doc_path}")
        else:
            return out_text_list
    except Exception as e:
        print(f"An error occurred while copying the document: {e}")


# User example
if __name__ == "__main__":
    out = clone_document('Nanshan Three Defense Work Special Report.docx', '')
    if out:
        print("Document content:\n", out, """Please modify the document content as the user requests, without changing the order or the number of items. Finally, output the content list in the given JSON:
```json
{"Output":[]}
```
User input: Please polish""")
        print("Request llm")
        print("Extract json")
        print("Fill in template")
````