Basics of python-docx library
-
python-docx is an open-source Python library for creating and modifying Word (.docx) documents. If you need to work with Word documents from Python code, it is a well-established option.
It is free to use and simple to learn.
It covers many of the features that Microsoft Word offers, including paragraphs, styles, tables and images.
It is compatible with current versions of Python 3.
This tutorial covers the basics you need to work with Word documents using Python.
Each of the common features is explained with an easy-to-understand code sample.
-
Install python-docx
Download and install the latest version of Python from the Python website if you have not already done so.
Setting up a virtual environment is optional, but it is a best practice when working on Python projects.
Install python-docx using the command:
    pip install python-docx
-
Check if python-docx installed successfully
Run the command:
    pip freeze
The output should include lines like:
    lxml==4.7.1
    python-docx==0.8.11
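The same check can be made from Python itself. This is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata); the helper name installed_version is hypothetical and is not part of python-docx.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("python-docx")
print(version if version else "python-docx is not installed")
```

A None result means pip did not install the package into the interpreter that is running the script, which often points to a different virtual environment being active.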
-
Create new document
Blank Word documents are easy to create with python-docx. Import Document from the docx package, and import the os standard library module to build the file path.
Then create a Document object, add the necessary content, and save the file by calling the save() method.
    from docx import Document
    import os

    newDoc = Document()
    newDoc.add_paragraph("First Line")

    # Save next to this script; os.path.join keeps the path portable
    filepath = os.path.join(os.path.dirname(__file__), 'TestFile.docx')
    newDoc.save(filepath)
-
Open and read a document
To open a Word file, call Document and pass the file path as a parameter, as shown below. Then loop through the paragraphs, or any other type of content, to access the data.
    from docx import Document
    import os

    filepath = os.path.join(os.path.dirname(__file__), 'TestFile.docx')
    d = Document(filepath)
    print(len(d.paragraphs))
    for p in d.paragraphs:
        print(p.text)  # print the paragraph text, not the Paragraph object
To modify a document, open it, add or remove paragraphs as needed, apply styles, and save it.
-
Add text content to a document
Use the add_paragraph method to add text content. Content is added in the Normal style by default; pass a style name to apply a different style, as below. To leave a blank line, add an empty paragraph.
    from docx import Document
    import os

    newDoc = Document()
    newDoc.add_paragraph("Top Heading", style='Heading 1')  # heading
    newDoc.add_paragraph("First Line")                       # Normal style
    newDoc.add_paragraph("List 1", style='List Number')      # ordered list
    newDoc.add_paragraph("List 2", style='List Number')      # ordered list
    newDoc.add_paragraph("List 1", style='List Bullet')      # unordered list
    newDoc.add_paragraph("List 2", style='List Bullet')      # unordered list

    filepath = os.path.join(os.path.dirname(__file__), 'TestFile.docx')
    newDoc.save(filepath)
-
Align content
To align content left, right, center or justified, import WD_ALIGN_PARAGRAPH and set the paragraph's alignment property as below.
    from docx import Document
    from docx.enum.text import WD_ALIGN_PARAGRAPH
    import os

    newDoc = Document()
    htop = newDoc.add_paragraph("Top Heading", style='Heading 1')
    htop.alignment = WD_ALIGN_PARAGRAPH.CENTER
    c = newDoc.add_paragraph("This is a justified paragraph.\nThis is a justified paragraph")
    c.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY

    filepath = os.path.join(os.path.dirname(__file__), 'TestFile.docx')
    newDoc.save(filepath)
-
Apply styles - add_run function
To apply additional styles to part of a paragraph, use the add_run method. It returns a Run object to which styles can be applied.
    from docx import Document
    import os

    newDoc = Document()
    a1 = newDoc.add_paragraph().add_run('Bold Line')
    a1.bold = True

    filepath = os.path.join(os.path.dirname(__file__), 'TestFile.docx')
    newDoc.save(filepath)
-
Bold, Italics, Underline
We can make text bold, italic or underlined by setting the corresponding properties on a run, as shown below.
    from docx import Document
    import os

    newDoc = Document()
    p = newDoc.add_paragraph()
    # bold, italic and underline are properties of the run, not the paragraph
    p.add_run('Bold Content\n').bold = True
    p.add_run('Italic Content\n').italic = True
    p.add_run('Underline Content\n').underline = True

    filepath = os.path.join(os.path.dirname(__file__), 'TestFile.docx')
    newDoc.save(filepath)
-
Font color, size and family
To modify the font color, size or family, import RGBColor and Pt from the docx.shared module of python-docx and set them on the run's font.
    from docx import Document
    from docx.shared import RGBColor, Pt
    import os

    newDoc = Document()
    run = newDoc.add_paragraph().add_run('Test Content')
    run.font.color.rgb = RGBColor(115, 52, 100)
    run.font.name = 'Calibri'
    run.font.size = Pt(20)

    filepath = os.path.join(os.path.dirname(__file__), 'TestFile.docx')
    newDoc.save(filepath)
-
Tables
A table creation sample is provided below. To access the table content, loop over its rows and cells.
    from docx import Document
    import os

    newDoc = Document()
    t = newDoc.add_table(rows=2, cols=2)
    t.rows[0].cells[0].text = 'Cell 1'
    t.rows[0].cells[1].text = 'Cell 2'
    t.rows[1].cells[0].text = 'Cell 3'
    t.rows[1].cells[1].text = 'Cell 4'

    # Append a third row and fill it in
    t.add_row()
    t.rows[2].cells[0].text = 'Cell 5'
    t.rows[2].cells[1].text = 'Cell 6'

    filepath = os.path.join(os.path.dirname(__file__), 'TestFile.docx')
    newDoc.save(filepath)
-
Tabbed paragraph
Use the add_tab method of a run, as below, to create a tabbed paragraph.
    from docx import Document
    import os

    newDoc = Document()
    p = newDoc.add_paragraph()
    p.add_run().add_tab()
    p.add_run('After Tab')

    filepath = os.path.join(os.path.dirname(__file__), 'TestFile.docx')
    newDoc.save(filepath)
-
Add Image
To add an image to the Word file and set its size, import Inches from the docx.shared module of python-docx.
    from docx import Document
    from docx.shared import Inches
    import os

    newDoc = Document()
    # Placeholder: point imgpath at an existing image file
    imgpath = os.path.join(os.path.dirname(__file__), 'TestImage.png')
    img = newDoc.add_picture(imgpath, height=Inches(3), width=Inches(2))

    filepath = os.path.join(os.path.dirname(__file__), 'TestFile.docx')
    newDoc.save(filepath)
-
Loop through paragraphs
We can use a for loop to navigate through the contents of the document. Content can be accessed by type, such as paragraphs, tables and inline images; headings are paragraphs with a heading style. Alternatively, we can give a group of contents a common property, for example a shared style, and access them by it.
To see how this works, refer to the sample application link provided below.
    import docx as doc
    import os

    filepath = os.path.join(os.path.dirname(__file__), 'TestFile.docx')
    d = doc.Document(filepath)
    print(len(d.paragraphs))
    for p in d.paragraphs:
        print(p.text)  # print the paragraph text, not the Paragraph object
-
python-docx Samples