Basics of python-docx library

  1. Install python-docx

    Download and install the latest version of Python from the Python website, if you have not already done so.

    Setting up a virtual environment is optional, but it is a best practice when working on Python projects.

    Install python-docx using the command: pip install python-docx

  2. Check if python-docx installed successfully

    Run the command: pip freeze
    The output should include lines like:

    lxml==4.7.1
    python-docx==0.8.11
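
The same check can also be done from Python. This is a minimal sketch that uses the standard library's importlib.metadata (Python 3.8 or later); the helper name installed_version is made up for this illustration and is not part of python-docx.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# "python-docx" is the distribution name used with pip install
docx_version = installed_version("python-docx")
if docx_version is None:
    print("python-docx is not installed")
else:
    print(f"python-docx {docx_version} is installed")
```

If the package is missing, the helper returns None instead of raising, so the script can print a friendly message.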
    
  3. Create new document

    Blank Word documents can be created easily with python-docx. Import Document from the docx package, and also import the os standard library module, which is used here to build the file path.

    Then create a Document object, add the necessary content, and save the file by calling the save() method.

    from docx import Document
    import os
    
    newDoc = Document()
    newDoc.add_paragraph("First Line")
    
    filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'TestFile.docx')
    newDoc.save(filepath)
    
  4. Open and read a document

    To open a Word file, call Document and pass the file path as a parameter, as shown below. Then loop through the paragraphs, headings, or any other type of content to access the data.

    from docx import Document
    import os
    
    filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'TestFile.docx')
    d = Document(filepath)
    
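    # Number of paragraphs in the document
    print(len(d.paragraphs))
    
    for p in d.paragraphs:
        # p is a Paragraph object; its text attribute holds the content
        print(p.text)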
    

    To modify a document, open it, add or remove paragraphs as needed, apply styles, and save it.

  5. Add text content to a document

    To add text content, use the add_paragraph method. Content is added in the 'Normal' style by default; apply other styles as shown below. To add a blank line, add an empty paragraph.

    from docx import Document
    import os
    
    newDoc = Document()
    newDoc.add_paragraph("Top Heading", style='Heading 1')  # heading
    newDoc.add_paragraph("First Line")  # default 'Normal' style
    newDoc.add_paragraph("List 1", style='List Number')  # ordered list item
    newDoc.add_paragraph("List 2", style='List Number')  # ordered list item
    
    newDoc.add_paragraph("List 1", style='List Bullet')  # unordered list item
    newDoc.add_paragraph("List 2", style='List Bullet')  # unordered list item
    
    filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'TestFile.docx')
    newDoc.save(filepath)
    
  6. Align content

    To align content left, right, center or justified, import WD_ALIGN_PARAGRAPH and use it as below.

    from docx import Document
    from docx.enum.text import WD_ALIGN_PARAGRAPH
    import os
    
    newDoc = Document()
    htop = newDoc.add_paragraph("Top Heading", style = 'Heading 1')
    htop.alignment = WD_ALIGN_PARAGRAPH.CENTER
    
    c = newDoc.add_paragraph("This is a justified paragraph.\nThis is a justified paragraph")
    c.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY
    
    filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'TestFile.docx')
    newDoc.save(filepath)
    
  7. Apply styles - add_run function

    To apply character-level styles to a piece of text, use the add_run function. It returns a Run object whose properties control the formatting of that text.

    from docx import Document
    import os
    
    newDoc = Document()
    
    a1 = newDoc.add_paragraph().add_run('Bold Line')
    a1.bold = True
    
    filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'TestFile.docx')
    newDoc.save(filepath)
    
  8. Bold, Italics, Underline

    We can change the style of the text content to Bold, Italics or Underlined by using the built-in properties shown below.

    from docx import Document
    import os
    
    newDoc = Document()
    
    p = newDoc.add_paragraph()
    # Formatting properties belong to the run returned by add_run, not to the paragraph
    p.add_run('Bold Content\n').bold = True
    p.add_run('Italic Content\n').italic = True
    p.add_run('Underline Content\n').underline = True
    
    filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'TestFile.docx')
    newDoc.save(filepath)
    
  9. Font color, size and family

    To modify the font color, size or family, import RGBColor, Pt from docx.shared module of python-docx.

    from docx import Document
    from docx.shared import RGBColor, Pt
    import os
    
    newDoc = Document()
    
    # add_run returns a Run object; font settings are applied to the run
    run = newDoc.add_paragraph().add_run('Test Content')
    run.font.color.rgb = RGBColor(115, 52, 100)
    run.font.name = 'Calibri'
    run.font.size = Pt(20)
    
    filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'TestFile.docx')
    newDoc.save(filepath)
    
  10. Tables

    The sample below creates a table. To read the table content, loop over its rows and cells.

    from docx import Document
    import os
    
    newDoc = Document()
    
    t = newDoc.add_table(rows = 2, cols = 2)
    t.rows[0].cells[0].text = 'Cell 1'
    t.rows[0].cells[1].text = 'Cell 2'
    t.rows[1].cells[0].text = 'Cell 3'
    t.rows[1].cells[1].text = 'Cell 4'
    t.add_row()
    t.rows[2].cells[0].text = 'Cell 5'
    t.rows[2].cells[1].text = 'Cell 6'
    
    filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'TestFile.docx')
    newDoc.save(filepath)
    
  11. Tabbed paragraph

    Use add_tab function as below to create a tabbed paragraph.

    from docx import Document
    import os
    
    newDoc = Document()
    
    p = newDoc.add_paragraph()
    p.add_run().add_tab()
    p.add_run('After Tab')
    
    filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'TestFile.docx')
    newDoc.save(filepath)
    
  12. Add Image

    To add images to a Word file, import Inches from the docx.shared module of python-docx and use it to set the picture size.

    from docx import Document
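    from docx.shared import Inches
    import os
    
    newDoc = Document()
    
    # Placeholder path (assumption): point this at an existing image file
    base_dir = os.path.dirname(os.path.abspath(__file__))
    imgpath = os.path.join(base_dir, 'TestImage.png')
    img = newDoc.add_picture(imgpath, height=Inches(3), width=Inches(2))
    
    filepath = os.path.join(base_dir, 'TestFile.docx')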
    newDoc.save(filepath)
    
  13. Loop through paragraphs

    We can use a for loop to navigate through the contents of the document. Content can be accessed by type, such as paragraphs, images, or headings. Alternatively, we can give a group of contents a common property and access them by that property.

    To see how this works, refer to the sample application linked below.

    import docx as doc
    import os
    
    filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'TestFile.docx')
    d = doc.Document(filepath)
    
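    # Number of paragraphs in the document
    print(len(d.paragraphs))
    
    for p in d.paragraphs:
        # Print the text of each paragraph rather than the Paragraph object
        print(p.text)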
    
  14. python-docx Samples

Absolute Code Works - Python Topics