A Python-based automated tool to validate Word documents

This is a code sample that illustrates how Python can automate the validation of Word documents. Suppose we are required to work on a set of Word documents that are similar in nature. Normally, we would have to open each document and read its content to find the answers. This can be a tedious task if the number of documents is large.

Example scenarios include:

  1. Searching for a set of content in a group of files
  2. Validating answer sheets of an entire class
  3. Reading a set of data from a group of applications to find the right candidate
  4. Inserting a predefined set of content into a group of files

We will implement one of these scenarios in the following code sample: an app that validates the answer sheets of a class. We will use the python-docx library to work with the Word files.

This code works in two parts. The first module generates the question template. The second module validates the answer sheets placed in a specified folder, calculates the marks, and saves the results. This sample stores the results in a JSON file, but they could just as easily be stored in a database using additional Python libraries.
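As an illustration of the database option, here is a minimal sketch that stores the result records produced by the validator in step 3 (dictionaries with rollno, name and marks keys) in SQLite, using Python's standard-library sqlite3 module, so nothing extra needs to be installed. The function name save_results, the default file name results.db and the table schema are hypothetical choices for this sketch; making rollno the primary key assumes roll numbers are unique.

```python
import sqlite3


def save_results(records, db_path="results.db"):
    """Store result dicts ('rollno', 'name', 'marks') in a SQLite table.

    Hypothetical schema: one row per candidate, keyed by roll number.
    """
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back if an error is raised
            conn.execute(
                "CREATE TABLE IF NOT EXISTS results ("
                "rollno TEXT PRIMARY KEY, name TEXT, marks INTEGER)"
            )
            # Named placeholders are filled from each dict's keys;
            # INSERT OR REPLACE lets a re-run overwrite a candidate's old score
            conn.executemany(
                "INSERT OR REPLACE INTO results (rollno, name, marks) "
                "VALUES (:rollno, :name, :marks)",
                records,
            )
    finally:
        conn.close()
```

In the validator, save_results(res) could be called in place of, or in addition to, the json.dump call.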

A step-by-step explanation follows.

  1. Install Python and python-docx

    Download and install the latest version of Python from the official Python website.

    Keep your favorite Python editor ready. This can be a basic text editor that comes with your OS, such as Notepad on Windows, or any free editor such as VS Code.

    Also set up a virtual environment, so that the packages installed for this project do not conflict with other Python projects on the same machine.

    Open a command prompt, navigate to the folder where you are going to place the code, and activate the virtual environment.

    Install python-docx using the command pip install python-docx.
    Refer to the python-docx basics if needed.

    To check that the libraries were installed successfully, run the command pip freeze.
    It should give an output like the following (version numbers may differ):

    lxml==4.7.1
    python-docx==0.8.11
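Instead of scanning the full pip freeze output, a small sketch using the standard-library importlib.metadata module (Python 3.8 and later) can report the versions of just the two packages this tutorial relies on. The helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the packages this tutorial depends on
for pkg in ("python-docx", "lxml"):
    found = installed_version(pkg)
    print(f"{pkg}: {found if found else 'NOT INSTALLED'}")
```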
    
  2. Generate Question Template

    makeQuestionDoc.py generates the contents of the question template and applies styles to them. This sample generates five questions as a numbered list. Below each question, a space is provided for the answer. The important thing to note is that every answer paragraph is given the style 'Body Text'. The validator uses this style to identify the answer spaces and loop through them.

    from docx import Document
    from docx.enum.text import WD_ALIGN_PARAGRAPH
    from docx.shared import RGBColor
    import os
    
    # Questions, in the order they appear on the sheet
    questions = [
        "Number of sides in a pentagon",
        "10 + 12",
        "30 - 15",
        "12 * 4",
        "25 / 5",
    ]
    
    doc = Document()
    
    # Centered, bold title (paragraph 0)
    htop = doc.add_paragraph()
    htop.add_run('Maths Test').bold = True
    htop.alignment = WD_ALIGN_PARAGRAPH.CENTER
    
    # Candidate details; the validator reads these from paragraphs 1 and 2
    doc.add_paragraph("Roll No: ")
    doc.add_paragraph("Name: ")
    
    doc.add_paragraph()
    
    for question in questions:
        # Numbered question
        doc.add_paragraph(question, style='List Number')
        # Answer space; the 'Body Text' style marks it for the validator
        ans = doc.add_paragraph(style='Body Text').add_run('Ans: ')
        ans.bold = True
        ans.font.color.rgb = RGBColor(115, 52, 100)
    
    # Save next to this script; os.path.join works on every OS
    filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'MathsTest.docx')
    doc.save(filepath)
    
    print("Question File Generated")
    

    Run the module with the command python makeQuestionDoc.py (or py makeQuestionDoc.py on Windows). If there are no errors, a Word document named MathsTest.docx with the content below is generated.

    [Figure: Answer sheet validator - sample sheet]

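Because anskey in step 3 has to be kept in the same order as the questions by hand, one option is to keep each question together with its correct answer and write the answers out to a small JSON file when the template is generated. This sketch uses only the standard library; the names QUESTIONS, write_answer_key and load_answer_key, and the file name answer_key.json, are hypothetical.

```python
import json

# Each question paired with its correct answer, in sheet order (hypothetical structure)
QUESTIONS = [
    ("Number of sides in a pentagon", 5),
    ("10 + 12", 22),
    ("30 - 15", 15),
    ("12 * 4", 48),
    ("25 / 5", 5),
]


def write_answer_key(questions, path):
    """Save only the answers, in question order, as a JSON list."""
    with open(path, "w") as f:
        json.dump([answer for _, answer in questions], f)


def load_answer_key(path):
    """Read the list of answers back, e.g. for the validator."""
    with open(path) as f:
        return json.load(f)
```

makeQuestionDoc.py could then take its question texts from QUESTIONS and call write_answer_key(QUESTIONS, 'answer_key.json') once, and the validator could set anskey = load_answer_key('answer_key.json') instead of hard-coding the list. Since the answers are stored as numbers, the existing str(...) comparison in the validator still applies.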
  3. Validate Answers

    Each candidate is expected to fill in the answers in the specified places and send the sheet back. Place all the returned sheets in a folder named AnswersFolder, in the same folder as the script, and run the Python code below.
    The code retrieves all *.docx files in the folder. Before running it, update the list named anskey with the correct answers, in the same order as the questions. The answers on each sheet are fetched by their 'Body Text' style and compared with the answer key. Finally, the results of all candidates are stored in a JSON file, results.json, in the same folder; opening it in 'w' mode creates the file if it does not already exist. Results could also be written to another Word file, an Excel sheet, or a database.

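    from docx import Document
    import os
    import json
    
    # Resolve the answers folder relative to this script, not the current directory
    basedir = os.path.dirname(os.path.abspath(__file__))
    ansdir = os.path.join(basedir, 'AnswersFolder')
    resfile = os.path.join(ansdir, 'results.json')
    
    # Correct answers, in the same order as the questions
    anskey = [5, 22, 15, 48, 5]
    res = []
    
    for name in sorted(os.listdir(ansdir)):
        # Only .docx files; skip the temporary "~$" lock files Word creates
        if name.endswith('.docx') and not name.startswith('~$'):
            d = Document(os.path.join(ansdir, name))
            # Paragraphs 1 and 2 hold the roll number and name
            sid = d.paragraphs[1].text.replace("Roll No:", "").strip()
            sname = d.paragraphs[2].text.replace("Name:", "").strip()
            # Answer paragraphs are the ones styled 'Body Text'
            ans = [p for p in d.paragraphs if p.style.name == "Body Text"]
            marks = 0
    
            # zip stops at the shorter list, so extra 'Body Text' paragraphs cannot raise IndexError
            for key, p in zip(anskey, ans):
                if str(key) == p.text.replace("Ans:", "").strip():
                    marks += 1
    
            res.append({'rollno': sid, 'name': sname, 'marks': marks})
    
    with open(resfile, 'w') as w:
        json.dump(res, w)
    
    print('Validation Complete')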
    

    A sample results.json is shown below, formatted for readability:

    [
        {"rollno": "101", "name": "Lisa", "marks": 5},
        {"rollno": "102", "name": "Sam", "marks": 4},
        {"rollno": "103", "name": "Ted", "marks": 3}
    ]
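Step 3 mentions that results can also be written to an Excel sheet. A minimal sketch, assuming a hypothetical helper write_results_csv, writes the same list of result dictionaries to a CSV file with the standard-library csv module; Excel opens CSV files directly. Producing a native .xlsx file would need a third-party library such as openpyxl instead.

```python
import csv


def write_results_csv(records, path):
    """Write result dicts (rollno, name, marks) to a CSV file with a header row."""
    fieldnames = ["rollno", "name", "marks"]
    # The csv module documentation asks for newline="" to avoid extra blank lines on Windows
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
```

In the validator, write_results_csv(res, 'results.csv') could be called with the res list after the JSON file is written.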
    
Absolute Code Works - Python Topics