A Python-based automated tool to validate Word documents
This code sample illustrates how Python can automate the validation of Word documents. Suppose we need to work on a set of Word documents that are similar in nature. Normally, we would have to open each document and read its content to find the answers. This becomes tedious when the number of documents is large.
Example scenarios include:
- Searching for a set of content in a group of files
- Validating answer sheets of an entire class
- Reading a set of data from a group of applications to find the right candidate
- Inserting a predefined set of content to a group of files
We will implement one of these scenarios in the following code sample: an app to validate the answer sheets of a class. We use the python-docx library to work with the Word files.
The code works in two parts. The first module generates the question template. The second module validates the answer sheets placed in a specified folder, calculates the marks, and saves the results. This sample uses a JSON file to store the results. With additional Python libraries, the data could just as easily be stored in a database.
A step-by-step explanation follows.
-
Install Python and python-docx
Download and install the latest version of Python from the Python website.
Keep your favorite Python editor ready. This can be the basic text editor that ships with your OS, or you can download and install a free editor such as VS Code.
Set up a virtual environment as well, so that the packages installed for this project stay isolated from other Python projects.
Open a command prompt, navigate to the folder where you are going to place the code, and activate the virtual environment.
Install python-docx using the command pip install python-docx
Refer to the python-docx basics if needed. To check whether the libraries were installed successfully, run the command pip freeze
It should give output like:
lxml==4.7.1
python-docx==0.8.11
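The same check can also be made from inside Python. The sketch below uses the standard-library importlib.metadata module (available from Python 3.8); the helper name installed_version is made up for this illustration and is not part of python-docx.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The two packages that pip freeze is expected to list for this project
    for package in ("python-docx", "lxml"):
        version = installed_version(package)
        print(f"{package}: {version if version else 'not installed'}")
```

If either line reports "not installed", the virtual environment is probably not active in the current command prompt.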
-
Generate Question Template
makeQuestionDoc.py generates the contents of the question template and applies styles to them. This sample generates 5 questions as a numbered list. Below each question, a space is provided for the answer. The important thing to note is that every answer paragraph uses the style 'Body Text'. The validation code uses this style to identify the answer spaces and loop through them.
```python
from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.shared import RGBColor
import os

doc = Document()

# Centered, bold title
htop = doc.add_paragraph()
htop.add_run('Maths Test').bold = True
htop.alignment = WD_ALIGN_PARAGRAPH.CENTER

# Header fields to be filled in by the candidate
sno = doc.add_paragraph("Roll No: ")
sname = doc.add_paragraph("Name: ")
doc.add_paragraph()

questions = [
    "Number of sides in a pentagon",
    "10 + 12",
    "30 - 15",
    "12 * 4",
    "25 / 5",
]

for question in questions:
    doc.add_paragraph(question, style='List Number')
    # The 'Body Text' style marks the answer space; the validator looks for it
    ans = doc.add_paragraph(style='Body Text').add_run('Ans: ')
    ans.bold = True
    ans.font.color.rgb = RGBColor(115, 52, 100)

# Save next to this script; os.path.join avoids hand-written backslashes
filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'MathsTest.docx')
doc.save(filepath)
print("Question File Generated")
```
Use the command python makeQuestionDoc.py or py makeQuestionDoc.py to run the module. If there are no errors, a Word document named MathsTest.docx is generated in the same folder as the script.
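The style-based lookup of answer spaces can be tried out without opening a real document. The sketch below is only an illustration: answer_paragraphs and fake_paragraph are hypothetical helper names, and the fake paragraphs merely mimic the .text and .style.name attributes that python-docx paragraphs expose.

```python
from types import SimpleNamespace

ANSWER_STYLE = "Body Text"  # style name used for the answer spaces in the template


def answer_paragraphs(paragraphs, style_name=ANSWER_STYLE):
    """Return the paragraphs whose style name matches the answer style."""
    return [p for p in paragraphs if p.style.name == style_name]


def fake_paragraph(text, style_name):
    """Hypothetical stand-in that mimics a paragraph's .text and .style.name."""
    return SimpleNamespace(text=text, style=SimpleNamespace(name=style_name))


# A shortened imitation of the template: title, two questions, two answer spaces
sample = [
    fake_paragraph("Maths Test", "Normal"),
    fake_paragraph("Number of sides in a pentagon", "List Number"),
    fake_paragraph("Ans: ", "Body Text"),
    fake_paragraph("10 + 12", "List Number"),
    fake_paragraph("Ans: ", "Body Text"),
]

print(len(answer_paragraphs(sample)))  # 2
```

With python-docx installed, the same function can be given Document("MathsTest.docx").paragraphs, which should find five answer spaces in the generated template.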
-
Validate Answers
Each candidate is expected to fill in the answers in the specified places and send the document back. Place all the answer sheets in a folder named AnswersFolder and run the Python code below.
The code retrieves all *.docx files in the folder. Before running it, update the list named anskey with the correct answers in the right order. The answers are fetched using the applied style 'Body Text' and compared with the answer key. The results of all candidates are finally stored in a JSON file, results.json, inside AnswersFolder; the file is created if it does not already exist. The results could also be written to another Word file, an Excel sheet, or a database.
```python
from docx import Document
import os
import json

# Resolve AnswersFolder relative to this script, not the current directory
folder = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'AnswersFolder')
resfile = os.path.join(folder, 'results.json')

anskey = [5, 22, 15, 48, 5]  # correct answers, in question order
res = []

for name in os.listdir(folder):
    # Skip non-Word files and the '~$' lock files Word creates for open documents
    if name.endswith('.docx') and not name.startswith('~$'):
        d = Document(os.path.join(folder, name))
        # Paragraph 0 is the title; 1 and 2 hold the roll number and name
        sid = d.paragraphs[1].text.replace("Roll No:", "").strip()
        sname = d.paragraphs[2].text.replace("Name:", "").strip()
        # Answer spaces are the paragraphs styled 'Body Text'
        ans = [x for x in d.paragraphs if x.style.name == "Body Text"]
        marks = 0
        # zip stops at the shorter list, so extra paragraphs cannot cause an IndexError
        for key, p in zip(anskey, ans):
            if str(key) == p.text.replace("Ans:", "").strip():
                marks += 1
        res.append({'rollno': sid, 'name': sname, 'marks': marks})

with open(resfile, 'w') as w:
    json.dump(res, w)
print('Validation Complete')
```
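The comparison against the answer key can be isolated into a small function that works on plain strings, which makes it easy to try out without any Word files. This is a minimal sketch under that assumption; extract_answer and score_answers are illustrative names, not part of the code above.

```python
def extract_answer(paragraph_text, prefix="Ans:"):
    """Strip the 'Ans:' label and surrounding whitespace from an answer paragraph."""
    return paragraph_text.replace(prefix, "").strip()


def score_answers(answer_texts, answer_key):
    """Count answers that match the key, compared as strings, position by position.

    zip() stops at the shorter sequence, so missing or extra answer
    paragraphs do not raise an IndexError; they simply earn no marks.
    """
    return sum(
        1
        for given, expected in zip(answer_texts, answer_key)
        if extract_answer(given) == str(expected)
    )


anskey = [5, 22, 15, 48, 5]
# Paragraph texts as a candidate might leave them: one wrong, one blank
sheet = ["Ans: 5", "Ans: 22", "Ans:15", "Ans: 50", "Ans: "]
print(score_answers(sheet, anskey))  # 3
```

Because the comparison is an exact string match, an answer such as "5.0" or "five" would not earn a mark for the key value 5.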
Sample output below:
[ {"rollno": "101", "name": "Lisa", "marks": 5}, {"rollno": "102", "name": "Sam", "marks": 4}, {"rollno": "103", "name": "Ted", "marks": 3} ]
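As mentioned earlier, the results could go to a database instead of a JSON file. The sketch below shows one way this might look with the standard-library sqlite3 module, using the sample output above as input. The table name results, its columns, and the save_results helper are assumptions made for this illustration.

```python
import json
import sqlite3

# Sample results in the same shape as results.json
results_json = """
[ {"rollno": "101", "name": "Lisa", "marks": 5},
  {"rollno": "102", "name": "Sam", "marks": 4},
  {"rollno": "103", "name": "Ted", "marks": 3} ]
"""


def save_results(connection, results):
    """Create the (hypothetical) results table if needed and store one row per candidate."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "rollno TEXT PRIMARY KEY, name TEXT, marks INTEGER)"
    )
    # Named placeholders are filled from the keys of each result dictionary
    connection.executemany(
        "INSERT OR REPLACE INTO results (rollno, name, marks) "
        "VALUES (:rollno, :name, :marks)",
        results,
    )
    connection.commit()


conn = sqlite3.connect(":memory:")  # pass a file path instead to keep the data
save_results(conn, json.loads(results_json))
rows = conn.execute(
    "SELECT rollno, name, marks FROM results ORDER BY marks DESC"
).fetchall()
print(rows)
conn.close()
```

Using rollno as the primary key together with INSERT OR REPLACE means that validating the same answer sheet again updates the candidate's row instead of duplicating it.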