pywordform is a Python module that parses Microsoft Word forms in docx format and extracts all field values, with their tags, into a Python dictionary.
The archive is available on the project page.
pywordform is open-source software released under a BSD license. See LICENCE.txt for more info.
Open the file sample_form.docx (provided with the source code) in MS Word, and edit field values. You may also add or edit fields, and create your own Word form (see below).
From the shell, you may use the module as a tool to extract all fields with tags:
    > python pywordform.py sample_form.docx
    field1 = "hello, world."
    field2 = "hello,"
    field3 = "value B"
    field4 = "04-03-2012"
In a python script, the parse_form function returns a dictionary of field values indexed by tags:
    import pywordform
    fields = pywordform.parse_form('sample_form.docx')
    print(fields)
Output:
{'field2': 'hello,\nworld.', 'field3': 'value B', 'field1': 'hello, world.', 'field4': '04-03-2012'}
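Since the result is a plain dictionary, it can be handed to any standard Python tool. As a minimal sketch (not part of pywordform), the snippet below writes the fields to CSV text, one row per tag. The helper name fields_to_csv is hypothetical, and the dictionary literal stands in for the value returned by parse_form so that the example runs without a Word file. It assumes Python 3.

```python
import csv
import io

def fields_to_csv(fields):
    """Return the fields as CSV text: a header row, then one 'tag,value' row per field."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["tag", "value"])
    # Sort by tag so the row order does not depend on dictionary ordering.
    for tag in sorted(fields):
        writer.writerow([tag, fields[tag]])
    return buffer.getvalue()

# Stand-in for pywordform.parse_form('sample_form.docx'), copied from the output above.
fields = {'field2': 'hello,\nworld.', 'field3': 'value B',
          'field1': 'hello, world.', 'field4': '04-03-2012'}

csv_text = fields_to_csv(fields)
print(csv_text)
```

Multi-line values such as field2 are quoted by the csv module, so the text can be read back with csv.reader without losing the line break.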
For more information, see the main program at the end of the module, and also docstrings.
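For readers curious about what such a parser has to do, here is a rough sketch using only the Python 3 standard library. It is not pywordform's actual code. It assumes the form fields are Word content controls, which a docx file stores as w:sdt elements in word/document.xml, with the tag in w:sdtPr/w:tag and the value in w:sdtContent. The names extract_tagged_fields and make_docx are hypothetical, and the tiny in-memory document only exists to make the example runnable.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def extract_tagged_fields(docx_file):
    """Map each content control tag to the concatenated text of its content."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    fields = {}
    for sdt in root.iter(W + "sdt"):
        tag = sdt.find(W + "sdtPr/" + W + "tag")
        content = sdt.find(W + "sdtContent")
        if tag is None or content is None:
            continue  # skip controls that have no tag or no content
        # Simplification: text runs are joined without paragraph breaks.
        text = "".join(t.text or "" for t in content.iter(W + "t"))
        fields[tag.get(W + "val")] = text
    return fields

def make_docx(document_xml):
    """Build a minimal in-memory archive containing only word/document.xml."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document_xml)
    buffer.seek(0)
    return buffer

sample_xml = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:body><w:sdt><w:sdtPr><w:tag w:val="field1"/></w:sdtPr>'
    '<w:sdtContent><w:p><w:r><w:t>hello, world.</w:t></w:r></w:p></w:sdtContent>'
    '</w:sdt></w:body></w:document>'
)

result = extract_tagged_fields(make_docx(sample_xml))
print(result)
```

The sketch ignores details a real parser has to handle, such as line breaks inside a field, nested controls, and drop-down or date controls, so the module's own source remains the reference.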
The code is available in a Mercurial repository on Bitbucket. You may use it to submit enhancements or to report issues.
To report a bug, please use the issue reporting page, or send me an e-mail.