rtfobj - a Python tool to extract embedded objects from RTF files

rtfobj is a Python module to extract embedded objects, such as OLE objects, from RTF files. It can be used as a Python library or as a command-line tool. It is part of the oletools package.

See the oletools page for more info.

News

  • 2013-04-18 v0.02: fixed bug in rtfobj, added documentation
  • 2012-11-09 v0.01: 1st version of rtfobj, used by pyxswf
  • See changelog in source code for more info.

Download

The archive is available on the project page.

Usage

Usage: rtfobj.py <file.rtf>

It extracts and decodes all data blocks encoded as hexadecimal in the RTF document, and saves each one as a file named "object_xxxx.bin", where xxxx is the location (offset) of the object in the RTF file.
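The extraction step can be pictured as "find long runs of hex digit pairs, drop the whitespace between them, and decode". Below is a minimal sketch of that idea, not rtfobj's own code: the function name iter_hex_blocks and the min_bytes threshold of 16 are hypothetical, and it assumes the hex data is split by whitespace only at pair boundaries (RTF writers typically wrap \objdata on even positions).

```python
import binascii
import re


def iter_hex_blocks(rtf_data, min_bytes=16):
    """Yield (offset, decoded_bytes) for each long hex run in rtf_data.

    Hypothetical sketch: a "hex block" is assumed to be at least min_bytes
    pairs of hex digits, optionally separated by whitespace/newlines.
    """
    pattern = re.compile(rb'(?:[0-9A-Fa-f]{2}\s*){%d,}' % min_bytes)
    for match in pattern.finditer(rtf_data):
        # Remove line breaks and spaces between digit pairs before decoding.
        hex_digits = re.sub(rb'\s+', b'', match.group(0))
        # Each matched pair is exactly two digits, so the length is even.
        yield match.start(), binascii.unhexlify(hex_digits)


if __name__ == '__main__':
    sample = b'{\\rtf1{\\object{\\*\\objdata ' + b'41' * 20 + b'\r\n' + b'42' * 20 + b'}}}'
    for offset, data in iter_hex_blocks(sample):
        print('found object size %d at index %08X' % (len(data), offset))
```

The minimum length keeps short hex-looking fragments of ordinary RTF control words (for example "deff" in \deff0) from being reported as objects.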

Usage as a Python module: rtf_iter_objects(filename) is an iterator that yields a tuple (index, object) for each hexadecimal stream in the RTF file, where index is the position of the stream in the file and object is the corresponding decoded data. Example:

import rtfobj

# Iterate over every hexadecimal block found in the RTF file
for index, data in rtfobj.rtf_iter_objects("myfile.rtf"):
    print('found object size %d at index %08X' % (len(data), index))
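To reproduce the command-line behaviour from a script, the (index, object) tuples can be written to disk one by one. This is a hedged sketch: save_objects is a hypothetical helper, and the exact file name format (index as 8 uppercase hex digits, mirroring the %08X in the example above) is an assumption rather than a documented rtfobj convention.

```python
import os


def save_objects(objects, out_dir='.'):
    """Write each (index, data) tuple to out_dir/object_<index>.bin.

    Hypothetical helper: accepts any iterable of (index, data) tuples,
    such as the one rtf_iter_objects yields. Returns the written paths.
    """
    paths = []
    for index, data in objects:
        # Assumed naming: index as 8 uppercase hex digits, e.g. object_0000001A.bin
        path = os.path.join(out_dir, 'object_%08X.bin' % index)
        with open(path, 'wb') as out:
            out.write(data)
        paths.append(path)
    return paths
```

Usage: save_objects(rtfobj.rtf_iter_objects("myfile.rtf"), "extracted"), assuming the "extracted" directory already exists. Taking an iterable rather than a filename keeps the helper independent of how the objects were found.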