oletools - python tools to analyze OLE and MS Office files

python-oletools is a package of python tools to analyze Microsoft OLE2 files (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office documents or Outlook messages, mainly for malware analysis, forensics and debugging. It is based on my olefile parser. 

Quick links: Home page - Download/Install - Documentation - Report Issues/Suggestions/Questions - Contact the Author - Repository - Updates on Twitter

Note: python-oletools is not related to OLETools published by BeCubed Software.

Tools in oletools:

Tools to analyze malicious documents

Tools to analyze the structure of OLE files

Projects using oletools:

oletools are used by a number of projects and online malware analysis services, including Viper, REMnux, Hybrid-analysis.com, Joe Sandbox, Deepviz, Laika BOSS, Cuckoo Sandbox, Anlyz.io, ViperMonkey, pcodedmp, dridex.malwareconfig.com, and probably VirusTotal. (Please contact me if you have or know a project using oletools)

Download and Install:

The recommended way to download and install/update the latest stable release of oletools is to use pip:

This should automatically create command-line scripts to run each tool from any directory: olevba, mraptor, rtfobj, etc.

To get the latest development version instead:

See the documentation for other installation options.

Documentation:

The latest version of the documentation can be found online. A copy is also provided in the doc subfolder of the package.

How to Suggest Improvements, Report Issues or Contribute:

This is a personal open-source project, developed in my spare time. Any contribution, suggestion, feedback or bug report is welcome.

To suggest improvements, report a bug or any issue, please use the issue reporting page, providing all the information and files to reproduce the problem.

You may also contact the author directly to provide feedback.

The code is available in a GitHub repository. You may use it to submit enhancements using forks and pull requests.

License

This license applies to the python-oletools package, apart from the thirdparty folder which contains third-party files published with their own license.

The python-oletools package is copyright (c) 2012-2020 Philippe Lagadec (http://www.decalage.info)

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


olevba contains modified source code from the officeparser project, published under the following MIT License (MIT):

officeparser is copyright (c) 2014 John William Davison

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

olemeta - a tool to extract all standard properties (metadata) from OLE files such as MS Office

olemeta is a script to parse OLE files such as MS Office documents (e.g. Word, Excel), to extract all standard properties present in the OLE file. It is part of the python-oletools package.

Usage

olemeta.py <file>

Example

Checking the malware sample DIAN_caso-5415.doc:

>olemeta.py DIAN_caso-5415.doc

Properties from SummaryInformation stream:
- codepage: 1252
- title: 'Gu\xeda MIPYME para ser emisor electr\xf3nico'
- subject: ''
- author: 'OFEyDV'
- keywords: ''
- comments: ''
- template: 'Normal.dotm'
- last_saved_by: 'clein'
- revision_number: '13'
- total_edit_time: 4800L
- last_printed: datetime.datetime(2006, 6, 7, 14, 4)
- create_time: datetime.datetime(2009, 3, 30, 14, 18)
- last_saved_time: datetime.datetime(2014, 5, 14, 12, 45)
- num_pages: 7
- num_words: 269
- num_chars: 1485
- thumbnail: None
- creating_application: 'Microsoft Office Word'
- security: 0

Properties from DocumentSummaryInformation stream:
- codepage_doc: 1252
- category: None
- presentation_target: None
- bytes: None
- lines: 12
- paragraphs: 3
- slides: None
- notes: None
- hidden_slides: None
- mm_clips: None
- scale_crop: False
- heading_pairs: None
- titles_of_parts: None
- manager: None
- company: 'Servicio de Impuestos Internos'
- links_dirty: False
- chars_with_spaces: 1751
- unused: None
- shared_doc: False
- link_base: None
- hlinks: None
- hlinks_changed: False
- version: 786432
- dig_sig: None
- content_type: None
- content_status: None
- language: None
- doc_version: None

How to use olemeta in Python applications

TODO
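
Until this section is written, here is a minimal sketch of one possible approach. It assumes the third-party olefile package (the parser oletools is based on) is installed, and relies on its get_metadata() method together with the SUMMARY_ATTRIBS and DOCSUM_ATTRIBS attribute lists. The helper names format_properties and dump_metadata are hypothetical and are not part of olemeta.

```python
def format_properties(meta, names):
    """Return '- name: value' lines for the given attribute names."""
    return ["- %s: %r" % (name, getattr(meta, name, None)) for name in names]


def dump_metadata(path):
    """Print the standard properties of an OLE file, in olemeta's style."""
    import olefile  # third-party parser; assumed to be installed

    ole = olefile.OleFileIO(path)
    try:
        meta = ole.get_metadata()
        print("Properties from SummaryInformation stream:")
        print("\n".join(format_properties(meta, meta.SUMMARY_ATTRIBS)))
        print("Properties from DocumentSummaryInformation stream:")
        print("\n".join(format_properties(meta, meta.DOCSUM_ATTRIBS)))
    finally:
        ole.close()
```

Calling dump_metadata("DIAN_caso-5415.doc") should then produce a listing similar to the example above, although the exact value formatting depends on the Python and olefile versions in use.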

oletimes - a tool to extract creation and modification timestamps of all streams and storages in OLE files

oletimes is a script to parse OLE files such as MS Office documents (e.g. Word, Excel), to extract creation and modification times of all streams and storages in the OLE file. It is part of the python-oletools package.

Usage

oletimes.py <file>

Example

Checking the malware sample DIAN_caso-5415.doc:

>oletimes.py DIAN_caso-5415.doc

- Root mtime=2014-05-14 12:45:24.752000 ctime=None
- '\x01CompObj': mtime=None ctime=None
- '\x05DocumentSummaryInformation': mtime=None ctime=None
- '\x05SummaryInformation': mtime=None ctime=None
- '1Table': mtime=None ctime=None
- 'Data': mtime=None ctime=None
- 'Macros': mtime=2014-05-14 12:45:24.708000 ctime=2014-05-14 12:45:24.355000
- 'Macros/PROJECT': mtime=None ctime=None
- 'Macros/PROJECTwm': mtime=None ctime=None
- 'Macros/VBA': mtime=2014-05-14 12:45:24.684000 ctime=2014-05-14 12:45:24.355000
- 'Macros/VBA/ThisDocument': mtime=None ctime=None
- 'Macros/VBA/_VBA_PROJECT': mtime=None ctime=None
- 'Macros/VBA/__SRP_0': mtime=None ctime=None
- 'Macros/VBA/__SRP_1': mtime=None ctime=None
- 'Macros/VBA/__SRP_2': mtime=None ctime=None
- 'Macros/VBA/__SRP_3': mtime=None ctime=None
- 'Macros/VBA/dir': mtime=None ctime=None
- 'WordDocument': mtime=None ctime=None
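
The timestamps above come from the OLE directory entries, where they are stored as 64-bit Windows FILETIME values (100-nanosecond intervals since 1601-01-01 UTC). The sketch below shows that conversion, assuming a zero value means "not set", which would explain the many None results. The function name filetime_to_datetime is illustrative and is not an oletools API.

```python
from datetime import datetime, timedelta

# FILETIME values count 100-nanosecond ticks from this date (UTC)
FILETIME_EPOCH = datetime(1601, 1, 1)


def filetime_to_datetime(filetime):
    """Convert a 64-bit FILETIME integer to a naive UTC datetime, or None if unset."""
    if filetime == 0:
        return None
    # 10 ticks of 100 ns make one microsecond
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)
```

Only the root entry and the storages (Macros, Macros/VBA) carry timestamps in this sample; all the streams report None.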

How to use oletimes in Python applications

TODO

olevba - a tool to extract VBA Macro source code from MS Office documents (OLE and OpenXML)

olevba is a script to parse OLE and OpenXML files such as MS Office documents (e.g. Word, Excel), to detect VBA Macros, extract their source code in clear text, decode malware obfuscation (Hex/Base64/StrReverse/Dridex) and detect security-related patterns such as auto-executable macros, suspicious VBA keywords used by malware, and potential IOCs (IP addresses, URLs, executable filenames, etc). It is part of the python-oletools package.
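
To illustrate the kind of decoding and pattern matching described above, here is a simplified sketch. It is not olevba's actual implementation: it only tries reversed, hex and Base64 decodings on a single string and then searches for URLs and IPv4 addresses. The function names and regular expressions are assumptions chosen for illustration.

```python
import base64
import binascii
import re

URL_RE = re.compile(r'https?://[^\s"\']+')
IPV4_RE = re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b')


def candidate_decodings(text):
    """Yield (encoding_name, decoded_text) pairs for a possibly obfuscated string."""
    yield 'StrReverse', text[::-1]
    try:
        yield 'Hex', binascii.unhexlify(text).decode('latin-1')
    except ValueError:  # binascii.Error is a subclass of ValueError
        pass
    try:
        yield 'Base64', base64.b64decode(text, validate=True).decode('latin-1')
    except ValueError:
        pass


def find_iocs(text):
    """Return (source, match) pairs for URLs and IPv4 addresses in text or its decodings."""
    results = []
    for name, decoded in [('plain', text)] + list(candidate_decodings(text)):
        for pattern in (URL_RE, IPV4_RE):
            for match in pattern.findall(decoded):
                results.append((name, match))
    return results
```

A real analyzer would first extract string literals from the macro source and would filter out false positives; this sketch skips both steps.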

It can be used either as a command-line tool, or as a python module from your own applications.

Supported formats:

olevba is based on source code from officeparser by John William Davison, with significant modifications.

Main Features

 

MS Office files encrypted with a password are also supported, because VBA macro code is never encrypted, only the content of the document.

About VBA Macros

See this article for more information and technical details about VBA Macros and how they are stored in MS Office documents.

Usage, Examples, Python API

See the olevba documentation.

olebrowse - a simple python GUI to browse OLE files and extract streams

olebrowse is a simple GUI to browse OLE files (e.g. MS Word, Excel, Powerpoint documents), to view and extract individual data streams. It is part of the oletools package.

See the oletools page for more info.

Download:

The oletools package is available on the project page.

Usage

Usage: olebrowse.py [file]

If you provide a file it will be opened, else a dialog will allow you to browse folders to open a file. Then if it is a valid OLE file, the list of data streams will be displayed. You can select a stream, and then either view its content in a builtin hexadecimal viewer, or save it to a file for further analysis.

Screenshots

Main menu, showing all streams in the OLE file:

Menu with actions for a stream:

Hex view for a stream:

oleid - a python tool to quickly analyze OLE files

oleid is a script to analyze OLE files such as MS Office documents (e.g. Word, Excel), to detect specific characteristics that could potentially indicate that the file is suspicious or malicious, in terms of security (e.g. malware). For example, it can detect VBA macros, embedded Flash objects, and fragmentation. It is part of the oletools package.

See the oletools page for more info.

Download:

The archive is available on the project page.

Usage

Usage: oleid.py <file>

Example

Analyzing a Word document containing a Flash object and VBA macros:

C:\oletools>oleid.py word_flash_vba.doc
Filename: word_flash_vba.doc
OLE format: True
Has SummaryInformation stream: True
Application name: Microsoft Office Word
Encrypted: False
Word Document: True
VBA Macros: True
Excel Workbook: False
PowerPoint Presentation: False
Visio Drawing: False
ObjectPool: True
Flash objects: 1
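
The "OLE format" indicator can be understood as a check of the file signature: OLE2 (Compound File Binary) files start with the 8 bytes D0 CF 11 E0 A1 B1 1A E1. The sketch below performs only that signature test; the real tool may perform further checks, and the function names are illustrative rather than part of oleid.

```python
# Signature found at offset 0 of every OLE2 / Compound File Binary file
OLE_MAGIC = b'\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1'


def has_ole_signature(data):
    """Return True if the given bytes start with the OLE2 signature."""
    return data[:len(OLE_MAGIC)] == OLE_MAGIC


def file_has_ole_signature(path):
    """Read only the first 8 bytes of a file and test them."""
    with open(path, 'rb') as f:
        return has_ole_signature(f.read(len(OLE_MAGIC)))
```

A matching signature only shows that the container format is OLE2. It says nothing about whether the content is a Word document, an Excel workbook, or something else.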

pyxswf - a python tool to extract SWF (Flash) objects from documents (improved xxxswf)

pyxswf is a script to detect, extract and analyze Flash objects (SWF files) that may be embedded in files such as MS Office documents (e.g. Word, Excel) and RTF, which is especially useful for malware analysis. It is part of the oletools package.

See the oletools page for more info.

pyxswf and xxxswf

pyxswf is an extension of xxxswf.py published by Alexander Hanel. Compared to xxxswf, it can extract streams from MS Office documents by parsing their OLE structure properly, which is necessary when streams are fragmented. Stream fragmentation is a known obfuscation technique, as explained on http://www.breakingpointsystems.com/resources/blog/evasion-with-ole2-fragmentation/

It can also extract Flash objects from RTF documents, by parsing embedded objects encoded in hexadecimal format.

To use these features, add the -o option to work on OLE streams rather than raw files, or the -f option to work on RTF files.

Download:

The archive is available on the project page.

Usage

Usage: pyxswf.py [options] <file.bad>

Options:
  -o, --ole             Parse an OLE file (e.g. Word, Excel) to look for SWF
                        in each stream
  -f, --rtf             Parse an RTF file to look for SWF in each embedded
                        object
  -x, --extract         Extracts the embedded SWF(s), names it MD5HASH.swf &
                        saves it in the working dir. No addition args needed
  -h, --help            show this help message and exit
  -y, --yara            Scans the SWF(s) with yara. If the SWF(s) is
                        compressed it will be deflated. No addition args
                        needed
  -s, --md5scan         Scans the SWF(s) for MD5 signatures. Please see func
                        checkMD5 to define hashes. No addition args needed
  -H, --header          Displays the SWFs file header. No addition args needed
  -d, --decompress      Deflates compressed SWFS(s)
  -r PATH, --recdir=PATH
                        Will recursively scan a directory for files that
                        contain SWFs. Must provide path in quotes
  -c, --compress        Compresses the SWF using Zlib

Examples

Example 1 - detecting and extracting a SWF file from a Word document on Windows:

C:\oletools>pyxswf.py -o word_flash.doc
OLE stream: 'Contents'
[SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents
        [ADDR] SWF 1 at 0x8  - FWS Header

C:\oletools>pyxswf.py -xo word_flash.doc
OLE stream: 'Contents'
[SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents
        [ADDR] SWF 1 at 0x8  - FWS Header
                [FILE] Carved SWF MD5: 2498e9c0701dc0e461ab4358f9102bc5.swf

Example 2 - detecting and extracting a SWF file from a RTF document on Windows:

C:\oletools>pyxswf.py -xf "rtf_flash.rtf"
RTF embedded object size 1498557 at index 000036DD
[SUMMARY] 1 SWF(s) in MD5:46a110548007e04f4043785ac4184558:RTF_embedded_object_000036DD
        [ADDR] SWF 1 at 0xc40  - FWS Header
                [FILE] Carved SWF MD5: 2498e9c0701dc0e461ab4358f9102bc5.swf

See also the article How to Extract Flash Objects From Malicious MS Office Documents, which shows how to use xxxswf.py in practice. You may simply use "pyxswf.py -o" instead of xxxswf.py.
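
The "FWS Header" lines in the examples above refer to the SWF signature. An SWF file starts with FWS (uncompressed) or CWS (zlib-compressed), followed by a version byte and a 32-bit little-endian file length. The sketch below shows how such headers could be located in a raw stream and how a carved file could be hashed. It is a simplified illustration, not pyxswf's code, and the function names are hypothetical.

```python
import hashlib
import struct
import zlib


def find_swf_candidates(data):
    """Return sorted (offset, signature, version, declared_length) tuples."""
    results = []
    for sig in (b'FWS', b'CWS'):
        start = data.find(sig)
        while start != -1:
            if start + 8 <= len(data):  # need the full 8-byte header
                version = data[start + 3]
                (length,) = struct.unpack_from('<I', data, start + 4)
                results.append((start, sig.decode('ascii'), version, length))
            start = data.find(sig, start + 1)
    return sorted(results)


def carve_swf(data, offset):
    """Return (md5_hex, swf_bytes) for the SWF at offset, inflating CWS to FWS."""
    signature = data[offset:offset + 3]
    header_rest = data[offset + 3:offset + 8]  # version byte + declared length
    (length,) = struct.unpack_from('<I', data, offset + 4)
    if signature == b'CWS':
        # decompressobj tolerates trailing bytes after the zlib stream
        body = zlib.decompressobj().decompress(data[offset + 8:])
        swf = (b'FWS' + header_rest + body)[:length]
    else:
        swf = data[offset:offset + length]
    return hashlib.md5(swf).hexdigest(), swf
```

A three-byte signature can occur by chance in binary data, so each hit is only a candidate until the declared length and content have been validated.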

rtfobj - a python tool to extract embedded objects from RTF files

rtfobj is a Python module to extract embedded objects from RTF files, such as OLE objects. It can be used as a Python library or a command-line tool. It is part of the oletools package.

See the oletools page for more info.

News

  • 2013-04-18 v0.02: fixed bug in rtfobj, added documentation
  • 2012-11-09 v0.01: 1st version of rtfobj, used by pyxswf
  • See changelog in source code for more info.

Download:

The archive is available on the project page.

Usage

Usage: rtfobj.py <file.rtf>

It extracts and decodes all the data blocks encoded as hexadecimal in the RTF document, and saves them as files named "object_xxxx.bin", xxxx being the location of the object in the RTF file.

Usage as python module: rtf_iter_objects(filename) is an iterator which yields a tuple (index, object) providing the index of each hexadecimal stream in the RTF file, and the corresponding decoded object. Example:

import rtfobj

# Iterate over every hex-encoded object found in the RTF file
for index, data in rtfobj.rtf_iter_objects("myfile.rtf"):
    print('found object size %d at index %08X' % (len(data), index))
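
The idea of decoding hexadecimal data blocks can be illustrated with the standard library alone. The sketch below is a simplified stand-in for what rtfobj does, not its actual implementation. It looks for long runs of hex digits (at least 32 digits, an arbitrary threshold chosen here), tolerates whitespace inside a run, and yields the position and decoded bytes of each run. The name iter_hex_blocks is hypothetical.

```python
import binascii
import re

# At least 32 hex digits, optionally separated by whitespace, and not
# directly preceded by a letter or digit (to skip the tail of a control word)
HEX_RUN = re.compile(rb'(?<![0-9A-Za-z])(?:[0-9A-Fa-f]\s*){32,}')


def iter_hex_blocks(rtf_data):
    """Yield (index, decoded_bytes) for each long hex run in raw RTF bytes."""
    for match in HEX_RUN.finditer(rtf_data):
        digits = re.sub(rb'\s+', b'', match.group(0))
        if len(digits) % 2:
            digits = digits[:-1]  # drop a dangling half-byte
        yield match.start(), binascii.unhexlify(digits)
```

Usage mirrors the example above: read the file in binary mode and loop over iter_hex_blocks(data) to get each index and decoded object.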