Tools to extract VBA Macro source code from MS Office Documents

This article presents several tools that can be used to extract VBA Macros source code from MS Office Documents, for malware analysis and forensics. It also provides an overview of how VBA Macros are stored.

A few years ago, tools to extract VBA Macros were hard to find, mainly because the file format was not documented. VBA Macros had also fallen out of favor among malware writers.

But since Microsoft published official specifications of the VBA Macro storage [MS-OVBA], several tools have been developed to extract VBA code. Moreover, in 2013-2014 several reports highlighted a resurgence of malware involving VBA macros. So it is worth taking a fresh look at how to extract those macros.

The first section below provides an overview of how VBA Macros are stored in various types of documents. The following sections present several tools that can be used to extract VBA source code.

VBA Macros storage overview

MS Office 97-2003 documents

Most MS Office 97-2003 documents use the same underlying file format, called the Microsoft Compound File Binary (CFB) file format, or simply the OLE2 file format.

An OLE file can be seen as a mini file system or a Zip archive: it contains streams of data that look like files embedded within the OLE file. Each stream has a name. For example, the main stream of an MS Word document containing its text is named "WordDocument".

An OLE file can also contain storages. A storage is a folder that contains streams or other storages. For example, an MS Word document with VBA macros has a storage called "Macros".

A typical MS Word document with VBA macros may look like this:

VBA macros are normally contained in a VBA project structure, located in different places depending on the document type:

  • Word 97-2003: in a storage called "Macros", at the root of the OLE file.
  • Excel 97-2003: in a storage called "_VBA_PROJECT_CUR", at the root of the OLE file.
  • PowerPoint 97-2003: VBA macros are stored within the binary structure of the presentation, not in an OLE storage.

According to [MS-OVBA], a VBA project root (e.g. "Macros" or "_VBA_PROJECT_CUR") must contain at least the following elements (case-insensitive names):

  • a VBA storage
  • a PROJECT stream
  • two streams within the VBA storage: VBA/_VBA_PROJECT and VBA/dir
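
As an illustration, the check above can be sketched in a few lines of Python. The function name find_vba_roots is hypothetical, and the input is assumed to be a list of stream paths, each given as a list of name components (the same shape as the listing returned by olefile's listdir() method). This is only a sketch of the rule, not how any of the tools below implement it.

```python
def find_vba_roots(stream_paths):
    """Return the storages that look like VBA project roots.

    stream_paths: iterable of stream paths, each a list of name components,
    e.g. ["Macros", "VBA", "dir"]. Names are compared case-insensitively,
    so the returned root paths are lowercase ("" means the OLE root itself).
    """
    # Normalise every path to a tuple of lowercase components.
    paths = {tuple(part.lower() for part in path) for path in stream_paths}
    roots = []
    for path in sorted(paths):
        # A "dir" stream directly inside a "VBA" storage marks a candidate.
        if len(path) >= 2 and path[-2:] == ("vba", "dir"):
            root = path[:-2]
            # The other two mandatory streams must be present as well.
            required = {root + ("project",), root + ("vba", "_vba_project")}
            if required <= paths:
                roots.append("/".join(root))
    return roots
```

Because the function only looks at the relative layout, it does not depend on the root being named "Macros" or "_VBA_PROJECT_CUR".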

The VBA source code is stored in one or several streams located in the VBA storage (for example "ThisDocument" in the sample above). The code is not stored in clear text: it is compressed using a specific run-length encoding algorithm described in [MS-OVBA]. Moreover, the compressed content does not start at the beginning of those streams. It is necessary to parse binary structures in the VBA/dir stream (also compressed with the same RLE algorithm) in order to find the exact offset of the compressed VBA content in the code streams.
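
To make the compression scheme more concrete, here is a minimal Python sketch of the decompression side, written from the container layout described in [MS-OVBA] (a signature byte 0x01, then chunks with a 2-byte header, each compressed chunk being a series of flag bytes followed by literals or 2-byte copy tokens). The function name decompress_container is hypothetical, validation is reduced to the bare minimum, and the input is assumed to start exactly at the compressed content (i.e. the offset has already been found).

```python
import struct

def decompress_container(data):
    """Decompress an MS-OVBA compressed container (bytes) and return bytes."""
    if not data or data[0] != 0x01:
        raise ValueError("missing container signature byte 0x01")
    out = bytearray()
    pos = 1
    while pos + 2 <= len(data):
        # Chunk header: 12-bit size, 3-bit signature (0b011), 1-bit flag.
        (header,) = struct.unpack_from("<H", data, pos)
        if (header >> 12) & 0x07 != 0b011:
            raise ValueError("bad chunk signature")
        chunk_end = min(len(data), pos + (header & 0x0FFF) + 3)
        pos += 2
        if not header >> 15:
            # Flag bit 0: the chunk holds 4096 raw bytes.
            out += data[pos:pos + 4096]
            pos += 4096
            continue
        chunk_start = len(out)
        while pos < chunk_end:
            flags = data[pos]
            pos += 1
            for bit in range(8):
                if pos >= chunk_end:
                    break
                if not (flags >> bit) & 1:
                    # Flag bit 0: one literal byte.
                    out.append(data[pos])
                    pos += 1
                    continue
                # Flag bit 1: copy token; its offset/length split depends
                # on how much of the current chunk is already decompressed.
                (token,) = struct.unpack_from("<H", data, pos)
                pos += 2
                bit_count = max((len(out) - chunk_start - 1).bit_length(), 4)
                length = (token & (0xFFFF >> bit_count)) + 3
                offset = (token >> (16 - bit_count)) + 1
                src = len(out) - offset
                if src < 0:
                    raise ValueError("copy token points before the output")
                # Byte-by-byte copy, since source and target may overlap.
                for i in range(length):
                    out.append(out[src + i])
    return bytes(out)
```

The copy tokens reference data already decompressed, which is why the output must be built sequentially and the copy done one byte at a time.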

This is why extracting VBA source code is not straightforward. Luckily, several open-source tools are now available for this task.

Some tools such as oledump (see below) use a simpler heuristic, looking for any stream containing the string "\x00Attribut", which is in fact the very first VBA keyword found at the beginning of the code of most macros. But since this keyword is VBA code, it may be possible to tweak macros to evade detection.
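
The idea behind this heuristic can be sketched as follows. This is not oledump's actual code: the function name guess_container_start is hypothetical, and the extra check on the 0x01 signature byte is an assumption added here to reduce false positives. It relies on the layout of the compressed data: the 0x00 in front of "Attribut" is the first flag byte, preceded by the 2-byte chunk header and the 1-byte container signature, so the compressed content is expected to start 3 bytes before the match.

```python
MARKER = b"\x00Attribut"

def guess_container_start(stream_data):
    """Return the guessed offset of the compressed VBA code, or None."""
    pos = stream_data.find(MARKER)
    while pos != -1:
        # Layout before the marker: 0x01 signature, then 2-byte chunk header.
        # Checking the signature byte is an assumption of this sketch.
        if pos >= 3 and stream_data[pos - 3] == 0x01:
            return pos - 3
        pos = stream_data.find(MARKER, pos + 1)
    return None
```

Unlike the approach based on the VBA/dir stream, this only works when the first eight bytes of the source code are stored as literals and read "Attribut".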

MS Office 2007+ documents

MS Office 2007+ file formats, also called MS Open XML, are quite different because they are made of XML files stored in Zip archives.

However, VBA macros are usually stored in a binary OLE file within the Zip archive, called "vbaProject.bin". This vbaProject.bin OLE file contains the same VBA project structure as described above for MS Office 97-2003 documents.

Here again, the vbaProject.bin file may be stored in different places in the Zip archive, depending on the document type:

  • Word 2007+: word/vbaProject.bin
  • Excel 2007+: xl/vbaProject.bin
  • PowerPoint 2007+: ppt/vbaProject.bin

Here is the content of a sample Word 2007+ document with VBA macros:

And here is the content of the OLE file vbaProject.bin:

Important note: the name "vbaProject.bin" is used by default by MS Office, but the Open XML standard and MS Office allow any file name, as long as the relationships are defined accordingly in the XML files (see this article page 18 for details). Therefore, it is not safe to find this file only by name.
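
One way to avoid relying on the file name is to look at the content of each Zip member instead. The sketch below uses only the Python standard library and flags every member that starts with the 8-byte OLE signature (D0 CF 11 E0 A1 B1 1A E1). The function name find_ole_members is hypothetical. Note that this is a simplification: it does not follow the XML relationships, and it also reports other embedded OLE files (such as embedded objects), so each hit still has to be checked for a VBA project structure.

```python
import zipfile

# Signature found at the start of every OLE (CFB) file.
OLE_MAGIC = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"

def find_ole_members(zip_source):
    """Return the names of Zip members that start with the OLE signature.

    zip_source: a file name or a file-like object opened in binary mode.
    """
    names = []
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            # Only the first 8 bytes of each member are needed.
            with archive.open(info) as member:
                if member.read(len(OLE_MAGIC)) == OLE_MAGIC:
                    names.append(info.filename)
    return names
```

For a default Word 2007+ document with macros, the result would be expected to include "word/vbaProject.bin".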


olevba

Usage:

olevba.py <file>

Sample screenshot:


oledump

  • Licence: open-source, public domain
  • Language: Python 2.x
  • Platform: any
  • Supported formats:
    • Word 97-2003: Yes
    • Excel 97-2003: Yes
    • PowerPoint 97-2003: No
    • Word/Excel/PowerPoint 2007+: not directly, must extract vbaProject.bin first
  • Download: http://videos.didierstevens.com/2014/08/26/oledump-py-beta/
  • Pre-requisites: olefile (can be installed with "pip install olefile")

Usage:

  1. If the file is an OpenXML document (MS Office 2007+), first find and unzip vbaProject.bin using any zip tool.
  2. Run "oledump.py <file>" to see the list of OLE streams. The ones containing VBA macros are tagged with "M".
  3. Run "oledump.py <file> -v -s i", i being the index number of the stream with VBA macros. The tool should display the VBA source code.

Sample screenshot:


officeparser

  • Licence: open-source, MIT
  • Language: Python 2.x
  • Platform: any
  • Supported formats:
    • Word 97-2003: Yes
    • Excel 97-2003: Yes
    • PowerPoint 97-2003: No
    • Word/Excel/PowerPoint 2007+: Yes
  • Website: https://github.com/unixfreak0037/officeparser

Usage:

  1. Run "officeparser.py --extract-macros <file>" to extract VBA code.
  2. The code is saved in one or several files matching VBA stream names.

Sample screenshot:


OfficeMalScanner

Usage:

  1. If the file is an OpenXML document (MS Office 2007+), first find and unzip vbaProject.bin using "OfficeMalScanner <file> inflate" or any zip tool.
  2. Run "OfficeMalScanner <file> info" to extract VBA code.
  3. The code is saved in a subfolder matching the file name.

Sample screenshot:


gsf_vba_dump

TODO


sigtool

TODO