This article describes the Microsoft Office 97-2003 legacy/binary file formats (doc, xls, ppt), related security issues and useful resources.
The original location of this page is http://www.decalage.info/file_formats_security/office.
Last update: 2014-11-19 (created 2010-03-08)
MS Office binary formats are widely used:
- Word documents (.doc) for texts.
- Excel workbooks (.xls) for worksheets with numeric values.
- PowerPoint (.ppt) for presentations.
Except for very old MS Office versions, all these formats share the same basic container structure, variously called OLE2, OLECF, structured storage or compound file/document.
MS Office also contains other applications, such as MS Access, which use different file formats not based on the OLE2 format.
Since MS Office 2007, new file formats based on XML (docx, xlsx, pptx) are used by default. See the article about MS Office Open XML.
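Because the legacy formats share one container, a file can be roughly classified from its first bytes, whatever its extension. The following is a minimal sketch, not a full parser. It assumes the usual OLE2 signature (D0 CF 11 E0 A1 B1 1A E1), the 16-bit sector shift field at offset 30 of the OLE2 header, and the fact that the newer XML-based formats are stored in ZIP archives (which start with "PK\x03\x04"). The function names are hypothetical and chosen for illustration only.

```python
import struct

# Signature found at the start of OLE2 / compound files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header signature found at the start of most ZIP archives.
ZIP_MAGIC = b"PK\x03\x04"


def sniff_office_container(data: bytes) -> str:
    """Guess the container type from the first bytes of a file.

    Returns "ole2" (legacy doc/xls/ppt container), "zip" (possibly an
    XML-based docx/xlsx/pptx, but any ZIP archive matches) or "unknown".
    """
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def ole2_sector_size(data: bytes) -> int:
    """Return the sector size declared in an OLE2 header.

    The header stores a little-endian 16-bit "sector shift" at offset 30;
    the sector size is 2 ** shift (for example 512 bytes for a shift of 9).
    """
    if len(data) < 32 or not data.startswith(OLE2_MAGIC):
        raise ValueError("not an OLE2 header")
    (shift,) = struct.unpack_from("<H", data, 30)
    return 1 << shift
```

In practice only the first 32 bytes of a file need to be read for both functions. A "zip" result only says the file is a ZIP archive; confirming an Open XML document requires looking inside the archive.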
Main client applications
The main applications used to open MS Office files are part of the MS Office suite:
- MS Word for documents
- MS Excel for workbooks
- MS PowerPoint for presentations
Many alternative applications are also able to open MS Office files, such as OpenOffice, StarOffice, GNOME Office and KOffice.
Main security issues
- VBA macros
- Embedded OLE objects (particularly Package objects, which may contain any file or launch a shell command)
- Embedded Flash objects (SWF), which have been used by malware to exploit vulnerabilities in the Flash Player.
- Vulnerabilities in MS Office applications, exploited by malformed documents.
- Formulas in spreadsheets, which can open URLs, run code or open external files (see Comma Separated Vulnerabilities).
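The last item can be illustrated with a simple check on data that will be opened in a spreadsheet application. This is only a heuristic sketch: it assumes that cells beginning with "=", "+", "-" or "@" may be interpreted as formulas, so it also flags harmless values such as negative numbers. The function name find_formula_cells is hypothetical.

```python
import csv
import io

# Leading characters that spreadsheet applications may treat as the start
# of a formula (assumption used for this heuristic).
FORMULA_PREFIXES = ("=", "+", "-", "@")


def find_formula_cells(csv_text):
    """Return (row, column, value) for every cell that looks like a formula.

    Row and column indexes are zero-based. Leading whitespace is ignored
    before testing the first character.
    """
    hits = []
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    for row_idx, row in enumerate(reader):
        for col_idx, cell in enumerate(row):
            if cell.lstrip().startswith(FORMULA_PREFIXES):
                hits.append((row_idx, col_idx, cell))
    return hits
```

A caller could reject flagged cells, or neutralize them before export, depending on how the data is used.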
Examples of known vulnerabilities and exploits
- CVE-2011-1983: Use-after-free vulnerability in Microsoft Office 2007 SP2 and SP3, Office 2010 Gold and SP1, and Office for Mac 2011 allows remote attackers to execute arbitrary code via a crafted Word document, aka "Word Use After Free Vulnerability".
Analysis Techniques
- python-oletools: a package of Python tools to analyze OLE files, based on olefile, mainly for malware analysis and debugging. It includes olebrowse (a graphical tool to browse and extract OLE streams), oleid (to quickly identify characteristics of malicious documents), olevba (to extract VBA macro source code), olemeta (to extract metadata), oletimes (to extract timestamps) and pyxswf (to extract Flash objects (SWF) from OLE files).
- SSView: a visual MS OLE2 file parser and editor.
- OfficeCat: to detect many known exploits (no longer updated).
- OfficeMalScanner: to analyze suspicious MS Office documents and to extract VBA macros.
- OffVis: a Microsoft tool to analyze suspicious MS Office documents. See also http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=...
- pyOLEscanner: a malware analysis tool
- officeparser: a Python tool to parse MS OLE files and extract VBA macros.
- oledump: a Python tool to extract VBA macro source code, detect embedded EXE files, etc.
- py-office-tools: a set of Python scripts that display the records inside Excel and PowerPoint files.
- olefile (formerly OleFileIO_PL): a Python module to parse, read and write MS OLE2 files.
- xlrd/xlwt: Python modules to read and create (not modify) MS Excel files.
- POIFS: a Java library to read and write MS OLE2 files.
- ruby-ole: a Ruby library to read and write MS OLE2 files.
- Rex::OLE: a Ruby library to read, write and create MS OLE2 files.
- libolecf / pyolecf: a C library with a Python wrapper to parse and read MS OLE2 files, for forensic purposes.
- compoundfiles: a Python package to parse and read MS OLE2 files.
- LibForensics: a Python library including an MS OLE parser.
- ExeFilter: to sanitize MS Office files by removing macros and OLE Package objects.
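As an illustration of how one of the libraries above can be used, here is a minimal sketch based on olefile (a third-party package, installed separately). It lists the streams of an OLE2 file and applies a rough heuristic for the presence of VBA macros: a stream named "_VBA_PROJECT" directly inside a storage named "VBA". That heuristic is an assumption made for this example, not the method used by olevba or the other tools, and it may miss macros stored differently (in PowerPoint files, for instance). The names has_vba_storage and summarize_ole are hypothetical.

```python
def has_vba_storage(stream_paths):
    """Heuristic macro check on a list of stream paths.

    Each path is a list of name components, in the form returned by
    olefile's listdir(), e.g. ["Macros", "VBA", "_VBA_PROJECT"].
    """
    for path in stream_paths:
        parts = [part.upper() for part in path]
        if len(parts) >= 2 and parts[-2] == "VBA" and parts[-1] == "_VBA_PROJECT":
            return True
    return False


def summarize_ole(path):
    """Return a small summary of an OLE2 file, or None if it is not OLE2."""
    # Imported here so that has_vba_storage() stays usable without olefile.
    import olefile

    if not olefile.isOleFile(path):
        return None
    ole = olefile.OleFileIO(path)
    try:
        streams = ole.listdir()  # streams only, as lists of path components
        return {
            "streams": ["/".join(parts) for parts in streams],
            "has_vba": has_vba_storage(streams),
        }
    finally:
        ole.close()
```

A result with "has_vba" set to True only indicates that a VBA project seems to be present; a tool such as olevba or oledump is still needed to extract and review the macro source code.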