From time to time, people report strange malicious documents that cannot be analyzed successfully by malware analysis tools or by sandboxes. Let's investigate. (This is a follow-up to the post "Malfunctioning Malware" by Didier Stevens.)
A few weeks ago, Katja Hahn reported such a malware sample on Twitter.
The file is available here (SHA256 41a84ee951ec7efa36dc16c70aaaf6b8e6d1bce8bd9002d0ab5236197eb3b32a).
When the file is opened in a hex editor, it is obvious that it contains some suspicious VBA macro code:
But none of the analysis tools, such as oledump or olevba, detects any VBA macro. Even MS Word neither shows nor runs any macro.
```
$ olevba 41a84ee951ec7efa36dc16c70aaaf6b8e6d1bce8bd9002d0ab5236197eb3b32a.bin
olevba 0.42 - http://decalage.info/python/oletools
[...]
FILE: 41a84ee951ec7efa36dc16c70aaaf6b8e6d1bce8bd9002d0ab5236197eb3b32a.bin
Type: OLE
No VBA macros found.
```
In his post "Malfunctioning Malware", Didier Stevens showed how he managed to rebuild and extract the macro source code from the document, using some low-level techniques.
Let's investigate further: Is it a trick to avoid detection, or some kind of exploit?
First, let's look at the internal OLE directory in the document using oledir, a new tool that will be released shortly with my oletools package:
The OLE directory is an array of entries in an OLE file that records the names and locations of all the storages and streams stored in the file (see [MS-CFB]). Each directory entry is either in use or empty.
In our case, there are several empty directory entries (ids 6 to 12) followed by a non-empty one (id 13). Since entries are usually allocated in sequence, this suggests that several entries were used and later deleted.
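To make this heuristic concrete, here is a minimal sketch that parses raw 128-byte directory entries following the layout described in [MS-CFB] and lists the empty entries located before the last used one. It assumes the bytes of the directory stream have already been extracted from the file (following its sector chain from the header is not shown), and the names `parse_directory` and `suspicious_gaps` are hypothetical; this is not how oledir is implemented.

```python
import struct
from typing import List, NamedTuple

# One directory entry is 128 bytes ([MS-CFB]): name (64 bytes, UTF-16LE),
# name length, object type, color flag, left/right sibling and child ids,
# CLSID, state bits, creation/modification times, start sector, stream size.
DIRENTRY_FORMAT = "<64sHBBIII16sIQQIQ"
DIRENTRY_SIZE = struct.calcsize(DIRENTRY_FORMAT)  # 128

# Object type values from [MS-CFB]; 0x00 marks an unused (empty) entry.
OBJECT_TYPES = {0x00: "empty", 0x01: "storage", 0x02: "stream", 0x05: "root"}


class DirEntry(NamedTuple):
    sid: int           # index of the entry in the directory array
    name: str
    kind: str          # "empty", "storage", "stream", "root" or "unknown"
    start_sector: int
    size: int


def parse_directory(data: bytes) -> List[DirEntry]:
    """Parse the raw bytes of an OLE directory stream into entries."""
    entries = []
    for sid in range(len(data) // DIRENTRY_SIZE):
        fields = struct.unpack_from(DIRENTRY_FORMAT, data, sid * DIRENTRY_SIZE)
        name_raw, name_len, obj_type = fields[0], fields[1], fields[2]
        start_sector, size = fields[11], fields[12]
        # name_len is in bytes and includes the UTF-16 terminating null.
        name = name_raw[: max(name_len - 2, 0)].decode("utf-16-le", errors="replace")
        kind = OBJECT_TYPES.get(obj_type, "unknown")
        entries.append(DirEntry(sid, name, kind, start_sector, size))
    return entries


def suspicious_gaps(entries: List[DirEntry]) -> List[int]:
    """Return ids of empty entries that come before the last used entry.

    Entries are usually allocated in sequence, so such gaps hint that
    streams or storages were created and later deleted.
    """
    used = [e.sid for e in entries if e.kind != "empty"]
    if not used:
        return []
    last_used = max(used)
    return [e.sid for e in entries if e.kind == "empty" and e.sid < last_used]
```

On our sample, this check would be expected to report ids 6 to 12, matching what oledir shows.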
This also confirms that there is no registered VBA macro, because there is no storage named “VBA” in the file (see [MS-OVBA]).
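As an illustration of this check, the sketch below looks for a storage named "VBA" in a list of stream paths. It assumes each path is given as a list of name components, which is the shape returned by olefile's `OleFileIO.listdir()`; the function name `find_vba_storages` is hypothetical. In the Word samples discussed later, the VBA modules live under `Macros/VBA`.

```python
from typing import Iterable, List


def find_vba_storages(paths: Iterable[List[str]]) -> List[str]:
    """Return the paths of storages named "VBA" found in a list of stream paths.

    Each path is a list of name components, e.g. ["Macros", "VBA", "Module1"].
    Every component except the last one is a storage on the way to the stream.
    OLE names are compared case-insensitively, hence the upper() calls.
    """
    found = set()
    for path in paths:
        for depth, name in enumerate(path[:-1]):
            if name.upper() == "VBA":
                found.add("/".join(path[: depth + 1]))
    return sorted(found)
```

An empty result means there is no registered VBA project in the file, even if macro source code still lingers in unused sectors.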
Second, let's look at the list of sectors from the OLE FAT (File Allocation Table) using olemap, another new tool:
[…]
It appears that many sectors are marked as free, followed by used sectors. This is another indication that some streams may have been deleted.
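The same reasoning can be applied to the FAT itself. The sketch below classifies FAT values using the special markers defined in [MS-CFB] and returns the free sectors that appear before the last used sector. It assumes the FAT has already been read into a list of integers (collecting FAT sectors through the DIFAT is not shown); the function names are hypothetical.

```python
from typing import List

# Special FAT values defined in [MS-CFB]
MAXREGSECT = 0xFFFFFFFA  # highest regular sector number
DIFSECT = 0xFFFFFFFC     # sector used by the DIFAT
FATSECT = 0xFFFFFFFD     # sector used by the FAT itself
ENDOFCHAIN = 0xFFFFFFFE  # last sector of a chain
FREESECT = 0xFFFFFFFF    # unallocated sector


def classify(value: int) -> str:
    """Describe what a single FAT value says about its sector."""
    if value == FREESECT:
        return "free"
    if value == ENDOFCHAIN:
        return "end of chain"
    if value == FATSECT:
        return "FAT"
    if value == DIFSECT:
        return "DIFAT"
    if value <= MAXREGSECT:
        return "data"  # points to the next sector of the same chain
    return "reserved"


def free_sectors_before_used(fat: List[int]) -> List[int]:
    """Return indices of free sectors located before the last used sector."""
    used = [i for i, v in enumerate(fat) if v != FREESECT]
    if not used:
        return []
    last_used = used[-1]
    return [i for i, v in enumerate(fat) if v == FREESECT and i < last_used]
```

Free sectors at the very end of the FAT are normal padding; free sectors in the middle of used ones are what hints at deleted streams.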
This also confirms that the VBA macro code seen in the hex editor above, at offset 0xE440, is located in an unused sector (sector 0x71, i.e. 113 in decimal, starting at offset 0xE400).
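The sector and offset above are related by simple arithmetic: in [MS-CFB], the header fills the first sector-sized block of the file, so regular sector N starts at (N + 1) × sector size. The file size (78848 bytes = 154 × 512) is consistent with 512-byte sectors, and with that size the offset 0xE400 corresponds to sector 0x71 (113), which is why the sector number shown by olemap reads as hexadecimal. The two helpers below, with hypothetical names, just perform that conversion.

```python
def sector_offset(sector: int, sector_size: int = 512) -> int:
    """File offset where a regular sector starts (the header comes first)."""
    return (sector + 1) * sector_size


def sector_at(offset: int, sector_size: int = 512) -> int:
    """Index of the sector containing a file offset (-1 means the header)."""
    return offset // sector_size - 1


# The macro code found at 0xE440 lies 0x40 bytes into sector 0x71:
print(hex(sector_at(0xE440)), hex(sector_offset(0x71)))
```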
It is therefore very likely that the document used to contain VBA macros that were later deleted. It would have been odd for a malware author to send a document with deleted macros, because they would never be executed.
I believe the document has in fact been sanitized by an antivirus or some kind of cleaning tool. For example, the F-Prot and McAfee antivirus engines provide such a macro removal feature. Another example is ExeFilter.
To confirm this, let's try to find the original version of the malware sample before it was sanitized. This is not straightforward, because the file has been modified: the original sample has different hashes. Furthermore, its original name is unknown.
The best way to find the original file is to identify specific strings that it shares with the sanitized file. For example, the malware analysis websites malwr.com and hybrid-analysis.com extract strings of printable characters from the submitted files. In our case, I picked the string “DownloadDB403” as a good candidate:
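Extracting such strings locally takes only a few lines. The sketch below mimics the Unix `strings` tool by collecting runs of at least four printable ASCII characters; the minimum length is an assumption, and the exact extraction rules of malwr.com and hybrid-analysis.com are not known. The helper `shared_strings` (a hypothetical name) can then confirm that a candidate sample shares strings with the sanitized file.

```python
import re
from typing import List, Set

# Runs of 4 or more printable ASCII bytes (0x20 to 0x7E), like `strings`.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def ascii_strings(data: bytes) -> List[str]:
    """Return printable ASCII strings in the order they appear in the data."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]


def shared_strings(a: bytes, b: bytes) -> Set[str]:
    """Strings present in both files: candidates for finding related samples."""
    return set(ascii_strings(a)) & set(ascii_strings(b))
```

Short, generic strings are poor search terms; a name such as “DownloadDB403”, which looks specific to this macro, is far more likely to return only related samples.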
The problem is that the current version of hybrid-analysis.com does not provide a string search feature.
Malwr.com can search for strings within samples using the “string:...” syntax on its search page. However, at the time of writing, this search feature often fails with a server error.
@PayloadSecurity gave me another tip, which proved to be very useful: simply use a well-known search engine such as Google, limiting the search to the hybrid-analysis.com or malwr.com websites. Their malware analysis reports are already indexed by search engines, including the lists of strings extracted from the analyzed files. So let's search for our string “DownloadDB403” on Google, using this syntax:
Important: to get all the relevant results, it is necessary to click on “repeat the search with the omitted results included”.
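For repeated searches, the query URL can be generated. The sketch below is one plausible form of such a query, combining Google's `site:` and `OR` operators; it is not necessarily the exact query used here. It also assumes that `filter=0` is the URL parameter added by the “repeat the search with the omitted results included” link, which is observed behavior rather than a documented Google API. `site_search_url` is a hypothetical name.

```python
from urllib.parse import urlencode


def site_search_url(needle: str,
                    sites=("hybrid-analysis.com", "malwr.com"),
                    include_omitted: bool = True) -> str:
    """Build a Google search URL restricted to malware analysis websites."""
    site_clause = " OR ".join(f"site:{s}" for s in sites)
    params = {"q": f'"{needle}" ({site_clause})'}
    if include_omitted:
        # Assumption: this is what "repeat the search with the omitted
        # results included" adds to the URL; Google does not document it.
        params["filter"] = "0"
    return "https://www.google.com/search?" + urlencode(params)


print(site_search_url("DownloadDB403"))
```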
This method is therefore quite handy when looking for malware samples containing a specific string.
Looking at all the results, I found at least two other malware samples with exactly the same size as our mysterious document (78848 bytes), submitted during the same period but with different hashes:
WIN_019_11.doc, SHA256 6780af202bf7534fd7fcfc37aa57e5a998e188ca7d65e22c0ea658c73fad36a2
WIN_019_11.doc, SHA256 f36cb4c31ee6cbce90b5d879cd2a97bcfe23a38d37365196c25e6ff6a9f8aaa6
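If candidate samples have been downloaded locally, the same size-and-hash comparison can be scripted. The sketch below keeps the files whose size matches the sanitized document (78848 bytes) and whose SHA256 differs from known hashes; the function names and the idea of passing the sanitized file's hash as `exclude_hashes` are illustrative assumptions.

```python
import hashlib
import os
from typing import Iterable, List, Tuple

TARGET_SIZE = 78848  # size of the sanitized document, in bytes


def sha256_of(path: str) -> str:
    """Hash a file in chunks so large samples do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def same_size_candidates(paths: Iterable[str], size: int = TARGET_SIZE,
                         exclude_hashes: Iterable[str] = ()) -> List[Tuple[str, str]]:
    """Return (path, sha256) for files of the given size whose hash is not excluded."""
    excluded = {h.lower() for h in exclude_hashes}
    results = []
    for path in paths:
        if os.path.getsize(path) != size:
            continue
        digest = sha256_of(path)
        if digest not in excluded:
            results.append((path, digest))
    return results
```

A matching size alone proves nothing, but combined with shared strings and submission dates it is a good hint that two files derive from the same document.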
This time, olevba confirms that both files contain active VBA macros, with several characteristics typical of malware (see below). The two files are almost identical, apart from one line in the macro source code.
```
$ olevba -a 6780af202bf7534fd7fcfc37aa57e5a998e188ca7d65e22c0ea658c73fad36a2.bin
olevba 0.42 - http://decalage.info/python/oletools
Flags       Filename
----------- -----------------------------------------------------------------
OLE:MASIHB-V 6780af202bf7534fd7fcfc37aa57e5a998e188ca7d65e22c0ea658c73fad36a2.bin

(Flags: OpX=OpenXML, XML=Word2003XML, MHT=MHTML, TXT=Text, M=Macros, A=Auto-executable, S=Suspicious keywords, I=IOCs, H=Hex strings, B=Base64 strings, D=Dridex strings, V=VBA strings, ?=Unknown)

===============================================================================
FILE: 6780af202bf7534fd7fcfc37aa57e5a998e188ca7d65e22c0ea658c73fad36a2.bin
Type: OLE
-------------------------------------------------------------------------------
VBA MACRO ThisDocument.cls
in file: 6780af202bf7534fd7fcfc37aa57e5a998e188ca7d65e22c0ea658c73fad36a2.bin - OLE stream: u'Macros/VBA/ThisDocument'
-------------------------------------------------------------------------------
VBA MACRO Module1.bas
in file: 6780af202bf7534fd7fcfc37aa57e5a998e188ca7d65e22c0ea658c73fad36a2.bin - OLE stream: u'Macros/VBA/Module1'
+------------+----------------------+-----------------------------------------+
| Type       | Keyword              | Description                             |
+------------+----------------------+-----------------------------------------+
| AutoExec   | AutoOpen             | Runs when the Word document is opened   |
| Suspicious | Kill                 | May delete a file                       |
| Suspicious | Open                 | May open a file                         |
| Suspicious | Shell                | May run an executable file or a system  |
|            |                      | command                                 |
| Suspicious | vbNormal             | May run an executable file or a system  |
|            |                      | command                                 |
| Suspicious | CreateObject         | May create an OLE object                |
| Suspicious | Chr                  | May attempt to obfuscate specific       |
|            |                      | strings                                 |
| Suspicious | FileCopy             | May copy a file                         |
| Suspicious | SaveToFile           | May create a text file                  |
| Suspicious | Write                | May write to a file (if combined with   |
|            |                      | Open)                                   |
| Suspicious | Hex Strings          | Hex-encoded strings were detected, may  |
|            |                      | be used to obfuscate strings (option    |
|            |                      | --decode to see all)                    |
| Suspicious | Base64 Strings       | Base64-encoded strings were detected,   |
|            |                      | may be used to obfuscate strings        |
|            |                      | (option --decode to see all)            |
| Suspicious | VBA obfuscated       | VBA string expressions were detected,   |
|            | Strings              | may be used to obfuscate strings        |
|            |                      | (option --decode to see all)            |
| IOC        | codakes.exe          | Executable file name (obfuscation: VBA  |
|            |                      | expression)                             |
| Base64     | 2'+                  | Micr                                    |
| String     |                      |                                         |
| VBA string | GE                   | Chr(80 - 9) + "E"                       |
| VBA string | t                    | (Chr(100 + 10 + 6))                     |
| VBA string | TE                   | Chr(80 + 4) + "E"                       |
| VBA string | mP                   | (Chr(80 + 20 + 9)) + "P"                |
| VBA string | \codakes.exe         | Chr(90 + 2) + "codakes" + Chr(50 - 4) + |
|            |                      | "exe"                                   |
| VBA string | ConnectionDB         | ("Connection") & "DB"                   |
+------------+----------------------+-----------------------------------------+
```
If we compare the OLE directory structure of these two files with the initial sample, it appears that the directory entries are identical, apart from the streams and storages containing the VBA macros:
Furthermore, the document metadata is exactly the same, including the last saved timestamp:
This would not have been the case if the document had been modified in an editor such as MS Word.
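Both comparisons can be automated. The sketch below diffs two stream listings (lists of path components, as returned by olefile's `OleFileIO.listdir()`) and compares a few metadata fields between two objects. The field names are assumed to match the attributes of olefile's `OleMetadata` (obtained with `OleFileIO.get_metadata()`), to be checked against the olefile documentation; the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Assumed attribute names of olefile's OleMetadata (SummaryInformation fields).
METADATA_FIELDS = ("author", "last_saved_by", "create_time",
                   "last_saved_time", "revision_number", "total_edit_time")


def diff_listings(sanitized: Iterable[List[str]],
                  original: Iterable[List[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (paths only in the original, paths only in the sanitized file)."""
    a = {"/".join(p) for p in sanitized}
    b = {"/".join(p) for p in original}
    return b - a, a - b


def diff_metadata(meta_a, meta_b, fields=METADATA_FIELDS) -> Dict[str, tuple]:
    """Return {field: (value_a, value_b)} for each field whose values differ."""
    diffs = {}
    for name in fields:
        va, vb = getattr(meta_a, name, None), getattr(meta_b, name, None)
        if va != vb:
            diffs[name] = (va, vb)
    return diffs
```

For a sanitized copy, one would expect the listing diff to contain only paths under the macro storage, and the metadata diff to be empty.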
After this investigation, it is almost certain that the initial file which could not be analyzed (SHA256 41a84ee951ec7efa36dc16c70aaaf6b8e6d1bce8bd9002d0ab5236197eb3b32a) is actually not the original functional version sent by the malware author, but a sanitized version where the malicious macro payload has been disabled by an antivirus or a similar tool.
By looking for similar malware samples containing the same specific strings, it is possible to discover the original, functional version of the file. Malware analysis tools such as olevba or oledump can then be used to extract and analyse the VBA macro source code.
As a general recommendation, if you encounter such a malware sample that cannot be analysed by the usual tools, first check whether any payload actually runs when opening it in MS Office. If not, then it is very likely a crippled version of the malware, and not the original one.
If in doubt, please report any bug or strange sample to tool developers (for olevba, use https://bitbucket.org/decalage/oletools/issues?status=new&status=open), just in case malware authors have found a new way to evade detection.
As for antivirus engines and file sanitization tools, this example shows that it is not enough to delete the streams containing macros. It is much better to also overwrite the corresponding sectors with null bytes or spaces, to avoid triggering malware detection signatures on cleaned files.
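A minimal sketch of that second step, assuming the FAT has already been parsed into a list of integers and the file is held in a mutable buffer: every sector marked free (FREESECT) is overwritten, while the header and allocated sectors are left untouched, so the file structure stays consistent. This is an illustration, not how F-Prot, McAfee or ExeFilter actually work. Note that it does not cover streams smaller than the mini-stream cutoff (4096 bytes), which are stored in mini-sectors inside the root entry's stream and would need the same treatment through the MiniFAT.

```python
from typing import List

FREESECT = 0xFFFFFFFF  # [MS-CFB] marker for an unallocated sector


def wipe_free_sectors(data: bytearray, fat: List[int], sector_size: int = 512,
                      fill: int = 0x00) -> int:
    """Overwrite every sector marked free in the FAT; return how many were wiped.

    Regular sector N starts at offset (N + 1) * sector_size, after the header.
    FAT entries pointing beyond the end of the file are skipped.
    """
    wiped = 0
    for sector, value in enumerate(fat):
        if value != FREESECT:
            continue
        start = (sector + 1) * sector_size
        if start >= len(data):
            continue
        end = min(start + sector_size, len(data))
        data[start:end] = bytes([fill]) * (end - start)
        wiped += 1
    return wiped
```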