File Scanning Frameworks for Malware Analysis and Incident Response

This article presents several new open source frameworks meant to simplify static file scanning for malware analysis and incident response: MASTIFF, Viper, IRMA and a few others. Their goal is to provide an extensible framework to integrate many existing scanning tools.

Note: This article is about tools that run locally on your own system for static file analysis. I will not describe online file analysis services such as VirusTotal or Wepawet, which have already been well covered elsewhere. Nor is it about dynamic malware analysis tools such as Cuckoo Sandbox.

Please do not hesitate to contact me if you have comments or if you know another tool similar to the ones described in this article.

Why a file scanning framework?

There is a growing number of tools developed for malware analysis. For static analysis, we now have a wide choice of tools to hash files, identify their type, parse their content, extract metadata and main characteristics, identify anomalies or signs of malicious content, detect and extract embedded files, etc. When analyzing a single file, running all the relevant tools and storing the results in a convenient location can quickly become tedious.

The idea of a file scanning framework is to integrate all these tools behind a single, ready-to-use interface in order to simplify the analyst's work, to automate parts of it, and to store the results in a structured way (e.g. a database) for further analysis across a collection of samples. Ideally, the framework should detect file types in order to run the corresponding tools in a smart way (e.g. PDF tools for PDF files, PE tools for PE executables, etc.). It should also handle archives, containers and embedded files automatically. And last but not least, such a framework should have a simple plugin system, so that the community can easily contribute new scanning features.
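To make the plugin idea more concrete, here is a minimal Python sketch of such a framework core. It assumes a crude file type detection based on a few well-known magic numbers (a real framework would rather rely on libmagic or TrID). The registry, the `plugin` decorator and the plugin names are hypothetical and do not reflect the actual API of any of the tools described in this article.

```python
import hashlib

# Hypothetical plugin registry: file type name -> list of scanning functions.
# The special key "*" holds plugins that run on every file, whatever its type.
PLUGINS = {}

def plugin(file_type):
    """Decorator registering a scanning function for the given file type."""
    def register(func):
        PLUGINS.setdefault(file_type, []).append(func)
        return func
    return register

# Crude type detection from a few well-known magic numbers (an assumption:
# real frameworks rely on libmagic, TrID or similar).
MAGIC = [
    (b"%PDF", "pdf"),
    (b"MZ", "exe"),
    (b"PK\x03\x04", "zip"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole"),  # MS Office 97-2003 (OLE2)
]

def detect_type(data):
    for signature, file_type in MAGIC:
        if data.startswith(signature):
            return file_type
    return "unknown"

@plugin("*")
def hashes(data):
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

@plugin("pdf")
def pdf_keywords(data):
    # Naive keyword counts in the spirit of pdfid (ignores obfuscated names).
    return {kw: data.count(kw.encode()) for kw in ("/JavaScript", "/OpenAction", "/EmbeddedFile")}

def scan(data):
    """Detect the file type, then run generic plugins followed by type-specific ones."""
    file_type = detect_type(data)
    results = {"type": file_type}
    for func in PLUGINS.get("*", []) + PLUGINS.get(file_type, []):
        results[func.__name__] = func(data)
    return results
```

With this design, adding a new scanning feature only means writing one decorated function, which keeps the barrier to community contributions low.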


MASTIFF

From the home page: "MASTIFF is a static analysis framework that automates the process of extracting key characteristics from a number of different file formats. To ensure the framework remains flexible and extensible, a community-driven set of plug-ins is used to perform file analysis and data extraction. While originally designed to support malware, intrusion, and forensic analysis, the framework is well-suited to support a broader range of analytic needs. In a nutshell, MASTIFF allows analysts to focus on analysis rather than figuring out how to parse files."

Main features and characteristics:

  • command-line interface
  • developed in Python 2.x
  • platform: Linux, possibly Mac OS X (Windows is not mentioned)
  • extensible with plugins
  • Scan results are stored in one folder per scanned file, in which each plugin creates an output file.
  • Results are also stored in a structured SQLite database
  • specialized plugins (for EXE, PDF, Office, Zip, etc) are called according to the file type
  • Files within Zip archives are automatically extracted and analyzed recursively
  • pre-installed on REMnux
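The recursive handling of Zip archives can be sketched in a few lines of Python with the standard zipfile module. This is only an illustration under simple assumptions (unencrypted archives, members small enough to be held in memory), not MASTIFF's actual code; the function name and the depth limit are hypothetical.

```python
import io
import zipfile

def scan_recursive(name, data, depth=0, max_depth=5):
    """Record a file and, if it is a Zip archive, recurse into its members.

    Returns a flat list of (path, size, is_zip) tuples. The max_depth limit
    guards against deeply nested archives.
    """
    is_zip = zipfile.is_zipfile(io.BytesIO(data))
    results = [(name, len(data), is_zip)]
    if is_zip and depth < max_depth:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue
                member = archive.read(info)  # extracted in memory, never written to disk
                results.extend(
                    scan_recursive(name + "/" + info.filename, member, depth + 1, max_depth)
                )
    return results
```

In a real framework, each extracted member would be passed to the same type detection and plugin dispatch as a top-level file, instead of only being listed.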

Plugins available in the current version 0.6.0 as of 11 July 2014:

  • File type identification: libmagic, python-magic, TrID
  • Generic analysis: file information, MD5/SHA1/SHA256 hashes, fuzzy hashing (ssdeep), strings, YARA, VirusTotal
  • PDF: metadata (exiftool), pdf-parser, pdfid
  • MS Office: metadata (exiftool), pyOLEscanner
  • EXE: PE info and resources (pefile), digital signature extraction (disitool), single-byte string deobfuscation (distorm)
  • Zip: info and file listing, extract files for recursive analysis
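As an illustration of one of the simplest generic plugins, the following Python sketch extracts printable strings, similar to the Unix strings command, looking for both ASCII and UTF-16LE ("wide") strings as commonly found in Windows executables. The function name and the default minimum length of 4 are assumptions, not taken from MASTIFF.

```python
import re

def extract_strings(data, min_length=4):
    """Return printable ASCII strings, then UTF-16LE strings, found in raw bytes."""
    # Runs of at least min_length printable ASCII characters.
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    # Runs of printable characters each followed by a zero byte (UTF-16LE).
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_length)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return found
```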

MASTIFF screenshot from http://digital-forensics.sans.org/blog/2013/05/07/mastiff-for-auto-static-malware-analysis


VIPER

Viper is a "framework to store, classify and investigate binary files of any sort". It provides a command-line shell with a number of commands to open and store files, to analyze them using various tools and plugins written in Python, and to perform other actions. All results and files are stored in a database to keep them organized for further analysis, such as finding similar samples. Unlike MASTIFF, the analysis is not automated: each action is launched by typing a simple command such as "yara" or "pe resources". The tools are well integrated into the shell, and the real benefit is the database.
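To illustrate this shell-based design, here is a toy Python sketch built on the standard cmd module: a file is opened once into the session, then each analysis is a separate command. The class and command names are hypothetical and much simpler than Viper's real commands.

```python
import cmd
import hashlib

class MiniShell(cmd.Cmd):
    """Toy Viper-style shell: open a file once, then run analysis commands on it."""
    prompt = "shell> "

    def __init__(self, stdout=None):
        super().__init__(stdout=stdout)
        self.data = None  # content of the currently opened sample

    def do_open(self, path):
        """open <path>: load a file as the current session sample"""
        with open(path, "rb") as f:
            self.data = f.read()
        self.stdout.write("Opened %s (%d bytes)\n" % (path, len(self.data)))

    def do_sha256(self, arg):
        """sha256: print the SHA-256 hash of the current sample"""
        if self.data is None:
            self.stdout.write("No file opened\n")
            return
        self.stdout.write(hashlib.sha256(self.data).hexdigest() + "\n")

    def do_exit(self, arg):
        """exit: leave the shell"""
        return True  # a true return value stops cmdloop()
```

Calling `MiniShell().cmdloop()` starts an interactive prompt, and typing `help` lists the commands documented by their docstrings.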

Main features and characteristics:

  • command-line interface with a shell
  • developed in Python 2.x
  • platform: Linux, possibly Mac OS X (Windows is not mentioned)
  • extensible with plugins
  • Scan results are stored in a structured SQLite database for later analysis
  • each sample can have tags added manually or by plugins
  • can look for similar samples in the database using various attributes
  • provided with PEiD and YARA signatures
  • pre-installed on REMnux (an older version, which may need to be upgraded)
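The benefit of a sample database can be illustrated with a small SQLite sketch storing samples, tags and an import hash (imphash), which could for instance be computed with pefile's `get_imphash()` method. The schema and function names are hypothetical and do not reproduce Viper's actual database layout; the point is that finding similar samples becomes a single SQL query.

```python
import sqlite3

def open_db(path=":memory:"):
    """Create (if needed) a hypothetical two-table schema: samples and their tags."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS samples (
                      sha256 TEXT PRIMARY KEY,
                      name TEXT,
                      imphash TEXT)""")
    db.execute("""CREATE TABLE IF NOT EXISTS tags (
                      sha256 TEXT REFERENCES samples(sha256),
                      tag TEXT,
                      UNIQUE(sha256, tag))""")
    return db

def add_sample(db, sha256, name, imphash=None, tags=()):
    db.execute("INSERT OR IGNORE INTO samples VALUES (?, ?, ?)", (sha256, name, imphash))
    db.executemany("INSERT OR IGNORE INTO tags VALUES (?, ?)", [(sha256, t) for t in tags])
    db.commit()

def similar_by_imphash(db, sha256):
    """Names of other samples sharing the same imphash (NULL never matches)."""
    rows = db.execute("""SELECT other.name FROM samples AS me
                         JOIN samples AS other ON other.imphash = me.imphash
                         WHERE me.sha256 = ? AND other.sha256 != me.sha256
                         ORDER BY other.name""", (sha256,))
    return [row[0] for row in rows]

def samples_with_tag(db, tag):
    rows = db.execute("""SELECT s.name FROM samples AS s
                         JOIN tags AS t ON t.sha256 = s.sha256
                         WHERE t.tag = ? ORDER BY s.name""", (tag,))
    return [row[0] for row in rows]
```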

Plugins available in the current version 1.0 as of 20 July 2014:

  • Generic analysis: file information, MD5/SHA1/SHA256 hashes, fuzzy hashing (pydeep), metadata (exiftool), extract strings and addresses (IPv4/v6, domain names), detect known shellcode patterns, send to Cuckoo Sandbox, launch IDA Pro, search on Malwr/Anubis/VirusTotal, XOR search, YARA scan
  • EXE: PE info and sections/imports/exports/resources (pefile), detect packer (PEiD), imphash, guess language, digital signature, find other samples with similar imphash/compile time/etc
  • HTML: extract scripts, links, iframes, Java applets, Flash objects, images (BeautifulSoup)
  • PDF: pdfid
  • MS Office: OLE security checks (oleid), OLE metadata (olemeta), OpenXML metadata, OLE timestamps (oletimes), extract streams/objects
  • Pictures: forensic analysis (imageforensic.org)
  • Java: parse JAR and IDX
  • Parse e-mails in MIME or Outlook MSG formats
  • RAT config decoders
  • extract samples from McAfee BUP files (quarantine)
  • and more
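Two of the generic features above, address extraction and XOR search, are simple enough to sketch in Python. The regular expressions are assumptions: the domain pattern only recognizes a short, hypothetical list of top-level domains to limit false positives, and the XOR search only covers single-byte keys. Viper's own implementation may differ.

```python
import ipaddress
import re

# Naive patterns (assumptions): dotted-quad IPv4 and domains ending in a few common TLDs.
IPV4_RE = re.compile(rb"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DOMAIN_RE = re.compile(rb"\b(?:[a-z0-9-]{1,63}\.)+(?:com|net|org|info|ru|cn)\b", re.IGNORECASE)

def extract_addresses(data):
    """Return sorted lists of IPv4 addresses and domain names found in raw bytes."""
    ips = set()
    for match in IPV4_RE.finditer(data):
        candidate = match.group().decode("ascii")
        try:
            ipaddress.IPv4Address(candidate)  # rejects values such as 999.1.1.1
        except ValueError:
            continue
        ips.add(candidate)
    domains = {m.group().decode("ascii").lower() for m in DOMAIN_RE.finditer(data)}
    return sorted(ips), sorted(domains)

def xor_search(data, needle):
    """Return every single-byte key (1-255) for which needle XOR key occurs in data."""
    hits = []
    for key in range(1, 256):
        encoded = bytes(b ^ key for b in needle)  # XOR the needle, not the whole sample
        if encoded in data:
            hits.append(key)
    return hits
```

Encoding the short needle with each key, rather than decoding the whole sample 255 times, gives the same result at a much lower cost.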

Viper screenshot from http://viper.li/


IRMA

From the home page: "IRMA intends to be an open-source platform designed to help identifying and analyzing malicious files. An important value with IRMA comes from you keep control over where goes / who gets your data. Once you install IRMA on your network, your data stays on your network. Each submitted files is analyzed in various ways. For now, we focus our efforts on multiple anti-virus engines, but we are working on other "probes" (feel free to submit your own)."

In short, IRMA looks like an open-source, VirusTotal-like system running on local servers, which can be extended like MASTIFF and Viper.

Main features and characteristics:

  • web interface to submit files for analysis and search results
  • modular architecture: probes can be distributed across Linux and Windows hosts to run various scanning engines and to balance the load
  • HTTP/JSON API and command-line interface
  • developed in Python 2.x
  • platform: Linux (probes can also run on Windows)
  • extensible with plugins
  • Scan results are stored in a MongoDB database for later analysis
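The idea of running several engines (probes) on the same file and merging their verdicts into a single JSON report can be sketched with a thread pool. This is a local simplification: in IRMA, probes can be distributed across separate Linux or Windows hosts, whereas here each probe is just a Python callable. The `run_probes` name and the report layout are hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def run_probes(data, probes, max_workers=4):
    """Run every probe on the same sample in parallel and merge results into one JSON report.

    `probes` maps a probe name to a callable taking bytes and returning a
    JSON-serializable verdict. A probe that raises is reported as an error
    instead of aborting the whole scan.
    """
    report = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(func, data) for name, func in probes.items()}
        for name, future in futures.items():
            try:
                report[name] = future.result()
            except Exception as exc:  # one failing engine should not hide the others
                report[name] = {"error": str(exc)}
    return json.dumps(report, sort_keys=True)
```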

Plugins available in the current version as of 21 July 2014:

  • Generic analysis: VirusTotal lookup by hash, identify known files (NSRL database)
  • EXE files: Static Analyzer (adapted from Cuckoo Sandbox)
  • Antivirus on Linux: Clam Antivirus, Comodo Antivirus, Eset Nod32 Business Edition, F-Prot, McAfee VirusScan Command Line Scanner, Sophos
  • Antivirus on Windows: Kaspersky, McAfee, Sophos, Symantec

IRMA screenshot from http://irma.quarkslab.com/preview.html


Workbench

Workbench is an open-source framework to store and analyze all sorts of data related to an incident (e.g. files and PCAPs). The user interface is based on IPython notebooks. Workbench seems to have a lot of very interesting features similar to the other frameworks above. But to be fair, the current documentation is still a work in progress and I have not spent enough time trying it or digging into the code yet.

Note from the authors: "The project is new and looking for contributors and alpha users!".


Other File Scanning Frameworks

This section lists other open-source frameworks providing similar scanning features, but with different purposes. Part of their code could be reused for malware analysis frameworks.

Ragpicker

Ragpicker is a Python tool that crawls websites providing malware samples, downloads them to build a collection, runs a number of analysis plugins, and generates reports. Its analysis plugins are quite similar to those of MASTIFF and Viper.

Plugins available in the current version 0.05.2 as of 14 July 2014:

  • File information: type, size, MD5, SHA-1, SHA-256
  • File Type identification: python-magic, file command
  • Antivirus: AVG, Avira, BitDefender, ClamAV, F-Prot
  • Archives: Zip and Rar are extracted automatically
  • EXE: PE header, imports, resources (pefile), suspicious API functions, digital signature verification, anti-debug detection, VM detection, packer detection (PEiD), unpacking for UPX/MEW/FSG (ClamAV), imphash, etc.
  • PDF: pdfid
  • Other: YARA, Embedded files carving (hachoir-subfile), VirusTotal, sandboxing (Cuckoo)
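Embedded file carving, as performed by hachoir-subfile, can be approximated by searching for known file signatures at any offset in the data. The sketch below is a deliberately naive version: the signature table and function name are assumptions, it only reports offsets, and short signatures such as "MZ" will produce false positives that a real tool would filter out by parsing the headers found.

```python
# Hypothetical signature table: magic bytes -> file type name.
SIGNATURES = {
    b"MZ": "exe",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole",
}

def find_embedded(data, skip_start=True):
    """Return sorted (offset, type) pairs for every known signature found in data.

    With skip_start=True, a signature at offset 0 is ignored, since it
    identifies the container itself rather than an embedded file.
    """
    hits = []
    for magic, file_type in SIGNATURES.items():
        start = 0
        while True:
            offset = data.find(magic, start)
            if offset == -1:
                break
            if not (skip_start and offset == 0):
                hits.append((offset, file_type))
            start = offset + 1
    return sorted(hits)
```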

ExeFilter

ExeFilter is a file scanning and cleaning framework developed in Python, extensible with plugins. Its main purpose is to filter incoming files from removable devices, e-mails or web browsing, according to a configurable whitelist of allowed formats. It also cleans file formats that may embed active content (HTML, PDF, MS Office, etc.), and processes archives recursively.

Plugins available in the current version 1.1.4-alpha6 as of 14 July 2014:

  • File Type identification: file extension + plugin parsers
  • Antivirus: ClamAV, F-Prot
  • PDF: disable JavaScript, embedded and attached files, launch/open actions (pdfid, origapy)
  • MS Office: remove VBA macros, detect OLE Package objects, detect encryption
  • RTF: remove OLE Package objects
  • HTML: remove JavaScript
  • other supported formats: Text, JPEG, PNG, BMP, GIF, AVI, WAV, MP3, MIME/EML, XML, MS Office Open XML
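The whitelist approach can be sketched as a two-step check: the extension must be allowed, and the content must start with a signature matching that extension, so that, for example, an executable renamed to .pdf is rejected. The whitelist entries and function name are hypothetical, and this sketch only covers filtering, not the cleaning of active content that ExeFilter also performs.

```python
import os

# Hypothetical whitelist: allowed extension -> magic bytes the content must start with.
# An empty tuple means "no reliable signature, accept on extension alone".
ALLOWED = {
    ".pdf": (b"%PDF-",),
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".jpg": (b"\xff\xd8\xff",),
    ".gif": (b"GIF87a", b"GIF89a"),
    ".txt": (),
}

def filter_file(filename, data):
    """Return (accepted, reason) for a file according to the whitelist."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED:
        return False, "extension %s not allowed" % (ext or "(none)")
    signatures = ALLOWED[ext]
    if signatures and not data.startswith(signatures):
        return False, "content does not match %s" % ext
    return True, "ok"
```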

Please do not hesitate to contact me if you have comments or if you know another tool similar to the ones described in this article.