Origapy is a Python interface to Origami, a PDF parser written in Ruby. It provides access to pdfclean.rb, which sanitizes PDF files by disabling all active content (JavaScript, launch actions, embedded files, etc.). Because Origami is a full PDF parser, it is much more effective than PDFiD at sanitizing (disarming) PDF files, but also considerably slower.

Origapy uses a simple Python/Ruby bridge based on pipes, as described on this page.
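
To illustrate the general idea of a pipe-based bridge, here is a minimal sketch. It is not Origapy's actual code: the function name call_through_pipes and the STAND_IN_CHILD command are assumptions made for this example, and the child is a Python one-liner standing in for a Ruby script so that the sketch runs without Ruby installed.

```python
import subprocess
import sys

# Stand-in child process (an assumption for illustration): it reads its
# standard input and writes it back in upper case. A real bridge would
# launch a Ruby script here instead.
STAND_IN_CHILD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]


def call_through_pipes(command, request):
    """Start `command`, send `request` on its stdin, return its stdout reply."""
    proc = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() writes the request, closes stdin, and waits for the child.
    out, err = proc.communicate(request.encode("utf-8"))
    if proc.returncode != 0:
        raise RuntimeError("child failed: %s" % err.decode("utf-8", "replace"))
    return out.decode("utf-8").strip()
```

With Ruby installed, the same function could be called with a command such as ["ruby", "some_script.rb"], where some_script.rb is a hypothetical script that reads a request on stdin and prints its answer on stdout.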

WARNING: this is still a work in progress. The current version of the Origami parser may trigger errors on some PDF files.

Changelog

  • 2010-09-12 v0.09: updated Origami engine to v1.0.0-beta3
  • 2009-10-02 v0.08: updated Origami engine to v1.0.0-beta1
  • 2009-09-30 v0.07: detects whether a file is clean or was cleaned, raises an exception when an error occurs

License

Origapy and Origami are open-source, published under GPL v3.

Download

Pick the attached file below.

Requirements

  • Python 2.x
  • Ruby 1.8.x

Install

Unzip and run install.bat on Windows, or "python setup.py install" on other platforms.

Usage

import origapy
pc = origapy.PDF_Cleaner()
pc.clean('file.pdf', 'cleaned.pdf')
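
Since an exception is raised when an error occurs (see the changelog for v0.07), callers processing many files may want to catch it. The sketch below is one possible way to do so. The helper name clean_or_report is hypothetical, and because the exact exception type and the return value of clean() are not documented on this page, the helper catches Exception broadly and simply passes the return value through.

```python
def clean_or_report(cleaner, src, dst):
    """Call cleaner.clean(src, dst) and report the outcome instead of raising.

    `cleaner` is any object with a clean(src, dst) method, for example an
    origapy.PDF_Cleaner() instance. Returns (True, result) on success or
    (False, error_message) if clean() raised an exception.
    """
    try:
        result = cleaner.clean(src, dst)
    except Exception as exc:
        # Exception type is an assumption: the page does not say which
        # exception class is raised, so everything is caught here.
        return False, str(exc)
    return True, result
```

With Origapy installed, this could be used as clean_or_report(origapy.PDF_Cleaner(), 'file.pdf', 'cleaned.pdf').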

Alternatives