Origapy is a Python interface to Origami, a PDF parser written in Ruby. It provides access to pdfclean.rb, which sanitizes PDF files by disabling all active content (JavaScript, launch actions, embedded files, etc.). Because Origami is a full PDF parser, it is much more effective than PDFiD at sanitizing/disarming PDF files, but also considerably slower.
Origapy uses a simple Python/Ruby bridge based on pipes, as described on this page.
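To make the pipe idea concrete, here is a minimal sketch of a line-based bridge in Python. It is not Origapy's actual bridge code: the PipeBridge class, its call/close methods and the one-request-per-line protocol are all hypothetical. Python starts the Ruby side as a child process and exchanges text lines over its stdin and stdout pipes; both sides must flush after each line, or the exchange can deadlock.

```python
import subprocess


class PipeBridge(object):
    """Hypothetical line-based bridge to a child process over stdin/stdout pipes."""

    def __init__(self, argv):
        # Start the child with its stdin and stdout connected to pipes (text mode).
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            universal_newlines=True,
        )

    def call(self, line):
        # Send one request line, flush it, then block until one reply line arrives.
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()
        reply = self.proc.stdout.readline()
        if not reply:
            raise RuntimeError("child process closed its output")
        return reply.rstrip("\n")

    def close(self):
        # Closing stdin sends EOF to the child; then wait for it to exit.
        self.proc.stdin.close()
        self.proc.stdout.close()
        return self.proc.wait()
```

With a Ruby interpreter on the PATH, PipeBridge(["ruby", "-e", "STDOUT.sync = true; while (l = gets); puts l.upcase; end"]).call("hello") should return "HELLO". Reading exactly one reply per request keeps the protocol simple, at the cost of blocking while the Ruby side works.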
WARNING: this is still a work in progress. The current version of the Origami parser may trigger errors on some PDF files.
Changelog
- 2010-09-12 v0.09: updated Origami engine to v1.0.0-beta3
- 2009-10-02 v0.08: updated Origami engine to v1.0.0-beta1
- 2009-09-30 v0.07: detects whether a file is already clean or has been cleaned, and raises an exception when an error occurs
License
Origapy and Origami are open-source, published under GPL v3.
Download
Download the attached file below.
Requirements
- Python 2.x
- Ruby 1.8.x
Install
Unzip and run install.bat on Windows, or "python setup.py install" on other platforms.
Usage
import origapy
pc = origapy.PDF_Cleaner()
# Read file.pdf, disable its active content, and write the result to cleaned.pdf
pc.clean('file.pdf', 'cleaned.pdf')
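Since v0.07, errors are reported by raising an exception. The sketch below cleans a whole directory and collects failures instead of stopping at the first problematic file. It relies only on the clean(src, dst) call shown above; the clean_directory helper is hypothetical, and because the exact exception class Origapy raises is not documented here, it catches Exception broadly.

```python
import os


def clean_directory(cleaner, src_dir, dst_dir):
    """Hypothetical helper: sanitize every .pdf file in src_dir into dst_dir.

    `cleaner` is any object with a clean(src, dst) method, such as
    origapy.PDF_Cleaner(). Returns a dict mapping file names to an
    error message for the files that could not be cleaned.
    """
    failures = {}
    if not os.path.isdir(dst_dir):
        os.makedirs(dst_dir)
    for name in sorted(os.listdir(src_dir)):
        if not name.lower().endswith(".pdf"):
            continue  # skip anything that is not a PDF file
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        try:
            cleaner.clean(src, dst)
        except Exception as exc:
            # The exception class raised by Origapy is an assumption here,
            # so catch broadly and continue with the remaining files.
            failures[name] = str(exc)
    return failures
```

For example, clean_directory(origapy.PDF_Cleaner(), 'inbox', 'cleaned') returns an empty dict when every file was processed without an error.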