Opened 5 years ago

Last modified 5 years ago

#17734 new enhancement

Use PDF.js to sanitize saved PDFs

Reported by: cypherpunks
Owned by: tbb-team
Priority: Medium
Milestone:
Component: Applications/Tor Browser
Version:
Severity: Normal
Keywords: PDF, sanity, exploits, deanon
Cc:
Actual Points:
Parent ID:
Points:
Reviewer:
Sponsor:


PDF files often contain malicious content that can be used to compromise the security of the system. Rendering PDF files with PDF.js is often slow and broken, which pushes users to open the files in native readers. Unfortunately, there are no good sanitizers: most are written in scripting languages (such as Python and Ruby) and require their runtimes. It would be very useful to have a tool, implemented in JS right in the browser, that removes malicious content from downloaded PDFs. Fortunately, Firefox already has a PDF parsing library inside its PDF.js engine.

  • Use PDF.js to parse the PDF into an internal representation, but do not render it.
  • Decompress and destream it.
  • Remove all potentially malicious tags (this should be tweakable in a popup window similar to "Clear Recent History"): JS, fonts, Flash (and other objects that call plugins), 3D, forms, signatures, remote content, and anything else not directly needed for rendering.
  • Recreate the PDF file from the internal representation, recomputing all recomputable fields to defeat memory corruption exploits.
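
The removal and recomputation steps could look roughly like the TypeScript sketch below. It is only an illustration under stated assumptions: `PdfObject`, `RISKY_KEYS`, `stripKeys` and `sanitize` are hypothetical names, the object model is a toy stand-in rather than PDF.js's real internal representation, and the key list is deliberately incomplete.

```typescript
// Toy stand-in for a parsed PDF object; NOT PDF.js's actual internal model.
type PdfValue = string | number | boolean | null | PdfValue[] | PdfDict;
interface PdfDict { [key: string]: PdfValue; }
interface PdfObject {
  dict: PdfDict;
  stream?: Uint8Array; // already-decompressed stream bytes, if any
}

// Illustrative, incomplete list of dictionary keys per user-selectable category.
const RISKY_KEYS: { [category: string]: string[] } = {
  javascript: ["JS", "JavaScript", "OpenAction", "AA"],
  forms: ["AcroForm", "XFA"],
  attachments: ["EmbeddedFiles"],
  remote: ["URI"],
};

function isDict(v: PdfValue): v is PdfDict {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Recursively copy a value, dropping every dictionary entry whose key is banned.
function stripKeys(value: PdfValue, banned: string[]): PdfValue {
  if (Array.isArray(value)) {
    return value.map((v) => stripKeys(v, banned));
  }
  if (!isDict(value)) {
    return value;
  }
  const out: PdfDict = {};
  for (const key of Object.keys(value)) {
    if (banned.indexOf(key) === -1) {
      out[key] = stripKeys(value[key], banned);
    }
  }
  return out;
}

// Remove the selected categories and recompute the fields we can recompute.
function sanitize(obj: PdfObject, categories: string[]): PdfObject {
  let banned: string[] = [];
  for (const c of categories) {
    banned = banned.concat(RISKY_KEYS[c] || []);
  }
  const dict = stripKeys(obj.dict, banned) as PdfDict;
  if (obj.stream !== undefined) {
    // The stream is kept decompressed, so the filter entries no longer apply
    // and Length is taken from the real data instead of the input file.
    delete dict["Filter"];
    delete dict["DecodeParms"];
    dict["Length"] = obj.stream.length;
    return { dict, stream: obj.stream };
  }
  return { dict };
}

// Usage: strip only the "javascript" category from a document catalog.
const catalog: PdfObject = {
  dict: {
    Type: "Catalog",
    OpenAction: { S: "JavaScript", JS: "app.alert(1)" },
    Pages: "2 0 R",
  },
};
const clean = sanitize(catalog, ["javascript"]);
```

Filtering by category name mirrors the "Clear Recent History" style checkboxes suggested above: each checkbox would map to one entry in the table. A real implementation would also have to inspect action types and annotation subtypes, not only dictionary keys.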

I first asked about this in the PDF.js bug tracker, but they declined because it is not a goal of that project.

Change History (1)

comment:1 Changed 5 years ago by cypherpunks

Summary: changed from "Use asm.js to sanitize saved PDFs" to "Use PDF.js to sanitize saved PDFs"