I was just wondering... if someone asked you to ensure that a large PDF file is clean of any evil things,
how would you do it?
(Assume the file passed a virus scanner, and it legitimately contains some JS content, so scanning the source for the existence of JS is not enough.)
This is a curiosity/non-urgent question for those with time on their hands to share their most secret white hat ways.
Trol (trolling the Forum since OMG ago)
But that protects you from a bad PDF; it doesn't tell you IF the PDF was BAD.
Also, if you then forward such a PDF into the wild, it could contaminate others.
Soooo, all the great white hats with too much time on your hands, how do you (if you do) ensure that a PDF does not contain a new cool exploit?
The only real solution to this would be to quarantine all attachments until definitions are available to scan them. This of course is disruptive to business workflow, so it's not a really good solution. You could always manually look at every PDF that comes in, if you have nothing else to do with your time. I really don't have the time to do that myself.
JS is not the only "bad" thing in a PDF. The most dangerous ones out there actually exploit the reader, and you can't make sure a file is clean unless you open it in a hex editor... and even then, if it's large, you're probably screwed. So no, you can't fully protect yourself. UPDATE your reader software and hope for the best!!!!
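One reason the hex editor route gets painful is that most of the interesting content sits inside compressed streams, so the raw bytes tell you very little. As a rough illustration only, here is a minimal Python sketch that tries to inflate anything that looks like a zlib (FlateDecode) stream so it can be read. The function name `inflate_streams` is made up for this example, and the code assumes plain, unencrypted streams with a single Flate filter; it ignores other filters, filter chains, and malformed files, which real malicious PDFs often use.

```python
import re
import zlib
from typing import List

# Everything between the "stream" keyword and "endstream" (hypothetical helper regex).
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(data: bytes) -> List[bytes]:
    """Return the inflated body of every stream that zlib can decode."""
    results = []
    for match in STREAM_RE.finditer(data):
        body = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes left before "endstream".
            results.append(zlib.decompressobj().decompress(body))
        except zlib.error:
            # Not a Flate stream (or damaged); skipped in this sketch.
            continue
    return results
```

Anything this returns still has to be read by a human, and a stream it skips is not thereby proven harmless.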
Here's the method that I use in analysing malicious PDFs:
I use the tools pdfid and pdf-parser from here. In the past I have also used pdftk, but I'm finding that less useful recently.
Thank you Lupin for the reply. Between your information and that from streaker (xorred, also my thanks), my escapade into PDF documents might end up being successful (since it is a goal/subject selected for fun, I also get to define success, which is handy).
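To give a feel for what a first triage pass looks for, here is a minimal Python sketch that counts a few PDF names that commonly deserve a closer look. This is not pdfid and does not reproduce its output; the keyword list and the function names (`decode_name`, `count_suspicious_names`) are assumptions chosen for illustration. It only sees names in the uncompressed part of the file, and it can miscount names that happen to appear inside text strings.

```python
import re
from collections import Counter

# Illustrative list of names worth a closer look (an assumption, not exhaustive).
SUSPICIOUS_NAMES = (
    "/JS", "/JavaScript", "/OpenAction", "/AA", "/Launch",
    "/EmbeddedFile", "/RichMedia", "/JBIG2Decode", "/ObjStm",
)

# A PDF name: a slash followed by characters that are not whitespace or delimiters.
NAME_RE = re.compile(rb"/[^\x00\t\n\x0c\r /<>\[\](){}%]+")
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_name(raw: bytes) -> str:
    """Undo #xx escapes so that /J#53 is counted as /JS."""
    decoded = HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def count_suspicious_names(data: bytes) -> Counter:
    """Count occurrences of each listed name in the raw bytes of a PDF."""
    counts = Counter({name: 0 for name in SUSPICIOUS_NAMES})
    for match in NAME_RE.finditer(data):
        name = decode_name(match.group(0))
        if name in counts:
            counts[name] += 1
    return counts
```

A non-zero count is a reason to dig further into that object, not a verdict; a file that legitimately contains JS will show /JS or /JavaScript hits either way.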
Something relevant that I just found on the Internet Storm Center blog, Lenny Zeltser's guide to analysing malicious documents, including PDFs!
There are some usage command lines for some of the tools I mentioned (not the same command lines I have used, but still useful nonetheless).
There are also a number of tools listed there I hadn't heard of before, as well as a guide to analysing Microsoft Office documents, which I haven't had to do so far.
Analyzing Malicious Documents Cheat Sheet by Lenny Zeltser
Edit: Documented my PDF analysis process in more detail on my blog here.