Imagine you have a large PDF document that you want to submit to some three-letter government agency. Besides having the document as one big PDF, the agency also asks you to submit the abstract and the biography as two separate PDFs.
Well, you are running out of time, and your coauthors are still working on their sections. You even wrote a Makefile to automate the document generation because you don’t want to run LaTeX multiple times manually to resolve all the references…
But how do you make these two PDF files efficiently, preferably in an automatic way? Well, if you have Acrobat (the full version), then you can use Document -> Extract Pages to save the relevant pages into separate PDFs. But that’s not easy to automate. And what if you didn’t shell out the $$$ to buy Acrobat?
Here is the good news. To extract pages from a PDF, we can use the free software pdftk. Suppose you know(*) the abstract happens to be on pages 2 and 3 and the biography spans from page 30 to the end. Here is an example usage for our imaginary situation (dont_ask is used to suppress the prompt to overwrite existing files):
pdftk foo.pdf cat 2-3 output abstract.pdf dont_ask
pdftk foo.pdf cat 30-end output biography.pdf dont_ask
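Since the whole point is automation, the two commands above could be wrapped in a small shell function and called from a script or a Makefile rule. This is only a sketch: the function name extract_pages and the PDFTK override variable are made up for illustration and are not part of pdftk.

```shell
#!/bin/sh
# extract_pages INPUT RANGE OUTPUT
# Hypothetical helper: copies the pages in RANGE (e.g. 2-3 or 30-end)
# from INPUT into OUTPUT using pdftk's cat operation.
# PDFTK is an assumed override hook so the command can be swapped out
# (for example with "echo" for a dry run); it defaults to pdftk.
extract_pages() {
    "${PDFTK:-pdftk}" "$1" cat "$2" output "$3" dont_ask
}
```

With that in place, the two extractions become extract_pages foo.pdf 2-3 abstract.pdf and extract_pages foo.pdf 30-end biography.pdf, and setting PDFTK=echo prints the commands instead of running them.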
Besides page extraction, pdftk can also catenate PDFs and perform several other PDF magic tricks. You can discover all of these by reading the pdftk documentation. For example, you can compute the number of pages in a PDF with:
pdftk foo.pdf dump_data output - | grep NumberOfPages | cut -d' ' -f 2-
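If the page count is needed inside a script, the same pipeline could be split into a parser and a wrapper, roughly as sketched below. The names parse_page_count and page_count are hypothetical. Comparing the whole field name, rather than grepping for a substring, is a small precaution against other lines of the report (a document title, say) that happen to contain the same text.

```shell
#!/bin/sh
# parse_page_count: reads pdftk dump_data text on stdin and prints
# the value of the NumberOfPages field.
parse_page_count() {
    awk '$1 == "NumberOfPages:" { print $2; exit }'
}

# page_count FILE: hypothetical wrapper that runs pdftk on FILE
# and feeds its report to the parser above.
page_count() {
    pdftk "$1" dump_data output - | parse_page_count
}
```

A script could then write last=$(page_count foo.pdf) and use $last wherever a concrete page number is needed.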
(*) How you can know these ranges automatically is another story to be written later. Hint: use pdftotext.
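One possible way to follow the hint, as a rough sketch only: pdftotext separates pages with a form feed character, so awk can treat each page as one record and report the first page whose text matches a pattern. The function name first_page_matching is hypothetical, and the heading text you search for is an assumption about your own document.

```shell
#!/bin/sh
# first_page_matching PATTERN: reads pdftotext output on stdin and prints
# the number of the first page whose text matches PATTERN (an awk regex).
# Relies on pdftotext separating pages with a form feed character.
first_page_matching() {
    awk -v pat="$1" 'BEGIN { RS = "\f" } $0 ~ pat { print NR; exit }'
}
```

For example, pdftotext foo.pdf - | first_page_matching 'Biography' would print the page on which that word first appears. A table of contents that mentions the same heading would match earlier, so the pattern may need to be more specific.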
17:19 on July 2nd, 2005
even better, use the pdfpages package in tex. as easy as:
\usepackage{pdfpages}
\includepdf[pages=2-3, nup=1x1]{file.pdf} % options: which pages, how many per sheet (see the pdfpages manual for more)
…
pdflatex foo.tex
Done !
18:57 on July 7th, 2005
Thanks to both of you. I had a different but related problem recently – I needed to combine two PDFs into one document. I eventually found a solution, but it wasn’t as elegant as either of yours. Fortunately, the people reading the document didn’t care.
1:31 on July 8th, 2005
i should add that you can join multiple files using the same mechanism. merely add multiple \includepdf commands
14:03 on October 6th, 2008
Hi,
I am having trouble with adding page numbers to the PDF after merging them. Any suggestions?
Thanks,
Tabbey
19:42 on August 30th, 2009
Tabbey, I’m pretty sure you can’t with pdftk: the page numbers are part of the document formatting. The best you could do is create a page number “image” and overlay it on top of the PDF.