[Lowerbounds, Upperbounds]

Algorithms are everywhere.

Imagine you have a large PDF document that you want to submit to some three-letter government agency. Besides having the document as one big PDF, the agency also asks you to submit the abstract and the biography as two separate PDFs.

Well, you are running out of time, and your coauthors are still working on their sections. You even wrote a makefile to automate the document generation because you don’t want to run LaTeX multiple times manually to resolve all the references…

But how do you make these two PDF files efficiently, preferably in an automatic way? Well, if you have Acrobat (the full version), then you can use Document -> Extract Pages to save the relevant pages into separate PDFs. But that’s not easy to automate. And what if you didn’t shell out the $$$ to buy Acrobat?

Here is the good news. To extract pages from a PDF, we can use the free software pdftk. Suppose you know(*) the abstract happens to be on pages 2 and 3 and the biography spans from page 30 to the end. Here is an example usage for our imaginary situation (dont_ask is used to suppress the prompt to overwrite existing files):

pdftk foo.pdf cat 2-3 output abstract.pdf dont_ask
pdftk foo.pdf cat 30-end output biography.pdf dont_ask

Besides page extraction, pdftk can also catenate PDFs and perform several other PDF magic tricks. You can discover all of these by reading this page. For example, you can compute the number of pages in a PDF with

pdftk foo.pdf dump_data output - | grep NumberOfPages | cut -d' ' -f 2-
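
In a script or a makefile rule it is often handier to have that count in a variable. Here is a minimal sketch of one way to do it. The function name pdf_page_count is made up for illustration, and the sketch assumes pdftk is on your PATH. It uses awk instead of grep and cut so that only the exact NumberOfPages: key is matched.

```shell
# pdf_page_count FILE: print the number of pages in FILE.
# (Hypothetical helper name; assumes pdftk is installed and on the PATH.)
pdf_page_count() {
    # dump_data prints one "Key: value" pair per line; keep only the page count.
    pdftk "$1" dump_data output - | awk '$1 == "NumberOfPages:" { print $2 }'
}

# Example (commented out so the file can be sourced safely):
# extract the last two pages of foo.pdf into last_two.pdf.
# n=$(pdf_page_count foo.pdf)
# pdftk foo.pdf cat "$((n - 1))-$n" output last_two.pdf dont_ask
```

The commented example assumes foo.pdf has at least two pages.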

(*) How you can know these ranges automatically is another story to be written later. Hint: use pdftotext.
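
Until that story gets written, here is a rough sketch of where the hint could lead. It is only one possible approach, not a tested recipe. The helper name find_first_page is hypothetical, and the sketch assumes both pdftk and pdftotext are on your PATH. It converts one page at a time to text and reports the first page whose text matches a pattern.

```shell
# find_first_page FILE PATTERN: print the number of the first page of FILE
# whose text matches PATTERN (a grep basic regular expression).
# Returns non-zero if no page matches.
# (Hypothetical helper; assumes pdftk and pdftotext are on the PATH.)
find_first_page() {
    file=$1
    pattern=$2
    # Ask pdftk for the total page count so we know when to stop.
    total=$(pdftk "$file" dump_data output - | awk '$1 == "NumberOfPages:" { print $2 }')
    page=1
    while [ "$page" -le "$total" ]; do
        # -f and -l pick the first and last page to convert; "-" writes to stdout.
        if pdftotext -f "$page" -l "$page" "$file" - | grep -q -- "$pattern"; then
            echo "$page"
            return 0
        fi
        page=$((page + 1))
    done
    return 1
}
```

With this in place, something like start=$(find_first_page foo.pdf 'Biography') followed by pdftk foo.pdf cat "$start-end" output biography.pdf dont_ask would produce the biography. The heading text 'Biography' is a guess at what your document contains, and if the same words also appear earlier (in a table of contents, say) the first match will be the wrong page.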

5 Comments

  1. Suresh
    17:19 on July 2nd, 2005

    even better, use the pdfpages package in tex. as easy as:

    \usepackage{pdfpages}

    \includepdf[specify pages, mode, how many per page]{file.pdf}

    pdflatex foo.tex

    Done !

  2. dmolnar
    18:57 on July 7th, 2005

    Thanks to both of you. I had a different but related problem recently – I needed to combine two PDFs into one document. I eventually found a solution, but it wasn’t as elegant as either of yours. Fortunately, the people reading the document didn’t care. :)

  3. Suresh
    1:31 on July 8th, 2005

    i should add that you can join multiple files using the same mechanism. merely add multiple \includepdf commands

  4. Tabbey
    14:03 on October 6th, 2008

    Hi,

    I am having trouble with adding page numbers to the PDF after merging them. Any suggestions?

    Thanks,
    Tabbey

  5. Tabbey, I’m pretty sure you can’t with pdftk: the page numbers are part of the document formatting. The best you could do is create a page number “image” and overlay the image on top of the PDF.