[Lowerbounds, Upperbounds]

Algorithms are everywhere.

Sometimes I am given a PDF and I wish I could change its margins. For example, when I print out a conference version of a paper to study in detail, usually there isn’t much space on the sides to write my own ideas. I heard Fermat had a solution for this type of issue :P , but in the modern days, I use sticky notes. The same situation surprisingly arises even if you use PDF annotating softwares. Free or not free (no link for them), they simply do not support the “enlarge canvas” feature yet. All you can do is to insert electronic sticky notes. So much for the metaphor!

Now there is actually a technology called “reflowable PDFs”, which are PDFs that contain enough information to support reflowing its content to fit any width. You can see this page for a screenshot of a reflowed document. Reflowing works wonder for text-dominant documents like novels, but the PDF has to be specially prepared for reflowing to really work well. (Try View->Zoom->Reflow in Adobe Reader 8.)

But if the given PDF is not reflowable, we can still use some graphics editor and edit the PDF interactively. After all, a PDF is mostly a vector graphics file, modulo some potential embedded bitmaps, and so you can edit it like any vector graphics file. I’ve seen it done in Illustrator, and I guess some of the free software competitors like Inkscape or Scribus can do this kind of editing too. But using a command line tool would be far easier.

For a long while, I actually know how to change the margins of a PDF. (Turns out lawyers have essentially the same problem.) The trick is to use the pdfpages package with pdfLaTeX. You may recall that this package allows us to include specific pages of a PDF into our own document. Magically, it has a scale option (that is really inherited from graphicx)… This gives us the first batch file: pdf-rescale.bat. Execute

pdf-rescale.bat foo.pdf 0.8

and you will get foo-0.8.pdf, which is foo.pdf shrunk to 80%. I’ve found that 80% is a good default and so the third argument is actually optional. You can also specify a scale of larger than 1 for some journal articles formatted for smaller paper sizes. I can also imagine fancier applications in which you combine this idea with the geometry package (see this post) to control of the final outcome.

But since we are not changing the actual size of the paper, shrinking the content for a larger margin actually means, well, shrunk content. Conference proceedings are already typeset in a small enough font. Since the proceedings are in two columns, I had the idea to print each column on its own page. That will definitely give us ample space. After a lot of different attempts since last year, I finally managed to get the second batch file: pdf-1c.bat. Execute

pdf-1c.bat foo.pdf

and you will get foo-1c.pdf, which is foo.pdf but with one column per page. No kidding.

The following PNG illustrates these scripts using the title page of an old paper. The upper-left shows the original, and the upper-right is after shrinking to 80%. The lower-left and lower-right are the two pages from the one-column version. I have also attached the source PDF from which I generated the PNG. The key feature is that this PDF retains the text of the original—try search for the word “degree” and you will see. (At one point my solution was to assemble a bunch of PNGs representing each single column into a huge PDF. Although you can annotate on it electronically, you cannot search in it. It also prints slowly.)

Finally, I also note that it is in fact possible to prepare reflowable PDFs using pdfLaTeX or dvipdfm (there seems to be trouble if going through a PS), but I will save it for a later story. For now, you can experience reflowing a LaTeX PDF with this PracTeX article, which is actually on the use of the pdfpages package. Having played with reflowing for a while, I would say that the real deal-breaker for reflowable PDFs lies in the mathematical expressions. You can see this TUGboat article for some excellent macros on fine-tuning mathematical expressions. Now, try to reflow it… :P

P.S. If you write the corresponding bash scripts, please send them to me so that I can, with full acknowledgement, post them here. The only reason why I stick to batch files is to avoid extra dependencies.

Change PDF Margin Scripts: pdf-rescale.bat pdf-1c.bat

Update: Joshua Dunfield has ported the rescaling script to Linux. See the comments.

Name: Srinivas Bollapragada
University: General Electric Global Research Center
Date: Friday, November 16, 2007
Time: 1:30 to 3:00 pm
Location: 388 Posner Hall

Title: Scheduling Commercials on Broadcast Television

Abstract:

Television networks sell advertising slots to clients by the shows on which the commercials air. The networks determine the exact location in the show that a commercial will air at a later stage, usually close to the airdate of the show. There are several criteria the networks must meet in scheduling commercials in a show. The schedule should be such that no two commercials promoting competing products from different clients air in the same break. The audience ratings tend to be higher at the start and end of a commercial break than during the middle of the break. Therefore, advertisers generally prefer the first and last positions in a commercial segment, to those in the middle. TV networks normally promise their clients an equitable rotation of commercials among the positions within a commercial break. The scheduling of commercials on shows is traditionally done manually and is a cumbersome, time-intensive, and error-prone process. We formulate the commercial scheduling problem as an integer program and develop near-optimal heuristics for automatically scheduling the commercials to meet all the requirements. We implemented our algorithm at the National Broadcasting Company (NBC). In addition to reducing sales personnel costs by automating the scheduling of commercials, our work has increased customer satisfaction by minimizing errors in meeting customer requirements.

Title: On subword containment
Speaker: Irina Gheorghiciuc
When: November 15, 16:30-17:30
Where: Wean Hall 5302

Abstract:

We consider words with letters from a q-ary alphabet A. The k-th subword complexity of a word w in A* is the number of distinct subwords of length k that appear as contiguous subwords of w.

We analyze subword complexity from both combinatorial and probabilistic viewpoints. Our first main result is a precise analysis of the expected
k-th subword complexity of a randomly-chosen word w in A^n. Our other main result describes, for w in A*, the degree to which one understands the set of all subwords of w, provided that one knows only the set of all subwords of some particular length k. This is joint work with Mark Ward.

November 16, 2007
Doru Balcan
12:00 PM, 1507 Newell-Simon Hall

Speaking Skills Talk

Title: Characterization of Robust Linear Coding Solutions

Abstract:

Many linear encoding approaches focus on representing information by approximating the underlying statistical density of the data, like PCA or ICA, or by developing encoding/decoding algorithms with desirable computational and representational properties, such as Fourier and wavelet-based codes. However, if the coefficients” precision is limited, optimality of the representation cannot be guaranteed. The issue of optimality under limited precision is a common practical concern, and it is also relevant to biological neural representations, where the coding precision of individual neurons has been reported to be as low as a few bits per spike. In this talk, I will present a new coding scheme called Robust (Linear) Coding, that makes use of arbitrarily many coding units to minimize reconstruction error. One characteristic of Robust Coding is that it can introduce redundancy in the code to compensate for channel noise, unlike PCA or ICA, which aim to reduce redundancy. We can completely analyze the under- and overcomplete cases, and identify necessary and sufficient conditions of the optimal encoder/decoder pair in each case. In the process, we find an exact formula for the lower bound of the error function, as well as an efficient procedure for computing the optima.

When: 2007-11-14 12:00 noon
Where: NSH 1507

Manfred Warmuth, UCSC

Online Algorithms for Principal Component Analysis

We design an online algorithm for Principal Component Analysis. In each trial the current instance is centered and projected into a probabilistically chosen low dimensional subspace. The regret of our online algorithm, i.e. total expected quadratic approximation error of the online algorithm minus the total quadratic approximation error of the batch algorithm, is bounded by a term whose dependence on the dimension of the instances is only logarithmic.

We first develop our methodology in the expert setting of online learning by giving an algorithm for learning as well as the best subset of experts of a certain size. This algorithm is then lifted to the matrix setting where the subsets of experts correspond to subspaces. The algorithm represents the uncertainty over the best subspace as a density matrix whose eigenvalues are bounded. The running time is O(n2) per trial, where n is dimension of the instances.

Joint work with Dima Kuzmin.

November 14, 2007
Virginia Vassilevska
3:00 PM, 1507 Newell-Simon Hall

Thesis Proposal

Title: Efficient Algorithms for Path Problems in Weighted Graphs

Abstract:

We consider some fundamental path problems on weighted directed graphs and give efficient algorithms for them. The simplest problem we discuss is that of finding a triangle of maximum weight in a node-weighted graph. We give strongly polynomial truly subcubic algorithms, i.e. ones running in O(n^{3-eps}) time for constant eps>0. Moreover, we can handle the all-pairs version of the problem: for all pairs of nodes i,j in the graph, return a maximum weight triangle passing through i and j. Our approach is then extended to obtain algorithms for other all-pairs path problems: finding all-pairs bottleneck distances and all-pairs minimum nondecreasing paths in a directed edge-weighted graph. Our techniques even allow us to compute O(log n) bits of each entry of the distance product of two n x n matrices in truly subcubic time.

We also consider the single source versions of path problems. For the single source minimum nondecreasing paths problem we obtain the first linear time algorithm in the RAM model of computation. In the comparison based model of computation, our algorithm can be made to run in O(m loglog n) time on graphs with n nodes and m edges. For the single source, single destination bottleneck paths problem on undirected graphs we give a linear time algorithm in the comparison model of computation. Such an algorithm has been known to exist. We hope to extend our ideas to directed graphs, or the more general single source version of the problem.

We propose some ideas for further research among which is the problem of CFL-reachability, and some purely combinatorial and parallel algorithms for the path problems studied so far.

Thesis Committee:
Guy Blelloch, Chair
Manuel Blum
Anupam Gupta
Uri Zwick, Tel Aviv University