Archive for November, 2007

Theory Seminar 2007-11-30

Friday, November 30th, 2007
3:30pm
WEH 7220

Title: Efficient algorithms for agnostic learning (aka learning with arbitrary noise)

Adam Kalai
Georgia Tech

Abstract:
The talk will give an overview of theoretical results for learning from extremely n01sy data, including a boosting procedure and algorithms for learning decision trees, halfspaces, and parity functions.

Comments

Theory Lunch 2007-11-28

Who: Kanat Tangwongsan
Where: NSH 1507
When: 2007-11-28 12:00

L_p-Norm Set Cover

Abstract:

In many optimization problems, a solution can be viewed as ascribing a “cost'’ to each client, and the goal is to optimize some aggregation of the per-client costs. We often optimize some $L_p$-norm (or some other symmetric convex function or norm) of the vector of costs—though different applications may suggest different norms to use. Ideally, we want to obtain a solution that optimizes several norms simultaneously.

We will examine a natural problem in this framework: the $L_p$ set-cover problem. Suppose we pick sets $S_1, S_2, ldots, S_t$ (in that order), and define the {em cover time} of an element $e$ to be $C_e = \min{i | e in S_i}$, the index of the first set that covers $e$. Minimizing the maximum cover time $|C|_infty$ gives us the classical min-set cover, for which the greedy algorithm is an $O(log n)$ approximation (which is the best possible). Minimizing the average (or total) cover time $\sum_e C_e = |C|_1$ gives the min-sum set-cover (mssc) problem, for which the greedy algorithm gives a 4-approximation (which also is tight). This leads us to a natural question:

How well does the greedy algorithm perform for the $L_p$ set-cover problem, where the objective function is $|C|_p = (\sum_e C_e^p)^{1/p}$ for $1 \leq p \leq \infty$?

In this talk, we will give tight results for this problem: the greedy algorithm \emph{simultaneously} gives an $O(p)$-approximation for $L_p$-Set-Cover for all values of $1 leq p < infty$ (even for the weighted version), and that there are simple examples where this is tight for all values of $p$. In fact, we also show that for every fixed value of $p$, no efficient algorithm can obtain an approximation guarantee better than $\Omega(p)$ under suitable complexity assumptions. We will show how to apply our analysis techniques to the more general {\em submodular set cover} and prove some results for the so-called {\em pipelined set cover} problem.

This is joint work with Daniel Golovin, Anupam Gupta, and Amit Kumar.

Comments

ACO Seminar 2007-11-29

Title: Log-Concave Random Graphs
Speaker: Alan Frieze
When: November 29, 16:30-17:30
Where: Wean Hall 5302
Abstract:

We propose the following model of a random graph on $n$ vertices. Let $F$ be a distribution in $R_+^{n(n-1)/2}$ with a coordinate for every pair $ij$ with $1 \le i,j \le n$. Then $G_{F,p}$ is the distribution on graphs with $n$ vertices obtained by picking a random point $X$ from a distribution with a log-concave density $f$ and defining a graph on $n$ vertices whose edges
are pairs $ij$ for which $X_{ij} \le p$. The standard Erd\H{o}s-R\’{e}nyi model is the special case when $F$ is the indicator function for the We thus obtain an interesting connection between convex geometry and random graphs.

When $f$ is down-monotone we can determine the connectivity and matching thresholds up to a constant factor. When $f$ is the indicator function for a right simplex then we can obtain more precise results on connectivity and the diameter.

If $X$ is used to define weights for an optimization problem and $f$ is “column independent” then we can whp solve the ATSP asymptotically.

Joint work with Santosh Vempala and Juan Vera

Comments

Thesis Proposal 2007-11-29

Title: Optimization under Uncertainty (Thesis Proposal)
Speaker: Vineet Goyal
When: November 29, 10:30am
Where: Room 388, Tepper School

Abstract

Most optimization problems in real life do not have accurate estimates of the problem parameters at the optimization phase. Stochastic optimization models have been studied widely in the literature to address this problem. The expected value optimization is reasonable in a repeated decision making framework. However, it does not sufficiently guard against the worst case future in more risk averse or one-shot applications. The broad purpose of this thesis is to study optimization approaches under uncertainty that overcome this shortcoming of a traditional stochastic optimization model.

We consider new models of uncertainty namely, the “demand-robust” model and the “chance constrained” model and introduce these in the framework of general covering problems. Here, uncertainty is considered in the right hand side of the constraints and is referred to as the demand uncertainty. In the two-stage model of “demand-robustness”, motivated by recent work in two-stage stochastic optimization problems, we are interested in finding a solution such that the worst case cost over all realizations of uncertainty is minimized. We prove a general structural lemma about special types of first stage solutions and provide approximation algorithms for covering problems such as Steiner tree, min-cut, minimum multi-cut, vertex cover and facility location. While some of the algorithms draw from the rounding approaches developed recently for stochastic optimization problems, we also show new applications of the old metric rounding techniques and develop novel charging arguments using Gomory-Hu cut trees for cut problems.

The robust optimization approach guards against the worst case future but tends to be overly conservative if there are some outlier scenarios. To overcome this, we consider a chance constrained model where we are given a reliability level p and the idea is to select a “p fraction” of the scenarios and find a robust solution on the selected scenarios. The remaining (1-p) fraction of the scenarios are considered as outliers and can be ignored. We consider both one-stage and two-stage chance constrained covering problems with demand-uncertainty. When the demand-uncertainty follows an independent distribution, we show that one-stage chance constrained covering problems reduce to a weighted version of well studied partial covering problems and we obtain constant approximations for the Steiner tree and facility location problems in this model. We also show hardness results for both one-stage and two-stage chance constrained problem in a “correlated” demand-uncertainty model.

In both the above models, we consider uncertainty in the right hand side of the constraints. We extend our work to consider uncertainty in the constraint matrix referred to as data uncertainty and propose to study a chance constrained knapsack problem where the size of each item is random. Finally, we propose to use the theoretical framework to develop a practical solution methodology to solve the planning problem for post-disaster logistics. This problem combines
aspects of both demand and data uncertainty. We aim to implement our solution to solve the real-life problem of earthquake preparedness of Turkey and Israel where seismic risk is a major concern.

Comments

Ipe 6.0pre30 works with pdfTeX 1.4

A month ago I wrote about how to install MiKTeX 2.5 concurrently with 2.6 so that I can run Ipe 6.0pre28 on my main production machine.

Well, Ipe 6.0pre30 has just been released yesterday. From the announcement email, the major features of this release include:

* Ipe now works with pdftex 1.40 (in MikTeX 2.6 and texlive 2007).

* Pascal-Nicolas Becker, Frederik Hermans, and Damian Schmidt
implemented the missing forms of intersection snapping (for arcs
and Bezier curves), with bug fix by Jonathan Backer.

* Fixed the broken user interface (fields were too small to contain
the text) on Unix (removed a static QFont) (bug #191).

* Figtoipe now distributed separately.

I note that the source and the windows packages are already available at the LuaForge download site.

Thanks Otfried!

Comments

Thesis Oral 2007-11-29

November 29, 2007

Srinath Sridhar

5:30 PM, 8220 Wean Hall

Thesis Oral

Title: Algorithms for Analyzing Intraspecific Sequence Variation

Abstract:

Analysis of human genetic variation has gained significant momentum due to the success of many large-scale sequencing projects. Within the last five years, about 4 million single nucleotide polymorphisms (SNPs) have been genotyped over hundreds of individuals belonging to several ethnic groups. Recently, researchers have made significant progress in detecting and cataloging copy number polymorphisms (CNPs) as well. These large-scale efforts have now made it possible to analyze genetic variation within a single species, mainly humans, at very fine-scales. In our work, we address a fundamental question: how can we use these genetic differences to make inferences about the recent history of a species and its genome?

In this work, we primarily focus on two methods to analyze SNP variation data within humans. We develop maximum parsimony phylogenetic tree reconstruction algorithms that are specifically catered to work on SNP data. Such a phylogeny should cluster closely related individuals (perhaps an ethnic group) together. Therefore, these techniques are widely used to answer questions of human migration. Recently, these methods have also been applied to find regions in the human genome that are rapidly evolving. A second method to understand genetic variation is to directly cluster the SNP data based on the property that individuals within a cluster would share similar genotypes (and phenotypes).

Mathematically, we work with two variants of the phylogeny reconstruction problem, both of which are NP-complete. The first variant is equivalent to finding a Steiner minimum tree on a hypercube and the second is a generalization of this problem. We solve the two variants in polynomial time when the size of the phylogeny (Steiner minimum tree) is small. For the clustering part, we work with two related problems. We solve the first problem by finding the max-cut of a graph, a well-known NP-complete problem. We can show that if the graph is generated with the clusters planted, then our algorithm returns the max-cut in polynomial time given enough data. The second problem is more general in that individuals (vertices) can fractionally belong to either of the clusters. We solve this problem by attempting to maximize the likelihood function using an initial solution that we show to be analytically correct under some theoretical assumptions.

Thesis Committee:
Guy E. Blelloch, Co-Chair
Russell Schwartz, Co-Chair
R. Ravi
Eran Halperin, International Computer Science Institute, Berkeley

Comments

Theory Seminar 2007-11-16

Friday November 16th, 2007
3:30pm
WEH 7220

Satisfiable k-CNF Distributions above the Threshold

Danny Vilenchik
Tel-Aviv University

Abstract: k-CNF formulas with m clauses over n variables show a phase transition phenomenon in the following aspect. There exists d=d(k,n) such that almost all formulas with m/n > d are not satisfiable whereas most formulas with m/n < d are. While random k-CNFs below the threshold received much attention in recent years, above-threshold distributions over satisfiable k-CNFs were far less studied. One possible reason is that it is not clear how to define such distributions in a natural way, while keeping the problem approachable in some sense.

In this talk we will survey recent developments in this area. We shall concentrate on three distributions: the planted k-SAT distribution, the uniform distribution over satisfiable k-CNF formulas (in the regime m/n > d(k,n)), and an “on-line” version of the uniform distribution.

In all cases we are able to show that unlike the typically complicated structure of the solution space of below-threshold formulas, the above threshold formulas have a simple, basically single-solution structure. We also present some algorithmic ideas that are useful for solving certain clause-density regimes of these distributions.

Based on joint works with: Amin Coja-Oghlan, Uriel Feige, Abraham Flaxman, Michael Krivelevich, and Benjamin Sudakov.

Comments

Changing PDF Margins With The pdfpages Package

Sometimes I am given a PDF and I wish I could change its margins. For example, when I print out a conference version of a paper to study in detail, usually there isn’t much space on the sides to write my own ideas. I heard Fermat had a solution for this type of issue :P , but in the modern days, I use sticky notes. The same situation surprisingly arises even if you use PDF annotating softwares. Free or not free (no link for them), they simply do not support the “enlarge canvas” feature yet. All you can do is to insert electronic sticky notes. So much for the metaphor!

Now there is actually a technology called “reflowable PDFs”, which are PDFs that contain enough information to support reflowing its content to fit any width. You can see this page for a screenshot of a reflowed document. Reflowing works wonder for text-dominant documents like novels, but the PDF has to be specially prepared for reflowing to really work well. (Try View->Zoom->Reflow in Adobe Reader 8.)

But if the given PDF is not reflowable, we can still use some graphics editor and edit the PDF interactively. After all, a PDF is mostly a vector graphics file, modulo some potential embedded bitmaps, and so you can edit it like any vector graphics file. I’ve seen it done in Illustrator, and I guess some of the free software competitors like Inkscape or Scribus can do this kind of editing too. But using a command line tool would be far easier.

For a long while, I actually know how to change the margins of a PDF. (Turns out lawyers have essentially the same problem.) The trick is to use the pdfpages package with pdfLaTeX. You may recall that this package allows us to include specific pages of a PDF into our own document. Magically, it has a scale option (that is really inherited from graphicx)… This gives us the first batch file: pdf-rescale.bat. Execute

pdf-rescale.bat foo.pdf 0.8

and you will get foo-0.8.pdf, which is foo.pdf shrunk to 80%. I’ve found that 80% is a good default and so the third argument is actually optional. You can also specify a scale of larger than 1 for some journal articles formatted for smaller paper sizes. I can also imagine fancier applications in which you combine this idea with the geometry package (see this post) to control of the final outcome.

But since we are not changing the actual size of the paper, shrinking the content for a larger margin actually means, well, shrunk content. Conference proceedings are already typeset in a small enough font. Since the proceedings are in two columns, I had the idea to print each column on its own page. That will definitely give us ample space. After a lot of different attempts since last year, I finally managed to get the second batch file: pdf-1c.bat. Execute

pdf-1c.bat foo.pdf

and you will get foo-1c.pdf, which is foo.pdf but with one column per page. No kidding.

The following PNG illustrates these scripts using the title page of an old paper. The upper-left shows the original, and the upper-right is after shrinking to 80%. The lower-left and lower-right are the two pages from the one-column version. I have also attached the source PDF from which I generated the PNG. The key feature is that this PDF retains the text of the original—try search for the word “degree” and you will see. (At one point my solution was to assemble a bunch of PNGs representing each single column into a huge PDF. Although you can annotate on it electronically, you cannot search in it. It also prints slowly.)

Finally, I also note that it is in fact possible to prepare reflowable PDFs using pdfLaTeX or dvipdfm (there seems to be trouble if going through a PS), but I will save it for a later story. For now, you can experience reflowing a LaTeX PDF with this PracTeX article, which is actually on the use of the pdfpages package. Having played with reflowing for a while, I would say that the real deal-breaker for reflowable PDFs lies in the mathematical expressions. You can see this TUGboat article for some excellent macros on fine-tuning mathematical expressions. Now, try to reflow it… :P

P.S. If you write the corresponding bash scripts, please send them to me so that I can, with full acknowledgement, post them here. The only reason why I stick to batch files is to avoid extra dependencies.

Change PDF Margin Scripts: pdf-rescale.bat pdf-1c.bat

Update: Joshua Dunfield has ported the rescaling script to Linux. See the comments.

Comments (6)

OR Seminar 2007-11-16

Name: Srinivas Bollapragada
University: General Electric Global Research Center
Date: Friday, November 16, 2007
Time: 1:30 to 3:00 pm
Location: 388 Posner Hall

Title: Scheduling Commercials on Broadcast Television

Abstract:

Television networks sell advertising slots to clients by the shows on which the commercials air. The networks determine the exact location in the show that a commercial will air at a later stage, usually close to the airdate of the show. There are several criteria the networks must meet in scheduling commercials in a show. The schedule should be such that no two commercials promoting competing products from different clients air in the same break. The audience ratings tend to be higher at the start and end of a commercial break than during the middle of the break. Therefore, advertisers generally prefer the first and last positions in a commercial segment, to those in the middle. TV networks normally promise their clients an equitable rotation of commercials among the positions within a commercial break. The scheduling of commercials on shows is traditionally done manually and is a cumbersome, time-intensive, and error-prone process. We formulate the commercial scheduling problem as an integer program and develop near-optimal heuristics for automatically scheduling the commercials to meet all the requirements. We implemented our algorithm at the National Broadcasting Company (NBC). In addition to reducing sales personnel costs by automating the scheduling of commercials, our work has increased customer satisfaction by minimizing errors in meeting customer requirements.

Comments

ACO Seminar 2007-11-15

Title: On subword containment
Speaker: Irina Gheorghiciuc
When: November 15, 16:30-17:30
Where: Wean Hall 5302

Abstract:

We consider words with letters from a q-ary alphabet A. The k-th subword complexity of a word w in A* is the number of distinct subwords of length k that appear as contiguous subwords of w.

We analyze subword complexity from both combinatorial and probabilistic viewpoints. Our first main result is a precise analysis of the expected
k-th subword complexity of a randomly-chosen word w in A^n. Our other main result describes, for w in A*, the degree to which one understands the set of all subwords of w, provided that one knows only the set of all subwords of some particular length k. This is joint work with Mark Ward.

Comments

Student Seminar 2007-11-16

November 16, 2007
Doru Balcan
12:00 PM, 1507 Newell-Simon Hall

Speaking Skills Talk

Title: Characterization of Robust Linear Coding Solutions

Abstract:

Many linear encoding approaches focus on representing information by approximating the underlying statistical density of the data, like PCA or ICA, or by developing encoding/decoding algorithms with desirable computational and representational properties, such as Fourier and wavelet-based codes. However, if the coefficients'’ precision is limited, optimality of the representation cannot be guaranteed. The issue of optimality under limited precision is a common practical concern, and it is also relevant to biological neural representations, where the coding precision of individual neurons has been reported to be as low as a few bits per spike. In this talk, I will present a new coding scheme called Robust (Linear) Coding, that makes use of arbitrarily many coding units to minimize reconstruction error. One characteristic of Robust Coding is that it can introduce redundancy in the code to compensate for channel noise, unlike PCA or ICA, which aim to reduce redundancy. We can completely analyze the under- and overcomplete cases, and identify necessary and sufficient conditions of the optimal encoder/decoder pair in each case. In the process, we find an exact formula for the lower bound of the error function, as well as an efficient procedure for computing the optima.

Comments

Theory Lunch 2007-11-14

When: 2007-11-14 12:00 noon
Where: NSH 1507

Manfred Warmuth, UCSC

Online Algorithms for Principal Component Analysis

We design an online algorithm for Principal Component Analysis. In each trial the current instance is centered and projected into a probabilistically chosen low dimensional subspace. The regret of our online algorithm, i.e. total expected quadratic approximation error of the online algorithm minus the total quadratic approximation error of the batch algorithm, is bounded by a term whose dependence on the dimension of the instances is only logarithmic.

We first develop our methodology in the expert setting of online learning by giving an algorithm for learning as well as the best subset of experts of a certain size. This algorithm is then lifted to the matrix setting where the subsets of experts correspond to subspaces. The algorithm represents the uncertainty over the best subspace as a density matrix whose eigenvalues are bounded. The running time is O(n2) per trial, where n is dimension of the instances.

Joint work with Dima Kuzmin.

Comments

Thesis Proposal 2007-11-14

November 14, 2007
Virginia Vassilevska
3:00 PM, 1507 Newell-Simon Hall

Thesis Proposal

Title: Efficient Algorithms for Path Problems in Weighted Graphs

Abstract:

We consider some fundamental path problems on weighted directed graphs and give efficient algorithms for them. The simplest problem we discuss is that of finding a triangle of maximum weight in a node-weighted graph. We give strongly polynomial truly subcubic algorithms, i.e. ones running in O(n^{3-eps}) time for constant eps>0. Moreover, we can handle the all-pairs version of the problem: for all pairs of nodes i,j in the graph, return a maximum weight triangle passing through i and j. Our approach is then extended to obtain algorithms for other all-pairs path problems: finding all-pairs bottleneck distances and all-pairs minimum nondecreasing paths in a directed edge-weighted graph. Our techniques even allow us to compute O(log n) bits of each entry of the distance product of two n x n matrices in truly subcubic time.

We also consider the single source versions of path problems. For the single source minimum nondecreasing paths problem we obtain the first linear time algorithm in the RAM model of computation. In the comparison based model of computation, our algorithm can be made to run in O(m loglog n) time on graphs with n nodes and m edges. For the single source, single destination bottleneck paths problem on undirected graphs we give a linear time algorithm in the comparison model of computation. Such an algorithm has been known to exist. We hope to extend our ideas to directed graphs, or the more general single source version of the problem.

We propose some ideas for further research among which is the problem of CFL-reachability, and some purely combinatorial and parallel algorithms for the path problems studied so far.

Thesis Committee:
Guy Blelloch, Chair
Manuel Blum
Anupam Gupta
Uri Zwick, Tel Aviv University

Comments

Word Capitalization in Emacs and PowerPoint

One of my favorite keyboard shortcuts in PowerPoint is Shift-F3. It rotates the capitalization of the current word among lowercased, Capitalized and UPPERCASED. Note that you do not have to select the word, and the cursor does not have to be at the beginning of the word either(*).

But the title of this post clearly said Emacs, didn’t it? Well, since I missed this feature in Emacs, I just implemented it in a pinch and bound it to Shift-F3. :P The code is in this page in EmacsWiki. I have only tested it on GNU Emacs, and so I can only hope that it works in XEmacs.

(*) Actually, you rarely have to select a word in PowerPoint in order to format it. Commands like Bold Ctrl-B and Italic Ctrl-I will act on the current word when there is no selection. Also, similar behavior is in fact present in all Office applications. I just picked PowerPoint.

Comments

CS Pedagogy Colloquium 2007-11-06

Date: Tuesday November 6th, 2007
Place: NSH 3305
Time: 3PM - 4PM

Title: Assessment-centered instruction: Using assessment to support educational effectiveness

Dr. Anne Fay, Director of Assessments
Eberly Center for Teaching Excellence & The Office of Technology for Education (OTE)

Abstract:

Research in the learning sciences has changed our view of education from a teacher-centered to a learner-centered perspective. Designing our instructional practices with the learner at the center changes significantly how we teach, what we teach, and where and when learning takes place. Fundamental to this learner-centered view is the role of assessment, whereby assessment and instruction are tightly interwoven and interdependent activities, and assessment is no longer an afterthought in the instructional cycle. In fact, in a learner-centered approach, that is, thinking in terms of how students learn and what we want them to know and be able to do, assessment should guide our instructional practices, and not the other way around. In this talk I will explore how our new understanding of learning can be used to change and expand our instructional practices and change how we think about and engage in assessment activities. I will present examples of instructional innovations and assessment practices to illustrate how integrating assessment of student learning into our instructional practices can make teaching and learning more effective and efficient.

Comments

Theory Seminar 2007-11-09

Friday November 9th, 2007
WEH 7220
3:30pm

Algebrization: A New Barrier in Complexity Theory

Scott Aaronson (MIT)

Any proof of P!=NP will have to overcome two barriers: relativization and natural proofs. Yet over the last decade, we have seen circuit lower bounds (for example, that PP does not have linear-size circuits) that overcome both barriers simultaneously. So the question arises of whether there is a third barrier to progress on the central questions in complexity theory.

In this talk we present such a barrier, which we call “algebraic relativization” or “algebrization.” The idea is that, when we relativize some complexity class inclusion, we should give the simulating machine access not only to an oracle A, but also to the low-degree extension of A over a finite field or ring.

We systematically go through basic results and open problems in complexity theory to delineate the power of the new algebrization barrier. We first show that all known non-relativizing results — both inclusions such as IP=PSPACE and MIP=NEXP, and separations such as MA-EXP not in P/poly — do indeed algebrize. We next show that most open problems — including P versus NP, P versus BPP, and NEXP versus P/poly — will require non-algebrizing techniques, of which we currently have not a single example. In some cases algebrization seems to explain exactly why progress stopped where it did: for example, why we have superlinear circuit lower bounds for PromiseMA but not for NP.

We also exhibit a surprising connection between algebrization and communication complexity. Using this connection, we give an MA-protocol for the Inner Product function with O(sqrt(n) log(n)) communication (essentially matching a lower bound of Klauck), and describe a pure communication complexity conjecture whose truth would imply P!=NP.

Joint work with Avi Wigderson.

Comments

Intelligence Seminar 2007-11-06

2007-11-06, 3:30pm in Wean 5409
Projection Pursuit, Gaussian Scale Mixtures, and the EM Algorithm
Sanjoy Dasgupta
University of California - San Diego

The EM algorithm for fitting Gaussian mixture models is one of the most widely-used clustering methods. Yet, surprisingly little is known about its behavior. Under what conditions does it converge to a good local optimum? What are good ways to initialize it, and to merge/remove intermediate clusters? Such questions are difficult to answer with the mathematical tools that have traditionally been applied to EM.

I will describe an alternative way of analyzing EM: by a probabilistic analysis. This reveals, first of all, that many common methods of initializing EM produce highly suboptimal results even in ideal clustering scenarios. On the other hand, I’ll show that a particular variant of EM will provably recover a near-optimal clustering, provided that the clusters are adequately separated and that their distributions are of a certain fairly general form.

The type of cluster distributions allowed is motivated by a new result in projection pursuit, along the lines of the (somewhat mistaken) folklore that “almost all projections of high-dimensional data look Gaussian”.

Specifically, I’ll show that for any D-dimensional distribution with finite second moments, there is a precise sense in which almost all of its linear projections into d < D dimensions look like a scale-mixture of spherical Gaussians (concentric spherical Gaussians with the same center).

The extent of this effect depends upon the ratio of d to D, and upon the “eccentricity” of the high-dimensional distribution.

Comments

Continuous LaTeX Compilation Using latexmk

Now that we know how to precompile the preamble to cut down the running time of latex, here is a tool that furthers the goal of saving time in the document production process—latexmk.

As its name suggests, latexmk is a “make” utility for LaTeX files. In particular, it understands when a particular document requires varying number of runs to resolve all references, with a bibtex and/or makeindex throw in at the right moment in-between the runs. Recent versions even support multiple bibliographies and the like. In short, assuming no source errors, if you execute latexmk foo.tex, you will get a fully compiled foo.dvi without any manual intervention. (PDF users: add the -pdf option. nomencl(ature) and glossaries package users: see below.)

Now latexmk would be a great piece of software to recommend just for the features I have mentioned above. Indeed, over the years I have seen various Makefile solutions using the make utility, some of which are fairly sophisticated. There are also direct competitors like texify and the like. (See this article in EmacsWiki.) While I find that latexmk fits my workflow in terms of ease of use and customizability, you may also want to check out these other intelligent options too. They all beat the “latex, latex, bibtex, latex, latex” batch script.

But latexmk is better than that—it has support for continuous/automatic compilation: if you run latexmk -pvc main.tex, then whenever it detects that one of the dependent source files of main.tex has been changed—be that a TeX, bib, an included graphics, or any file you declare to be a source, latexmk will automatically recompile it for you. That means you don’t need to start the compilation manually after changes and an oven-fresh dvi/pdf will be delivered to your previewer as soon as it is ready.

Finally, here are some notes on latexmk:

  • You really want the latest version, even though the version number may say “beta”. The script is very mature. Also note that MiKTeX ships with latexmk.exe, which is a wrapper to call Perl on TEXROOT\scripts\latexmk\perl\latexmk.pl
  • , which is the script you want to replace.

  • You can set up your .latexmkrc to override defaults or even define new dependencies. Below I have attached some interesting bits of my .latexmkrc just for an example. For instance, you can avoid launching a previewer even though you are using the -pvc option, and you can support the nomencl package and the like by declaring a custom dependency. And since this is a Perl script, you can use any Perl constructs. You can read the documentation for more possibilities.

    $latex = 'latex -src-specials -parse-first-line -c-style-errors';
    $pdflatex = 'pdf' . $latex;
    $dvi_previewer = 'exit'; # don't start a previewer for me
    @default_files = ('main');
    $clean_ext = # space separated string
    join(' ', qw( fmt acn acr alg gls glo glg ist nls nlo nlg brf out pdfsync rel));
    push @cus_dep_list, "nlo nls 0 nlo2nls"; # nomenclature
    sub nlo2nls { system("makeindex $_[0].nlo -s nomencl.ist -o $_[0].nls -t $_[0].nlg"); }
    push @cus_dep_list, "acn acr 0 acn2acr"; # glossaries and acronym hack
    sub acn2acr { system("makeindex $_[0].acn -s main.ist -o $_[0].acr -t $_[0].alg"); }
    push @cus_dep_list, "glo gls 0 glo2gls"; #MiKTeX 2.6 has a broken makeglossaries.exe
    sub glo2gls { system("makeindex $_[0].glo -s main.ist -o $_[0].gls -t $_[0].glg"); }

  • The -silent option suppresses the output of latex and show you only the errors in a succinct format. Here is an example transcript when I run latexmk -silent foo.tex with two planted errors.

    No specific requests made, so default to dvi by latex
    Latexmk: Run number 1 of rule 'latex'
    Latexmk: Running 'latex -src-specials -parse-first-line -c-style-errors -interaction=batchmode -c-style-errors "foo.tex"'
    This is pdfTeX, Version 3.141592-1.40.4 (MiKTeX 2.6)
    entering extended mode
    foo.tex:3: Undefined control sequence
    foo.tex:4: Undefined control sequence
    Latexmk: Use the -f option to force complete processing.

Have fun with latexmk!

P.S. I should note one fine point: if you have the habit of saving your file after every single sentence, then perhaps continuous compilation can be a bit counterproductive. Emacs users can consider tuning their auto-save-timeout and auto-save-interval instead.

Comments

OR Seminar 2007-11-09

Name: Osman Oguz
University: Bilkent University
Date: Friday, November 9, 2007
Time: 1:30 to 3:00 pm
Location: 388 Posner Hall

Title: Cardinality Cuts: New Universal Cutting Planes for Integer Programming

Abstract:
We present new valid inequalities for 0-1 programming problems that work in similar ways to well known cover inequalities. The differences exist in three aspects. The first is in the generation of the inequalities. The method used in generation of the new cuts involves solving linear programs only. Second difference is the more general applicability, i.e., being useful for problems like TSP and general integer programming problems. The third aspect is superior effectiveness as indicated by the computational experiments.

Comments

Theory Lunch 2007-10-07

November 7, 2007
12:00 PM, 1507 Newell-Simon Hall

Virginia Vassilevska

Title: Nondecreasing Paths in a Weighted Graph or: How to Optimally Read a Train Schedule

Abstract:

A travel booking office has timetables giving arrival and departure times for all scheduled trains, including their origins and destinations. A customer presents a starting city and demands a route with perhaps several train connections taking him to his destination as early as possible. The booking office must find the best route for its customers. This problem was first considered in the theory of algorithms in 1958 by George Minty, who reduced it to a problem on directed edge-weighted graphs: find a path from a given source to a given target such that the consecutive weights on the path are nondecreasing and the last weight on the path is minimized. Minty gave the first algorithm for the single source version of the problem, in which one finds minimum last weight nondecreasing paths from the source to every other vertex. In this talk we present the first linear time algorithm for this problem. The algorithm uses some nice data structures and is surprisingly simple and elegant.

Comments

· « Previous entries