Archive for May, 2007

The Microtype Package

The short of this post is: if you use a recent LaTeX distribution, then by simply adding \usepackage{microtype} to the preamble, your document will look subtly nicer and the number of overfull/underfull hboxes will go down. In two-column publications, this package is really wonderful.

The tricks implemented in this package are called “microtypography”, which refers to techniques that enhance typography at a very small scale. (Tweaking \baselinestretch doesn’t count. :P )

One such technique is character protrusion—the ability to slightly extend certain characters, like punctuations, into the right margin. When used properly, the right margin will actually look “more justified” than justified. (The example above protrudes the hyphens completely. See below for a more moderate protrusion example that demonstrates the “more justified” look.)

Another technique is font expansion—the ability to dynamically tweak the width of the fonts within a line very slightly. The effect of this technique is more even density across a page due to better line breaking, but it is by design meant to be virtually imperceptible. (You can see it in before-after comparisons below.)

Aesthetics aside, these two techniques can cut down on the number of bad hboxes because they give extra flexibility to the line-breaking algorithm in TeX. Their effectiveness clearly varies from document to document, but the extra elbow room cannot hurt and so you may as well use it in every document. I especially like it in two-column publications based on my experiments.

A whole thesis has been written about the techniques that lead to this package, and the package also comes with an extensive manual that describes other microtypographic techniques implemented by the package. I must note that if you use a PDF viewer that supports PDF 1.5 (current Foxit and Sumatra fans may need Adobe Reader), you can even interact with a “live demo” of character protrusion and font expansion on page 4 of the manual. This is a highly recommended exercise. The manual itself is, of course, typeset with both features turned on as an example of the package, and so at least you get to see character protrusion in action if you look for it.

Finally, system requirements: first of all, the package is smart and simply adding \usepackage{microtype} will “just work”. But you really want a recent pdfTeX backend, and you can upgrade to version 1.4 to reap all the benefits in your PDFs. Note that pdfTeX can also generate DVIs. I know, it sounds weird, but for instance, recent versions of MiKTeX actually generate DVIs via pdfTeX. (microtype was a big part of the reason I upgraded from 2.4, and that led me to undergo the conversion from Yap to dviout.) Font expansion is not available in DVI though, but character protrusion does work.

Comments

The University of TED

I can’t find enough superlatives for the TED conferences. Great talks, great speakers! I listened to literally all the posted talks and keep going back to check for new ones. I start to feel like an undergrad in the “University of TED”, and I am not skipping a single class! :P

Really, check it out!

P.S. With so many good talks, it’s actually hard to make a recommendation. But assuming that we all care about education, this talk by Sir Ken Robinson on Do schools kill creativity? should get you hooked.

Comments

Theory Lunch 2007-05-23

SPEAKER: Karl Wimmer
TIME: Wednesday 12-1pm, May 23, 2007
PLACE: NSH 1507
TITLE: Approximation by DNF: Examples and Counterexamples

ABSTRACT:

Say that f : {0,1}^n -> {0,1} eps-approximates g : {0,1}^n -> {0,1} if the functions disagree on at most an \eps fraction of points. This talk will survey two results about approximation by DNF and other small-depth circuits:

(1) For every constant 0 < eps < 1/2 there is a DNF of size 2^{O(sqrt{n})} that \eps-approximates the Majority function on n bits, and this is optimal up to the constant in the exponent.

(2) There is a monotone function F : {0,1}^n -> {0,1} with total influence (AKA average sensitivity) I(F) <= O(log n) such that any DNF or CNF that .01-approximates F requires size 2^{Omega(n / log n)} and such that *any* unbounded fan-in AND-OR-NOT circuit that .01-approximates F requires size Omega(n / log n). This disproves a conjecture of Benjamini, Kalai, and Schramm (appearing in [BKS99,Kal00,KS05]).

This is joint work with Ryan O’Donnell.

Comments

Debugging Bad \hboxes with AUCTeX

When you use AUCTeX to compile a LaTeX document that contains an error, the compilation will fail and AUCTeX will remind you in the minibuffer area: LaTeX errors in `*foo output*'. Use C-c ` to display. By pressing C-c `, which invokes the next-error function, you can navigate to the source locations of the errors and fix them one by one. It surely is one of the most convenient features of AUCTeX.

But AUCTeX also has a lesser known feature in this regard: by pressing C-c C-w, which by default binds to TeX-toggle-debug-boxes, you can tell AUCTeX whether the debugger should consider underfull/overfull \hboxes as errors too. When preparing for the final(*) revision of a document, this feature really makes locating the bad \hboxes a breeze!

(What’s remaining are of course the \vboxes, but there doesn’t seem to be any good way to locate them using the output messages alone.)

(*) It is usually not a very productive idea to fix bad boxes before the document content is finalized.

Comments

Theory Lunch 2007-05-16

SPEAKER: Aaron Roth.
TIME: Wednesday 12-1pm, May 16, 2007.
PLACE: NSH 1507
TITLE: Selfishness without Nash: Two Alternatives to Price of Anarchy

ABSTRACT:

Price of Anarchy aims to measure the cost to a system when decisions are made by local, selfish agents, rather than mandated by a central authority. How do selfish agents behave? Traditionally, it is assumed that they play according to a Nash equilibrium — but this is unrealistic, because in general, Nash equilibria can be both hard to compute and to coordinate. We study a much weaker and more realistic assumption about selfish behavior, introducing the term “Price of Total Anarchy” and show that in several broad classes of games, these weaker assumptions are enough to derive guarantees for the price of total anarchy that match the price of anarchy exactly, even though play provably does not necessarily converge to Nash equilibria. We do not assume that agents play using any particular algorithm, or that they have more than local information about the game state. We also show that our guarantees are resilient to the presence of Byzantine players — additional agents about whom we make no assumptions, who may be nonrational or adversarial. When studying -anarchy-, it seems essential to account for players who do not behave according to any fixed set of assumptions, but this has been largely ignored up until now. We also survey a recent independent generalization of price of anarchy by Goemans, Mirrokni, and Vetta, the “price of sinking”, and use Valid Games (which have been analyzed to get tight bounds for their price of anarchy, price of total anarchy, and price of sinking) as a lens through which to observe the differences between models.

Joint work with: Avrim Blum, Katrina Ligett, Mohammad Taghi Hajiaghayi

Comments

Godel Prize 2007 Announced

http://www.eatcs.org/activities/awards/goedel2007.html

The 2007 Gödel Prize for outstanding journal articles in theoretical computer science is awarded to:
Alexander A. Razborov and Steven Rudich

for their paper
“Natural Proofs”, Journal of Computer and System Sciences, Vol. 55, No. 1, 1997, pp. 24-35.

It was first presented at the Twenty-sixth Annual ACM Symposium on Theory of computing, Montreal, Quebec, Canada. 1994, pp. 204 - 213.

I note that Natural Proofs have been written a couple times before on the blogosphere, and I further note that the prize must have been announced at least a week ago from seeing a link into the latter post. Ha!

Comments

OR Seminar 2007-05-11

Name: R. Ravi
University: Tepper School of Business, Carnegie Mellon University
Date: Friday, May 11, 2007
Time: 1:30 to 3:00 pm
Location: Cooper Auditorium

Title: Iterative Relaxations

Abstract:
Inspired by Kamal Jain’s iterative rounding method for designing approximation algorithms, Singh and Lau have recently designed a variant that has been instrumental in designing the best-possible approximation algorithm for degree bounded minimum-cost spanning trees in undirected graphs. In this talk, I will illustrate its application to the directed counterpart of the problem, by providing an alternate proof of Edmond’s theorem characterizing the directed spanning tree polyhedron, and adapting it to give a bicriteria approximation algorithm. This work is in collaboration with Singh, based on ideas from his work with Lau.

The talk will serve as a preview to a new course to be taught in the Fall 07 (47-853 Special Topics in Combinatorial Optimization, Mondays and Wednesdays 3.30-5.30 Room 318 GSIA, during Mini 1: August 27 - October 15, 2007) by Singh and myself. In this class, we will attempt to re-derive alternate proofs of well known linear characterizations of various famous integral polyhedra using this iterative relaxation technique, which can be considered a primal rounding method.

Comments

Thesis Proposal 2007-05-15

May 15, 2007
Maria-Florina Balcan
10:00 AM, 4623 Wean Hall

Thesis Proposal

Title: New Theoretical Frameworks for Machine Learning

Abstract:

This thesis develops and analyzes new theoretical frameworks for emerging paradigms of Machine Learning including Semi-supervised, Active, and Similarity-based Learning. These are areas of significant practical importance and activity in Machine Learning, yet standard Learning Theory frameworks such as PAC or Statistical Learning Theory models do not capture very well the key issues involved. In addition to developing new models and algorithms for these directions, this thesis also presents new applications of techniques from Machine Learning Theory to emerging areas of Computer Science at large, such as Auction and Mechanism Design.

In Machine Learning, there has been growing interest in using unlabeled data together with labeled data due to the availability of large amounts of unlabeled data in many applications, and a number of algorithmic approaches to this have been developed. However, the underlying assumptions of these methods are often quite distinct and not captured by standard theoretical models. This thesis introduces an extension of the PAC model to fit Semi-supervised Learning that can be used to model and analyze most of the different approaches taken so far. This includes issues such as “Under what conditions will unlabeled data help and by how much?” and “How much data should I expect to need in order to perform well?”. In the context of Kernel methods, this thesis shows how random projection techniques can be used to convert a given kernel function into an explicit, distribution dependent set of features, which can then be fed into more general (not necessarily kernelizable) learning algorithms. In addition, this work shows how such methods can be extended to more general pairwise similarity functions. This provides a formal theory that matches the standard intuition that a good kernel function is one that acts as a good measure of similarity. Another direction in the thesis involves developing algorithms for Active Learning. In particular, this dissertation includes the first Active Learning algorithm that works in the presence of arbitrary forms of noise, as well as margin based Active Learning algorithms.

This dissertation also develops new connections between Machine Learning and Mechanism Design. Specifically, this thesis presents the first general framework in which machine learning methods can be used for reducing mechanism design problems to standard algorithmic questions for a wide range of revenue maximization problems in an unlimited supply setting.

Thesis Committee:
Avrim Blum, Chair
Manuel Blum
Tom Mitchell
Yishay Mansour, Tel Aviv University
Santosh Vempala, Georgia Tech

Comments

Theory Lunch 2007-05-09

SPEAKER: Christine Chung
TIME: Wednesday 12-1pm, May 9, 2007
PLACE: NSH 1507
TITLE: Stochastically stable states in load balancing and atomic congestion games

ABSTRACT:

Price of Anarchy (POA) results assume players in a game will arrive at a Nash equilibrium. Nash equilibrium has been criticized for unrealistically assuming that players are infinitely rational and have common knowledge of rationality. We show that POA upper bounds can apply even when boundedly rational decision-making agents use simple strategy selection rules to arrive at what economists Foster and Young termed “stochastically stable” states. We also show that in some games these stochastically stable states are the Nash equilibrium states that are socially optimal.

Comments

Thesis Proposal 2007-05-11

May 11, 2007
Mugizi Rwebangira
10:00 AM, 4623 Wean Hall

Thesis Proposal

Title: Techniques for Exploiting Unlabeled Data

Abstract:

In many machine learning application domains obtaining labeled data is expensive but obtaining unlabeled data is much cheaper. For this reason there has been growing interest in algorithms that are able to take advantage of unlabeled data. In this proposal we develop several methods for taking advantage of unlabeled data in classification and regression tasks.

Specific contributions include:

-A method for improving the performance of the graph mincut algorithm of Blum and Chawla by taking randomized mincuts. We give theoretical motivation for this approach and we present empirical results showing that randomized mincut tends to outperform the original graph mincut algorithm, especially when the number of labeled examples is very small.

-An algorithm for semi-supervised regression based on manifold regularization using local linear estimators. This is the first extension of local linear regression to the semi-supervised setting. In this proposal we present experimental results on both synthetic and real data and show that this method tends to perform better than methods which only utilize the labeled data.

-An investigation of practical techniques for using the Winnow algorithm (which is not directly kernelizable) together with kernel functions and general similarity functions via unlabeled data. We expect such techniques to be particularly useful when we have a large feature space as well as additional similarity measures that we would like to use together with the original features. This method is also suited to situations where the best performing measure of similarity does not satisfy the properties of a kernel. We present some preliminary experimental results of this approach.

Thesis Committee:
Avrim Blum, Co-Chair
John Lafferty, Co-Chair
William Cohen
Xiaojin (Jerry) Zhu, University of Wisconsin

Comments

Thesis Proposal 2007-05-04

“INTEGER PROGRAMMING, A TECHNOLOGY”
Anureet Saxena

Friday, May 4, 2007
1:30 pm
153 Posner Hall

Mixed Integer Programming addresses the problem of minimizing (or maximizing) a linear function over a set of linear constraints such that some or all of the decision variables are constrained to be integers. Over the last five decades, mixed integer programming has emerged as a formidable optimization technology and has thereby opened research avenues which did not exist before. This thesis examines some of these new research opportunities in the form of four essays.

Read the rest of this entry »

Comments

ACO Seminar 2007-05-03

Title : Efficient Distributed Approximation Algorithms for Minimum Spanning Trees
Speaker : Gopal Pandurangan, Department of Computer Science, Purdue University
When : 4:30-5:30pm May 3rd
Where : PPB 300

ABSTRACT : The Minimum Spanning Tree (MST) problem is one of the most important and commonly occurring primitive in the design and operation of data and communication networks. There are almost optimal distributed algorithms for this problem. However, these algorithms require relatively large amount of communication or time, and are fairly involved; this makes these algorithms impractical for emerging technologies such as peer-to-peer and sensor networks. The focus of this talk is a very simple scheme called the Nearest Neighbor Tree (NNT) scheme for constructing an O(log n)-approximate MST in a weighted graph. We use the scheme to design efficient distributed approximation algorithms for the MST problem under various settings:

(1) A randomized variant of the scheme can be applied to design a communication-efficient distributed approximation algorithm in a complete weighted metric network. Our result shows that the expected communication complexity of our algorithm is significantly better than the best distributed algorithm that finds the MST.

(2) We show that our scheme can be used to construct a fast distributed MST approximation algorithm in arbitrary networks. Our algorithm is existentially optimal (up to polylogarithmic factors) with respect to both time and communication complexity. Significantly, our result also shows that there can be an exponential time gap between exact and approximate MST computation. Another consequence of our result is that an approximate MST in unit-disk graphs (which are popular models of wireless networks) and in random weighted networks can be found in almost optimal time.

(3) Our scheme can also be used in ad hoc wireless networks to construct low-energy spanning trees in a energy-efficient manner. For uniform distribution of nodes, we show that our algorithms give a constant approximation; we also show that the energy needed to construct these approximate spanning trees is within a constant factor of the minimum possible energy that is needed to construct the optimal energy tree. The time and communication complexities of our algorithms are nearly the best possible.

(Joint works with Maleq Khan and V.S. Anil Kumar.)

Comments

Theory Seminar 2007-05-04

Friday May 4th, 2007
WEH 7220
3:30pm

Title: Testing properties of graphs and functions

Balazs Szegedy
University of Toronto

Abstract: We give an analytic approach to property testing and a new characterization for testable graph properties. This is a joint work with Laszlo Lovasz.

Comments

Thesis Oral 2007-05-04

May 4, 2007
Adam Christopher Wierman
11:00 AM, 5409 Wean Hall

Thesis Oral
Title: Scheduling for Today’s Computer Systems: Bridging Theory and Practice

Abstract:

Scheduling is a fundamental technique for improving performance in computer systems. From web servers to routers to operating systems, how the bottleneck device is scheduled has an enormous impact on the performance of the system as a whole. Given the immense literature studying scheduling, it is easy to think that we already understand enough about scheduling. But, modern computer system designs have highlighted a number of disconnects between traditional analytic results and the needs of system designers. In particular, the idealized policies, metrics, and models used by analytic researchers do not match the policies, metrics, and scenarios that appear in real systems.

The goal of this thesis is to take a step towards modernizing the theory of scheduling in order to provide results that apply to today’s computer systems, and thus ease the burden on system designers. To accomplish this goal, we provide new results that help to bridge each of the disconnects mentioned above. We will move beyond the study of idealized policies by introducing a new analytic framework where the focus is on scheduling heuristics and techniques rather than individual policies. By moving beyond the study of individual policies, our results apply to the complex hybrid policies that are often used in practice. For example, our results enable designers to understand how the policies that favor small job sizes are affected by the fact that real systems only have estimates of job sizes. In addition, we move beyond the study of mean response time and provide results characterizing the distribution of response time and the fairness of scheduling policies. These result! s allow us to understand how scheduling affects QoS guarantees and whether favoring small job sizes results in large job sizes being treated unfairly. Finally, we move beyond the simplified models traditionally used in scheduling research and provide results characterizing the effectiveness of scheduling in multiserver systems and when users are interactive. These results allow us to answer questions about the how to design multiserver systems and how to choose a workload generator when evaluating new scheduling designs.

Thesis Committee:
Mor Harchol-Balter, Chair
John Lafferty
Bruce Maggs
Alan Scheller-Wolf
Ward Whitt, Columbia University

Comments