Archive for December, 2005

Quicksort Alike

Recently I had to “analyse” randomized Quicksort over a dinner table. Well, I wasn’t prepared to do it using only a pair of chopsticks and a bowl. So I “cheated”. :P

Instead of picking a random element and accept it as the pivot, we let the partitioning algorithm test if the random element is a balanced pivot, i.e., it sits between the 25-percentile and the 75-percentile. This test takes exactly linear time. And if not, the algorithm will pick another random element and re-test.

Clearly the probability of picking a balanced pivot is 1/2 and so the partitioning algorithm will finish in an expected constant number of trials. The rest of the proof is easy and using five more minutes I “proved” the expected $O(n \log n)$ bound with surprisingly little hand-waving. I also ate a lot of garlic bok choy in the process.

Now here are some simple questions that arise from this incident:

  • What about finding the median as the pivot? (And we have a choice of randomized and deterministic median algorithms.)
  • What about the concentration? Is this $O(n \log n)$ “with high probability”? (For instance, what is the probability that it takes 42 trials before a balanced pivot is found?)
  • Can we use a similar view to analyze the running time of the unmodified version of Quicksort? (Try to come up with a clustering of the nodes in the recursion tree and analyze the running time spent in each cluster of nodes.)

More homework questions…

Comments

Thesis Oral 2005-12-20

Konstantin Andreev
Approximation algorithms for network design and graph partitioning problems

Time: 10:30 a.m. on Tuesday, December 20th
Location: NSH 3305

Abstract:

Optimizing and efficiently using the bandwidth in a computer system is an important step in trying to improve the system’s performance. This thesis looks at three problems motivated by real world challenges. The over-riding theme in these analytical models is choosing locations and paths to provide service under various constraints and objectives.

The first example is the reliable delivery of live multimedia streams over the Internet. One of the most appealing applications of the Internet is the delivery of high-quality live audio and video streams to the desktop at low cost. The traditional centralized approach to delivering live streams suffers from well known scalability and reliability issues. Therefore the need for a distributed infrastructure consisting of a large number of servers deployed across the Internet arises. The purpose of a streaming overlay network is to transport bits from the encoder to the end user in a manner that alleviates the bottlenecks and takes care of connection losses. The overlay network studied in my thesis consists of three types of components, each globally distributed across the Internet. We assume that the loss rate between any pair of nodes in the network is known, and losses between different pairs are independent, but show extensions in which some losses may be correlated. We present a polynomial time approximation algorithm for this problem. The algorithm finds a solution that satisfies capacity and reliability constraints to within a constant factor of optimal, and cost within a logarithmic factor.

The second example is my work on simultaneous source location. At the design level we addressed the problem of choosing a set of sources in a general network in such a way that a given set of demands can be satisfied. We called this problem simultaneous source location. We gave an exact algorithm for trees using a dynamic programming approach. We show how this algorithm can be combined with a result of Harald Raecke to give a solution that has the exact number of nodes, but can exceed edge capacities by at most a polylogarithmic factor. We also investigated this problem for a graph of bounded treewidth. Many graphs arising from natural applications have bounded treewidth and problems that are in general intractable can be solved in polynomial time when restricted to graphs of bounded treewidth. We showed that simultaneous source location is NP-hard even on graphs of bounded treewidth. However we showed that one can design an efficient approximation algorithm. We gave a polynomial time approximation scheme (PTAS) with at most a $1+\epsilon$ violation of capacities, where can be made arbitrary small. We also showed a PTAS with exact capacities and $\tau+1$-approximation on the number of sources, where $\tau$ is the treewidth of the graph.

The third problem I considered is a question that is related to the management of communication links and arises in areas like parallel computing and chip design. The definition of the $k$-balanced graph partitioning problem is to divide the vertices of a graph into $k$ almost equal size components so that the capacity of edges between different components is minimized. This problem is a natural generalization of several other problems such as minimum bisection, which is the (2,1)-balanced partitioning problem. We present a bicriteria polynomial time approximation algorithm with an $O(\log^{3/2} n)$-approximation. We also show a hardness of approximation result.

Comments

Theory Lunch 2005-12-14

Speaker: Daniel Golovin

Time: Wednesday 12-1pm

LOCATION: NSH 1305

Title: Strongly History Independent Hashing

Abstract:
We present the first strongly history independent (SHI) hash table that is fast, space efficient, and supports deletions. A hash table that supports deletion is SHI if it has a canonical memory representation up to randomness. That is, the string of random bits and current hash table contents (the set of (key, object) pairs in the hash table) uniquely determine its layout in memory, independently of the sequence of operations from initialization to the current state. Thus, the memory representation of a SHI hash table reveals exactly the information available through the hash table interface, and nothing more. Our SHI hash table stores n objects of equal size in O(n) slots of space, where a slot is enough space to store a (key, object) pair, and executes insertions, deletions, and searches in expected O(1) time.

Other Information: This is joint work with Guy Blelloch. This talk is intended for a general CS audience.

Comments

AI Seminar 2005-12-13

WHEN: 3:30pm Tue., Dec. 13, 2005

WHERE: Wean 5409, Carnegie Mellon University

WHO: Dr. C. Lee Giles, The Pennsylvania State University, University Park, PA

TITLE: Next Generation CiteSeer

ABSTRACT: CiteSeer, a public online computer and information science search engine and digital library, was a radical departure from the traditional methods of academic and scientific document access and analysis. CiteSeer, now hosted at Penn State, has over 700,000 documents and has become one of the most popular academic document search engines in science. The current CiteSeer model, with some difficulty, is also portable and was recently extended to academic business documents (SMEALSearch). CiteSeer is based on these features: actively acquiring new documents, automatic citation indexing, and automatic linking of citations and documents. The new Google Scholar does similar citation indexing and linking. Why has CiteSeer been so popular and how should it progress? We discuss this and the Next Generation CiteSeer project, which will emphasize CiteSeer as a research tool, research service and researcher facilitator. It will explore new intelligent algorithms for providing improved and new indexes, enhanced document access, expanded and automatic document gathering, collaboratories, new data and metadata resources, active mirroring, and web services. As example, we discuss our new work on automatic acknowledgement indexing, which provides insight into the impact of acknowledged individuals and funding agencies.

URL: http://clgiles.ist.psu.edu/

BIO: Since 2000 Dr. C. Lee Giles has been the David Reese Professor at the School of Information Sciences and Technology. He is also Professor of Computer Science and Engineering, Professor of Supply Chain and Information Systems, Director of the Intelligent Systems Research Laboratory, and Associate Director of Research at the eBusiness Research Center at the Pennsylvania State University, University Park, PA. He has been associated with Princeton University, the University of Pennsylvania, Columbia University, the University of Pisa and the University of Maryland.

FACULTY HOST: William Cohen

Comments

Student Seminar Series 2005-12-09

Title: Extreme Value Theory and the Max k-Armed Bandit Problem

Speaker: Matt Streeter, CSD

Location: NSH 1507
Time: 12-1 pm
Date: Friday, December 9th, 2005

Abstract:

The National Climatic Data Center has recorded the monthly precipitation in Pittsburgh since 1895. Given this data, what can we say about the probability that the precipitation this January will exceed some high threshold, say 10 inches? (The current record of 8.2 inches was set in 1937.) Is the question even well-posed?

Extreme value theory attempts to make meaningful statements about the probability of as-yet-unprecedented events. A surprising result is that, under fairly mild statistical assumptions, the probability is actually well-defined. Specifically, under certain regularity conditions, the maximum of n independent and identically distributed random variables must asymptotically follow one of three “generalized extreme value” (GEV) distributions. Estimating the parameters of the GEV distribution allows one to draw inferences about its upper tail, including portions of the tail for which there is no observed data.

In the first half of this talk I will give a self-contained introduction to extreme value theory, including a sketch of the proof of its main theorem. I will then present the “max k-armed bandit problem”, a previously-studied problem that is motivated by extreme value theory and has applications to combinatorial optimization. The problem asks: given k “payoff” distributions, each a GEV distribution with unknown parameters, how can we sample the distributions so as to maximize the expected maximum payoff obtained after n samples? I will outline an asymptotically optimal algorithm for the problem. Specifically, I will describe a strategy that obtains expected maximum payoff within o(1) of that obtained by a strategy
that plays optimally knowing the true parameters of each distribution.

Joint work with Steve Smith.

Comments

Theory Lunch 2005-12-07

Speaker: Hubert Chan

Time: Wednesday 12-1pm

LOCATION: NSH 3305

Title: Sparse Spanners for Doubling Metrics

Abstract:

Graph spanners have applications in communication networks, distributed computing and computational geometry. We motivate our problem by considering the following application in wireless networks. Suppose we have a set of nodes representing wireless access points. Data packets can be sent from one node to another directly if the two nodes are directly connected via a dedicated frequency. Otherwise, data packets have to travel through intermediate nodes from the source to the destination. Every pair of nodes (x,y) has some distance d(x,y), which represents the time to send a data packet between the nodes x and y if there is a direct connection between them. The goal is to construct a network by assigning a small number of dedicated frequencies for directly connected pairs (which form a sparse spanner) such that for every pair of nodes (u,v), the time to send a data packet between them is at most some small multiplicative factor (typically less than 1.5) of d(u,v). We show that if the distance function obeys some bounded growth property (namely, induces a doubling metric), then it is possible to construct such a network with a linear number of direct connections. The bounded growth property is general enough to include many common distance functions, such as constant dimensional Euclidean distances. We also show that the network has additional nice properties, such as each node having a constant number of connections. Moreover, if we allow the number of connections to be nearly linear $O(n log^* n)$, then for every pair of nodes, there is a short path between them that uses only a constant number of intermediate nodes. This is useful when congestion and latency are taken into account. This is joint work with Anupam Gupta.

Comments

ACO Seminar 2005-12-08

TITLE: Approximating k-MultiCut Problem
SPEAKER: Mohit Singh
WHEN: December 8, 4-5:30pm
WHERE: PPB 300

ABSTRACT: Given an undirected graph, a set of $t$ pairs of vertices, the multi-cut problem is to find the minimum set of edges to remove which separate all pairs. The $k$-multicut problem is a generalization of the multicut problem in which one is given a target $k \leq t$ and asked to remove only $k$ pairs. We use techiniques from combinatorial optimization: Lagrangian relaxation and Total unimodularity, to obtain constant factor approximation for the $k$-multicut problem on trees. This also leads to $O(log^2 n log logn)$-approximation for the $k$-multicut problem on general graphs, where $n$ is the number of vertices in the graph. We also give bicriterion approximation algorithm for the $k$-multicut problem. Our techniques also give a simple $2$-approximation algorithm for the multicut problem on trees matching the best known algorithm of Garg, Vazirani and Yannakakis’94.

Joint Work with Viswanath Nagarajan and Daniel Golovin. To appear in SODA’06.

Comments

OR Seminar 2005-12-09

Friday, December 9, 2005
1:30 p.m. Posner Hall 151

Francois Margot
Associate Professor of Operations Research, Tepper School of Business

Code Construction with ILP

This talk covers on-going work to solve one of the oldest and most famous problems in combinatorial coding theory, the Football Pool Problem. A more technical description of the problem is finding a ternary code of minimum cardinality with covering radius one. The smallest open case, for code words of length 6, still resists all solution attempts, despite a lot of efforts.

The talk covers techniques proposed by others and their integration within several ILP formulations used as preliminary steps in a solution approach. Partial results obtained are promising and improving the best known lower bound on the solution seems possible.

Improving the formulation of the ILPs is an open research problem that has received little attention so far. Some research directions will be mentioned.

The solution of the ILPs (binary and non binary) uses isomorphism pruning techniques to good effect. Yet, extensive computing resources will be required for a complete solution. Code using Condor and MW giving access to large parallel grid computing has been implemented.

Comments

Theory Seminar 2005-12-02

Secure Network Coding via Filtered Secret Sharing
Cliff Stein, Columbia University

December 2nd, 2005
Wean 7220, 3:30PM

Abstract

We study the problem of using a multicast network code to transmit information securely in the presence of a “wire-tap” adversary who can eavesdrop on a bounded number of network edges. We establish a close connection between secure linear network coding and a new variant of the secret sharing problem, which we call “filtered secret sharing.” Using this connection, we establish new trade-offs between security, capacity, and bandwidth of secure linear network coding schemes.

Our positive results show that by giving up a small amount of capacity, it is possible to dramatically reduce the bandwidth requirements of secure linear network coding. Our negative results show that within the framework we consider, unless capacity is relaxed, the bandwidth requirements can be prohibitively high. These results are obtained by showing that the problem of making a linear network code secure is equivalent to the problem of finding a linear error-correcting code with certain generalized distance properties.

Joint work with Jon Feldman, Tal Malkin and Rocco Servedio

Comments

Logic Colloquium 2005-12-01

Towards a machine-checked proof of the enumeration of tame plane graphs

Tobias Nipkow, TU Munich

4:30 PM
Tursday 1 December
Baker Hall A53

Hales’ proof of the Kepler conjecture defines a class of tame plane graphs, enumerates that class by computer and checks (again by computer) that none of these graphs constitute a counterexample to the conjecture. This talk reports on work in progress to verify that Hales’ program for enumerating tame plane graphs is correct, i.e. does not miss any such graph. This is part of Hales’ Flyspeck project to check his whole proof by computer.

After a brief introduction to tame graphs and their role in Hales’ proof, the enumeration procedure is explained, and its correctness proof its sketched. Throughout the talk we concentrate on the issues that arise when checking mathematical arguments by machine, in particular the challenge of writing programs that are both efficient enough to perform complex searches and enumerations but simple enough that machine-checked correctness proofs become feasible.

This work was done while visiting the University of Pittsburgh and builds on the PhD thesis by Gertrud Bauer.

Comments