Archive for March, 2006

CS50

http://www.cs50.cs.cmu.edu/

This April, from 19 Wed to 22 Sat, will be our department’s celebration of the 50-year anniversary of Computer Science at Carnegie Mellon. Now many of us know that our department is old but probably not that old… so how could that be possible?

Late spring of 2006 will mark the 50th anniversary of computer science research and education at Carnegie Mellon: the University’s first computer, an IBM 650, was delivered to the basement of the business school in July 1956. At that time, Herb Simon recruited Allen Newell and Alan Perlis to help found a center of computer science activity that established Carnegie Mellon as one of the premier institutions for computer science in the world.

I did ask around… very few of us have seen that IBM 650. :P

As noted in the previous post about MillerFest, there are a lot of things going on during that week. Let me show you the full schedule because the short list below will surely miss something.

First, I note that the Language Technology Institute and the Human Computer Interaction Institute will both be celebrating their anniversaries. So there will be a lot of seminars going on. Lots.

There is also another workshop in honor of Ed Clarke. I TA’ed for Ed once and he’s absolutely one of the finest professors to work with. And I can tell you that while Ed did not invent the Internet, he did start the whole field of using model checking in formal verification with his student Allen Emerson back in 1981. You would think that Finite Automata is “easy”. Hmm, I did have that thought before I came to CMU but I started to think again once I met Ed.

There is also the Women@SCS Outreach Roadshow, as well as the always-exciting tour to Entertainment Technology Center. (I learn something new every time I go to ETC. So go to the tour!)

Besides all these, we also have two celebrations for Sharon Burks and Mark Stehlik for their service to the department. To me, Sharon is to our department what the Oracle is to the Matrix. She literally knows everything!

Finally, there is no shortage of luminary speakers for the general CS50 event. To pick two that are more familiar to the Theory community, we have Leslie Valiant and Charles Leiserson. Time to meet my Grandpa.

Oh, and to show you that we are really a CS department, we have a PGP Key Signing Event. :P (I want to give exactly 42 cents to whoever came up with that idea.)

Now you may think that everything happens in the SCS buildings that week. Well, turns out CMU has an annual event called the Spring Carnival, which starts right with CS50. Tons of booths, games, food and rides. All outdoors!

I already ordered extra camera batteries for that weekend. The question is: can I really perform superposition to make myself “present” in all these events, many of which are parallel? Or perhaps the photons may find their way to register their effect on my camera sensor without my being “present” in the room? :P

P.S. To show you how digital photography can make it hard for me to fall asleep at night: with my 2-year-old Nikon D70, each photo will “cost” me about 9MB of storage without counting backup storage. (6MB RAW + 3MB JPG) And if I really upgrade to a Nikon D200, 10+4 MB. Now if you think that harddisk storage is cheap (definitely not as cheap when you do it robustly with RAID and such)… No, I don’t want to think about the time it takes to postprocess the photos. I guess I would keep my D70 for the moment.

Comments

MillerFest 2006

(Maverick’s note: There are just about a million events occurring on campus that weekend during CS50. See my next post.)

You’re Invited to Celebrate Gary Miller’s 60th Birthday at MillerFest 2006!

April 19-20, 2006
Carnegie Mellon University, Pittsburgh, PA

We are celebrating Gary Miller’s 60th birthday this spring with a banquet and workshop. The banquet will be held Wednesday evening, April 19th, at the Pittsburgh Athletic Association, with the workshop occurring on Carnegie Mellon’s campus the following day. The workshop will feature presentations related to Professor Miller’s research achievements, including computational number theory, randomized algorithms, parallel algorithms, and scientific computing.

For more information, please see the workshop web site:
http://www.aladdin.cs.cmu.edu/workshops/millerfest/

** REGISTRATION IS REQUIRED **

There is no fee to attend the workshop if you register by Monday, April 3rd. There is small fee to attend the banquet, which promises to be an enjoyable evening featuring reminiscences about the life and career of Professor Miller.

** HOTEL DEADLINE EXTENDED TO APRIL 3RD **

Out-of-town visitors should make their hotel reservations by Monday, April 3rd,* to take advantage of the pre-negotiated workshop rate of \$105 per night (\$119.70 including taxes) at the Holiday Inn near campus. To make your reservations, call the the Holiday Inn (100 Lytton Avenue, Pittsburgh, PA 15213) at 412-682-6200 and be sure to ask for the ALADDIN Workshop rate.

You may also be interested to know that MillerFest is being held in conjunction with CS50, which celebrates fifty years of computer science at Carnegie Mellon. Please see http://www.cs50.cs.cmu.edu/ for a complete schedule of CS50 events.

We look forward to seeing you at MillerFest!

Susan Hrishenko, ALADDIN Center Assistant Director
for MillerFest organizers Guy Blelloch, Alan Frieze, and Shanghua Teng

Comments

Database Seminar 2006-04-18

A relative-error CUR decomposition for matrices and its data applications

Michael W. Mahoney
Yahoo Research

4/18 11am
Wean 8220

Much recent work in the Theoretical Computer Science, Linear Algebra, and Machine Learning has considered matrix decompositions of the following form: given an m-by-n matrix A, decompose it as a product of three matrices, C, U, and R, where C consists of a few (typically a constant number of) columns of A, R consists of a few (typically a constant number of) rows of A, and U is a small carefully constructed matrix that guarantees that the product CUR is “close” to A. Applications of such decompositions include the computation of compressed “sketches” for large matrices in a pass-efficient manner, matrix reconstruction, speeding up kernel-based statistical learning computations, sparsity-preservation in low-rank approximations, and improved interpretability of data analysis methods. In this talk we shall discuss various choices for the matrices C, U, and R that are appropriate in different application domains. The main result will be an algorithm that computes matrices C, U, and R such that the (Frobenius) norm of the error matrix A - CUR is almost minimal. We will also discuss applications of this algorithm (and related heuristics) in the bioinformatics of human genetics, in recommendation system analysis, and in medical imaging.

Comments

Thesis Oral 2006-04-03

Speaker: Abraham Flaxman

Time: Monday 10:30-11:30 AM

Place: Wean 5320

Title: Average-case analysis for combinatorial problems

Abstract:

This thesis considers the average case analysis of algorithms, focusing primarially on NP-hard combinatorial optimization problems. It contains a catalog of distributions frequently used in average-case analysis and collection of mathematical tools that have proven useful in studying these distributions. The bulk of the thesis consists of case-studies in average-case analysis of algorithms, where algorithms for 3-SAT, Subset Sum, Strong Connectivity, Stochastic Minimum Spanning Tree, and Uncapacitated Facility Location are analyzed on random instances.

Comments

Theory Seminar 2006-04-06

When: Thursday, April 6, 10:00 a.m.

Where: 3305 Newell-Simon Hall

MOHAMMAD MAHDIAN, MIT/Microsoft Research

Abstract:

The Internet is probably the most important technological creation of our times. It provides many immensely useful services to the masses for free, including such essentials as web search, web mail, and web portals. These services are expensive to maintain, and depend upon advertisement revenue to remain free. One of the most effective ways to allocate advertisement space on the web is through auctions. In this talk, I will discuss some of the theoretical challenges in the design of online advertisement auctions, and will present some of our recent results addressing these issues. In particular, I will focus on mechanism design for budget-constrained bidders, multi-unit auctions for perishable goods with unknown supply, and dynamics of bid optimization.

Comments

An Idea on Theory Exam Questions

In a discussion with a fellow student about grading, we came up with the following method which can make some Theory exam question with longer proofs easier to grade. I believe that, when used with caution, it can be a good tool to have.

Let me stress the word “caution”, since it is not a good idea to use this method on all problems on all exams.

The basic idea is to give the proof away, but in a scrambled fashion. Suppose the proof can be broken into $k$ “units of thoughts” (each unit is probably one or two statements). We will first append some $n-k$ junk units to the end of the proof, yielding a total of $n$ units. Then, we take a random permutation of the $n$ units and give the list to the students. The answer to such an exam question would then be an ordered sequence of numbers indicating the relevant units in the list.

There are many variations on this idea.

First we need to pick $n$. That clearly should depend on $k$. (Somewhere around $1.5k$ to $2k$ may be a good idea.)

Then we may decide to reveal the value of $k$ or not. (Revealing makes it a lot easier.)

Also, some of the junk units may actually be mathematically wrong but “reasonably sounding” in the context of the proof. We call these bombs. If the answer has a bomb, it automatically gets a zero.

And should we choose to add bombs, then maybe it is also an interesting twist to allow the students to pick out exactly all the bombs. For some classes, being able to pick out exactly all the bombs in a question should really qualify for a full credit for it. (Oh, Number Theory…)

Now that leaves the question of partial credits, which can be a highly subjective matter in the traditional “write-a-proof” format. But in this “pick-out-the-proof” format, it may become easier if the junk units are prepared carefully.

Finally, notice that the permutation needs not be the same for different students. This can be good for rooms with little elbow room. Grading is now a bit harder but should not be too bad.

Comments (5)

Theory Lunch 2006-03-29

Speaker: Mugizi Robert Rwebangira

Time: Wednesday 12-1pm

Place: NSH 1507

Title: A Random-Surfer Web-Graph Model

Abstract:

In this work we provide theoretical and experimental results on a random-surfer model for construction of a random graph. In this model, a new node connects to the existing graph by choosing a start node uniformly at random and then performing a short random walk. We show that in certain formulations, this results in the same distribution as the preferential-attachment random-graph model, and in others we give a direct analysis of power-law distribution of degrees or “virtual degrees'’ of the resulting graphs. We also present experimental results for a number of settings of parameters that we are not able to analyze mathematically.

Comments

OR Seminar 2006-03-31

Speaker: Robert Weisnamtel, University of Magdeburg
Date: Friday, March 31, 2006
Time: 1:30 to 3:00 pm
Place: 388 Posner Hall
Title: A Mixed Integer Linear Programming Approach for the Optimal Synthesis of Chemical Processes

Abstract:

This research is motivated by feasibility questions arising in two different areas of chemical engineering: reactive distillation processes and chromatographic processes. In both situations, there is a need to compute global bounds on the optimal value of a general polynomial mixed integer programming problem. We present our approach to tackle this question. It is based on polyhedral insights on the underlying nonlinear models and makes use of combinatorial substructures that are inherent in optimization problems of this kind.

Comments

Theory Seminar 2006-03-28

When: Tuesday, March 28, 10:00 a.m.

Where: 3305 Newell-Simon Hall

NIR AILON, Princeton University

Abstract:
The information revolution has made massive amounts of data available from a myriad of sources such as biology, finance, e-commerce, advertising and the web. Handling these high-dimensional, often noisy and inconsistent datasets has become an important focus of algorithmic research.

(1) Rank Aggregation deals with combining inconsistent rankings suggested by different voters. This old problem from social choice theory has recently attracted the attention of computer scientists in surprising new contexts such as search engines and information retrieval systems. I will present new, improved algorithms, and discuss related results and follow-up work on cluster aggregation, hierarchical clustering and phylogeny.

(2) Sublinear algorithms compute functions by considering only a small sample of the input. This is especially important when the input is massive. I will talk about new results in online data reconstruction and nearest-neighbor searching in this context.

Time permitting, I will briefly touch upon other topics I am interested in such as self-improving algorithms, computational geometry and complexity.

BIO: Nir Alon is a Ph.D. student in the Department of Computer Science at Princeton University, expecting to graduate this summer. He is working under the supervision of Prof. Bernard Chazelle in the theoretical computer science group. He holds a Bachelor’s degree in Computer Science and a Master’s degree in Mathematics, both from Tel-Aviv University, Israel.

Comments

Creating an Annotated Bibliography

Today I learned a new LaTeX trick. If for whatever reason you want a citation to appear in the bibliography of a document without actually citing it in the document, you can use the command \nocite to force BibTeX to include the item in the bibliography.

Now you may ask why on earth one may want to do this. Wouldn’t such not-exactly-cited citations confuse the readers? Hmm, well, you may be right. But anyway, Google has shown me a perfectly reasonable use of it: you can use this command to make annotated bibliography quite easily.

http://www-math.cudenver.edu/~billups/courses/ma5779/annotated_biblio graphy.html

Comments

Theory Lunch 2006-03-22

Speaker: Jure Leskovec

Time: Wednesday 12-1pm

Place: NSH 1507

Title: Realistic models for graphs over time

Abstract:

I will present our work on modeling temporal graph evolution. First, I will motivate the work by showing some surprising measurements on real data, and then we will see the graph models and their properties.

We studied a wide range of real graphs/networks, and we observed some surprising phenomena. First, graphs densify over time, with the number of edges growing super-linearly in the number of nodes. Second, the average distance between nodes shrinks over time, in contrast to the conventional wisdom that distance should increase slowly as a function of the number of nodes (like O(log n) or O(log(log n)).

Next, we will explore models of graph generation that cause a graph to systematically densify, and to experience a decrease in effective diameter even as its size increases. We propose a graph generator that is mathematically tractable and matches this collection of properties. The main idea is to use a non-standard matrix operation, the Kronecker product, to generate graphs that we refer to as “Kronecker graphs'’. We show that Kronecker graphs naturally obey the properties of static and evolving graphs.

I will conclude with a long list of open questions. :)

This is joint work with Jon Kleinberg and Christos Faloutsos.

Comments

Theory Lunch 2006-03-08

Speaker: Mohammad Taghi Hajiaghayi

Time: Wednesday 12-1pm

Place: NSH 1507

Title: Plane embeddings of planar graph metrics

Abstract:

Embedding metrics into constant-dimensional geometric spaces, such as the Euclidean plane, is relatively poorly understood. Motivated by applications in visualization, ad-hoc networks, and molecular reconstruction, we consider the natural problem of embedding shortest-path metrics of unweighted planar graphs (planar graph metrics) into the Euclidean plane. It is known that, in the special case of shortestpath metrics of trees, embedding into the plane requires $\Omega(pn)$ distortion in the worst case, and surprisingly, this worst-case upper bound provides the best known approximation algorithm for minimizing distortion. We answer an open question posed in this work by proving that some planar graph metrics require $\Omega(n^{2/3})$ distortion in any embedding into the plane, proving the first separation between these two types of graph metrics. We also prove that some planar graph metrics require $\Omega(n)$ distortion in any crossing-free straight-line embedding into the plane, suggesting a separation between low-distortion plane embedding and the well-studied notion of crossing-free straight-line planar drawings. Finally, on the upper-bound side, we prove that all outerplanar graph metrics can be embedded into the plane with $O(pn)$ distortion, generalizing the previous results on trees (both the worst-case bound and the approximation algorithm) and building techniques for handling cycles in plane embeddings of graph metrics.

Comments

Theory Seminar 2006-03-10

Approximating Surfaces with Meshes

Ken Clarkson
Lucent Bell Labs

Friday March 10, 3:30PM
Wean 7220

How hard is it to approximate a smooth surface M with a piecewise-linear mesh? When M is the boundary of a convex body, remarkably tight bounds are known for the smallest Hausdorff distance possible when using a mesh with n simplices. In the case of more general surfaces, much less is understood. I’ll show that the smallest distance, when M is a d-manifold, is O(S/n)^{2/d}, where S is the integral over M of the square root of the Gaussian curvature. (The constant factor here depends only on the dimension.) Also, under some reasonably general conditions on the surface and the mesh, this expression is also a lower bound, up to a constant factor. The upper bound construction distributes the vertices of the mesh in an “epsilon-net”, in a metric based on directional curvature. The lower bound relates the volume of a simplex to its interpolation error.

Comments

Ipe and the Mouse Wheel

These days I use a ThinkPad X41 Tablet primarily and I have not been using a mouse for years. So it was a pleasant surprise when I used Ipe on a desktop one day and notice that the scroll wheel on the mouse actually does something quite useful: it zooms the canvas.

Now the TrackPoint system on ThinkPads actually supports scrolling as well, but it does not work by default in many applications because of some technical issues that you don’t want to know. In any case, to configure your ThinkPad so that its middle scroll button works, insert this line to your Windows\System32\tp4table.dat and then reboot.

*,*,ipe.exe,*,*,*,WheelStd,0,9

Update: Actually all Qt applications don’t have scrolling in my ThinkPad. To add scrolling to all Qt Widgets, add this line instead:

*,*,*,*,*,QWidget,WheelStd,0,9

Comments

Compiling Ipe 6.0pre26 on Windows

Recently I have looked into compiling Ipe on Windows. It’s really quite easy but here is a post on it for the public service anyway. The paths I use below are just examples.

  1. Download the source archive (currently version 6.0 preview 26) and expand it to C:\Checkouts\ipe. At this point there should be an ipe-6.0pre26 subdirectory C:\Checkouts\ipe\ipe-6.0pre26\.
  2. Duplicate this subdirectory into C:\Checkouts\ipe\current\ so that you have two parallel copies of the source tree. They are currently identical but we will change the one in this “current” directory. Below, we will call C:\Checkouts\ipe\current\ “the Ipe directory”. We want to keep ipe-6.0pre26 intact.
  3. While you are at the Ipe website, download the binary distribution too. You will need some files from there later.
  4. Download and install a MinGW distribution that has GCC 3.4.2. I use Dev-C++ since I have it already installed (and I can recommend that), but you can also get the vanilla MinGW instead. The installation must be done prior to the next step.
  5. Download Qt 4.1.1 Open Source edition and install it. During the installation, identify C:\Dev-Cpp\ as the MinGW directory to the installer.
  6. Download zlib 1.2.3 and freetype 2.1.0 from GnuWin32 and install them to C:\GnuWin32\.
  7. Modify Ipe\src\config.pri to reflect the paths below. Following our example, this is how the last section of config.pri should look like:
    win32 {
    DEFINES += WIN32
    ZLIB_INCLUDE = C:/GnuWin32/include
    ZLIB_LIBS = -LC:/GnuWin32/lib
    FREETYPE_INCLUDE = C:/GnuWin32/include/freetype2
    FREETYPE_INCLUDE += C:/GnuWin32/include
    FREETYPE_LIBS = -LC:/GnuWin32/lib
    }
  8. Turns out the current source distribution left out two small files accidentally. Get them here and put them in Ipe\src\ipe\.
  9. If you have Cygwin installed, find a way get rid of sh.exe on your path, say by renaming it. The goal is to make sure that the Cygwin shell does not get executed when you type sh in the command prompt. This is important. (Or if you know a way to force the make in MinGW32 to prefer cmd.exe to sh.exe, I would love to know.)
  10. Change directory to Ipe\src\.
  11. Execute qmake. This step takes very little time.
  12. Execute make. This can take a couple minutes.
  13. After the compilation has finished, you should have a subdirectory called Ipe\build\.
  14. From the binary distribution, copy the data\ subdirectory to Ipe\build\data\.
  15. For good measure, you should also copy the DLLs that are depended upon by your current build in Ipe\build\bin\. They are freetype6.dll, mingwm10.dll, QtCore4.dll, QtGui4.dll, zlib1.dll.
  16. This is part of the directory structure and I think it may help clarifying what happened above.

    C:\Checkouts\ipe\
    C:\Checkouts\ipe\ipe-6.0pre26\ (intact copy)
    C:\Checkouts\ipe\current\ (our working copy)
    C:\Checkouts\ipe\current\src\config.pri (edit this first)
    C:\Checkouts\ipe\current\src\ipe\ (then add two missing files)
    C:\Checkouts\ipe\current\build\ (made by the build process)
    C:\Checkouts\ipe\current\build\data\ (copied from binary distribution)
  17. At this point, you should be able to run Ipe\build\bin\ipe.exe.

You are done!

Comments

Theory Seminar 2006-03-03

Seth Pettie
Friday, March 3rd
3:30pm
WEH 7220

Low-distortion Graph Spanners

A spanner H of an undirected, unweighted graph G is a subgraph such that the distance between two points w.r.t. H is not too far from their distance in G, where “not too far” is captured by a distortion function f. Specifically, H is an f-spanner if dist_H(u,v) is at most f(dist_G(u,v)). Nearly all work on spanners and related structures considered only multiplicative distortion, and, in general, provided an undesirable tradeoff between the size of H and the coefficient of distortion.

The “holy grail” question in this area is whether there exist arbitrarily sparse spanners whose distortion is additive and constant. It was known that any graph has an additive 2-spanner with O(n^{3/2}) edges. In this talk I’ll present a completely new method for constructing spanners based on an economics-inspired view of the problem. The main result is an additive 6-spanner with O(n^{4/3}) edges. In addition, the technique leads to a spectrum of arbitrarily sparse spanners whose distortion is additive and sublinear in the distance being approximated.

Comments

End-of-Line Conversion in Subversion

One nice feature of Subversion is its automatic end-of-line conversion capability. Regardless of what’s in the repository, you can ensure that files in your local working copy use the native EOL style.

That is, if you have it set up properly. :P

The trick is to use the properties feature of Subversion. For each file, Subversion allows us to attach a list of properties (key := value). Some properties have special meanings and svn:eol-style is the key that we are interested in today.

When the value of its key is set to native, Subversion will ensure that the file is in the native EOL style every time you update the file in your working copy. And when you check in the file, the conversion will happen automatically so that there are no accidents.

But isn’t it too tedious to have to setup this property for every file that you want automatic EOL conversion to apply? Well, this is when the Subversion local config file comes in.

The config file that I ask my fellows to use contains an auto-property section. Basically, Subversion allows us to automatically attach a property to files of any extension. For instance, to have all my TeX files to use native EOL:

*.tex = svn:eol-style=native

Lovely.

P.S. The auto properties are added when you add a file to version control. So, just replacing the config file will not affect the files that are already under version control. In that case, assuming you have the command line client, use

svn propset -R svn:eol-style native *.tex

to set the property on all TeX files (or other appropriate file types) in the current directory and its descendants.

Comments