[Lowerbounds, Upperbounds]

Algorithms are everywhere.

May 15, 2007
Maria-Florina Balcan
10:00 AM, 4623 Wean Hall

Thesis Proposal

Title: New Theoretical Frameworks for Machine Learning

Abstract:

This thesis develops and analyzes new theoretical frameworks for emerging paradigms of Machine Learning including Semi-supervised, Active, and Similarity-based Learning. These are areas of significant practical importance and activity in Machine Learning, yet standard Learning Theory frameworks such as PAC or Statistical Learning Theory models do not capture very well the key issues involved. In addition to developing new models and algorithms for these directions, this thesis also presents new applications of techniques from Machine Learning Theory to emerging areas of Computer Science at large, such as Auction and Mechanism Design.

In Machine Learning, there has been growing interest in using unlabeled data together with labeled data due to the availability of large amounts of unlabeled data in many applications, and a number of algorithmic approaches to this have been developed. However, the underlying assumptions of these methods are often quite distinct and not captured by standard theoretical models. This thesis introduces an extension of the PAC model to fit Semi-supervised Learning that can be used to model and analyze most of the different approaches taken so far. This includes issues such as “Under what conditions will unlabeled data help and by how much?” and “How much data should I expect to need in order to perform well?”. In the context of Kernel methods, this thesis shows how random projection techniques can be used to convert a given kernel function into an explicit, distribution dependent set of features, which can then be fed into more general (not necessarily kernelizable) learning algorithms. In addition, this work shows how such methods can be extended to more general pairwise similarity functions. This provides a formal theory that matches the standard intuition that a good kernel function is one that acts as a good measure of similarity. Another direction in the thesis involves developing algorithms for Active Learning. In particular, this dissertation includes the first Active Learning algorithm that works in the presence of arbitrary forms of noise, as well as margin based Active Learning algorithms.

This dissertation also develops new connections between Machine Learning and Mechanism Design. Specifically, this thesis presents the first general framework in which machine learning methods can be used for reducing mechanism design problems to standard algorithmic questions for a wide range of revenue maximization problems in an unlimited supply setting.

Thesis Committee:
Avrim Blum, Chair
Manuel Blum
Tom Mitchell
Yishay Mansour, Tel Aviv University
Santosh Vempala, Georgia Tech