[Lowerbounds, Upperbounds]

Algorithms are everywhere.

WHEN: 3:30pm Tue., Dec. 13, 2005

WHERE: Wean 5409, Carnegie Mellon University

WHO: Dr. C. Lee Giles, The Pennsylvania State University, University Park, PA

TITLE: Next Generation CiteSeer

ABSTRACT: CiteSeer, a public online computer and information science search engine and digital library, was a radical departure from the traditional methods of academic and scientific document access and analysis. CiteSeer, now hosted at Penn State, has over 700,000 documents and has become one of the most popular academic document search engines in science. The current CiteSeer model, with some difficulty, is also portable and was recently extended to academic business documents (SMEALSearch). CiteSeer is based on these features: actively acquiring new documents, automatic citation indexing, and automatic linking of citations and documents. The new Google Scholar does similar citation indexing and linking. Why has CiteSeer been so popular and how should it progress? We discuss this and the Next Generation CiteSeer project, which will emphasize CiteSeer as a research tool, research service and researcher facilitator. It will explore new intelligent algorithms for providing improved and new indexes, enhanced document access, expanded and automatic document gathering, collaboratories, new data and metadata resources, active mirroring, and web services. As example, we discuss our new work on automatic acknowledgement indexing, which provides insight into the impact of acknowledged individuals and funding agencies.

URL: http://clgiles.ist.psu.edu/

BIO: Since 2000 Dr. C. Lee Giles has been the David Reese Professor at the School of Information Sciences and Technology. He is also Professor of Computer Science and Engineering, Professor of Supply Chain and Information Systems, Director of the Intelligent Systems Research Laboratory, and Associate Director of Research at the eBusiness Research Center at the Pennsylvania State University, University Park, PA. He has been associated with Princeton University, the University of Pennsylvania, Columbia University, the University of Pisa and the University of Maryland.

FACULTY HOST: William Cohen

Title: Extreme Value Theory and the Max k-Armed Bandit Problem

Speaker: Matt Streeter, CSD

Location: NSH 1507
Time: 12-1 pm
Date: Friday, December 9th, 2005

Abstract:

The National Climatic Data Center has recorded the monthly precipitation in Pittsburgh since 1895. Given this data, what can we say about the probability that the precipitation this January will exceed some high threshold, say 10 inches? (The current record of 8.2 inches was set in 1937.) Is the question even well-posed?

Extreme value theory attempts to make meaningful statements about the probability of as-yet-unprecedented events. A surprising result is that, under fairly mild statistical assumptions, the probability is actually well-defined. Specifically, under certain regularity conditions, the maximum of n independent and identically distributed random variables must asymptotically follow one of three “generalized extreme value” (GEV) distributions. Estimating the parameters of the GEV distribution allows one to draw inferences about its upper tail, including portions of the tail for which there is no observed data.

In the first half of this talk I will give a self-contained introduction to extreme value theory, including a sketch of the proof of its main theorem. I will then present the “max k-armed bandit problem”, a previously-studied problem that is motivated by extreme value theory and has applications to combinatorial optimization. The problem asks: given k “payoff” distributions, each a GEV distribution with unknown parameters, how can we sample the distributions so as to maximize the expected maximum payoff obtained after n samples? I will outline an asymptotically optimal algorithm for the problem. Specifically, I will describe a strategy that obtains expected maximum payoff within o(1) of that obtained by a strategy
that plays optimally knowing the true parameters of each distribution.

Joint work with Steve Smith.