Spring 2009 CS-795: Ensemble Based Systems in Decision Making [INFS-795, IT-803]
-
Prerequisites: INFS-755, or equivalent knowledge.
Some programming experience is expected.
Students should be familiar with
basic probability and statistics concepts, linear algebra, optimization, and multivariate
calculus.
-
General Description and Preliminary List of Topics:
This course is about combining the "opinions" of an ensemble of experts (e.g., classifiers) with the objective of
computing a new emerging "opinion" that is better than the individual ones.
The task of improving classification accuracy by learning ensembles of classifiers is considered
one of the most important directions in machine learning research.
Recent empirical work has shown that combining predictors can lead to significant reductions in
generalization error.
We will discuss popular ensemble
methods such as bagging and boosting (e.g., AdaBoost). We will study the conditions under which combining multiple
experts is beneficial; how to construct the individual components; how to select a subset of "good" experts
from a large pool of possible components; and how to generate a consensus response from those of the individual
members. We will consider ensembles in which the components are supervised learners, unsupervised ones, or
learners with constraints. Challenges in each scenario will be discussed.
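
As a small illustration of when combining experts helps, the sketch below uses a plain majority vote as the consensus rule and computes its error rate in closed form. It assumes an odd number of binary experts that err independently with the same probability; real classifiers rarely satisfy this, so the numbers are only indicative. The function names are hypothetical and are not taken from any course material.

```python
from collections import Counter
from math import comb


def majority_vote(labels):
    """Return the most common label among the experts' predictions.

    Ties go to the label that was seen first.
    """
    return Counter(labels).most_common(1)[0][0]


def majority_vote_error(n_experts, p_error):
    """Error rate of a majority vote over n independent binary experts.

    Assumption: every expert errs independently with probability p_error.
    The vote is wrong exactly when more than half of the experts are wrong.
    """
    if n_experts % 2 == 0:
        raise ValueError("use an odd number of experts to avoid ties")
    k_min = n_experts // 2 + 1  # smallest number of wrong experts that loses the vote
    return sum(
        comb(n_experts, k) * p_error**k * (1 - p_error) ** (n_experts - k)
        for k in range(k_min, n_experts + 1)
    )


if __name__ == "__main__":
    print(majority_vote(["spam", "ham", "spam"]))
    # Compare ensembles of size 1, 5, and 21 for experts that are
    # better than chance, at chance, and worse than chance.
    for p in (0.3, 0.5, 0.6):
        print(p, [round(majority_vote_error(n, p), 4) for n in (1, 5, 21)])
```

Under these assumptions the vote improves on a single expert only when each expert is better than chance (error below 0.5); with experts worse than chance, adding more of them makes the consensus worse.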
-
Course Format:
Material from books and research papers published in major conferences and journals will be
studied in depth. The course will include lectures by the instructor,
presentations by students, and discussions. Students are required to study
the material covered in class. No textbook is required. Research papers and handouts will
be made available.
Grading will be based on homework assignments,
presentations, and a project. Homework assignments will require
some programming.
The actual format of the course will ultimately depend on the number of
students enrolled.
Schedule of Classes
We meet in Innovation Hall, Rm. 136, M 7:20pm - 10:00pm
-
List of papers (UNDER CONSTRUCTION)
Ensembles of Classifiers
-
Clustering Ensembles
-
R. Caruana, M. Elhawary, N. Nguyen, and C. Smith,
Meta Clustering, International Conference on Data Mining, pp. 107-118, 2006.
-
Ensembles and Semi-supervised learning
-
Sources
UNDER CONSTRUCTION
-
UCI Machine Learning Repository is a repository of databases, domain theories and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.
A beta version of a new and improved site is also available.
-
UCI Knowledge Discovery in Databases Archive is an online repository of large data sets which encompasses a wide variety of data types, analysis tasks, and application areas
-
Weka is an open source Java package implementing many learning algorithms
-
YALE (Yet Another Learning Environment) is another open source Java package. It includes a GUI which allows automation of the whole data process from feature normalization to feature selection, learning and cross-validation
-
SVM light and LibSVM are two popular implementations of various SVM algorithms
-
TMG is a Matlab Toolbox that can be used for various tasks in text mining