Fall 2017: Theory and Applications of Data Mining [CS584]

General Description and Preliminary List of Topics

Data mining is the process of automatically discovering useful information in large data repositories. The course covers key concepts and algorithms at the core of data mining.

Topics include: classification, clustering, association analysis, anomaly detection.

Grading

Assignments: 15%
Midterm: 25%
Final: 25%
Project: 35%
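
As an illustration of how these weights combine, here is a minimal Python sketch. It assumes each component is scored as a percentage between 0 and 100; the function name `course_score` and the example scores are hypothetical and do not come from the course materials.

```python
# Weights taken from the grading breakdown above (percent of the course grade).
WEIGHTS = {"assignments": 15, "midterm": 25, "final": 25, "project": 35}


def course_score(scores):
    """Return the weighted course score on a 0-100 scale.

    `scores` maps each graded component to a percentage in [0, 100].
    """
    # Sanity check: the weights must account for the whole grade.
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example = {"assignments": 90, "midterm": 80, "final": 70, "project": 85}
print(course_score(example))
```

Because the project carries the largest weight (35%), a change of ten points in the project score moves the overall score by 3.5 points, compared with 1.5 points for the same change in the assignments.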

Exams are closed book. Assignments must be completed individually; group work is NOT allowed unless otherwise stated by the instructor. Any deviation from this policy will be considered a violation of the GMU Honor Code. In addition, the CS department has its own Honor Code policies; any deviation from them is also considered an Honor Code violation.

Software and Data (will be extended, come back!):

  • UCI Machine Learning Repository is a repository of databases, domain theories and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.
  • UCI Knowledge Discovery in Databases Archive is an online repository of large data sets that encompasses a wide variety of data types, analysis tasks, and application areas.
  • More datasets
  • Resources: software and data
  • Weka is an open-source Java package implementing many learning algorithms.
  • MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
  • SVM light and LibSVM are two popular implementations of various SVM algorithms.