George Mason University

Department of Computer Science

CS 484: Data Mining

Fall 2019

Professor Jessica Lin


Course Description

Basic principles and methods for data analysis and knowledge discovery. Emphasizes developing basic skills for modeling and prediction and performance evaluation. Topics include system design; data quality, preprocessing, and association; event classification; clustering; biometrics; business intelligence; and mining complex types of data.

Class Time and Location

Tuesday/Thursday, 12-1:15pm
Sandbridge Hall 107

Instructor

Dr. Jessica Lin
Office: Engineering Building 4419
Phone: 703-993-4693
Email: jessica [AT] gmu [DOT] edu
Office Hours: Tuesday/Thursday 10:30-11:30am

Teaching Assistant

Li Zhang
Office Hours: TBA
Office: TBA

Prerequisites

Course Outcomes

Grading

Assignments: 40%
Project: 35%
Exam: 25%
Extra credit is possible (participation, quizzes, and competition winners for homework).

Exam

There will be one in-class, closed-book exam covering lectures and readings. The exam must be taken at the scheduled time and place unless prior arrangements have been made with the instructor. A missed exam cannot be made up.

Project

There will be one team project during the semester. The project grade consists of the project proposal (including the project pitch; 5% of the overall course grade), the presentation (10%), and the project report and code (20%).


Textbooks

Required: Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar

Topics

Honor Code Statement

The GMU Honor Code is in effect at all times. In addition, the CS Department has further honor code policies regarding programming projects, which are detailed here. Any deviation from the GMU or CS Department Honor Code is considered an Honor Code violation. All assignments for this class are individual unless otherwise specified.

Learning Disability Accommodation

If you have a documented learning disability or other condition that may affect academic performance, make sure this documentation is on file with the Office of Disability Services, and then discuss accommodations with the professor.

Tentative Schedule

| Week | Date | Topic | Assigned / Due | Note / Reading |
|------|------|-------|----------------|----------------|
| 1 | 8/27, 8/29 | Introduction; HW1 background | | Ch. 1 |
| 2 | 9/3, 9/5 | HW1 background; Data | HW0 | Ch. 4.3 (KNN), 3.6-3.8; Ch. 2 |
| 3 | 9/10, 9/12 | Data; Classification | | |
| 4 | 9/17, 9/19 | Classification (Decision trees); Classification (Decision trees) | HW1 | Ch. 3 |
| 5 | 9/24, 9/26 | Classification (Decision trees); Classification (Overfitting, Model selection) | | |
| 6 | 10/1, 10/3 | Classification (Naive Bayes Classifier); Classification (Neural networks) | | Ch. 4.4; Ch. 4.7 |
| 7 | 10/8, 10/10 | Classification (Model selection, bias/variance); Clustering (k-means) | HW2 (new date) | Ch. 4.10.3, 4.11; Ch. 7.1-7.2 |
| 8 | 10/15, 10/17 | No Class (Fall Break); Clustering (hierarchical clustering) | | Ch. 7.3 |
| 9 | 10/22, 10/24 | Project pitch; Recommendation Systems | Project pitch; HW3 | |
| 10 | 10/29, 10/31 | Clustering (hierarchical clustering); Clustering (DBSCAN) | Project proposal | |
| 11 | 11/5, 11/7 | Clustering (evaluation); Association Rule Mining | HW4 (11/9) | Ch. 7.4; Ch. 5.1-5.3 |
| 12 | 11/12, 11/14 | Review; Exam (tentative) | | |
| 13 | 11/19, 11/21 | Association Rule Mining; Post-exam review | | Ch. 5.7 |
| 14 | 11/26, 11/28 | Anomaly Detection; No Class (Thanksgiving) | | |
| 15 | 12/3, 12/5 | Advanced Topic; TBA (possibly presentations) | | |
| 16 | 12/10, 12/12 | Presentations | Project | |