George Mason University
  Department of Computer Science

CS 584 - Theory and Applications of Data Mining

Dr. Jessica Lin

Spring 2016



 News & Announcements
1/18 (edited on 1/27): Welcome to the class! Please make sure that you fulfill the course prerequisites (C or better in CS 310 and STAT 344). Students who do not satisfy the prerequisites will be dropped from the class on 1/28 (Thursday). If you don't have the form, please send me an email as soon as possible.
1/27: HW1 (Part 1) posted. The remaining part will be posted next week. The due date is 2/9.
2/10: HW2 posted. The due date is 2/23.
2/26: HW3 posted. The due date is 3/11 at 11:59pm. Submit on Blackboard ONLY.
3/7: Midterm study guide posted.
3/7: HW1 solutions posted on Blackboard.
3/21: Project guidelines posted. Proposal due 3/29.
3/31: HW4 posted. The due date is 4/12.
4/22: HW5 posted. The due date is 5/3.
5/4: Final study guide posted.

Course Description

Concepts and techniques in data mining and multidisciplinary applications. Topics include databases; data cleaning and transformation; concept description; association and correlation rules; data classification and predictive modeling; performance analysis and scalability; data mining in advanced database systems, including text, audio, and images; and emerging themes and future challenges.

Instructor

Dr. Jessica Lin 

Office: Engineering Building 4419
Phone: 703-993-4693
Email: jessica [AT] cs [DOT] gmu [DOT] edu
Office Hours: Tuesday 2-4pm or by appointment

TA

 Monjura Afrin Rumi
 Email: mrumi [AT] gmu [DOT] edu
 Office Hours: Monday 7:30-8:30pm, Thursday 3:30-4:30pm
 Location: Engineering Building 4456

Classes

Tuesday
4:30-7:10pm
Innovation Hall 206

Prerequisites

Grade of C or better in CS 310 and STAT 344

Grading

Assignments: 20%
Project: 30%

Midterm: 20%
Final: 30%


Exams

There will be a midterm exam covering lectures and readings (in class, closed book). The final exam is comprehensive. Exams must be taken at the scheduled time and place, unless prior arrangement has been made with the instructor. Missed exams cannot be made up.

Honor Code Statement

The GMU Honor Code is in effect at all times. In addition, the CS Department has its own honor code policies for programming projects, as detailed in the department's published policy. Any deviation from the GMU or the CS Department Honor Code is considered an Honor Code violation.

All assignments for this class are individual unless otherwise specified.

Learning Disability Accommodations

If you have a documented learning disability or other condition that may affect academic performance, please make sure this documentation is on file with the Office of Disability Services, and then discuss accommodations with the professor.

Textbooks

  Required: Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar

 Recommended: Data Mining and Analysis by Mohammed Zaki and Wagner Meira Jr. (an online PDF version is available)

Topics
 
Ch.1: Introduction
Ch.2: Data
Ch.4: Classification
Ch.5: Classification: Alternative Techniques
Ch.6: Association Analysis: Basic Concepts and Algorithms
Ch.7: Association Analysis: Advanced Concepts
Ch.8: Cluster Analysis: Basic Concepts and Algorithms
Ch.9: Cluster Analysis: Additional Issues and Algorithms
Ch.10: Anomaly Detection


 Tentative Schedule

No.  Date   Topics                                         Reading           Due       Notes
1    1/19   Introduction                                   Ch. 1
2    1/26   Class cancelled (snow day)                     Ch. 2                       HW1-1 posted
3    2/2    Data                                           Appendices A & B
4    2/9    Data, cont'd                                                     HW1       HW2 posted
5    2/16   Classification 1                               Ch. 4-5
6    2/23   Classification 2                                                 HW2
7    3/1    Classification 3 (updated 3/1)
8    3/8    Spring Break (no class)                                          HW3       Submit on Blackboard (due 3/11 at 11:59pm)
9    3/15   Midterm
10   3/22   Post-midterm review / project team discussion
11   3/29   Classification 4 / Clustering 1                Ch. 8-9           Proposal  HW4 posted
12   4/5    Clustering 2
13   4/12   Association Analysis 1                         Ch. 6-7           HW4
14   4/19   Association Analysis 2                         Ch. 10                      HW5 posted
15   4/26   Anomaly Detection / Project Presentation
16   5/3    Project Presentation                                             HW5
17   5/10   Final Exam (4:30pm-6:30pm)                                       Project