George Mason University

Department of Computer Science

CS 674: Data Mining on Multimedia Data

Spring 2017

Professor Jessica Lin



 News & Announcements
1/19: Welcome to class!
1/31: If you registered late, and you haven't received an invitation to join the class Piazza account, please send me an email. The password for the website will be sent out via Piazza.
2/10: HW1 posted. Due date 2/22. The datasets will be posted later this weekend.
3/1: HW2 posted. Due date 3/15 at 11:59pm.
3/1: Please note the new midterm date (3/29).
3/5: Project guidelines posted. Proposal due 3/19. Final project due 5/10.
3/29: HW3 posted. Due date 4/12.
4/14: HW4 posted. Due date 4/26.


Course Description

This course covers advanced techniques for managing, searching, and mining of various types of data such as text, web, images, time series, video, and audio. Issues related to handling such data will be discussed, including feature selection, high dimensional indexing, interactive search and information retrieval, pattern discovery, and scalability.

Class Time and Location

Wednesday, 4:30-7:10 pm
Innovation Hall 206

Instructor

Dr. Jessica Lin
Office: Engineering Building 4419
Phone: 703-993-4693
Email: jessica [AT] cs [DOT] gmu [DOT] edu
Office Hours: Wednesday & Thursday 3-4pm

Prerequisites

A grade of B- or higher in CS 584 or an equivalent data mining course, or permission of the instructor. Some programming skills are required.

Grading

Assignments/in-class participation: 30%
Project: 25%
Midterm: 25%
Take-home final: 20%

Exams

There will be one midterm exam and one take-home final exam covering lectures and readings. The midterm exam must be taken at the scheduled time and place unless prior arrangements have been made with the instructor. Missed exams cannot be made up.

Textbooks

  1. (Required) You will be given reading materials during the class.
  2. (Recommended) Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2nd Edition, Morgan Kaufmann Publishers, March 2006. ISBN 1-55860-901-6.

Honor Code Statement

The GMU Honor Code is in effect at all times. In addition, the CS Department has further honor code policies regarding programming projects, which are detailed here. Any deviation from the GMU or the CS department Honor Code is considered an Honor Code violation. All assignments for this class are individual unless otherwise specified.

Learning Disability Accommodation

If you have a documented learning disability or other condition that may affect academic performance, make sure this documentation is on file with the Office of Disability Services, and then discuss accommodations with the professor.


Tentative Schedule

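| Week | Date | Topic | Slide | Reading | Due | Note |
|------|------|-------|-------|---------|-----|------|
| 1 | 1/25 | Introduction; Time Series Mining | Introduction; Time Series 1 | 1, 2 | | |
| 2 | 2/1 | Time Series Mining | Time Series 2 | 3, 4 | | |
| 3 | 2/8 | Time Series Mining | Time Series 3 | | | |
| 4 | 2/15 | Time Series Mining | Time Series 4 | 5, 6 | | |
| 5 | 2/22 | Time Series Mining | Time Series 5 | 7, 8 | HW1 | |
| 6 | 3/1 | Text Mining | Text 1 | 9 (Ch. 1, 4, 5) | | |
| 7 | 3/8 | Text Mining | Text 2 | 10 (Ch. 13, 14) | | |
| 8 | 3/15 | Spring Break | | | HW2 (due 3/15); Project proposal (due 3/19) | |
| 9 | 3/22 | Web Mining | Web | | | |
| 10 | 3/29 | Midterm | | | | |
| 11 | 4/5 | Sentiment Analysis | | | | |
| 12 | 4/12 | Social Media Mining | | | HW3 | |
| 13 | 4/19 | Social Media Mining | | | | |
| 14 | 4/26 | Image Mining | | | HW4 | |
| 15 | 5/3 | Project presentation | | | | |
| 16 | 5/10 | Project presentation | | | Final project due | |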



Reading

  1. Ratanamahatana, C. A., Lin, J., Gunopulos, D., Keogh, E., Vlachos, M. and Das, G. 2010. Mining time series data. Data Mining and Knowledge Discovery Handbook 2010, 2nd Edition. Eds. Oded Maimon, Lior Rokach. Springer. Pages 1049-1077.

  2. Ratanamahatana, C. A. and Keogh, E. (2004). Everything you know about Dynamic Time Warping is Wrong. Third Workshop on Mining Temporal and Sequential Data, in conjunction with the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), August 22-25, 2004 - Seattle, WA.

  3. Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen, Gustavo Batista, Brandon Westover, Qiang Zhu, Jesin Zakaria, Eamonn Keogh (2012). Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping. SIGKDD 2012.

  4. Jessica Lin, Eamonn Keogh, Li Wei, and Stefano Lonardi. 2007. Experiencing SAX: a novel symbolic representation of time series. Data Mining and Knowledge Discovery, 15(2): 107-144.

  5. Thanawin Rakthanmanon and Eamonn Keogh. Fast-Shapelets: A Scalable Algorithm for Discovering Time Series Shapelets. SDM 2013.

  6. Anthony Bagnall, Aaron Bostrom, James Large, and Jason Lines. 2016. The Great Time Series Classification Bake Off: An Experimental Evaluation of Recently Proposed Algorithms. Extended Version. The Computing Research Repository (CoRR).

  7. Abdullah Mueen, Eamonn Keogh, Qiang Zhu, Sydney Cash, Brandon Westover (2009). Exact Discovery of Time Series Motifs. In the Proceedings of SIAM International Conference on Data Mining, pp. 473-484, SDM 2009.

  8. Eamonn Keogh, Jessica Lin, Sang-Hee Lee, and Helga Van Herle. 2006. Finding the most unusual time series subsequence: algorithms and applications. Knowledge and Information Systems, 11(1): 1-27.

  9. Aggarwal, Charu C., Zhai, ChengXiang (Eds.) Mining Text Data (can be downloaded for free on GMU campus). Springer.

  10. Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press. 2009.