CS/INFS 795

Special Topics in Data Mining Applications: 

Time Series Data Mining

Dr. Jessica Lin

Spring 2011


 
 

 News & Announcements
For the week of 1/31 only, my office hours will be changed to Wednesday (2/2) 3-4pm, and Thursday (2/3) 4-5pm.

Instructor:

Dr. Jessica Lin 

Office: Engineering Building 4419

Phone: 703-993-4693

Email: jessica [AT] cs [DOT] gmu [DOT] edu

Office Hours: Wednesday 2-4pm

Classes

Thursdays
7:20-10:00pm
Innovation Hall 132

Prerequisite:

CS 750 or equivalent. Some programming skills required for the final project.

Textbook (optional):

Data Mining: Concepts and Techniques, 2nd Edition, Morgan Kaufmann Publishers, March 2006. ISBN 1-55860-901-6.

Course Description:

Time series, or measurements taken over time in the traditional sense, are perhaps the most commonly encountered data type, touching almost every human endeavor, including medicine, finance, aerospace, industry, and science. While time series data present special challenges to researchers due to their unique characteristics, the past decade has seen an explosion of research in time series data mining. This seminar provides an overview of state-of-the-art research on mining time series data. Topics covered include data representation, similarity search, indexing, clustering, classification, anomaly detection, rule discovery, motif discovery, and visualization. Sequential pattern discovery on discrete temporal data (web logs, customer transactions, etc.) and mining of streaming time series will also be discussed.

Course Format:

The course will include lectures by the instructor, presentations from students, and class discussion. You will be asked to read research papers published in major conferences and/or journals (paper list TBA).

Grading

Grading will be based on participation, assignments, presentation(s), and a final project. You will be using MATLAB in this class. Each week you are required to read two papers. Each student will present 1-2 papers during the semester.

 Participation/Attendance/Quizzes: 15%
 Assignments: 20%
 Presentation: 25%
 Project Proposal: 15%
 Project: 25%

Schedule

Assigned papers should be read prior to the class meeting (e.g. read paper #1 for the 2/3 class).

Week  Date  Topic                                      Papers                Presenter(s)
1     1/27  No Class
2     2/3   Time Series Similarity Search/Indexing I   1
3     2/10  Time Series Similarity Search/Indexing II  2, 3
4     2/17  Symbolic Representation                    4, 5
5     2/24  Classification                             6, 7                  Sheri (6)
6     3/3   Clustering                                 8, 9                  Paul (8), Muzammil (9)
7     3/10  Subsequence Clustering / Rule Discovery    10, 11                Philip (10), Rohan (11)
8     3/17  Spring Break
9     3/24  Motif Discovery (Project Proposal due)     12, 13                Jin-Ming (12), Sheri (13)
10    3/31  Anomaly Detection                          14, 15                Raghu (14), Paul (15)
11    4/7   Visualization                              16, 17                Stefan (17)
12    4/14  Social Media Analysis                      18, 19, 20, 21*, 22*  Carl (18), Muzammil (19), Carl (20)
13    4/21  Trajectory                                 23, 24                Stefan (23), Philip (24)
14    4/28  Spatiotemporal                             25, 26                Michael (25), Philip (26)
15    5/5   TBA
16    5/10  Project Presentations 1
16    5/12  Project Presentations 2


Paper List (TBA)

 

1. Chotirat Ann Ratanamahatana, Jessica Lin, Dimitrios Gunopulos, Eamonn J. Keogh, Michail Vlachos, Gautam Das: Mining Time Series Data. Data Mining and Knowledge Discovery Handbook 2010: 1049-1077. 

2. (The first paper on time series mining) Rakesh Agrawal, Christos Faloutsos, and Arun N. Swami. 1993. Efficient Similarity Search In Sequence Databases. In Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms (FODO '93), David B. Lomet (Ed.). Springer-Verlag, London, UK, 69-84

3. Hui Ding, Goce Trajcevski, Peter Scheuermann, Xiaoyue Wang, and Eamonn Keogh. 2008. Querying and mining of time series data: experimental comparison of representations and distance measures. Proc. VLDB Endow. 1, 2 (August 2008), 1542-1552. 

4. Jessica Lin, Eamonn J. Keogh, Li Wei, Stefano Lonardi: Experiencing SAX: a novel symbolic representation of time series. Data Min. Knowl. Discov. 15(2): 107-144 (2007)

5. Alessandro Camerra, Themis Palpanas, Jin Shieh, Eamonn Keogh: iSAX 2.0: Indexing and Mining One Billion Time Series. ICDM 2010: 58-67

6. Milos Radovanovic, Alexandros Nanopoulos, Mirjana Ivanovic: Time-Series Classification in Many Intrinsic Dimensions. SDM 2010: 677-688

7. Li Wei and Eamonn Keogh. 2006. Semi-supervised time series classification. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '06). ACM, New York, NY, USA, 748-753.

8. T. Warren Liao. 2005. Clustering of time series data - a survey. Pattern Recogn. 38, 11 (November 2005), 1857-1874.

9. P.P. Rodrigues, J. Gama, and J.P. Pedroso, “ODAC: Hierarchical Clustering of Time Series Data Streams,” Proc. Sixth SIAM Int'l Conf. Data Mining, pp. 499-503, Apr. 2006.

10. Eamonn J. Keogh, Jessica Lin: Clustering of time-series subsequences is meaningless: implications for previous and future research. Knowl. Inf. Syst. 8(2): 154-177 (2005)

11. Dina Goldin, Ricardo Mardales, and George Nagy. 2006. In search of meaning for time series subsequence clustering: matching algorithms based on a new distance measure. In Proceedings of the 15th ACM international conference on Information and knowledge management (CIKM '06). ACM, New York, NY, USA, 347-356.

12. Nuno Castro, Paulo J. Azevedo: Multiresolution Motif Discovery in Time Series. SDM 2010: 665-676

13. Abdullah Mueen and Eamonn Keogh. 2010. Online discovery and maintenance of time series motifs. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '10)

14. Dan Preston, Pavlos Protopapas, Carla Brodley. Event Discovery in Time Series. SDM 2009.

15. Dragomir Yankov, Eamonn Keogh, and Umaa Rebbapragada. 2008. Disk aware discord discovery: finding unusual time series in terabyte sized datasets. Knowl. Inf. Syst. 17, 2 (November 2008), 241-262.

16. Jessica Lin, Eamonn J. Keogh, Stefano Lonardi, Jeffrey P. Lankford, Donna M. Nystrom: Visually mining and monitoring massive time series. KDD 2004: 460-469

17. Kumar, N., Lolla, N., Keogh, E., Lonardi, S., Ratanamahatana, C. A. and Wei, L. (2005). Time-series Bitmaps: A Practical Visualization Tool for working with Large Time Series Databases. In proceedings of SIAM International Conference on Data Mining (SDM '05), Newport Beach, CA, April 21-23.

18. Daniel Gruhl, R. Guha, Ravi Kumar, Jasmine Novak, and Andrew Tomkins. 2005. The predictive power of online chatter. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining (KDD '05).

19. S. Asur and B. A. Huberman. 2010. Predicting the Future with Social Media. arXiv:1003.5699v1

20. Nilesh Bansal and Nick Koudas. 2007. BlogScope: a system for online analysis of high volume text streams. In Proceedings of the 33rd international conference on Very large data bases (VLDB '07)

21 (optional). Johan Bollen, Huina Mao, and Xiao-Jun Zeng. Twitter mood predicts the stock market. Journal of Computational Science, 2010

22 (optional). M. Platakis, D. Kotsakos, D. Gunopulos. 2008. Discovering Hot Topics in the Blogosphere. In Proc. of the 2nd Panhellenic Scientific Student Conference on Informatics, Related Technologies and Applications EUREKA 2008, pp. 122--132.

23. Jae-Gil Lee, Jiawei Han, and Xiaolei Li. 2008. Trajectory Outlier Detection: A Partition-and-Detect Framework. In Proceedings of the 2008 IEEE 24th International Conference on Data Engineering (ICDE '08). IEEE Computer Society, Washington, DC, USA, 140-149.

24. Anna Monreale, Fabio Pinelli, Roberto Trasarti, Fosca Giannotti: WhereNext: a location predictor on trajectory pattern mining. KDD 2009: 637-646

25. McGovern, Amy; Rosendahl, Derek H; Brown, Rodger A; and Droegemeier, Kelvin K. (2011) Identifying Predictive Multi-Dimensional Time Series Motifs: An application to severe weather prediction. Data Mining and Knowledge Discovery. Volume 22, Issue 1, pages 232-258

26. Shen-Shyang Ho, Wenqing Tang, W. Timothy Liu: Tropical cyclone event sequence similarity search via dimensionality reduction and metric learning. KDD 2010: 135-144