Digital Humanities Data Mining with Weka (Resources Page)
| Workshop Information |
| Instructor: Huzefa Rangwala
Time/Date: 9:30am-11:00am (June 15, 2012)
Room: Nguyen Engineering 1109
Link to the workshop: http://chnm2012.thatcamp.org/
|
| Summary of Workshop |
| WEKA is a powerful platform that lets users apply data mining algorithms quickly. We will start with a gentle introduction to data mining, defining the main data mining tasks and their application to the rich datasets available in the digital humanities. We will then proceed with a hands-on tutorial on how to use WEKA to build interesting predictive or exploratory models.
|
| Software and Dataset Repositories |
| UCI Machine Learning Repository is a repository of databases, domain theories and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.
|
| WEKA is an open source Java package that implements several data mining algorithms. It includes a GUI which allows for automation of several data mining tasks. |
| MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
|
| CLUTO is a software package for clustering low- and high-dimensional datasets and for analyzing the characteristics of the various clusters. CLUTO is well-suited for clustering data sets arising in many diverse application areas including information retrieval, customer purchasing transactions, web, GIS, science, and biology. |
| YALE (Yet Another Learning Environment) is another open source Java package. It includes a GUI which allows automation of the whole data process, from feature normalization to feature selection, learning, and cross-validation.
|
| SVMlight and LibSVM are two popular implementations of various SVM algorithms.
|
| TMG is a MATLAB toolbox that can be used for various tasks in text mining.
|
| Rapid-I is a free commercial package that integrates with WEKA and Hadoop. |
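For readers preparing their own data for the WEKA package listed above, the following is a minimal sketch of writing a small table in ARFF, the plain-text format that WEKA reads natively. The helper name `to_arff`, the relation name, and the toy attributes are hypothetical, chosen only for illustration; the sketch also assumes that names and values contain no spaces, commas, or quotes, since those would need quoting in a real ARFF file.

```python
def to_arff(relation, attributes, rows):
    """Render a small table as ARFF text.

    attributes: list of (name, kind) pairs, where kind is either the
    string "numeric" or a list of allowed nominal values.
    rows: iterable of tuples, one value per attribute, in the same order.
    Assumption: no value needs quoting (no spaces, commas, or quotes).
    """
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        if isinstance(kind, str):
            # Simple type keyword such as "numeric"
            lines.append("@attribute " + name + " " + kind)
        else:
            # Nominal attribute: list the allowed values in braces
            lines.append("@attribute " + name + " {" + ",".join(kind) + "}")
    lines.append("")
    lines.append("@data")
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical toy dataset: two made-up features per text and a genre label
    attributes = [
        ("word_count", "numeric"),
        ("avg_sentence_length", "numeric"),
        ("genre", ["poetry", "prose"]),
    ]
    rows = [(120, 8.5, "poetry"), (950, 21.0, "prose")]
    print(to_arff("texts", attributes, rows), end="")
```

Saving the printed text to a file with an `.arff` extension should give something that can be opened from the WEKA GUI, with the last attribute (`genre` here) serving as the class to predict.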