PhiloMine (release 0.1a)
Select Bibliography
ARTFL/DLDC
University of Chicago
This list is incomplete; more entries will be added.
General Data Mining
Collection of data mining tutorials by Andrew Moore
http://www.autonlab.org/tutorials/
"An Introduction to Data Mining" by Kurt Thearling, Ph.D.
http://www.thearling.com/dmintro/dmintro_2.htm
Classifiers (General)
Classifier Showdown (SVM, Bayes, perceptron neural networks)
http://blog.peltarion.com/2006/07/10/classifier-showdown/
Bei Yu, "An Evaluation of Text-Classification Methods for Literary Study."
(PDF, 800 KB) Submitted January 2007 for a Ph.D. in Library and Information Science at the University of Illinois.
Fabrizio Sebastiani, "Machine learning in automated text categorization," ACM Computing Surveys, Volume 34, Issue 1 (March 2002).
Naive Bayes
Wikipedia article on Naive Bayes
http://en.wikipedia.org/wiki/Naive_Bayes
http://blog.peltarion.com/2006/07/10/classifier-showdown/
Support Vector Machines
Wikipedia article on Support Vector Machines
http://en.wikipedia.org/wiki/Support_vector_machine
Script to extract SVMLight feature weights from generated models
http://www.cs.cornell.edu/People/tj/svm%5Flight/svm2weight.pl.txt
Using two-class classifiers for multiclass classification
http://ict.ewi.tudelft.nl/~davidt/papers/icpr_02_mclass.pdf
Multiclass SVM discussion
http://blue.utb.edu/hlei/Research/papers/Half_HalfSVM.pdf
Discussion of Feature Selection in SVM by Gabrilovich and Markovitch
http://www.cs.technion.ac.il/~gabr/papers/fs-svm.pdf
"Random Falsifiability and Support Vector Machines", by Ruiz and Lopez-de-Teruel
http://learn98.tsc.uc3m.es/~learn98/papers/abstracts/paper013/abstract.html
Vector Space
Building a Vector Space Search Engine in Perl
http://www.perl.com/lpt/a/713
WEKA
Description of the ARFF data format
http://weka.sourceforge.net/wekadoc/index.php/en:ARFF_%283.4.6%29
The WEKA Primer
http://weka.sourceforge.net/wekadoc/index.php/en:Primer
WEKA Troubleshooting
http://weka.sourceforge.net/wekadoc/index.php/en:Troubleshooting#Spaces_in_labels_of_ARFF_files