EncycMine1: NAME YOUR DATABASE
Select documents for Corpus One and Corpus Two by:
  Author
  English Class (architecture or medicine)
  Normal Class
  Original Class
  Type
  HeadWord

Minimum Word Count per document:

Text Mining Function

  1. Differential Relative Rates (DRR)
  2. Classifier: Multinomial Naive Bayesian (MNB)
  3. Classifier: Weka Naive Bayes
  4. Classifier: Decision Tree (DT). Option: generate graphic tree.
  5. Predict: Multinomial Naive Bayesian (MNB). Train on Corpus One, predict on Corpus Two.
  6. Vector Space (experimental). Option: normalize vectors (LogLinear).
  7. Classifier: SVMLight. C parameter:
  8. Classifier: SMO (Weka)
  9. Weka Information Gain
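Functions 2 and 5 both rest on multinomial naive Bayes. As a rough illustration only, here is a textbook MNB with add-alpha (Laplace) smoothing in Python. The function names, the smoothing constant `alpha`, and the toy tokens are assumptions for illustration, not EncycMine's actual implementation. For function 5, one would train on the Corpus One documents and call the predictor on each Corpus Two document.

```python
import math
from collections import Counter


def train_mnb(docs, labels, alpha=1.0):
    """Train a multinomial naive Bayes model (hypothetical helper).

    docs: list of token lists; labels: parallel list of class names.
    alpha: add-alpha smoothing constant (an assumption; EncycMine's
    actual smoothing is not documented on this page).
    """
    vocab = {tok for doc in docs for tok in doc}
    classes = sorted(set(labels))
    doc_counts = Counter(labels)
    token_counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    # Log prior: share of training documents in each class.
    log_prior = {c: math.log(doc_counts[c] / len(docs)) for c in classes}
    # Smoothed log likelihood of each vocabulary token given the class.
    log_likelihood = {}
    for c in classes:
        denom = sum(token_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            tok: math.log((token_counts[c][tok] + alpha) / denom) for tok in vocab
        }
    return classes, log_prior, log_likelihood


def predict_mnb(model, doc):
    """Return the most probable class for a token list; unseen tokens are ignored."""
    classes, log_prior, log_likelihood = model
    scores = {}
    for c in classes:
        score = log_prior[c]
        for tok in doc:
            if tok in log_likelihood[c]:
                score += log_likelihood[c][tok]
        scores[c] = score
    return max(scores, key=scores.get)


# Toy usage: train on "Corpus One"-style labeled documents, predict a new one.
model = train_mnb(
    [["colonne", "voute", "mur"], ["fievre", "pouls", "saignee"]],
    ["architecture", "medicine"],
)
print(predict_mnb(model, ["voute", "colonne"]))  # architecture
```

Working in log space avoids underflow when many per-token probabilities are multiplied over long articles.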

Run name:
Top Feature Display Limit:


Options and Configurations

  1. Balance instances (applied to all tests).
    If the two corpora contain different numbers of documents, a random sample is drawn from the larger corpus. Important: reloading a results page produced with this option draws a new random test set, so the results will change.
  2. Limit features to those occurring in fewer than ___ documents and more than ___ of the documents.
  3. Stem input words
  4. Restrict document feature frequency (lower bound, upper bound); applies to MNB and DT.
    MNB seems to perform better on relatively infrequent words (perhaps 2-20). DT splits on ranges, so a range of about 2-100 may work best. Set the bounds to 1 and 5000 to use all but the most frequent words.
  5. Random falsification (randomize instances as a test of the classification model).
  6. No selection, or select from the bibliography.
    This would allow the user to select or deselect entries from the two subcorpora chosen in the form above (not implemented).
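Options 1 and 2 can be approximated as follows. This Python sketch assumes each corpus is a list of tokenized documents and reads option 2 as a document-frequency filter with strict bounds ("fewer than" / "more than"). The function names and the `seed` parameter are hypothetical, not part of EncycMine.

```python
import random
from collections import Counter


def balance_corpora(corpus_one, corpus_two, seed=None):
    """Randomly downsample the larger corpus to the size of the smaller one.

    Without a fixed seed, each call draws a different sample, which is why
    reloading a balanced result can give different numbers.
    """
    rng = random.Random(seed)
    n = min(len(corpus_one), len(corpus_two))
    if len(corpus_one) > n:
        corpus_one = rng.sample(corpus_one, n)
    if len(corpus_two) > n:
        corpus_two = rng.sample(corpus_two, n)
    return corpus_one, corpus_two


def filter_by_document_frequency(docs, more_than, less_than):
    """Keep tokens found in more than `more_than` and fewer than `less_than` documents.

    Strict bounds are an assumption based on the option's wording.
    """
    # Count each token at most once per document (document frequency).
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok for tok, n in df.items() if more_than < n < less_than}


docs = [["a", "b"], ["a", "c"], ["a", "b", "d"]]
print(filter_by_document_frequency(docs, more_than=1, less_than=3))  # {'b'}
```

Downsampling (rather than duplicating documents from the smaller corpus) keeps every test instance a real document, at the cost of discarding data from the larger corpus.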

Input features to use (space-separated; full regular expressions allowed):

Input features to EXCLUDE from the other criteria (space-separated; full regular expressions allowed). E.g.: d j v s c t i p f archit architecture architect
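The include and exclude fields can be pictured as a regular-expression filter over the vocabulary. This Python sketch assumes each pattern must match the whole feature (`re.fullmatch`), which would explain why the example lists `archit`, `architecture` and `architect` separately; that matching rule and the `select_features` name are assumptions, not documented EncycMine behavior.

```python
import re


def select_features(vocabulary, include_patterns="", exclude_patterns=""):
    """Filter a vocabulary with space-separated regular expressions (hypothetical helper).

    Assumption: a pattern must match the entire feature, so "archit" excludes
    only "archit", not "architecture". An empty include string keeps everything.
    """
    includes = [re.compile(p) for p in include_patterns.split()]
    excludes = [re.compile(p) for p in exclude_patterns.split()]
    selected = []
    for feature in vocabulary:
        # If include patterns were given, the feature must match at least one.
        if includes and not any(rx.fullmatch(feature) for rx in includes):
            continue
        # Any matching exclude pattern removes the feature.
        if any(rx.fullmatch(feature) for rx in excludes):
            continue
        selected.append(feature)
    return selected


vocab = ["d", "archit", "architecture", "architectes", "colonne", "voute"]
print(select_features(vocab, exclude_patterns="d j v s c t i p f archit architecture architect"))
# ['architectes', 'colonne', 'voute']
```

Under this whole-match assumption, a pattern such as `archit.*` would be needed to drop every form beginning with "archit" in one entry.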


Powered by PhiloLogic