PhiloMine (release 0.1a)
Future Plans
ARTFL/DLDC University of Chicago
Current work planned includes (but is not limited to):
- development of Latent Semantic Indexing (LSI) and/or Probabilistic LSI
vector space searching;
- adding some predictive functions to the all-document flavor of
PhiloMine, which will probably be done once we have a data set that
would benefit from them;
- adding the stemmer to the base version of PhiloMine and possibly
a simple English stemmer;
- saving processed ARFF (Weka) files and possibly processed
vector space matrices, rather than recalculating them on each run
(the code is mostly done);
- saving SVMLight and Weka SMO models for local, non-WWW processing,
an option we may need in order to do larger runs on more powerful
machines once we have tested a feature set, parameters, etc. interactively.
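The ARFF caching item above could look roughly like the following. This is
only a sketch of the general idea, not PhiloMine's code: the function names,
the cache directory, the parameter dictionary, and the hashing scheme for
file names are all hypothetical, and the ARFF written here is the simple
dense form with numeric attributes and one nominal class attribute.

```python
import hashlib
import json
import os
import tempfile


def cache_key(params):
    """Derive a stable file name from the run parameters (hypothetical scheme)."""
    blob = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha1(blob).hexdigest() + ".arff"


def to_arff(relation, features, rows):
    """Render (counts, label) rows as dense ARFF text."""
    classes = sorted({label for _, label in rows})
    lines = ["@relation " + relation, ""]
    lines += ["@attribute %s numeric" % f for f in features]
    lines.append("@attribute doc_class {%s}" % ",".join(classes))
    lines += ["", "@data"]
    for counts, label in rows:
        values = [str(counts.get(f, 0)) for f in features]
        lines.append(",".join(values + [label]))
    return "\n".join(lines) + "\n"


def load_or_build(cache_dir, params, build):
    """Return (arff_text, from_cache); call build() only on a cache miss."""
    path = os.path.join(cache_dir, cache_key(params))
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            return fh.read(), True
    text = build()
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
    return text, False


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    demo_params = {"features": ["liberte", "roi"], "min_freq": 1}
    demo_rows = [({"liberte": 3}, "A"), ({"roi": 2, "liberte": 1}, "B")]
    text, cached = load_or_build(
        demo_dir, demo_params,
        lambda: to_arff("demo", demo_params["features"], demo_rows))
    print(cached)
    print(text)
```

Keying the file on the full parameter set means that any change to the
feature list or thresholds produces a new file instead of silently reusing
a stale one.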
Longer-term notions include expanding the available kinds of feature
sets, such as bigrams/trigrams, lemmas, and parts of speech. As currently
implemented, these features would simply be stored as alternative
base data in the same format as the surface forms and should be
plug-compatible.
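To illustrate why such feature sets could be plug-compatible, here is a
small sketch. It assumes that base data amounts to a mapping from a feature
string to its count per document, which is our simplification and not a
description of PhiloMine's actual storage format; the function name
ngram_counts and the underscore joiner are likewise hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count contiguous n-grams; n=1 reproduces plain surface-form counts."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter("_".join(g) for g in grams)


tokens = "the rights of man and the rights of the citizen".split()
unigrams = ngram_counts(tokens, 1)   # surface forms
bigrams = ngram_counts(tokens, 2)    # e.g. "the_rights"
trigrams = ngram_counts(tokens, 3)   # e.g. "the_rights_of"
```

Because every result has the same feature-to-count shape, downstream code
that consumes surface-form counts would not need to know which kind of
feature it was given.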
As we continue working on our own research projects, we expect
that other notions will occur to us. Let us know if you have
any ideas, and even better, if you have any code.