This is a constantly growing list of things that we intend to fix for future revisions of PhiloLogic.
Add a routine to newextract (the bibliography generator) to check to
see if each file has the basic elements required for loading as
a TEI/ATE or other file. This would check for a DIV level
object, a P level object, and possibly for some CDATA
contents. More later... This is to resolve the fact that
one can have directories of XML files that will contain a few
headers, including, and other stuff....
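A minimal sketch of such a check, in Python for illustration only. The names missing_elements, check_loadable and REQUIRED_PATTERNS are hypothetical and not part of newextract, and the patterns are an assumption about what "DIV level" and "P level" objects look like in the markup. It scans for tags with regular expressions rather than a full XML parser, so it also works on files that are not well-formed:

```python
import re

# Hypothetical patterns: an assumption about what counts as a DIV-level
# object, a P-level object, and some character content.
REQUIRED_PATTERNS = {
    "div": re.compile(r"<div\d*[\s>]", re.IGNORECASE),   # <div>, <div1 type="x">, ...
    "p": re.compile(r"<p[\s>]", re.IGNORECASE),          # <p> or <p n="1">, but not <pb/>
    "text": re.compile(r">[^<]*\w[^<]*<"),               # some word characters between tags
}

def missing_elements(text):
    """Return the names of the basic elements not found in the file contents."""
    return [name for name, pattern in REQUIRED_PATTERNS.items()
            if not pattern.search(text)]

def check_loadable(path):
    """Read one file and report which basic elements it lacks (empty list = OK)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return missing_elements(handle.read())
```

A file for which check_loadable returns a non-empty list could be skipped or flagged before the load starts.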
Martin Mueller (Oct 10, 2005) suggests a random hit function:
I think that adding a random sample feature would be very useful.
For any set of returns that runs in the hundreds, not to speak of
thousands, it would be a terrific first orientation to have a random
sample. It might have a minimum size--say 50, and then increase as a
fraction of the total size until the sample size is such that
increasing the sample size won't add much.
As an example, I have a student who is interested in figuring out the
relationship of cognitive and ethical meanings in 'true' from Chaucer
to Shakespeare. There are close to 20,000 occurrences of tr[vu]e in
that period. For her, a random sample of 1,500 would let her figure
out in a day where the action is.
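One way the sampling rule described above might look, as a Python sketch. The function names and the fraction and cap defaults are assumptions chosen only so that the example figures in the message work out (a minimum of 50, and 1,500 out of 20,000); they are not values anyone has settled on:

```python
import random

def sample_size(total, minimum=50, fraction=0.075, cap=1500):
    """At least `minimum` hits, then a fraction of the total, up to `cap`.

    `fraction` and `cap` are illustrative assumptions, not agreed values.
    """
    if total <= minimum:
        return total  # small result sets are returned whole
    return min(cap, max(minimum, round(total * fraction)))

def random_sample(hits, seed=None):
    """Return a random subset of `hits`, kept in their original order."""
    rng = random.Random(seed)
    k = sample_size(len(hits))
    picked = sorted(rng.sample(range(len(hits)), k))  # sample positions, then restore order
    return [hits[i] for i in picked]
```

With these assumed defaults, a result set of 20,000 hits gives a sample of 1,500, and anything up to 50 hits is returned unchanged. Sampling positions rather than the hits themselves keeps the sample in the same order as the full result list.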
Not a TODO, but a kewl idea from Orion that I don't want to lose
track of....:
Somebody from the New York Times is asking people to submit addresses
of things from books, for them to add to a map of Places Mentioned In
Books, a "literary map of manhattan".
http://www.nytimes.com/2005/05/01/books/review/01COHENHO.html?ex=1272600000&en=9093cefdfcdb6409&ei=5090&partner=rssuserland&emc=rss
(tinyurl: http://tinyurl.com/9ew8h)
Some of these have addresses ("The Talented Mr. Ripley") but most of
them don't. A fun project -- for someone with lots of text and a fast
search engine and Google Maps -- would be to map all of this
automatically, parsing out addresses or intersections or what have
you. Though of course it would be impossible to get everything that a
human could.
New results format: map geographically.
Note: I gave a very general talk on "Mapping Textuality" a few years
ago. I would love to do this.
Very interesting.....something to think about.
IWW style requeries,
to re-present results in different ways, giving the user a "filter"
(Julia's expression) approach to result sets. Russ and I are thinking
of a dynamic results header as a drop-in block of code, which would keep
LATENTQUERYSTRING on the server and parse it for different result sets.
Carole Mah sez: Put something about the sort order of basic text search
results. These are in LOAD order. The general loader tries to sort out
the load in chronological order (year only). This could simply
be put in philosubs..... Oddly enuff, we won't always know about
that. Geez, you wudda thunk we would have something like that, eh?
Create the philohistory directory either on install or when
the Philo history function is run and does not find one. Check
to see if it reads the PHILOTMP directive.
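The "create it when it is missing" half of this could be as small as the following Python sketch. The function name ensure_history_dir is hypothetical, and the philotmp argument stands in for whatever path the PHILOTMP directive resolves to; how PhiloLogic actually reads that directive is not assumed here:

```python
import os

def ensure_history_dir(philotmp, name="philohistory"):
    """Create the history directory under `philotmp` if it is missing.

    `philotmp` is assumed to be the path taken from the PHILOTMP directive.
    """
    path = os.path.join(philotmp, name)
    os.makedirs(path, exist_ok=True)  # no error if the directory already exists
    return path
```

Calling this at the top of the history function, as well as at install time, would cover both cases mentioned above.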
Add sort by frequency to Terms button? There may be speed problems
with this. And I don't have a good idea about how to put the
switch in the interface (a global selection)?
02-22-05: Allow specification for the sort -T location in general philo
configuration, which will probably avoid the next problem. Note that it
can be set in loader.xmake changing
02-22-05: Trap for no space on device error on load. If we get this
as we are reading texts in, it simply stops loading the offending
batch and in certain circumstances will load the database without
noticing it is missing a batch.
Loading 999 ===> TEXTS/pharisjn.xml...
/usr/local/bin/sort: write failed: ./sortU6aO_m: No space left on device
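A sketch of how the loader might trap this, written in Python for illustration rather than in the loader's own language. The names BatchLoadError, run_checked and sort_batch are hypothetical. The idea is only to check the exit status of sort and stop the load instead of continuing without the batch; the temporary directory is passed with sort's -T option, which also covers the configurable location mentioned above:

```python
import subprocess

class BatchLoadError(RuntimeError):
    """Raised when an external command fails while loading a batch."""

def run_checked(cmd, batch_name):
    """Run a command and raise, with its stderr, if it exits non-zero."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise BatchLoadError(
            f"{batch_name}: {cmd[0]} exited with status {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout

def sort_batch(in_path, out_path, tmp_dir, batch_name):
    """Sort one batch file, keeping sort's temporary files in `tmp_dir`."""
    return run_checked(["sort", "-T", tmp_dir, "-o", out_path, in_path], batch_name)
```

With a check like this, a "No space left on device" failure surfaces as an error naming the batch, so the database is never built with a batch silently missing.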
02-22-05: and while we're at it, let's encourage a default database
directory that is NOT in the standard install location
(/var/lib/philologic/databases/).