All language arrays are copied into each database lib directory. Selection
of base language is done in philo-db.cfg.
Please note that we have NOT translated system generated search
forms. We have found that search forms and headers are frequently
heavily modified by users and administrators. We have also opted
not to support dynamic selection in the distribution, but this
would be a trivial function. If we find we need to do it, we will
add the patch to the PhiloLogic wiki. If you add this, please let
us know.
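As an illustration of how small a dynamic language selection function
could be, here is a minimal sketch in Python. It is not part of the
distribution. The MESSAGES table, BASE_LANGUAGE, pick_language and
message are all hypothetical names; the real language arrays live in
each database lib directory and the base language in philo-db.cfg.
The sketch ignores q-values in the Accept-Language header and simply
takes the first language it has an array for.

```python
# Hypothetical message arrays, one per interface language.
MESSAGES = {
    "en": {"no_results": "No results found.", "more_hits": "More hits"},
    "fr": {"no_results": "Aucun résultat.", "more_hits": "Autres occurrences"},
}

# Stand-in for the base language chosen in the database configuration.
BASE_LANGUAGE = "en"


def pick_language(accept_language, available=MESSAGES, base=BASE_LANGUAGE):
    """Return the first listed language we have an array for, else the base."""
    for part in accept_language.split(","):
        code = part.split(";")[0].strip().lower()  # drop any ";q=..." suffix
        primary = code.split("-")[0]               # "fr-CA" -> "fr"
        if primary in available:
            return primary
    return base


def message(key, lang):
    """Look up a message, falling back to the base language array."""
    base_array = MESSAGES[BASE_LANGUAGE]
    return MESSAGES.get(lang, base_array).get(key, base_array[key])
```

A CGI script could call pick_language on the HTTP_ACCEPT_LANGUAGE
environment variable and pass the result to message for each string
it prints.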
Ajax note handler ... a configuration selectable note displayer
that uses Ajax to get the note from the server and toggle display
in the text rather than a pop-up browser window. Example:
An experimental OS-X GUI loader. For those allergic to command
line computing, this is an alternative to the command line loader and
offers options. Proof-of-concept at this point.
NON-TEI encoding scheme support: ATE, DocBook, and plaintext.
PhiloLogic 3.001
New internal search engine (search3). Resolves library
incompatibility bugs in new Linux releases noted in search2. Extensible
in new ways and supports full object searching. The Linux and
OS-X installations now have 64-bit index addressing, so this should
be able to handle about a terabyte of TEI encoded text data.
NOT text search operator:
Try "NOT christ jesus NOT christ" (no quotes) as a test in
docsouth or EEBO. Concise regex notation:
!chr.st.? jesus !chr.st.?
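One plausible reading of the query above is a three-word phrase
pattern: a word that is not "christ", then "jesus", then another word
that is not "christ". The Python sketch below illustrates that reading
only. It is an assumption about the semantics, not the search3
implementation, and compile_terms and find_phrases are hypothetical
names.

```python
import re


def compile_terms(query):
    """Turn '!chr.st.? jesus' into a list of (pattern, negated) pairs."""
    terms = []
    for raw in query.split():
        negated = raw.startswith("!")
        body = raw[1:] if negated else raw
        terms.append((re.compile(body, re.IGNORECASE), negated))
    return terms


def find_phrases(tokens, query):
    """Return start offsets where every token position satisfies its term."""
    terms = compile_terms(query)
    hits = []
    for start in range(len(tokens) - len(terms) + 1):
        window = tokens[start:start + len(terms)]
        # A plain term must match its token; a negated term must not.
        if all((pattern.fullmatch(token) is not None) != negated
               for token, (pattern, negated) in zip(window, terms)):
            hits.append(start)
    return hits
```

With the tokens "lord jesus christ and jesus said unto", only the
second "jesus" qualifies, because the first is followed by "christ".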
Searching for and within divs by type and head, as well as by fields
extracted from opener/closer, author/signed, dateline,
salutation. The table for divs also has fields for id, n,
and lang -- being populated if found -- and placename,
classification and partofspeech (not populated at the
moment, future use). Merges biblio and object searches.
Full word searching on selected subdiv objects: lg, note, epigraph,
sp, and a couple of others. You can search on tag -- lg --
and type (hymn). Merges biblio and object searches.
Fields in this table are tag, type, n, id, who, and lang, which
are populated when data is found.
SQL subdoc object management. This includes dynamic terms buttons
which give frequencies of values with other values selected in
the same object level. This is also required to support standoff
nested object mark-up.
Automatic generation of "whizbang" search form templates with
examples drawn from your data.
Reimplemented "more hits" ... a sliding list of twenty blocks.
The block size and number of blocks are set in philo-db.cfg.
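To make the sliding list concrete, here is a rough Python sketch of
how a window of result blocks could be computed around the block the
user is viewing. It is an illustration under assumptions, not the
distributed code: block_window is a hypothetical name, the default
block size of 50 is invented for the example, and centring the window
on the current block is only one possible sliding rule.

```python
def block_window(total_hits, current_block, block_size=50, num_blocks=20):
    """Return (block, first_hit, last_hit) for each block link to display.

    Blocks are numbered from 0; hit numbers are 1-based and inclusive.
    """
    total_blocks = -(-total_hits // block_size)  # ceiling division
    if total_blocks == 0:
        return []
    # Centre the window on the current block, then clamp it to the ends.
    first = max(0, current_block - num_blocks // 2)
    first = min(first, max(0, total_blocks - num_blocks))
    last = min(total_blocks, first + num_blocks)
    return [(block,
             block * block_size + 1,
             min((block + 1) * block_size, total_hits))
            for block in range(first, last)]
```

With 5000 hits and the user on block 40, this yields twenty links
covering blocks 30 through 49.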
No limit on search results ... well, a million. This is set in
the general philo configuration.
In single document searching, the user may select any object.
Multiply included objects (for example, a div1 and a div3 inside
that same div1) are filtered out to avoid repeats.
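The filtering of multiply included objects can be pictured with a
short Python sketch. It assumes, purely for illustration, that each
selected object is addressed by a tuple of positions such as
(div1, div2, div3); drop_nested is a hypothetical name and this is
not the PhiloLogic code.

```python
def drop_nested(selected):
    """Keep only objects that have no selected ancestor.

    Each object is a tuple address, e.g. (2,) for the second div1 and
    (2, 1, 3) for a div3 inside it. Exact duplicates are also removed.
    """
    chosen = set(selected)
    kept = []
    for obj in selected:
        # Every proper prefix of an address is one of its ancestors.
        ancestors = (obj[:depth] for depth in range(1, len(obj)))
        if obj not in kept and not any(a in chosen for a in ancestors):
            kept.append(obj)
    return kept
```

Selecting (2,) and (2, 1, 3) together keeps only (2,), since a search
of the div1 already covers the div3 inside it.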
KWIC resorting option on left and/or right contexts as well as
selected bibliographic information.
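A minimal Python sketch of KWIC resorting follows. The hit layout
(a dictionary with "left", "right" and "biblio" entries) and the name
kwic_sort are assumptions made for the example. Sorting the left
context from the word nearest the keyword outward is a common KWIC
convention, assumed here rather than taken from the PhiloLogic source.

```python
def kwic_sort(hits, key="right"):
    """Sort KWIC hits by left context, right context, or a biblio field."""

    def words(text):
        return text.lower().split()

    if key == "left":
        # Compare starting from the word closest to the keyword.
        def sort_key(hit):
            return list(reversed(words(hit["left"])))
    elif key == "right":
        def sort_key(hit):
            return words(hit["right"])
    else:
        # Any other key is treated as a bibliographic field name.
        def sort_key(hit):
            return hit["biblio"].get(key, "")

    return sorted(hits, key=sort_key)
```

Because sorted is stable, sorting on one key and then on another
gives a combined ordering, for example by author and then by right
context within each author.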
Extensive debugging information, enabled only from philo-db.cfg
as a security measure.
Standard support for ARTFL TEI Lite recommendations, including
metadata, notes, etc. Consult our local
encoding recommendations.
Metadata extraction in the poor man's extractor for TEI, MEP,
and CES. Textload is known to handle all three. ARTFL Text Encoding
(ATE) is in another set of recognizers.
Textload now has a configuration file in the philologic home,
which allows you to define parameters for the load.
Word count per document (standard) and FREQUENCY PACKAGE.
A standard installation of the frequency package. This reads
the word/document data generated by textload.
This is an integrated package which can be optionally built after
the database load. It requires SQL and may take significant time
to load.
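As a rough picture of what such a package computes, the Python sketch
below loads (word, document) pairs into an in-memory SQLite table and
queries it. The input format, the wordfreq table and the function
names are all hypothetical; the actual layout of the data written by
textload and the SQL schema used by makefrequencies are not described
here.

```python
import sqlite3
from collections import Counter


def build_frequencies(pairs):
    """Load (word, doc_id) occurrences into an in-memory frequency table."""
    counts = Counter(pairs)  # (word, doc_id) -> number of occurrences
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE wordfreq (word TEXT, doc INTEGER, n INTEGER)")
    db.executemany(
        "INSERT INTO wordfreq VALUES (?, ?, ?)",
        [(word, doc, n) for (word, doc), n in counts.items()],
    )
    return db


def doc_word_counts(db):
    """Total word count per document."""
    return dict(db.execute("SELECT doc, SUM(n) FROM wordfreq GROUP BY doc"))


def top_words(db, limit=3):
    """Most frequent words across the whole database."""
    return db.execute(
        "SELECT word, SUM(n) AS total FROM wordfreq "
        "GROUP BY word ORDER BY total DESC, word LIMIT ?",
        (limit,),
    ).fetchall()
```

On a real collection the table would be written to a persistent SQL
database, which is where the load time mentioned above would be spent.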
Examples (loaded with command
DBDIR/frequencies/makefrequencies DBNAME):
ADD-ONS: Full support for various dictionary look-ups. Enabled
from configuration.
# Enable dictionary look-up function. Set to 0 to turn it off.
# See quickdickjs for further details.
# Options: 1 = ARTFL one look dictionary function with morphological
# package. Obviously for French.
# 2 = Oxford English Dictionary.
# 3 = ARTFL Websters Dictionary
# 4 = onelook.com
Additional code in "goodies" to hook-up PhiloLogic to TaporWare and to
force a similarity search for "dirty OCR applications".
Discussions: some gory details mainly for developers
Bugs: Doubtless!! I think we've cleared most of the list.
Let us know.
NOTE: Some nested subdiv objects, most notably sp, lg, and stage tags,
may conflict with one another. This has to do with preset object
depths as a holdover from the PhiloLogic2 series. See FUTURE
DEVELOPMENTS below. Slight modifications to the rules will fix most
of these, but a long-term fix requires deeper object indexing.
Discussing: subdiv object report generator. Currently,
if you search for subdiv objects (in a selected set of docs or
whole database) it will simply report these by types and attributes
by document. Not sure how these should be handled. Suggestions?
textload.cfg has an instruction to dump pretty raw XPATHS in
div and subdiv tables. Might be useful.
Testing Needed: Well, everything, of course, but especially internal
document navigation. This is based on a single object link table.
Seems to work. Using this for notes, and other internal cross references,
such as tables of contents, indexes, etc.
NOT Implemented at this time:
- stylometric statistic generation: See the todo list. Reason:
TEI data is far too variant to decide on what constitutes a
block -- paragraph. Sentence recognition is still pretty basic
and subject to variations. So the most interesting data for
stylometrics is too dependent on parser behavior to be very
reliable. A good idea, but probably not now....
Future Development in rough order of priority:
- Internationalization: we need to move all user level error
messages to an array and support multiple language interfaces.
- Extended Object Index Depths: the new underlying text search
module supports extended object indexing for searching and retrieval.
Once we have a canonical PhiloLogic3 running with the fixed object depth
indices, we will be implementing and testing extended object depth
processing. This will require modifications to a variety of components.
[Related to this will be new objects from NLP systems (noun/verb phrases)
and possibly a word/object type field in the base indices].
- XML-tools based text parser? Or a rewrite of the poor man's textload
to behave more like the poor man's metadata extractor? Probably as part of
Extended Object development. And probably as an option ... still lots of
big SGML databases out there and around here.
- Let us know if there are things we should add.