What is PhiloLogic?
PhiloLogic™ is the primary full-text search, retrieval and analysis
tool developed by the ARTFL Project and the Digital Library
Development Center (DLDC) at the University of Chicago. This is a
Free Software implementation of PhiloLogic for large TEI-Lite document
collections. The wide array of XML data specifications and the recent
deployment of basic XML processing tools provide an important
opportunity for the collaborative development of higher-level,
interoperable tools for Humanities Computing applications. The
sophistication and power of the TEI-XML encoding specification
supports the development of extremely rich textual data
representations that encourage, if not require, development of sets of
tools to exploit features of encoded text to perform particular
tasks. A single general tool may never fit all possible uses of
encoded documents, but a set of more specialized, interoperable
tools can provide a cost-effective way to deploy end-user
applications.
As the ARTFL Project's contribution to the collaborative
development of these tools, PhiloLogic has been enhanced
to support a wide variety of TEI-Lite (XML and SGML) encoded documents
optionally using the Unicode character specification. We feel that
Humanities Computing applications are particularly well suited to open
source development by a community that has wide-ranging technical
abilities but is not well supported by the commercial sector. Our
goal is to provide as many features as possible while not requiring
significant administrative or development work to use effectively.
Originally implemented to support large databases of French
literature, PhiloLogic has been extended to support a wide variety of
textual and hypermedia databases in collaboration with numerous
academic institutions and, more recently, commercial
organizations. PhiloLogic is a modular system, in which a
textbase is treated as a set of coordinated or related databases,
typically including an object database (units of text such as a
letter, scene, document, etc.), a word forms database, a word concordance
index mapped to textual objects, and an object manager mapping text
objects to byte offsets in data files. Each of these databases is
stored and managed using its own subsystem.
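The modular layout described above can be pictured with a small sketch. It is only an illustration under stated assumptions, not PhiloLogic's actual code or storage format: the names ToyTextbase, TextObject, add_object, fetch and search are hypothetical, and everything is held in memory rather than in separate subsystems. It shows the four coordinated pieces: an object table, a word forms table, a concordance mapped to text objects, and an object manager that resolves objects to byte offsets in the data.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TextObject:
    """One unit of text (hypothetical record layout)."""
    object_id: int
    kind: str   # e.g. "letter", "scene", "document"
    start: int  # byte offset where the object begins in the data
    end: int    # byte offset one past the end of the object


@dataclass
class ToyTextbase:
    """In-memory stand-in for a set of coordinated databases."""
    data: bytes = b""                                # the "data file"
    objects: dict = field(default_factory=dict)      # object manager: id -> TextObject
    word_forms: dict = field(default_factory=dict)   # word form -> frequency
    concordance: dict = field(default_factory=dict)  # word form -> [(object_id, byte offset)]

    def add_object(self, kind, text):
        """Append a text object and index its words by byte offset."""
        encoded = text.encode("utf-8")
        start = len(self.data)
        object_id = len(self.objects)
        self.objects[object_id] = TextObject(object_id, kind, start, start + len(encoded))
        for match in re.finditer(r"\w+", text):
            form = match.group().lower()
            # Convert the character position to a byte offset in the data.
            offset = start + len(text[:match.start()].encode("utf-8"))
            self.word_forms[form] = self.word_forms.get(form, 0) + 1
            self.concordance.setdefault(form, []).append((object_id, offset))
        self.data += encoded
        return object_id

    def fetch(self, object_id):
        """Resolve an object to its text through its byte offsets."""
        obj = self.objects[object_id]
        return self.data[obj.start:obj.end].decode("utf-8")

    def search(self, word):
        """Return (object_id, byte offset) pairs for a word form."""
        return self.concordance.get(word.lower(), [])


textbase = ToyTextbase()
textbase.add_object("letter", "Chère amie, la nuit tombe.")
textbase.add_object("scene", "La nuit est longue.")
for object_id, offset in textbase.search("nuit"):
    print(object_id, offset, textbase.fetch(object_id))
```

Keeping the concordance in terms of object identifiers and byte offsets means a search never has to re-read the text itself; only the object manager needs to know where each object lives in the data.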
Reasons to use PhiloLogic:
light, fast, robust, extensively used and tested
few dependencies, basic installation almost wholly self-contained
out of the box operation with many configuration options
TEI-Lite XML/SGML (and variants such as MEP and CES) with Unicode support
support for plaintext, Dublin Core/HTML, and DocBook
MySQL back-end for bibliographic searching
optional XML-aware or non-XML bibliographic loaders
interoperability across certain systems
fault tolerant
open source
Check out our samples page for some demonstration databases loaded using PhiloLogic. Many thanks to the Brown University Women Writers Project, the Margaret Sanger Papers Project, the Victorian Women Writers Project, Martin Mueller's Nameless Shakespeare Project, and the British Women Romantic Poets Project for providing us with these texts. Access to three of these databases is still pending approval from the data providers.
You may also be interested in looking at extensions to PhiloLogic, which are also
available as open source releases.
PhiloMine provides an interactive
environment for a wide range of machine learning and text data mining functions.
PhiloLine is an extension which uses a
simple sequence alignment algorithm to detect similar passages.
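To give a feel for what detecting similar passages involves, here is a minimal sketch. It does not reproduce PhiloLine's own algorithm, which is not described here; as a stand-in it uses the matching-block comparison in Python's standard difflib module over word tokens. The function names tokenize and shared_passages and the min_words threshold are assumptions made for this illustration.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def shared_passages(text_a, text_b, min_words=4):
    """Return runs of at least min_words consecutive words found in both texts."""
    words_a, words_b = tokenize(text_a), tokenize(text_b)
    # autojunk=False keeps frequent words from being discarded on long inputs.
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    passages = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_words:
            passages.append(" ".join(words_a[block.a:block.a + block.size]))
    return passages


source = "He said that all the world is a stage for fools"
reuse = "Truly all the world is a stage, and we are players"
print(shared_passages(source, reuse))
```

Comparing word tokens rather than characters makes the match insensitive to punctuation and case, at the cost of missing passages that differ by spelling variants.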