Mots 15 is an interactive concordance or full-text retrieval system
built mostly out of off-the-shelf software. This document provides a
high-level overview of the system and lists some problems that are
still unsolved and some opportunities that are still open.
1. Basic interfaces in a query system
1.1. Monoliths
At its simplest, an interactive query system accepts queries
from a user and returns responses drawn from the
data.

A monolithic query system.
In systems like Arras and Tact, the single monolithic software
package controls everything in the diagram.
1.2. Web interface
With the advent of graphical browsers for the World Wide Web,
however, it has become possible to provide a fairly attractive interface at a
much lower cost than a custom-built interface would require. It may still make
sense to devise special-purpose user interface software for specific
purposes, but we can go a long way without it, simply relying on the
user to have chosen a Web browser they like reasonably well. The Web,
that is, exposes an interface between the user interface and the data
in the back end.[1]

A Web-based query system.
This interface sets certain limits to our freedom: we must
now use HTML to describe what the user sees,[2] and the user's interactions with the
server are limited to what can be done using HTML forms. But
within those limits we can develop better user interfaces at a lower
cost than if we were building from scratch.
Even more important, we can now swap front- and back-ends in and out.
We can experiment with different user interfaces by writing different
front-end forms and HTML style sheets. In theory, we can also
experiment with different back ends by substituting one for the other
and using the same front end; in practice, the existing systems built
on this model don't easily allow for swapping different back ends in
and out, because the interface between the front end and the back end
varies with the specific product used as the back end. Because
different commercial products rarely support identical interfaces,
it is rarely possible to swap in a new back end with minimal
effort.
The Mots 15 system differs from the generic Web-based system
primarily by exposing a
generic query interface in
front of the back-end-specific query interface, in order to buffer the
front end and back end from each other.

Basic plan of MOTS query system.
Ideally, this generic query interface should follow some open
specification, and it should provide all the functionality
we want (to keep life simple) and no more (so that it is easy to
build back ends if we want to do it ourselves). These goals pull in
different directions, and the exact choice of query interface depends
on how we resolve the tradeoff among them.
Given some suitable query language, and a way to
translate from it into the query language of the back end,
any XML query engine may be used as the back end.
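The buffering role of the generic query interface can be sketched as an abstract base class in Python. This is a minimal sketch, not code from Mots 15: the source does not fix the open query language, so `OpenQuery` (elements of given types containing all of the given words) is a hypothetical stand-in, and `BackEnd`, `translate`, `run`, `search`, and `ToyBackEnd` are illustrative names. The front end calls only `search`; each back end keeps its query-to-query translation and native query language to itself.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class OpenQuery:
    """Hypothetical open query: elements of the given types containing all the words."""
    element_types: tuple[str, ...]
    words: tuple[str, ...]


class BackEnd(ABC):
    """Generic query interface: the only thing the front end ever talks to."""

    @abstractmethod
    def translate(self, query: OpenQuery) -> str:
        """Query-to-query translation into this back end's native language."""

    @abstractmethod
    def run(self, native_query: str) -> list[str]:
        """Run a native query; return matching elements as XML strings."""

    def search(self, query: OpenQuery) -> list[str]:
        # Front ends call this; the native query never leaves the back end.
        return self.run(self.translate(query))


class ToyBackEnd(BackEnd):
    """Toy in-memory back end with a made-up native syntax "GI|GI WITH w&w"."""

    def __init__(self, elements: list[tuple[str, str]]):
        self.elements = elements  # (generic identifier, text content) pairs

    def translate(self, query: OpenQuery) -> str:
        return f"{'|'.join(query.element_types)} WITH {'&'.join(query.words)}"

    def run(self, native_query: str) -> list[str]:
        types_part, words_part = native_query.split(" WITH ")
        types, words = types_part.split("|"), words_part.split("&")
        return [
            f"<{gi}>{text}</{gi}>"
            for gi, text in self.elements
            if gi in types and all(w in text.split() for w in words)
        ]


backend = ToyBackEnd([("p", "to be or not to be"), ("l", "to be")])
print(backend.search(OpenQuery(("p", "l"), ("be", "not"))))
# ['<p>to be or not to be</p>']
```

Swapping in sgrep or a SQL DBMS would, on this model, mean writing another subclass; the front end would not change.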

Using sgrep as the MOTS back end.
A SQL DBMS may be the most flexible back end. The design proposed by
MSM for this would involve a few lightweight scripts which run
‘on top of’ the SQL database system. In this design, the SQL
system itself would produce not elements but element
pointers, which would be used to extract the elements from a saved
copy of the XML or SGML file.
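The element-pointer idea can be sketched with Python's built-in `sqlite3` module standing in for a full SQL DBMS. Everything concrete here is an assumption for illustration: the table layout (`element`, `word`), the use of character offsets into the saved copy as pointers, and the naive indexing of `<p>` elements by string search rather than with a real SGML or XML parser. The SQL query returns only `(start, stop)` pairs; the elements themselves are sliced out of the saved text afterwards.

```python
import sqlite3

# Saved copy of the marked-up text (hypothetical sample data).
SAVED_TEXT = "<text><p>to be or not to be</p><p>that is the question</p></text>"


def build_index(conn: sqlite3.Connection, text: str) -> None:
    """Index each <p> element by word; rows hold pointers, not elements.

    Naive scan for literal <p> and </p> tags; a real system would use a parser.
    """
    conn.execute("CREATE TABLE element (id INTEGER PRIMARY KEY, gi TEXT, start INTEGER, stop INTEGER)")
    conn.execute("CREATE TABLE word (token TEXT, element_id INTEGER)")
    pos = 0
    while (start := text.find("<p>", pos)) != -1:
        stop = text.index("</p>", start) + len("</p>")
        cur = conn.execute(
            "INSERT INTO element (gi, start, stop) VALUES ('p', ?, ?)", (start, stop))
        content = text[start + len("<p>"):stop - len("</p>")]
        conn.executemany(
            "INSERT INTO word (token, element_id) VALUES (?, ?)",
            [(token, cur.lastrowid) for token in set(content.split())])
        pos = stop


def search(conn: sqlite3.Connection, text: str, words: list[str]) -> list[str]:
    """Return the elements containing every word, extracted via their pointers."""
    wanted = sorted(set(words))
    placeholders = ", ".join("?" for _ in wanted)
    rows = conn.execute(
        f"""SELECT e.start, e.stop
            FROM element AS e JOIN word AS w ON w.element_id = e.id
            WHERE w.token IN ({placeholders})
            GROUP BY e.id
            HAVING COUNT(DISTINCT w.token) = ?
            ORDER BY e.start""",
        (*wanted, len(wanted))).fetchall()
    # The DBMS returned only pointers; pull the elements from the saved copy.
    return [text[start:stop] for start, stop in rows]


conn = sqlite3.connect(":memory:")
build_index(conn, SAVED_TEXT)
print(search(conn, SAVED_TEXT, ["to", "be"]))  # ['<p>to be or not to be</p>']
```

Keeping only pointers in the database means the marked-up text is stored once, in its original form, and the DBMS never has to reproduce markup.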

Using SQL DBMS as the MOTS back end.
The task of translating from the open query language to the
proprietary back end query language is, of course, simplified if
the back end accepts the open query language itself.
2. Pieces of Mots 15
Mots 15 is designed to make it relatively simple to specify and
implement each piece of the system. The better we succeed in this
goal, the easier it will be for us to experiment with different
parts of the system, and the easier it will be for eventual users to
customize it for their own purposes. Eventually, the designers hope
that Mots 15 will grow into a library of reusable and customizable
pieces, which individuals and small projects can modify to make
useful special-purpose systems.
The Mots 15 design requires the following pieces of software:
- browser: an off-the-shelf Web browser; this
handles the actual display of results on the user's screen and
interaction with the user
- forms: one or more HTML forms which allow the
user to specify searches; these produce an HTML-forms data stream
which the browser submits (via the Web server) to an appropriate CGI script
- form-to-query translator: a program to translate
the forms data into a query, expressed in the open query
language
- query-to-query translator: a program to translate
the query from the open query language into the query language
supported by the back end
- back end: a program which accepts queries in
some (possibly proprietary) query language and returns as results some
set of SGML or XML elements[3]
- wrapper: a program which takes the results and
places them in a two-level wrapper: (a) an outermost
mots:result element and (b) a mots:hit
element wrapped around each hit, each with attributes providing useful
information about the query and its results
- SGML-to-HTML translator: a program which takes
the wrapped results and translates them into HTML suitable for display
in the user's off-the-shelf browser
- transaction manager:
a CGI script to manage the query/response transaction, by
calling (or incorporating) the various other programs in this list;
it may also be responsible for session management
2.1. The user interface (forms design)
There is no obvious single right way to write a Web interface for
a full-text query system; we plan to write several, both to experiment
and to allow different users to have different interfaces.
If Mots 15 is ever widely deployed, we expect that much of the
user customization will involve modifying the forms interface by
rewriting the static HTML.
We expect to produce:
- a very simple user interface into which users type words (which
will be ANDed together and put into a search for paragraphs, speeches,
or lines)
- a more complex form which allows the user to select elements
within which to search, by generic identifier
- a form which allows the user to type in a search expression
using some particular query language (in the short term, the
‘open query language’ identified in the diagrams;
in the long term, possibly other query languages)
- as many others as we can think of or find rationales for
The definition of a new form should always include the specification
of the fields it uses and their meanings.
2.2. Form-to-query and query-to-query translation
The translation from a form to the standard query language may
be simple or complex, depending on the form. No generalizations
are possible.
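For the very simple form described in section 2.1, the form-to-query translation is almost trivial. A sketch in Python, under stated assumptions: the form has a single field, here called `words`; the open query is represented by a hypothetical `OpenQuery` record rather than a concrete syntax; and paragraphs, speeches, and lines are assumed to be tagged `p`, `sp`, and `l` (TEI-style names). The `FIELDS` table illustrates the rule that a form's definition should specify its fields and their meanings; the translator rejects any field not in it.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs

# Field specification for the simple form (hypothetical field name).
FIELDS = {"words": "words to search for; all of them must occur in a hit"}

# Paragraphs, speeches, and lines, assuming TEI-style element names.
SEARCH_SCOPE = ("p", "sp", "l")


@dataclass(frozen=True)
class OpenQuery:
    """Hypothetical open query: elements of the given types containing all the words."""
    element_types: tuple[str, ...]
    words: tuple[str, ...]


def form_to_query(form_data: str) -> OpenQuery:
    """Translate URL-encoded form data from the simple form into an OpenQuery."""
    fields = parse_qs(form_data)
    unknown = set(fields) - set(FIELDS)
    if unknown:
        raise ValueError(f"fields not in the form's specification: {sorted(unknown)}")
    # Every word typed by the user is ANDed into the query.
    words = tuple(" ".join(fields.get("words", [])).split())
    if not words:
        raise ValueError("no search words given")
    return OpenQuery(element_types=SEARCH_SCOPE, words=words)


print(form_to_query("words=to+be"))
# OpenQuery(element_types=('p', 'sp', 'l'), words=('to', 'be'))
```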
Translation from the standard query language into the back end's
query language is apt to be relatively complex: a task for a programmer
rather than a power user.
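Even a simple case shows where the complexity lies. A hedged sketch of a query-to-query translator targeting sgrep, again using a hypothetical `OpenQuery` record: each element type becomes a region running from its start tag to its end tag (using sgrep's `..` operator), the regions are combined with `or`, and each word adds a `containing` filter, which yields the AND. The operator usage should be checked against the sgrep manual. The sketch also assumes start tags carry no attributes, and it rejects words containing a double quote rather than guessing at sgrep's escaping rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpenQuery:
    """Hypothetical open query: elements of the given types containing all the words."""
    element_types: tuple[str, ...]
    words: tuple[str, ...]


def phrase(s: str) -> str:
    """Quote a string as an sgrep phrase; quotes are rejected, not escaped."""
    if '"' in s:
        raise ValueError(f"cannot quote {s!r} as an sgrep phrase")
    return f'"{s}"'


def to_sgrep(query: OpenQuery) -> str:
    """Translate an OpenQuery into an sgrep region expression."""
    # One region per element type, from start tag to end tag (attributes not handled).
    regions = " or ".join(
        f"({phrase('<' + gi + '>')} .. {phrase('</' + gi + '>')})"
        for gi in query.element_types
    )
    expr = f"({regions})"
    # Nesting one "containing" filter per word ANDs the words together.
    for word in query.words:
        expr = f"({expr} containing {phrase(word)})"
    return expr


print(to_sgrep(OpenQuery(("p", "l"), ("to", "be"))))
# (((("<p>" .. "</p>") or ("<l>" .. "</l>")) containing "to") containing "be")
```

sgrep matches phrases as plain strings, so `"be"` would also match inside `beyond`; handling word boundaries, attributes, and nested elements is exactly the kind of work that makes this translator a programmer's task.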
There is no technical reason not to combine the form-to-query and
query-to-query translators, but there is a design reason.
If they are combined, the front and back ends are directly
exposed to each other, which in turn makes it harder to
substitute a different back end or front end.
2.3. Hit wrappers
The mots:result and mots:hit element types
carry useful information about the query and its results; as
the Mots 15 system matures, we expect to gain a better understanding
of what information needs to go here. In the current version (0.5),
the elements and their attributes are:
- the mots:result element, which provides general
  query information and has the attribute:
  - query, which shows the open-query-language
    query to which a response is being returned
- the mots:hit element, which is wrapped around each
  hit, with attributes
  - text: identifies the document from which the hit
    came
  - sourceid: gives the hit's unique ID within the source
    document
  - canonical-reference: gives a canonical
    reference to this location in the document, for display to the
    user
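A wrapper producing these elements can be sketched with Python's `xml.etree.ElementTree`. The source gives the `mots:` prefix but not its namespace URI, so `urn:example:mots` below is a placeholder; the `wrap_results` function and the shape of the `hits` dictionaries are likewise hypothetical. Each hit's own markup is copied unchanged inside its `mots:hit` wrapper.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI: the source names the mots: prefix, not its URI.
MOTS_NS = "urn:example:mots"
ET.register_namespace("mots", MOTS_NS)


def wrap_results(query: str, hits: list[dict[str, str]]) -> str:
    """Wrap back-end hits in mots:result and mots:hit elements (0.5 attributes)."""
    result = ET.Element(f"{{{MOTS_NS}}}result", {"query": query})
    for h in hits:
        hit = ET.SubElement(result, f"{{{MOTS_NS}}}hit", {
            "text": h["text"],
            "sourceid": h["sourceid"],
            "canonical-reference": h["canonical_reference"],
        })
        # Copy the hit's own markup, unchanged, inside its wrapper.
        hit.append(ET.fromstring(h["element"]))
    return ET.tostring(result, encoding="unicode")


# Hypothetical sample hit, for illustration only.
print(wrap_results("hypothetical query", [{
    "text": "sample-text", "sourceid": "l42",
    "canonical_reference": "1.2.3", "element": "<l>to be or not to be</l>",
}]))
```

Because the wrapper adds only attributes and enclosing elements, the SGML-to-HTML translator can rely on one fixed outer structure whatever back end produced the hits.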
2.4. Transaction management
It appears useful to have a central program which does nothing
but manage all the others and keep track of information that
some of them (but not all) need, including:
- style sheet to be used in formatting results (when more than one
is available)
- other user- or session-specific settings
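A transaction manager along these lines can be sketched as a small Python class that holds per-session settings and threads each request through the other pieces, passing the style sheet only to the step that needs it. This is a sketch under assumptions: the CGI plumbing and the way session IDs reach the script (cookies, hidden form fields) are omitted, the pieces are modeled as plain functions on strings so they can be swapped, and `Session`, `TransactionManager`, `handle`, and the default style-sheet name are hypothetical. The lambdas in the demonstration are trivial stand-ins, not real translators.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Session:
    """Per-session settings; the default style sheet name is hypothetical."""
    stylesheet: str = "default.xsl"
    settings: dict[str, str] = field(default_factory=dict)


@dataclass
class TransactionManager:
    # Each piece of the system is a plain function, so any piece can be swapped.
    form_to_query: Callable[[str], str]        # form data -> open query
    query_to_query: Callable[[str], str]       # open query -> native query
    back_end: Callable[[str], list[str]]       # native query -> hits
    wrap: Callable[[str, list[str]], str]      # (open query, hits) -> wrapped results
    to_html: Callable[[str, str], str]         # (wrapped results, style sheet) -> HTML
    sessions: dict[str, Session] = field(default_factory=dict)

    def session(self, session_id: str) -> Session:
        """Return the session's settings, creating defaults on first use."""
        return self.sessions.setdefault(session_id, Session())

    def handle(self, session_id: str, form_data: str) -> str:
        """Run one query/response transaction for the given session."""
        open_query = self.form_to_query(form_data)
        hits = self.back_end(self.query_to_query(open_query))
        wrapped = self.wrap(open_query, hits)
        # Only the SGML-to-HTML step needs the session's style sheet.
        return self.to_html(wrapped, self.session(session_id).stylesheet)


tm = TransactionManager(
    form_to_query=lambda form: form.upper(),
    query_to_query=lambda q: f"native({q})",
    back_end=lambda nq: [nq],
    wrap=lambda q, hits: f"<result query='{q}'>{''.join(hits)}</result>",
    to_html=lambda wrapped, css: f"[{css}] {wrapped}",
)
tm.session("s1").stylesheet = "compact.xsl"
print(tm.handle("s1", "be"))  # [compact.xsl] <result query='BE'>native(BE)</result>
```

Since each piece is just a function argument, trying a different front-end translator or back end means passing a different function, in keeping with the aim of swappable parts.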
A. References
Price-Wilkin, John.
“Using the World-Wide Web
to Deliver Complex Electronic Documents:
Implications for Libraries”.
Public-Access Computer Systems Review
5.3 (1994): 5-21.
http://jpw.umdl.umich.edu/pubs/yale.html.
Price-Wilkin, John.
“A Gateway between the World Wide Web and PAT:
Exploring SGML Through the Web”.
The Public-Access Computer Systems Review
5.7 (1994): 5-27.
Price-Wilkin, John.
“The
Feasibility of Wide-area Textual Analysis Systems in Libraries:
A Practical Analysis”.
Presented at Literary Texts in an Electronic Age:
Scholarly Implications and Library Services,
the 31st Annual Clinic on Library Applications of Data Processing
(University of Illinois at Urbana-Champaign).
April 10-12, 1994.
http://jpw.umdl.umich.edu/pubs/dpc.html.
Published in the
Proceedings of the Clinic.
“A Gateway between the World Wide Web and PAT:
Exploring SGML Through the Web.”
Price-Wilkin, John.
“Just-in-time Conversion, Just-in-case Collections:
Effectively leveraging rich document formats for the WWW”.
D-Lib Magazine
May 1997.
http://www.dlib.org/dlib/may97/michigan/05pricewilkin.html
B. Acknowledgements
I am grateful to a number of people for the help they have
given me in clarifying the ideas of Mots 15. First of all, of
course, to Claus Huitfeldt,
Paul Meurer, Sindre
Sørensen, and Kjersti Berg for agreeing that it's worth
trying and for their work in implementing it.
The fundamental idea of Mots 15 became clear in my head while I was
listening to a discussion organized by Geoffrey Rockwell and John
Bradley at ALLC/ACH '98 in Debrecen, Hungary. I am grateful to them
for provoking that clarity. They should not, however, be held
responsible for the result: their ideas on software development and
the right way to go about building interactive concordance systems are
rather different from mine, with some complicated patterns of
agreement and disagreement.
Mots 15 incorporates ideas on software development in general, and on interactive
concordance and text analysis systems in particular, which I have
discussed over the years with a number of people. I am grateful to
Willard McCarty, Steve DeRose, and Fotis Jannidis for discussions
that have relatively obvious links to elements of this design.
Less obvious in detail, but pervasive, are my debts
to Lou Burnard.
A particularly important debt is to discussions with Geoff Bilder,
to which I owe my conviction that the query language interface is
a crucial determinant; in that connection I also acknowledge a debt
to Susan Hockey, who organized the meeting at which those discussions
took place, and to Peter Batke (who provided the number 15 in the
name of the system).
The key ideas of the system, of course, I learned from John
Price-Wilkin years ago; he made them seem so natural that when I
formulated them again for myself I thought, for a while, that they
were new.
My thanks to all of these, and to all of the others I should
have named but have not.