Women Writers Online
The RWO project gave the WWP an opportunity not only to add an important group of Renaissance materials to our online collection, but also to test and refine our encoding system with a corpus of earlier texts in a wide range of genres. During the period covered by the RWO grant, the WWP not only made substantial improvements to our encoding system based on our research with RWO texts, but also streamlined our encoding infrastructure and added tools which increase the speed and accuracy of the encoding process. We also developed a customized online delivery system which provides a search and browsing environment suited to teaching and scholarly research.
The WWP uses the Text Encoding Initiative Guidelines for Electronic Text Encoding and Interchange (TEI), with TEI-conformant modifications as necessary to accommodate the idiosyncrasies of early modern texts. These modifications have been carefully documented and will be submitted to the TEI for possible inclusion in the next release of the TEI Guidelines. They fall into several categories:
In addition to the general structural markup of the text itself, the WWP finds it important to record and mark up various kinds of information which are essential to scholars working with primary sources in digital form, and which help provide a familiar environment for scholarly research. Most important of these is the metadata which preserves detailed bibliographic information on the source text, including Wing and STC numbers, source library and shelfmark, and facts of publication and authorship. This information is recorded in the header for each document and is heavily exploited in our search interface. In addition, we record further documentation about the condition of the source text, including any areas which are damaged or illegible.
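As a rough illustration of how header metadata of this kind can feed a search interface, the following Python sketch reads a heavily simplified header and turns it into a dictionary of searchable fields. The element names, the `header_fields` function, and every bibliographic value shown are hypothetical placeholders chosen for the example; they are not the WWP's actual header structure or data.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified header. Element names and all values are
# placeholders for illustration only, not the WWP's actual header or data.
SAMPLE_HEADER = """
<teiHeader>
  <fileDesc>
    <sourceDesc>
      <bibl>
        <author>Example Author</author>
        <title>An Example Title</title>
        <pubPlace>London</pubPlace>
        <date>1600</date>
        <idno type="STC">00000</idno>
        <idno type="Wing">X0000</idno>
        <repository>Example Library</repository>
        <shelfmark>EX 1.2.3</shelfmark>
      </bibl>
      <condition>Leaf A2 damaged; several words illegible.</condition>
    </sourceDesc>
  </fileDesc>
</teiHeader>
"""


def header_fields(header_xml):
    """Collect bibliographic fields from the header into a flat dictionary."""
    root = ET.fromstring(header_xml)
    source = root.find("./fileDesc/sourceDesc")
    fields = {}
    for child in source.find("bibl"):
        # Catalogue numbers share one element and are told apart by type.
        if child.tag == "idno":
            key = child.get("type", "idno").lower()
        else:
            key = child.tag
        fields[key] = (child.text or "").strip()
    # Notes on damage or illegibility are kept alongside the bibliography.
    condition = source.find("condition")
    if condition is not None:
        fields["condition"] = (condition.text or "").strip()
    return fields


if __name__ == "__main__":
    for name, value in header_fields(SAMPLE_HEADER).items():
        print(f"{name}: {value}")
```

A search interface could then index these fields so that a reader can, for example, look up a text by catalogue number or shelfmark.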
The WWP's encoding staff use a Unix-based environment with an SGML-aware text editor (Emacs with psgml) for text encoding work. This basic environment provides constraints which guarantee that the encoded texts conform to the TEI document type definition, and it also guides the encoder by offering a list of legal TEI elements at any given place in the text. Encoders begin their work with a blank template which already includes standard information and a framework for creating a full TEI header for the document. In addition, the WWP has written several tools which assist the encoder by streamlining the encoding process or by automatically tagging certain kinds of textual features. These tools include:
As creators of richly encoded SGML data, the WWP is one of a number of projects currently facing the same problem: SGML publication software is still scarce and is designed for industrial production settings rather than academic projects in the humanities. Tools for publishing SGML content on the World Wide Web (such as INSO's--or, since late 1999, Enigma's--DynaWeb) are even scarcer and are also not designed with scholarly uses in mind. The advent of XML is widely predicted to offer a solution to these problems, but at the time the WWP was planning our initial publication, we had the choice of customizing an existing application or of designing one ourselves from scratch. Although the latter option would theoretically have given us more flexibility and control over the resulting product, it raised a number of concerns. The expense of software development was first among these, particularly because the actual cost of creating a functional system from scratch was difficult to estimate with precision. We also knew that although we could probably develop an SGML-to-HTML transformation system fairly easily for our specific texts, we would not be able to make it general enough to allow for easy expansion, nor could we easily support the rapid content-based indexing provided by commercial software. Finally, creating a new application ourselves would necessarily be an all-or-nothing approach--we risked being caught with no delivery system at all if we encountered any serious problems.
We had already experimented with DynaWeb, and although its default interface and functionality were ill suited to our purposes, we thought we could build a customized interface with most of the functionality we sought. The advantages of this approach were that we would be able to start using the system in its uncustomized state almost immediately, and add improvements as we developed them. Furthermore, if the project turned out to be a long-term success, we could design a custom application ourselves later on, possibly taking advantage of the arrival of XML-aware software and support systems.
Accordingly, we decided to build a custom interface and base our delivery system on DynaWeb. In DynaWeb the underlying infrastructure of indexing, searching, and processing the encoded data (which is performed by DynaText, an SGML search engine) is separated from the display of this data on the web. The display works by a system of style sheets which dynamically translate SGML data into HTML. From the user's point of view, the data is simply HTML which can be viewed with a standard web browser. However, searches and word- or structure-based functions are passed back to the DynaText engine and performed on a preprocessed form of the SGML data, allowing for the exploitation of specialized markup. Thus, for instance, the user can limit a word search to verse drama, even though HTML has no ability to represent or flag particular genres. The advantages of this general solution for us were considerable: the user would not need any specialized software or skills, the purchasing institution would not need to install anything locally, and the value of our SGML encoding would not be lost by down-translation to HTML (as it would be in a static, one-time translation system). Also, unlike systems such as SoftQuad's Panorama, which downloads an SGML text to the user's computer and allows specialized processing to occur locally, DynaWeb can search and selectively display information from the entire corpus. Panorama requires custom software to be installed locally and can only really handle one document at a time, both disadvantages which ruled it out for the kinds of uses we wanted to encourage.
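The separation described above, in which searching runs against the richly encoded source while the user sees only HTML, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how DynaWeb or DynaText actually work: the `genre` attribute, the element names, the `search` and `render_html` functions, and the sample lines of text are all invented for the example.

```python
import html
import re
import xml.etree.ElementTree as ET

# Hypothetical miniature corpus. The genre attribute, element names and
# sample text are invented for illustration and are not WWP encoding.
CORPUS = """
<corpus>
  <text genre="verse-drama">
    <title>Play A</title>
    <l>O gentle night, descend</l>
    <l>and hide the day</l>
  </text>
  <text genre="prose">
    <title>Tract B</title>
    <p>The night was long.</p>
  </text>
</corpus>
"""


def search(corpus_xml, word, genre=None):
    """Find a word in the encoded source, optionally limited to one genre."""
    root = ET.fromstring(corpus_xml)
    hits = []
    for text in root.iter("text"):
        # The genre restriction relies on markup that HTML cannot express.
        if genre is not None and text.get("genre") != genre:
            continue
        title = text.findtext("title", default="")
        for unit in text:
            if unit.tag == "title":
                continue
            content = "".join(unit.itertext()).strip()
            words = re.findall(r"\w+", content.lower())
            if word.lower() in words:
                hits.append((title, content))
    return hits


def render_html(hits):
    """Down-translate the hits to plain HTML at display time."""
    items = [
        f"<li><cite>{html.escape(title)}</cite>: {html.escape(content)}</li>"
        for title, content in hits
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(render_html(search(CORPUS, "night", genre="verse-drama")))
```

Because the HTML is generated only at display time, the genre information stays available for every later search, which is the property a static, one-time translation would lose.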
On top of this basic system, we created a custom interface which provides several important features: