NEH Final Report, 1997-2000
Although this report, strictly speaking, covers the three-year period of our most recent grant (including the one-year extension), the work we have done during this time completes research begun under previous grants. It may be helpful, therefore, to sketch the central strands of our work to give context for the completion and publication of Women Writers Online. The most important of these have been questions of text encoding methodology and of editorial methods.
The WWP's text encoding research has been an ongoing dialogue with the Text Encoding Initiative (TEI), whose publication in 1994 of the Text Encoding Initiative Guidelines for Electronic Text Encoding and Interchange (P3) laid the foundation for text encoding research worldwide. The WWP's commitment to international standards such as TEI has been paired with an equally strong interest in the particularities of women's texts and early texts generally, and the need to do them justice at the encoding level as well as the reading level. What we have discovered in our research is that the texts in our collection frequently depart from the generic and structural expectations articulated in TEI, and that modifications to the standard are necessary in order to represent these documents accurately. We make these modifications to our own encoding system, but we also publish accounts of our changes in journals and at conferences, and we will pass on the results to the TEI when they begin their next revision of the Guidelines.
In addition to researching adaptations of TEI, the WWP has also served as a source of information and training for newer projects seeking to use TEI for primary sources. We have provided advice, DTDs, documentation, and training to projects including the following:
Our complete documentation will be published at our web site within the next year.
There are several areas in which our research has added most significantly to the work of TEI, which we will sketch briefly here.
Since the WWP has a strong interest in preserving material details of the source document, recording renditional information has always been a crucial part of our encoding. Information such as typeface, case, font shifts, indentation, and alignment provides the reader with important clues to the document's rhetorical shape and the linguistic resources available (through printing customs and the visual vernacular of the text) for the communication of meaning. The TEI does provide a system for recording renditional information, consisting of a single data space--in effect, room for a single adjective describing all facets of rendition--supplied through the "rend" attribute.
However, this method is awkward when there is more than one kind of information to record at once: for instance, if a word is both in bold and in italics.
In order to pack all of the necessary information into this limited space, the WWP developed a system called "rendition ladders," which structures the rend attribute into a sequence of keywords and values. These can be parsed automatically by appropriate software and can be extended as needed, providing a powerful and flexible strategy for accommodating renditional data of any sort. In the encoding of a heading, for example, each value is enclosed in parentheses, which provide the delimiters between units necessary to allow automatic parsing of the individual keywords and values.
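As an illustration only, the automatic parsing described above might be sketched as follows in Python. The keywords and values used here (slant, case, align) are hypothetical placeholders rather than the WWP's documented vocabulary, and the exact syntax assumed (keyword(value) units written one after another) is an assumption based on the description of parentheses as delimiters.

```python
import re

# Assumed ladder syntax: keyword(value) units written back to back, for
# example "slant(italic)case(allcaps)". All keyword and value names in
# this sketch are hypothetical, chosen only for illustration.
_RUNG = re.compile(r"\s*([A-Za-z][\w-]*)\(([^()]*)\)")


def parse_rendition_ladder(rend):
    """Split a rend attribute value into a {keyword: value} dictionary."""
    text = rend.strip()
    rungs = {}
    pos = 0
    while pos < len(text):
        match = _RUNG.match(text, pos)
        if match is None:
            # Anything that is not a keyword(value) unit is rejected.
            raise ValueError(
                "cannot parse rendition ladder at offset %d: %r" % (pos, text)
            )
        rungs[match.group(1)] = match.group(2).strip()
        pos = match.end()
    return rungs


# A heading that is italic, in capitals, and centered (hypothetical values).
heading_rend = parse_rendition_ladder("slant(italic)case(allcaps)align(center)")
```

Because each unit is self-delimiting, new keywords can be added without changing the parser, which is the extensibility the ladder approach is meant to provide.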
A number of the features which the WWP seeks to capture in our transcriptions are at the level of individual words or letters: for instance, abbreviations, printers' errors, old-style typography, wrong-font letters, and the like. In all of these cases, an individual letter or group of letters is flagged in some way and an alternate reading is supplied by the encoding. It is common practice among text encoding projects (and the TEI Guidelines also provide examples of this) to use the relevant elements at the word level.
This is entirely adequate for many situations, and particularly for lightly encoded texts where many of these features are not being marked at all. However, in earlier texts, where all of these phenomena are much more frequent, an encoding project like the WWP, which encodes quite intensively, is bound to run into words where more than one of these elements is needed. In such cases, word-level encoding cannot be used, since it leaves uncertain how the interaction between the different elements and their respective alternate readings will be resolved. The encoding must instead be applied at the level of the individual letter or group of letters.
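The difference can be illustrated with a small sketch in Python. The element and attribute names below (w, orig/reg, sic/corr, abbr/expan) are assumptions loosely modeled on common TEI usage, not a statement of the WWP's actual encoding, and the sample word is invented. Each flagged letter carries its own alternate reading, so the two features in the word can be resolved independently.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from element name to the attribute that holds the
# alternate reading. These names are assumptions made for illustration.
ALTERNATE_ATTR = {"orig": "reg", "sic": "corr", "abbr": "expan"}


def render(word_xml, use_alternate):
    """Return the text of one encoded word.

    Elements whose names appear in use_alternate are replaced by their
    alternate reading; all others keep the reading of the source.
    Only one level of nesting is handled in this sketch.
    """
    root = ET.fromstring(word_xml)
    parts = [root.text or ""]
    for child in root:
        attr = ALTERNATE_ATTR.get(child.tag)
        if attr is not None and child.tag in use_alternate and attr in child.attrib:
            parts.append(child.attrib[attr])
        else:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts)


# Invented example: "vpou" in the source, with an old-style "v" for "u"
# and a turned letter "u" printed in place of "n".
WORD = '<w><orig reg="u">v</orig>po<sic corr="n">u</sic></w>'

as_printed = render(WORD, set())                # "vpou"
regularized_only = render(WORD, {"orig"})       # "upou"
corrected_only = render(WORD, {"sic"})          # "vpon"
fully_emended = render(WORD, {"orig", "sic"})   # "upon"
```

With word-level encoding, the same word would need one alternate reading for the whole word under each element, and it would be unclear which combination of readings a display should show.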
What is significant here is not the insight required to notice and solve this problem (which is intellectually trivial) but the attention drawn to areas of text encoding in which the requirements of scholarly texts are quite different from ordinary online materials. Currently available SGML delivery software does not accommodate letter-level encoding, since it is designed largely for an industrial audience which has no need for such a feature. By illustrating the importance of letter-level encoding for scholarly purposes, the WWP provides additional impetus for the development of delivery software designed for scholarly use.
In an electronic collection of early printed texts, documentation of the source requires the identification not only of the edition but also of the particular copy used as the basis for transcription. This is important not only because of the likelihood of textual variants between individual copies of the same text, but also to enable scholars to identify the source text reliably so that they can consult it in person if they wish--a facet of electronic scholarship which reflects a desire to remind oneself that the object itself still exists.
The TEI makes extensive provision in the TEI header for documentation of the source at the edition level, but does not allow for identifying the individual copy (e.g. by library and call number), nor does it provide for bibliographic references such as Wing or STC numbers. The WWP has developed additional components of the header which allow us to offer more detailed documentation, including--if necessary--documentation of additional source texts consulted (e.g. in the case of partial illegibility) so that the attribution of every part of the transcription is always clear.
During the decade or so that the WWP has been active, scholarly conceptualizations of electronic sources have changed immensely. We have seen the terms of debate evolve as scholars come to terms with new kinds of sources and become acclimated to the idea of working in a new medium. We have also seen a sharp rise in familiarity with the basic concepts of text encoding, so that for many users--faculty, students, and librarians--the electronic text is no longer simply a black box but an intelligible system of content, encoding, metadata, search engine, and so forth. All of these factors have had considerable impact on attitudes about the editorial methods appropriate for electronic texts. Four essential issues have been particularly central to the shaping of the WWP's work and the resulting collection, which we sketch here in more detail.
One of the first questions to be raised as online collections of primary sources began to become more widespread was that of their status as editorial objects and scholarly products. Within the editorial continuum--with the critical edition at one end as the product of considerable scholarly labor and intervention, and the archive at the other end as a repository of untouched source material--online collections proved difficult to place. One of the purposes they were imagined to fulfill in relation to conventional print editions was to provide "all the material" so that instead of getting the result of a scholar's selection and intervention, the reader would be able to confront the primary materials for him/herself and arrive at independent editorial and critical judgments. Tightly coupled with this vision was the notion that electronic sources would be presented with a minimum of alteration--perhaps even through page images--to give the reader the most direct access possible to the primary source. This group of values tended to position the collection as some kind of "archive", and for many projects (including the WWP) the rhetoric of the archive provided a very important way of conceptualizing the work as having a longer shelf life and a greater significance. Since these online collections were expensive and experimental, it was important that they not seem to be yet another scholarly product like the rest, subject to changes in taste and method, but instead a more permanent (because more incontrovertible and more generalized) resource which could be the basis for future editorial work.
At the same time, the concept of the archive had limitations as a model for online resources. Most importantly, it made it difficult to describe the role of text encoding: if the resource was a "raw", unedited transcription, just the text and nothing more, then text encoding had to be seen as a completely objective, reproducible, and simplistic activity. For HTML-encoded texts this might have been true, but for the projects using standards such as TEI or EAD--which clearly involved intellectual labor and interpretation of a very significant sort--it was hard to claim that no editorial decisions were being made. And indeed, there were good reasons to argue the contrary: in order to get scholars interested in this work and reap the benefit of their involvement, it was essential to show that text encoding at some level was simply another expression of the kinds of intellectual activities scholars have been undertaking for centuries. At a practical level too, it was only by making such an argument that projects were able to bridge the gap between "technical staff" and "scholars", and to develop the kind of deep expertise that could produce the high-quality scholarly research tools which are now appearing.
The Women Writers Project has argued at times for both the editorial and the archival status of its texts and methods, and certainly they do partake of both in different ways. Although we do consider our texts to be editions, and our encoding to be an editorial act, our approach is closer in some ways to documentary editing than to the Bowers/Tanselle tradition of critical editing. Each of our texts is transcribed from a single source document, without correction or emendation from any other edition or copy. Features that require some kind of editorial treatment (for instance, typographical errors which appear in the original, artifacts of early typography such as the long s or the interchange of i/j, u/v, and vv/w) are recorded both as they appear in the text and with an emended value which can be displayed or concealed at will, and the same approach could be used for critical editing as well. All editorializing decisions, in other words, are preserved in a form which distinguishes them from the transcription of the source document. In this way we are able to treat the text as an archival document and as an edition, without compromising the function of either.
To the extent that the "edition" is an intellectual model which singles out individual texts for particular attention--while the "archive" functions most characteristically as a collection whose scale and comprehensiveness are the key to its interest--the textbase also works to bridge this difference. The individual text functions effectively for the user within the context of a larger collection precisely because its encoding enables it to do so, by registering its relationships of similarity and difference, its location within taxonomies, its participation in separate and collective meaning. And conversely, the collection only functions as such because of the encoded information through which the user apprehends the patterns and anomalies which are present. Text encoding thus allows both the creator and the user to bridge the gap of scale which formerly determined the character of the edition.
Within the framework described above, the WWP has chosen an editorial approach which as much as possible privileges the source text as a historical artifact whose details of apparatus, rendition, spelling, and diction are of significant interest. We feel that even students--whose needs are often cited as a reason to modernize spelling and even syntax--are by and large resourceful enough to deal with an unmodernized text, and when given the opportunity to consider the question for themselves (for instance, in an in-class editing exercise) usually opt for preservation of the source detail. We also feel that to erase the kind of historical distance which old spellings make evident is to pretend that the text can function as a timeless aesthetic object--a concept which has served women authors very poorly in the past.
Full documentation of our editorial methods is available at our web site. In brief, they can be summarized as follows:
Modernization: The WWP does not modernize the text in any way.
Regularization: The WWP encodes certain features of original typography--including the interchange of i/j, u/v, and vv/w--using TEI's
Rendition: The WWP encodes many renditional details of the source document--including case, font, face, alignment, justification, indentation--using the rend= attribute (described in more detail above).
Emendation: The WWP encodes typographical errors in the original using TEI's
Annotation: The WWP does not currently provide any annotation in the usual sense for our texts. We have experimented with providing contextual essays for a subset of our collection, as part of our Renaissance Women Online project. These were clearly helpful to some users, particularly undergraduates, but they pose several problems. The first of these is that they risk becoming dated, particularly in the field of women's writing where new information and perspectives are emerging so quickly. In addition, such materials require a substantial commitment of effort from a large number of scholars, which can be difficult to fund and coordinate on a broad scale. Finally, it seems to us that our role in creating this collection should not be to provide commentary--which is always to some degree tendentious--but to focus our expertise on providing the text and allow others to comment. This approach does not rule out the creation of annotations by others, and indeed there is technology now being developed which will make it possible to create sets of annotations--for instance, a glossary or set of explanatory notes--and share or even publish them for others to use. This would be an excellent arrangement, since it would also solve the problem of how to address widely different audiences and their needs.