WWP Book
A History of Encoding at the WWP

An Introduction

Text encoding is as fundamental to the WWP as women's writing; as the project's founders knew, only a powerful method of transcription and preservation could save the new collection of texts from being lost and forgotten all over again. The solution they chose was Standard Generalized Markup Language (SGML), a new international standard for text encoding. At the time, this choice represented a leap of faith on two counts. First, SGML was largely a product of the industrial-military sector (its biggest users are still airplane manufacturers and the Department of Defense), and applications for humanities projects were only just starting to appear. The most significant of these, the Text Encoding Initiative (TEI), was not formally founded until 1988. Second, although in many ways SGML and the TEI looked like promising ways of encoding primary textual data, there was as yet almost no software for printing, displaying, distributing, or using SGML data in an academic context. As a result, the WWP has had to be a researcher and a developer in text encoding as in textual studies.

Since that time, SGML (and now XML) has become a major force in the humanities domain, and the advent of the World Wide Web has given it a publicity that could not have been foreseen even by the most optimistic. The TEI has also proved very successful, with hundreds of projects using its guidelines. The WWP thus finds itself, after more than fifteen years, part of a thriving community of projects and researchers with whom we can exchange results and methods. We publish research on text encoding of primary source materials, and collaborate with other projects.

Some Historical Notes

We first began transcribing texts in 1988 using Waterloo Script, but shortly thereafter converted our encoding to use the newly emerging TEI Guidelines (version P1 and then P2). Our earliest approach to encoding primary sources was to capture as much information as possible about the appearance of the physical document, including information on type size, leading, indentation, font, and ligatures. The limitations of this approach became clear, however, both in the difficulty of deriving exact information from microfilm and photocopies, and in the difficulty in many cases of conceptualizing how this information would be used. This level of detail also added considerably to the already expensive process of transcription.

When a revised version of the TEI Guidelines (P3) was released in 1993, the WWP undertook a period of research in which we scrutinized our encoding and decided how to update it to meet the new guidelines. The new methodology which resulted placed less emphasis on the raw details of appearance, and more on what they indicated about textual structure and content. It also provided a much fuller and more accurate representation of the essential structures of the text, and a considered rationale for the level of detail we wanted to preserve. During this period of self-study, the textbase itself remained "frozen", since we were reluctant to encode further texts until we had arrived at a satisfactory new system. Printable versions of all existing texts were saved so that we could continue to provide printouts, even if changes to the encoding made it impossible to use our existing print routines.

As soon as possible (and, with hindsight, probably sooner than was prudent) we resumed encoding new texts and began the process of converting our earlier encoding to the new system. Since then, we have continued to make smaller changes to our encoding to adapt it to our constantly growing knowledge of early texts. The result of our research is a system of text encoding which is adapted to the special requirements of early texts and their users. We are now completing documentation for our work so that it can be used by other projects with similar goals.

In 2002 we converted our files to P4, the XML version of the TEI Guidelines. We anticipate converting to P5 once it is completed, probably in 2007.
