Methodology for Transcription and Editing

General Principles

The WWP encodes works written by women before 1850, in English or in English translation.

In general, we encode entire texts rather than excerpts, and our transcription includes all front and back matter, even material that may not be by the author. We excerpt only in cases where the desired text is a very small part of the total published document, and where transcribing the entire document is currently impractical.

Our transcriptions are encoded in XML, following the specifications of the Text Encoding Initiative (TEI), with documented TEI extensions to accommodate the needs of our particular corpus and approach. We produce a diplomatic transcription of each text, preserving the original spellings, typographical errors, lineation, hyphenation, and other details of the text. We also record corrected readings for typographical errors, expansions for uncommon abbreviations, and regularized versions of old-style typography (such as the use of “i” for “j” and “u” for “v”). These alternatives permit some flexibility in display: using an appropriate stylesheet, we can display the text with or without original lineation, typography, errors, and so forth.
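
As a minimal sketch of how paired readings allow one transcription to yield two displays, the Python below picks either the original or the edited reading from each pair. It assumes TEI P5-style <choice> elements pairing <orig>/<reg>, <sic>/<corr>, and <abbr>/<expan>; the sample sentence and the render function are hypothetical illustrations, not WWP data or WWP stylesheet code.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the markup follows TEI P5 <choice> pairs, which is
# an assumption about the encoding, not a copy of a WWP file.
SAMPLE = (
    "<p>The <choice><orig>loue</orig><reg>love</reg></choice> of "
    "<choice><sic>vertne</sic><corr>vertue</corr></choice>, "
    "<choice><abbr>wch</abbr><expan>which</expan></choice> endures.</p>"
)

# Children of <choice> that hold the reading as printed ...
ORIGINAL = {"orig", "sic", "abbr"}
# ... and children that hold the editorially supplied alternative.
EDITED = {"reg", "corr", "expan"}


def render(elem, diplomatic=True):
    """Return the text of elem, taking one reading from each <choice>."""
    keep = ORIGINAL if diplomatic else EDITED
    if elem.tag == "choice":
        # Emit only the requested reading and ignore any whitespace
        # between the alternative readings.
        return "".join(render(c, diplomatic) for c in elem if c.tag in keep)
    parts = [elem.text or ""]
    for child in elem:
        parts.append(render(child, diplomatic))
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)


root = ET.fromstring(SAMPLE)
print(render(root, diplomatic=True))   # The loue of vertne, wch endures.
print(render(root, diplomatic=False))  # The love of vertue, which endures.
```

Note that the corrected reading ("vertue") fixes only the typographical error and keeps the period spelling, in line with the practice described above.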

Theory of the text

We treat the text as a document more than as a work of literature: hence our approach emphasizes transcription of the full document rather than only the “work”, and preservation of renditional detail, original spelling, and errors rather than their effacement. In addition, each document is treated as a circulating cultural artifact, whose historical specificity is part of its value. As a result, we do not emend the text or create critical or synthetic editions; each encoded text is a transcription of a particular physical object.

XML and the TEI tend to imply a theory of the text which emphasizes its structure as an important ontological fact about the text’s existence. The WWP believes that this kind of encoding provides an intuitive and useful way for scholarly users to read and navigate the text; however, we do not insist on it as the only possible theory of the text.

Inclusion and exclusion criteria

The WWP’s inclusion criteria in principle are very broad: over the (very) long term, we hope to include all printed texts by women in English dating from 1850 or before. We also plan to include manuscript texts written by women during the same period. Given present constraints on time and money, however, we give greatest priority to texts which are not currently available in print, which are most needed by scholars, and which can be encoded usefully within the constraints we face. As a result we have postponed the encoding of manuscript materials until we can devote attention to the methodological issues involved. Our acquisition committee assesses current scholarly interest in periods, genres, and issues, and gives priority to groups of texts which seem most likely to serve the scholarly community.

In addition to English texts written by women, the WWP textbase includes:

  • texts co-authored by men
  • texts of doubtful authorship, where the WWP feels there is good reason to believe the author was female
  • texts translated into English by women (the original author may be male, although these texts would have a somewhat lower priority)
  • texts written by women in other languages and translated into English by men (again, with a slightly lower priority)
  • historical accounts of trials or other events which claim to report women’s words more or less directly
  • narratives dictated by women to male transcribers (even where it seems likely that the transcription is not verbatim)
  • texts written under a female pseudonym which have circulated as women’s writing (whether or not the author is actually female; again, with a somewhat lower priority)

The aim behind these choices is to give an inclusive cross-section of the written culture. Allowing the inclusion of dictated narratives, for instance, makes it possible to include texts by illiterate women (for instance, slave narratives) which would be excluded if we insisted upon a strict construction of authorship. Similarly, historical reports of women’s words (for instance, in the context of a witchcraft trial) give a view of women’s discourse which would otherwise be inaccessible. Texts in translation have circulated within the culture of English women’s writing and represent an important component of that culture. Categories like these need to be distinguished from writing which is straightforwardly “by women” (for instance, for purposes of linguistic comparison), but this can be accomplished by appropriate identification.

Choice of Edition

The WWP always transcribes from a specific copy of an early edition, contemporaneous with the author unless particular circumstances dictate otherwise (e.g. posthumous publication). Where possible and appropriate, we use the first edition. In cases where a later edition is of equal or greater scholarly importance (because of authorial revision, censorship, etc.), we also aim to encode the later edition, although we may not be able to do so immediately. Over the long term, our hope is to include all significant versions of our texts, although this will not be possible in the near future.

As a rule, the copy chosen for transcription is the only source of information for that transcription; the WWP does not provide a record of variants, emendations, etc. from other copies or editions. However, in cases of illegibility, the transcription may be supplemented with readings from other copies of the same edition. In very rare cases, where only a single flawed copy of the chosen edition is available, we may consult other editions. The source of such readings will always be explicitly documented. In the future we hope to create linked transcriptions which allow for easy comparison between editions. See Principles of Transcription, below.

Texts which were encoded at the project’s inception were occasionally chosen on other grounds, since the project’s editorial goals have changed somewhat over the years. Transcriptions which do not currently follow the principles outlined above will be updated over time to conform to our current editorial practice.

Principles of Transcription

Treatment of textual variants

Textual variants from editions other than the one being transcribed are not included. In the future, we may provide links to transcriptions of variant editions to allow for comparison and possibly collation.

Hyphenation

Line-end hyphens are preserved. Soft hyphens are distinguished from hard hyphens, and are recorded using the TEI’s mechanism for recording end-of-line hyphenation. They may be displayed or suppressed depending on whether the original lineation is displayed. In cases where it is unclear whether a line-end hyphen is hard or soft, we follow the hyphenation used for that word elsewhere in the same text; if the word does not appear elsewhere, we record a hard hyphen.
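
The rule for ambiguous line-end hyphens can be sketched as a small decision function. This is an illustration only: the function name, its arguments, and the case-insensitive comparison are assumptions, and the tie-break when a text uses both forms (treated here as hard) is not specified by the rule itself.

```python
def resolve_line_end_hyphen(first_half, second_half, words_elsewhere):
    """Classify an ambiguous line-end hyphen as 'hard' or 'soft'.

    first_half / second_half: the word parts on either side of the line break.
    words_elsewhere: words from the same text that do not span a line break.
    """
    seen = {w.lower() for w in words_elsewhere}
    hyphenated = f"{first_half}-{second_half}".lower()
    joined = f"{first_half}{second_half}".lower()

    if hyphenated in seen:
        # The text hyphenates this word elsewhere, so the hyphen is kept.
        # Assumption: this also wins if the unhyphenated form occurs too.
        return "hard"
    if joined in seen:
        # The text writes the word solid elsewhere: the hyphen is soft.
        return "soft"
    # No evidence elsewhere in the text: record a hard hyphen.
    return "hard"


print(resolve_line_end_hyphen("to", "day", ["today", "night"]))  # soft
print(resolve_line_end_hyphen("to", "day", ["to-day"]))          # hard
print(resolve_line_end_hyphen("to", "day", ["night"]))           # hard
```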

Typographical errors

Typographical errors in the original document are recorded, together with a corrected reading, using the TEI’s mechanism for recording errors.

Regularization

The WWP regularizes spacing between words to a single space. We also remove any space between a word and the punctuation that follows it.
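
A minimal sketch of these two spacing rules, using regular expressions. The function name and the set of punctuation marks are assumptions made for illustration; the sketch handles only plain space characters and is not the WWP's own tooling.

```python
import re


def regularize_spacing(line):
    """Apply the two spacing rules to one line of transcribed text."""
    # Rule 1: collapse any run of spaces to a single space.
    line = re.sub(r" {2,}", " ", line)
    # Rule 2: remove space between a word and the punctuation after it.
    # The punctuation set here is an assumption, not a WWP specification.
    line = re.sub(r"(\w) +([,.;:!?])", r"\1\2", line)
    return line


print(regularize_spacing("Stay ,  my   love ; why ?"))  # Stay, my love; why?
```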

Original typography and spelling

Old-style typography (including the interchange of i and j, u and v, and vv and w) is preserved, together with a normalized reading, using TEI’s mechanism for handling original readings.

The WWP does not currently record modernized spellings. Doing so poses a number of challenges, not least of which is financial: it would be a very large undertaking, and one which would require special funding. There are also conceptual challenges, such as the frequent difficulty (especially in our oldest texts) of determining the correct modern equivalent. Modernization for many texts is closer to translation than to spelling correction; a word which appears to be a direct modern equivalent may in fact have a rather different meaning, and to substitute it directly may create a misleading impression of the text, particularly for the readers less familiar with early texts whom modernization is intended to help. Although offering a modernized reading may in some cases make a text more accessible to inexperienced readers, our experience shows that students are usually able to adjust, and may even learn more from contact with a less mediated version of the text.

Special characters and Unicode

The treatment of “special characters” (e.g. characters with diacritics) has changed substantially with the advent of Unicode, which makes it possible to represent nearly all of the printed characters in early modern printed books without difficulty. However, we occasionally encounter characters (such as alchemical symbols or private symbols) that are not included in the Unicode standard. For these we use the TEI’s mechanism for representing characters not in Unicode (see the TEI Guidelines for more detail).

Handwritten additions and deletions

Handwritten additions which are roughly contemporary with the text are transcribed in full. Deletions are encoded, with the original printed text being transcribed as content. If the deleted text is illegible, that fact is also encoded.

Features omitted from transcription

The WWP’s approach to transcription focuses on the linguistic text, and while we also provide some basic information about non-linguistic features of the text, we do so in a simplified way. There are also a number of features which we do not transcribe. These are omitted largely to enable us to encode more efficiently, and to focus on making more texts available rather than on giving exhaustive visual detail about correspondingly fewer texts. Finally, we understand that no transcription can ever capture strictly visual or graphical detail with sufficient accuracy to replace the original for certain kinds of study. Scholars who need information of this sort will need to consult the original in any case, and for us to attempt to duplicate that information here would be wasteful.

The WWP records the presence of illustrations, together with a brief description of the illustration and a transcription of any text which appears within the illustration. We also encode the presence of ornaments and ruled lines. We do not distinguish between different kinds of ornaments or rules. For our purposes, an illustration is any graphical feature which contains representational content; an ornament is any purely formal or abstract graphic (e.g. a border of acanthus leaves).

The WWP does not transcribe running headers and footers, with the exception of page numbers, signatures, catchwords, and press figures.

The WWP does not transcribe bookplates, modern handwriting, or modern library stamps. The omission of these features is indicated in the transcription using the TEI’s <gap> element.

The WWP does not transcribe smudges, foxing, dead insects, or other non-textual, non-graphical marks. No indication at all is made of their presence, unless they render text illegible or unclear.

Treatment of Document Rendition

With electronic texts, there is a large difference between the information that is recorded and the way the text is displayed (on the screen or in print output). The WWP records a great deal of renditional information, both directly (in a renditional attribute which records many details of the document’s original presentation) and indirectly (in the use we make of renditional information in deciding what a given textual element is). However, in displaying texts electronically, or in creating printed output, we are guided not only by the document’s original appearance but also by considerations of readability in the new format. Our aim in displaying the document is to present the same information that the original document conveyed — for instance, the presence of paragraphs and stanzas — but without necessarily using the same means of conveying it. Thus while different documents may use indentation or line spacing to show a paragraph break, we will typically use a single style sheet to display paragraphs. Similarly, different documents may use varying amounts of space to separate stanzas; our display will make it clear where the stanza breaks are, but will not seek to reproduce the exact amount of original spacing. Display can also be used to make the document more useful; for instance, by providing a large print version for the visually impaired, or by displaying notes in a separate window which scrolls with the main text rather than where they actually appear. Similarly, documents which are printed entirely in italics will be much more legible if displayed in roman instead.
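
The separation between recorded rendition and display can be illustrated with a short sketch. Everything here is hypothetical: the labels "indent", "line-space", "roman", and "italic" stand in for whatever values a transcription actually records, and swapping roman for italic in an all-italic document (so that contrast is preserved) is an assumption of this sketch, not a documented WWP rule.

```python
def display_paragraphs(paragraphs):
    """Join (recorded_break_style, text) pairs using one display convention.

    The recorded style stays in the data but is not consulted: every
    paragraph is shown the same way, separated by a blank line.
    """
    return "\n\n".join(text for _recorded_style, text in paragraphs)


def display_font(recorded_font, document_font):
    """Choose a display font for a span, given the document's main font."""
    if document_font == "italic":
        # Show an all-italic document in roman. Assumption: spans recorded
        # as roman are shown in italic so they still stand out.
        swap = {"italic": "roman", "roman": "italic"}
        return swap.get(recorded_font, recorded_font)
    return recorded_font


print(display_paragraphs([("indent", "First."), ("line-space", "Second.")]))
print(display_font("italic", document_font="italic"))  # roman
```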

The WWP seeks to capture as many as possible of the meaningful renditional features of the text. By “meaningful” we mean features which affect the reading of the text, where “reading” is understood to mean all aspects of reception, not simply the absorption of strictly denotational meaning. Meaningful renditional features are those which affect the way the reader knows what kind of textual feature she or he is looking at, and understands its relationship to other textual features.

Our transcription records most (though not all) significant details of the appearance of the source, including:

  • font shifts (roman, italic, blackletter)
  • capitalization and use of small capitals
  • text alignment with respect to page margins: left, right, center
  • relative indentation
  • line, column, and page breaks
  • rough positioning on the page (for marginal notes, annotations, and the like)
  • end-of-line hyphenation
  • wrong-font letters
  • turn-unders and turn-overs in verse
  • significant use of relative white space to delineate textual structure (stanzas, paragraphing, etc.)
  • inverted letters
  • the presence of dropped, raised, or decorated initial capitals

We do not record:

  • absolute or relative type size
  • font of punctuation
  • absolute line spacing or vertical white space
  • baseline irregularities
  • broken type
  • running heads (except page numbers)
  • kerning and word spacing irregularities (except where these may be significant to the determination of word boundaries)
  • swash characters
  • ligatures (except digraphs such as æ)

Documentation and Metadata

In addition to transcribing the full text of each document, the WWP also records certain kinds of metadata, or information describing the document and its transcription. This information is recorded in the TEI header. For more detail on TEI headers, see the TEI Guidelines.
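
As an illustration of how header metadata can be read back out of a file, the sketch below pulls the title, author, and declared languages from a small header. It assumes the TEI P5 namespace and the standard fileDesc/titleStmt and profileDesc/langUsage structure; the sample header and the summarize_header function are hypothetical, and actual WWP headers may be organized differently.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # TEI P5 namespace, in ElementTree form

# Hypothetical header for illustration; not taken from a WWP file.
HEADER = """
<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <titleStmt>
      <title>An Example Title</title>
      <author>Example Author</author>
    </titleStmt>
    <publicationStmt><p>Hypothetical sample.</p></publicationStmt>
    <sourceDesc><p>Transcribed from a hypothetical source copy.</p></sourceDesc>
  </fileDesc>
  <profileDesc>
    <langUsage>
      <language ident="en">English</language>
      <language ident="la">Latin</language>
    </langUsage>
  </profileDesc>
</teiHeader>
"""


def summarize_header(xml_text):
    """Return the title, author, and language codes found in a TEI header."""
    header = ET.fromstring(xml_text)
    title_stmt = f"{TEI}fileDesc/{TEI}titleStmt"
    return {
        "title": header.findtext(f"{title_stmt}/{TEI}title"),
        "author": header.findtext(f"{title_stmt}/{TEI}author"),
        # Language codes in document order.
        "languages": [el.get("ident") for el in header.iter(f"{TEI}language")],
    }


print(summarize_header(HEADER))
```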

Information about the source copy

The WWP records the author’s name (if known), the publication details of the original text, the location of the source copy used in our transcription (including the library catalogue number where possible), and the Wing or STC number where applicable.

Language and period

The WWP records the main language of the document and any other languages used in the document.

Genre

The WWP assigns a rough genre classification to each text.

Keywords

Over time, the WWP will record topical keywords for each document or (in the case of multiple works published together) for each textual unit.

Details of encoding and editing

The WWP records information about the general editorial practices used in preparing the textbase, and also information about the specific practices used for the individual document, if it requires special treatment.