Special characters: brevigraphs and diacritical marks

Abstract

Using entity references to transcribe brevigraphs and characters with diacritical marks

special character transcription brevigraph entity superscript diacritical mark macron
abbr choice expan

Encoding Instructions (new P5 version)

Brevigraphs and diacritical marks often resemble each other, and in some cases the same mark means different things in different contexts. For instance, a macron or acute accent may function as a diacritical mark in some texts, but in others it may indicate an omitted “n” or “m”. The WWP encodes these features as follows:

1. Letters with diacritical marks indicating pronunciation. We encode these with an entity reference. For instance, an “e” with an acute accent is encoded as &eacute; (the full list of these entities is part of the ISO Latin entity set and is also available on the WWP training page at http://www.wwp.brown.edu/encoding/training/Entities/Entities.html).

2. Letters with associated marks (which may resemble diacritical marks, small attached letters, or small flourishes or squiggles attached to the letter) which indicate the omission of letters or an abbreviated form of a word or syllable. These appear almost exclusively in our earliest texts, in which the typography attempts to imitate the letterforms and abbreviations common in manuscript writing (for instance, “y” with a superscripted “t” attached, “p” with a curly hook, etc.). We encode these with an entity reference, and with choice, abbr, and expan elements to indicate the omitted letters. The following is the proper encoding for the word “whom” in which the final “m” is indicated only by a macron over the “o”:

To <choice> <abbr>wh&omacr;</abbr> <expan>whom</expan> </choice> should I give these cookies?

Similarly, the following is the encoding for a “y” with a small attached, superscripted “t” which is an old abbreviation for the word “that”:

<choice> <abbr>&ysupt;</abbr> <expan>that</expan> </choice>

Although this encoding may seem redundant, its utility becomes clear if we consider the case of a brevigraph which can have more than one possible expansion. In such a case, the expansion could not be handled adequately simply by expanding the entity references; the expan element would be crucial. However, the entity reference indicates what the original character was, and allows it to be printed if necessary.
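The case of a brevigraph with more than one possible expansion might look like the following sketch. It uses the &ysupu; entity from the list of WWP brevigraphs, which may stand for either “thou” or “you”. The two sample sentences are invented for illustration and do not come from a WWP text, and the fragment assumes that the WWP entity declarations are available to the parser:

```xml
<!-- Hypothetical sentence in which the context calls for "thou" -->
What wilt <choice> <abbr>&ysupu;</abbr> <expan>thou</expan> </choice> do?

<!-- Hypothetical sentence in which the context calls for "you" -->
I thank <choice> <abbr>&ysupu;</abbr> <expan>you</expan> </choice> all.
```

In both cases the abbr element records the same original character, while the expan element carries the reading chosen for that particular context.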

These special brevigraphs should not be confused with simple superscripted characters. For information on encoding superscripted characters, see 194. See 007 for inverted characters.

A list of brevigraphs for which the WWP has created entities:

&ysupe; y with superscripted e: usually for the abbreviation of “the”

&ysupt; y with superscripted t: usually for the abbreviation of “that”

&ysupu; y with superscripted u: usually for the abbreviation of “thou/you”

&wsupt; w with superscripted t: usually for the abbreviation of “with”

&wsupch; w with superscripted ch: usually for the abbreviation of “which”
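As a further sketch, entities from this list can be combined with the choice, abbr, and expan elements in the same way as the examples above. The sentence below is invented for illustration, and the fragment assumes that the WWP entity declarations are available to the parser:

```xml
<!-- Hypothetical sentence using the "which" and "with" brevigraphs -->
The book <choice> <abbr>&wsupch;</abbr> <expan>which</expan> </choice> she sent came <choice> <abbr>&wsupt;</abbr> <expan>with</expan> </choice> a letter.
```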