Abbreviations


Encoding of abbreviations using abbr, including a list of common abbreviations which are not tagged, and treatment of punctuation

Abbreviations are one of several textual features in which two readings are captured: in this case, the abbreviation and its expansion. For purposes of this discussion, we consider abbreviations to include acronyms, contractions, brevigraphs, and any other cases where a shorter form of a word or phrase has been substituted for a longer form.

There is a wide range of textual features which can be loosely classed as some kind of abbreviation: some have become so firmly embedded in the language that they are never seen in expanded form and in fact their expanded form may be completely unfamiliar to readers (for instance, viz. for videlicet). Others have expansions so obvious that providing them explicitly is hardly useful (for instance, the expansion of p. to page). Somewhere in the middle lie the abbreviations whose expansion is likely to be helpful to readers. You should take stock of what kinds of abbreviations you have in your text, and which ones it makes sense to bother encoding. Common abbreviations which will never need expanding and whose meaning is clear to a modern reader (such as Mr., Dr., etc., NB, Ave., can’t, and so forth) probably need not be encoded at all. In very early texts, the use of macrons and other marks to indicate omitted letters or contractions should be encoded as abbreviations to permit consistent searching. (See Encoding brevigraphs for more detail.)

In general it makes sense to encode abbreviations at the level of granularity at which they will be expanded: in other words, expand an entire abbreviation as a unit rather than expanding its component pieces. So for instance the word fo’c’s’le should be encoded (in P5) as

<choice>
   <abbr>fo’c’s’le</abbr>
   <expan>forecastle</expan>
</choice>

rather than

fo
<choice>
   <abbr>’</abbr>
   <expan>re</expan>
</choice>
c
<choice>
   <abbr>’</abbr>
   <expan>a</expan>
</choice>
s
<choice>
   <abbr>’</abbr>
   <expan>t</expan>
</choice>
le
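To see why the granularity matters in practice, consider what a program gets back when it collects abbreviation and expansion pairs from each encoding. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The wrapping p element, the variable names, and the function abbreviation_pairs are assumptions made for illustration, and the TEI namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Whole-word encoding (recommended above), wrapped in a <p> so it parses.
WHOLE = "<p><choice><abbr>fo’c’s’le</abbr><expan>forecastle</expan></choice></p>"

# Piecewise encoding (discouraged above): one <choice> per apostrophe.
PIECES = (
    "<p>fo<choice><abbr>’</abbr><expan>re</expan></choice>"
    "c<choice><abbr>’</abbr><expan>a</expan></choice>"
    "s<choice><abbr>’</abbr><expan>t</expan></choice>le</p>"
)

def abbreviation_pairs(xml_text):
    """Return a list of (abbreviation, expansion) pairs, one per <choice>."""
    root = ET.fromstring(xml_text)
    pairs = []
    for choice in root.iter("choice"):
        abbr = choice.findtext("abbr", default="")
        expan = choice.findtext("expan", default="")
        pairs.append((abbr, expan))
    return pairs

print(abbreviation_pairs(WHOLE))
print(abbreviation_pairs(PIECES))
```

The whole-word encoding yields a single pair that maps the abbreviated word to forecastle, which is directly usable for search or display. The piecewise encoding yields three pairs that all share the same abbreviation (an apostrophe) with three different expansions, so none of them is meaningful on its own.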

Abbreviations in P4

In P4, the TEI provides two choices for encoding such features, one which foregrounds the abbreviated form and one which foregrounds the expanded form. The two examples below show exactly the same information, but with a different textual emphasis:

I spoke to the <abbr expan="Duchess">D.</abbr> this morning.

I spoke to the <expan abbr="D.">Duchess</expan> this morning.

For most text encoding projects dealing with early printed texts, it is quite reasonable to give primacy to the source text as in the first case, and to treat it as an archival text to be presented unchanged to the reader. The function of the encoding is thus to provide elucidation in cases where the abbreviation is so obscure that a modern reader will be stumped, or where providing an expansion will allow for much better retrieval.

For some projects, notably documentary editing projects and others whose goal is to present a readable text to the reader (with the source text as a backup in case of questions), it may make more sense to treat the encoding as a regularization process as in the second case, and to use expan to foreground the reader-friendly reading, with the abbr attribute carrying the source reading in case it is needed.

The practical difference between these two approaches is slim; most search engines can be configured to read the value of attributes by default (e.g. to allow searching on the expan attribute of abbr). Similarly, it is not difficult with stylesheets to display either the content or the attribute value, so both readings can be made equally accessible and the choice may be left to the reader.
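As a rough illustration of how little the choice between the two P4 forms constrains later processing, here is a minimal Python sketch that renders either reading from either encoding. It is not a stylesheet and not part of any TEI tooling: the function render_p4, its prefer parameter, and the wrapping p element are assumptions for the example, and it only handles abbr and expan elements that are direct children of the wrapper.

```python
import xml.etree.ElementTree as ET

# The two P4 encodings from above, each wrapped in a <p> so it parses.
ABBR_FORM = '<p>I spoke to the <abbr expan="Duchess">D.</abbr> this morning.</p>'
EXPAN_FORM = '<p>I spoke to the <expan abbr="D.">Duchess</expan> this morning.</p>'

def render_p4(xml_text, prefer="abbr"):
    """Return plain text showing the preferred reading ("abbr" or "expan")."""
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for child in root:
        content = "".join(child.itertext())
        if child.tag in ("abbr", "expan"):
            # The element content holds the reading named by the tag;
            # the attribute of the opposite name holds the other reading.
            other = "expan" if child.tag == "abbr" else "abbr"
            if prefer == child.tag:
                parts.append(content)
            else:
                # Fall back to the content if the attribute is missing.
                parts.append(child.get(other, content))
        else:
            parts.append(content)
        parts.append(child.tail or "")
    return "".join(parts)

print(render_p4(ABBR_FORM, prefer="expan"))  # I spoke to the Duchess this morning.
print(render_p4(EXPAN_FORM, prefer="abbr"))  # I spoke to the D. this morning.
```

Both encodings produce the same output for a given preference, which is the sense in which they carry the same information.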

Abbreviations in P5

In P5, a new element called choice groups together both the abbr and expan elements, giving them equal weight in the encoding (see example 3). As in P4, either reading (or both) may be chosen for display, searching, analysis, and so forth. Readings enclosed within the choice element are taken to be mutually exclusive. The advantage of the choice approach is, first of all, that it allows either reading to contain further markup, such as encoding of rendition, language, or typographical errors. This was not possible in P4, because markup cannot be included within an attribute value. In addition, the use of choice allows for multiple expanded readings (as in example 4 below).

Examples

Example 1 (P4)

<abbr expan="Her Royal Highness">HRH</abbr>

Example 2 (P4)

The same text encoded to place emphasis on the expanded reading:

<expan abbr="HRH">Her Royal Highness</expan>

Example 3 (P5)

<choice>
   <abbr>HRH</abbr>
   <expan>Her Royal Highness</expan>
</choice>

Example 4 (P5)

<choice>
   <abbr>HRH</abbr>
   <expan>Her Royal Highness</expan>
   <expan>His Royal Highness</expan>
</choice>
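
The P5 examples above can be processed in much the same way as the P4 ones. The following minimal Python sketch selects a reading from each choice element, including the case with more than one expansion. The function render_p5, its prefer parameter, the sample sentence, and the decision to join alternative expansions with " or " are all assumptions made for illustration. The TEI namespace is omitted for brevity, so a real P5 document would need namespace-aware lookups.

```python
import xml.etree.ElementTree as ET

# Hypothetical sentence containing the multiple-expansion example.
SAMPLE = (
    "<p>A toast to "
    "<choice><abbr>HRH</abbr>"
    "<expan>Her Royal Highness</expan>"
    "<expan>His Royal Highness</expan></choice>"
    ".</p>"
)

def render_p5(xml_text, prefer="abbr"):
    """Return plain text showing the preferred reading of each <choice>."""
    root = ET.fromstring(xml_text)

    def walk(elem):
        if elem.tag == "choice":
            abbr = elem.find("abbr")
            abbr_text = "".join(abbr.itertext()) if abbr is not None else ""
            expansions = ["".join(e.itertext()) for e in elem.findall("expan")]
            if prefer == "expan" and expansions:
                # Illustrative policy: show every alternative expansion.
                return " or ".join(expansions)
            return abbr_text
        # Any other element: keep its text and recurse into its children.
        out = elem.text or ""
        for child in elem:
            out += walk(child) + (child.tail or "")
        return out

    return walk(root)

print(render_p5(SAMPLE, prefer="abbr"))   # A toast to HRH.
print(render_p5(SAMPLE, prefer="expan"))
# A toast to Her Royal Highness or His Royal Highness.
```

A project might equally choose to display only the first expansion, or to present the alternatives in a note; the encoding leaves that decision to the processing stage.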