Punctuation: general

Abstract

Transcription of punctuation, including treatment of hard and soft hyphens

transcription punctuation hyphenation soft hyphen delimiter

In general, the WWP transcribes punctuation using standard keyboard characters. However, where punctuation serves as a delimiter that might change when the display changes, for instance the punctuation following a speaker’s name in a dramatic text, it is encoded using an entity reference within a rend attribute:

<speaker rend="post(&colon;)">Hamlet</speaker>

Hard hyphens are encoded using the standard keyboard character, and soft hyphens using the standard ISO entity reference &shy;. Hard hyphens are those which remain in the text regardless of where the line breaks occur, as for instance in hyphenated names such as DeBoer-Langworthy. Soft hyphens are those which result from line breaks, and which would disappear if the text were relineated, as for instance in the word “hy-phen” if it were broken across two lines. Where it is difficult to tell whether a line-end hyphen is soft or hard, the encoder will need to look through the rest of the text for other instances of the word in question. If it seems more likely to be a hard hyphen, it should be encoded as

<unknown desc="&shy;">-</unknown>

whereas if it is more likely to be a soft hyphen, it should be encoded as

<unknown desc="-?">&shy;</unknown>

Hyphens in catchwords should always be encoded as hard hyphens, since a catchword is never involved in relineation (not being part of a line), and so will never need to be unhyphenated.
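The practical effect of the hard/soft distinction shows up when a text is relineated. The following is a minimal sketch in Python, not part of any WWP tooling: it assumes that the &shy; entity has already been expanded to the Unicode soft hyphen character (U+00AD), and the function name relineate is hypothetical.

```python
SOFT_HYPHEN = "\u00ad"  # assumption: the character that &shy; expands to


def relineate(lines):
    """Join transcribed lines into one run of text (hypothetical helper).

    A soft hyphen at the end of a line is dropped and the word rejoined.
    A hard hyphen at the end of a line is kept, with no space added.
    Any other line break becomes a single space.
    """
    parts = []
    for line in lines:
        line = line.strip()  # strip() removes whitespace only, not U+00AD
        if line.endswith(SOFT_HYPHEN):
            parts.append(line[:-1])   # soft hyphen disappears
        elif line.endswith("-"):
            parts.append(line)        # hard hyphen survives relineation
        else:
            parts.append(line + " ")  # ordinary break between words
    return "".join(parts).strip()


lines = ["a hy" + SOFT_HYPHEN, "phen from DeBoer-", "Langworthy"]
print(relineate(lines))  # a hyphen from DeBoer-Langworthy
```

The soft hyphen in “hy-phen” vanishes once the line break is removed, while the hard hyphen in DeBoer-Langworthy is preserved even though it happened to fall at a line end.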

The WWP does not attempt to distinguish the different linguistic functions of the various marks of punctuation (for instance, periods as abbreviation marks versus sentence delimiters).

The WWP ignores instances where punctuation appears in a different font or size from the surrounding text. We do not add highlighting to a mark of punctuation merely because its position inside or outside an element would give it a font different from the one indicated in the OT.