Unclear text
Abstract
Handling damaged, unclear, or illegible text, including missing or deleted letters, damage to the original, or unclarity in the reproduction, using sic, del, unclear, supplied, and gap
Encoding Instructions (old P4 version)
The WWP uses the elements specified in Chapter 18 of TEI to encode damaged, unclear, or illegible text. Most of these elements can be nested inside others, in cases where text moves through varying degrees of illegibility, or where deliberate deletions and other forms of obscurity overlap. The possibilities are too many to enumerate here, but when you come upon cases that seem to require them, you can use your own judgement about what is appropriate, based on the role of each of the following individual elements.
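As a rough illustration of nesting, the fragment below marks one conjectural letter inside a deliberate, still legible deletion. The sentence and the choice of unclear letter are invented for illustration, and the reason value is one of those defined for unclear later in this entry; treat it as a sketch rather than a prescribed pattern.

```xml
<!-- Hypothetical case: the word is crossed out but legible, except for one
     letter that the cancellation stroke has partly obscured. -->
She was such a nasty <del>ungov<unclear reason="obscured">e</unclear>rnable</del> prophet.
```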
sic
Used to indicate printing problems or errors in the original text. It is wrapped around the entire word, rather than just the letter in question. This includes uninked letters and omitted letters. To use sic, you need to be fairly certain that the problem lies in the original. This is quite likely if you see a missing letter and the letters around it are fairly dark (so you know it’s not just the xerox quality); if you are working with a very faint xerox and it’s not certain whether the original or the reproduction is at fault, use one of the other elements below.
Example: He wh m you seek is not here.
He <choice> <sic>wh␣m</sic> <corr>whom</corr> </choice> you seek is not here.
Note the use of the ␣ to explicitly mark the space left between letters.
del
Used to indicate deliberately deleted text which is still legible. Since this element implies an assumption of human involvement, it should only be used when you are fairly certain that the deletion is deliberate and committed by human agency. If there’s just a big blot on the page it would be better to use one of the elements below. If text has also been added to replace the deletion, use add for the added text. If parts of the deletion are unreadable, these can be additionally tagged with gap, with reason="deleted" (see below).
Example: She was such a nasty ungovernable prophet [“ungovernable” crossed out but legible]
She was such a nasty <del>ungovernable</del> prophet.
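The two further cases mentioned above (a deletion replaced by added text, and a deletion that is partly unreadable) might be encoded along the following lines. The replacement word and the extent of the unreadable portion are invented for illustration.

```xml
<!-- Deletion with replacement text: the added word "unruly" is hypothetical. -->
She was such a nasty <del>ungovernable</del><add>unruly</add> prophet.

<!-- Deliberate deletion whose opening letters can no longer be read. -->
She was such a nasty <del><gap reason="deleted"/>governable</del> prophet.
```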
unclear
Used to indicate that a passage of text is partially illegible, and that the tagged text is conjectural; signals the reader to regard the information provided as somewhat uncertain. You should use this where you can be reasonably sure your conjecture is accurate. If you really cannot be sure, use one of the elements below. For this element, we will use the reason attribute but not the cert attribute.
Example (bracketed letter is unclear; not sure whether it’s “these” or “those”):
When she spoke to th[ ]se prophets...
When she spoke to th<unclear reason="flawed-reproduction">o</unclear>se prophets...
(The choice of reading would be a best guess based on context.)
Values for reason on unclear:
"damaged": for cases where the original page has been damaged in some way (torn, folded, creased)
"obscured": for cases where the page is intact, but the text is obscured or unclear for some reason having to do with the original text (partial deletion, stain on the original page, poorly inked type)
"flawed-reproduction": for cases where the reproduction causes unclarity but we have reason to believe that the original is still fully legible (unclear gutters, edge cut off by xeroxing or filming, darkening which results in a black fog on the page [microfilm or xerox underexposure], an object superimposed on the original when filmed or xeroxed. We should assume that problems lie with the reproduction unless we are fairly sure they are problems with the original.
supplied
Used to indicate that a passage of text is completely illegible, and that the tagged text is supplied by the editor or transcriber, based either on pure supposition or on some other source. If the text comes from another source (e.g. another copy of the text), this can be indicated using the source attribute. In our case, we would only indicate a separate source if we used evidence from a different copy of the source (i.e. a copy from a different library, not a different reproduction of the same copy). If we check a transcription against the source text, any readings from the source text can be added silently without using supplied.
The values for reason on supplied are more limited than those for gap, because we are supplying text whose accuracy we’re reasonably sure of, so the reader has less of a need for information about the problem. In the case of gap, our explanation is taking the place of what the reader really wants, so it needs to be more informative. The values for supplied are thus designed to let the reader know whether he/she can expect to be able to check the reading by consulting the original or another reproduction of it (in the case of “illegible” and “flawed-reproduction”) or whether the original itself is compromised.
The value “illegible” is used instead of “obscured” (which is used for unclear) because in the case of unclear the text is not fully illegible; “obscured” is intended to indicate that there is a diminished degree of visibility and confidence in the reading, while “illegible” indicates that the text cannot be read at all (hence text is being supplied from elsewhere or upon supposition).
Values:
"damaged": use this value where the physical text is damaged (torn, folded, burned)
"illegible": use this value where the page is intact but the original text is illegible for some reason other than damage (e.g. page is illegibly stained, letter is uninked, a bug was squashed on the page)
"flawed-reproduction": use this value where the text is illegible because of some problem with the reproduction technology (blurring, gutter problems, edge of page not copied)
gap
Used to indicate that a passage of text is completely illegible or omitted, and that no conjecture is being made about the omitted material. This is an empty element. It could be used where text is illegibly deleted or obscured, where pages are torn out or cut off by the xerox, or even where they are folded under or creased. In cases where our reproduction is at fault, we would use gap to indicate places which need to be checked, and would try to get a better reproduction which would eliminate the illegibility. See the entry on gap for the appropriate attribute values.
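For comparison with supplied, a gap records the loss without offering a reading. In the sketch below, reason="deleted" is the value mentioned under del above; the second value is an assumption made by analogy with the lists for unclear and supplied, so check the entry on gap for the values actually permitted.

```xml
<!-- Word deleted so heavily that it cannot be read; no conjecture is made. -->
She was such a nasty <gap reason="deleted"/> prophet.

<!-- End of the line cut off by the xerox; flags a place to recheck against a
     better reproduction. The reason value here is assumed, not confirmed. -->
When she spoke to those <gap reason="flawed-reproduction"/>
```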