Encoding of quotations, distinction between use of said and quote, treatment of quotation marks
The WWP uses <said> to encode direct speech and reported thought. We use <quote> to encode material that a passage of text identifies as originating outside itself (regardless of where the material actually originates), or that a speaker within the text identifies as originating outside his or her current utterance, including proverbs, mottoes, and sayings. <quote> can carry a cit attribute if the source of the quotation is known.
The WWP does not use the direct, who, or type attributes on <said>.
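The distinction above might be sketched with a few fragments. The sentences are invented for illustration and are not drawn from the corpus; the handling of quotation marks and the form of the cit value are not shown, since this section does not specify them.

```xml
<!-- Direct speech: <said> with no direct, who, or type attributes -->
<p>Then the captain spoke: <said>I have seen this coast before.</said></p>

<!-- Reported thought is also encoded with <said> -->
<p><said>I must write to her tonight,</said> thought Maria.</p>

<!-- A proverb the passage presents as coming from elsewhere: <quote> -->
<p>He reminded them that <quote>a stitch in time saves nine</quote>.</p>

<!-- A speaker citing material from outside her current utterance -->
<p><said>My mother always told me, <quote>waste not, want not</quote>.</said></p>
```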
When it is convenient to break a single quotation into multiple XML elements (to avoid overlap with other XML elements, e.g. verse lines), we give each segment an xml:id and link the segments with the next and prev attributes. We use next and prev only where a <quote> or <said> element is artificially broken to avoid overlap, not where the quotation is interrupted by the text itself (for instance, by “she said” or other interventions).
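As a minimal sketch of this linking, assuming the standard TEI pointer syntax for next and prev: the verse, the xml:id values (s01a, s01b), and the use of <lg> and <l> for the verse lines are hypothetical choices made for illustration.

```xml
<!-- One utterance split across two verse lines: the segments are linked -->
<lg>
  <l>She answered him, <said xml:id="s01a" next="#s01b">I will not go</said></l>
  <l><said xml:id="s01b" prev="#s01a">until the morning light.</said></l>
</lg>

<!-- Utterance interrupted by the text itself: no next or prev -->
<p><said>I will not go,</said> she said, <said>until the morning light.</said></p>
```

In the first fragment the break is imposed by the line structure, so the two <said> elements point at each other. In the second the interruption belongs to the text, so the two elements stand alone.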
If a text quotes itself, or if a character within a text quotes material from elsewhere within the text (for instance, a poem by another character), the same rules apply: the quoted material comes from outside the current utterance, or outside the current passage, and hence is encoded with <quote>. If it is also direct speech, it should be encoded with <said> as well.
Quoted speech in our corpus may be marked in a number of ways, or may even be left unmarked. In some cases this makes it difficult to be certain where a given quotation begins and ends. In addition, the conventions for signalling direct and indirect speech have changed over the centuries and our corpus contains transitional forms which may be hard to assign to one category or the other. Our strategy for deciding what instances of quoted speech to encode can be summed up as follows:
1. Encode all quoted speech which is renditionally distinct, regardless of whether it is direct or indirect speech. Rendition in this case includes the use of quotation marks, as well as the use of distinctive fonts (all caps, small caps, italics, black letter).
2. Also encode all instances of direct speech, whether renditionally distinct or not. Direct speech here means any speech which occurs in the first person singular or plural.
These two conditions can be expressed in a little table:
           INDIRECT        DIRECT
REND       encode          encode
NO REND    don’t encode    encode
3. Thus, examples in which the only indication of a quotation is a phrase such as “she said”, without any renditional mark or any other clue in the text (such as a shift to the first person), should not be encoded using <said>. For example, in the sentence “She said that she would never taunt the chicken again” no <said> would be necessary.
4. Where we are not sure exactly where the quoted material begins or ends, we encode the minimum stretch of text that we are certain is quoted. The rationale is that readers who want to find all instances of quoted speech in order to compare them with verbal patterns in non-quoted speech need to be sure that everything tagged with <said> is quoted material. Readers who want to locate all instances of quoted speech in order to examine them by eye can still find these minimum-extent quotations and then decide whether additional material should be considered. Since the extent of most quoted material is not in doubt, searching and similar functions will on the whole be supported.
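The rules above might be applied as in the following fragments. Only the first sentence comes from this section; the other two are invented variations on it, and renditionally distinct cases are left out because the encoding of rendition is not described here.

```xml
<!-- Indirect speech, not renditionally distinct: no <said> (rule 3) -->
<p>She said that she would never taunt the chicken again.</p>

<!-- Direct speech in the first person, not renditionally distinct:
     encoded with <said> (rule 2) -->
<p>She said, <said>I will never taunt the chicken again</said>.</p>

<!-- Uncertain extent: only the clause that is certainly quoted is
     encoded; the trailing clause is left outside <said> (rule 4) -->
<p>She vowed, <said>I will never taunt the chicken again</said>, nor
would the others dare.</p>
```

In the last fragment the final clause could be read as either speech or narration, so it is left unencoded, and a reader who finds the <said> element can judge the surrounding text for themselves.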