Encoding Guide for Early Printed Books


Classification and Naming

As indicated in the section on dividing the text, classifying and naming textual structures are important functions of the encoding. If naming is the basic identification of specific features in the text, classification is the process of grouping those features more carefully by type and assigning a controlled vocabulary to the resulting classes, using terms that represent our understanding of the relevant parts of the text.

Names might simply reflect the terms used by the text itself (A Letter to the Reader, An Epistle to the Reader, Address to the Reader). Naming the divisions in this way provides some descriptive information: it could be used, for instance, to create a list of all the different kinds of sections you had named in the text. But to the extent that it is informal and unconstrained, it does not provide much analytical power. For instance, it gives no way of indicating that all three of these sections serve a very similar function in the overall structure of the text. It also allows for inconsistency and the proliferation of meaningless distinctions, since the values epistle, epistolary, and Epistle are all equally permitted.
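As a rough illustration of the problem, the fragment below sketches how unconstrained naming might look in TEI. The type values are hypothetical, and the three divisions are gathered into a single front element only for compactness; in practice they would more likely come from different books.

```xml
<front>
  <!-- Each value was chosen ad hoc from the heading's own wording -->
  <div type="letter">
    <head>A Letter to the Reader</head>
    <p><!-- text omitted --></p>
  </div>
  <div type="epistle">
    <head>An Epistle to the Reader</head>
    <p><!-- text omitted --></p>
  </div>
  <!-- Differs from "epistle" only in capitalization, yet counts as a separate value -->
  <div type="Epistle">
    <head>Address to the Reader</head>
    <p><!-- text omitted --></p>
  </div>
</front>
```

A search for divisions with the type epistle would find only one of these three sections, even though all of them play much the same role.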

Classification involves deciding on a formal naming convention for the structural elements in your text, based on whatever scheme is most appropriate given the nature of your interest in the text. You might decide to use terms from a standard genre thesaurus, or those used by another project with which you will be sharing materials; you might decide to use a regularized version of the terms used by the text itself (eliminating near-duplicates such as plurals or spelling variations); or you might create a taxonomy of the categories that are significant in your analysis.
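One way this might look in practice is sketched below. The value prefatoryEpistle is a hypothetical term invented for illustration, not one prescribed by the TEI; the point is only that a single agreed value records the shared function of the sections, while the head elements preserve the wording of the text itself.

```xml
<front>
  <!-- "prefatoryEpistle" is a hypothetical controlled value -->
  <div type="prefatoryEpistle">
    <head>A Letter to the Reader</head>
    <p><!-- text omitted --></p>
  </div>
  <div type="prefatoryEpistle">
    <head>An Epistle to the Reader</head>
    <p><!-- text omitted --></p>
  </div>
  <div type="prefatoryEpistle">
    <head>Address to the Reader</head>
    <p><!-- text omitted --></p>
  </div>
</front>
```

Under this assumption, all three sections can be retrieved together by their type, and the differences in their headings remain available for anyone who wants to study them.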

The resulting controlled vocabulary of values for the type attribute of div has several very important functions. First, it can be added to your schema (using the TEI customization mechanism). This makes it possible to constrain your encoding so that only permitted values are used, eliminating inconsistencies and casual errors. It also makes it possible to document the meaning and range of application of each value, so that your encoders or colleagues can use the values consistently, and so that collaborators can identify commonalities and understand the meaning of your classification. Finally, it can help support the interface you may build for readers, for instance by providing a list of search terms rather than leaving the reader to guess what kinds of genres your texts contain.
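A minimal sketch of such a customization, written as a fragment of a TEI ODD file, might look like the following. It assumes the fragment sits inside the schemaSpec of your project's ODD, and the three values and their descriptions are hypothetical examples rather than a recommended vocabulary.

```xml
<elementSpec ident="div" module="textstructure" mode="change">
  <attList>
    <attDef ident="type" mode="change">
      <!-- A closed list: only the values below are permitted -->
      <valList type="closed" mode="replace">
        <valItem ident="prefatoryEpistle">
          <desc>A prefatory address to the reader, whatever the text itself calls it (letter, epistle, address).</desc>
        </valItem>
        <valItem ident="dedication">
          <desc>A dedication to a patron or other named person.</desc>
        </valItem>
        <valItem ident="chapter">
          <desc>A numbered or titled chapter of the main text.</desc>
        </valItem>
      </valList>
    </attDef>
  </attList>
</elementSpec>
```

In a schema generated from an ODD along these lines, a stray value such as Epistle would be reported as invalid, and the desc elements would carry the definitions into the project's generated documentation.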

This means that a very important task, early in project planning, is to examine the texts to be encoded and decide what the significant subdivisions are and what terminology you will use to describe them. These decisions can and should be revisited as you progress, since you may well encounter unforeseen textual structures.