Brown University Women Writers Project

This document last updated Thursday, 15-Mar-2007 14:04:17 EDT

Document Analysis Form

Julia Flanders

This document is not for actual use by encoders; it serves as an illustration of the issues involved. It may also not be the most up-to-date version, which is probably still the Microsoft Word document.

What this is for

In order to be sure that we encode our texts consistently and address their complexities in a well-thought-out way, we do a basic preliminary analysis of each text before we start encoding it. This process helps identify difficult textual issues early and allows us to discuss and research them. It also helps the encoder conceptualize the structure of the text and the relationships between its parts, so as to tag it more accurately and consistently.

This form is a guide to help you think through the preliminary document analysis. When you are starting out, you will probably want to work through many of these questions in consultation with others; as you become more familiar with encoding issues, you will be able to take more individual responsibility for developing solutions. At any point, if you come across an interesting encoding problem or issue, or something you'd like feedback or clarification on before your presentation, please post to WWPTAG-L.

Basic Information

  • What is the OT #?
  • Who is the Author?
  • What is the Title?

Document Analysis

Basic Structure

Sketch the basic structure of the document (as a tree or a chart or in whatever way makes sense) on the back of the last sheet. Include at least the first three levels of division inside <body>, or more if the structure is complex. This is intended to flush out difficult hierarchical issues and help you develop an encoding strategy before you start encoding.

Issues to think about: What structural components form the basic divisions of the text? What are first-level <div>s, second-level <div>s, and so on? Do we already have a named element, or a type attribute for <div>, that describes them or are they completely anomalous? Do similar <div>s always appear at the same level of the hierarchy, or are they sometimes nested at a different level?
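Once a rough transcription exists, the paper sketch can be cross-checked mechanically. The following is a minimal Python sketch, not part of WWP practice: it assumes a draft with un-namespaced <body> and <div> elements, and the sample text and its type values (book, chapter, section, poem) are invented for illustration. It lists the first three levels of <div> nesting and records the levels at which each type of <div> occurs.

```python
import xml.etree.ElementTree as ET

# Hypothetical draft; the element layout and type values are invented.
SAMPLE = """<text>
  <body>
    <div type="book">
      <div type="chapter">
        <div type="section"/>
        <div type="poem"/>
      </div>
      <div type="chapter"/>
    </div>
    <div type="poem"/>
  </body>
</text>"""


def find_body(xml_text):
    """Parse the draft and return its <body> element."""
    root = ET.fromstring(xml_text)
    body = root if root.tag == "body" else root.find(".//body")
    if body is None:
        raise ValueError("no <body> element found")
    return body


def iter_divs(element, level=1):
    """Yield (level, type) for each <div> that is a child of element or of another <div>.

    A <div> nested inside some other kind of element is not followed.
    """
    for child in element:
        if child.tag == "div":
            yield level, child.get("type", "untyped")
            yield from iter_divs(child, level + 1)


def outline(xml_text, max_level=3):
    """Return one indented line per <div>, down to max_level levels."""
    body = find_body(xml_text)
    return ["  " * (level - 1) + kind
            for level, kind in iter_divs(body) if level <= max_level]


def levels_by_type(xml_text):
    """Map each <div> type to the set of levels at which it occurs."""
    levels = {}
    for level, kind in iter_divs(find_body(xml_text)):
        levels.setdefault(kind, set()).add(level)
    return levels


if __name__ == "__main__":
    print("\n".join(outline(SAMPLE)))
    for kind, found in sorted(levels_by_type(SAMPLE).items()):
        print(kind, sorted(found))
```

In this invented sample, poem occurs at both level 1 and level 3, which is the kind of inconsistency the questions above are meant to surface.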

In the case of poetry, analyse the line groupings carefully and decide which attributes of <lg> will be appropriate. You may need to refer to Paul's document about Line Groups for a full list of possible line groups and how to encode them.

Genre

Identify the genre or genres to which your document belongs (circle all that apply):

  • Biography/Memoir,
  • Conduct Books/Domestic Manuals,
  • Culinary,
  • Diary/Autobiography,
  • Drama,
  • Essay/Nonfiction,
  • Letters,
  • Martyrology,
  • Medical/Scientific,
  • Miscellany,
  • Novel/Fiction,
  • Poetry,
  • Political,
  • Religious,
  • Travel/Foreign,
  • Women's Issues/Gender,
  • Other:
    If none of the literary genres above seems appropriate, identify a genre or
    genres that fit the work:_________.

Physical bibliographic issues

Does the document have pages missing from the source text, illegibility (either in the photocopy or in the source text), or damage? Is it a complete book, xeroxed from start to finish, or is it part of some larger work whose structure may need to be taken into account? Assess the extent and cause of any damage or illegibility and the appropriate treatment.

Check the pagination and collation (the sequence of signatures, as indicated by the signature marks at the bottom of the pages) for accuracy and for missing sections. If the text was published before 1750 (roughly; check with John), you will need to encode a complete collation, including the title page. It may help to do the collation on paper here first (check with John if you have any questions). If page numbers are missing or out of order, check whether the flow of the text is continuous (indicating an error in the page numbering) or discontinuous (indicating an error in the printing, binding, or xeroxing).
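The first pass over the page numbers can be sketched in Python as below. This is an illustrative aid only, assuming every page carries an arabic number and that the numbers have been typed in the order the pages appear in the photocopy; the function name pagination_breaks and the sample list are hypothetical. It flags where the numbering jumps or goes backwards, and the encoder must still read across each flagged break to decide whether the text is continuous.

```python
def pagination_breaks(page_numbers):
    """Return (index, previous, found) for every page whose printed number
    is not exactly one more than the number on the page before it.

    index is the zero-based position of the offending page in the list.
    """
    breaks = []
    for index in range(1, len(page_numbers)):
        previous = page_numbers[index - 1]
        found = page_numbers[index]
        if found != previous + 1:
            breaks.append((index, previous, found))
    return breaks


if __name__ == "__main__":
    # Hypothetical sequence: the numbering jumps from 4 to 7.
    printed = [1, 2, 3, 4, 7, 8, 9]
    for index, previous, found in pagination_breaks(printed):
        print(f"after page {previous}: expected {previous + 1}, found {found}")
```

A single misnumbered page shows up as two adjacent breaks (one into the wrong number and one out of it), whereas missing pages usually show up as one jump.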

Title Page

Think about how to encode the various parts of the title page, particularly for Renaissance texts. Issues to consider: how is the title itself divided? What other information is there, and how should it be encoded? Consider the content even more than the typography as an indication of how to encode it.

Textual Features

Linking and cross-referencing: Does the document contain footnotes, endnotes, side notes (i.e. marginal notes), errata lists, subscriber lists, a table of contents, an index, internal cross-references, or other referencing mechanisms? Can these be accommodated using ordinary WWP methods, or do they pose any special challenges?

Characters

Does the document contain unfamiliar characters? Does it contain any characters or abbreviations that will require expansion (e.g. macrons)? You may need to consult Jacque Russom about how these should be expanded.

Handwritten additions

If the document contains any handwriting, you will need to assess whose handwriting it is, if possible, and whether it needs to be encoded. If it does, then you will also need to assess whether it poses any structural problems. Does it span several elements? How will its position be indicated? Is it legible? Does it contain cross-outs or other additional complexities? Does it use any characters which will need special treatment or scholarly interpretation (for instance, letters which might be either capitals or lower case; contractions; marks which are not letters)?

Other Idiosyncrasies

Note any other features of the text that may need special treatment or further research.
