<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="../stylesheets/yaps-tei.css"?>
<?oxygen RNGSchema="../schema/yaps.rnc" type="compact"?>
<?oxygen SCHSchema="../schema/yaps.sch"?>
<TEI xmlns="http://www.wwp.brown.edu/ns/yaps/1.0" xmlns:xi="http://www.w3.org/2001/XInclude">
  <teiHeader>
    <fileDesc>
      <titleStmt>
	<title>Advanced Markup Concepts</title>
	<author xml:id="JF">Julia Flanders</author>
      </titleStmt>
      <editionStmt>
        <edition>Texas A &amp; M University</edition>
      </editionStmt>
      <publicationStmt>
        <distributor>Women Writers Project (via website)</distributor>
        <address>
          <addrLine>wwp@Brown.edu</addrLine>
        </address>
        <date when="2009-04-18"/>
        <availability status="restricted">
          <p>Copyright 2007 Syd Bauman, Julia Flanders, and Brown WWP</p>
	  <p>This TEI-encoded XML file is available under the terms of
	  the <ref target="http://creativecommons.org/licenses/by-sa/3.0/">Creative
	  Commons Attribution-ShareAlike 3.0 (Unported)</ref>
	  license.</p>
        </availability>
        <pubPlace>Providence, RI  USA</pubPlace>
      </publicationStmt>
      <sourceDesc>
	<p>This is the source.</p>
      </sourceDesc>
    </fileDesc>
    <revisionDesc>
      <change when="2009-02-02" who="#JF">created file from editorial_markup and markup_challenges</change>
    </revisionDesc>
  </teiHeader>
 <text>
  <presentation>
      <section>
          <head>Document Viewed as a Tree...</head>
          
          <slide>
              <figure>
                  <graphic url="../gfx/document_tree.png"/>
              </figure>
          </slide>
          <lectureNote>
              <p>So far we've been dealing with a fairly clean and straightforward view of the document, which we can express more or less as a tree structure.</p>
          </lectureNote>
      </section>
      <section>
          <head>...plus some other stuff</head>
          
          <slide>
              <figure>
                  <graphic url="../gfx/document_tree_complex.png"/>
              </figure>
          </slide>
          <lectureNote>
              <p>In the real world, however, documents are not really like trees.</p>
              <p>Or rather, they are like real trees rather than like mathematical trees: overhung with vines, spiderwebs, tree houses, all sorts of other stuff that connects their branches.</p>
              <p>Gloss the diagram...</p>
          </lectureNote>
      </section>
   <section>
    <head>Parallel texts</head>
   
    <slide>
    <figure>
     <graphic width="100%" url="../gfx/parallel_texts.png"/>
    </figure>
    </slide>
    <lectureNote>
    <p>Parallel structures are one of the most important and common (and interesting) document features.</p>
     
     <p>They are very important for what scholars do: in a sense, what scholars do when they work with a text is explore and
      express its plurality of meaning, of textual possibility. This is true
      whether they are acting as editors or as critics: whether they're
      preparing a new version of the text for publication, or creating a version
      for their own interpretive use. </p>
        <p>But they are also more generally useful: to show alignment or comparison.</p>
     <p> First, some large-scale examples of this kind of parallelism: <list>
      
      <item>representing a translation</item>
     </list>
     </p>
     <p> The functional goal in creating these kinds of parallel structures is
      to be able to let the reader use the parallel texts to make comparisons,
      see similarities, read the translation against the original, search in one
      language and get results in the other language, etc. </p>
     <p> From the markup standpoint, the essential thing here is to be able to
      represent the alignment of the two texts, but the question of granularity
      (of how fine-grained the alignment is) depends on what kinds of functional
      goals you have. </p>
     <p> We're not going to cover the mechanisms in detail, because they're
      complex and tricky, but I want to show you the concept so that if you're
      thinking about doing this you know the fundamentals of what's involved.</p>
     
     
     <p>Start by showing the alignment issues: <list>
       <item>coarse-grained (text level, stanza level): we can often infer the
        finer-grained alignment automatically (e.g. in verse where both texts
        have the same number of lines)</item>
       <item>fine-grained (line level, potentially below): used in cases where more
        specific alignment is needed</item>
       <item>marking specific alignment on each element doesn't scale up; instead, we can use an out-of-line
        approach, in which the links are recorded separately from the text they connect</item>
      </list>
     </p>
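     <p>As a minimal sketch of the coarse-grained option, the alignment can be recorded once per stanza rather than once per line, leaving any line-level correspondence to be inferred by position. The identifiers fr-st1 and en-st1 are hypothetical, and the line content is left as a placeholder:</p>
     <eg><![CDATA[<lg type="stanza" xml:lang="fr" xml:id="fr-st1" corresp="#en-st1">
  <l>...</l>
  <l>...</l>
  <!-- the lines carry no pointers of their own -->
</lg>

<lg type="stanza" xml:lang="en" xml:id="en-st1" corresp="#fr-st1">
  <l>...</l>
  <l>...</l>
  <!-- line 1 is assumed to match line 1, and so on -->
</lg>]]></eg>
     <p>This inference only holds where the two texts keep their lines in the same order; where a translation reorders lines, line-level pointers are needed.</p>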


    </lectureNote>
   </section>
   <section>
    <head>Encoding parallel structures</head>
    <slide>
     <figure>
      <graphic height="100%" url="../gfx/parallel_short.png"/>
     </figure>
     <eg><![CDATA[<lg type="stanza" xml:lang="fr">
  <l xml:id="fr2.01" corresp="#en2.01">Nos péchés sont têtus, nos repentirs sont lâches;</l>
  <l xml:id="fr2.02" corresp="#en2.02">Nous nous faisons payer grassement nos aveux,</l>
  <l xml:id="fr2.03" corresp="#en2.04">Et nous rentrons gaiement dans le chemin bourbeux,</l>
  <l xml:id="fr2.04" corresp="#en2.03">Croyant par de vils pleurs laver toutes nos taches.</l>
</lg>

<lg type="stanza" xml:lang="en">
  <l xml:id="en2.01" corresp="#fr2.01">Our sins are stubborn, craven our repentance.</l>
  <l xml:id="en2.02" corresp="#fr2.02">For our weak vows we ask excessive prices.</l>
  <l xml:id="en2.03" corresp="#fr2.04">Trusting our tears will wash away the sentence,</l>
  <l xml:id="en2.04" corresp="#fr2.03">We sneak off where the muddy road entices.</l>
</lg>]]></eg>
    </slide>
    <lectureNote>
     <p>Explain linking and pointers</p>
    </lectureNote>
   </section>
   <section>
    <head>More complex parallelism</head>
    <slide>
     <eg><![CDATA[<linkGrp type="alignment">
  <link targets="#fr2.01 #en-a2.01 #en-b2.01 #en-c2.01 #en-d2.01"/>
  <link targets="#fr2.02 #en-a2.02 #en-b2.02 #en-c2.02 #en-d2.02"/>
  <link targets="#fr2.03 #en-a2.03 #en-b2.03 #en-c2.04 #en-d2.03"/>
  <link targets="#fr2.04 #en-a2.04 #en-b2.04 #en-c2.03 #en-d2.04"/>
</linkGrp>]]></eg>
     <figure>
      <graphic url="../gfx/parallel_linkgrp.png" width="100%"/>
     </figure>
    </slide>
   </section>
   <section>
    <head>Textual splitting: parallelism at a more local level </head>
       <slide>
           <figure>
               <graphic url="../gfx/text_stream.png"/>
           </figure>
       </slide>
       <lectureNote>
           <p>Parallelism also exists, and may call for representation, at a more local level.</p>
           <p>In these cases it's more like a temporary forking or divergence in the text: a case of multiple possibilities.</p>
       </lectureNote>
   </section>
      
      <section>
          <head>Example</head>
    <slide>
     <figure>
      <graphic height="400px" url="../gfx/askew_sample_small.png"/>
     </figure>
     <eg><![CDATA[<p>...with them, bycause they woulde
  <lb/>not be 
    <choice>
       <abbr>boūde</abbr>
       <expan>bounde</expan>
    </choice> 
  also for an other wo]]><hi rend="css( font-style: italic; font-weight: normal; )">[see below]</hi><![CDATA[
  <lb/>mā at theyr pleasure, whom they
  <lb/>knewe not, nor yet what matter
  <lb/>was layed unto her charge. Not
  <lb/>wythstandynge at the laste, after
  <lb/>moche a do and reasonyng to and
  <lb/>fro, they toke a bonde of them of
  <lb/>recognisaunce for my fourth com
  <lb/>mynge. And thus I was at the
  <lb/>last, 
    <choice>
      <orig>delyuered</orig>
      <reg>delyvered</reg>
    </choice>. 
  Written by me An
  <lb/>ne Askewe.
</p>]]></eg>
     <eg><![CDATA[<choice>
  <abbr>
    <choice>
      <sic>wo<lb/>mā</sic>
      <corr>wo-<lb/>mā</corr>
    </choice>
  </abbr>
  <expan>
    <choice>
      <sic>wo<lb/>man</sic>
      <corr>wo-<lb/>man</corr>
    </choice>
  </expan>
</choice>]]></eg>
    </slide>
    <lectureNote>
<p>In these next examples, we're still looking at parallelism, but instead of managing it through a linking mechanism, we're managing it in a different way: through an enclosing element.</p>
        <p>These examples don't actually violate the ideal document tree view.</p>
     <p>This approach is useful for smaller and more local examples of parallel text. There are a number of kinds of local editorial changes that are often
      made in the process of transcription and editing: processes of
      regularization and correction that are often done silently and noted in an
      introduction: <list>
       <item>correction of typographical errors in the source</item>
       <item>regularization or modernization of spelling and typography</item>
       <item>expansion of abbreviations</item>
      </list>
     </p>
     <p> In print-based editing, these choices are exclusionary: whichever kind
      of reading you decide to show the reader, its complementary version has to
      be suppressed (it could be indicated in a note or an appendix but it can't
      typically be displayed as part of the regular reading surface). </p>
     <p>In an XML transcription, however, it's possible to represent both (or in
      principle multiple) readings in a data structure that shows their
      parallelism and treats them as alternatives, which can then be chosen
      (displayed, searched, etc.) when desired. </p>
     <p> In TEI, this mechanism is the <gi>choice</gi> element, which represents
      a moment of textual forking, where instead of a single reading the text
      offers a choice of readings. </p>
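     <p>As a minimal sketch, the correction of a typographical error might be encoded as follows. The sentence is invented for illustration and is not from the Askew text; <gi>sic</gi> holds the reading as it appears in the source and <gi>corr</gi> holds the editor's correction:</p>
     <eg><![CDATA[<p>I have 
  <choice>
    <sic>recieved</sic>
    <corr>received</corr>
  </choice> 
  your letter.</p>]]></eg>
     <p>Because both readings are present, a display can show either one, and a search can match either spelling.</p>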
    </lectureNote>
   </section>
    
   <section><head>Notes and Cross-References</head>
    <slide>
      
        <eg><![CDATA[<p>The <name xml:id="anchor01">Nopal</name>, or Prickly 
Pear, which you may observe in the Mexican coat of arms, is a 
very interesting and valuable production of Mexico (see 
<ref target="#ch4">chapter 4</ref> below). In some districts 
of the upper country, it grows in great abundance, and forms, 
in places impenetrable thickets, higher than a man on horseback. 
This plant produces an immense quantity of fruit, which, together 
with the young leaves, furnishes food for vast herds of cattle 
and wild horses. On this account, the Mexicans, when selecting 
land for a stock farm, always choose that which has a good proportion 
of the Nopal.</p>
      ...
<note type="editorial" target="#anchor01">Opuntia ficus-indica</note>]]></eg>
    </slide>
    <lectureNote>
        <p>Another important form of structural complexity: a hypertextual sprout or fork or jump in the textual stream
        <list>
            <item>For example, a footnote: which sprouts off from the text at a certain point</item>
            <item>Or an endnote, where there's in effect a cross-reference from a place in the text to a subsequent explanation</item>
        </list>
        
        </p>
     <p>In TEI, all types of annotation are encoded using the <gi>note</gi> element.
       Notes can be classified to indicate responsibility and to indicate
        what kind of note they are (using any classification system that seems useful:
        e.g. annotation, correction, hypothesis, context, gloss, etc.).
     </p> 
        <p>Cross-references and other kinds of references are encoded with <gi>ref</gi>; NB that the target can point inside or outside the document.</p>
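        <p>A minimal sketch of both points, assuming a hypothetical editor identified as ed1 and a hypothetical external file at www.example.org (neither comes from the source text): the note is classified with <att>type</att> and credited with <att>resp</att>, and the two <gi>ref</gi> elements point inside and outside the document respectively.</p>
        <eg><![CDATA[<note type="gloss" resp="#ed1" target="#anchor01">Opuntia ficus-indica</note>

<!-- points to an element elsewhere in the same document -->
<ref target="#ch4">chapter 4</ref>

<!-- points to an element in another document (hypothetical URL) -->
<ref target="http://www.example.org/flora.xml#opuntia">the entry on Opuntia</ref>]]></eg>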
    </lectureNote>
   </section>

      <section><head>Transcriptional complexities: revision</head>
          <slide>
              <figure>
                  <graphic width="100%" url="../gfx/ms_original.jpg"/>
              </figure>
              <eg><![CDATA[<lg>

  <head>After <subst><del>an</del><add>the</add></subst> 
    <del><add>unsolv'd</add></del> argument</head>

  <l><del>The</del><add><del>Coming in,</del> A group of</add> little children, and their
    <lb/>ways and chatter, flow in <del>upon me</del></l>

  <l>Like <add>welcome</add> rippling water o'er my
    <lb/>heated <add>nerves and</add> flesh.</l>
    
</lg>]]></eg>
             
              
          </slide>
          <lectureNote>
              <p>What's at stake here: because the transcription of manuscript materials (and often printed texts as well)
                  involves significant efforts of decipherment and in many cases conjecture
                  or interpretation, and also because primary sources are informationally complex
                  (authorial revision, erasures, missing letters, illegible passages, etc.),
                  a responsible transcription needs to capture not just the end product but
                  also information about the process and the editorial decision-making: not
                  just produce a clean-looking innocent butter-wouldn't-melt-in-its-mouth
                  transcription but preserve information about what was difficult or unclear.</p>
              <p>Conventions for accomplishing this are familiar from print: carets and
                  brackets for marking insertions and deletions, italics to indicate unclear
                  text, footnotes to indicate hypothetical readings or to describe damaged
                  sections.</p>
              <p> In text markup, the goal is to formalize as much of this information as
                  possible and represent it systematically <list>
                      <item>to classify the reasons for illegibility (where possible), to
                          formalize the rationales for determining whether a given letter is
                          illegible or simply unclear</item>
                      <item>with the goal of making it possible to control the display of the
                          reading surface of the text: to show or hide the deleted words and
                          hypothetical readings, perhaps even to let the reader control the
                          threshold of conjecture at which readings are displayed or hidden ("only
                          show me things you're really certain about")</item>
                  </list>
              </p>
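              <p>As a rough sketch of how such a threshold might be supported, each conjectural reading can carry a <att>cert</att> value that the display can test. The words and reasons below are invented for illustration:</p>
              <eg><![CDATA[<p>paid to Mr 
  <supplied reason="faded" cert="high">Brown</supplied> 
  the sum of 
  <supplied reason="torn" cert="low">forty</supplied> 
  pounds</p>]]></eg>
              <p>A display set to show only high-certainty readings would keep the first supplied word and hide the second.</p>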
              <p>Next: show basic encoding features: unclear, supplied, gap, add, del</p>
          </lectureNote>
      </section>
      <section>
          <head>Transcriptional complexities: difficult or impossible to read</head>
          <slide>
              <figure>
                  <graphic url="../gfx/ms_receipt.jpg" height="200px"/>
              </figure>
              <eg><![CDATA[<p>Johnston etc 1764 Mr Nikl<unclear>e</unclear><supplied>s</supplied>
  <gap reason="folded" extent="unknown"/> Brown <unclear>&amp;Co</unclear> to me 

<lb/>George <unclear>Beverly juner</unclear> to ten Rum Barels at Four 

<lb/>pound &per; Barel — — —  £40</p>]]></eg>
          </slide>
      </section>
      <section>
          <head>Editorial Nuance</head>
          <slide>
              <p>
                  <tag>add hand="#Walt_Whitman" place="supralinear"</tag></p><p>
                  <tag>del hand="#Walt_Whitman" rend="crossout"</tag>
              </p>
              <p>
                  <tag>gap reason="folded" extent="unknown" resp="#Julia_Flanders"</tag></p><p>
                  <tag>supplied reason="mildew" cert="high" evidence="internal" resp="#editor"</tag></p><p>
                  <tag>unclear reason="waterspots"</tag>
              </p>
              
          </slide>
      </section>

<section>
    <head>Overlap</head>
    <slide>
       
        <figure><graphic url="../gfx/document_tree_complex.png"/></figure></slide>
    <lectureNote> <p>Our next topic: structures that don't fit into the tree at all...</p>
             
              <p>As we've already noted, overlapping structures are a potential problem for all XML encoding.</p>
              <p>What's interesting for us here, particularly, is to note that many of
                  the classic cases of overlap happen around material features of the text:
                  in fact, one of the reasons we include <soCalled>materiality</soCalled> as
                  a <soCalled>challenge of markup for scholarship</soCalled> is that
                  <emph>as an information structure</emph>, as something to be represented,
                  materiality so often cuts across the grain of textuality, as we see in the
                  list of examples here.</p>
              <p>Overlap is particularly common when encoding extant older texts, where
                  the encoder does not control the structure.</p>
              <p>It is generally less common when creating new documents.</p>
              <p>There are a variety of approaches to representing these kinds of
                  overlapping structures in TEI: <list>
                      <item>sometimes you just have one option</item>
                      <item>sometimes there are a few different ways to approach the problem</item>
                      <item>we'll look at the various possibilities; some we've already seen
                          without realizing it</item>
                  </list>
              </p>
              
          </lectureNote>
      </section>
      <section>
          <head>Empty elements used as milestones</head>
          <slide>
              <!--      <figure>
                  <figDesc>[Need a graphic here illustrating what milestones do]</figDesc>
                  </figure>  -->
              <eg><![CDATA[<pb n="249"/>
<milestone unit="sig" n="R5r"/>
<lb/>digested. Its long trunk, as seen slanting down from
<lb/>out of the building across the wharf and into the ship,
<lb/>is a mere wooden pipe; but this pipe is divided within.
<lb/>It has two departments; and as the grain-bearing 
<lb/>troughs pass up the one on a pliable band, they pass
<lb/>empty down the other. The system therefore is that
<lb/>of an ordinary dredging machine; only that corn, and
<lb/>not mud is taken away, and that the buckets or 
<lb/>troughs are hidden from sight. Below, within the
<lb/>stomach of the poor bark, three or four labourers are
<lb/>at work, helping to feed the elevator. They shovel
<lb/>the corn up towards its maw, so that at every swallow
<lb/>he should take in all that he can hold...
<lb/>...The transit of the bushels 
<lb/>of corn from the larger vessel to the smaller will have
<lb/>taken less than a minute, and the cost of that transit
<lb/>will have been—a farthing.</p>
<pb n="250"/>
<milestone unit="sig" n="R5v"/>]]></eg>
          </slide>
          <lectureNote>
              <p>Simplest option: instead of encoding the feature by enclosing it in an
                  element, just mark its boundaries with empty elements.</p>
              <p>The most common case of this is with milestone elements: <list>
                  <item>Elements that divide the text into segments according to some
                      system: pages, columns, lines</item>
                  <item>works perfectly for an information structure which is completely
                      flat and divides up the whole text into parts: page breaks, signatures,
                      reels of a movie</item>
                  <item>i.e. there's nothing in the text that isn't on some page; there's
                      nothing in a paragraph that's not on some line</item>
                  <item>in these cases, you mark the boundaries between segments, so each
                      boundary element marks the end of one segment and the start of the next.</item>
                  
              </list>
              </p>
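              <p>A minimal sketch of the same idea applied to columns, using placeholder text: each <gi>cb</gi> marks the point where a new column begins, and the paragraph runs straight across the boundary. The <att>n</att> values are assumptions for illustration.</p>
              <eg><![CDATA[<pb n="12"/>
<cb n="a"/>
<p>...text at the top of the first column, which
<lb/>continues to the foot of that column and
<cb n="b"/>
<lb/>carries on at the top of the second column...</p>]]></eg>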
          </lectureNote>
      </section>
      <section>
          <head>Empty elements used as endpoints</head>
          <slide>
              <!--     <figure>
                  <figDesc>[Need a graphic here illustrating what endpoints do; maybe a
                  separate slide with image plus encoding showing long deletion or
                  addition]</figDesc>
                  </figure>
              -->
              <eg><![CDATA[<p>...for the elevator is an amphibious insti-
<lb/>tution, and flourishes only on the banks of navigable
<lb/>waters. When its head is ensconced within its box,
<lb/>and the beast of prey is thus nearly hidden within
<lb/>the building, the unsuspicious vessel is brought up
<lb/>within reach of the creature's trunk, and down it
<lb/>comes, like a mosquito's proboscis, right through the
<lb/>deck, in at the open aperture of the hold, and so into
<lb/>the very vitals and bowels of the ship. 
     <delSpan spanTo="#spanEnd01"/>When there,
<lb/>it goes to work upon its food with a greed and
<lb/>avidity that is disgusting to a beholder of any taste
<lb/>or imagination.</p>
<p>And now I must explain the anatomical
<lb/>arrangement by which the elevator still
<lb/>devours and continues to devour, till the corn within
<lb/>its reach has all been swallowed, masticated, 
and digested.<anchor xml:id="spanEnd01"/></p>]]></eg>
              <eg><![CDATA[<addSpan spanTo="#addEnd01"/>
<p>An elevator is as ugly a monster as has been yet
<lb/>produced. In uncouthness of form it outdoes those
<lb/>obsolete old brutes who used to roam about the semi-
<lb/>acqueous world, and live a most uncomfortable life
<lb/>with their great hungering stomachs and huge un-
<lb/>satisfied maws. The elevator itself consists of a big
<lb/>moveable trunk,—moveable as is that of an elephant,
<lb/>but not pliable, and less graceful even than an ele-
<lb/>phant's. This is attached to a huge granary or barn...</p>
<anchor xml:id="addEnd01"/>]]></eg>
          </slide>
          <lectureNote>
              <p>But in addition there are other cases where it's handy to be able to
                  mark the ends of an element at arbitrary places, rather than having to fit
                  the element neatly into the document hierarchy <list>
                      <item>classic example is additions and deletions: authors often add large
                          chunks of stuff, or delete parts of things that don't match the textual
                          structure</item>
                  </list>
              </p>
              <p>We can mark these much more effectively, as we saw briefly yesterday,
                  by putting an empty element at each end, sort of like marking
                  the boundaries of an impromptu soccer field by putting your shoes at each
                  end.</p>
              <p>Then create a link between the two, using the pointing system we talked
                  about yesterday...</p>
          </lectureNote>
      </section>
      <section>
          <head>Fragmentation</head>
          <slide>
              <!-- 
                  <figure>
                  <figDesc>[Need graphic here showing source text; also need graphic showing
                  how fragmentation works]</figDesc>
                  </figure>  -->
              <eg><![CDATA[<sp>
   <speaker>Leo.</speaker>
   <l part="F">Go on, go on:</l>
   <l>Thou canst not speake too much, I have deserv'd</l>
   <l part="I">All tongues to talk their bittrest.</l>
</sp>
<sp>
   <speaker>Lord.</speaker>
   <l part="F">Say no more;</l>
   <l>How ere the business goes, you have made fault</l>
   <l part="I">I'th boldnesse of your speech.</l>
</sp>
<sp>
   <speaker>Pauline.</speaker>
   <l part="F">I am sorry for't;</l>
   <l>All faults I make, when I shall come to know them</l>
   <!-- ... -->
</sp>]]></eg>
          </slide>
          <lectureNote>
              <p>Take what is logically a single content object <list>
                  <item>encode it as multiple separate XML elements</item>
                  <item>indicate that each XML element is only a
                      <soCalled>partial</soCalled> element</item>
                  <item>optionally have each partial element indicate which is the next
                      piece of the whole content object</item>
              </list></p>
              <p>TEI provides two methods for doing this; the first is the <att>part</att>
                  attribute... </p>
              <p>The <att>part</att> attribute can be used for
                  <soCalled>serial</soCalled> cases: <list>
                      <item>all fragments are in sequential order</item>
                      <item>no intervening occurrence of same element type that is
                          <emph>not</emph> part of the aggregate element</item>
                      <item>e.g., good for <gi>l</gi> but sometimes not <gi>q</gi></item>
                      <item>available on <gi>l</gi>, <gi>lg</gi>, <gi>div</gi>, <gi>seg</gi>,
                          <gi>ab</gi>, <gi>s</gi>, <gi>cl</gi>, <gi>phr</gi>, <gi>w</gi>,
                          <gi>m</gi>, <gi>c</gi></item>
                  </list>
              </p>
          </lectureNote>
      </section>
      <section>
          <head>Another approach to fragmentation</head>
          <slide>
              <eg><![CDATA[Mortal, she said, "I'm sent to you,
Then hold my precepts fast;
Remember earth's best joys are few,
And can't for ever last."]]></eg>
              <eg><![CDATA[<lg type="stanza">
  <l>Mortal, she said, <said xml:id="s01" next="#s02">I'm sent to you,</said></l>
  <l><said xml:id="s02" next="#s03" prev="#s01">Then hold my precepts fast;</said></l>
  <l><said xml:id="s03" prev="#s02" next="#s04">Remember earth's best joys are few,</said></l>
  <l><said xml:id="s04" prev="#s03">And can't for ever last.</said></l>
</lg>]]></eg>
          </slide>
          <lectureNote>
              <p>The <att>next</att> and <att>prev</att> attributes can be used in any
                  case: <list>
                      <item>available on <emph>every</emph> element when the module for linking,
                          segmentation &amp; alignment is included in the schema</item>
                      <item>each fragment must bear either <att>next</att> or <att>prev</att></item>
                      <item>probably better if each fragment bears both</item>
                  </list>
              </p>
              
          </lectureNote>
      </section>

  </presentation>
 </text>
</TEI>
