<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="../stylesheets/yaps-tei.css"?>
<?oxygen RNGSchema="../schema/yaps.rnc" type="compact"?>
<?oxygen SCHSchema="../schema/yaps.sch"?>
<TEI xmlns="http://www.wwp.brown.edu/ns/yaps/1.0" xmlns:xi="http://www.w3.org/2001/XInclude" version="5.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Overview of Descriptive Markup and the TEI</title>
        <author>Julia Flanders</author>
      </titleStmt>
      <xi:include href="./boilerplate_publicationStmt.xml">
        <xi:fallback>
          <publicationStmt status="restricted">
            <note type="auto">WARNING: XInclude processing failed &#x2014; this file should not be copied or
            used (and is invalid) as a result.</note>
          </publicationStmt>
        </xi:fallback>
      </xi:include>
      <sourceDesc>
        <p>This is the source.</p>
      </sourceDesc>
    </fileDesc>
    <revisionDesc>
     <change who="#jflanders.lfw" when="2008-05-25">updated example comparing procedural and descriptive markup</change>
      <change who="#jflanders.lfw" when="2008-02">removed references
      to P4, removed the historical detail, and implemented changes
      from the seminars version concerning the description of TEI as a
      language.</change>
      <change>Changed details of P5
      release, and emphasis of P4 vs P5</change>
      <change>added more detail on descriptive markup</change>
      <change>automatically converted
        from presentation.odd-conforming to yaps.odd-conforming
        using p2y.xslt and p2y.perl</change>
      <change>removed SGML- &amp; P5-specific slides</change>
      <change>Added slides on What is
        XML and more on P5. </change>
      <change>Added more detail on
      TEI</change>
    </revisionDesc>
  </teiHeader>
  <text>
    <presentation>

      <section>
        <head>Motives for text encoding</head>
        <slide>
          <list>
            <item>To store information for the long term</item>
            <item>To analyse information</item>
            <item>To share information</item>
          </list>
        </slide>
        <lectureNote>
          <p>To fulfill the first goal, all you need is a format
            that's platform-independent and human-readable, such as XML<list><item>it doesn't matter how detailed the markup is, or what kind of
              markup it is</item></list></p>
          <p>To fulfill the second goal, you need more than this:
            you need an adequately detailed markup system, one
            that can capture the kind of information you are
            interested in and enable the kinds of things you plan
            to do with your data in the future. In other words, one that makes
            it worth your while to store information for the long
            term.</p>
          <p>To fulfill the third goal, you need more than this: you
            need a markup system that is shared by other people, who
            agree to use it in the same way you do<list><item>for this, you need some sort of infrastructure
                for developing and maintaining the markup system and
                even more importantly its documentation, so that
                people who want to use it have a place to find
                it and learn about it</item><item>you might be able to come up with a perfectly
                good encoding system all by yourself; if you lived
                on a desert island, you wouldn't have any motive to
                do otherwise</item><item>but insofar as text encoding is a
                community-oriented activity, inventing your own
                system from scratch can be a very solipsistic
                activity</item></list></p>
          <p>This is why the TEI exists: because in order to share
            information usefully, you need something that functions
            like a standard.</p>
        </lectureNote>
      </section>

      <section>
        <head>What is Descriptive Markup? <lb/>Why is it Important?</head>
        <slide>
          <p>Foundational assumptions in representing documents: <list><item>Presentation derives from structure and function</item><item>Markup should identify structure (primary)</item><item>Stylesheets produce presentation
              (secondary)</item></list></p>
        </slide>
        <lectureNote>
          <p>In this class we're going to be focusing on a
            particular kind of markup, often called
              <soCalled>descriptive markup</soCalled>
            <list><item>extremely important development in the history of electronic document management</item><item>a long and interesting history of debates about its real nature, what to call it</item><item>but at a simple level, there are a few important
                premises</item></list>
          </p>
          <p>Essentially, descriptive markup is based on the idea that the best way to represent a document is by describing it: not by giving instructions to a particular system on what to do with it, but by saying, in general terms, what each of its parts is.</p>
          <p>Underlying this philosophy is the idea
            that presentation derives from the nature and function of documentary parts:
<list><item>a heading is
            bold because it's a heading</item><item>and therefore in encoding, we should identify the parts of the document's
                 structure and then base our presentation on that
                structure</item></list></p>
          <p>Note that this is a significant departure from earlier kinds of document markup, which served to give instructions to specific processing systems (e.g. typesetting engines) on how to format or process the text.</p>
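          <p>A minimal sketch of this premise, assuming a hypothetical source fragment with no namespace and an illustrative XSLT 1.0 stylesheet (neither is prescribed by the TEI): the encoding says only that something is a heading, and a separate rule decides that headings are displayed in bold.
<eg><![CDATA[<!-- Source: says what the part is, not how it looks -->
<head>Chapter 2: The Marketplace</head>

<!-- Stylesheet (illustrative): derives presentation from structure -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="head">
    <h2 style="font-weight: bold"><xsl:apply-templates/></h2>
  </xsl:template>
</xsl:stylesheet>]]>
</eg>
          </p>
          <p>Under this arrangement, changing the look of every heading means editing one rule rather than every heading in the text.</p>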
        </lectureNote>
      </section>

      <section>
        <head>Descriptive versus procedural markup</head>
        <slide>
          <p>Strongly procedural:
<eg>
<![CDATA[<center>Chapter 2: The Marketplace</center>
<block>Dear reader, it was all I could do not to <italic>shout</italic> with 
   delight at my own <italic>savoir-faire</italic> when I saw how easily
   I had made myself at home...</block>]]> 
</eg>
          </p>
         <p>Procedural with a nod towards the descriptive:
          <eg><![CDATA[<block class="head">Chapter 2: The Marketplace</block>
<block class="para">Dear reader, it was all I could do not to <span class="italic">shout</span> with 
   delight at my own <span class="italic">savoir-faire</span> when I saw how easily
   I had made myself at home...</block>]]> 
          </eg>
         </p>
         <p>Descriptive with a nod to the procedural:
          <eg><![CDATA[<head rend="text-align:center">Chapter 2: The Marketplace</head>
<p>Dear reader, it was all I could do not to <emph rend="font-style:italic">shout</emph> with 
   delight at my own <foreign rend="font-style:italic">savoir-faire</foreign> when I saw how easily
   I had made myself at home...</p>]]>
          </eg>
         </p>
          <p>Strongly descriptive with an emphasis on structure:
          <eg>
<![CDATA[<head>Chapter 2: The Marketplace</head>
<p>Dear reader, it was all I could do not to <emph>shout</emph> with 
   delight at my own <foreign>savoir-faire</foreign> when I saw how easily
   I had made myself at home...</p>]]>
          </eg>
          </p>
         <p>Strongly descriptive with an emphasis on the original presentation:
          <eg><![CDATA[<head rend="center">Chapter 2: The Marketplace</head>
<p>Dear reader, it was all I could do not to <emph rend="italic">shout</emph> with 
   delight at my own <foreign rend="italic">savoir-faire</foreign> when I saw how easily
   I had made myself at home...</p>]]>
          </eg>
         </p>
        </slide>
        <lectureNote>
         <p>Descriptive markup, broadly speaking, is about representing a source: 
          <list><item>it looks backward to an original document</item>
           <item>or it may express an original, digital idea</item>
          <item>but either way, it is describing structures and ideas</item>
          </list></p>
         <p>Whereas procedural markup is about giving orders:
         <list>
          <item>It specifies some output, some result</item>
          <item>It always looks ahead to output, never back to a source</item>
         </list>
         </p>
         <p>There's clearly a continuum here: from purely procedural approaches on the one hand (in which the only thing we care about is giving instructions concerning output) to purely descriptive approaches on the other (in which the only thing we care about is the representation of the source).</p>
         <p>In accomplishing that representation we may or may not be interested in how the source looked: descriptive approaches may focus on structure, on presentation, or on both.</p>
        </lectureNote>
      </section>

      <section>
        <head>Additional assumptions</head>
        <slide>
          <p>Relationship between structure and presentation is
            consistent (though complex)</p>
          <p>Presentation is functional, not whimsical</p>
          <p>Presentation is variable, structure is constant</p>
          <figure>
            <graphic height="400px" url="./gfx/marguerite_sample.png"/>
          </figure>
          <figure>
            <graphic height="400px" url="./gfx/sq_sample.png"/>
          </figure>
          <figure>
            <graphic height="400px" url="./gfx/english_journal_sample.png"/>
          </figure>
          <figure>
            <graphic height="400px" url="./gfx/new_phytologist_sample.png"/>
          </figure>
        </slide>
        <lectureNote>
          <p>This began as one of the primary tenets of SGML
            encoding: that instead of trying to describe what
            documents look like, it's more powerful and efficient to
            describe their structure, and then control appearance
            afterwards.</p>
          <p>Several underlying assumptions here:<list><item>that the relationship between structure and
                presentation is consistent (even if perhaps complex)</item><item>that presentation is not decorative but
                functional: that is, that it exists to express
                function, not for any other purpose</item><item>that presentation is variable while structure is
                constant (in other words, the structure expresses
                something fundamental about the document while
                presentation expresses something secondary)</item></list></p>
        </lectureNote>
      </section>

      <section>
        <head>Advantages of descriptive markup</head>
        <slide>
          <p>The same data can be reused flexibly: <quote>Build once, use many!</quote></p>
          <p>Presentation can be controlled easily through stylesheets</p>
          <p>We can treat the document and its markup as an object of analysis</p>
        </slide>
        <lectureNote>
          <p>These assumptions are pretty much true for the kinds of
            information that first motivated the development
            of SGML: for instance, technical documentation, legal
            forms, and documents generated and used by the military and
            the IRS, all of which needed to be encoded not for
            immediate output, but for long-term storage,
            maintenance, and output in multiple formats (including
            formats that couldn't be foreseen).</p>
          <p>And in cases where they are true, there are obvious
            practical benefits to separating presentation and
            structure, which are probably either familiar or
            self-evident or both<list><item>even in Microsoft Word, you see the attempt
                being made (e.g. styles)</item><item>lets you use the same underlying data with
                multiple presentations</item><item>allows you to change presentation easily through
                stylesheets, etc.</item></list></p>
          <p>There are also conceptual benefits, once you move
            beyond these kinds of prosaic organizational information
            and start to consider humanities texts<list><item>if you mark up the structure, you can treat it
                as an object of analysis: literary analysis,
                historical analysis, rhetorical analysis, linguistic
                analysis, etc.</item></list></p>
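          <p>As one illustration (a sketch only: the element names follow the slide examples, the source is assumed to have no namespace, and the stylesheet itself is hypothetical), once foreign phrases are marked as foreign rather than merely italicized, a short XSLT 1.0 stylesheet can count and list them paragraph by paragraph:
<eg><![CDATA[<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- For each paragraph, report the foreign phrases it contains -->
  <xsl:template match="/">
    <xsl:for-each select="//p">
      <xsl:value-of select="count(.//foreign)"/>
      <xsl:text> foreign phrase(s): </xsl:text>
      <xsl:for-each select=".//foreign">
        <xsl:value-of select="normalize-space(.)"/>
        <xsl:if test="position() != last()">; </xsl:if>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>]]>
</eg>
          </p>
          <p>The same question cannot be asked of a text in which foreign phrases, emphasis, and titles are all simply italic.</p>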
        </lectureNote>
      </section>

      <section>
        <head>Some complications</head>
        <slide>
          <p>Relationship between structure and appearance is more
            complex than that...</p>
          <p>Appearance is not purely functional (and yet not merely
            decorative either)</p>
          <p>Distinction between <q>primary/secondary</q> or
              <q>essential/inessential</q> is suspect</p>
        </slide>
        <lectureNote>
          <p>So encoding systems like the TEI (and EAD, DocBook,
            EpiDoc, etc.) all emphasize structural markup that
            identifies the parts of the document by their structure
            rather than their appearance, and even a brain-dead
            renegade like HTML has been steadily moving from its
            initial emphasis on presentation (the <gi scheme="HTML">i</gi> and
              <gi scheme="HTML">font</gi> elements, etc.) to greater structural
            expressiveness, precisely because it turns out this is a
            more sustainable, cost-effective way of doing things.
            QED.</p>
          <p>However, if we reexamine those earlier assumptions in
            light of this new humanities emphasis, they appear much
            more problematic:<list><item>the relationship between structure and
                presentation may not be consistent at all,
                particularly when dealing with older texts</item><item>either by accident/sloppiness/practical
                constraints (such as the need to fit more or less
                onto a given page: think of the setting of
                Shakespearean plays as prose or verse depending on
                available space)</item><item>or because in fact presentation in humanities
                texts may well be decorative rather than (or in
                addition to being) functional: it may exist to
                comment on, complexify, ironize, adorn, or distract
                from the content</item><item>and further, there has been an important line of
                commentary within both editorial theory and text encoding
                theory, arguing that the distinction between an
                <q>essential/fundamental</q> content and a
                <q>variable/inessential</q> presentation is false: that in
                fact the presentation and the physical substance of
                the document are constitutive of meaning and
                inseparable from it.</item></list></p>
          <p>Not to mention that, even if one regards
            presentation as secondary, for humanities scholars it
            turns out to be a very important secondary concern indeed: they
            still want to know how the document looked.</p>
          <p>We&#x2019;re going to talk about renditional markup a bit
            later on; for the moment, we want to sketch out the
            issue so that you can be aware of it as we proceed.</p>
        </lectureNote>
      </section>

      <section>
        <head>Text Encoding is Never Simple</head>
        <slide>
          <p>Text encoding is not simple data entry: it is part of research.</p>
          <p>Text encoding is not neutral or objective.</p>
          <p>Text encoding is a strategic representation of the text.</p>
        </slide>
        <lectureNote>
          <p>Central issues of humanities computing: understanding the
            intersection between technology and humanistic/cultural research</p>
          <p>Important to present this not as a simple act of copying, making a
            digital facsimile:<list><item>instead, think of it as part of the intellectual strategy of
                research</item><item>creating research objects that are of value: whether broad
                or specific, advanced or basic</item></list>
          </p>
          <p>Text encoding fits into this as the chief means of creating textual
            representations: research objects which are of interest because of
            their textual information <list><item>not simply the letters and words themselves, but also the
                text&#x2019;s structure and its contents</item><item>text encoding allows the researcher to represent the text in
                complex ways</item><item>and allows the addition of specialized research knowledge as
                well as basic information necessary to elucidate arcane
              texts.</item></list>
          </p>
          <p>As a result, text encoding:<list><item>creates a model of the text: a
                representation that will be used for research purposes</item><item>is a strategic act: it exists to serve
                the specific purposes of its user. It is not a neutral or
                objective process</item><item>is thus discipline-specific: it adds
                certain kinds of information and focuses attention on certain
                kinds of information, and it ignores and eliminates other kinds
                of information.</item></list>
          </p>
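          <p>A small sketch of what <q>strategic</q> means in practice, using the sentence from the earlier slide and standard TEI elements (the two research interests are hypothetical): a project interested in language might record what the borrowed phrase is, while a project interested in typography might record only how it was printed.
<eg><![CDATA[<!-- Interest in language: what the phrase is -->
<p>... delight at my own <foreign xml:lang="fr">savoir-faire</foreign> ...</p>

<!-- Interest in typography: how the phrase was printed -->
<p>... delight at my own <hi rend="italic">savoir-faire</hi> ...</p>]]>
</eg>
          </p>
          <p>Each encoding keeps some information and discards some; neither is a neutral copy of the page.</p>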
          <p>These considerations make text encoding more difficult, but also
            more interesting, both to learn and to perform. </p>
          <p>More difficult: <list><item>because it involves complex analysis and decision-making</item><item>because it involves specialized knowledge of the research
                objects and the audience</item></list>
          </p>
          <p>More interesting: <list><item>because its work is directly implicated in the scholarly
                research that will be performed on the text</item><item>and in fact is in some ways inseparable from it.</item></list>
          </p>
        </lectureNote>
      </section>

      <section>
        <head>What is the TEI?</head>
        <slide>
          <p>In English: <abbr>TEI</abbr> stands for <expan>Text
          Encoding Initiative</expan></p>
          <p>Technically: a standards organization for humanities
            text encoding</p>
          <p>Organizationally: an international membership
            consortium</p>
          <p>Socially: a community of people and projects</p>
          <p>For our purposes: a set of guidelines and XML
            specifications</p>
        </slide>
        <lectureNote>
          <p>Technically: The TEI is a standards organization that
            exists to create, maintain, and disseminate a standard
            for humanities text encoding<list><item>a common language for encoding humanities
                documents of all sorts, typically for research or
                archival purposes</item><item>internationally developed and used</item><item>widely supported and used within the academy,
                libraries, museums, anywhere people have important
                humanities data</item></list></p>
          <p>Organizationally: The TEI is an international
            consortium whose members are institutions that want the
            TEI to continue to exist</p>
          <p>Socially: The TEI is a community of people and projects
            who use text encoding in a wide variety of ways, and who
            communicate with one another about their research and
            the practical problems associated with it.</p>
          <p>The TEI is also, importantly, the set of guidelines and
            XML specifications that make up the TEI Guidelines.<list><item>first published in 1990; a major release in 1994
                (P3), which was the first version to be widely used</item><item>an XML version published in 2001 (P4)</item><item>the latest version is P5, published in November 2007</item></list></p>
          <p>It&#x2019;s important to note that the TEI is not a
            fixed tag set that is written in stone<list><item>it is intended to be customized: both for users
                to select a subset of the TEI that they really need,
                and for users to add elements for particular
                features in their texts</item><item>we will cover customization mechanisms towards the end of the workshop</item></list></p>
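          <p>As a preview, and only as a sketch (the schema name <q>workshopTEI</q>, the choice of modules, and the deleted element are hypothetical, not a recommendation), a P5 customization is itself written in TEI, in a file conventionally called an ODD. Selecting a subset means naming the modules wanted and, optionally, removing individual elements:
<eg><![CDATA[<schemaSpec ident="workshopTEI" start="TEI">
  <!-- Select only the modules this project needs -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <!-- Remove an element the project never uses -->
  <elementSpec ident="mentioned" module="core" mode="delete"/>
</schemaSpec>]]>
</eg>
          </p>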
        </lectureNote>
      </section>

      <section>
        <head>The TEI Guidelines</head>
        <slide>
          <list>
            <item>Can be applied strictly or loosely</item>
            <item>Can adapt to local conditions</item>
            <item>Designed as a set of modules that can be selected as needed</item>
            <item>Not unlike a human language in some respects</item>
          </list>          
        </slide>
        <lectureNote>
          <p>The TEI Guidelines are a flexible specification:</p>
          <list>
            <item>Not intended to be difficult or burdensome to use</item>
            <item>Not intended to require uniformity from all users: permits local variation of usage</item>
            <item>Intended to be adapted and customized</item>
            
            <item>Not unlike a human language: has idiomatic usage, dialects, local usage</item>
          </list>
        </lectureNote>
      </section>

      <section>
        <head>Areas of Usage</head>
        <slide>
          <list>
            <item>Digital libraries and digital archives</item>
            <item>Literary and cultural materials</item>
            <item>Scholarly editions</item>
            <item>Manuscript collections and descriptions</item>
            <item>Dictionaries</item>
            <item>Language corpora</item>
            <item>Historical documents</item>
            <item>Anthropology and social sciences</item>
            <item>Authoring</item>
            <item>Many other areas...</item>
          </list>
        </slide>
        <lectureNote>
          <list>
            <item>Digital libraries and digital archives</item>
            <item>Literary and cultural materials</item>
            <item>Scholarly editions</item>
            <item>Manuscript collections and descriptions</item>
            <item>Dictionaries</item>
            <item>Language corpora</item>
            <item>Historical documents</item>
            <item>Anthropology and social sciences</item>
            <item>Authoring</item>
            <item>Many other areas&#x2026;</item>
          </list>
          <p>Note as well that the TEI's domain is strongly international, both in the kinds of materials it is used for (Tibetan manuscripts, graphical narratives from pre-Columbian Mexico, Near Eastern stone inscriptions) and in the membership community it intends to serve.</p>
          <p>TEI documentation is being translated into multiple languages: <list>
            <item>Chinese</item>
            <item>French</item>
            <item>German</item>
            <item>Japanese</item>
            <item>Spanish</item>
          </list></p>
        </lectureNote>
      </section>

      <section>
        <head>Diagram of TEI Usage</head>
        <slide>
          <figure>
            <graphic height="600px" url="./gfx/tei_areas.jpg"/>
          </figure>
        </slide>
      </section>

    </presentation>
  </text>
</TEI>
