<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="../stylesheets/yaps-tei.css"?>
<?oxygen RNGSchema="../schema/yaps.rnc" type="compact"?>
<?oxygen SCHSchema="../schema/yaps.sch"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt>
				<title>Descriptive Markup</title>
				<author xml:id="JF">Julia Flanders</author>
			</titleStmt>
			<publicationStmt>
				<distributor>Brown University Women Writers Project</distributor>
				<address>
          <addrLine>Julia_Flanders@Brown.edu</addrLine>
        </address>
				<date value="2007-03-16"/>
				<availability status="free">
					<p>Copyleft 2006 Julia Flanders and Brown WWP</p>
				</availability>
				<pubPlace>Stanford University</pubPlace>
			</publicationStmt>
			<sourceDesc>
				<p>This is the source.</p>
			</sourceDesc>
		</fileDesc>
		<revisionDesc>
			<change date="2007-03-14" who="#JF">made new version based on old slides for use with NEH
				seminars: removed TEI materials and expanded discussion of markup</change>
			<change date="2006-03-13" who="#SB">automatically converted from presentation.odd conforming to
				yaps.odd conforming using p2y.xslt and p2y.perl</change>
			<change date="2006-03-02" who="#JF">removed SGML &amp; P5 -specific slides</change>
			<change date="2006-02-13" who="#JF">Added slides on What is XML and more on P5. </change>
			<change date="2006-03-05" who="#JF">Added more detail on TEI</change>
		</revisionDesc>
	</teiHeader>
	<text>
		<presentation>
			<section>
				<head>What is text encoding?</head>
				<slide>
					<list>
						<item>Thick description?</item>
						<item>Scholarly editorial analysis?</item>
						<item>Formalization?</item>
						<item>Commentary?</item>
					</list>
				</slide>
				<lectureNote>
					<p>I'd like to start by situating the activity of text encoding in intellectual space: <list>
							<item>From the viewpoint of the humanities scholar, text encoding looks as if it's coming
								over from computer science: as an activity that takes place on computers and requires some
								technical knowledge (of software, of data standards, of encoding languages)</item>
							<item>in fact, rather than likening a text encoder to a computer programmer, I want to draw
								some different lines of connection</item>
							<item>first, with an anthropologist or ethnographer somewhat in the tradition of Clifford
								Geertz: the text encoder is an observer and documenter of the textual world, and the
								encoding he/she produces has (at least potentially) something of the quality of a
									<quote>thick description</quote>: a contextualized, interpretative account of the details
								of the textual landscape.</item>
							<item>Another affiliation: the text encoder is also very much like a critical editor,
								creating an analytical representation of the text which provides systematic, expert
								knowledge about it</item>
						</list>
					</p>
					<p>Text markup sounds like a technical concept, but, like so many things, it is a more basic
						idea that has come to our attention because technological change, and the media shifts that
						result from it, have made us aware of it.</p>
					<p>In fact it is an expression of motives and practices that have been around for a long time.</p>
					<p>Markup is several things: <list>
							<item>a way of formalizing and externalizing the structures in a text</item>
							<item>a way of adding to the text further information that interests us</item>
							<item>a meta-text that comments on, interprets, or extends the meaning of a text</item>
						</list>
					</p>
					<p>Note that markup and text encoding are essentially the same thing, for our purposes. </p>
				</lectureNote>
			</section>
			<section>
				<head>Text encoding in the ancient world</head>
				<slide>
					<figure>
						<figDesc>Sample of scriptio continua</figDesc>
						<graphic height="600px" url="./gfx/sinai.jpg"/>
					</figure>
				</slide>
				<lectureNote>
					<p>This example of scriptio continua lacks word breaks, which are a very basic form of text
						markup, but it does have other kinds of markup: the positioning of the lines and the use of
						differently colored inks.</p>
					<p>The markup serves to granularize the flow of language and allow the reader to parse its
						significance more easily.</p>
					<p>If we think of written language as a secondary derivation from oral language, this kind of
						markup is the written equivalent of the pauses and inflections that make spoken words
						comprehensible (both by separating them from one another and by giving them additional
						emphasis or coloration): in other words, punctuational markup represents an authentically
						linguistic level of meaning</p>
					<p>If we think of written language as a separate sign system, then the markup would seem more
						like an indigenous part of the formalism of written language: part of its distinctive
						expressiveness</p>
					<p>Either way, this kind of markup becomes an important part of our apprehension of the
						document: as manuscript and printing practices become more sophisticated, the structured
						visual presentation of documents becomes a way of conveying much more nuanced information
						about the structure and semantics of the documents.</p>
				</lectureNote>
			</section>
			<section>
				<head>Text encoding in the early modern world</head>
				<slide>
					<figure>
						<figDesc>Sample of 17th-century dictionary</figDesc>
						<graphic height="600px" url="./gfx/blount_dictionary.jpg"/>
					</figure>
				</slide>
				<lectureNote>
					<p>By the time printing is in the ascendant, the visual markup of the page has become highly
						formalized, and in addition the structures of printed texts have become significantly
						codified, so that more complex reading practices are possible: <list>
							<item>reading practices that take advantage of this codification to gather more meaning, more
								efficiently and transparently, from the page</item>
							<item>reading in which the semantics of specific formal components of the page actually
								determines the significance of words</item>
							<item>For instance, in this example of an early dictionary, the formatting tells us that
								particular components of the page have a specific function: the heading that locates us
								within the alphabet, the headword, the definition</item>
							<item>In other words, the markup represents and makes visible the intellectual structures of
								the text—or, putting it the other way round, the text is granularized as a set of formal
								structures that are made apparent to the reader through the visual markup</item>
							<item>Obviously not all of the formatting works towards this end: some of it can be thought
								of almost as visual surplus, such as decorative features and choices that don't affect our
								apprehension of textual meaning directly but provide context (for instance font size or
								typeface, margins, interlinear spacing)</item>
						</list>
					</p>
				</lectureNote>
			</section>
			<section>
				<head>Text encoding in the digital world</head>
				<slide>
					<eg><![CDATA[<entry>
  <headword rend="face(blackletter)">Abstrude</headword>
  <lemma rend="slant(italic)">(abstrudo)</lemma>
  <definition>to <lb/>thrust away or out, to hide, to <lb/>shut up</definition>
  <source rend="slant(italic)">Fel.</source>
</entry>]]></eg>
					<eg><![CDATA[<entry>
  <headword>Abstrude</headword>
  <lemma>abstrudo</lemma>
  <definition>to thrust away or out, to hide, to shut up</definition>
  <source>Fel.</source>
</entry>]]></eg>
				</slide>
				<lectureNote>
					<p>In the digital world of text markup, we are dealing with an entirely different
						representational system: one that is not rooted in the phenomenology of print, but rather
						allows us (if we choose) to represent things from a very different perspective</p>
					<p>It is a representational system whose basic components are not information about
						formatting, but rather information about structure and function.</p>
					<p>This approach is often called <soCalled>descriptive markup</soCalled>:
						<list>
							<item>an extremely important development in the history of electronic document management</item>
							<item>with a long and interesting history of debates about its real nature and what to call it</item>
							<item>but at a simple level, there are a few important premises</item>
						</list>
					</p>
					<p>Essentially, descriptive markup is based on the idea that the best way to represent a
						document digitally is by describing it; not by giving instructions to a particular system on
						what to do with it but by saying, in general terms, what each of its parts is.</p>
					<p>Underlying this philosophy is the idea that presentation derives from the nature and
						function of documentary parts: <list>
							<item>a heading is bold because it's a heading; a lemma has parentheses around it to mark it
								as a lemma in this text</item>
							<item>and therefore in encoding, we should identify the parts of the document's structure
								first of all, and then base any additional information (such as details of presentation) on
								that structure</item>
						</list></p>
					<p>Three foundational assumptions: <list type="unordered">
							<item>Presentation expresses structure and function</item>
							<item>Markup should identify structure (primary)</item>
							<item>Stylesheets produce presentation (secondary)</item>
						</list></p>
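					<p>The third assumption can be sketched as a stylesheet. The following is a minimal, hypothetical
						XSLT 1.0 example. It assumes the illustrative element names from the structural (second)
						encoding of the Abstrude entry on this slide (<gi>entry</gi>, <gi>headword</gi>,
						<gi>lemma</gi>, <gi>source</gi>) and assumes that these elements are in no namespace. The
						particular output choices are illustrative only: <eg><![CDATA[<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Each dictionary entry becomes one HTML paragraph. -->
  <xsl:template match="entry">
    <p class="entry"><xsl:apply-templates/></p>
  </xsl:template>

  <!-- A headword is bold because it is a headword. -->
  <xsl:template match="headword">
    <b><xsl:apply-templates/></b>
  </xsl:template>

  <!-- The parentheses around the lemma are generated here,
       not typed into the encoded text. -->
  <xsl:template match="lemma">
    <xsl:text>(</xsl:text>
    <i><xsl:apply-templates/></i>
    <xsl:text>)</xsl:text>
  </xsl:template>

  <!-- The source abbreviation is set in italics. -->
  <xsl:template match="source">
    <i><xsl:apply-templates/></i>
  </xsl:template>

  <!-- definition has no rule of its own: the built-in
       template rules simply copy its text through. -->
</xsl:stylesheet>]]></eg></p>
					<p>In this sketch the parentheses around the lemma do not appear in the structural encoding at
						all. The stylesheet generates them because the element is a lemma. Restyling every headword in
						the dictionary then means editing one template rule rather than every entry.</p>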


					<p>Note that this is a significant departure from earlier kinds of document markup, which
						served to give instructions to specific processing systems (e.g. typesetting engines) on how
						to format or process the text.</p>
				</lectureNote>
			</section>

			<section>
				<head>Descriptive versus procedural markup</head>
				<slide>
					<p>Procedural (e.g. troff): <eg><![CDATA[.ce]]></eg></p>
					<p>Descriptive (e.g. XML): <eg><![CDATA[<head rend="align(center)">]]></eg></p>

				</slide>
				<lectureNote>
					<p>Note the difference between these two systems: <list>
							<item>troff says: "center the next line of text"</item>
							<item>XML says: "just FYI, this text is a (centered) heading"</item>
						</list></p>
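					<p>The dependency runs in one direction. A processor can derive the procedural instruction from
						the description, but a bare .ce request does not say whether the centered line is a heading.
						The following is a minimal, hypothetical XSLT 1.0 sketch of that derivation. It assumes a
						<gi>head</gi> element in no namespace that carries the rend value shown on the slide, and it
						writes plain-text troff output: <eg><![CDATA[<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- A centered heading is translated into the troff request
       .ce, which centers the next input line. -->
  <xsl:template match="head[@rend='align(center)']">
    <xsl:text>.ce&#10;</xsl:text>
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>]]></eg></p>
					<p>Applied to a document consisting only of such a heading, the stylesheet writes a .ce request
						on one line, followed by the normalized heading text on the next.</p>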
				</lectureNote>
			</section>
			<section>
				<head>The Rhetoric of Descriptive Markup</head>
				<slide>
					<p>At least two modes of interest: <list>
							<item>Transcriptional: creating a representation of some other textual artifact</item>
							<item>Authorial: creating a new textual artifact</item>
						</list>
					</p>
				</slide>
				<lectureNote>
					<p>In the digital world, our relationship to textuality becomes somewhat more complex, because
						our use of the digital medium is so often (though not exclusively) to represent materials from
						other media.</p>

					<p>Hence digital text encoding has two different modes:</p>

					<p>1. A transcriptional mode: in which you're creating a second-order representation of a
						textual artifact (using or reproducing the visual markup of the original artifact); the
						original markup (spacing, punctuation, other formatting) either gets subsumed into the new
						markup system and is expressed using the vocabulary of that system, or gets described as part
						of the encoding (as a secondary fact about the document's structure). By analogy with the
						print world, we might liken this to a scholarly edition of a text, in which the original text
						is being represented through a different set of formatting conventions that aim to convey the
						same meaning as those of the original. </p>
					<p>2. An authorial mode: in which you're creating a new textual artifact with its own original
						markup systems, with no backward look towards print at all. Such an encoding may instead look
						forward to particular kinds of future processing, or it may remain agnostic about how it will
						be used or presented.</p>
					<p>In our discussion of the TEI, we will be focusing primarily on the former, because it is the
						current emphasis of the TEI and similar encoding systems, but the latter is also of increasing
						interest to scholars.</p>

				</lectureNote>
			</section>
			<section>
				<head>Advantages of descriptive markup</head>
				<slide>
					<p>The same data can be reused flexibly</p>
					<p>Presentation can be controlled easily</p>
					<p>The document can be treated as an object of analysis</p>
				</slide>
				<lectureNote>
					<p>There are a number of practical advantages to descriptive markup and the kinds of digital objects it produces:<list>
							<item>lets you use the same underlying data with multiple presentations</item>
							<item>allows you to change presentation easily through stylesheets, etc.</item>
							<item>and more generally, it gives you a more natural way of interacting with the document</item>
						</list></p>
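					<p>Reusing the same data can be as simple as swapping stylesheets. The following is a minimal,
						hypothetical XSLT 1.0 sketch that turns the entries into a tab-separated word list instead of
						a formatted page. It assumes a file of the illustrative <gi>entry</gi> elements from the
						dictionary slide, which are not TEI elements, in no namespace: <eg><![CDATA[<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- One line per entry: the headword, a tab, then the
       definition with its internal whitespace normalized. -->
  <xsl:template match="entry">
    <xsl:value-of select="normalize-space(headword)"/>
    <xsl:text>&#9;</xsl:text>
    <xsl:value-of select="normalize-space(definition)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Suppress stray text (such as indentation) that lies
       outside any entry. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>]]></eg></p>
					<p>This sketch reads only <gi>headword</gi> and <gi>definition</gi>, so it yields the same line
						for either encoding of the Abstrude entry. The <gi>lb</gi> elements and rend values of the
						transcriptional version are simply ignored.</p>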
					<p>There are also conceptual benefits, once you move beyond this kind of prosaic
						organizational information and start to consider humanities texts: <list>
							<item>if you mark up the structure, you can treat it as an object of analysis: literary
								analysis, historical analysis, rhetorical analysis, linguistic analysis, etc.</item>
						</list></p>
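					<p>Treating the document as an object of analysis can start very small. The following is a
						minimal, hypothetical XSLT 1.0 sketch that counts how many entries cite a given source
						abbreviation. It assumes the illustrative dictionary elements <gi>entry</gi> and
						<gi>source</gi> in no namespace. The default abbreviation, Fel., is the one in the sample
						entry: <eg><![CDATA[<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- The source abbreviation to look for; as a top-level
       parameter it can be overridden when the stylesheet is run. -->
  <xsl:param name="abbr" select="'Fel.'"/>

  <xsl:template match="/">
    <xsl:text>Entries citing </xsl:text>
    <xsl:value-of select="$abbr"/>
    <xsl:text>: </xsl:text>
    <!-- Count entries whose source matches exactly,
         ignoring surrounding whitespace. -->
    <xsl:value-of select="count(//entry[normalize-space(source) = $abbr])"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>]]></eg></p>
					<p>A purely presentational encoding could not support this query reliably. If sources were
						marked only as italic, a source citation would be indistinguishable from any other italicized
						word.</p>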
				</lectureNote>
			</section>
			

			<section>
				<head>Additional assumptions</head>
				<slide>
					<p>Relationship between structure and presentation is consistent (though complex)</p>
					<p>Presentation is functional, not decorative</p>
					<p>Presentation is variable, structure is constant</p>
				</slide>
				<lectureNote>
					<p>There are some additional assumptions that go along with the idea of descriptive markup: <list>
							<item>that the relationship between structure and presentation is consistent (even if perhaps
								complex)</item>
							<item>that presentation is not decorative but functional: that is, that it exists to express
								function, not for any other purpose</item>
							<item>that presentation is variable while structure is constant (in other words, the
								structure expresses something fundamental about the document while presentation expresses
								something secondary)</item>
						</list></p>
				</lectureNote>
			</section>


			<section>
				<head>Some complications</head>
				<slide>
					<p>Relationship between structure and appearance is more complex than that...</p>
					<p>Appearance is not purely functional (and yet not merely decorative either)</p>
					<p>Distinction between <q>primary/secondary</q> or <q>essential/inessential</q> is suspect</p>
				</slide>
				<lectureNote>
					<p>These assumptions are pretty much true for the kinds of information that first motivated
						the development of SGML: for instance, technical documentation, legal forms, and documents
						generated and used by the military and the IRS. All of these needed to be encoded not for
						immediate output, but for long-term storage, maintenance, and output in multiple formats
						(including formats that couldn't be foreseen).</p>
					<p>And encoding systems that emerged out of this same tradition, like the TEI (and EAD, DocBook,
						EpiDoc, etc.), all emphasize structural markup that identifies the parts of the document by
						their structure rather than their appearance. Even a brain-dead renegade like HTML has been
						steadily moving from its initial emphasis on presentation (the <gi>b</gi> and <gi>font</gi>
						elements, etc.) to greater structural expressiveness, precisely because it turns out this is a
						more sustainable, cost-effective way of doing things. QED.</p>
					<p>However, the use of markup to describe humanities texts (particularly those from the early era of print and before) has revealed complexities that need to be taken into account and make this kind of markup more of a challenge to apply: <list>
							<item>the relationship between structure and presentation may not be consistent at all,
								particularly when dealing with older texts</item>
							<item>either by accident/sloppiness/practical constraints (such as the need to fit more or
								less onto a given page: think of the setting of Shakespearean plays as prose or verse
								depending on available space)</item>
							<item>or because in fact presentation in humanities texts may well be decorative rather than
								(or in addition to being) functional: it may exist to comment on, complexify, ironize,
								adorn, or distract from the content</item>
							<item>and further, there has been an important line of commentary within editorial theory and
								text encoding theory both, arguing that the distinction between an "essential/fundamental"
								content and a variable/inessential presentation is false: that in fact the presentation and
								the physical substance of the document are constitutive of meaning and inseparable from it.
							</item>
						</list></p>
					<p>And this is not to mention that, even if one regards presentation as secondary, for
						humanities scholars it turns out to be a very important secondary matter indeed: they still
						want to know how the document looked.</p>
				</lectureNote>
			</section>

			<section>
				<head>Motives for Text Encoding</head>
				<slide>
					<list type="unordered">
						<item>To store information for the long term</item>
						<item>To analyse information</item>
						<item>To share information</item>
					</list>
				</slide>
				<lectureNote>
					<p>We've been talking so far about text encoding as a theoretical pursuit, but of course it's also an intensely practical activity and takes work to actually perform, so it's fair enough to ask why people do it, and why they use systems like the TEI Guidelines. </p>
					<p>The practical motives for text encoding are situated within a fairly complex set of social and
						technological constraints and goals; there are three very significant goals [don't elaborate!]: <list>
							<item>To store information for the long term, in a format that is not vulnerable to changes
								in hardware and software</item>
							<item>To analyse information and represent the results of the analysis in some way</item>
							<item>To share information with colleagues and other projects, and to publish it for future
								use.</item>
						</list>
					</p>
					<p>To fulfill the first goal, all you need is a format that is non-proprietary and
						human-readable: XML, for instance (and we'll say more in a few minutes about what this means): <list>
							<item>it doesn't matter how detailed the markup is, or what kind of markup it is</item>
						</list></p>
					<p>To fulfill the second goal, you need more than this: you need an adequately detailed markup
						system, one that can capture the kind of information you are interested in and enable the
						kinds of things you plan to do with your data in the future. In other words, you need a system
						that makes it worth your while to store information in the long term.</p>
					<p>To fulfill the third goal, you need more than this: you need a markup system that is shared
						by other people, who agree to use it in the same way you do<list>
							<item>for this, you need some sort of infrastructure for developing and maintaining the
								markup system and, even more importantly, its documentation, so that people who want to use
								it have somewhere to go to find it and learn about it.</item>
							<item>you might be able to come up with a perfectly good encoding system all by yourself; if
								you lived on a desert island, you wouldn't have any motive to do otherwise</item>
							<item>but insofar as text encoding is a community-oriented activity, inventing your own
								system from scratch can be a very solipsistic activity</item>
						</list></p>
					<p>This is why the TEI exists: to provide a long-term, detailed, analytically rich markup system that is understood by an entire community and can be used to create sharable, durable representations of the textual objects that community cares about.</p>
				</lectureNote>
			</section>


			<section>
				<head>Text Encoding is Never Simple</head>
				<slide>
					<p>Text encoding is not simple data entry: it is part of research.</p>
					<p>Text encoding is not neutral or objective.</p>
					<p>Text encoding is a strategic representation of the text.</p>
				</slide>
				<lectureNote>
					<p>Text encoding sits right at the intersection between technology and humanistic/cultural
						research: at the moment it is the central representational technology for the digital
						humanities.</p>
					<p>It is important to present this not as a simple act of copying or of making a digital facsimile:<list>
							<item>instead, think of it as part of the intellectual strategy of research</item>
							<item>creating research objects that are of value: whether broad or specific, advanced or
								basic</item>
						</list>
					</p>
					<p>Text encoding fits into this as the chief means of creating textual representations: research
						objects that are of interest because of their textual information <list>
							<item>not simply the letters and words themselves, but also the text’s structure and its
								contents</item>
							<item>text encoding allows the researcher to represent the text in complex ways</item>
							<item>and allows the addition of specialized research knowledge as well as basic information
								necessary to elucidate arcane texts.</item>
						</list>
					</p>
					<p>As a result, text encoding:<list>
							<item>creates a model of the text: a representation that will be used for research purposes</item>
							<item>is a strategic act: it exists to serve the specific purposes of its user. It is not a
								neutral or objective process</item>
							<item>is thus discipline-specific: it adds certain kinds of information and focuses attention
								on certain kinds of information, and it ignores and eliminates other kinds of
							information.</item>
						</list>
					</p>
					<p>These considerations make text encoding more difficult, but also more interesting, both to
						learn and to perform. </p>
					<p>More difficult: <list>
							<item>because it involves complex analysis and decision-making</item>
							<item>because it involves specialized knowledge of the research objects and the
							audience</item>
						</list>
					</p>
					<p>More interesting: <list>
							<item>because its work is directly implicated in the scholarly research that will be
								performed on the text</item>
							<item>and in fact is in some ways inseparable from it.</item>
						</list>
					</p>

				</lectureNote>
			</section>


		</presentation>
	</text>
</TEI>
