<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="../stylesheets/yaps-tei.css"?>
<?oxygen RNGSchema="../schema/yaps.rnc" type="compact"?>
<?oxygen SCHSchema="../schema/yaps.sch"?>
<TEI xmlns="http://www.wwp.brown.edu/ns/yaps/1.0" xmlns:xi="http://www.w3.org/2001/XInclude">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Publishing TEI Documents</title>
        <author>Julia Flanders</author>
	<author>Syd Bauman</author>
      </titleStmt>
      <xi:include href="./boilerplate_publicationStmt.xml">
        <xi:fallback>
          <publicationStmt status="restricted">
            <note type="auto">WARNING: XInclude processing failed &#x2014; this file should not be copied or
            used (and is invalid) as a result.</note>
          </publicationStmt>
        </xi:fallback>
      </xi:include>
      <sourceDesc>
        <p>This is the source. Based on the same talk given at
        Transliteracies Project and Early Modern Center at the
        University of California, Santa Barbara in 2007-09.</p>
      </sourceDesc>
    </fileDesc>
    <revisionDesc>
      <change when="2006-03-13" who="#SB">automatically converted
        from conforming to presentation.odd to conforming to yaps.odd
        using p2y.xslt and p2y.perl</change>
    </revisionDesc>
  </teiHeader>
  <text>
    <presentation>
      <section>
        <head>
          <q>Publishing</q>
        </head>
        <slide>
          <p>Several senses of the term: <list>
              <item>making documents readable online</item>
              <item>exploitation of the encoding</item>
              <item>more ambitious things follow: text analysis,
                data mining, other processing</item>
            </list></p>
        </slide>
        <lectureNote>
          <p>Now you've learned how to create TEI documents,
            but we haven't said anything about what you can do with
            them…</p>
          <p> There is a very wide range of things you can do with
            TEI documents, involving in-depth analysis, data mining,
            processing to discover patterns of various sorts. </p>
          <p> We're going to focus on "publishing", both in the
            narrow sense of "making them readable online" and also
            in the broader sense of "exploiting the encoding
            publicly". But most of the more advanced things you can
            do with TEI documents use technologies similar to the
            ones we're talking about here.</p>
        </lectureNote>
      </section>

      <section>
        <head>CSS: Cascading Style Sheets</head>
        <slide>
          <p>The simplest approach &#x2014; apply a CSS stylesheet (we did this yesterday): <list>
              <item>the browser reads the TEI file, which points to
                a stylesheet</item>
              <item>the browser reads the stylesheet, and applies
                its styling to the TEI elements as specified</item>
              <item>the browser displays the formatted text</item>
              <item>can control everything including fonts, colors,
                backgrounds, some aspects of layout, etc.</item>
            </list></p>
        </slide>
        <lectureNote>
          <p> The simplest approach to publishing TEI documents is
          the one we used yesterday: just apply a CSS
          stylesheet. <list>
              <item>same process as applying a CSS stylesheet to HTML
              documents</item>
              <item>the browser reads the TEI file, which points to
                a stylesheet</item>
              <item>the browser reads the stylesheet, and applies
                its styling to the TEI elements as specified</item>
              <item>the browser displays the formatted text</item>
              <item>can control everything including fonts, colors,
                backgrounds, layouts (where the chunks of text are
                placed on the page), etc.</item>
              <item>modern standards-compliant browsers can all do
                this</item>
            </list></p>
        </lectureNote>
      </section>

      <section>
	<head>The CSS Lens</head>
	<slide>
	  <figure>
	    <graphic url="./gfx/CSS_lens_01.png" width="100%"/>
	  </figure>
	</slide>
      </section>

      <section>
        <head>Some limitations</head>
        <slide>
          <list>
            <item>can't (at the moment) make links</item>
            <item>can't search, except by using your browser's
                <ident type="cmd">find</ident> command</item>
            <item>can't do any sort of higher-level stuff (of the
              sort that we'll see in a minute)</item>
          </list>
        </slide>
        <lectureNote>
          <p> This is great, but there are some limitations: <list>
              <item>can't (at the moment) make links; you can make text underlined and blue, but it won't behave as a link</item>
              <item>can't search, except by using your browser's
                <ident type="cmd">find</ident> command</item>
              <item>can't do any sort of higher-level stuff (of the
                sort that we'll see in a minute)</item>
            </list></p>
        </lectureNote>
      </section>

      <section>
        <head>Transformations with XSLT</head>
        <slide>
          <p>Extensible Stylesheet Language transformations allow
            you to transform XML documents: <list>
              <item>into other XML documents, such as XHTML, TEI,
                XSL-FO, DocBook, etc.</item>
              <item>into other formats: TeX, RTF, pretty much
                anything if you can figure out how</item>
            </list>
          </p>
        </slide>
        <lectureNote>
          <p>The Extensible Stylesheet Language allows you to
            transform XML documents in many ways: <list>
              <item>into other XML documents, such as XHTML, TEI,
                XSL-FO, DocBook, etc.</item>
              <item>into other formats: TeX, RTF, pretty much
                anything if you can figure out how</item>
            </list></p>
          <p>Transformation into other XML documents can mean
            several things: <list>
              <item>taking the entire TEI document and converting
                its markup into HTML, so that you now have an
                HTML-encoded document</item>
              <item>taking the entire TEI document and transforming
                bits of it into HTML: for instance, taking just the
                section headings and making an HTML-encoded TOC; or
                taking a long TEI document and transforming it into
                separate HTML files, one for each chapter,
                accompanied by a TOC; etc.</item>
              <item>transforming one kind of TEI markup into
                another: for instance, if you mark up your documents
                using a customized schema, but you want to exchange
                data with other projects, you might convert your
                markup to TEI Lite for easier interchange.</item>
            </list></p>
          <p>You can run this transformation in advance and then
            use the output. For instance, you might have a set of
            TEI files which you transform to HTML, and then mount
            the HTML on your web site. When you update the TEI
            files, you run the transformation again and remount
            the resulting HTML.</p>
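          <p>To make the table-of-contents case concrete, here is a
            minimal sketch. It is written in Python rather than XSLT,
            purely as a stand-in for the same idea: read the source,
            pick out the section headings, and write them out as an
            HTML list. The sample document, its "div" and "head"
            element names, and the function name are all invented for
            illustration; a real TEI file would also carry a namespace,
            which this sketch leaves out.</p>
          <eg><![CDATA[
```python
import xml.etree.ElementTree as ET

# Hypothetical miniature source document, invented for this sketch.
# (A real TEI file would declare a namespace; this sample does not.)
TEI_SOURCE = """
<text>
  <div><head>Chapter One</head><p>First chapter text.</p></div>
  <div><head>Chapter Two</head><p>Second chapter text.</p></div>
</text>
"""

def headings_to_toc(tei_xml):
    """Build an HTML list holding one entry per section heading."""
    root = ET.fromstring(tei_xml.strip())
    toc = ET.Element("ul")
    # Visit every heading in document order and copy its text into the list.
    for head in root.iter("head"):
        entry = ET.SubElement(toc, "li")
        entry.text = "".join(head.itertext()).strip()
    return ET.tostring(toc, encoding="unicode")

toc_html = headings_to_toc(TEI_SOURCE)
print(toc_html)
```
]]></eg>
          <p>Rerunning the function after editing the source
            regenerates the list, which is the "transform again and
            remount" cycle in miniature.</p>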
        </lectureNote>
      </section>

      <section>
        <head>Tools you need...</head>
        <slide>
          <list>
            <item>An XSLT processor (some are built into oXygen)</item>
            <item>An XSLT stylesheet</item>
          </list>
	  <figure>
	    <graphic url="./gfx/XSLT_static_01.png" width="100%"/>
	  </figure>
        </slide>
        <lectureNote>
          <p>Tools you need for this kind of transformation: <list>
              <item>an XSLT processor (some are built into oXygen);
                there are several, with different virtues that
                we won't go into here</item>
              <item>an XSLT stylesheet</item>
            </list> The processor reads the stylesheet, and reads
            your XML file, and it applies the stylesheet to the file
            and outputs a result.</p>
          <p>Then the result can be used as appropriate: styled with
            CSS and viewed in a browser if HTML; viewed in a browser
            or PDF reader if PDF; etc.</p>
        </lectureNote>
      </section>

      <section>
        <head>Transformations on the fly</head>
        <slide>
          <list>
            <item>Your TEI files live on a server</item>
            <item>When a user requests a file, the transformation
              software performs the transformation on the fly and
              delivers the resulting HTML</item>
            <item>The transformation can vary depending on the
              request</item>
          </list>
        </slide>
        <lectureNote>
          <p>You can also run these transformations on the fly, as
            part of your publication system: <list>
              <item>your TEI files live on a server</item>
              <item>when a user requests a file (e.g. by clicking on
                a URL), the transformation software performs the
                transformation on the fly and delivers the resulting
                HTML.</item>
              <item>the transformation might vary depending on the
                request: for instance, a user clicking on the <ident type="cmd">sort
                by date</ident> link would get different output, from the
                same underlying TEI file, than she would get by
                clicking on the <ident type="cmd">sort by author</ident> link</item>
            </list></p>
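          <p>The last point can be sketched as follows: one source
            file, two different results, chosen by a parameter of the
            request. This is a Python illustration, not how Cocoon or
            any particular framework does it; the catalogue data, the
            element names, and the "render" function are hypothetical.</p>
          <eg><![CDATA[
```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue, invented for this sketch.
SOURCE = """
<list>
  <item><author>Clark</author><date>1680</date></item>
  <item><author>Adams</author><date>1700</date></item>
  <item><author>Brown</author><date>1650</date></item>
</list>
"""

def render(source_xml, sort_key):
    """Return an HTML list of the entries, ordered by the requested field."""
    if sort_key not in ("author", "date"):
        raise ValueError("unsupported sort key: " + sort_key)
    root = ET.fromstring(source_xml.strip())
    # Values are compared as strings, which is adequate for four-digit years.
    items = sorted(root.iter("item"), key=lambda item: item.findtext(sort_key))
    html = ET.Element("ol")
    for item in items:
        entry = ET.SubElement(html, "li")
        entry.text = item.findtext("author") + " (" + item.findtext("date") + ")"
    return ET.tostring(html, encoding="unicode")

# The same source yields different output depending on the request.
by_author = render(SOURCE, "author")
by_date = render(SOURCE, "date")
print(by_author)
print(by_date)
```
]]></eg>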
        </lectureNote>
      </section>

      <section>
        <head>Tools you need...</head>
        <slide>
          <p>A web publication framework: e.g. Apache's Cocoon,
              <q>web glue for your web application development
            needs</q></p>
        </slide>
        <lectureNote>
          <p>Tools you use for this kind of transformation: e.g.
            Cocoon <q>web glue for your web application development
            needs</q>; DIY Framework?; Struts?; RIFE.</p>
        </lectureNote>
      </section>

      <section>
        <head>Some limitations...</head>
        <slide>
          <list>
            <item>still not much searching</item>
            <item>what searching there is will be slow; you're using
              a tool not designed for handling searches efficiently</item>
            <item>not good for managing large aggregations of files
              efficiently, or for managing them <emph>as a
              group</emph>, dealing with information that cuts
              across the entire aggregation</item>
          </list>
        </slide>
        <lectureNote>
          <p> This is great, but there are some limitations: <list>
              <item>still not much searching</item>
              <item>what searching there is will be slow; you're
                using a tool not designed for handling searches
                efficiently</item>
              <item>not good for managing large aggregations of
                files efficiently, or for managing them <emph>as a
                  group</emph>, dealing with information that cuts
                across the entire aggregation</item>
            </list>
          </p>
        </lectureNote>
      </section>

      <section>
        <head>XML Databases</head>
        <slide>
          <p>Tools designed to manage large groups of XML files,
            with more advanced functionality:
            <list>
              <item>fast, efficient searching</item>
              <item>transformations involving groups of files</item>
            </list>
          </p>
        </slide>
        <lectureNote>
          <label>4. The XML Database universe</label>
          <p> These kinds of tools are designed to manage large
            groups of XML files, and to provide certain kinds of
            advanced functionality: <list>
              <item>fast, efficient searching</item>
              <item>transformations involving groups of files: not
                just transforming each file separately, but doing
                transformations that involve taking parts of
                different files and creating new results files: for
                instance, a sorted list of the first lines from all
                the poems in a collection.</item>
            </list></p>
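          <p>As a rough illustration of the "first lines" example,
            the sketch below pulls the first verse line out of each
            file in a small collection and sorts the results. It is
            plain Python working on in-memory strings, not an XML
            database, so it shows only the shape of the result, not the
            speed. The file names, the poem texts, and the "lg" and "l"
            element names are assumptions made for the example.</p>
          <eg><![CDATA[
```python
import xml.etree.ElementTree as ET

# Hypothetical collection: file names and contents are invented for this sketch.
COLLECTION = {
    "poem_a.xml": "<lg><l>Morning comes slowly</l><l>over the hill</l></lg>",
    "poem_b.xml": "<lg><l>A river runs</l><l>past the mill</l></lg>",
    "poem_c.xml": "<lg><l>zephyrs at dusk</l><l>are never still</l></lg>",
}

def sorted_first_lines(collection):
    """Return (first line, file name) pairs sorted by the line's text."""
    first_lines = []
    for filename, xml_text in collection.items():
        root = ET.fromstring(xml_text)
        first = root.find(".//l")  # first verse line in document order
        if first is not None:
            text = "".join(first.itertext()).strip()
            first_lines.append((text, filename))
    # Sort case-insensitively so capitalization does not affect the order.
    return sorted(first_lines, key=lambda pair: pair[0].lower())

first_line_index = sorted_first_lines(COLLECTION)
for line, filename in first_line_index:
    print(line, "-", filename)
```
]]></eg>
          <p>A database would answer the same question from its
            indexes instead of reparsing every file each time.</p>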
        </lectureNote>
      </section>

      <section>
        <head>XML databases in the larger XML framework</head>
        <slide>
          <p>How do they fit in? <list>
              <item>they create and store indexed information: e.g.
                tables of all the document metadata</item>
              <item>they may contain a representation of the
                document's structure</item>
            </list>
          </p>
        </slide>
        <lectureNote>
          <p> How do databases fit into a larger XML publication
            framework? What do they do? <list>
              <item><p>they create and store indexed information: that
                is, information from the source XML files that has
                been preprocessed to make it more accessible and
                easier to manipulate. For instance, they might store
                tables of all the document metadata (author, title,
                genre, date, etc.) so that it can be searched and
                sorted more quickly</p></item>
              <item>they contain a representation of the document's
                structure in a format that makes it easier to
                process, so that certain kinds of navigation are
                easier</item>
            </list> Within the XML publication framework, the
            database sits and waits for queries to come in. <list>
              <item>when it receives a query, it performs the
                necessary searching and returns a result (in the
                form of an XML fragment, or a node set, or some
                proprietary structure) </item>
              <item>the result can then be transformed (e.g. into
                HTML for delivery to a browser, or into some other
                XML format for other processing) using XSLT</item>
            </list></p>
        </lectureNote>
      </section>

      <section>
	<head>Indexing</head>
	<slide>
	  <quote>The year Buttercup was born, the most beautiful woman in the world was …</quote>
	  <table rend="css( border: thin solid black; )">
	    <row rend="css( border: thin double black; )">
	      <cell rend="css( border: thin solid black; padding: 0.25ex;)">beautiful</cell>
	      <cell rend="css( border: thin solid black; padding: 0.25ex;)">08</cell>
	    </row>
	    <row rend="css( border: thin double black; )">
	      <cell rend="css( border: thin solid black; padding: 0.25ex;)">born</cell>
	      <cell rend="css( border: thin solid black; padding: 0.25ex;)">05</cell>
	    </row>
	    <row rend="css( border: thin double black; )">
	      <cell rend="css( border: thin solid black; padding: 0.25ex;)">buttercup</cell>
	      <cell rend="css( border: thin solid black; padding: 0.25ex;)">03</cell>
	    </row>
	    <row rend="css( border: thin double black; )">
	      <cell rend="css( border: thin solid black; padding: 0.25ex;)">in</cell>
	      <cell rend="css( border: thin solid black; padding: 0.25ex;)">10</cell>
	    </row>
	    <row rend="css( border: thin double black; )">
	      <cell rend="css( border: thin solid black; padding: 0.25ex;)">most</cell>
	      <cell rend="css( border: thin solid black; padding: 0.25ex;)">07</cell>
	    </row>
	    <row rend="css( border: thin double black; )">
	      <cell rend="css( border: thin solid black; padding: 0.25ex;)">the</cell>
	      <cell rend="css( border: thin solid black; padding: 0.25ex;)">01, 06, 11</cell>
	    </row>
	    <row rend="css( border: thin double black; )">
	      <cell rend="css( border: thin solid black; padding: 0.25ex;)">was</cell>
	      <cell rend="css( border: thin solid black; padding: 0.25ex;)">04, 13</cell>
	    </row>
	    <row rend="css( border: thin double black; )">
	      <cell rend="css( border: thin solid black; padding: 0.25ex;)">woman</cell>
	      <cell rend="css( border: thin solid black; padding: 0.25ex;)">09</cell>
	    </row>
	    <row rend="css( border: thin double black; )">
	      <cell rend="css( border: thin solid black; padding: 0.25ex;)">world</cell>
	      <cell rend="css( border: thin solid black; padding: 0.25ex;)">12</cell>
	    </row>
	    <row rend="css( border: thin double black; )">
	      <cell rend="css( border: thin solid black; padding: 0.25ex;)">year</cell>
	      <cell rend="css( border: thin solid black; padding: 0.25ex;)">02</cell>
	    </row>
	  </table>
<!--	  <figure>
	    <graphic url="./gfx/index_words.png" width="100%"/>
	  </figure>-->
	</slide>
	<lectureNote>
		<p>An XML index or a word index is not unlike a back-of-the-book
		index. Imagine you have a library where the books are arranged on the shelves randomly,
		or in an order that is relatively useless to the patron (e.g., acquisition order,
		size order). Without an index, a patron who knows a book's author &amp; title but not its
		shelf number simply has to start at shelf slot #1 and proceed down the aisle, quickly
		looking at each book to see if it's the one. If we have an index, we have a list of all the
		books including author, title, and shelf number, <emph>sorted by author's name</emph>. Now
		she can just look up the book by author name, then title, get the shelf number, walk
		straight to the right spot, and find her book is missing.</p>
		<p>For a computer word indexer it's the same thing: the indexer software reads the
		file and, for each word, tucks that word into an alphabetically sorted list and records
		the word's position alongside it.</p>
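		<p>A minimal sketch of such an indexer, in Python, applied to the sentence on
		the slide. It assumes that a "word" is a run of letters or apostrophes, that case is
		ignored, and that positions count from 1; real indexers make more careful choices, and
		the function name is made up for this example.</p>
		<eg><![CDATA[
```python
import re
from collections import defaultdict

def build_word_index(text):
    """Map each lowercased word to the list of its 1-based positions."""
    index = defaultdict(list)
    # Assumption: a word is a run of letters or apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    for position, word in enumerate(words, start=1):
        index[word.lower()].append(position)
    # Return the entries in alphabetical order, as in a printed index.
    return dict(sorted(index.items()))

sentence = "The year Buttercup was born, the most beautiful woman in the world was"
index = build_word_index(sentence)
for word, positions in index.items():
    print(word, positions)
```
]]></eg>
		<p>Looking up a word is then a single dictionary access rather than a scan of
		the whole text.</p>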
	</lectureNote>
      </section>

      <section>
        <head>XML Databases</head>
        <slide>
          <list>
            <item>eXist</item>
            <item>DBXML</item>
            <item>Xindice</item>
	    <item>BaseX</item>
	    <item>Qizx/db</item>
          </list>
        </slide>
        <lectureNote>
          <p>XML databases exist as separate modules that can be
            used as the basis for XML publishing systems, for
            instance: <list>
              <item>eXist</item>
              <item>DBXML</item>
              <item>Xindice (Apache)</item>
	      <item>BaseX</item>
	      <item>Qizx/db</item>
	    </list>
	  </p>
        </lectureNote>
      </section>

      <section>
        <head>XML publishing systems with database component</head>
        <slide>
          <list>
            <item>TEI Publisher (uses eXist)</item>
            <item>Philologic (includes its own database, but can
              also work with MySQL)</item>
            <item>commercial products like Tamino</item>
          </list>
        </slide>
        <lectureNote>
          <p> But there also exist XML publishing systems which
            include a database component and also other components
            which handle other aspects of the process: <list>
              <item>TEI Publisher (uses eXist): show a bit?</item>
              <item>Philologic (includes its own database, but can
                also work with MySQL): show WWP site</item>
              <item>commercial products like Tamino</item>
            </list></p>
        </lectureNote>
      </section>
    </presentation>
  </text>
</TEI>
