<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="../../../_utils/stylesheets/yaps-tei.css"?>
<?oxygen RNGSchema="../../../_utils/schema/yaps.rnc" type="compact"?>
<?oxygen SCHSchema="../../../_utils/schema/yaps.isosch"?>
<TEI xmlns="http://www.wwp.brown.edu/ns/yaps/1.0" xmlns:xi="http://www.w3.org/2001/XInclude">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Publishing TEI Documents</title>
        <author xml:id="jhf">Julia Flanders</author>
	<author>Syd Bauman</author>
      </titleStmt>
      <editionStmt>
        <edition>Introduction to Manuscript Encoding with TEI, Brown University</edition>
      </editionStmt>
      <publicationStmt>
        <distributor>Women Writers Project (via website)</distributor>
        <address>
          <addrLine>wwp@Brown.edu</addrLine>
        </address>
        <date when="2012-05-09"/>
        <availability status="restricted">
          <p>Copyright 2011 Syd Bauman, Julia Flanders, and Brown WWP</p>
	  <p>This TEI-encoded XML file is available under the terms of
	  the <ref target="http://creativecommons.org/licenses/by-sa/3.0/">Creative
	  Commons Attribution-ShareAlike 3.0 (Unported)</ref>
	  license.</p>
        </availability>
        <pubPlace>Providence, RI  USA</pubPlace>
      </publicationStmt>
      <sourceDesc>
        <p>A very brief, conceptual overview of the basic XML publishing tool chain and of XML publication frameworks.</p>
      </sourceDesc>
    </fileDesc>
    <revisionDesc>
      <change when="2011-10-18" who="#jhf">Created new from scratch</change>
    </revisionDesc>
  </teiHeader>
  <text>
    <presentation>
      <section>
        <head>
          <q>Publishing</q>
        </head>
        <slide>
          <p>Several senses of the term: <list>
              <item>making documents readable online</item>
              <item>exploitation of the encoding</item>
              <item>more ambitious things follow: text analysis,
                data mining, other processing</item>
            </list></p>
        </slide>
        <lectureNote>
          <p>Now you've learned how to create TEI documents,
            but we haven't said anything about what you can do with
            them&#x2026; </p>
          <p> There is a very wide range of things you can do with
            TEI documents, including in-depth analysis, data mining,
            and processing to discover patterns of various sorts. </p>
          <p> We're going to focus on "publishing", both in the
            narrow sense of "making them readable online" and also
            in the broader sense of "exploiting the encoding
            publicly". But most of the more advanced things you can
            do with TEI documents use technologies similar to the
            ones we're talking about here.</p>
        </lectureNote>
      </section>

      <section>
        <head>Transformations with XSLT</head>
        <slide>
          <p>Extensible Stylesheet Language Transformations (XSLT) allow you
            to transform XML documents into other formats </p>
          <figure>
            <graphic height="400px"
              url="../../../_utils/gfx/publication_tools_xslt.png"/>
          </figure>
        </slide>
        <lectureNote>
          <p> XSLT (Extensible Stylesheet Language Transformations) allows you to
            transform XML documents into other formats: most often other XML formats, but also HTML or plain text.</p>
          <p>Essentially, XSLT allows you to map a given XML element onto another XML element, saying in effect "take in the following construct, and put out this other construct".</p>
          <p>It could be a construct in the same language, or in a different language such as XHTML, as in the example here.</p>
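          <p>As a minimal sketch of this kind of mapping, a stylesheet might look something like the following. It assumes, purely for illustration, that the source documents are in the standard TEI namespace and that the output is XHTML; the particular elements chosen (div, head, p) are only examples.</p>
          <eg xml:space="preserve"><![CDATA[
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="tei">

  <!-- Start at the document root: build the XHTML page and
       process only the text, not the header -->
  <xsl:template match="/">
    <html>
      <head>
        <title>
          <xsl:value-of
            select="tei:TEI/tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title"/>
        </title>
      </head>
      <body>
        <xsl:apply-templates select="tei:TEI/tei:text"/>
      </body>
    </html>
  </xsl:template>

  <!-- "Take in" a TEI div, "put out" an XHTML div -->
  <xsl:template match="tei:div">
    <div><xsl:apply-templates/></div>
  </xsl:template>

  <!-- A TEI head becomes an XHTML h2 -->
  <xsl:template match="tei:head">
    <h2><xsl:apply-templates/></h2>
  </xsl:template>

  <!-- A TEI p becomes an XHTML p -->
  <xsl:template match="tei:p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
]]></eg>
          <p>Any XSLT 1.0 processor should be able to apply a stylesheet like this one; with xsltproc, for instance, the command would be along the lines of "xsltproc stylesheet.xsl input.xml", where both file names are placeholders.</p>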
        </lectureNote></section>
      
      
      
      
      
      
      <section>
        <head>XML Databases and Publication Frameworks</head>
        <slide>
          <p>Tools designed to manage large groups of XML files, with
            more advanced functionality: <list>
              <item>fast, efficient searching</item>
              <item>transformations involving groups of files</item>
              <item>examples: eXist, DBXML, Xindice, XTF, MarkLogic</item>
            </list>
          </p>
          <figure>
            <graphic height="400px"
              url="../../../_utils/gfx/publication_tools_framework.png"/>
          </figure>
        </slide>
        <lectureNote>
          <label>The XML database and publication framework universe</label>
          <p> These kinds of tools are designed to manage large groups
            of XML files, and to provide certain kinds of advanced
            functionality: <list>
              <item>fast, efficient searching</item>
              <item>transformations involving groups of files: not
                just transforming each file separately, but
                taking parts of different files and combining
                them into new result files (for instance, a
                sorted list of the first lines from all the
                poems in a collection)</item>
            </list></p>
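          <p>The first-lines example can be sketched in XSLT alone, without a database; a database or framework would typically do the same job faster by drawing on its indexes. The sketch below assumes a hypothetical catalog file that lists the poem files, and assumes that each poem is encoded as a TEI lg element with type="poem". Both are illustrative conventions, not requirements of any of the tools named here.</p>
          <eg xml:space="preserve"><![CDATA[
<!-- Hypothetical input (catalog.xml):
     <catalog>
       <file href="poems/volume1.xml"/>
       <file href="poems/volume2.xml"/>
     </catalog>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/catalog">
    <list>
      <!-- document() loads every file named in the catalog,
           so the poems are gathered from across the files -->
      <xsl:for-each
          select="document(file/@href)//tei:lg[@type = 'poem']">
        <!-- Sort the poems by the text of their first line -->
        <xsl:sort select="normalize-space((.//tei:l)[1])"/>
        <item>
          <xsl:value-of select="normalize-space((.//tei:l)[1])"/>
        </item>
      </xsl:for-each>
    </list>
  </xsl:template>

</xsl:stylesheet>
]]></eg>
          <p>The result is a single new file built from parts of many source files, which is exactly the kind of transformation that becomes slow to run this way as a collection grows.</p>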
          
          <p> How do databases fit into a larger XML publication
            framework? What do they do? <list>
              <item>they create and store indexed information:
                that is, information from the source XML files
                that has been preprocessed to make it more
                accessible and easier to manipulate. For
                instance, they might store tables of all the
                document metadata (author, title, genre, date,
                etc.) so that it can be searched and sorted more
                quickly</item>
              <item>they contain a representation of the
                document's structure in a format that makes it
                easier to process, so that certain kinds of
                navigation are easier</item>
            </list> Within the XML publication framework, the
            database sits and waits for queries to come in. <list>
              <item>when it receives a query, it performs the
                necessary searching and returns a result (in the
                form of an XML fragment, or a node set, or some
                proprietary structure) </item>
              <item>the result can then be transformed (e.g. into
                HTML for delivery to a browser, or into some
                other XML format for other processing) using
                XSLT</item>
            </list></p>
          <p>XML databases exist as separate modules that can be used
            as the basis for XML publishing systems, for instance: <list>
              <item>eXist</item>
              <item>DBXML</item>
              <item>Xindice (Apache)</item>
            </list></p>
          
        </lectureNote>
      </section>
      <section>
        <head>What you need!</head>
        <slide>
          <figure>
            <graphic height="400px"
              url="../../../_utils/gfx/publication_spectrum.png"/>
          </figure>
        </slide>
        <lectureNote>
          <p>The tools you need, and the people you need, can be imagined as a rough continuum of increasing scale, complexity, difficulty, and cost:
          <list>
            <item>At the simplest level, there are things you can do (or learn to do) by yourself, with very little in the way of equipment or software: tools like XSLT and CSS will go a long way towards producing simple, effective interfaces for browsing and reading small sets of documents.</item>
            <item>At a slightly more complex level, as the number of documents increases and as you want to do more ambitious things with them (such as visualizations or complex searching), you need software tools that are a little more challenging to manage. These are perfectly within the capabilities of a humanist, but they require more time: this is not something you can do on the side of another job, and it becomes someone's major job responsibility.</item>
            <item>Going a bit further, we get to XML publication frameworks that require a professional systems administrator, someone who really understands the installation and configuration of things like web servers and XML databases. These are the kinds of tools we need if we want to build things like data mining or text/topic analysis into our publications, or to publish larger collections of documents that require more server power and speed.</item>
            <item>For production-level publication, where you may actually be charging money for access (and hence need things like authentication) and may have higher standards of performance and reliability, you need to start engaging with your institutional IT organization to make sure that backups, server maintenance, and the like are handled at the appropriate level of professionalism. This is also the scale at which we can start to work effectively with multiple large data sets: for instance, multiple projects of substantial size.</item>
            <item>Finally, if we want to ensure the long-term sustainability of projects, we need to engage with systems like institutional repositories and the data curators who can help us ensure that data will be maintained, migrated, etc. after the project itself is no longer funded.</item>
          </list>
          
          </p>
        </lectureNote>
      </section>
      
      
    </presentation>
  </text>
</TEI>
