<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="../stylesheets/yaps-tei.css"?><?oxygen RNGSchema="../schema/yaps.rnc" type="compact"?><?oxygen SCHSchema="../schema/yaps.sch"?><TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc>
      <titleStmt>
        <title>The <code>gaiji</code> module </title>
        <author>Christian Wittern</author>
      </titleStmt>
      <publicationStmt>
        <distributor>Brown University Women Writers Project</distributor>
        <address>
          <addrLine>cwittern@zinbun.kyoto-u.ac.jp</addrLine>
        </address>
        <date value="2007-03-16"/>
        <availability status="free">
          <p>Copyleft 2006 Christian Wittern</p>
        </availability>
        <pubPlace>Stanford Humanities Center</pubPlace>
      </publicationStmt>
      <sourceDesc>
        <p>This is the source.</p>
      </sourceDesc>
    </fileDesc><revisionDesc><change date="2006-03-13" who="#SB">automatically converted from presentation.odd conforming to
        yaps.odd conforming using p2y.xslt and p2y.perl</change><change who="#SB" date="2006-03-11">Merged the two change sets below into
        this one document, also keeping 1 slide CW had deleted; fixed whitespace
        in the examples, </change><change who="#CW" date="2006-03-09">Some content corrections</change><change who="#SB" date="2006-03-08">Minor typo correction</change></revisionDesc></teiHeader>
  <text>
    <presentation>
      <section><head>Characters and glyphs</head>
          <slide>
            
            <list rend="unordered">
              <item>In a document printed on paper, a character and the glyph
                representing it form a single unit that cannot be divided
                further</item>
              <item>When such a document is transcribed on a computer, the
                two have to be split apart:<list>
                  <item>into the abstract unit that is represented internally by
                    a codepoint: the character (文字)</item>
                  <item>into the shape that is associated with this codepoint: the
                    glyph (字形)</item>
                </list>
              </item>
              <item>There are usually some generic shapes associated with a
                character, but they can vary considerably</item>
              <item>Two problems follow from this:<list>
                  <item>Sometimes there is no character that can be used</item>
                  <item>At other times, we want to identify the exact form that
                    was used in the text</item>
                </list></item>
              <item>In this unit, we investigate solutions to both
              problems</item>
            </list>
          </slide>
        </section><section><head>Representing a character that is not in Unicode</head>
          <slide>
            
            <p>The representation has two components:<list>
                <item>The definition of the character (in the
                  <gi>teiHeader</gi>)</item>
                <item>The invocation of the character somewhere in the
                text</item>
              </list> These are available in the <name type="module">gaiji</name> module</p>
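            <p>As a minimal sketch of how the two components fit together (the
              identifier <code>mychar</code> and the character name are
              hypothetical placeholders, not taken from any real project): <eg><![CDATA[<!-- 1. Definition, in the teiHeader -->
<encodingDesc>
  <charDesc>
    <char xml:id="mychar">
      <charName>HYPOTHETICAL CHARACTER NAME</charName>
    </char>
  </charDesc>
</encodingDesc>

<!-- 2. Invocation, somewhere in the text -->
<p>... <g ref="#mychar"/> ...</p>]]></eg></p>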
          </slide>
        </section><section><head>The definition of a new character</head>
          <slide>
            
            <p>The TEI provides the following for defining a new character:<list>
                <item>The <gi>charDesc</gi> element, inside the
                  <gi>encodingDesc</gi> section of the <gi>teiHeader</gi>, holds
                  a list of character definitions</item>
                <item>Each <gi>char</gi> element holds the definition of one
                  character</item>
                <item>A <gi>charName</gi> is a required child of
                  <gi>char</gi></item>
                <item>Additional properties of the character can be defined
                  using <gi>charProp</gi></item>
                <item>Additionally, a graphic representation of the character
                  can be given in <gi>graphic</gi></item>
                <item>A <gi>mapping</gi> to another character can be defined;
                  its <att>type</att> attribute indicates the kind of mapping
                  according to some typology.</item>
              </list></p>
            <p>More information can be found in <ref target="http://www.tei-c.org/release/doc/tei-p5-doc/html/WD.html">Representation of non-standard characters and glyphs</ref>
              (which is currently Chapter 25 of the TEI Guidelines)</p>
            <p>The character thus defined can then be invoked using the
              <gi>g</gi> element, whose <att>ref</att> attribute points to
              the definition in the header.</p>
          </slide>
        </section><section><head>Example: A character defined using the TEI <code>gaiji</code>
              module</head>
          <slide>
            
            <p>Here is a character with its name and three properties: <eg><![CDATA[<char xml:id="CB02596">
  <charName>CBETA CHARACTER CB02596</charName>
  <charProp>
    <localName>composition</localName>
    <value>[(禾*尤)/上/日]</value>
  </charProp>
  <charProp>
    <localName>Mojikyo number</localName>
    <value>M025240</value>
  </charProp>
  <charProp>
    <localName>entity</localName>
    <value>CB02596</value>
  </charProp>
  <mapping type="normalized">稽</mapping>
</char>]]></eg>
            </p>
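            <p>Assuming the definition above sits in the header, the character
              could then be invoked in the text roughly as follows (the
              surrounding <gi>p</gi> and the elided context are illustrative
              only): <eg><![CDATA[<p>... <g ref="#CB02596"/> ...</p>]]></eg> A processor that
              cannot display the character might fall back to the
              <code>normalized</code> mapping, 稽.</p>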
          </slide>
        </section><section><head>Differentiating between different glyphs of the same character</head>
          <slide>
            
            <p>Sometimes the character is in Unicode, but the source uses a
              different shape (異體字)</p>
            <p>In this case, a similar mechanism can be used:<list>
                <item>Define a character in the header </item>
                <item>Give the intended shape there (as a graphic or by pointing
                  to a font file)</item>
                <item>Use the standard Unicode character in the text, but within
                  a <gi>g</gi> element</item>
                <item>When rendering the text, the graphic can be used to
                  display the character</item>
                <item>When indexing the text for search, the standard form can
                  be used.</item>
              </list>
            </p>
            <p>Sometimes there is more than one glyph of the same character in
              Unicode. In this case, the same method can be used, and in
              addition the <gi>mapping</gi> element can be used to indicate the
              relationship between these glyphs.</p>
          </slide>
        </section><section><head>Associate a specific glyph with a character</head>
          <slide>
            
            <p>Sometimes it is desirable to associate a character with a
              specific glyph instead of the set of generic glyphs that is the
              default in Unicode. In this case, a mechanism similar to the one
              above can be used, but it differs in two important points:<list>
                <item>Instead of a <gi>char</gi>, define a <gi>glyph</gi> in
                  the header of the document. The intended shape of the glyph
                  can be stated by pointing to a font file or by using a
                    <gi>graphic</gi> element.</item>
                <item>In the text, the <gi>g</gi> element is used, but now the
                  standard character is given as its content.</item>
              </list></p>
            <p>This example uses Ideographic Description Sequences (IDS) as
              defined by Unicode to indicate the shape of the glyph: <eg><![CDATA[<charDesc>
  <glyph xml:id="v884c">
    <glyphName>Variant of CJK U+884C</glyphName>
    <charProp>
      <localName>IDS</localName>
      <value>⿰⺅亍</value>
    </charProp>
    <graphic url="v884c.png"/>
  </glyph>
</charDesc>]]></eg> In the text, this would be invoked as follows:
              <eg><![CDATA[總作六<g ref="#v884c">行</g>，北頭第一<g ref="#v884c">行</g>]]></eg>
            </p>
            <p>As a result, the text can be used in different ways: <list>
                <item>When rendering the text, the graphic can be used to
                  display the character</item>
                <item>When indexing the text for search, the standard form can
                  be used.</item>
              </list>
            </p>
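            <p>The two uses could be sketched as an XSLT 1.0 fragment. This is
              an illustration under stated assumptions, not a complete
              stylesheet: the mode names <code>render</code> and
              <code>index</code> are hypothetical, the output is assumed to be
              HTML, and a calling template would still have to apply these
              modes. <eg><![CDATA[<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Rendering (hypothetical mode name): replace each g by the image
       that its glyph definition in the header points to -->
  <xsl:template match="tei:g" mode="render">
    <!-- strip the leading "#" from the ref to get the xml:id -->
    <xsl:variable name="id" select="substring-after(@ref, '#')"/>
    <img src="{//tei:glyph[@xml:id = $id]//tei:graphic/@url}" alt="{.}"/>
  </xsl:template>

  <!-- Indexing (hypothetical mode name): keep only the standard
       character given as the content of g -->
  <xsl:template match="tei:g" mode="index">
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>]]></eg> Keeping the standard character as the content of
              <gi>g</gi> is what makes the indexing case this simple: no lookup
              in the header is needed.</p>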
          </slide>
        </section><section><head>Create a mapping between related characters</head>
          <slide>
            
            <p>A slightly different problem arises from the fact that in some
              cases Unicode encodes not only characters but also glyph
              variants. For example, 説 (U+8AAC) and 說 (U+8AAA) are different
              glyphs of the same character, but each has its own separate
              Unicode value. Normally, only one of these should be used in a
              given text. However, if for some reason it is necessary to use
              both, a mapping between them can be established. A similar method
              can be employed for other cases where one character has been used
              in an idiosyncratic way as a replacement for another (異體字). To
              establish the fact that we consider these two as variants of the
              same character, we could define the following in the header: <eg><![CDATA[<char xml:id="u8AAC">
  <charName>Unified CJK 8AAC</charName>
  <mapping type="variant">&#x8aaa;</mapping>
</char>]]></eg></p>
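            <p>A sketch of how the other form might be handled (the identifier
              <code>u8AAA</code> and the character name are assumptions made for
              illustration): a reciprocal definition in the header, followed by
              an invocation in the text. <eg><![CDATA[<!-- In the header: the reciprocal definition -->
<char xml:id="u8AAA">
  <charName>Unified CJK 8AAA</charName>
  <mapping type="variant">&#x8aac;</mapping>
</char>

<!-- In the text: the form as found in the source, linked to its definition -->
<g ref="#u8AAA">說</g>]]></eg> A search index could then treat both forms as
              equivalent by following the mapping.</p>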
          </slide>
        </section>
    </presentation>
  </text>
</TEI>
