Women Writers Online
The WWP's encoding documentation is available online. Our documentation of sources, names, transcribed files, and transcription workflow is managed in a relational database system.
Each photocopy or microfilm that the WWP purchases is given a WWP catalogue number to identify it in our records. These sources are then catalogued in a database that stores, among other things, the author, title, publication information, the length and format of the source text, the source library, call number, Wing or STC number if appropriate, microfilm versions available, condition of the source (if flawed or illegible), and whether the text has yet been encoded.
Once the text has been encoded we also store information pertaining to the electronic transcription, including its filename, the person responsible for the transcription, and the price of the print version (if a print version is being made available).
Information on sources is maintained by the Textbase Editor.
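As a rough illustration of how the source and transcription records described above could be laid out in a relational database, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and helper functions are all assumptions made for this example; they are not taken from the WWP's actual system.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE source (
    catalogue_no   TEXT PRIMARY KEY,  -- WWP catalogue number of the purchased copy
    author         TEXT NOT NULL,
    title          TEXT NOT NULL,
    publication    TEXT,
    length_pages   INTEGER,
    format         TEXT,
    source_library TEXT,
    call_number    TEXT,
    wing_or_stc    TEXT,              -- NULL when neither number applies
    microfilm      TEXT,              -- microfilm versions available
    condition      TEXT,              -- notes on flaws or illegibility
    encoded        INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE transcription (
    catalogue_no   TEXT PRIMARY KEY REFERENCES source(catalogue_no),
    filename       TEXT NOT NULL,
    transcriber    TEXT NOT NULL,
    print_price    REAL               -- NULL if no print version is offered
);
"""


def open_catalogue():
    """Create an empty in-memory catalogue with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def record_transcription(conn, catalogue_no, filename, transcriber, print_price=None):
    """Store transcription details and mark the source as encoded."""
    with conn:  # commits both statements together, or rolls both back
        conn.execute(
            "INSERT INTO transcription VALUES (?, ?, ?, ?)",
            (catalogue_no, filename, transcriber, print_price),
        )
        conn.execute(
            "UPDATE source SET encoded = 1 WHERE catalogue_no = ?",
            (catalogue_no,),
        )


def unencoded_sources(conn):
    """Return the catalogue numbers of sources not yet encoded."""
    rows = conn.execute(
        "SELECT catalogue_no FROM source WHERE encoded = 0 ORDER BY catalogue_no"
    )
    return [row[0] for row in rows]
```

Keeping the transcription details in a separate table keyed by catalogue number reflects the fact that they exist only once a text has been encoded; a single wider table would serve equally well.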
The WWP documents its encoding methods in two ways: from the bottom up, and from the top down. The bottom-up documentation takes the form of an online discussion list, where encoding problems and questions can be posted and addressed as they arise. Minutes of regular encoding meetings are also archived on this list. The top-down documentation is stored online, with an entry for each separate encoding issue or aspect of transcription (for example, the treatment of catchwords, how to encode cast lists, or the use of the <bibl> element). The documentation is keyworded for easier searching and is the WWP's chief encoding reference tool, after the TEI Guidelines. Topics of particular interest or difficulty are also documented in training tutorials, which are accessible from the WWP training web site.
All staff and encoders participate in the discussion list and encoding meetings. The documentation database is maintained by the Textbase Editor and the Programmer/Analyst. The training tutorials are maintained by the Textbase Coordinator.
The WWP tracks the work of transcription in a database, maintaining a record for each encoded text indicating the original encoder, the person currently working on the text (if different), the latest DTD version against which the text is known to be valid, the date of last validation, the stage of encoding (document analysis, initial capture, proofreading, etc.), and the date at which each stage was completed. The record also includes a document analysis report (completed by the encoder as part of the initial preparation of the text) and reviews by staff members of the finished encoding.
The workflow information and document analysis are entered by the encoder; staff also use this database to generate reports on the progress of the textbase, to determine the status of individual texts, and to review the work of individual encoders.
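The workflow record and progress report described above might be modelled roughly as follows. This is only a sketch: the class, field, and function names are hypothetical, and the stage list contains just the three stages named in the text, whereas the real list is longer.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional

# Only the stages named in the text; the actual sequence has more ("etc.").
STAGES = ["document analysis", "initial capture", "proofreading"]


@dataclass
class WorkflowRecord:
    filename: str
    original_encoder: str
    current_encoder: Optional[str] = None   # set only if different from the original
    dtd_version: Optional[str] = None       # latest DTD version known to validate
    last_validated: Optional[date] = None
    completed: Dict[str, date] = field(default_factory=dict)

    def complete_stage(self, stage: str, when: date) -> None:
        """Record the date on which a stage was completed."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.completed[stage] = when

    def current_stage(self) -> Optional[str]:
        """Return the first stage not yet completed, or None if all are done."""
        for stage in STAGES:
            if stage not in self.completed:
                return stage
        return None


def progress_report(records: List[WorkflowRecord]) -> Dict[str, int]:
    """Count how many texts are currently at each stage."""
    report: Dict[str, int] = {}
    for record in records:
        stage = record.current_stage() or "finished"
        report[stage] = report.get(stage, 0) + 1
    return report
```

Storing a completion date per stage, instead of a single status field, lets the same record answer both questions the text mentions: the status of an individual text and the overall progress of the textbase.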
The TEI system of DTDs is designed to be extended by TEI projects as needed. The WWP's TEI extensions are documented in TEI-conformant extension files.
The WWP maintains a database containing all the non-fictional proper names in the WWP textbase, plus the names of all people connected with the textbase. Each name is assigned a unique key using the name2key program written by Syd Bauman. This key is used wherever the name appears in an encoded text; in addition, the name key serves as a link between the names database and the other documentation databases (for instance, the sources documentation and the workflow documentation) wherever a name is used. This reduces typing and also eliminates inconsistency of format and ambiguity of reference.
The names database stores the proper name of the individual (broken into its component parts), with some demographic and biographical information (birth and death dates, gender, distinguishing historical facts). Storing names this way has the added advantage that each name is held centrally and can then be invoked in different forms (full name, inverted order, last name only, etc.) as needed in various applications.
The names database is also used by encoders to find existing keys for proper names they encounter in the course of encoding their texts. If no key already exists for a given proper name, the encoder creates a new key using the name2key program. The individual key records are then periodically collected and imported into the names database, and additional information is added as necessary. At this stage duplicates are eliminated and ambiguities (which Darius?) are resolved.
The names database is maintained by the Textbase Editor.
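To make the idea of storing a name once and invoking it in different forms more concrete, here is a small sketch. It treats the key as an opaque string and does not attempt to reproduce how name2key derives keys; the class names, field names, and form labels are all assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class NameRecord:
    key: str                  # opaque unique key; its derivation is not modelled here
    forename: str
    surname: str = ""         # may be empty for single-name individuals
    birth: Optional[int] = None
    death: Optional[int] = None

    def render(self, form: str = "full") -> str:
        """Return the name in one of a few hypothetical display forms."""
        if form == "full":
            return " ".join(p for p in (self.forename, self.surname) if p)
        if form == "inverted":
            return ", ".join(p for p in (self.surname, self.forename) if p)
        if form == "surname":
            return self.surname or self.forename
        raise ValueError(f"unknown form: {form}")


class NamesDatabase:
    def __init__(self) -> None:
        self._by_key: Dict[str, NameRecord] = {}

    def import_records(self, records: Iterable[NameRecord]) -> int:
        """Add collected key records, skipping keys already present.

        Returns the number of duplicates skipped. Resolving genuine
        ambiguities still needs a human editor and is not modelled here.
        """
        skipped = 0
        for record in records:
            if record.key in self._by_key:
                skipped += 1
            else:
                self._by_key[record.key] = record
        return skipped

    def lookup(self, key: str) -> Optional[NameRecord]:
        """Find the record for a key, or None if no such key exists."""
        return self._by_key.get(key)
```

Because every display form is computed from the stored components, a correction to the central record is picked up wherever the key is used.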