Validation
Validation is the process by which we check whether the encoding of a file matches the rules established in the schema that governs the file: whether all of the elements used are defined in the schema, whether they appear in legal places, and whether all required elements are present. There are several different validation tools available; the one we use by default is called xmllint.
It's much easier to fix one or two validation errors than to fix a large number, so we recommend that you validate early and often—validate every time you save. (And save often!) To validate your file, type Control-c Control-v and then type Return at the next two prompts. The first prompt asks you to choose the validation tool, with xmllint as the default; the second asks you to confirm the file being validated, with the current file as the default.
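For readers who want to run the same check outside the editor, here is a minimal sketch of calling xmllint from a script. It assumes that the editor's default amounts to something like `xmllint --noout --valid FILE`; the exact command bound to Control-c Control-v is not given here. The function names are hypothetical.

```python
import shutil
import subprocess


def build_xmllint_command(path):
    """Return an xmllint command line that validates `path` against its DTD.

    --valid : validate against the DTD named in the file's DOCTYPE
    --noout : print only error messages, not the parsed document
    """
    return ["xmllint", "--noout", "--valid", path]


def validate(path):
    """Run xmllint on the last saved version of `path`.

    Returns (ok, messages). Assumes xmllint is installed and on the PATH.
    """
    if shutil.which("xmllint") is None:
        return False, "xmllint was not found on this system"
    result = subprocess.run(
        build_xmllint_command(path), capture_output=True, text=True
    )
    # xmllint writes its error messages to standard error and exits
    # with a non-zero status when the file fails to parse or validate.
    return result.returncode == 0, result.stderr
```

Calling `validate` on a saved file returns a true/false result together with whatever messages xmllint printed. Like the editor command, it reads the file from disk, so unsaved changes are not checked.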
A few general pointers:
- Always save your file before validating it; the validation process looks at the last saved version.
- Fix one error, then save and validate again. The XML validation process may report the same error several times in different ways, so a list of five errors may result from a single missing character in the file. Fixing one thing may dramatically reduce your error list and make it easier to diagnose the remaining errors.
- Look out for missing angle brackets, missing slashes in end-tags, and missing end-tags: these are the most common errors and the hardest to see. If you use tag insertion (Control-c Control-e in emacs) you'll avoid making these errors to start with.
The lists below give common error messages and suggest how to fix the errors involved. Examples use the element <foo type="bar">.
Parser errors
Parser errors indicate that your file is not well-formed: it does not conform to the basic XML rules of nesting and delimitation. Errors in this class mean that the XML parser (the tool that reads the XML document and maps out its tree structure) could not parse the file. Typical causes are missing start- or end-tags, missing or unmatched markup characters (such as quotation marks around attribute values, or angle brackets), and overlapping elements.
- Opening and ending tag mismatch: xmllint has found a discrepancy between a start-tag and the end-tag it thinks should correspond to it. This might be caused by a typo in the name of the start- or end-tag, or by a missing character in the markup (missing or mismatched quotation marks, missing equals sign, missing angle bracket). Go to the error location and look closely. The error message will try to indicate the exact location of the error. If the error is a mistyped element name, the error message will report the start-tag and end-tag involved (e.g. <div> and </dov>), and point out exactly where you made the mistake. However, if the error is an omitted markup character, the error message reports the nearest well-formed tag. This may produce a misleading error message that mentions elements which are perfectly fine. Look at the other error messages for clues about where the error might be, and check your encoding carefully for missing characters.
- Error parsing attribute name: this suggests that there's an error in the start-tag, since xmllint got confused while looking for an attribute (which can only appear in a start-tag). Look for a missing angle bracket, or a missing space between the element name and the attribute name.
- Attributes construct error: as above, this suggests an error in a start-tag. Look for a missing angle bracket, a missing space between the element name and the attribute name, a missing equals sign, or a missing or mismatched quotation mark around an attribute value.
- Couldn't find end of start tag...: as above, this suggests an error in a start-tag. Look for a missing angle bracket, a missing space between the element name and the attribute name, a missing equals sign, or a missing or mismatched quotation mark around an attribute value.
- AttValue: " or ' expected: indicates a missing quotation mark around an attribute value.
- Unescaped "<" not allowed in attributes values: this one is a little misleading. It sometimes indicates a mismatched quotation mark around an attribute value: for example, <foo type='bar">content</foo>. Essentially, the parser sees the first quotation mark and understands that it is reading an attribute value. It does not find a matching quotation mark, so it assumes the attribute value continues and includes "content". It then sees the < of the end-tag and complains that this character is not allowed inside an attribute value (unless it is escaped). The error lies not with the angle bracket but with the mismatched quotation marks. The same error message will be reported if the second quotation mark is simply missing.
- Specification mandate value for attribute type: indicates a missing equals sign between the attribute name and its value.
- Premature end of data in tag...: this indicates that an expected end-tag was not found. This could mean that the end-tag is actually missing, or that you've omitted the slash in the end-tag (thus making the parser think it's a start-tag).
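The mistakes listed above can be tried out on the <foo type="bar"> example. The sketch below is an illustration only: it uses the XML parser in Python's standard library rather than xmllint, so the wording of the messages it prints will differ from the messages listed here. What it shows is that each small slip makes the whole element unparseable. The sample labels and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# One well-formed example and several broken variants of it.
SAMPLES = {
    "well-formed": '<foo type="bar">content</foo>',
    "mismatched quotation marks": "<foo type='bar\">content</foo>",
    "missing equals sign": '<foo type"bar">content</foo>',
    "missing space before attribute": '<footype="bar">content</foo>',
    "missing angle bracket": '<foo type="bar"content</foo>',
    "mistyped end-tag": '<foo type="bar">content</fo>',
    "missing slash in end-tag": '<foo type="bar">content<foo>',
}


def check_well_formed(text):
    """Return None if `text` parses, otherwise the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


def report(samples):
    """Map each sample label to None (parsed) or an error message."""
    return {label: check_well_formed(text) for label, text in samples.items()}


for label, problem in report(SAMPLES).items():
    print(f"{label}: {problem or 'parses'}")
```

Only the first sample parses. Each of the others differs from it by a single character, which is why it pays to check the markup character by character at the reported location.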
Validity errors
Validity errors indicate that your file does not conform to the WWP DTD: either because you've used an element that doesn't exist, or you've put an element in the wrong place, or you've omitted some required element.
- Element foo content does not follow the DTD, expecting [big scary DTD fragment here], got [some other list of elements]: Reduced to essentials, this error message is saying that it expected some particular set of elements as the content of the <foo> element, but it got some other set of elements. In theory you could read the DTD fragment and figure out what it wanted and where your encoding is different, but in practice it's usually easier to look at your encoding and think "where did I mess up?". If the "some other list of elements" contains the term "CDATA", and no CDATA is listed in the big DTD fragment, then one likely problem is that you've typed PCDATA (letters and spaces) in a place where only elements are permitted.
- Element foo is not declared in list of possible children: the element foo is not allowed where you have put it. This might mean that element foo does not exist, or simply that it is not permitted in that particular location. Look up the element in the WWP documentation or the TEI Guidelines to make sure it exists, and to find out where it is permitted.
- No declaration for attribute bar of element foo: The attribute bar either does not exist or is not permitted on this particular element. Check your typing and look up the element in the documentation to check on its permissible attributes.
- Element foo does not carry attribute duck: the element foo is missing a required attribute named "duck".
- Value "bar" for attribute type of foo is not among the enumerated set: this means that the type= attribute for this element uses a controlled list of possible values, and the attribute value "bar" isn't one of the permitted values.
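To make the difference between parser errors and validity errors concrete, the sketch below checks a well-formed document against a tiny set of rules. Everything in it is an assumption made for illustration: the rule table is hypothetical and is not the WWP DTD, the checker is far simpler than xmllint (it ignores element order and text content), and its messages only paraphrase the ones listed above.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for illustration only; this is NOT the WWP DTD.
# For each attribute, None means any value is accepted; a set lists
# the only permitted values.
RULES = {
    "div": {"children": {"foo"}, "attributes": {}, "required": set()},
    "foo": {
        "children": set(),
        "attributes": {"type": {"baz", "qux"}, "duck": None},
        "required": {"duck"},
    },
}


def validity_errors(element):
    """Return a list of messages for `element` and all its descendants."""
    rule = RULES.get(element.tag)
    if rule is None:
        return [f"element {element.tag} is not declared"]
    errors = []
    for name, value in element.attrib.items():
        if name not in rule["attributes"]:
            errors.append(f"no declaration for attribute {name} of {element.tag}")
            continue
        permitted = rule["attributes"][name]
        if permitted is not None and value not in permitted:
            errors.append(f'value "{value}" for attribute {name} is not permitted')
    for name in sorted(rule["required"] - set(element.attrib)):
        errors.append(f"element {element.tag} does not carry attribute {name}")
    for child in element:
        if child.tag not in rule["children"]:
            errors.append(f"element {child.tag} is not allowed in {element.tag}")
        errors.extend(validity_errors(child))
    return errors


# Well-formed, so it parses, but it breaks the rules in four places.
DOCUMENT = (
    '<div><foo duck="1" type="bar">content</foo><foo type="baz"/>'
    '<foo duck="1" bar="x"/><dov/></div>'
)

for message in validity_errors(ET.fromstring(DOCUMENT)):
    print(message)
```

The mistyped <dov> element is reported twice, once as not allowed in its parent and once as undeclared. This mirrors the earlier advice that a single mistake can show up as several messages.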