Brown University
Women Writers Project
Research and Encoding
Training Materials
Clean With Spam, ediff, macros
This document last updated Tuesday, 03-Apr-2012 12:03:50 EDT.
Syd's original directions are more detailed. If this is your first visit to this web page, please read Syd's original directions first. Use this web page only for quick reminders.
This document covers the following commands and macros: wwp-spam-and-ediff, wwp-fix-case-of-GIs, wwp-next-missing-TAGC, wwp-next-unquoted-attr.
This document does NOT cover the following commands (however, they ARE covered in Syd's original directions): wwp-sort-revisionDesc, find-tag.
Reminder: there are four things you can do to make sure your encoding is correct: (1) use PSGML properly; (2) validate frequently using nsgmls; (3) run supra-validation; (4) use Spam, ediff, and macros to clean up the last few errors.
NOTE: These commands have the potential to be dangerous in the same way a global search-and-replace can be dangerous: you may make changes to your file that you're not aware of. They execute major changes on your file. It's a good idea to SAVE the file, check it in, then check it back out of RCS before using any of these.
Pick Show missing markup from the WWP menu or do M-x wwp-spam-and-ediff from the keyboard. After the command completes, you will see your original document in buffer A (top buffer) and the "spammed" document in buffer B (bottom buffer).
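While looking at a difference, you can change one buffer to match the other. When in ediff, the primary commands you'll be using are the standard ediff keys:
n: go to the next difference
p: go to the previous difference
a: copy the current difference region from buffer A to buffer B
b: copy the current difference region from buffer B to buffer A
q: quit ediff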
Reminder: this command runs spam on the current buffer ("Spam" is short for "SGML Parser add markup", and is freeware courtesy of James Clark) and then runs the ediff command to compare the output of spam (with its added markup) with the original buffer.
Pick Fix case of GIs from the WWP menu or do M-x wwp-fix-case-of-GIs from the keyboard.
Reminder: this fixes the case of all GIs, e.g.:
<castlist> <castitem><actor><PERSNAME>Madonna</PERSNAME></actor></castitem> </castlist>
becomes
<castlist> <castItem><actor><persName>Madonna</persName></actor></castItem> </castlist>
Reminder: this is a potentially dangerous command: prior to using it make sure to check for missing ">"s with a regexp search or the wwp-next-missing-TAGC macro (discussed below); then save, check in, and check back out (as with all major global change commands). Then, after you've done it, scan through the file and quickly check if things look all right. You can compare the current file to the most recently checked-in version with the vc-diff command (C-x v =).
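As a rough illustration of what a GI case fix involves, here is a minimal Python sketch. It is not the WWP command itself and does not show how wwp-fix-case-of-GIs is implemented. The list CANONICAL_GIS and the function name fix_case_of_gis are hypothetical; the list holds only the element names from the example above, where a real run would need every element name in the DTD.

```python
import re

# Hypothetical canonical spellings, taken only from the example above.
# A real run would need the full list of element names from the DTD.
CANONICAL_GIS = ["castlist", "castItem", "actor", "persName"]
_BY_LOWER = {gi.lower(): gi for gi in CANONICAL_GIS}

# "<" or "</" followed by an element name (the GI).
_TAG_NAME = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.\-]*)")


def fix_case_of_gis(text):
    """Rewrite each known GI to its canonical case; leave unknown GIs alone."""
    def repl(match):
        slash, name = match.group(1), match.group(2)
        return "<" + slash + _BY_LOWER.get(name.lower(), name)

    return _TAG_NAME.sub(repl, text)


if __name__ == "__main__":
    before = ("<castlist> <castitem><actor><PERSNAME>Madonna</PERSNAME>"
              "</actor></castitem> </castlist>")
    print(fix_case_of_gis(before))
```

The sketch also shows why a missing ">" is risky here: a pattern-based rewrite trusts every "<" it sees to begin a real tag, so a broken tag can cause changes you did not intend.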
Note: the two macros described below (wwp-next-missing-TAGC and wwp-next-unquoted-attr) are temporarily broken in XEmacs, but DO work in Emacs. Syd has already been notified that they are broken.
Pick Next missing TAGC from the WWP menu or do M-x wwp-next-missing-TAGC from the keyboard. If you have any errors, you will be scrolled to the first error, e.g.: <titlepart type="main". Fix this so it is correct (<titlepart type="main">). Hit C-x e to go to the next error of this kind.
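For readers who want to see the idea behind a "missing TAGC" search outside Emacs, here is a minimal Python sketch. It is an assumption-laden illustration, not the WWP macro: the function name find_missing_tagc and the regular expression are hypothetical, and the pattern simply reports any "<" that reaches another "<" (or the end of the text) before a ">" appears. A literal "<" or ">" in running text or inside an attribute value would confuse it.

```python
import re

# A "<" followed by text that hits another "<" (or end of input)
# before any ">" is seen: a tag whose closing delimiter is missing.
_UNCLOSED_TAG = re.compile(r"<[^<>]*(?=<|\Z)")


def find_missing_tagc(text):
    """Return (line_number, snippet) for each tag that lacks its '>'."""
    hits = []
    for match in _UNCLOSED_TAG.finditer(text):
        line_no = text.count("\n", 0, match.start()) + 1
        # Show only the first line of the offending span.
        snippet = match.group(0).splitlines()[0]
        hits.append((line_no, snippet))
    return hits


if __name__ == "__main__":
    sample = '<titlepart type="main"A Title</titlepart>\n<p>ok</p>\n'
    for line_no, snippet in find_missing_tagc(sample):
        print(line_no, snippet)
```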
Pick Next unquoted attribute from the WWP menu or do M-x wwp-next-unquoted-attr from the keyboard. If you have any errors, you will be scrolled to the first error, e.g.: type=main. Fix this so it is correct (type="main"). Hit C-x e to go to the next error of this kind.
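The same kind of check for unquoted attribute values can be sketched in Python. Again, this is only an illustration under stated assumptions, not the WWP macro: the function name find_unquoted_attrs and its patterns are hypothetical. It assumes every tag already has its closing ">", so run a missing-TAGC check first.

```python
import re

# A complete start-tag: "<", a name, then anything up to ">".
_TAG = re.compile(r"<[A-Za-z][^<>]*>")
# A properly quoted attribute value (double or single quotes).
_QUOTED = re.compile(r"\"[^\"]*\"|'[^']*'")
# name=value where the value does not begin with a quote.
_UNQUOTED = re.compile(r"([A-Za-z][\w.\-]*)\s*=\s*([^\s\"'>]+)")


def find_unquoted_attrs(text):
    """Return (line_number, attribute, value) for each unquoted value."""
    hits = []
    for tag in _TAG.finditer(text):
        line_no = text.count("\n", 0, tag.start()) + 1
        # Blank out quoted values so their contents cannot be mistaken
        # for attribute specifications.
        bare = _QUOTED.sub('""', tag.group(0))
        for attr in _UNQUOTED.finditer(bare):
            hits.append((line_no, attr.group(1), attr.group(2)))
    return hits


if __name__ == "__main__":
    sample = '<div type="a">\n<titlepart type=main>x</titlepart>\n'
    for hit in find_unquoted_attrs(sample):
        print(hit)
```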
NOTE: These macros may NOT find any errors -- if you have been using PSGML key bindings correctly to enter elements and attributes, you will probably not have any of these errors. ALSO note: C-x e will not work properly in this context if you execute other macros in between executing the original macro and typing C-x e.