James Thornton |
| Internet Business Consultant |
B.3. Validation
B.3.1. Why Validate Your Document
The LDP uses a number of scripts to distribute your document. These scripts submit your document to the LDP's CVS (a free document version management system), and then they transform your document to other formats that users then read. Your document will also be mirrored on a number of sites worldwide (yet another set of scripts).
In order for these scripts to work correctly, your document must be both "well formed" and use "valid markup". Well formed means your document follows the rules that XML is expecting: it complies with XML grammar rules. Valid markup means you only use elements or tags which are "valid" for your document: XML vocabulary rules are applied.
If your document is not well formed or uses invalid markup, the scripts will not be able to process it. As a result, your revised document will not be distributed.
B.3.2. Validation for the Faint of Heart
Your life is already hard enough without having to install a full set of tools just to see whether your document validates. Instead, you can upload your raw XML files to a web site, go to http://validate.sf.net, enter the URL of your document, and validate it there.
B.3.3. Validation for the Not So Faint Of Heart
B.3.3.1. Catalogs
XML and SGML files contain most of the information you need; however, there are sometimes entities which are specific to SGML in general. For example, &copy; can be used to produce © and &dollar; can be used to produce $. To match these entities to their actual values you need to use a catalog. The role of a catalog is to tell your system where to find the files it is looking for. You may want to think of a catalog as a guide book (or a map) for your tools.
Most distributions (Red Hat/Fedora and Debian at least) have a common location for the main SGML catalog file, called /etc/sgml/catalog. In times past, it could also be found in /usr/lib/sgml/catalog.
The structure of XML catalog files is not the same as SGML catalog files. The section on tailoring a catalog (see Section B.3.4) will give more details about what these files actually contain.
If your system cannot find the catalog file, or you are using custom catalog files, you may need to set the SGML_CATALOG_FILES and XML_CATALOG_FILES environment variables. Using echo $SGML_CATALOG_FILES, check to see if it is currently set. If a blank line is returned, the variable has not been set. Use the same command to see if XML_CATALOG_FILES is set as well. If the variables are not set, use the following example to set them now.
Example B-1. Setting the SGML_CATALOG_FILES and XML_CATALOG_FILES Environment Variables
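A minimal sketch of the check-and-set step for a bash-like shell. The SGML path is the common location named above; /etc/xml/catalog is an assumption (it is the default XML catalog location for libxml2-based tools), so adjust both paths to match your system.

```shell
# Print the current values; an empty line means the variable is not set.
echo "${SGML_CATALOG_FILES:-}"
echo "${XML_CATALOG_FILES:-}"

# Set both variables for the current shell session only.
# /etc/xml/catalog is an assumed location; adjust it for your system.
export SGML_CATALOG_FILES="/etc/sgml/catalog"
export XML_CATALOG_FILES="/etc/xml/catalog"
```

These settings last only until you close the terminal.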
To make this change permanent, you can add the following lines to your ~/.bashrc file.
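The lines for ~/.bashrc might look like the sketch below. It keeps any value that is already set and otherwise falls back to the system catalogs; that fallback behaviour is my own choice rather than something the text above requires, and /etc/xml/catalog is again an assumed location.

```shell
# Lines for ~/.bashrc: keep a value that is already set, otherwise fall back
# to the system-wide catalogs. /etc/xml/catalog is an assumed location.
export SGML_CATALOG_FILES="${SGML_CATALOG_FILES:-/etc/sgml/catalog}"
export XML_CATALOG_FILES="${XML_CATALOG_FILES:-/etc/xml/catalog}"
```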
If you installed XML tools via a Red Hat or Debian package, you probably don't need to do this step. If you are using a custom XML catalog you will definitely need to do this. There is more on custom catalogs in the next section. I keep my custom catalog in a sub-directory of my home directory named "docbook" so that my backup scripts pick it up.
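A sketch of pointing the XML tools at a custom catalog kept in ~/docbook. The file name catalog.xml is hypothetical, and listing the system catalog as a fallback is an assumption about what you want; libxml2-based tools such as xmllint read XML_CATALOG_FILES as a space-separated list.

```shell
# Hypothetical file name; only the "docbook" directory comes from the text.
custom_catalog="$HOME/docbook/catalog.xml"

# The custom catalog is listed first, with the (assumed) system catalog as a
# fallback. Entries are separated by a space, so paths must not contain spaces.
export XML_CATALOG_FILES="$custom_catalog /etc/xml/catalog"
```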
You can also change your .bashrc if you want to save these changes.
If you are adding the changes to your .bashrc you will not see the changes until you open a new terminal window. To make the changes immediate in the current terminal, "source" the configuration file (for example, with source ~/.bashrc).
B.3.4. Creating and modifying catalogs
In the previous section I mentioned a catalog is like a guide book for your tools. Specifically, a catalog maps public identifiers to files on your system. At the top of every DocBook (or indeed every XML) file there is a DOCTYPE which tells the processing tool what kind of document is about to be processed. At a minimum this declaration will include a public identifier, such as -//OASIS//DTD DocBook V4.2//EN. This public identifier has a number of sections, all separated by //. It contains the following information: ISO standard, if any (the leading - means there is no ISO standard in this case); author (OASIS); type of document (DTD DocBook V4.2); language (English). Your DOCTYPE may also include a URL.
A public identifier on its own is useless to a processing tool, as it needs to be able to access the actual DTD. A URL is useless if the processing tool is off-line. To help your processor deal with these problems you can download all of the necessary files and then "map" them for your processing tools by using a catalog.
If you are using SGML processing tools (for instance Jade), you will need an SGML catalog. If you are using XML processing tools (like an XSLT processor), you will need an XML catalog. Information on both is included.
B.3.4.1. SGML Catalogs
Example B-2. Example of an SGML catalog
As in the example above, to associate an identifier to a file just follow the sequence shown:
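A minimal sketch of that sequence, written from the shell so it can be pasted into a terminal. Each catalog entry is a keyword, then the identifier, then the file it maps to. The output file name and the DTD path are hypothetical; point the path at your own copy of the DTD.

```shell
# Write a one-entry SGML catalog. Each entry follows the sequence:
#   KEYWORD  "identifier"  "file the identifier maps to"
# Both the catalog file name and the DTD path are hypothetical.
catalog_file="${catalog_file:-./catalog.example}"
cat > "$catalog_file" <<'EOF'
PUBLIC "-//OASIS//DTD DocBook V4.2//EN" "/usr/share/sgml/docbook/4.2/docbook.dtd"
EOF
```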
B.3.4.1.1. Useful commands for catalogs
The most common mappings to be used in catalogs are:
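As a hedged illustration, the sketch below writes a catalog that uses several standard SGML catalog keywords (OVERRIDE, PUBLIC, SYSTEM, DOCTYPE, DELEGATE, CATALOG). Which keywords count as "most common" is my assumption, and every path and file name here is hypothetical. Relative paths in a catalog are resolved relative to the catalog file itself.

```shell
# Write an SGML catalog that exercises several standard keywords.
# All file names and paths are hypothetical placeholders.
keyword_catalog="${keyword_catalog:-./catalog.keywords.example}"
cat > "$keyword_catalog" <<'EOF'
  -- Comments in an SGML catalog sit between pairs of double hyphens --
  -- OVERRIDE YES: prefer PUBLIC matches over a system identifier in the document --
OVERRIDE YES
  -- PUBLIC: map a public identifier to a file --
PUBLIC "-//OASIS//DTD DocBook V4.2//EN" "docbook-4.2/docbook.dtd"
  -- SYSTEM: map a system identifier (often a URL) to a file --
SYSTEM "http://www.oasis-open.org/docbook/sgml/4.2/docbook.dtd" "docbook-4.2/docbook.dtd"
  -- DOCTYPE: map a document element name to a DTD --
DOCTYPE book "docbook-4.2/docbook.dtd"
  -- DELEGATE: hand public identifiers with this prefix to another catalog --
DELEGATE "-//OASIS" "oasis.cat"
  -- CATALOG: include another catalog file --
CATALOG "/etc/sgml/catalog"
EOF
```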
B.3.4.2. XML Catalogs
The following sample catalog was provided by Martin A. Brown.
Example B-3. Sample XML Catalog file
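The sketch below is not the sample credited above; it is a minimal stand-in that shows the general shape of an OASIS XML catalog, written from the shell. The output file name and the local DTD location are hypothetical.

```shell
# Write a minimal OASIS XML catalog. Illustrative only: the uri= values
# point at a hypothetical local copy of the DocBook XML 4.2 DTD.
xml_catalog="${xml_catalog:-./catalog.xml.example}"
cat > "$xml_catalog" <<'EOF'
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Map the public identifier to a local copy of the DTD -->
  <public publicId="-//OASIS//DTD DocBook XML V4.2//EN"
          uri="file:///usr/share/xml/docbook/4.2/docbookx.dtd"/>
  <!-- Map the DTD's URL (system identifier) to the same local file -->
  <system systemId="http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
          uri="file:///usr/share/xml/docbook/4.2/docbookx.dtd"/>
</catalog>
EOF
```

To have your tools use a catalog like this, list its path in XML_CATALOG_FILES.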
B.3.5. Validating XML
B.3.5.1. nsgmls
You can use nsgmls, which is part of the jade suite, to validate SGML or XML documents. On Debian, use apt-get to install the docbook-utils package (see Section B.4.2).
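A sketch of such a call for an XML file, wrapped in a small helper. The function name is my own, and the location of xml.dcl is an assumption that varies between distributions, so check where the file lives on your system. For an SGML document you would leave out the xml.dcl argument.

```shell
# Assumed location of the XML declaration; override XML_DCL if yours differs.
XML_DCL="${XML_DCL:-/usr/share/sgml/declaration/xml.dcl}"

# Validate one XML file with nsgmls. -s suppresses the normal output,
# so only error messages are printed.
validate_with_nsgmls() {
    nsgmls -s "$XML_DCL" "$1"
}
```

For example, validate_with_nsgmls mydoc.xml, where mydoc.xml is a placeholder for your own file.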
If there are no issues, you'll just get your command prompt back. The -s tells nsgmls to show only the errors.
For more information on processing files with Jade/OpenJade please read DocBook XML/SGML Processing Using OpenJade.
B.3.5.2. onsgmls
This is an alternative to nsgmls. It ships with the OpenJade package. This program gives more options than nsgmls and allows you to quietly ignore a number of problems that arise while trying to validate an XML file (as opposed to an SGML file). This also means you don't have to type out the location of your xml.dcl file each time. I was able to simply use the following to validate a file with only error messages that were related to my markup errors.
According to Bob Stayton you can also turn off specific error messages. The following example turns off XML-specific error messages.
B.3.5.3. xmllint
You can also use the xmllint command-line tool from the libxml2 package to validate your documents. This tool checks that your tags are complete and that every tag that is opened is also closed. By default xmllint prints the parsed document (the result tree), so if your document is printed all the way through to the last line, you know there are no serious errors such as mismatched tags or missing opening and closing tags. To prevent printing the entire document to your screen, add the --noout parameter.
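A minimal sketch of this check as a helper function. The function name is my own; --noout is the xmllint option described above.

```shell
# Check that a document is well formed. --noout suppresses the result tree,
# so the command prints nothing unless it finds a problem.
check_wellformed() {
    xmllint --noout "$1"
}
```

For example, check_wellformed mydoc.xml, where mydoc.xml is a placeholder for your own file.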
If nothing is returned, your document contains no syntax errors. Otherwise, start with the first error that was reported. Fix that one error, and run the tool again on your document. If it still returns output, again fix the first error that you see; don't bother with the rest, since further errors are usually caused by the first one. If you would like to check your document for any errors which are specific to your Document Type Definition, add --valid.
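A sketch of both steps as helper functions; the function names are my own. The second helper shows only the first reported problem, assuming xmllint's usual layout of about three lines per error (the message, the offending line, and a caret marker), which matches the fix-one-error-at-a-time approach described above.

```shell
# Validate against the DTD named in the document's DOCTYPE.
# Prints nothing when the document is valid.
validate_against_dtd() {
    xmllint --noout --valid "$1"
}

# Show only the first reported problem. xmllint writes errors to stderr,
# so stderr is merged into the pipe before trimming the output.
# Assumes roughly three lines per error message.
first_error() {
    xmllint --noout --valid "$1" 2>&1 | head -n 3
}
```

Because first_error ends in a pipe, its exit status comes from head, so use validate_against_dtd when you need to test success or failure in a script.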
The xmllint tool may also be used for checking errors in the XML catalogs; see the man pages for more information on how to set this behavior.
If you are a Mac OS X or Windows user, you may also want to check out tkxmllint, a GUI version of xmllint. More information is available from: http://tclxml.sourceforge.net/tkxmllint.html.
Example B-4. Debugging example using xmllint
The example below shows how you can use xmllint to check your documents. I've included some errors that I often made as a beginning XML writer. At first, the document doesn't come through, and errors are shown:
Now, as we already mentioned, don't worry about anything except the first error. The first error says there is an inconsistency between the tags on line 6 and line 22 in the file. Indeed, on line 6 we left out the "e" in "articleinfo". Fix the error, and run xmllint again.
The first complaint now is about the offending line 37, where the closing tag for list items has been forgotten. Fix the error and run the validation tool again, until all errors are gone.
The most common errors include forgetting to open or close the paragraph tag, spelling errors in tags, and messed-up sections.