XML-hul Module

1 Introduction

The XML-hul module recognizes and validates the XML (Extensible Markup Language) format [XML].

The module is invoked with the following command-line options:

jhove ... -m XML-hul [-x sax-class] ...

The XML-hul module can use any XML parser that conforms to the SAX2 interfaces. Note that if the SAX2 optional LexicalHandler interface isn't supported by the parser, JHOVE will only be able to report a restricted set of representation information.
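
As an illustration of the LexicalHandler caveat above, the following minimal sketch probes whether a SAX2 parser accepts a LexicalHandler. It is not taken from the JHOVE source; the class and method names (LexicalHandlerCheck, supportsLexicalHandler) are hypothetical. Only the property identifier is the standard SAX2 one.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper, not part of JHOVE.
class LexicalHandlerCheck {
    // Standard SAX2 property id under which a LexicalHandler is registered.
    static final String LEXICAL_HANDLER =
        "http://xml.org/sax/properties/lexical-handler";

    /** Returns true if the reader accepts a LexicalHandler, false otherwise. */
    static boolean supportsLexicalHandler(XMLReader reader) {
        try {
            // DefaultHandler2 is a no-op implementation of LexicalHandler.
            reader.setProperty(LEXICAL_HANDLER, new DefaultHandler2());
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // The parser does not know the property or refuses the value.
            return false;
        }
    }

    /** Convenience wrapper that probes the default parser of the running JRE. */
    static boolean defaultParserSupportsLexicalHandler() {
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            return supportsLexicalHandler(reader);
        } catch (Exception e) {
            throw new IllegalStateException("no default SAX parser available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("LexicalHandler supported: "
            + defaultParserSupportsLexicalHandler());
    }
}
```

A caller could run such a probe once at start-up and fall back to the restricted set of representation information when it returns false.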

The actual parser used is either:

  1. The parser specified by the -x sax-class command line option (whose class file must be found on the CLASSPATH at the time of execution);
  2. The value of the edu.harvard.hul.ois.jhove.saxClass property in the properties file ${user.home}/jhove/jhove.properties, where ${user.home} is the standard Java user.home property; or
  3. The default parser of the J2SE 1.4 Java Runtime Environment (JRE).

For example, if you are using Xerces, you will have to specify the parser class as org.apache.xerces.parsers.SAXParser.
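
The three parser sources listed above can be sketched as a simple fallback chain. This is a hedged illustration only, assuming the list is in order of precedence; it is not the JHOVE implementation, and the names SaxReaderResolver, chooseClassName, and createReader are hypothetical. The two string arguments stand in for the -x value and the saxClass property value, however the caller obtained them.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical helper, not part of JHOVE.
class SaxReaderResolver {
    /**
     * Picks the SAX parser class name: the command-line value first, then
     * the properties-file value, else null (meaning "use the JRE default").
     */
    static String chooseClassName(String fromCommandLine, String fromProperties) {
        if (fromCommandLine != null && !fromCommandLine.isEmpty()) {
            return fromCommandLine;
        }
        if (fromProperties != null && !fromProperties.isEmpty()) {
            return fromProperties;
        }
        return null;
    }

    /** Instantiates the named XMLReader, or the JRE default when className is null. */
    static XMLReader createReader(String className) {
        try {
            if (className == null) {
                SAXParserFactory factory = SAXParserFactory.newInstance();
                factory.setNamespaceAware(true);
                return factory.newSAXParser().getXMLReader();
            }
            // The class must be on the CLASSPATH and expose a no-argument
            // constructor, e.g. org.apache.xerces.parsers.SAXParser.
            return (XMLReader) Class.forName(className)
                .getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            throw new IllegalArgumentException(
                "cannot create SAX parser: " + className, e);
        }
    }

    public static void main(String[] args) {
        String fromCommandLine = args.length > 0 ? args[0] : null;
        XMLReader reader = createReader(chooseClassName(fromCommandLine, null));
        System.out.println("Using parser: " + reader.getClass().getName());
    }
}
```

Passing a class name that is not on the CLASSPATH makes createReader fail with an exception rather than silently falling back, which keeps a misconfigured -x value visible.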

Testing indicates that the default Crimson parser included in Sun's J2SE JRE does not support validation by XML Schema. The Apache Xerces parser does validate against Schemas; other commercial and open source parsers may also do so. (JSTOR and the Harvard University Library do not endorse or recommend the use of any particular XML parser; the preceding discussion is provided solely for informational purposes.)

2 Coverage

The XML-hul module recognizes and validates the following public profiles:

3 Well-Formedness

JHOVE uses the criteria for XML well-formedness defined by [XML].

4 Validity

JHOVE uses the criteria for XML validity defined by [XML].

Note that the concept of validity applies only to XML files that explicitly reference a DTD or XML Schema. JHOVE determines whether either condition is met and, if so, automatically invokes the SAX2 parser in validating mode. Otherwise, the parser is invoked in a mode that checks only for well-formedness.
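
One way to make such a decision is a non-validating first pass that looks for a DOCTYPE declaration or an XML Schema location hint. The sketch below is an assumption about how this could be done, not a description of JHOVE's actual logic; ValidationModeProbe, Probe, needsValidation, and createReader are hypothetical names. It treats the xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes as the schema reference.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper, not part of JHOVE.
class ValidationModeProbe {
    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";
    static final String LEXICAL_HANDLER =
        "http://xml.org/sax/properties/lexical-handler";

    /** Records whether a DOCTYPE declaration or a schema location hint was seen. */
    static class Probe extends DefaultHandler2 {
        boolean sawDtd;
        boolean sawSchema;

        @Override
        public void startDTD(String name, String publicId, String systemId) {
            sawDtd = true;  // reported through the LexicalHandler interface
        }

        @Override
        public void startElement(String uri, String local, String qName,
                                 Attributes atts) {
            if (atts.getValue(XSI, "schemaLocation") != null
                    || atts.getValue(XSI, "noNamespaceSchemaLocation") != null) {
                sawSchema = true;
            }
        }
    }

    /** Creates a namespace-aware reader, validating or not as requested. */
    static XMLReader createReader(boolean validating) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(validating);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("no SAX parser available", e);
        }
    }

    /** Returns true if the document references a DTD or an XML Schema. */
    static boolean needsValidation(String xml) {
        try {
            XMLReader reader = createReader(false);  // well-formedness pass only
            Probe probe = new Probe();
            reader.setContentHandler(probe);
            reader.setProperty(LEXICAL_HANDLER, probe);
            reader.parse(new InputSource(new StringReader(xml)));
            return probe.sawDtd || probe.sawSchema;
        } catch (Exception e) {
            throw new IllegalArgumentException("document could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String plain = "<root/>";
        String withDtd = "<!DOCTYPE root [<!ELEMENT root EMPTY>]><root/>";
        System.out.println(needsValidation(plain));
        System.out.println(needsValidation(withDtd));
    }
}
```

A second pass would then call createReader(needsValidation(xml)) to obtain a parser in the appropriate mode. Because the probe relies on startDTD, it cannot detect DTD references with a parser that lacks LexicalHandler support.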

5 Representation Information

The MIME type is reported as: application/xml

In addition to the standard JHOVE representation information, the following XML-specific properties are reported:

Note that the notations and entities reported above are only those that appear in the XML file, not all of those that are defined in the DTD or XML Schema associated with the file.

6 Additional Module Properties

Copyright 2004-2005 by JSTOR and the President and Fellows of Harvard College. Used by permission.
Last updated 2005-05-09