XML-hul Module
1 Introduction
The XML-hul module recognizes and validates the XML (Extensible Markup Language) format [XML].
The module is invoked by the -m XML-hul command line option:
jhove ... -m XML-hul [-x sax-class] ...
The XML-hul module can use any XML parser that conforms to the
SAX2 interfaces.
Note that if the parser does not support the optional SAX2 LexicalHandler interface, JHOVE will be able to report only a restricted set of representation information.
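In SAX2 a LexicalHandler is not installed through a setter method; it is registered as the property http://xml.org/sax/properties/lexical-handler, and a parser that lacks the interface throws SAXNotRecognizedException or SAXNotSupportedException. The sketch below probes for that support and, as an illustration, counts comments, which a parser can report only through this interface. The class and method names are hypothetical and are not taken from JHOVE.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class LexicalHandlerProbe {
    // Standard SAX2 property URI used to register a LexicalHandler.
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";

    /** Registers the handler; returns false if the parser lacks the optional interface. */
    static boolean tryRegister(XMLReader reader, DefaultHandler2 handler) {
        try {
            reader.setProperty(LEXICAL_HANDLER, handler);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false;
        }
    }

    /** Counts comments in the document, or returns -1 if the parser cannot report them. */
    static int countComments(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            int[] count = {0};
            DefaultHandler2 handler = new DefaultHandler2() {
                @Override
                public void comment(char[] ch, int start, int length) {
                    count[0]++; // delivered only through LexicalHandler
                }
            };
            reader.setContentHandler(handler);
            if (!tryRegister(reader, handler)) {
                return -1; // comments are invisible to ContentHandler alone
            }
            reader.parse(new InputSource(new StringReader(xml)));
            return count[0];
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```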
The actual parser used is either:
- The parser specified by the -x sax-class command line option (whose class file must be found on the CLASSPATH at the time of execution);
- The value of the edu.harvard.hul.ois.jhove.saxClass property in the properties file ${user.home}/jhove/jhove.properties, where ${user.home} is the standard Java user.home property; or
- The default parser of the J2SE 1.4 Java Runtime Environment (JRE).
For example, if you are using Xerces, you will have to specify the parser class as org.apache.xerces.parsers.SAXParser.
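The three parser sources listed above can be resolved as in the following sketch, which assumes they are consulted in the order listed (the -x option first, then the properties file, then the JRE default). SaxClassResolver and its methods are hypothetical names used for illustration; JHOVE's own lookup code may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

class SaxClassResolver {
    static final String SAX_CLASS_KEY = "edu.harvard.hul.ois.jhove.saxClass";

    /** Returns the SAX class name to use, or null meaning "use the JRE default". */
    static String resolve(String xOption, Properties props) {
        if (xOption != null && !xOption.isEmpty()) {
            return xOption;                         // 1. -x sax-class
        }
        String fromFile = (props == null) ? null : props.getProperty(SAX_CLASS_KEY);
        if (fromFile != null && !fromFile.trim().isEmpty()) {
            return fromFile.trim();                 // 2. jhove.properties entry
        }
        return null;                                // 3. JRE default parser
    }

    /** Reads ${user.home}/jhove/jhove.properties; returns empty Properties if absent. */
    static Properties loadUserProperties() throws IOException {
        Properties props = new Properties();
        Path file = Paths.get(System.getProperty("user.home"), "jhove", "jhove.properties");
        if (Files.isRegularFile(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            }
        }
        return props;
    }

    /** Instantiates the named XMLReader (must be on the CLASSPATH) or the JRE default. */
    static XMLReader createReader(String saxClass) {
        try {
            if (saxClass == null) {
                return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            }
            return (XMLReader) Class.forName(saxClass).getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            throw new IllegalStateException("cannot create SAX parser " + saxClass, e);
        }
    }
}
```

Under this sketch, selecting Xerces through the properties file corresponds to the single line edu.harvard.hul.ois.jhove.saxClass=org.apache.xerces.parsers.SAXParser in jhove.properties, the key=value form that java.util.Properties.load accepts.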
Note that testing indicates that the default Crimson parser included in Sun's J2SE JRE does not support validation by XML Schema.
The Apache Xerces parser does validate against Schemas; other commercial and open source parsers may also validate against Schemas.
(JSTOR and the Harvard University Library do not endorse or recommend
the use of any particular XML parser; the previous discussion is provided
solely for information purposes.)
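Whether a given parser accepts XML Schema validation can be probed by switching on SAX2 features. The standard feature http://xml.org/sax/features/validation turns validation on, but SAX2 defines no standard feature for schema validation, so the sketch assumes the Xerces-specific feature http://apache.org/xml/features/validation/schema, which other parsers may not recognize. A parser accepting the feature is a hint, not proof, that it really validates against Schemas; the class name is hypothetical.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

class SchemaValidationProbe {
    // Standard SAX2 feature: report validity errors.
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    // Xerces-specific feature: validate against W3C XML Schemas.
    static final String XERCES_SCHEMA = "http://apache.org/xml/features/validation/schema";

    /** Tries to turn a feature on; returns false if the parser rejects it. */
    static boolean enable(XMLReader reader, String feature) {
        try {
            reader.setFeature(feature, true);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false;
        }
    }

    /** A fresh, namespace-aware reader from the JRE's default SAX implementation. */
    static XMLReader defaultReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // schema validation needs namespace processing
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Requests validation plus the Xerces schema feature; note this mutates the reader. */
    static boolean acceptsSchemaValidation(XMLReader reader) {
        return enable(reader, VALIDATION) && enable(reader, XERCES_SCHEMA);
    }
}
```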
2 Coverage
The XML-hul module recognizes and validates the following public profiles:
3 Well-Formedness
JHOVE uses the criteria for XML well-formedness defined by [XML].
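A well-formedness check in the sense of [XML] amounts to a non-validating parse that finishes without a fatal error. A minimal sketch using the JRE's default SAX parser follows; the class name is hypothetical and this is not JHOVE's implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class WellFormednessCheck {
    /** True if a non-validating SAX parse completes without a fatal error. */
    static boolean isWellFormed(String xml) {
        XMLReader reader;
        try {
            reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        // DefaultHandler ignores content, ignores recoverable errors,
        // and rethrows fatal errors (it also keeps them off stderr).
        DefaultHandler quiet = new DefaultHandler();
        reader.setContentHandler(quiet);
        reader.setErrorHandler(quiet);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // fatal error: the document is not well-formed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```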
4 Validity
JHOVE uses the criteria for XML validity defined by [XML].
Note that the concept of validity applies only to XML files that explicitly reference a DTD or XML Schema.
JHOVE can determine whether either of these conditions is met and, if so, it automatically invokes the SAX2 parser in validating mode.
Otherwise, the parser is invoked in a manner that only checks for well-formedness.
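The two-stage behavior described above can be sketched as a first, non-validating pass that looks for a DOCTYPE declaration (reported through LexicalHandler.startDTD) or an xsi:schemaLocation or xsi:noNamespaceSchemaLocation attribute, followed by a second pass with validation switched on only if one was found. This is an assumed design for illustration, not JHOVE's actual code; the schema feature URI is Xerces-specific, and the class, enum, and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class ValidationModeSketch {
    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    static final String XERCES_SCHEMA = "http://apache.org/xml/features/validation/schema"; // Xerces-specific

    enum Reference { NONE, DTD, SCHEMA }
    enum Status { NOT_WELL_FORMED, WELL_FORMED, VALID, INVALID }

    static XMLReader newReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed to match xsi:* attributes by namespace URI
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Pass 1 (non-validating): look for a DOCTYPE or an xsi schema-location attribute. */
    static Reference detectReference(String xml) throws SAXException, IOException {
        Reference[] found = {Reference.NONE};
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startDTD(String name, String publicId, String systemId) {
                found[0] = Reference.DTD; // a DOCTYPE declaration was seen
            }

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (atts.getValue(XSI, "schemaLocation") != null
                        || atts.getValue(XSI, "noNamespaceSchemaLocation") != null) {
                    found[0] = Reference.SCHEMA; // if both appear, this sketch reports SCHEMA
                }
            }
        };
        XMLReader reader = newReader();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler); // DefaultHandler2 rethrows fatal errors
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return found[0];
    }

    /** Pass 2: validate only when pass 1 found a DTD or schema reference. */
    static Status check(String xml) {
        try {
            Reference ref;
            try {
                ref = detectReference(xml);
            } catch (SAXParseException e) {
                return Status.NOT_WELL_FORMED;
            }
            if (ref == Reference.NONE) {
                return Status.WELL_FORMED; // nothing to validate against
            }
            XMLReader reader = newReader();
            reader.setFeature(VALIDATION, true);
            if (ref == Reference.SCHEMA) {
                reader.setFeature(XERCES_SCHEMA, true);
            }
            reader.setErrorHandler(new DefaultHandler2() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // validity errors are non-fatal in SAX; stop at the first one
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return Status.VALID;
        } catch (SAXParseException e) {
            return Status.INVALID;
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```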
5 Representation Information
The MIME type is reported as: application/xml
In addition to the standard JHOVE
representation information, the following
XML-specific properties are reported:
- Property "XMLMetadata" of type PROPERTY and arity LIST
  - Property "Version" of type STRING and arity SCALAR
  - Property "Encoding" of type STRING and arity SCALAR
  - Property "Standalone" of type BOOLEAN and arity SCALAR
  - Property "DTD" of type PROPERTY and arity LIST (if a DTD is specified)
    - Property "PublicID" of type STRING and arity SCALAR
    - Property "SystemID" of type STRING and arity SCALAR
    - Property "InternalSubset" of type BOOLEAN and arity SCALAR
  - Property "Schemas" of type PROPERTY and arity LIST (if an XML Schema is specified)
    - Property "Schema" of type PROPERTY and arity ARRAY
      - Property "NamespaceURI" of type STRING and arity SCALAR
      - Property "SchemaLocation" of type STRING and arity SCALAR
  - Property "Root" of type STRING and arity SCALAR
  - Property "Namespaces" of type PROPERTY and arity LIST
    - Property "Namespace" of type PROPERTY and arity ARRAY
      - Property "Prefix" of type STRING and arity SCALAR
      - Property "URI" of type STRING and arity SCALAR
  - Property "Notations" of type PROPERTY and arity LIST (if there are any)
    - Property "Notation" of type PROPERTY and arity SCALAR
      - Property "Name" of type STRING and arity SCALAR
      - Property "PublicID" of type STRING and arity SCALAR (if non-null)
      - Property "SystemID" of type STRING and arity SCALAR (if non-null)
  - Property "CharacterReferences" of type STRING and arity LIST
  - Property "Entities" of type PROPERTY and arity LIST (if there are any)
    - Property "Entity" of type PROPERTY and arity SCALAR
      - Property "Name" of type STRING and arity SCALAR
      - Property "Type" of type STRING and arity SCALAR; its value must be "Internal", "External parsed", or "External unparsed"
      - Property "Value" of type STRING and arity SCALAR (if internal)
      - Property "PublicID" of type STRING and arity SCALAR (if external and non-null)
      - Property "SystemID" of type STRING and arity SCALAR (if external and non-null)
      - Property "Notation" of type STRING and arity SCALAR (if unparsed)
  - Property "ProcessingInstructions" of type PROPERTY and arity LIST (if there are any)
    - Property "ProcessingInstruction" of type PROPERTY and arity SCALAR
      - Property "Target" of type STRING and arity SCALAR
      - Property "Data" of type STRING and arity SCALAR
  - Property "Comments" of type PROPERTY and arity LIST (if there are any)
    - Property "Comment" of type STRING and arity SCALAR
Note that the notations and entities reported above are only those that appear
in the XML file, not all of those that are defined in the DTD or XML Schema
associated with the file.
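Several of the properties listed above map directly onto SAX2 callbacks. The sketch below collects values corresponding to Root, Namespaces (Prefix and URI), ProcessingInstructions (Target and Data), and Comments; the last of these requires the optional LexicalHandler interface. The class and field names are hypothetical, and keeping only the first URI bound to each prefix is an assumption of the sketch rather than documented JHOVE behavior.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class XmlMetadataCollector extends DefaultHandler2 {
    String root;                                                      // ~ "Root"
    final Map<String, String> namespaces = new LinkedHashMap<>();     // ~ "Namespaces": Prefix -> URI
    final List<String[]> processingInstructions = new ArrayList<>();  // ~ {Target, Data}
    final List<String> comments = new ArrayList<>();                  // ~ "Comments"

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        namespaces.putIfAbsent(prefix, uri); // default namespace arrives with prefix ""
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (root == null) {
            // The first element reported is the document (root) element.
            root = qName.isEmpty() ? localName : qName;
        }
    }

    @Override
    public void processingInstruction(String target, String data) {
        processingInstructions.add(new String[] {target, data});
    }

    @Override
    public void comment(char[] ch, int start, int length) {
        comments.add(new String(ch, start, length));
    }

    static XmlMetadataCollector collect(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // required for startPrefixMapping events
            XMLReader reader = factory.newSAXParser().getXMLReader();
            XmlMetadataCollector collector = new XmlMetadataCollector();
            reader.setContentHandler(collector);
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", collector);
            reader.parse(new InputSource(new StringReader(xml)));
            return collector;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

The XML declaration (<?xml version="1.0"?>) is not reported to processingInstruction by SAX, so it does not appear among the collected processing instructions; its version and encoding correspond to the separate Version and Encoding properties.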
6 Additional Module Properties
- Nominal file extension: .xml
Copyright 2004-2005 by JSTOR and the President and Fellows of Harvard College. Used by permission.
Last updated 2005-05-09