Selecting an XML parser (draft, 2007-03-30)

The default XML parser provided with Java 1.4 does not support schema validation. For this or other reasons, you may want to use a different XML parser with JHOVE. Xerces2 is a widely used parser that provides schema validation; any parser that conforms to Java's XML API may also be used. To use an alternative XML parser with JHOVE, take the following steps:

1. Obtain the necessary jar files for the parser. In the case of Xerces, this will be xercesImpl.jar.

2. Add the jar file or files to the classpath that will be used when you invoke JHOVE. If you are invoking jhove (Unix/Linux) or jhove.bat (Windows) from the command line, add the path to the parser jar file to the EXTRA_JARS variable in that script.

If you have been invoking JhoveViewer.jar by double-clicking it, you will need to create a command-line script to run it with a third-party XML parser. The simplest way to do this is to make a modified copy of jhove or jhove.bat and change the line which runs JHOVE (the last line of the file) to:
${JAVA} -classpath $CP JhoveView   (Unix/Linux)
%JAVA% -classpath %CP% JhoveView   (Windows)
Then add the parser jar file to EXTRA_JARS in the copied script, just as for command-line JHOVE.

If you are currently invoking JHOVE with a custom command-line script that uses the java -jar option, any classpath you specify will be ignored, because java -jar takes the classpath only from the jar file's manifest. Instead, put JhoveApp.jar or JhoveViewer.jar into your classpath and invoke the main class, Jhove or JhoveView. For example:
java -classpath bin/xercesImpl.jar:bin/JhoveApp.jar Jhove [parameters]
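To confirm that the parser jar really is on the classpath you have built, a small stand-alone check can help. The following is only a sketch: ParserLocator and its locate method are hypothetical names and are not part of JHOVE. It tries to load a class by name and reports where the Java runtime found it.

```java
import java.security.CodeSource;

// Hypothetical helper for illustration; not part of JHOVE.
class ParserLocator {

    /** Returns a description of where the class was loaded from,
     *  or null if the class cannot be found on the classpath. */
    static String locate(String className) {
        try {
            Class<?> c = Class.forName(className);
            CodeSource source = c.getProtectionDomain().getCodeSource();
            // Classes that ship with the Java runtime usually have no code source.
            return source == null
                    ? "(bundled with the Java runtime)"
                    : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Defaults to the usual Xerces parser class; pass another name as the first argument.
        String name = args.length > 0 ? args[0] : "org.apache.xerces.parsers.SAXParser";
        String where = locate(name);
        System.out.println(where == null
                ? name + " is NOT on the classpath"
                : name + " loaded from " + where);
    }
}
```

Run it with the same -classpath value that you intend to use for JHOVE (plus the directory holding ParserLocator.class). If the class is reported as missing, the jar path in the classpath is probably wrong.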

3. Specify the parser class in your command line or configuration file. The class you specify must implement the org.xml.sax.XMLReader interface. If you're using Xerces and not creating your own parser class, this would usually be org.apache.xerces.parsers.SAXParser.

In the JHOVE command line, you would specify this as
  -x [sax-class]

You can do this permanently by setting the value of $ARGS in the jhove or jhove.bat script file. Ignore the warning in the script against editing below the line.
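Before settling on a class name for the -x option, you may want to confirm that the class is a SAX parser in the required sense. The sketch below assumes nothing about JHOVE itself; SaxClassCheck and isXmlReader are hypothetical names. It loads the named class and tests whether it implements org.xml.sax.XMLReader.

```java
import org.xml.sax.XMLReader;

// Hypothetical helper for illustration; not part of JHOVE.
class SaxClassCheck {

    /** Returns true if the named class can be loaded and implements XMLReader. */
    static boolean isXmlReader(String className) {
        try {
            Class<?> c = Class.forName(className);
            return XMLReader.class.isAssignableFrom(c);
        } catch (ClassNotFoundException e) {
            // Not on the classpath, so it cannot be used as the parser.
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.xerces.parsers.SAXParser";
        System.out.println(name + (isXmlReader(name)
                ? " implements org.xml.sax.XMLReader"
                : " is missing or does not implement org.xml.sax.XMLReader"));
    }
}
```

As with any classpath check, run it with the classpath that JHOVE will use, so that the result reflects what JHOVE will see.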

Alternatively, you can specify the name of the parser in the Java properties used by the application. The name of the property is edu.harvard.hul.ois.jhove.saxClass. This can be specified in the command line, e.g.,
  java -Dedu.harvard.hul.ois.jhove.saxClass=org.apache.xerces.parsers.SAXParser ...
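To illustrate how a property of this kind is typically consumed, here is a minimal sketch of a Java program that reads edu.harvard.hul.ois.jhove.saxClass and creates the named parser, falling back to the runtime's default parser when the property is not set. This is an assumption about the general pattern, not a copy of JHOVE's own code, which may differ; SaxClassProperty and newReader are hypothetical names.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical sketch; JHOVE's actual lookup logic may differ.
class SaxClassProperty {

    static final String PROPERTY = "edu.harvard.hul.ois.jhove.saxClass";

    /** Creates the parser named by the property, or the default parser if it is unset. */
    static XMLReader newReader() {
        String className = System.getProperty(PROPERTY);
        try {
            if (className == null || className.isEmpty()) {
                // No property given: use whatever parser the Java runtime provides.
                return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            }
            // The class must have a public no-argument constructor and implement XMLReader.
            Object parser = Class.forName(className).getDeclaredConstructor().newInstance();
            return (XMLReader) parser;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot create SAX parser: " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Using parser: " + newReader().getClass().getName());
    }
}
```

Running this with and without the -D option shown above prints the parser class selected in each case, which is a quick way to see whether the property and the classpath agree.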

Since the viewer does not recognize JHOVE command line parameters, you must use a property to specify the parser when running the viewer.

Copyright 2007 by JSTOR and the President and Fellows of Harvard College. Used by permission.
Last updated 2007-04-02