Home | Tutorial | Documentation | Distribution | Links |
JHOVE can be downloaded from the JHOVE SourceForge site.
JHOVE should be usable on any Unix, Windows, or OS X platform with the appropriate J2SE installation.
JHOVE is distributed as GZIP'ed TAR and ZIP files under the terms of the GNU Lesser General Public License (LGPL).
Modules for additional file formats have been written by authors outside of the Harvard University Library (HUL). These are available for downloading. Please note that these modules are not supported by HUL.
The files of the standard HUL-supported distribution packages can be unpacked by:
gunzip jhove-1_1.tar.gz
tar xvf jhove-1_1.tar
gunzip jhove-examples.tar.gz
tar xvf jhove-examples.tar
or
unzip jhove-1_1.zip
unzip jhove-example.zip
which will produce the following directory structure:
jhove/
  COPYING                      # GNU Lesser General Public License
  LICENSE                      # JHOVE license information
  README
  RELEASENOTES                 # JHOVE release notes
  bin/
    jhove.jar
    jhove-handler.jar
    jhove-module.jar
    JhoveApp.jar
    JhoveView.jar
  build.xml
  classes/
    build.xml
    edu/harvard/hul/ois/jhove/
      build.xml
      *.class
      *.java
      handler/
        build.xml
        *.class
        *.java
        META-INF/
          MANIFEST.MF
        audit/
          build.xml
          *.class
          *.java
      META-INF/
        MANIFEST.MF
      module/
        build.xml
        *.class
        *.java
        META-INF/
          MANIFEST.MF
        aiff/
          build.xml
          *.class
          *.java
        gif/
          build.xml
          *.class
          *.java
        html/
          build.xml
          *.class
          *.java
          xhtml-lat1.ent
          xhtml-special.ent
          xhtml-symbol.ent
          xhtml1-frameset.dtd
          xhtml1-strict.dtd
          xhtml1-transitional.dtd
          xhtml11-flat.dtd
        iff/
          build.xml
          *.class
          *.java
        jpeg/
          build.xml
          *.class
          *.java
        jpeg2000/
          build.xml
          *.class
          *.java
        pdf/
          build.xml
          *.class
          *.java
        tiff/
          build.xml
          *.class
          *.java
        wave/
          build.xml
          *.class
          *.java
        xml/
          build.xml
          *.class
          *.java
      viewer/
        build.xml
        *.class
        *.java
        META-INF/
          MANIFEST.MF
    Jhove.*
    META-INF/
      MANIFEST.MF
    ADump.*
    GDump.*
    JDump.*
    J2Dump.*
    PDump.*
    TDump.*
    WDump.*
  conf/
    jhove.conf
  doc/
    *.html
    ...
  examples/
    ascii/...
    gif/ ...
    pdf/ ...
    tiff/ ...
    utf-8/...
  adump
  adump.bat
  j2dump
  j2dump.bat
  jdump
  jdump.bat
  jhove
  jhove.bat
  pdump
  pdump.bat
  tdump
  tdump.bat
  wdump
  wdump.bat
After unpacking the distribution, edit the configuration file, jhove/conf/jhove.conf, and set the <jhoveHome> element to the absolute pathname of the JHOVE installation, or home, directory and the <tempDirectory> element to the temporary directory (in which temporary files are created):
<jhoveHome>jhove-installation-directory</jhoveHome>
<tempDirectory>temporary-directory</tempDirectory>
The JHOVE home directory is the top-most directory in the distribution TAR or ZIP file. On Unix systems, /var/tmp is an appropriate temporary directory; on Windows, C:\Temp. For example, if the distribution TAR file is unpacked on a Unix system in the directory "/users/sampleuser/projects", then the configuration file should read:
<jhoveHome>/users/sampleuser/projects/jhove</jhoveHome>
<tempDirectory>/var/tmp</tempDirectory>
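The same edit can be scripted. The following is a minimal sketch, assuming a POSIX shell with sed; the function name set_jhove_conf is hypothetical and is not part of the JHOVE distribution. It replaces the contents of the <jhoveHome> and <tempDirectory> elements, assuming neither element is split across lines.

```shell
# set_jhove_conf CONF HOME_DIR TEMP_DIR
# Hypothetical helper (not shipped with JHOVE): rewrites the contents of the
# <jhoveHome> and <tempDirectory> elements of the file CONF.
# Assumptions: neither element is split across lines, and the pathnames
# contain no '|', '&', or backslash characters (all special to sed here).
set_jhove_conf() {
    conf=$1 home=$2 temp=$3
    tmp="$conf.tmp.$$"
    # Write to a temporary file first, then replace the original on success.
    sed -e "s|<jhoveHome>[^<]*</jhoveHome>|<jhoveHome>$home</jhoveHome>|" \
        -e "s|<tempDirectory>[^<]*</tempDirectory>|<tempDirectory>$temp</tempDirectory>|" \
        "$conf" > "$tmp" && mv "$tmp" "$conf"
}
```

For the example installation above, the call would be: set_jhove_conf jhove/conf/jhove.conf /users/sampleuser/projects/jhove /var/tmp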
In the JHOVE home directory, edit the JHOVE Bourne shell driver script, jhove, and set the JHOVE home directory, Java home directory, and Java interpreter:
JHOVE_HOME=jhove-home-directory
JAVA_HOME=java-home-directory
JAVA=java-interpreter
where JHOVE_HOME is set to specify the absolute pathname of the JHOVE home directory; JAVA_HOME is set to specify the absolute pathname of the Java home directory; and JAVA is set to specify the absolute pathname of the Java interpreter. For example:
JHOVE_HOME=/users/sampleuser/projects/jhove
JAVA_HOME=/usr/local/j2re1.4.1_02
JAVA=$JAVA_HOME/bin/java
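Before running the driver script it may help to confirm that these values point at real locations. The following is a sketch only, assuming a POSIX shell; check_jhove_env is a hypothetical name, and the checks (a bin/ directory, conf/jhove.conf, and an executable interpreter) are taken from the directory structure and settings described above rather than from anything JHOVE itself enforces.

```shell
# check_jhove_env JHOVE_HOME JAVA
# Hypothetical sanity check (not part of JHOVE): succeeds only if the JHOVE
# home directory contains bin/ and conf/jhove.conf, and the Java interpreter
# is an executable file. Prints the first problem found to standard error.
check_jhove_env() {
    jhove_home=$1 java=$2
    if [ ! -d "$jhove_home/bin" ]; then
        echo "missing directory: $jhove_home/bin" >&2
        return 1
    fi
    if [ ! -f "$jhove_home/conf/jhove.conf" ]; then
        echo "missing file: $jhove_home/conf/jhove.conf" >&2
        return 1
    fi
    # A directory also passes the -x test, so exclude it explicitly.
    if [ ! -x "$java" ] || [ -d "$java" ]; then
        echo "not an executable file: $java" >&2
        return 1
    fi
    return 0
}
```

With the example values, the call would be: check_jhove_env /users/sampleuser/projects/jhove /usr/local/j2re1.4.1_02/bin/java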
For Unix systems which support Perl, a Perl script is provided to facilitate the setting of the appropriate values without a lot of manual editing. From the JHOVE home directory, type
configure.pl jhove-home-directory java-home-directory java-interpreter
This will configure the jhove script, the various dump utilities, and conf/jhove.conf with the values you select. This will not work on Windows systems.
For Windows systems, the DOS shell driver script, jhove.bat, should be edited to supply the correct values for the JHOVE home directory, Java home directory, and Java interpreter:
SET JHOVE_HOME=jhove-home-directory
SET JAVA_HOME=java-home-directory
SET JAVA=%JAVA_HOME%\bin\java
For example:
SET JHOVE_HOME="C:\Program Files\jhove"
SET JAVA_HOME="C:\Program Files\java\j2re1.4.1_02"
SET JAVA=%JAVA_HOME%\bin\java
The quotation marks are necessary because of the embedded space characters. On Windows platforms it may also be necessary to add the Java bin subdirectory to the System PATH environment variable:
PATH=C:\Program Files\java\j2re1.4.1_02\bin;...
Specific instructions on installing JHOVE in a Windows XP environment are available. For additional information on setting a Windows environment variable, consult your local documentation or system administrator.
At the time of its invocation, JHOVE performs dynamic configuration of its modules and output handlers based on an XML-formatted configuration file. The configuration file is specified by the first valid value defined as:
Note that the GUI interface only searches for the configuration file at the second and third locations listed above; it does not make use of the -c config option.
All format modules and output handlers must be specified in the XML-formatted configuration file, validatable against the XML Schema <http://hul.harvard.edu/ois/xml/xsd/jhove/jhoveConfig.xsd>. (In the following display, brackets [ and ] enclose optional configuration file elements.)
<?xml version="1.0"?>
<jhoveConfig version="1.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xmlns="http://hul.harvard.edu/ois/xml/ns/jhove/jhoveConfig"
             xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/jhove/jhoveConfig
                                 http://hul.harvard.edu/ois/xml/xsd/jhove/jhoveConfig.xsd">
  <jhoveHome>jhove-home-directory</jhoveHome>
[ <defaultEncoding>encoding</defaultEncoding> ]
[ <tempDirectory>directory</tempDirectory> ]
[ <bufferSize>buffer</bufferSize> ]
[ <mixVersion>version</mixVersion> ]
[ <sigBytes>n</sigBytes> ]
  <module>
    <class>module-class-name</class>
  [ <init>optional-module-init-argument</init> ]
  [ <param>optional-module-parameter</param> ]
    ...
  </module>
  ...
  <outputHandler>
    <class>output-handler-class-name</class>
  </outputHandler>
  ...
[ <logLevel>logging-level</logLevel> ]
</jhoveConfig>
The optional <defaultEncoding> element specifies the default character encoding used by output handlers. This option can also be specified by the -e encoding command line argument. The default output encoding is UTF-8.
The optional <tempDirectory> element specifies the pathname of the directory in which temporary files are created. This option can also be specified by the -t directory command line argument. On most Unix systems, a reasonable temporary directory is "/var/tmp"; on Windows, "C:\temp".
The optional <bufferSize> element specifies the buffer size used for buffered I/O. This option can also be specified by the -b buffer command line argument.
The optional <mixVersion> element specifies the MIX schema version conformance for the output produced by the XML output handler. By default the handler output conforms to version 0.2 of the schema. For version 1.0 conformance, specify:
<mixVersion>1.0</mixVersion>
The optional <sigBytes> element specifies the maximum number of bytes that JHOVE modules will examine looking for an internal signature (or magic number). The default value is 1024.
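To illustrate what a bounded signature search means, the following sketch looks for a fixed string within the first n bytes of a file. It is an illustration only and does not reproduce JHOVE's module code; has_signature is a hypothetical name, and the sketch assumes a head that accepts -c and a grep that accepts -a (both true of the GNU and BSD tools).

```shell
# has_signature FILE SIGNATURE [N]
# Illustration only; this is not JHOVE's implementation. Succeeds if the
# fixed string SIGNATURE occurs within the first N bytes of FILE.
# N defaults to 1024, the documented <sigBytes> default.
# Assumes SIGNATURE contains no newline character.
has_signature() {
    file=$1 sig=$2 n=${3:-1024}
    # head -c limits the search window; grep -a treats binary data as text,
    # -F matches a literal string, and -q reports the result by exit status.
    head -c "$n" "$file" | LC_ALL=C grep -a -q -F -e "$sig"
}
```

For instance, has_signature sample.pdf '%PDF-' succeeds only when the PDF header string appears in the first 1024 bytes of the hypothetical file sample.pdf; a signature located further into the file is found only if a larger third argument is given.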
The optional <logLevel> element specifies the logging level, used by calls to the logging API. This option can also be specified by the -l log-level command line argument. The default is SEVERE.
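The command line equivalents named above (-e, -t and -l, together with -c for the configuration file) can be collected in a small wrapper. This is a sketch under stated assumptions: run_jhove and the environment variables JHOVE, JHOVE_CONF, JHOVE_ENCODING, JHOVE_TMP and JHOVE_LOG are hypothetical names introduced here, and the fallback values are the defaults and suggestions given in this document.

```shell
# run_jhove FILE...
# Hypothetical wrapper (not part of JHOVE). Runs the driver named by $JHOVE
# (default ./jhove) and passes the configuration file, output encoding,
# temporary directory, and logging level as command line arguments.
# All environment variable names here are assumptions made for this sketch.
run_jhove() {
    "${JHOVE:-./jhove}" \
        -c "${JHOVE_CONF:-conf/jhove.conf}" \
        -e "${JHOVE_ENCODING:-UTF-8}" \
        -t "${JHOVE_TMP:-/var/tmp}" \
        -l "${JHOVE_LOG:-SEVERE}" \
        "$@"
}
```

Run from the JHOVE home directory, run_jhove examples/ascii/control.txt would use the fallback values, while JHOVE_TMP=/tmp run_jhove examples/ascii/control.txt would override only the temporary directory.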
All class names must be fully qualified with their package name, for example:
edu.harvard.hul.ois.jhove.module.AsciiModule
edu.harvard.hul.ois.jhove.module.PdfModule
edu.harvard.hul.ois.jhove.module.TiffModule
edu.harvard.hul.ois.jhove.module.Utf8Module
The order in which format modules are defined is important; when performing a format identification operation, JHOVE will search for a matching module in the order in which the modules are defined in the configuration file. In general, the modules for more generic formats should come later in the list. For example, the standard ASCII module should be defined before the UTF-8 module, since all ASCII objects are, by definition, UTF-8 objects, but not vice versa.
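The ASCII-before-UTF-8 rule can be checked mechanically. The following is a minimal sketch, assuming a POSIX shell and a configuration file in which each <class> element is on its own line; ascii_before_utf8 is a hypothetical name and the check is not something JHOVE performs itself.

```shell
# ascii_before_utf8 CONF
# Hypothetical check (not part of JHOVE): succeeds if the first line of CONF
# that names the ASCII module comes before the first line that names the
# UTF-8 module. Fails if either module is missing.
# Assumes each <class> element is on its own line.
ascii_before_utf8() {
    conf=$1
    ascii=$(grep -n 'edu\.harvard\.hul\.ois\.jhove\.module\.AsciiModule' "$conf" \
            | head -n 1 | cut -d: -f1)
    utf8=$(grep -n 'edu\.harvard\.hul\.ois\.jhove\.module\.Utf8Module' "$conf" \
           | head -n 1 | cut -d: -f1)
    # Compare the line numbers reported by grep -n.
    [ -n "$ascii" ] && [ -n "$utf8" ] && [ "$ascii" -lt "$utf8" ]
}
```

A call such as ascii_before_utf8 conf/jhove.conf returns a zero exit status when the order is as recommended.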
The optional <init> element is used to pass a module-specific argument to a module at the time it is first instantiated within JHOVE. See the details for the individual modules to see if such an argument is defined. The use of the <init> argument is currently not defined for any of the standard JHOVE modules.
The optional and repeatable <param> element is used to pass a module-specific parameter to a module immediately prior to each invocation of the module's parse() method. See the details for the individual modules to see if such a parameter is defined.
In addition to the modules and output handlers specified in the configuration file, JHOVE is also always statically linked with the standard Bytestream module and Text and XML output handlers.
Invoking JHOVE on the example file control.txt:
jhove -c conf/jhove.conf -k examples/ascii/control.txt
should generate the following output:
Jhove (Rel. 1.0, 2005-05-26)
 Date: 2005-05-12 10:20:43 EDT
 RepresentationInformation: examples/ascii/control.txt
  ReportingModule: ASCII-hul, Rel. 1.1 (2005-01-11)
  LastModified: 2003-09-09 16:56:50 EDT
  Size: 51
  Format: ASCII
  Status: Well-formed and valid
  MIMEtype: text/plain; charset=US-ASCII
  ASCIIMetadata:
   LineEndings: LF
   ControlCharacters: TAB (0x09), VT (0x0B), FF (0x0C), SUB (0x1A)
  Checksum: bae406b6
   Type: CRC32
  Checksum: 2774395cac046bf2fd1898ebed8a5f9a
   Type: MD5
  Checksum: 6753be3059ba36fd970ff620a7ebe3fff95c13fe
   Type: SHA-1
with the appropriate date and time values.
The JHOVE Swing-based GUI interface can be invoked from a command shell:
java -jar bin/JhoveView.jar
A tutorial on how to use JHOVE is available.
Development of JHOVE is funded in part by the Andrew W. Mellon Foundation through a grant to JSTOR for the recently launched Electronic-Archiving Initiative.