JHOVE - JSTOR/Harvard Object Validation Environment

1 Introduction

The concept of representation format, or type, permeates all technical areas of digital repositories. Policy and processing decisions regarding object ingest, storage, access, and preservation are frequently made on a per-format basis. To operate efficiently, repositories need to automate these procedures to the fullest extent possible.

JSTOR and the Harvard University Library are collaborating on a project to develop an extensible framework for format validation: JHOVE (pronounced "jove"), the JSTOR/Harvard Object Validation Environment.

JHOVE provides functions to perform format-specific identification, validation, and characterization of digital objects.

Identification, validation, and characterization actions are frequently necessary during routine operation of digital repositories and for digital preservation activities. These actions are performed by modules. The output from JHOVE is controlled by output handlers. JHOVE uses an extensible plug-in architecture; it can be configured at invocation to include whichever format modules and output handlers are desired. The initial release of JHOVE includes modules for arbitrary byte streams; ASCII and UTF-8 encoded text; GIF, JPEG, JPEG 2000, and TIFF images; AIFF and WAVE audio; and PDF, HTML, and XML. It also includes text and XML output handlers.
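The plug-in arrangement described above can be illustrated with a minimal Java sketch. Every name in it (SketchModule, SketchHandler, SketchEngine, and the two example modules) is hypothetical and chosen for illustration only; it is not JHOVE's actual API. It shows only the general idea of registering modules and routing their results through a selectable output handler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plug-in contracts for illustration; not JHOVE's real interfaces.
interface SketchModule {
    String name();
    boolean accepts(byte[] data);
}

interface SketchHandler {
    String format(String moduleName, boolean accepted);
}

final class AsciiSketchModule implements SketchModule {
    public String name() { return "ASCII-sketch"; }
    public boolean accepts(byte[] data) {
        for (byte b : data) {
            // A negative Java byte is 0x80..0xFF, which is outside 7-bit ASCII.
            if (b < 0) return false;
        }
        return true;
    }
}

final class BytestreamSketchModule implements SketchModule {
    public String name() { return "BYTESTREAM-sketch"; }
    // Any byte sequence qualifies as an arbitrary byte stream.
    public boolean accepts(byte[] data) { return true; }
}

final class TextSketchHandler implements SketchHandler {
    public String format(String moduleName, boolean accepted) {
        return moduleName + ": " + (accepted ? "accepted" : "rejected");
    }
}

// The engine knows nothing about specific formats; it only iterates
// over whatever modules were registered at start-up.
final class SketchEngine {
    private final List<SketchModule> modules = new ArrayList<>();
    private final SketchHandler handler;

    SketchEngine(SketchHandler handler) { this.handler = handler; }

    void register(SketchModule module) { modules.add(module); }

    List<String> process(byte[] data) {
        List<String> lines = new ArrayList<>();
        for (SketchModule m : modules) {
            lines.add(handler.format(m.name(), m.accepts(data)));
        }
        return lines;
    }
}
```

Under these assumptions, adding a new format means writing one more SketchModule and registering it; the engine and the handlers do not change.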

2 Use Cases

Potential use cases for JHOVE include:

  1. Identification
    1. "I have an object; what format is it?"
  2. Validation
    1. "I have an object that purports to be of format F; is it?"
    2. "I have an object of format F; does it meet profile P of F?"
    3. "I have an object of format F and external metadata about F in schema S; are they consistent?"
  3. Characterization
    1. "I have an object of format F; what are its salient properties (given in schema S)?"
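To make the three use cases concrete, here is a small Java sketch for a single format, GIF. It is not JHOVE code: the class and method names are hypothetical, and the checks are deliberately shallow. It relies only on two facts about the GIF file layout: a file begins with the six ASCII bytes "GIF87a" or "GIF89a", and the next four bytes hold the logical screen width and height as little-endian 16-bit integers. A real validator would inspect far more than the header.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; header-level checks only.
final class GifSketch {

    // Validation (shallow): does the object that purports to be a GIF
    // at least start with a GIF signature?
    static boolean hasGifSignature(byte[] data) {
        if (data.length < 6) return false;
        String sig = new String(data, 0, 6, StandardCharsets.US_ASCII);
        return sig.equals("GIF87a") || sig.equals("GIF89a");
    }

    // Identification: name the format, falling back to a generic byte stream.
    static String identify(byte[] data) {
        return hasGifSignature(data) ? "GIF" : "bytestream";
    }

    // Characterization: report a few salient properties of the object.
    static Map<String, Integer> characterize(byte[] data) {
        if (!hasGifSignature(data) || data.length < 10) {
            throw new IllegalArgumentException("not a complete GIF header");
        }
        Map<String, Integer> props = new LinkedHashMap<>();
        // Width and height are stored little-endian: low byte first.
        props.put("width", (data[6] & 0xFF) | ((data[7] & 0xFF) << 8));
        props.put("height", (data[8] & 0xFF) | ((data[9] & 0xFF) << 8));
        return props;
    }
}
```

For example, a header of "GIF89a" followed by the bytes 0x40 0x01 0xF0 0x00 would be identified as GIF and characterized with a width of 320 and a height of 240.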

In terms of the OAIS Reference Model [ISO/IEC 14721], JHOVE can be integrated into repository workflows with respect to Submission Information Package (SIP) creation and ingest validation.

3 Architecture

JHOVE has a layered architecture: an API with well-defined public interfaces is invoked by a thin application layer that provides a stand-alone command-line tool suitable for batch and interactive operation. The API can also be used on its own to build other compatible tools.
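The layering can be sketched in a few lines of Java. The names SketchApi and SketchCli are hypothetical and do not correspond to JHOVE classes; the point is only that the application layer handles arguments, file reading, and printing, while all format logic sits behind an API that other tools could call directly.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical API layer: no console I/O, so any tool can reuse it.
final class SketchApi {
    /** Describes the data as ASCII text if every byte is 7-bit, else as a byte stream. */
    static String describe(String name, byte[] data) {
        boolean ascii = true;
        for (byte b : data) {
            if (b < 0) {  // negative Java byte means 0x80..0xFF
                ascii = false;
                break;
            }
        }
        return name + ": " + (ascii ? "ASCII text" : "bytestream");
    }
}

// Thin application layer: reads each file named on the command line
// and prints whatever the API reports.
final class SketchCli {
    public static void main(String[] args) throws IOException {
        for (String path : args) {
            byte[] data = Files.readAllBytes(Paths.get(path));
            System.out.println(SketchApi.describe(path, data));
        }
    }
}
```

A graphical front end could call SketchApi.describe in the same way without touching the command-line class.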

5 Implementation

JHOVE is implemented as a Java application, written to conform to J2SE 1.4, using the Sun SDK 1.4.1. JHOVE should be usable on any Unix, Windows, or OS X platform with an appropriate J2SE installation.

JHOVE can be invoked with two interfaces:

  1. A command-line interface
  2. A Swing-based GUI

6 Tutorial

A tutorial on how to use JHOVE is available.

7 Standard Modules

The following standard modules are available:

  1. AIFF-hul: Audio Interchange File Format
  2. ASCII-hul: ASCII-encoded text
  3. BYTESTREAM: Arbitrary bytestreams (always well-formed and valid)
  4. GIF-hul: Graphics Interchange Format (GIF)
  5. HTML-hul: Hypertext Markup Language (HTML)
  6. JPEG-hul: Joint Photographic Experts Group (JPEG) raster images
  7. JPEG2000-hul: JPEG 2000
  8. PDF-hul: Portable Document Format (PDF)
  9. TIFF-hul: Tagged Image File Format (TIFF) raster images
  10. UTF8-hul: UTF-8 encoded text [Unicode]
  11. WAVE-hul: Audio for Windows [WAVE, WAVEFORMAT]
  12. XML-hul: Extensible Markup Language (XML)

8 License

JHOVE is made available by JSTOR and the President and Fellows of Harvard College under the GNU Lesser General Public License (LGPL).

Note that previous versions of JHOVE were released under the GNU General Public License (GPL).


Development of JHOVE was funded in part by the Andrew W. Mellon Foundation through a grant to JSTOR for the recently launched Electronic-Archiving Initiative.

The JHOVE logo is based on a manipulated three-color filter image of Jupiter and its moon Ganymede (P-20945C, Voyager 1-9, January 31, 1979) produced by the Jet Propulsion Laboratory from images taken by the Voyager 1 spacecraft on January 24, 1979. The original image and caption are made available from NASA by the National Space Science Data Center.

Copyright 2003-2009 by JSTOR and the President and Fellows of Harvard College. Used by permission.
Last updated 2009-02-25