PDF-hul Module

1 Introduction

The PDF-hul module recognizes and validates the Portable Document Format (PDF) [PDF 1.4, PDF 1.5, PDF 1.6]. Documents created as PDF 1.7 will be identified as such, but PDF 1.7 is not supported, and documents that use features specific to PDF 1.7 or later may be reported as not well-formed or not valid.
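As a rough illustration of the version distinction described above, the following Python sketch reads the header that a PDF file begins with (e.g., %PDF-1.4) and reports the declared version together with whether it falls in the supported range. This is not JHOVE's code. The function name classify_pdf_header is hypothetical, and the sketch assumes that versions up to 1.6 are covered while 1.7 and later are identified but unsupported. It also ignores the Version entry in the document catalog, which PDF 1.4 and later allow to override the header version.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper, not part of JHOVE: a PDF file starts with a header
# such as "%PDF-1.4" that declares the version.
HEADER_RE = re.compile(rb"%PDF-(\d+)\.(\d+)")

# Assumption: versions up to 1.6 fall within the specifications the module
# cites; 1.7 and later are identified but not supported.
LATEST_SUPPORTED = (1, 6)


def classify_pdf_header(data: bytes) -> Tuple[Optional[str], bool]:
    """Return (version, supported) for the header at the start of data.

    version is None when the data does not begin with a PDF header.
    """
    match = HEADER_RE.match(data)
    if match is None:
        return None, False
    major, minor = int(match.group(1)), int(match.group(2))
    return f"{major}.{minor}", (major, minor) <= LATEST_SUPPORTED


print(classify_pdf_header(b"%PDF-1.4\n"))  # ('1.4', True)
print(classify_pdf_header(b"%PDF-1.7\n"))  # ('1.7', False)
print(classify_pdf_header(b"GIF89a"))      # (None, False)
```

Returning the version even when it is unsupported mirrors the behaviour the text describes: a PDF 1.7 document is still identified, only flagged as outside the supported range.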

The module is invoked by the -m PDF-hul command line option:

jhove ... -m PDF-hul ...

Parameters may be set in the configuration file to control the amount of information supplied by the module. (In earlier versions of JHOVE, these were set by the -p command line option, and they added information rather than reducing it.) These parameters are set in the <param> element under the <module> element. The parameters may be specified as a string of letters or as separate one-letter parameters, e.g.:

<module>
  <class>edu.harvard.hul.ois.jhove.module.PdfModule</class>
  <param>ao</param>
</module>

or

<module>
  <class>edu.harvard.hul.ois.jhove.module.PdfModule</class>
  <param>a</param>
  <param>o</param>
</module>

The parameters function as flags with the following significance:

a  Suppress document annotations
f  Suppress document font information
o  Suppress document outline
p  Suppress document page structure
n  (JHOVE 1.2) Specify the maximum number of fonts to report. Must be followed by a number, e.g., n500
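To make the flag semantics concrete, here is a minimal Python sketch that turns a list of <param> values into output options, treating a string of letters the same as separate one-letter parameters and reading the number that follows n. It is a hypothetical illustration, not JHOVE's parser: the names PdfHulOptions and parse_params are invented, combining n with other letters in one string (as in fn500) is an assumption, and rejecting unknown letters is this sketch's own choice.

```python
from dataclasses import dataclass
from typing import Iterable

# Default font limit stated in the text.
DEFAULT_MAX_FONTS = 1000
DIGITS = "0123456789"


@dataclass
class PdfHulOptions:
    """Hypothetical container for the PDF-hul output flags (not a JHOVE class)."""
    show_annotations: bool = True       # cleared by "a"
    show_fonts: bool = True             # cleared by "f"
    show_outline: bool = True           # cleared by "o"
    show_pages: bool = True             # cleared by "p"
    max_fonts: int = DEFAULT_MAX_FONTS  # set by "n<number>"


def parse_params(params: Iterable[str]) -> PdfHulOptions:
    """Apply each <param> value in order, e.g. ["ao"] or ["a", "o", "n500"]."""
    opts = PdfHulOptions()
    for param in params:
        i = 0
        while i < len(param):
            flag = param[i]
            i += 1
            if flag == "a":
                opts.show_annotations = False
            elif flag == "f":
                opts.show_fonts = False
            elif flag == "o":
                opts.show_outline = False
            elif flag == "p":
                opts.show_pages = False
            elif flag == "n":
                # "n" must be followed by a number; consume all of its digits.
                start = i
                while i < len(param) and param[i] in DIGITS:
                    i += 1
                if start == i:
                    raise ValueError(f"'n' must be followed by a number in {param!r}")
                opts.max_fonts = int(param[start:i])
            else:
                # Assumption: this sketch rejects unknown letters.
                raise ValueError(f"unknown flag {flag!r} in {param!r}")
    return opts


# A string of letters and separate one-letter parameters are equivalent.
print(parse_params(["ao"]) == parse_params(["a", "o"]))  # True
print(parse_params(["fn500"]).max_fonts)                   # 500
```

Because every flag only clears a default or sets the font limit, the order of the letters does not matter, which is why ao and the pair a, o produce the same options.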

By default, document annotations, font information, and outlines are all displayed; they may be suppressed to reduce the size of the JHOVE output. In earlier versions of JHOVE, they were suppressed by default.

Some PDF files have thousands of fonts, and attempting to report them all can make JHOVE run out of memory. By default, a maximum of 1000 fonts will be reported; the n parameter sets a different limit. If there are more fonts, an informational message will report the total number and state that some have been omitted. This parameter will be available with the release of JHOVE 1.2.
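The limiting behaviour can be sketched in Python as follows: count every font, but store only the first max_fonts names, and produce an informational message when some were dropped. This is an illustration under assumptions, not JHOVE's implementation. The function name collect_fonts and the message wording are hypothetical; only the default of 1000 and the idea of reporting the total and the omission come from the text above.

```python
from typing import Iterable, List, Optional, Tuple


def collect_fonts(
    fonts: Iterable[str], max_fonts: int = 1000
) -> Tuple[List[str], Optional[str]]:
    """Keep at most max_fonts font names while counting all of them.

    Returns the kept names and, if any were dropped, an informational
    message (hypothetical wording) giving the total and the number omitted.
    """
    kept: List[str] = []
    total = 0
    for font in fonts:
        total += 1
        if len(kept) < max_fonts:
            kept.append(font)  # only the first max_fonts names are stored
    if total <= max_fonts:
        return kept, None
    return kept, f"{total} fonts found; {total - max_fonts} omitted (limit {max_fonts})"


kept, message = collect_fonts(f"F{i}" for i in range(1200))
print(len(kept))  # 1000
print(message)    # 1200 fonts found; 200 omitted (limit 1000)
```

Consuming the fonts as an iterable keeps memory bounded by max_fonts rather than by the number of fonts in the file, which is the problem the limit is meant to address.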

2 Coverage

The PDF-hul module recognizes and validates the following public profiles:

3 Well-Formedness

The following criteria must be met by a PDF object for JHOVE to consider it well-formed:

4 Validity

4.1 Validity criteria

The following criteria must be met by a PDF file for JHOVE to consider it valid:

4.2 Limitations

The PDF-hul module does not check certain aspects of a PDF file, primarily because checking them thoroughly would require access to proprietary compression and encryption algorithms. The following are not checked:

5 Representation Information

The MIME type is reported as: application/pdf

5.1 Profiles

6 Additional Module Properties

Copyright 2003-2008 by JSTOR and the President and Fellows of Harvard College. Used by permission.
Last updated 2008-02-26