Home | Tutorial | Documentation | PDF-hul Module | Distribution | Links |
The PDF-hul module recognizes and validates the Portable Document Format (PDF) [PDF 1.4, PDF 1.5, PDF 1.6]. Documents created as PDF 1.7 will be identified as such, but PDF 1.7 is not supported, and documents using features specific to PDF 1.7 or later may be reported as not well-formed or not valid.
The module is invoked by the command line option:
jhove ... -m PDF-hul ...
Parameters may be set in the configuration file to control the amount of information supplied by the module. (In earlier versions of JHOVE, these were set by the -p option of the command line, and added information rather than reducing it.) These parameters are set in the <param> element under the <module> element. The parameters may be specified as a string of letters, or as separate one-letter parameters, e.g.:
<param>apn500</param>
or
<param>a</param>
<param>p</param>
<param>n500</param>
The parameters function as flags with the following significance:
a  Suppress document annotations
f  Suppress document font information
o  Suppress document outline
p  Suppress document page structure
n  (JHOVE 1.2) Specify the maximum number of fonts to report. Must be followed by a number, e.g., n500
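As an illustration of how a combined parameter string such as apn500 maps onto the flags above, here is a minimal Python sketch. It is not JHOVE's own code: the function name, the setting names, and the strict handling of unrecognized characters are assumptions made for this example. The default font limit of 1000 is the one this page documents.

```python
import re

# Hypothetical mapping from flag letters to setting names (names are
# illustrative and do not come from JHOVE).
FLAG_NAMES = {
    "a": "suppress_annotations",
    "f": "suppress_fonts",
    "o": "suppress_outline",
    "p": "suppress_pages",
}


def parse_pdf_hul_params(param, default_max_fonts=1000):
    """Interpret a parameter string such as 'apn500' as a settings dict."""
    settings = {name: False for name in FLAG_NAMES.values()}
    settings["max_fonts"] = default_max_fonts

    # Split into (letter, optional digits) tokens, e.g. ('n', '500').
    tokens = re.findall(r"([a-z])(\d*)", param)
    # Assumption: reject input containing anything the pattern skipped.
    if "".join(letter + digits for letter, digits in tokens) != param:
        raise ValueError(f"malformed parameter string: {param!r}")

    for letter, digits in tokens:
        if letter == "n":
            if not digits:
                raise ValueError("flag 'n' must be followed by a number")
            settings["max_fonts"] = int(digits)
        elif letter in FLAG_NAMES and not digits:
            settings[FLAG_NAMES[letter]] = True
        else:
            raise ValueError(f"unrecognized parameter: {letter}{digits}")
    return settings
```

Separate one-letter parameters can be handled by the same function, calling it once per <param> value and merging the results.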
By default, document annotations, font information, and outlines are all displayed; they may be suppressed to reduce the size of the JHOVE output. In earlier versions of JHOVE, they were suppressed by default.
Some PDF files have thousands of fonts, and attempting to report them all can make JHOVE run out of memory. By default, a maximum of 1000 fonts will be reported. If there are more fonts, an informational message will report the total number and state that some have been omitted. The n parameter will be available with the release of JHOVE 1.2.
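The capping behavior can be sketched as follows. This is only an illustration under stated assumptions: the function name and the wording of the message are hypothetical, and JHOVE's actual implementation may differ. The sketch keeps at most max_fonts entries in memory and counts the remainder without storing it.

```python
from itertools import islice


def report_fonts(fonts, max_fonts=1000):
    """Return (reported_fonts, message); message is None if nothing was omitted."""
    it = iter(fonts)
    reported = list(islice(it, max_fonts))  # keep at most max_fonts entries
    omitted = sum(1 for _ in it)            # count the rest without storing them
    if omitted == 0:
        return reported, None
    total = len(reported) + omitted
    # Hypothetical message text, not JHOVE's actual wording.
    message = f"{total} fonts found; {omitted} omitted from the report"
    return reported, message
```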
The PDF-hul module recognizes and validates the following public profiles:
The following criteria must be met by a PDF object for JHOVE to consider it well-formed:
In general, a file is well-formed if it has a header:
%PDF-m.n
a body consisting of well-formed objects; a cross-reference table; and a trailer defining the cross-reference table size and containing an indirect reference to the document catalog dictionary, and ending with:
%%EOF
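The outermost part of this check, the %PDF-m.n header and the closing %%EOF marker, can be sketched in a few lines of Python. This is a rough illustration, not the module's actual logic: it does not examine the body, the cross-reference table, or the trailer dictionary, and the exact tolerance for whitespace around the markers is an assumption of this example.

```python
import re

# Header of the form %PDF-m.n at the very start of the file.
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def check_header_and_eof(data: bytes):
    """Return (header_version or None, ends_with_eof) for raw PDF bytes."""
    match = HEADER_RE.match(data)
    version = None
    if match:
        version = f"{match.group(1).decode()}.{match.group(2).decode()}"
    # Assumption: trailing whitespace after %%EOF is tolerated.
    ends_with_eof = data.rstrip(b" \t\r\n").endswith(b"%%EOF")
    return version, ends_with_eof
```

A file that passes both tests is only a candidate for being well-formed; the remaining criteria still have to be checked.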
The following criteria must be met by a PDF file for JHOVE to consider it valid:
The PDF-hul module does not check certain aspects of a PDF file, primarily because thoroughly checking these would require access to proprietary compression and encryption algorithms. The following are not checked:
The MIME type is reported as: application/pdf
The PDF version is determined by the data specified in the PDF header and the Version key of the document catalog dictionary. In the event that these two values do not match, the Version key is taken as the authoritative value.
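The precedence rule can be written as a one-line decision. The sketch below is illustrative only: the function name is hypothetical, and falling back to the header when the catalog has no Version key is an assumption, since the text above only covers the case where both values are present and differ.

```python
def resolve_pdf_version(header_version, catalog_version=None):
    """Pick the reported version from the header and the catalog Version key."""
    # Assumption: with no Version key in the catalog, the header value is used.
    if catalog_version is None:
        return header_version
    # When both are present, the catalog's Version key is authoritative.
    return catalog_version
```

For example, a file whose header says 1.4 but whose catalog dictionary carries a Version key of 1.6 would be reported as PDF 1.6.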
The PDF/X-1 profile is for pre-press data exchange using CMYK data [PDF/X-1].
The PDF/X-1a profile is for pre-press data exchange using CMYK and spot color data [PDF/X-1a].
The PDF/X-2 profile is for partial pre-press data exchange [PDF/X-2].
The PDF/X-3 profile is for pre-press data exchange using color-managed workflows [PDF/X-3].
The Linearized PDF profile is for optimized viewing over a network [PDF 1.4].
The Tagged PDF profile provides access to higher-level structural and semantic information contained in PDF files [PDF 1.4].
The PDF/A profile is for long-term preservation of electronic documents [PDF/A].
Note that the PDF module does not parse the contents of streams, so it cannot determine conformance to PDF/A to the degree required by the ISO standard.