HTML-hul Module
1 Introduction
The HTML-hul module recognizes and validates the HTML (Hypertext Markup
Language) format [HTML].
The module is invoked by the command-line option:
jhove ... -m HTML-hul ...
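As an illustration of the invocation, the following is a minimal Python sketch that assembles such a command line. The helper name build_jhove_command, the file name index.html, and the use of the -h option to pick an output handler are assumptions made for this example; they are not part of the text above. The sketch only builds the argument list and does not run JHOVE.

```python
import shlex


def build_jhove_command(path, handler=None, executable="jhove"):
    """Build an argument list asking JHOVE to use the HTML-hul module.

    `path`, `handler` and `executable` are illustrative parameters.
    """
    cmd = [executable, "-m", "HTML-hul"]
    if handler is not None:
        # Assumption: -h names the output handler (for example "XML").
        cmd += ["-h", handler]
    cmd.append(path)
    return cmd


cmd = build_jhove_command("index.html", handler="XML")
# Show the command as it would be typed in a shell.
print(shlex.join(cmd))
```

If JHOVE is installed and on the PATH, the resulting list could be passed to subprocess.run(cmd, capture_output=True, text=True) to capture the report.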
The HTML-hul module recognizes XHTML 1.0 (including transitional,
frameset and strict) and 1.1, making use of the XML-hul module.
If the XML-hul module is not available, only limited information
will be provided on XHTML documents.
2 Coverage
The HTML-hul module recognizes and validates the following public profiles:
- HTML 3.2
- HTML 4.0
- HTML 4.01
- XHTML 1.0 (transitional, frameset and strict)
- XHTML 1.1
3 Well-Formedness
For the HTML profiles, JHOVE uses the criteria for HTML
well-formedness defined by [HTML 3.2, HTML 4.0, HTML 4.01];
for the XHTML profiles, JHOVE uses the criteria defined by [XML].
Specifically, a well-formed HTML document must have no
syntactic errors, and must contain at least one of the
tags HTML, HEAD, BODY or TITLE.
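The second half of this criterion (at least one of HTML, HEAD, BODY or TITLE must be present) can be sketched in a few lines of Python. This is only an illustration built on the standard library's html.parser, not JHOVE's own parser; the names REQUIRED_ANY and has_structural_tag are hypothetical. Because html.parser is lenient, the sketch does not detect syntactic errors, so it covers only the tag-presence part of the rule.

```python
from html.parser import HTMLParser

# Tags of which at least one must appear, per the criterion above.
REQUIRED_ANY = {"html", "head", "body", "title"}


class _TagCollector(HTMLParser):
    """Record the name of every start tag seen in the document."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names before calling this hook.
        self.seen.add(tag)


def has_structural_tag(markup):
    """Return True if the markup contains HTML, HEAD, BODY or TITLE."""
    collector = _TagCollector()
    collector.feed(markup)
    collector.close()
    return bool(collector.seen & REQUIRED_ANY)
```

For example, has_structural_tag("<p>text</p>") returns False, because a bare paragraph contains none of the four tags.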
4 Validity
For the HTML profiles, JHOVE uses the criteria for HTML validity defined by
[HTML 3.2, HTML 4.0, HTML 4.01]; for the XHTML profiles, JHOVE uses the
criteria defined by [XHTML 1.0, XHTML 1.1].
5 Representation Information
The MIME type is reported as: text/html [RFC 2854]
In addition to the standard JHOVE
representation information, the following
HTML-specific properties are reported:
- Property "XMLMetadata" of type PROPERTY and arity LIST (for XHTML only;
  see the documentation of the XML-hul module
  for the contents of this property).
- Property "HTMLMetadata" of type PROPERTY and arity LIST
  - Property "PrimaryLanguage" of type STRING
  - Property "OtherLanguages" of type STRING and arity SET
  - Property "Title" of type STRING
  - Property "MetaTags" of type PROPERTY and arity LIST
    - Property "Name" of type STRING
    - Property "Httpequiv" of type STRING
    - Property "Content" of type STRING
  - Property "Frames" of type PROPERTY and arity LIST
    - Property "Name" of type STRING
    - Property "Title" of type STRING
    - Property "Longdesc" of type STRING
    - Property "Src" of type STRING
  - Property "Links" of type STRING and arity LIST
  - Property "Scripts" of type STRING and arity LIST
  - Property "Images" of type PROPERTY and arity LIST
    - Property "Alt" of type STRING
    - Property "Longdesc" of type STRING
    - Property "Src" of type STRING
    - Property "Height" of type STRING
    - Property "Width" of type STRING
  - Property "Citations" of type STRING and arity LIST
  - Property "DefinedTerms" of type STRING and arity LIST
  - Property "Abbreviations" of type PROPERTY and arity LIST
    - Property "Text" of type STRING
    - Property "Title" of type STRING
  - Property "Entities" of type STRING and arity LIST
  - Property "UnicodeEntityBlocks" of type STRING and arity LIST
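To give a feel for how a few of these properties relate to the markup, here is a rough Python sketch that gathers "Title", "Links" and "Images" into a nested dictionary. It is an illustration only: it does not reproduce JHOVE's output format or its extraction rules, the class and function names are hypothetical, and treating the href of each A element as a link is an assumption of this example.

```python
from html.parser import HTMLParser


class HtmlMetadataSketch(HTMLParser):
    """Collect a small subset of the properties listed above."""

    def __init__(self):
        super().__init__()
        self.metadata = {"Title": None, "Links": [], "Images": []}
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)  # attribute names arrive lower-cased
        if tag == "title":
            self._in_title = True
        elif tag == "a" and attr.get("href"):
            # Assumption: a link is the href of an A element.
            self.metadata["Links"].append(attr["href"])
        elif tag == "img":
            image = {}
            for name in ("alt", "longdesc", "src", "height", "width"):
                if attr.get(name) is not None:
                    # "alt" -> "Alt", matching the property names above.
                    image[name.capitalize()] = attr[name]
            self.metadata["Images"].append(image)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
            self.metadata["Title"] = "".join(self._title_parts).strip()

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def extract_metadata(markup):
    """Return the collected properties for one HTML document."""
    parser = HtmlMetadataSketch()
    parser.feed(markup)
    parser.close()
    return parser.metadata
```

Each image is kept as its own dictionary because "Images" is a list of PROPERTY values with STRING children, whereas "Links" is a flat list of strings.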
6 Additional Module Properties
- Nominal file extension: .html, .htm
- Macintosh OS file type: TEXT
Copyright 2004-2005 by JSTOR and the President and Fellows of Harvard College. Used by permission.
Last updated 2005-05-09