Package edu.harvard.hul.ois.jhove.module.html

Contains supporting classes for the HTML-HUL module.

Interface Summary
CharStream This interface describes a character stream that maintains line and column number positions of the characters.
ParseHtmlConstants  
 

Class Summary
DTDMapper Class to map public DTD IDs to the files that are included with this HTML module.
Html3_2DocDesc This class describes the requirements of an HTML 3.2 document.
Html4_01FrameDocDesc This class describes the requirements of an HTML 4.01 Frameset document.
Html4_01StrictDocDesc This class describes the requirements of an HTML 4.01 Strict document.
Html4_01TFDocDesc Abstract class for the HTML 4.01 Transitional and Frameset document types.
Html4_01TransDocDesc This class describes the requirements of an HTML 4.01 Transitional document.
Html4_0FrameDocDesc This class describes the requirements of an HTML 4.0 Frameset document.
Html4_0StrictDocDesc This class describes the requirements of an HTML 4.0 Strict document.
Html4_0TFDocDesc Abstract class for the HTML 4.0 Transitional and Frameset document types.
Html4_0TransDocDesc This class describes the requirements of an HTML 4.0 Transitional document.
Html4DocDesc Abstract class for common features of HTML 4.0 and 4.01 documents.
Html4StrictDocDesc Abstract class for common features of HTML 4.0 and 4.01 strict documents.
Html4TFDocDesc Abstract class for common features of HTML 4.0 and 4.01 transitional and frameset documents.
HtmlAttributeDesc Class representing an abstract attribute of an HTML element.
HtmlCharStream An implementation of interface CharStream, where the stream is assumed to contain only ASCII characters (without Unicode processing).
HtmlDocDesc This is an abstract class for processing an HTML document that has been parsed into a List of HtmlElements.
HtmlMetadata Repository for an HTML document's metadata.
HtmlSpecialToken Class for defining special items in HTML element and attribute definitions.
HtmlStack A LinkedList dressed up as a stack for processing HTML objects.
HtmlTagDesc This class defines the permitted behavior of a particular HTML tag.
HtmlTempTagDesc Subclass of HtmlTagDesc for temporary tags.
JHAttribute A description of an attribute within a JHOpenTag.
JHCloseTag Representation of a parsed HTML close tag.
JHComment Representation of a parsed HTML comment.
JHDoctype Representation of a parsed HTML DOCTYPE.
JHElement Abstract superclass for the representation of portions of an HTML file.
JHErrorElement A JHElement which signifies a syntactic error.
JHOpenTag Representation of a parsed HTML open tag, including its attributes.
JHPCData Representation of parsed HTML PCDATA.
JHXmlDecl Representation of an XML declaration.
ParseHtml  
ParseHtmlTokenManager  
SimpleCharStream An implementation of interface CharStream, where the stream is assumed to contain only ASCII characters (without Unicode processing).
Token Describes the input token stream.
 

Exception Summary
ParseException This exception is thrown when parse errors are encountered.
 

Error Summary
TokenMgrError  
 

Package edu.harvard.hul.ois.jhove.module.html Description

Contains supporting classes for the HTML-HUL module.

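This module uses code generated by JavaCC. The grammar file is ParseHtml.jj, and it can be compiled using BuildParser.bat. Compiling it generates the parser sources for this package, including ParseHtml.java, ParseHtmlConstants.java, ParseHtmlTokenManager.java, Token.java, ParseException.java, and TokenMgrError.java.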

In addition, HtmlCharStream.java has been created by manually modifying CharStream.java. If a future version of JavaCC changes CharStream.java, HtmlCharStream.java should be changed to match.
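For orientation, the line and column bookkeeping that a CharStream-style class performs can be sketched as follows. This is a simplified illustration, not the JavaCC-generated code or HtmlCharStream itself. The class name PositionTrackingReader and its methods are hypothetical, and the sketch assumes that only '\n' ends a line.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical illustration; not part of JHOVE or JavaCC. */
final class PositionTrackingReader {
    private final Reader in;
    private int line = 1;                 // 1-based line of the last character read
    private int column = 0;               // 1-based column of the last character read
    private boolean lastWasNewline = false;

    PositionTrackingReader(Reader in) {
        this.in = in;
    }

    /** Reads one character and updates the position; returns -1 at end of input. */
    int read() throws IOException {
        int c = in.read();
        if (c < 0) {
            return -1;
        }
        // The character after a '\n' starts a new line (assumption: '\n' only).
        if (lastWasNewline) {
            line++;
            column = 0;
        }
        column++;
        lastWasNewline = (c == '\n');
        return c;
    }

    int getLine() {
        return line;
    }

    int getColumn() {
        return column;
    }

    public static void main(String[] args) throws IOException {
        PositionTrackingReader r = new PositionTrackingReader(new StringReader("<p>\n<b>"));
        int c;
        while ((c = r.read()) >= 0) {
            if (c == '<') {
                System.out.println("'<' at line " + r.getLine() + ", column " + r.getColumn());
            }
        }
    }
}
```

A tokenizer can record these positions when a token begins, which lets a parse error point at the offending place in the input.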

A number of DTD and entity files are stored with this package so that Doctypes can be resolved without fetching them over the Internet. These files belong to the World Wide Web Consortium (W3C), and no rights over them are claimed by including them here.
The list of files:

This module uses the XML-HUL module in validating XHTML files.
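As a rough sketch of how bundled DTD files can be looked up by public identifier, as described for the DTD and entity files above, consider the following. It is not the actual DTDMapper implementation: the class name LocalDtdLookup, the method fileFor, and the file names are hypothetical placeholders chosen for this example. Only the two public identifiers are the standard W3C ones.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical illustration; the real DTDMapper API and file names may differ. */
final class LocalDtdLookup {
    private static final Map<String, String> PUBLIC_ID_TO_FILE = new HashMap<>();

    static {
        // The file names are placeholders, not the names used by JHOVE.
        PUBLIC_ID_TO_FILE.put("-//W3C//DTD HTML 4.01//EN", "example-html401-strict.dtd");
        PUBLIC_ID_TO_FILE.put("-//W3C//DTD XHTML 1.0 Strict//EN", "example-xhtml1-strict.dtd");
    }

    private LocalDtdLookup() {
    }

    /** Returns the bundled file for a public ID, or empty if none is registered. */
    static Optional<String> fileFor(String publicId) {
        if (publicId == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(PUBLIC_ID_TO_FILE.get(publicId.trim()));
    }

    public static void main(String[] args) {
        System.out.println(fileFor("-//W3C//DTD HTML 4.01//EN").orElse("(not bundled)"));
        System.out.println(fileFor("-//EXAMPLE//DTD Unknown//EN").orElse("(not bundled)"));
    }
}
```

In a SAX-based setup, a lookup like this would typically sit behind an org.xml.sax.EntityResolver, so the parser is handed the local copy and falls back to the system identifier only when no mapping exists.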