edu.harvard.hul.ois.jhove.module.html
Class HtmlDocDesc

java.lang.Object
  extended by edu.harvard.hul.ois.jhove.module.html.HtmlDocDesc
Direct Known Subclasses:
Html3_2DocDesc, Html4DocDesc

public abstract class HtmlDocDesc
extends java.lang.Object

This is an abstract class for processing an HTML document that has been parsed into a List of HtmlElements. It defines common behavior for all supported versions of HTML except XHTML. Subclasses modify this base as needed.

Author:
Gary McGath

Field Summary
protected  HtmlTagDesc bodyElement
          A representation of the BODY element.
protected static java.util.HashMap commonTags
          Generic list of supported tags.
protected  HtmlTagDesc framesetElement
          A representation of the FRAMESET element.
protected  HtmlTagDesc headElement
          A representation of the HEAD element.
protected static java.lang.String[] headings
          Heading tags, which are invariant for all HTML versions.
protected  HtmlTagDesc htmlElement
          A representation of the HTML element.
protected  java.util.Map supportedElements
          List of supported tags for this version of HTML.
 
Constructor Summary
HtmlDocDesc()
          Constructor.
 
Method Summary
protected static void addRequiredAttribute(java.util.List atts, java.lang.String name)
          Adds an attribute to a List, with unrestricted values and type REQUIRED.
protected static void addSelfAttribute(java.util.List atts, java.lang.String name)
          Adds an attribute to a List, with the only permitted value being the name of the attribute.
protected static void addSimpleAttribute(java.util.List atts, java.lang.String name)
          Adds an attribute to a List, with unrestricted values and type IMPLIED.
protected static void addStringsToList(java.lang.String[] names, java.util.List lst)
          Adds all the Strings in an array to the end of a List.
 HtmlMetadata getMetadata()
          Returns the metadata for this document.
protected  void init()
          Initialization called by subclass constructors after supportedElements has been assigned.
protected  void pushElementStack(JHOpenTag tag)
          Pushes an element onto the element stack.
protected static void removeStringsFromList(java.util.List lst, java.lang.String[] strs)
          Removes excluded strings from a List.
 boolean validate(java.util.List elements, RepInfo info)
          Validates the document and puts interesting properties into the RepInfo.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

commonTags

protected static java.util.HashMap commonTags
Generic list of supported tags. For efficiency, this is generated only once. Subclasses should take a copy of this map and add or remove entries on the copy as necessary. They must not modify any of the existing members, which remain shared with the original.


supportedElements

protected java.util.Map supportedElements
List of supported tags for this version of HTML. The subclass is responsible for generating this, typically using commonTags as a starting point.
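
The copy-then-modify pattern described for commonTags and supportedElements can be sketched as follows. This is not JHOVE source: the class name, the tag names, and the use of String values in place of HtmlTagDesc objects are all assumptions made to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in; not part of JHOVE.
class SupportedElementsSketch {
    // Plays the role of commonTags: built once and shared.
    static final Map<String, String> COMMON_TAGS = new HashMap<>();
    static {
        COMMON_TAGS.put("p", "paragraph");
        COMMON_TAGS.put("font", "font styling");
    }

    // Plays the role of a subclass building its supportedElements.
    static Map<String, String> buildSupportedElements() {
        // Copy first, so the shared map itself is never altered.
        Map<String, String> supported = new HashMap<>(COMMON_TAGS);
        supported.remove("font");               // deletion on the copy only
        supported.put("abbr", "abbreviation");  // addition on the copy only
        return supported;
    }

    public static void main(String[] args) {
        Map<String, String> supported = buildSupportedElements();
        System.out.println("copy has abbr: " + supported.containsKey("abbr"));
        System.out.println("shared map still has font: "
                + COMMON_TAGS.containsKey("font"));
    }
}
```

Note that new HashMap<>(map) makes a shallow copy: entries can be added to or removed from the copy freely, but the value objects are the same instances held by the original map. That is consistent with the rule that existing members must not be modified.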


htmlElement

protected HtmlTagDesc htmlElement
A representation of the HTML element.


headElement

protected HtmlTagDesc headElement
A representation of the HEAD element.


bodyElement

protected HtmlTagDesc bodyElement
A representation of the BODY element.


framesetElement

protected HtmlTagDesc framesetElement
A representation of the FRAMESET element.


headings

protected static java.lang.String[] headings
Heading tags, which are invariant for all HTML versions.

Constructor Detail

HtmlDocDesc

public HtmlDocDesc()
Constructor.

Method Detail

validate

public boolean validate(java.util.List elements,
                        RepInfo info)
Validates the document and puts interesting properties into the RepInfo.

Parameters:
elements - The element list constructed by the parser
info - The RepInfo object which will be populated with properties

getMetadata

public HtmlMetadata getMetadata()
Returns the metadata for this document.


init

protected void init()
Initialization called by subclass constructors after supportedElements has been assigned.
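
The ordering requirement (assign supportedElements first, then call init()) might look like the sketch below. What init() actually does is not documented here; the sketch assumes it looks entries up in supportedElements, which is one reason the order would matter. BaseDesc, Html4Sketch, and the String values are hypothetical stand-ins, not JHOVE classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of constructor ordering; not JHOVE source.
class InitOrderSketch {

    abstract static class BaseDesc {
        protected Map<String, String> supportedElements;
        protected String htmlElement;

        // Assumed behavior: cache an entry from supportedElements.
        // This would throw NullPointerException if the map were unassigned.
        protected void init() {
            htmlElement = supportedElements.get("html");
        }

        public String getHtmlElement() {
            return htmlElement;
        }
    }

    static class Html4Sketch extends BaseDesc {
        public Html4Sketch() {
            supportedElements = new HashMap<>();
            supportedElements.put("html", "root element");
            init(); // called last, once the map is in place
        }
    }

    public static void main(String[] args) {
        System.out.println(new Html4Sketch().getHtmlElement());
    }
}
```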


addStringsToList

protected static void addStringsToList(java.lang.String[] names,
                                       java.util.List lst)
Adds all the Strings in an array to the end of a List.


addSimpleAttribute

protected static void addSimpleAttribute(java.util.List atts,
                                         java.lang.String name)
Adds an attribute to a List, with unrestricted values and type IMPLIED.


addRequiredAttribute

protected static void addRequiredAttribute(java.util.List atts,
                                           java.lang.String name)
Adds an attribute to a List, with unrestricted values and type REQUIRED.


addSelfAttribute

protected static void addSelfAttribute(java.util.List atts,
                                       java.lang.String name)
Adds an attribute to a List, with the only permitted value being the name of the attribute. This kind of attribute is normally written in HTML without an explicit value (for example, CHECKED rather than CHECKED="checked"); some, possibly most, readers will not accept an explicit value.
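
The three attribute helpers (addSimpleAttribute, addRequiredAttribute, addSelfAttribute) differ only in the permitted values and the type of the attribute they add. The sketch below illustrates that difference under stated assumptions: AttrSketch and Kind are hypothetical types invented for this example, a null value list is used to mean "unrestricted", and the self attribute is assumed to be of type IMPLIED, which the descriptions above do not state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; AttrSketch and Kind are not JHOVE types.
class AttributeHelpersSketch {

    enum Kind { IMPLIED, REQUIRED }

    static final class AttrSketch {
        final String name;
        final List<String> permittedValues; // null means unrestricted (assumption)
        final Kind kind;

        AttrSketch(String name, List<String> permittedValues, Kind kind) {
            this.name = name;
            this.permittedValues = permittedValues;
            this.kind = kind;
        }
    }

    // Unrestricted values, type IMPLIED.
    static void addSimpleAttribute(List<AttrSketch> atts, String name) {
        atts.add(new AttrSketch(name, null, Kind.IMPLIED));
    }

    // Unrestricted values, type REQUIRED.
    static void addRequiredAttribute(List<AttrSketch> atts, String name) {
        atts.add(new AttrSketch(name, null, Kind.REQUIRED));
    }

    // Only permitted value is the attribute's own name.
    // The IMPLIED type here is an assumption of this sketch.
    static void addSelfAttribute(List<AttrSketch> atts, String name) {
        atts.add(new AttrSketch(name, Collections.singletonList(name), Kind.IMPLIED));
    }

    public static void main(String[] args) {
        List<AttrSketch> atts = new ArrayList<>();
        addSimpleAttribute(atts, "title");
        addRequiredAttribute(atts, "src");
        addSelfAttribute(atts, "checked");
        for (AttrSketch a : atts) {
            System.out.println(a.name + " " + a.kind + " " + a.permittedValues);
        }
    }
}
```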


removeStringsFromList

protected static void removeStringsFromList(java.util.List lst,
                                            java.lang.String[] strs)
Removes excluded strings from a List.
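
A minimal sketch of how removeStringsFromList might pair with addStringsToList when a subclass adjusts a list of tag names. The bodies are assumptions written from the one-line descriptions, not the JHOVE implementation; in particular, removing every occurrence of each excluded string is a choice made for this sketch, and the tag names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical re-creation of the two list helpers; not JHOVE source.
class StringListHelpersSketch {

    // Appends each array element to the end of the list, in order.
    static void addStringsToList(String[] names, List<String> lst) {
        for (String name : names) {
            lst.add(name);
        }
    }

    // Assumption: removes every occurrence of each excluded string.
    static void removeStringsFromList(List<String> lst, String[] strs) {
        lst.removeAll(Arrays.asList(strs));
    }

    // Builds a list, then strips one excluded name from it.
    static List<String> demo() {
        List<String> content = new ArrayList<>();
        addStringsToList(new String[] {"h1", "h2", "p", "font"}, content);
        removeStringsFromList(content, new String[] {"font"});
        return content;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The two methods take their arguments in opposite orders: addStringsToList takes the array first and the List second, while removeStringsFromList takes the List first.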


pushElementStack

protected void pushElementStack(JHOpenTag tag)
Pushes an element onto the element stack.