edu.harvard.hul.ois.jhove.module.html
Class HtmlMetadata

java.lang.Object
  extended by edu.harvard.hul.ois.jhove.module.html.HtmlMetadata

public class HtmlMetadata
extends java.lang.Object

Repository for an HTML document's metadata. Also holds some state information, so that properties involving tags, attributes, and PCDATA can be constructed.

Author:
Gary McGath

Constructor Summary
HtmlMetadata()
          Constructor.
 
Method Summary
 void addAbbr(Property prop)
          Adds an ABBR tag's contents to the Meta property.
 void addCitation(java.lang.String text)
          Adds a CITE element's PCDATA to the Citations property.
 void addDef(java.lang.String text)
          Adds a defined term to the Defined Terms property.
 void addEntity(java.lang.String entity)
          Adds a String to the Entities property.
 void addFrame(Property prop)
          Adds a FRAME tag's contents to the Meta property.
 void addImage(Property prop)
          Adds an item to the Images property.
 void addLanguage(java.lang.String lang)
          Adds a language defined in an attribute of any element except the HTML element.
 void addLink(java.lang.String link)
          Adds a link to the Links property.
 void addMeta(Property prop)
          Adds a META tag's contents to the Meta property.
 void addScript(java.lang.String stype)
          Adds the language of a SCRIPT element to the Scripts property.
 void addToPropUnderConstruction(char[] ch, int start, int length)
          Adds PCDATA text to the property under construction.
 java.lang.String extractHttpEquivValue(Property prop, java.lang.String httpEquivValue)
          Extracts the content value associated with a given httpEquiv.
 void finishPropUnderConstruction()
          Finishes any property under construction.
 java.lang.String getCharset()
          Returns the charset stored by setCharset.
 Property getPropUnderConstruction()
          Returns the "property under construction."
 java.lang.String getTitle()
          Returns the contents of the TITLE element.
 Utf8BlockMarker getUtf8BlockMarker()
          Returns the Utf8BlockMarker for the metadata.
 void setCharset(java.lang.String charset)
          Stores the charset defined in the HTML element.
 void setLanguage(java.lang.String lang)
          Stores the language defined in the HTML element.
 void setPropUnderConstruction(Property p)
          Sets a "property under construction".
 void setTitle(java.lang.String title)
          Stores the contents of the TITLE element.
 Property toProperty(TextMDMetadata _textMD)
          Converts the metadata to a Property.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HtmlMetadata

public HtmlMetadata()
Constructor. Initializes to the empty state.

Method Detail

setTitle

public void setTitle(java.lang.String title)
Stores the contents of the TITLE element.


setLanguage

public void setLanguage(java.lang.String lang)
Stores the language defined in the HTML element.


addLanguage

public void addLanguage(java.lang.String lang)
Adds a language defined in an attribute of any element except the HTML element.


addCitation

public void addCitation(java.lang.String text)
Adds a CITE element's PCDATA to the Citations property.


addMeta

public void addMeta(Property prop)
Adds a META tag's contents to the Meta property.


extractHttpEquivValue

public java.lang.String extractHttpEquivValue(Property prop,
                                              java.lang.String httpEquivValue)
Extracts the content value associated with a given httpEquiv.

Parameters:
prop - List containing the description of the meta tag
httpEquivValue - the httpEquiv to consider
Returns:
the content value
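
The lookup described above can be sketched roughly as follows. This is a minimal illustration and not the JHOVE implementation: the class HttpEquivSketch is hypothetical, the META tag is modeled as a list of name/value String pairs rather than a Property, and both the case-insensitive comparison and the null result for a non-matching tag are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; not part of JHOVE.
class HttpEquivSketch {

    // Each String[] is an assumed {attributeName, attributeValue} pair from one META tag.
    static String extractHttpEquivValue(List<String[]> attrs, String httpEquivValue) {
        boolean matches = false;
        String content = null;
        for (String[] attr : attrs) {
            if ("http-equiv".equalsIgnoreCase(attr[0])) {
                // Assumption: the http-equiv value is compared without regard to case.
                matches = httpEquivValue.equalsIgnoreCase(attr[1]);
            } else if ("content".equalsIgnoreCase(attr[0])) {
                content = attr[1];
            }
        }
        // Assumption: null signals that this tag does not carry the requested http-equiv.
        return matches ? content : null;
    }

    public static void main(String[] args) {
        List<String[]> meta = new ArrayList<>();
        meta.add(new String[] {"http-equiv", "Content-Type"});
        meta.add(new String[] {"content", "text/html; charset=UTF-8"});
        System.out.println(extractHttpEquivValue(meta, "Content-Type"));
    }
}
```

A caller could then inspect the returned string, for example to look for a charset parameter; whether HtmlMetadata does so internally is not stated here.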

setCharset

public void setCharset(java.lang.String charset)
Stores the charset defined in the HTML element.


addFrame

public void addFrame(Property prop)
Adds a FRAME tag's contents to the Meta property.


addAbbr

public void addAbbr(Property prop)
Adds an ABBR tag's contents to the Meta property.


addLink

public void addLink(java.lang.String link)
Adds a link to the Links property.


addImage

public void addImage(Property prop)
Adds an item to the Images property.


addDef

public void addDef(java.lang.String text)
Adds a defined term to the Defined Terms property.


addScript

public void addScript(java.lang.String stype)
Adds the language of a SCRIPT element to the Scripts property.


addEntity

public void addEntity(java.lang.String entity)
Adds a String to the Entities property. This property is a SortedSet, so duplicates are not added, and the resulting set can be iterated in alphabetical order.
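
The set behavior described above can be illustrated with a small sketch. EntitySetSketch is a hypothetical stand-in, and the use of TreeSet as the SortedSet implementation is an assumption. With String's natural ordering, "alphabetical" means ordering by character code, so uppercase letters sort before lowercase ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-in for illustration; not the JHOVE HtmlMetadata class.
class EntitySetSketch {
    // Assumption: a TreeSet with natural String ordering backs the property.
    private final SortedSet<String> entities = new TreeSet<>();

    void addEntity(String entity) {
        // add() leaves the set unchanged when the entity is already present.
        entities.add(entity);
    }

    // Returns a copy of the entities in the set's iteration (sorted) order.
    List<String> entitiesInOrder() {
        return new ArrayList<>(entities);
    }

    public static void main(String[] args) {
        EntitySetSketch md = new EntitySetSketch();
        md.addEntity("&nbsp;");
        md.addEntity("&amp;");
        md.addEntity("&nbsp;"); // duplicate, not added again
        md.addEntity("&copy;");
        System.out.println(md.entitiesInOrder()); // prints [&amp;, &copy;, &nbsp;]
    }
}
```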


getUtf8BlockMarker

public Utf8BlockMarker getUtf8BlockMarker()
Returns the Utf8BlockMarker for the metadata.


getTitle

public java.lang.String getTitle()
Returns the contents of the TITLE element.


getCharset

public java.lang.String getCharset()
Returns the charset stored by setCharset.

toProperty

public Property toProperty(TextMDMetadata _textMD)
Converts the metadata to a Property.


setPropUnderConstruction

public void setPropUnderConstruction(Property p)
Sets a "property under construction". This is generally called when an XML element is found, and the PCDATA must be incorporated into the property.


getPropUnderConstruction

public Property getPropUnderConstruction()
Returns the "property under construction."


addToPropUnderConstruction

public void addToPropUnderConstruction(char[] ch,
                                       int start,
                                       int length)
Adds PCDATA text to the property under construction. The text may arrive in several chunks rather than all at once, so this method can be called more than once for the same property.


finishPropUnderConstruction

public void finishPropUnderConstruction()
Finishes any property under construction. This is called when an end element is encountered.
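
The set, add, and finish calls form a small life cycle, which the following sketch tries to illustrate under stated assumptions. PcdataAccumulatorSketch is a hypothetical stand-in: it identifies the pending property by a String name instead of a Property object, buffers the text in a StringBuilder, and stores the finished text in a map. None of these details are confirmed by the documentation above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration; not the JHOVE HtmlMetadata class.
class PcdataAccumulatorSketch {
    private String pendingName;        // name of the property being built, or null if none
    private StringBuilder pendingText; // PCDATA collected so far for that property
    private final Map<String, String> finished = new LinkedHashMap<>();

    // Called when a start tag of interest is seen.
    void setPropUnderConstruction(String name) {
        pendingName = name;
        pendingText = new StringBuilder();
    }

    // Called once per chunk of character data; chunks are appended in arrival order.
    void addToPropUnderConstruction(char[] ch, int start, int length) {
        if (pendingText != null) {
            pendingText.append(ch, start, length);
        }
    }

    // Called at the end tag; does nothing when no property is pending.
    void finishPropUnderConstruction() {
        if (pendingName != null) {
            finished.put(pendingName, pendingText.toString());
            pendingName = null;
            pendingText = null;
        }
    }

    String finishedValue(String name) {
        return finished.get(name);
    }

    public static void main(String[] args) {
        PcdataAccumulatorSketch md = new PcdataAccumulatorSketch();
        md.setPropUnderConstruction("Citation");
        char[] first = "xxHello, yy".toCharArray();
        char[] second = "world".toCharArray();
        md.addToPropUnderConstruction(first, 2, 7);  // the slice "Hello, "
        md.addToPropUnderConstruction(second, 0, 5); // the slice "world"
        md.finishPropUnderConstruction();
        System.out.println(md.finishedValue("Citation")); // prints Hello, world
    }
}
```

The (char[], int, int) signature mirrors the way SAX-style parsers deliver character data, which is why a single element's text may be split across several calls.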