edu.harvard.hul.ois.jhove.module.xml
Class XmlModuleHandler

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by edu.harvard.hul.ois.jhove.module.xml.XmlModuleHandler
All Implemented Interfaces:
org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler

public class XmlModuleHandler
extends org.xml.sax.helpers.DefaultHandler

This handler does the parsing work of the XML module.

Author:
Gary McGath

Constructor Summary
XmlModuleHandler()
          Constructor.
 
Method Summary
 void characters(char[] ch, int start, int length)
          Processes PCDATA characters.
 void endElement(java.lang.String namespaceURI, java.lang.String localName, java.lang.String qName)
          The only action taken here is some bookkeeping in connection with the HTML metadata.
 void error(org.xml.sax.SAXParseException e)
          Processes a parsing exception.
 java.util.Set<java.lang.String> getAttributeValues()
          Returns the set of attribute values.
 java.lang.String getDTDURI()
          Returns the DTD URI.
 HtmlMetadata getHtmlMetadata()
          Returns the HTML metadata object.
 java.util.List<Message> getMessages()
          Returns the List of messages generated during the parse.
 java.util.Map<java.lang.String,java.lang.String> getNamespaces()
          Returns the map of prefixes to namespaces.
 java.util.List<java.lang.String[]> getNotations()
          Returns the list of notations.
 java.util.List<ProcessingInstructionInfo> getProcessingInstructions()
          Returns the List of processing instructions.
 java.lang.String getRoot()
          Returns the qualified name of the root element.
 java.util.List<SchemaInfo> getSchemas()
          Returns the list of schemas.
 boolean getSigFlag()
          Returns true if we have seen an element or a processing instruction, which implies that we've seen an XML declaration.
 java.util.List<java.lang.String[]> getUnparsedEntities()
          Returns the list of unparsed entities.
 boolean hasSchemaURI(SchemaInfo newinfo)
           
 boolean isValid()
          Returns the validity state.
 void notationDecl(java.lang.String name, java.lang.String publicID, java.lang.String systemID)
          Puts all notations into the notation list.
 void processingInstruction(java.lang.String target, java.lang.String data)
          Handles a processing instruction.
 org.xml.sax.InputSource resolveEntity(java.lang.String publicId, java.lang.String systemId)
          Overrides the standard resolveEntity.
 void setLocalSchemas(java.util.Map<java.lang.String,java.io.File> schemas)
          Sets a map of schema URIs to local files.
 void setXhtmlFlag(boolean flag)
          Sets the value of the XHTML flag.
 void startElement(java.lang.String namespaceURI, java.lang.String localName, java.lang.String qualifiedName, org.xml.sax.Attributes atts)
          Looks for the first element encountered.
 void startPrefixMapping(java.lang.String prefix, java.lang.String uri)
          Begin the scope of a prefix-URI Namespace mapping.
 void unparsedEntityDecl(java.lang.String name, java.lang.String publicId, java.lang.String systemId, java.lang.String notationName)
          Picks up unparsed entity declarations, after calling the superclass's unparsedEntityDecl, and puts their information into the unparsed entity declaration list as an array of four strings: [name, publicId, systemId, notationName].
 void warning(org.xml.sax.SAXParseException e)
          Processes a warning.
 
Methods inherited from class org.xml.sax.helpers.DefaultHandler
endDocument, endPrefixMapping, fatalError, ignorableWhitespace, setDocumentLocator, skippedEntity, startDocument
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

XmlModuleHandler

public XmlModuleHandler()
Constructor.

Method Detail

setXhtmlFlag

public void setXhtmlFlag(boolean flag)
Sets the value of the XHTML flag. Special properties are extracted if this is an XHTML document.


setLocalSchemas

public void setLocalSchemas(java.util.Map<java.lang.String,java.io.File> schemas)
Sets a map of schema URIs to local files. This information comes from jhove.conf parameters.


getHtmlMetadata

public HtmlMetadata getHtmlMetadata()
Returns the HTML metadata object. Will be non-null only for a document recognized as XHTML.


startElement

public void startElement(java.lang.String namespaceURI,
                         java.lang.String localName,
                         java.lang.String qualifiedName,
                         org.xml.sax.Attributes atts)
                  throws org.xml.sax.SAXException
Looks for the first element encountered. Stores its name as the value returned by getRoot, using the qualified name if it is available and the local name otherwise.

Specified by:
startElement in interface org.xml.sax.ContentHandler
Overrides:
startElement in class org.xml.sax.helpers.DefaultHandler
Throws:
org.xml.sax.SAXException
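
The root-name rule above can be illustrated with a minimal, self-contained SAX handler. This is a sketch under stated assumptions, not the JHOVE source: the class RootNameSketch and its rootOf helper are hypothetical, and only the "first element, qualified name by preference, local name as fallback" logic comes from the description.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler illustrating the root-name rule; not the JHOVE source.
class RootNameSketch extends DefaultHandler {
    private String root; // null until the first element has been seen

    @Override
    public void startElement(String namespaceURI, String localName,
                             String qualifiedName, Attributes atts) {
        if (root != null) {
            return; // only the first (root) element is recorded
        }
        // Prefer the qualified name; fall back to the local name when the
        // parser does not supply one.
        boolean hasQName = qualifiedName != null && !qualifiedName.isEmpty();
        root = hasQName ? qualifiedName : localName;
    }

    public String getRoot() {
        return root;
    }

    // Parses an XML string and returns the recorded root name.
    static String rootOf(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        RootNameSketch handler = new RootNameSketch();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.getRoot();
    }
}
```

Because the field is set only while it is still null, nested elements never overwrite the root name.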

endElement

public void endElement(java.lang.String namespaceURI,
                       java.lang.String localName,
                       java.lang.String qName)
The only action taken here is some bookkeeping in connection with the HTML metadata.

Specified by:
endElement in interface org.xml.sax.ContentHandler
Overrides:
endElement in class org.xml.sax.helpers.DefaultHandler

characters

public void characters(char[] ch,
                       int start,
                       int length)
Processes PCDATA characters. Characters are used only for properties under construction in the HTML metadata.

Specified by:
characters in interface org.xml.sax.ContentHandler
Overrides:
characters in class org.xml.sax.helpers.DefaultHandler
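
The interplay between characters and endElement for a "property under construction" can be sketched as follows. This is an assumption-laden illustration: the hypothetical class TitleSketch tracks only an XHTML title element, whereas the real handler's HTML metadata properties are not shown on this page. It shows why text must be accumulated: a SAX parser may deliver one element's text in several characters calls.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: accumulate the text of an XHTML <title> element.
class TitleSketch extends DefaultHandler {
    private StringBuilder pending; // non-null only while a <title> is open
    private String title;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("title".equals(localName)) {
            pending = new StringBuilder(); // start a property under construction
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks, so append rather than assign.
        if (pending != null) {
            pending.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Bookkeeping: finish the property when its element closes.
        if ("title".equals(localName) && pending != null) {
            title = pending.toString().trim();
            pending = null;
        }
    }

    public String getTitle() {
        return title;
    }

    static String titleOf(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        TitleSketch handler = new TitleSketch();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.getTitle();
    }
}
```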

startPrefixMapping

public void startPrefixMapping(java.lang.String prefix,
                               java.lang.String uri)
                        throws org.xml.sax.SAXException
Begin the scope of a prefix-URI Namespace mapping. Prefix mappings are stored in _namespaces.

Specified by:
startPrefixMapping in interface org.xml.sax.ContentHandler
Overrides:
startPrefixMapping in class org.xml.sax.helpers.DefaultHandler
Throws:
org.xml.sax.SAXException
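
A minimal sketch of storing prefix mappings in a map, in the spirit of _namespaces. The class PrefixSketch is hypothetical, and the "last binding wins" behavior for a prefix rebound later in the document is an assumption of this sketch, not something stated above. Note that the parser must be namespace-aware for startPrefixMapping to be called at all.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of recording prefix-to-namespace mappings.
class PrefixSketch extends DefaultHandler {
    private final Map<String, String> namespaces = new HashMap<>();

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        // The default namespace arrives with the empty prefix "".
        // Assumption: if a prefix is rebound later, the last URI wins.
        namespaces.put(prefix, uri);
    }

    public Map<String, String> getNamespaces() {
        return namespaces;
    }

    static Map<String, String> namespacesOf(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // required for prefix-mapping events
        PrefixSketch handler = new PrefixSketch();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.getNamespaces();
    }
}
```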

processingInstruction

public void processingInstruction(java.lang.String target,
                                  java.lang.String data)
                           throws org.xml.sax.SAXException
Handles a processing instruction. Adds it to the list returned by getProcessingInstructions; each element of that list is a ProcessingInstructionInfo that records the instruction's target and data.

Specified by:
processingInstruction in interface org.xml.sax.ContentHandler
Overrides:
processingInstruction in class org.xml.sax.helpers.DefaultHandler
Throws:
org.xml.sax.SAXException
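
Collecting processing instructions in document order can be sketched as below. The nested PiEntry class is a hypothetical stand-in for ProcessingInstructionInfo, whose actual fields are not documented on this page. The example also shows a point worth remembering: the XML declaration is not a processing instruction, so it never reaches this callback.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: collect processing instructions in document order.
class PiSketch extends DefaultHandler {
    // Stand-in for ProcessingInstructionInfo (real field names are an unknown).
    static final class PiEntry {
        final String target;
        final String data;

        PiEntry(String target, String data) {
            this.target = target;
            this.data = data;
        }
    }

    private final List<PiEntry> instructions = new ArrayList<>();

    @Override
    public void processingInstruction(String target, String data) {
        // The <?xml ...?> declaration is not reported here.
        instructions.add(new PiEntry(target, data));
    }

    public List<PiEntry> getProcessingInstructions() {
        return instructions;
    }

    static List<PiEntry> instructionsOf(String xml) throws Exception {
        PiSketch handler = new PiSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getProcessingInstructions();
    }
}
```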

notationDecl

public void notationDecl(java.lang.String name,
                         java.lang.String publicID,
                         java.lang.String systemID)
                  throws org.xml.sax.SAXException
Puts all notations into the notation list. A list entry is a String[3], consisting of name, public ID, and system ID.

Specified by:
notationDecl in interface org.xml.sax.DTDHandler
Overrides:
notationDecl in class org.xml.sax.helpers.DefaultHandler
Throws:
org.xml.sax.SAXException
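
A sketch of the String[3] notation list, assuming notations are declared in the document's internal DTD subset. The class NotationSketch is hypothetical. Either ID may be null in a notation declaration; this sketch stores the values unchanged, since null handling for notations is not described above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of recording notation declarations as String[3].
class NotationSketch extends DefaultHandler {
    private final List<String[]> notations = new ArrayList<>();

    @Override
    public void notationDecl(String name, String publicID, String systemID) {
        // One entry per notation: { name, public ID, system ID }.
        // Values are stored as reported; either ID may be null.
        notations.add(new String[] { name, publicID, systemID });
    }

    public List<String[]> getNotations() {
        return notations;
    }

    static List<String[]> notationsOf(String xml) throws Exception {
        NotationSketch handler = new NotationSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getNotations();
    }
}
```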

resolveEntity

public org.xml.sax.InputSource resolveEntity(java.lang.String publicId,
                                             java.lang.String systemId)
                                      throws org.xml.sax.SAXException
Overrides the standard resolveEntity. First looks for DTD and entity files that are stored as resources, and uses those if available; this is faster and more reliable than fetching them over the network. If no local copy is found, calls the superclass's resolveEntity. In either case, it then checks whether the entity appears to be a DTD and, if so, stores its URI in the DTD URI field. If the superclass's attempt to resolve the entity throws an IOException, the exception is ignored.

Specified by:
resolveEntity in interface org.xml.sax.EntityResolver
Overrides:
resolveEntity in class org.xml.sax.helpers.DefaultHandler
Throws:
org.xml.sax.SAXException
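
The "local copy first, then fall back" strategy can be sketched as follows. Several assumptions are made here and labeled in the code: bundled resources are modeled as a hypothetical Map from system ID to DTD text, and "appears to be a DTD" is approximated by a ".dtd" suffix on the system ID. Neither detail is confirmed by this page; ResolverSketch and dtdUriOf are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of local-first entity resolution.
class ResolverSketch extends DefaultHandler {
    // Assumption: stands in for DTD/entity files stored as resources.
    private final Map<String, String> localCopies;
    private String dtdUri; // may remain null

    ResolverSketch(Map<String, String> localCopies) {
        this.localCopies = localCopies;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws SAXException {
        InputSource result;
        String local = (systemId == null) ? null : localCopies.get(systemId);
        if (local != null) {
            result = new InputSource(new StringReader(local)); // no network access
            result.setSystemId(systemId);
        } else {
            try {
                result = super.resolveEntity(publicId, systemId); // null in DefaultHandler
            } catch (IOException e) {
                result = null; // ignored; the parser uses its default resolution
            }
        }
        // In either case, remember anything that looks like a DTD.
        // Assumption: a ".dtd" suffix is the heuristic used here.
        if (systemId != null && systemId.toLowerCase(Locale.ROOT).endsWith(".dtd")) {
            dtdUri = systemId;
        }
        return result;
    }

    public String getDTDURI() {
        return dtdUri;
    }

    static String dtdUriOf(String xml, Map<String, String> localCopies) throws Exception {
        ResolverSketch handler = new ResolverSketch(localCopies);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getDTDURI();
    }
}
```

Returning null from resolveEntity tells the parser to open the system ID itself, which is why the local lookup is tried first.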

unparsedEntityDecl

public void unparsedEntityDecl(java.lang.String name,
                               java.lang.String publicId,
                               java.lang.String systemId,
                               java.lang.String notationName)
                        throws org.xml.sax.SAXException
Picks up unparsed entity declarations, after calling the superclass's unparsedEntityDecl, and puts their information into the unparsed entity declaration list as an array of four strings: [name, publicId, systemId, notationName]. Null values are converted into empty strings.

Specified by:
unparsedEntityDecl in interface org.xml.sax.DTDHandler
Overrides:
unparsedEntityDecl in class org.xml.sax.helpers.DefaultHandler
Throws:
org.xml.sax.SAXException
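
A sketch of the four-string entry with null-to-empty conversion, using a hypothetical UnparsedEntitySketch class. An unparsed entity must name a declared notation, so the test document declares one. The sketch calls the superclass method first, mirroring the description, although DefaultHandler's version does nothing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of recording unparsed entities as String[4].
class UnparsedEntitySketch extends DefaultHandler {
    private final List<String[]> entities = new ArrayList<>();

    @Override
    public void unparsedEntityDecl(String name, String publicId, String systemId,
                                   String notationName) throws SAXException {
        super.unparsedEntityDecl(name, publicId, systemId, notationName); // no-op here
        // { name, publicId, systemId, notationName }, with nulls as "".
        entities.add(new String[] {
            orEmpty(name), orEmpty(publicId), orEmpty(systemId), orEmpty(notationName)
        });
    }

    private static String orEmpty(String s) {
        return (s == null) ? "" : s;
    }

    public List<String[]> getUnparsedEntities() {
        return entities;
    }

    static List<String[]> entitiesOf(String xml) throws Exception {
        UnparsedEntitySketch handler = new UnparsedEntitySketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getUnparsedEntities();
    }
}
```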

warning

public void warning(org.xml.sax.SAXParseException e)
Processes a warning by adding an InfoMessage to the message list. Warnings do not affect the validity state.

Specified by:
warning in interface org.xml.sax.ErrorHandler
Overrides:
warning in class org.xml.sax.helpers.DefaultHandler

error

public void error(org.xml.sax.SAXParseException e)
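Processes a parsing exception. Under SAX, violations of well-formedness constraints are reported through fatalError, while error receives recoverable errors such as violations of validity constraints. Any error reported here is therefore taken to indicate invalidity only, not ill-formedness.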

Specified by:
error in interface org.xml.sax.ErrorHandler
Overrides:
error in class org.xml.sax.helpers.DefaultHandler
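
The division of labor among warning, error, and fatalError can be sketched with a validating parser. This is a hypothetical illustration: ValiditySketch uses plain strings where JHOVE uses Message and InfoMessage objects. A validity violation reaches error and the parse continues with the flag cleared, while an ill-formed document reaches the inherited fatalError, which rethrows and stops the parse.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; strings stand in for JHOVE's Message/InfoMessage objects.
class ValiditySketch extends DefaultHandler {
    private final List<String> messages = new ArrayList<>();
    private boolean valid = true;

    @Override
    public void warning(SAXParseException e) {
        messages.add("info: " + e.getMessage()); // informational; validity unchanged
    }

    @Override
    public void error(SAXParseException e) {
        // Recoverable error (e.g. a validity-constraint violation): record and continue.
        valid = false;
        messages.add("error at line " + e.getLineNumber() + ": " + e.getMessage());
    }

    // fatalError is inherited from DefaultHandler, which rethrows the exception,
    // so well-formedness violations end the parse.

    public boolean isValid() {
        return valid;
    }

    public List<String> getMessages() {
        return messages;
    }

    static ValiditySketch check(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // DTD validation, so error() sees validity problems
        ValiditySketch handler = new ValiditySketch();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```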

getAttributeValues

public java.util.Set<java.lang.String> getAttributeValues()
Returns the set of attribute values.


getSchemas

public java.util.List<SchemaInfo> getSchemas()
Returns the list of schemas. The elements of the list are SchemaInfo objects describing the schemas referenced by the document.


getUnparsedEntities

public java.util.List<java.lang.String[]> getUnparsedEntities()
Returns the list of unparsed entities. The elements of the list are arrays of four Strings, giving the name, public ID, system ID and notation name respectively.


getNamespaces

public java.util.Map<java.lang.String,java.lang.String> getNamespaces()
Returns the map of prefixes to namespaces. The keys and values are Strings.


getDTDURI

public java.lang.String getDTDURI()
Returns the DTD URI. May be null.


getProcessingInstructions

public java.util.List<ProcessingInstructionInfo> getProcessingInstructions()
Returns the List of processing instructions. Each element is a ProcessingInstructionInfo giving the target and data of one instruction.


getNotations

public java.util.List<java.lang.String[]> getNotations()
Returns the list of notations. Each is an array String[3]: name, public ID, and system ID.


getRoot

public java.lang.String getRoot()
Returns the qualified name of the root element. If the parser did not supply a qualified name, the local name is returned instead.


getMessages

public java.util.List<Message> getMessages()
Returns the List of messages generated during the parse.


isValid

public boolean isValid()
Returns the validity state. If error has been called, the return value will be false.


getSigFlag

public boolean getSigFlag()
Returns true if we have seen an element or a processing instruction, which implies that we've seen an XML declaration.


hasSchemaURI

public boolean hasSchemaURI(SchemaInfo newinfo)