Parser (JHOVE Documentation)


edu.harvard.hul.ois.jhove.module.pdf
Class Parser

java.lang.Object
  edu.harvard.hul.ois.jhove.module.pdf.Parser

public class Parser
extends java.lang.Object

The Parser class implements limited syntactic analysis of PDF. It is by no means intended to be a full parser; its main job is to track the nesting of syntactic elements such as the beginnings and ends of dictionaries and arrays.
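As a rough illustration of the nesting bookkeeping described above, the following sketch counts unmatched dictionary and array starts over a sequence of tokens. It is not JHOVE's implementation: the class `NestingTracker`, its use of plain strings as tokens, and its choice to throw `IllegalStateException` on an unmatched end are all assumptions made for illustration.

```java
import java.util.List;

/**
 * Hypothetical sketch (not JHOVE code): counts dictionary and array starts
 * that have not yet been matched by the corresponding ends.
 */
final class NestingTracker {
    private int dictDepth = 0;
    private int arrayDepth = 0;

    /** Updates the depth counters for one token and rejects unmatched ends. */
    void accept(String token) {
        switch (token) {
            case "<<":
                dictDepth++;
                break;
            case ">>":
                if (dictDepth == 0) {
                    throw new IllegalStateException("'>>' without a matching '<<'");
                }
                dictDepth--;
                break;
            case "[":
                arrayDepth++;
                break;
            case "]":
                if (arrayDepth == 0) {
                    throw new IllegalStateException("']' without a matching '['");
                }
                arrayDepth--;
                break;
            default:
                // Names, numbers, strings and keywords do not change nesting.
                break;
        }
    }

    /** Feeds a whole token sequence through accept(). */
    void acceptAll(List<String> tokens) {
        for (String token : tokens) {
            accept(token);
        }
    }

    int getDictDepth() {
        return dictDepth;
    }

    int getArrayDepth() {
        return arrayDepth;
    }
}
```

For example, feeding it the tokens of `<< /Kids [ 3 0 R` leaves both depths at 1, which corresponds to what `getDictDepth()` and `getArrayDepth()` describe partway through a dictionary that contains an array.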

Constructor Summary
`Parser(Tokenizer tokenizer)` Constructor.

Method Summary
`int`	`getArrayDepth()` Returns the number of array starts not yet matched by array ends.
`int`	`getDictDepth()` Returns the number of dictionary starts not yet matched by dictionary ends.
`java.util.Set`	`getLanguageCodes()` Returns the language code set from the Tokenizer.
`Token`	`getNext()` Gets a token.
`Token`	`getNext(java.lang.Class clas, java.lang.String errMsg)` A class-sensitive version of getNext.
`Token`	`getNext(long max)` Gets a token.
`long`	`getOffset()` Returns the current offset into the file.
`boolean`	`getPDFACompliant()` Returns false if either the parser or the tokenizer has detected non-compliance with PDF/A restrictions.
`java.lang.String`	`getWSString()` Returns the Tokenizer's current whitespace string.
`PdfArray`	`readArray()` Reads an array.
`PdfDictionary`	`readDictionary()` Reads a dictionary.
`PdfObject`	`readObject()` Reads an object.
`PdfObject`	`readObjectDef()` Reads an object definition, from wherever we are in the stream to the completion of one full object after the obj keyword.
`PdfObject`	`readObjectDef(Numeric objNumTok)` Reads an object definition, given the first numeric object, which has already been read and is passed as an argument.
`void`	`reset()` Clear the state of the parser so that it can start reading at a different place in the file.
`void`	`resetLoose()` Clear the state of the parser so that it can start reading at a different place in the file and ignore any nesting errors.
`void`	`scanMode(boolean flag)` If true, do not attempt to parse non-whitespace delimited tokens, e.g., literal and hexadecimal strings.
`void`	`seek(long offset)` Positions the file to the specified offset, and resets the state for a new token stream.
`void`	`setEncrypted(boolean encrypted)` Tells this Parser, and its Tokenizer, whether the file is encrypted.
`void`	`setObjectMap(java.util.Map objectMap)` Set the object map on which the parser will work.
`void`	`setPDFACompliant(boolean pdfACompliant)` Set the value of the pdfACompliant flag.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

Parser

public Parser(Tokenizer tokenizer)

Constructor. A Parser works with a Tokenizer that feeds it tokens.

Parameters: tokenizer - The Tokenizer which the parser will use

Method Detail

setObjectMap

public void setObjectMap(java.util.Map objectMap)

Set the object map on which the parser will work.

reset

public void reset()

Clear the state of the parser so that it can start reading at a different place in the file. Clears the stack and the dictionary and array depth counters.

resetLoose

public void resetLoose()

Clear the state of the parser so that it can start reading at a different place in the file and ignore any nesting errors. Sets the stack and the dictionary and array depth counters to a large number so that nesting exceptions won't be thrown.
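The difference between `reset()` and `resetLoose()` can be sketched with two counters. This is a hypothetical illustration, not the Parser's code: it omits the parser's stack, and the starting value `LOOSE_DEPTH` is an assumed number, since the actual "large number" is not specified here.

```java
/**
 * Hypothetical sketch (not JHOVE code) of strict versus loose resets.
 * LOOSE_DEPTH is an assumed value chosen for illustration only.
 */
final class DepthCounters {
    static final int LOOSE_DEPTH = 100_000;

    private int dictDepth;
    private int arrayDepth;

    /** Strict reset: an end token with no matching start is an error. */
    void reset() {
        dictDepth = 0;
        arrayDepth = 0;
    }

    /** Loose reset: start "deep" so that stray end tokens never underflow. */
    void resetLoose() {
        dictDepth = LOOSE_DEPTH;
        arrayDepth = LOOSE_DEPTH;
    }

    /** Records a dictionary end (>>). */
    void dictEnd() {
        if (dictDepth == 0) {
            throw new IllegalStateException("Improperly nested dictionary delimiters");
        }
        dictDepth--;
    }

    /** Records an array end (]). */
    void arrayEnd() {
        if (arrayDepth == 0) {
            throw new IllegalStateException("Improperly nested array delimiters");
        }
        arrayDepth--;
    }

    int getDictDepth() {
        return dictDepth;
    }

    int getArrayDepth() {
        return arrayDepth;
    }
}
```

In this sketch, the price of a loose reset is that the depth getters no longer report a meaningful count of unmatched starts; they only guarantee that stray end tokens do not trigger an error.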

getNext

public Token getNext()
              throws java.io.IOException,
                     PdfException

Gets a token. Uses Tokenizer.getNext, and keeps track of the depth of dictionary and array nesting.

Throws: java.io.IOException; PdfException

getNext

public Token getNext(long max)
              throws java.io.IOException,
                     PdfException

Gets a token. Uses Tokenizer.getNext, and keeps track of the depth of dictionary and array nesting.

Parameters: max - Maximum allowable size of the token
Throws: java.io.IOException; PdfException

getNext

public Token getNext(java.lang.Class clas,
                     java.lang.String errMsg)
              throws java.io.IOException,
                     PdfException

A class-sensitive version of getNext. The token which is obtained must be of the specified class (or a subclass thereof), or a PdfInvalidException with message errMsg will be thrown.

Throws: java.io.IOException; PdfException
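The class check in `getNext(java.lang.Class clas, java.lang.String errMsg)` amounts to an `isInstance` test on the token that was read. The sketch below shows that idea with hypothetical stand-ins: `Tok`, `NumericTok`, `KeywordTok`, and `InvalidTokenException` replace JHOVE's Token classes and PdfInvalidException, and the generic return type is a convenience of the sketch (the documented method takes a raw `java.lang.Class` and returns `Token`).

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical token types standing in for JHOVE's Token hierarchy. */
class Tok {
    final String text;

    Tok(String text) {
        this.text = text;
    }
}

class NumericTok extends Tok {
    NumericTok(String text) {
        super(text);
    }
}

class KeywordTok extends Tok {
    KeywordTok(String text) {
        super(text);
    }
}

/** Hypothetical stand-in for PdfInvalidException. */
class InvalidTokenException extends Exception {
    InvalidTokenException(String message) {
        super(message);
    }
}

/** Hypothetical reader illustrating a class-sensitive getNext. */
final class TypedReader {
    private final Iterator<Tok> tokens;

    TypedReader(List<Tok> tokens) {
        this.tokens = tokens.iterator();
    }

    /**
     * Returns the next token if it is an instance of clas or one of its
     * subclasses; otherwise throws an exception carrying errMsg.
     */
    <T extends Tok> T getNext(Class<T> clas, String errMsg) throws InvalidTokenException {
        if (!tokens.hasNext()) {
            // Assumption of this sketch: running out of tokens counts as a mismatch.
            throw new InvalidTokenException(errMsg);
        }
        Tok tok = tokens.next();
        if (!clas.isInstance(tok)) {
            throw new InvalidTokenException(errMsg);
        }
        return clas.cast(tok);
    }
}
```

Because `isInstance` accepts subclasses, `getNext(Tok.class, ...)` would accept any token, mirroring the "or a subclass thereof" rule.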

getDictDepth

public int getDictDepth()

Returns the number of dictionary starts not yet matched by dictionary ends.

setEncrypted

public void setEncrypted(boolean encrypted)

Tells this Parser, and its Tokenizer, whether the file is encrypted.

getArrayDepth

public int getArrayDepth()

Returns the number of array starts not yet matched by array ends.

getWSString

public java.lang.String getWSString()

Returns the Tokenizer's current whitespace string.

getLanguageCodes

public java.util.Set getLanguageCodes()

Returns the language code set from the Tokenizer.

getPDFACompliant

public boolean getPDFACompliant()

Returns false if either the parser or the tokenizer has detected non-compliance with PDF/A restrictions. A value of true is no guarantee that the file is compliant.

setPDFACompliant

public void setPDFACompliant(boolean pdfACompliant)

Set the value of the pdfACompliant flag. This may be used to clear previous detection of noncompliance. If the parameter has a value of true, the tokenizer's pdfACompliant flag is also set to true.
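The interplay between the parser's and the tokenizer's PDF/A flags, as described for `getPDFACompliant()` and `setPDFACompliant()`, can be modeled as follows. Both classes are hypothetical stand-ins, and the behavior of `setPDFACompliant(false)` (changing only the parser's own flag) is an assumption, since the description only specifies what happens for `true`.

```java
/** Hypothetical stand-in for the Tokenizer's own PDF/A flag. */
class ComplianceTokenizer {
    private boolean pdfACompliant = true;

    boolean getPDFACompliant() {
        return pdfACompliant;
    }

    void setPDFACompliant(boolean value) {
        pdfACompliant = value;
    }
}

/** Hypothetical sketch (not JHOVE code) of the parser-side flag logic. */
final class ComplianceParser {
    private final ComplianceTokenizer tokenizer;
    private boolean pdfACompliant = true;

    ComplianceParser(ComplianceTokenizer tokenizer) {
        this.tokenizer = tokenizer;
    }

    /** False if either the parser or the tokenizer has flagged non-compliance. */
    boolean getPDFACompliant() {
        return pdfACompliant && tokenizer.getPDFACompliant();
    }

    /** Passing true also clears the tokenizer's flag; false (assumed) only affects the parser. */
    void setPDFACompliant(boolean value) {
        pdfACompliant = value;
        if (value) {
            tokenizer.setPDFACompliant(true);
        }
    }
}
```

Under this model, a true result only means that neither component has flagged a problem yet, consistent with the note that a value of true is no guarantee of compliance.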

readObjectDef

public PdfObject readObjectDef()
                        throws java.io.IOException,
                               PdfException

Reads an object definition, from wherever we are in the stream to the completion of one full object after the obj keyword.

Throws: java.io.IOException; PdfException
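In PDF, an indirect object definition has the form `objNum genNum obj ... endobj`, for example `12 0 obj << /Type /Page >> endobj`, where the object number is a positive integer and the generation number a non-negative one. The sketch below checks only the three-token prefix that precedes the object body; `ObjectHeader` is a hypothetical class, the tokens are assumed to be plain strings, and the trailing `endobj` keyword is not checked.

```java
import java.util.List;

/**
 * Hypothetical sketch (not JHOVE code): validates the "objNum genNum obj"
 * prefix that opens an indirect object definition.
 */
final class ObjectHeader {
    final int objNum;
    final int genNum;

    private ObjectHeader(int objNum, int genNum) {
        this.objNum = objNum;
        this.genNum = genNum;
    }

    /** Parses tokens such as ["12", "0", "obj", ...]; the object body starts at index 3. */
    static ObjectHeader parse(List<String> tokens) {
        if (tokens.size() < 3) {
            throw new IllegalArgumentException("Truncated object definition");
        }
        int objNum = parseInteger(tokens.get(0), "object number");
        int genNum = parseInteger(tokens.get(1), "generation number");
        if (objNum < 1 || genNum < 0) {
            throw new IllegalArgumentException(
                "Object number must be positive and generation number non-negative");
        }
        if (!"obj".equals(tokens.get(2))) {
            throw new IllegalArgumentException("Expected 'obj' keyword, got " + tokens.get(2));
        }
        return new ObjectHeader(objNum, genNum);
    }

    private static int parseInteger(String text, String what) {
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Expected " + what + ", got " + text, e);
        }
    }
}
```

A token sequence such as `12 0 R` is rejected here, because `R` marks an indirect reference rather than a definition.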

readObjectDef

public PdfObject readObjectDef(Numeric objNumTok)
                        throws java.io.IOException,
                               PdfException

Reads an object definition, given the first numeric object, which has already been read and is passed as an argument. This is called by the no-argument readObjectDef; the only other case in which it will be called is for a cross-reference stream, which can be distinguished from a cross-reference table only once the first token is read.

Throws: java.io.IOException; PdfException
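The cross-reference case mentioned above arises because the offset given by `startxref` may point either to a classic cross-reference table, which begins with the keyword `xref`, or (in PDF 1.5 and later) to a cross-reference stream, which is itself an indirect object and therefore begins with its object number. A hypothetical classifier over that first token might look like this; `XrefKind` and `XrefProbe` are illustrative names, not JHOVE API.

```java
/** Kinds of cross-reference section that may sit at a startxref offset. */
enum XrefKind { TABLE, STREAM }

/** Hypothetical sketch (not JHOVE code). */
final class XrefProbe {
    /**
     * Classifies the structure at a startxref offset from its first token:
     * a classic table begins with the keyword "xref", while a cross-reference
     * stream is an indirect object and so begins with its object number.
     */
    static XrefKind classify(String firstToken) {
        if ("xref".equals(firstToken)) {
            return XrefKind.TABLE;
        }
        if (!firstToken.isEmpty() && firstToken.chars().allMatch(c -> c >= '0' && c <= '9')) {
            return XrefKind.STREAM;
        }
        throw new IllegalArgumentException(
            "Unexpected token at cross-reference offset: " + firstToken);
    }
}
```

Once the first token turns out to be a number, it has already been consumed, which is presumably why the object definition must then be read with `readObjectDef(Numeric objNumTok)` rather than the no-argument form.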

readObject

public PdfObject readObject()
                     throws java.io.IOException,
                            PdfException

Reads an object. By design, this reader has a number of limitations.

- It doesn't retain the contents of streams.
- It doesn't recognize a stream when it's pointing at the stream's dictionary; it will just read the dictionary.

Functions which it uses may call it recursively to build up structures. If it encounters a token inappropriate for an object start, it throws a PdfException on which getToken() may be called to retrieve that token.

Throws: java.io.IOException; PdfException

readArray

public PdfArray readArray()
                   throws java.io.IOException,
                          PdfException

Reads an array. When this is called, we have already read the ArrayStart token, and arrayDepth has been incremented to reflect this.

Throws: java.io.IOException; PdfException

readDictionary

public PdfDictionary readDictionary()
                             throws java.io.IOException,
                                    PdfException

Reads a dictionary. When this is called, we have already read the DictionaryStart token, and dictDepth has been incremented to reflect this. Only for use in this special case, where we're picking up a dictionary in midstream.

Throws: java.io.IOException; PdfException
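The recursive structure described for `readObject()`, `readArray()`, and `readDictionary()` can be sketched as a small recursive-descent reader. In this hypothetical sketch (not JHOVE code), tokens are plain strings, arrays become `List`s, dictionaries become `Map`s, and indirect references such as `3 0 R` are left as separate atoms. As in the description of `readObject()`, a token that cannot start an object is reported through an exception that carries it; here `UnexpectedTokenException.getToken()` plays the role of `PdfException.getToken()`, and the array and dictionary readers use it to detect their closing delimiter.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical analogue of a PdfException that carries the offending token. */
class UnexpectedTokenException extends RuntimeException {
    private final String token;

    UnexpectedTokenException(String token) {
        super("Unexpected token: " + token);
        this.token = token;
    }

    String getToken() {
        return token;
    }
}

/** Hypothetical recursive-descent reader over pre-split tokens. */
final class MiniReader {
    private final Iterator<String> tokens;

    MiniReader(List<String> tokens) {
        this.tokens = tokens.iterator();
    }

    private String next() {
        if (!tokens.hasNext()) {
            throw new UnexpectedTokenException("<end of input>");
        }
        return tokens.next();
    }

    /** Reads one object; containers recurse through readArray/readDictionary. */
    Object readObject() {
        String tok = next();
        switch (tok) {
            case "[":
                return readArray();
            case "<<":
                return readDictionary();
            case "]":
            case ">>":
                // Cannot start an object; the caller may inspect getToken().
                throw new UnexpectedTokenException(tok);
            default:
                return tok; // names, numbers, keywords kept as raw text
        }
    }

    /** Precondition: the "[" token has already been consumed. */
    List<Object> readArray() {
        List<Object> items = new ArrayList<>();
        while (true) {
            try {
                items.add(readObject());
            } catch (UnexpectedTokenException e) {
                if ("]".equals(e.getToken())) {
                    return items;
                }
                throw e;
            }
        }
    }

    /** Precondition: the "<<" token has already been consumed. */
    Map<String, Object> readDictionary() {
        Map<String, Object> dict = new LinkedHashMap<>();
        while (true) {
            Object key;
            try {
                key = readObject();
            } catch (UnexpectedTokenException e) {
                if (">>".equals(e.getToken())) {
                    return dict;
                }
                throw e;
            }
            if (!(key instanceof String) || !((String) key).startsWith("/")) {
                throw new UnexpectedTokenException(String.valueOf(key));
            }
            dict.put((String) key, readObject());
        }
    }
}
```

Reading `<< /Type /Page /Kids [ 3 0 R ] >>` with this sketch yields a map whose `/Kids` entry is the list `[3, 0, R]`, while a lone `>>` makes `readObject()` throw an exception whose `getToken()` returns `>>`.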

getOffset

public long getOffset()

Returns the current offset into the file.

seek

public void seek(long offset)
          throws java.io.IOException,
                 PdfException

Positions the file to the specified offset, and resets the state for a new token stream.

Throws: java.io.IOException; PdfException

scanMode

public void scanMode(boolean flag)

If true, do not attempt to parse non-whitespace delimited tokens, e.g., literal and hexadecimal strings.

Parameters: flag - Scan mode flag
