java.lang.Object
  edu.harvard.hul.ois.jhove.module.pdf.Tokenizer
public abstract class Tokenizer
Tokenizer for PDF files. This is used in conjunction with the Parser, which assembles Tokens into higher-level constructs.
Field Summary | |
---|---|
protected int |
_ch
Character code of current character. |
protected java.io.RandomAccessFile |
_file
Source from which to read bytes. |
static char[] |
PDFDOCENCODING
Mapping between PDFDocEncoding and Unicode code points. |
Constructor Summary | |
---|---|
Tokenizer()
Constructor. |
Method Summary | |
---|---|
void |
addLanguageCode(java.lang.String langCode)
Add a string to the set of language codes. |
abstract void |
backupChar()
Back up a byte so it will be read again. |
java.util.Set |
getLanguageCodes()
Return the set of language codes. |
Token |
getNext()
Parses out and returns a token from the input file. |
Token |
getNext(long max)
Parses out and returns a token from the input file. |
long |
getOffset()
Return the current offset into the file. |
boolean |
getPDFACompliant()
Returns the value of the pdfACompliant flag, which indicates that the tokenizer hasn't detected non-compliance. |
java.lang.String |
getWSString()
Returns the value of the last white space string read by the tokenizer. |
protected abstract void |
initStream(Stream token)
Initialization code for Stream object. |
abstract int |
readChar()
Get a character from the file or stream, using a buffer. |
int |
readChar1(boolean utf16)
Read a character in one-byte or two-byte format, as requested. |
void |
scanMode(boolean flag)
If true, do not attempt to parse non-whitespace delimited tokens, e.g., literal and hexadecimal strings. |
abstract void |
seek(long offset)
Set the Tokenizer to a new position in the file. |
protected void |
seekReset(long offset)
Reset after a seek. |
void |
setEncrypted(boolean encrypted)
Tell this object that the file is or isn't encrypted. |
void |
setPDFACompliant(boolean pdfACompliant)
Set the value of the pdfACompliant flag. |
protected abstract void |
setStreamOffset(Stream token)
Sets the offset of a Stream to the current file position. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static char[] PDFDOCENCODING
protected java.io.RandomAccessFile _file
protected int _ch
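The following is a minimal sketch of how a `char[]` table such as PDFDOCENCODING can map byte values to Unicode: the byte value is the index and the element is the code point. The class `PdfDocEncodingSketch` and its `decode` method are hypothetical and not part of JHOVE, and the table is deliberately partial: only three well-known PDFDocEncoding entries are filled in, with every other code mapped to itself as a simplifying assumption.

```java
/** Hypothetical helper, not part of JHOVE: decodes bytes through a lookup table. */
class PdfDocEncodingSketch {
    // Partial, illustrative table: index = byte value, element = Unicode character.
    static final char[] TABLE = new char[256];

    static {
        // Simplifying assumption: codes not listed below map to themselves.
        for (int i = 0; i < 256; i++) {
            TABLE[i] = (char) i;
        }
        TABLE[0x80] = '\u2022'; // bullet
        TABLE[0x84] = '\u2014'; // em dash
        TABLE[0xA0] = '\u20AC'; // euro sign
    }

    /** Maps each byte to a character by indexing the table with its unsigned value. */
    static String decode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append(TABLE[b & 0xFF]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = { 'A', (byte) 0x80, 'B', (byte) 0xA0 };
        System.out.println(decode(raw));
    }
}
```

Masking with `0xFF` matters because Java bytes are signed; without it, values of 0x80 and above would produce negative indices.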
Constructor Detail |
---|
public Tokenizer()
Method Detail |
---|
public Token getNext() throws java.io.IOException, PdfException
Throws:
java.io.IOException
PdfException
public Token getNext(long max) throws java.io.IOException, PdfException
Parameters:
max - Maximum allowable size of the token
Throws:
java.io.IOException
PdfException
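To illustrate what a size-limited token read might look like, here is a simplified, self-contained sketch. It is not the JHOVE implementation: `MiniTokenScanner` is a hypothetical class that returns plain strings rather than Token objects, handles only white space, delimiters and regular characters, and treats a `max` of zero or less as "no limit", which is an assumption of this sketch.

```java
/** Hypothetical scanner, not part of JHOVE: splits bytes into simple tokens. */
class MiniTokenScanner {
    private final byte[] data;
    private int pos = 0;

    MiniTokenScanner(byte[] data) {
        this.data = data;
    }

    /** PDF white-space characters: NUL, TAB, LF, FF, CR, SPACE. */
    static boolean isWhitespace(int ch) {
        return ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32;
    }

    /** PDF delimiter characters. */
    static boolean isDelimiter(int ch) {
        return "()<>[]{}/%".indexOf(ch) >= 0;
    }

    /**
     * Returns the next token, or null at end of input.
     * Assumption of this sketch: max <= 0 means no size limit.
     */
    String next(long max) {
        // Skip leading white space.
        while (pos < data.length && isWhitespace(data[pos] & 0xFF)) {
            pos++;
        }
        if (pos >= data.length) {
            return null;
        }
        int ch = data[pos] & 0xFF;
        // Simplification: every delimiter becomes a one-character token.
        if (isDelimiter(ch)) {
            pos++;
            return String.valueOf((char) ch);
        }
        // Accumulate regular characters up to white space or a delimiter.
        StringBuilder sb = new StringBuilder();
        while (pos < data.length) {
            ch = data[pos] & 0xFF;
            if (isWhitespace(ch) || isDelimiter(ch)) {
                break;
            }
            sb.append((char) ch);
            pos++;
            if (max > 0 && sb.length() > max) {
                throw new IllegalStateException("token exceeds " + max + " bytes");
            }
        }
        return sb.toString();
    }
}
```

A caller would typically loop on `next` until it returns null, passing a limit when reading from untrusted input so that a malformed file cannot force an arbitrarily large token into memory.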
public long getOffset()
public java.util.Set getLanguageCodes()
public void setEncrypted(boolean encrypted)
public boolean getPDFACompliant()
A value of true is no guarantee that the file is compliant.
public void setPDFACompliant(boolean pdfACompliant)
public java.lang.String getWSString()
public abstract void seek(long offset) throws java.io.IOException, PdfException
Parameters:
offset - The offset in bytes from the start of the file.
Throws:
java.io.IOException
PdfException
protected void seekReset(long offset)
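The relationship between seek and seekReset can be sketched as follows: a buffered reader repositions the underlying `RandomAccessFile` and then discards any state that belonged to the old position. `SeekSketch`, its fields and the `byteAt` helper are hypothetical names for illustration only; what the real seekReset clears is not stated on this page, so resetting the buffer and offset is an assumption.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;

/** Hypothetical buffered reader, not part of JHOVE. */
class SeekSketch {
    private final RandomAccessFile file;
    private final byte[] buffer = new byte[4096];
    private int bufLen = 0;  // number of valid bytes in the buffer
    private int bufPos = 0;  // index of the next byte to return
    private long offset = 0; // file offset of the next byte to return

    SeekSketch(RandomAccessFile file) {
        this.file = file;
    }

    /** Moves the file pointer, then drops state tied to the old position. */
    void seek(long newOffset) throws IOException {
        file.seek(newOffset);
        seekReset(newOffset);
    }

    /** Assumed reset behavior: forget buffered bytes and record the new offset. */
    void seekReset(long newOffset) {
        offset = newOffset;
        bufLen = 0;
        bufPos = 0;
    }

    /** Returns the next byte as an unsigned value, or -1 at end of file. */
    int readChar() throws IOException {
        if (bufPos >= bufLen) {
            bufLen = file.read(buffer);
            bufPos = 0;
            if (bufLen <= 0) {
                bufLen = 0;
                return -1;
            }
        }
        offset++;
        return buffer[bufPos++] & 0xFF;
    }

    long getOffset() {
        return offset;
    }

    /** Demo helper: writes content to a temp file and reads the byte at pos. */
    static int byteAt(byte[] content, long pos) {
        try {
            File tmp = File.createTempFile("seek-sketch", ".bin");
            tmp.deleteOnExit();
            Files.write(tmp.toPath(), content);
            try (RandomAccessFile raf = new RandomAccessFile(tmp, "r")) {
                SeekSketch s = new SeekSketch(raf);
                s.readChar(); // fill the buffer from offset 0 first
                s.seek(pos);  // stale buffer contents must not leak through
                return s.readChar();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Without the reset step, a read after `seek` would return bytes that were buffered from the previous position, which is why the two operations are paired.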
public abstract int readChar() throws java.io.IOException
Throws:
java.io.IOException
public int readChar1(boolean utf16) throws java.io.IOException
Throws:
java.io.IOException
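A one-byte or two-byte read of this kind can be sketched as below. `TwoByteReadSketch` is a hypothetical stand-in that reads from a byte array instead of a file, and combining the two bytes high byte first is an assumption of this sketch, chosen because UTF-16 text strings in PDF are big-endian.

```java
/** Hypothetical reader, not part of JHOVE. */
class TwoByteReadSketch {
    private final byte[] data;
    private int pos = 0;

    TwoByteReadSketch(byte[] data) {
        this.data = data;
    }

    /** Returns the next byte as an unsigned value, or -1 at end of input. */
    int readChar() {
        return pos < data.length ? data[pos++] & 0xFF : -1;
    }

    /**
     * Reads one byte, or two bytes combined high byte first when utf16 is true.
     * Returns -1 if the input ends before a complete character is read.
     */
    int readChar1(boolean utf16) {
        int hi = readChar();
        if (!utf16 || hi < 0) {
            return hi;
        }
        int lo = readChar();
        if (lo < 0) {
            return -1;
        }
        return (hi << 8) | lo;
    }
}
```

For example, the bytes 0x20 0xAC read with `utf16` set to true yield 0x20AC, the euro sign, while the same call with false yields 0x20.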
public abstract void backupChar()
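Backing up a byte is useful when a token can only be recognized as finished by reading the character that follows it. The sketch below shows that pattern with a hypothetical `PushbackSketch` class over a byte array; the class and its `readInt` method are illustrative and not part of JHOVE.

```java
/** Hypothetical reader, not part of JHOVE: demonstrates one-byte pushback. */
class PushbackSketch {
    private final byte[] data;
    private int pos = 0;

    PushbackSketch(byte[] data) {
        this.data = data;
    }

    /** Returns the next byte as an unsigned value, or -1 at end of input. */
    int readChar() {
        return pos < data.length ? data[pos++] & 0xFF : -1;
    }

    /** Steps back one byte so that the next readChar returns it again. */
    void backupChar() {
        if (pos > 0) {
            pos--;
        }
    }

    /** Reads a run of decimal digits and leaves the terminator unread. */
    int readInt() {
        int value = 0;
        int ch;
        while ((ch = readChar()) >= '0' && ch <= '9') {
            value = value * 10 + (ch - '0');
        }
        // Only back up if a real byte was consumed; at end of input nothing was.
        if (ch >= 0) {
            backupChar();
        }
        return value;
    }
}
```

Given the input `42]`, `readInt` returns 42 and the following `readChar` still returns `]`, so the delimiter is available to whatever reads the next token.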
public void addLanguageCode(java.lang.String langCode)
public void scanMode(boolean flag)
Parameters:
flag - Scan mode flag
protected abstract void initStream(Stream token) throws java.io.IOException, PdfException
Throws:
java.io.IOException
PdfException
protected abstract void setStreamOffset(Stream token) throws java.io.IOException, PdfException
Throws:
java.io.IOException
PdfException