edu.harvard.hul.ois.jhove.module.pdf
Class Literal

java.lang.Object
  extended by edu.harvard.hul.ois.jhove.module.pdf.Token
      extended by edu.harvard.hul.ois.jhove.module.pdf.StringValuedToken
          extended by edu.harvard.hul.ois.jhove.module.pdf.Literal
Direct Known Subclasses:
Hexadecimal

public class Literal
extends StringValuedToken

Class for tokens that represent PDF strings. The class maintains a flag indicating whether the string is encoded in PDFDocEncoding or UTF-16. The encoding is determined while the characters of the token are analyzed.


Field Summary
static char[] PDFDOCENCODING
          Mapping between PDFDocEncoding and Unicode code points.
 
Fields inherited from class edu.harvard.hul.ois.jhove.module.pdf.StringValuedToken
_rawBytes, _value
 
Constructor Summary
Literal()
          Creates an instance of a string literal.
 
Method Summary
 void appendHex(int ch)
          Append a hex character.
 void convertHex()
          Convert the raw hex data.
 boolean isDate()
          Returns true if the string value is a parsable date.
 boolean isPDFACompliant()
          Returns true if this token doesn't violate any PDF/A rules, false if it does.
 boolean isPDFDocEncoding()
          Returns true if this string is in PDFDocEncoding, false if UTF-16.
 java.util.Date parseDate()
          Parse the string value to a date.
 long processLiteral(Tokenizer tok)
          Process the incoming characters into a string literal.
 void setPDFDocEncoding(boolean pdfDocEncoding)
          Sets whether this string is in PDFDocEncoding (true) or UTF-16 (false).
 
Methods inherited from class edu.harvard.hul.ois.jhove.module.pdf.StringValuedToken
getRawBytes, getValue, setValue
 
Methods inherited from class edu.harvard.hul.ois.jhove.module.pdf.Token
isPdfACompliant, isSimpleToken
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PDFDOCENCODING

public static char[] PDFDOCENCODING
Mapping between PDFDocEncoding and Unicode code points.
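
A table of this kind is typically indexed by the unsigned byte value, with the entry at that index giving the Unicode character. The following is a minimal sketch of that lookup, not the JHOVE implementation: the class DocEncodingSketch, its TABLE field and its decode method are hypothetical, and the table is a deliberately incomplete stand-in that starts from the identity mapping and overrides only a handful of codes.

```java
final class DocEncodingSketch {
    /** Hypothetical, deliberately incomplete stand-in for a full 256-entry mapping. */
    static final char[] TABLE = buildTable();

    private static char[] buildTable() {
        char[] t = new char[256];
        // Assumption for the sketch: start from the identity mapping.
        // A complete table differs from it in more places than are shown here.
        for (int i = 0; i < t.length; i++) {
            t[i] = (char) i;
        }
        t[0x80] = '\u2022'; // bullet
        t[0x84] = '\u2014'; // em dash
        t[0x85] = '\u2013'; // en dash
        t[0xA0] = '\u20AC'; // euro sign
        return t;
    }

    /** Maps each byte through the table, treating the byte as unsigned. */
    static String decode(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length);
        for (byte b : raw) {
            sb.append(TABLE[b & 0xFF]);
        }
        return sb.toString();
    }
}
```

The mask with 0xFF matters because Java bytes are signed; without it, codes of 0x80 and above would produce negative array indexes.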

Constructor Detail

Literal

public Literal()
Creates an instance of a string literal.

Method Detail

appendHex

public void appendHex(int ch)
               throws PdfException
Append a hex character. This is used only for hex literals (those that start with '<').

Parameters:
ch - The integer 8-bit code for a hex character
Throws:
PdfException

processLiteral

public long processLiteral(Tokenizer tok)
                    throws java.io.IOException
Process the incoming characters into a string literal. This is used for literals delimited by parentheses, as opposed to hex strings.

Parameters:
tok - The tokenizer, passed to give access to its getChar function.
Returns:
A long value. Note that the signature does not return a boolean; the value presumably reports how many characters were read up to the terminating parenthesis.
Throws:
java.io.IOException
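
To illustrate what processing a parenthesis-delimited literal involves, here is a minimal standalone sketch. It is not the JHOVE code: the class LiteralStringSketch is hypothetical, a java.io.PushbackReader stands in for the Tokenizer, and the method returns the decoded text rather than a long. It assumes the opening parenthesis has already been consumed, tracks nested unescaped parentheses, and handles the backslash escapes defined for PDF literal strings; encoding detection is left out.

```java
import java.io.IOException;
import java.io.PushbackReader;

final class LiteralStringSketch {
    /** Reads a literal whose opening '(' has already been consumed. */
    static String read(PushbackReader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int depth = 0; // nesting level of unescaped parentheses inside the string
        for (;;) {
            int ch = in.read();
            if (ch < 0) {
                throw new IOException("Unterminated string literal");
            }
            if (ch == '\\') {
                readEscape(in, sb);
            } else if (ch == '(') {
                depth++;
                sb.append('(');
            } else if (ch == ')') {
                if (depth == 0) {
                    return sb.toString(); // terminating parenthesis reached
                }
                depth--;
                sb.append(')');
            } else {
                sb.append((char) ch);
            }
        }
    }

    /** Handles the character(s) following a backslash. */
    private static void readEscape(PushbackReader in, StringBuilder sb) throws IOException {
        int ch = in.read();
        switch (ch) {
            case -1: throw new IOException("Unterminated escape sequence");
            case 'n': sb.append('\n'); break;
            case 'r': sb.append('\r'); break;
            case 't': sb.append('\t'); break;
            case 'b': sb.append('\b'); break;
            case 'f': sb.append('\f'); break;
            case '\n': break; // backslash + newline: line continuation
            case '\r': { // backslash + CR or CRLF: line continuation
                int next = in.read();
                if (next >= 0 && next != '\n') {
                    in.unread(next);
                }
                break;
            }
            default:
                if (ch >= '0' && ch <= '7') {
                    // Octal escape: one to three octal digits.
                    int value = ch - '0';
                    for (int i = 0; i < 2; i++) {
                        int d = in.read();
                        if (d >= '0' && d <= '7') {
                            value = value * 8 + (d - '0');
                        } else {
                            if (d >= 0) {
                                in.unread(d);
                            }
                            break;
                        }
                    }
                    sb.append((char) (value & 0xFF));
                } else {
                    // Escaped parenthesis or backslash, or an unknown escape:
                    // drop the backslash and keep the character.
                    sb.append((char) ch);
                }
        }
    }
}
```

Reading the text "Hello (nested) world) rest" with this sketch yields "Hello (nested) world"; the balanced inner parentheses are kept, and reading stops at the first unbalanced closing parenthesis.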

convertHex

public void convertHex()
                throws PdfException
Convert the raw hex data. Two buffers are saved: _rawBytes for the untranslated hex-encoded data, and _value for the PDFDocEncoding- or UTF-16-encoded string.

Throws:
PdfException
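
The conversion of a hex string can be sketched as follows. This is a standalone illustration under stated assumptions, not the JHOVE code: the class HexStringSketch and its methods are hypothetical. It assumes a UTF-16 string is recognized by a leading FE FF byte order mark, pads an odd number of hex digits with a trailing 0 as the PDF specification prescribes, and approximates PDFDocEncoding with ISO-8859-1, which agrees with it only for part of the code range.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class HexStringSketch {
    /** Decodes hex digits (whitespace ignored) into raw bytes. */
    static byte[] hexToBytes(String hex) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pending = -1; // high nibble waiting for its low nibble, or -1
        for (int i = 0; i < hex.length(); i++) {
            char c = hex.charAt(i);
            if (Character.isWhitespace(c)) {
                continue;
            }
            int digit = Character.digit(c, 16);
            if (digit < 0) {
                throw new IllegalArgumentException("Not a hex digit: " + c);
            }
            if (pending < 0) {
                pending = digit;
            } else {
                out.write((pending << 4) | digit);
                pending = -1;
            }
        }
        if (pending >= 0) {
            out.write(pending << 4); // odd digit count: assume a final 0
        }
        return out.toByteArray();
    }

    /** True if the bytes start with the UTF-16 big-endian byte order mark. */
    static boolean isUtf16(byte[] raw) {
        return raw.length >= 2 && (raw[0] & 0xFF) == 0xFE && (raw[1] & 0xFF) == 0xFF;
    }

    /** Turns raw bytes into text, skipping the byte order mark if present. */
    static String decode(byte[] raw) {
        if (isUtf16(raw)) {
            return new String(raw, 2, raw.length - 2, StandardCharsets.UTF_16BE);
        }
        // Approximation: ISO-8859-1 stands in for PDFDocEncoding here.
        return new String(raw, StandardCharsets.ISO_8859_1);
    }
}
```

Keeping the byte array and the decoded string separate mirrors the two buffers described above: the raw bytes stay available even after the text has been interpreted.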

isPDFDocEncoding

public boolean isPDFDocEncoding()
Returns true if this string is in PDFDocEncoding, false if UTF-16.


setPDFDocEncoding

public void setPDFDocEncoding(boolean pdfDocEncoding)
Sets whether this string is in PDFDocEncoding (true) or UTF-16 (false).


isDate

public boolean isDate()
Returns true if the string value is a parsable date. PDF dates conform to the ASN.1 date format, D:YYYYMMDDHHmmSSOHH'mm', where everything before and after YYYY is optional. Taken literally, the format is frighteningly ambiguous (imagine, for instance, leaving out hours but not minutes and seconds), so the check is deliberately loose.


parseDate

public java.util.Date parseDate()
Parse the string value to a date. PDF dates conform to the ASN.1 date format, D:YYYYMMDDHHmmSSOHH'mm', where everything before and after YYYY is optional. Adobe doesn't actually say so, but I'm assuming that if a field is included, everything to its left must also be included; for example, you can't have seconds but leave out minutes.
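
The left-to-right assumption described above can be expressed with nested optional groups in a regular expression. The following is a minimal standalone sketch, not the JHOVE implementation: the class PdfDateSketch and its methods are hypothetical. It assumes that omitted month and day default to 01, that omitted time fields default to 0, that a missing time zone means UTC, and that an unparsable string yields null.

```java
import java.time.DateTimeException;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PdfDateSketch {
    // Each optional field is nested inside the one to its left, so a field
    // can only appear when everything before it is present.
    // Groups: 1 year, 2 month, 3 day, 4 hour, 5 minute, 6 second,
    //         7 UTC relationship (Z, + or -), 8 offset hours, 9 offset minutes.
    private static final Pattern DATE = Pattern.compile(
        "(?:D:)?(\\d{4})(?:(\\d{2})(?:(\\d{2})(?:(\\d{2})(?:(\\d{2})(\\d{2})?)?)?)?)?"
        + "(?:([Z+-])(?:(\\d{2})'(?:(\\d{2})'?)?)?)?");

    /** Returns the parsed date, or null if the string is not a parsable date. */
    static Date parse(String s) {
        Matcher m = DATE.matcher(s.trim());
        if (!m.matches()) {
            return null;
        }
        try {
            String rel = m.group(7);
            ZoneOffset offset = ZoneOffset.UTC; // assumption: no zone means UTC
            if (rel != null && !rel.equals("Z")) {
                int sign = rel.equals("-") ? -1 : 1;
                offset = ZoneOffset.ofHoursMinutes(
                    sign * field(m, 8, 0), sign * field(m, 9, 0));
            }
            OffsetDateTime t = OffsetDateTime.of(
                field(m, 1, 0), field(m, 2, 1), field(m, 3, 1),
                field(m, 4, 0), field(m, 5, 0), field(m, 6, 0), 0, offset);
            return Date.from(t.toInstant());
        } catch (DateTimeException e) {
            return null; // out-of-range field, such as month 13
        }
    }

    /** True if the string can be parsed as a date by this sketch. */
    static boolean isDate(String s) {
        return parse(s) != null;
    }

    private static int field(Matcher m, int group, int fallback) {
        String g = m.group(group);
        return g == null ? fallback : Integer.parseInt(g);
    }
}
```

With this sketch, "D:2004" parses to midnight on 1 January 2004 UTC, while "D:2004011" is rejected because the trailing digit cannot form a complete two-digit field.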


isPDFACompliant

public boolean isPDFACompliant()
Returns true if this token doesn't violate any PDF/A rules, false if it does.