Home | Tutorial | Documentation | UTF8-hul Module | Distribution | Links |
The UTF8-hul module recognizes and validates content streams encoded with the Unicode UTF-8 encoding.
The module is invoked by the -m utf8-hul command line option:
    jhove ... -m utf8-hul ...
The following criteria must be met by a UTF-8 content stream for JHOVE to consider it well-formed:
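Each character is encoded as a sequence of one to four octets in one of the following forms:
    Single octet:  0xxxxxxx
    Two octets:    110yyyyy 10xxxxxx
    Three octets:  1110zzzz 10yyyyyy 10xxxxxx
    Four octets:   11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
The stream does not begin with a byte order mark that signals a UTF-16 or UCS-4 encoding:
    Two octets:
        0xFEFF      UTF-16 big-endian encoding
        0xFFFE      UTF-16 little-endian encoding
    Four octets:
        0x0000FEFF  UCS-4 big-endian encoding
        0xFFFE0000  UCS-4 little-endian encoding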
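The two well-formedness checks above can be illustrated with a short Python sketch. It is not JHOVE's implementation (JHOVE is written in Java); the names FOREIGN_BOMS, continuation_count and is_well_formed_utf8 are hypothetical and introduced only for this example. The sketch tests the octet bit patterns and leading byte order marks listed above and nothing more, so it is assumed to be looser than a full UTF-8 validator: it does not reject overlong forms or surrogate code points.

```python
# Byte order marks that signal a UTF-16 or UCS-4 encoding (listed above).
FOREIGN_BOMS = (
    b"\x00\x00\xfe\xff",  # UCS-4 big-endian
    b"\xff\xfe\x00\x00",  # UCS-4 little-endian
    b"\xfe\xff",          # UTF-16 big-endian
    b"\xff\xfe",          # UTF-16 little-endian
)


def continuation_count(lead: int) -> int:
    """Return how many continuation octets a lead octet requires, or -1."""
    if (lead & 0b1000_0000) == 0b0000_0000:  # 0xxxxxxx
        return 0
    if (lead & 0b1110_0000) == 0b1100_0000:  # 110yyyyy
        return 1
    if (lead & 0b1111_0000) == 0b1110_0000:  # 1110zzzz
        return 2
    if (lead & 0b1111_1000) == 0b1111_0000:  # 11110uuu
        return 3
    return -1  # stray continuation octet, or an octet outside the four forms


def is_well_formed_utf8(data: bytes) -> bool:
    """Check the octet patterns and leading BOM only (illustrative sketch)."""
    if data.startswith(FOREIGN_BOMS):
        return False
    i = 0
    while i < len(data):
        n = continuation_count(data[i])
        if n < 0:
            return False
        tail = data[i + 1 : i + 1 + n]
        # Every continuation octet must have the form 10xxxxxx,
        # and the sequence must not be truncated by the end of the stream.
        if len(tail) != n or any((b & 0b1100_0000) != 0b1000_0000 for b in tail):
            return False
        i += 1 + n
    return True
```

For example, is_well_formed_utf8("héllo".encode("utf-8")) returns True, while a stream starting with the octets 0xFF 0xFE, or one that ends in the middle of a multi-octet sequence, returns False.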
The following criteria must be met by a UTF-8 encoded file for JHOVE to consider it valid:
The MIME type is reported as: text/plain; charset=UTF-8
In addition to the standard JHOVE representation information, the module defines the following properties: