DjVu eBook Signature Format: Specification & DjVu Recovery Example
DjVu is a computer file format designed primarily to store scanned documents and books, especially those containing a combination of text, line drawings, indexed color images, and photographs. It uses technologies such as image layer separation of text and background/images, progressive loading, arithmetic coding, and lossy compression for bitonal (monochrome) images. This allows for high-quality, readable images to be stored in a minimum of space, so that they can be made available on the web. DjVu has been promoted as an alternative to PDF, promising smaller files than PDF for most scanned documents.
A DjVu document file must begin with the signature (tag) AT&T, followed by a FORM tag that introduces the data chunk. Each chunk stores its data size at offset 4 from the chunk start; for the first chunk this is offset 8 from the beginning of the file. The chunk size is big-endian (most significant byte first). Adding the first chunk's data size to its data offset (12 for the first chunk) gives the total DjVu file size.
Let's examine the example
When inspecting the contents of the example.djvu file with any hex viewer, such as Active@ Disk Editor (included in the Active@ File Recovery package), we can see that it starts with the tag AT&T (hex: 41 54 26 54). Next, at offset 4, comes the tag FORM (hex: 46 4F 52 4D), which introduces the data chunk. The data chunk consists of a chunk size at absolute offset 8 (hex: 00 01 70 A0) followed by the actual data. The chunk size is a big-endian value (most significant byte first), 0x000170A0, which gives a data size of 94,368 decimal. The data offset is 12 (decimal) from the beginning of the file. Thus the total DjVu file size is 12 + 94,368 = 94,380 bytes, and reading 94,380 consecutive bytes starting at the detected AT&T header yields the complete DjVu file, provided that the file is not fragmented.
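The calculation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of Active@ File Recovery; the function name djvu_total_size and the constants are made up for this example, and the header bytes are the ones quoted from example.djvu.

```python
import struct

DJVU_MAGIC = b"AT&T"   # bytes 0..3 of the file
FORM_ID = b"FORM"      # bytes 4..7 of the file
DATA_OFFSET = 12       # data of the first chunk starts here


def djvu_total_size(header: bytes) -> int:
    """Return the total file size implied by the first 12 header bytes.

    Hypothetical helper for illustration only.
    """
    if len(header) < DATA_OFFSET:
        raise ValueError("need at least 12 bytes")
    if header[0:4] != DJVU_MAGIC or header[4:8] != FORM_ID:
        raise ValueError("not a DjVu header")
    # ">I" means big-endian unsigned 32-bit integer
    (chunk_size,) = struct.unpack(">I", header[8:12])
    return DATA_OFFSET + chunk_size


# Header bytes from the example: AT&T, FORM, 00 01 70 A0
example_header = bytes.fromhex("41542654464F524D000170A0")
print(djvu_total_size(example_header))  # 94380
```

The size field is read explicitly as big-endian, so the result does not depend on the byte order of the machine running the code.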
DjVu file header:
|Offset||Size (bytes)||Description|
|0||4||signature, must be 41 54 26 54 hex ("AT&T")|
|4||4||chunkId, must be 46 4F 52 4D hex ("FORM")|
|8||4||chunk size (length of the data), big-endian|
Active@ File Recovery Custom Scripting Example
This example just specifies the DjVu start signature and calculates the file size based on the first data chunk. The syntax of the signature definition language is documented separately.
[DJV_HEADER]
DESCRIPTION=DjVu Document
EXTENSION=djvu
BEGIN=DJV_BEGIN
SCRIPT=DJV_SCRIPT

[DJV_BEGIN]
AT&TFORM=0|0

[DJV_SCRIPT]
size = read(dword, 8)
size = endian(dword, size)
size = sum(size, 12)
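For readers who want to experiment outside the tool, here is a rough Python sketch of the same idea: find the signature, read the size, and cut out the bytes. It is an assumption-laden illustration, not how Active@ File Recovery works internally. It assumes the BEGIN entry means the eight bytes AT&TFORM at offset 0 and that endian() converts the dword from big-endian; the function name carve_djvu is hypothetical.

```python
import struct

SIGNATURE = b"AT&TFORM"  # assumed meaning of the BEGIN entry: these 8 bytes at offset 0


def carve_djvu(image: bytes):
    """Yield (offset, data) for each candidate DjVu file found in image.

    Hypothetical helper; assumes files are stored contiguously (not fragmented).
    """
    pos = image.find(SIGNATURE)
    while pos != -1:
        size_field = image[pos + 8 : pos + 12]
        if len(size_field) == 4:
            # Counterpart of read(dword, 8) plus endian(): big-endian 32-bit size
            (chunk_size,) = struct.unpack(">I", size_field)
            total = chunk_size + 12  # counterpart of sum(size, 12)
            if pos + total <= len(image):
                yield pos, image[pos : pos + total]
        pos = image.find(SIGNATURE, pos + 1)


# Tiny synthetic "disk image": 5 junk bytes, one 16-byte fake file, 3 junk bytes
fake_file = SIGNATURE + struct.pack(">I", 4) + b"DJVU"
image = b"\x00" * 5 + fake_file + b"\xff" * 3
found = list(carve_djvu(image))
print(found[0][0], len(found[0][1]))  # 5 16
```

Candidates whose computed size would run past the end of the image are skipped, since they are most likely truncated or false matches.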