CHM Signature Format: Specification & CHM Recovery Example
Microsoft Help CHM files start with the signature ITSF (characters 'I', 'T', 'S', 'F', or bytes 0x49, 0x54, 0x53, 0x46). The signature is the first field of a short fixed initial header (56 bytes), which is followed by the header section table and the offset to the content; the total initial header size stored in the file covers all of these parts. The file version is stored in the initial header as four bytes at offset 4, little-endian order. The total initial header size is defined at offset 8: 4 bytes, little-endian order (lowest byte first). Header Section 0 is located immediately after the initial header. It starts with the signature 0x01FE (bytes FE 01 on disk), and the total CHM file size is located at offset 8 within this header section as an 8-byte (QWORD) little-endian value.
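As a rough illustration of the field layout described above, the Python sketch below checks the ITSF signature and reads the version and the total initial header size as 4-byte little-endian integers. The function name `read_initial_header` is hypothetical, and the sketch assumes the header bytes are already in memory.

```python
import struct

ITSF_SIGNATURE = b"ITSF"  # bytes 0x49 0x54 0x53 0x46

def read_initial_header(data: bytes, pos: int = 0) -> tuple:
    """Return (version, total_initial_header_size) for an ITSF header at `pos`."""
    if data[pos:pos + 4] != ITSF_SIGNATURE:
        raise ValueError("no ITSF signature at this position")
    if len(data) < pos + 12:
        raise ValueError("buffer too short for the version and size fields")
    # Offsets 4 and 8 relative to the signature: two 4-byte little-endian DWORDs ("<II").
    version, header_size = struct.unpack_from("<II", data, pos + 4)
    return version, header_size

# Synthetic 12-byte prefix of a version 3 header (not a complete CHM file).
print(read_initial_header(b"ITSF" + struct.pack("<II", 3, 96)))  # (3, 96)
```

The `pos` parameter allows the same check to be run at any offset where a signature was found in a larger buffer.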
Let's examine the example
When inspecting the binary data of example.chm with any hex viewer, such as Active@ Disk Editor, we can see that it starts with the signature ITSF (hex: 49 54 53 46). The version check confirms that it is a valid CHM file, version 3 (4 bytes at offset 4: 03 00 00 00). The total initial header size is 96 bytes (4 bytes at offset 8, little-endian, hex: 60 00 00 00). Immediately after the initial header we see Header Section 0, starting with the signature bytes FE 01. The total CHM file size is 815,211 bytes (0x0C706B, stored as 6B 70 0C followed by zero bytes), an 8-byte little-endian value at offset 8 in the section, i.e. at offset 0x68 from the beginning of the file.
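The arithmetic in this example can be double-checked with a few lines of Python. The byte values are the ones quoted above; the variable names are only illustrative.

```python
# Byte values quoted in the example (little-endian fields).
header_size = int.from_bytes(bytes([0x60, 0x00, 0x00, 0x00]), "little")
# Only the low three bytes of the size field are non-zero, so they are enough here.
file_size = int.from_bytes(bytes([0x6B, 0x70, 0x0C]), "little")

# The size field sits 8 bytes into Header Section 0, which starts right after the initial header.
size_field_offset = header_size + 8

print(header_size, file_size, hex(size_field_offset))  # 96 815211 0x68
```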
Thus, reading 815,211 consecutive bytes starting from the position of the detected ITSF header yields the complete CHM file, provided that the file is not fragmented.
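A minimal carving sketch in Python, assuming an unfragmented file, a raw image already loaded into memory, and Header Section 0 placed directly after the initial header as described above. The helpers `find_candidates` and `carve_chm` are hypothetical names, not part of any Active@ product.

```python
import struct
from typing import Iterator, Optional

def find_candidates(image: bytes) -> Iterator[int]:
    """Yield every offset where the ITSF signature occurs."""
    pos = image.find(b"ITSF")
    while pos != -1:
        yield pos
        pos = image.find(b"ITSF", pos + 1)

def carve_chm(image: bytes, start: int) -> Optional[bytes]:
    """Return the bytes of a contiguous CHM file starting at `start`, or None."""
    if image[start:start + 4] != b"ITSF" or len(image) < start + 12:
        return None
    (header_size,) = struct.unpack_from("<I", image, start + 8)
    section0 = start + header_size
    if len(image) < section0 + 16:
        return None
    # Header Section 0 begins with DWORD 0x01FE and DWORD 0, i.e. QWORD 0x1FE.
    if struct.unpack_from("<Q", image, section0)[0] != 0x1FE:
        return None
    (file_size,) = struct.unpack_from("<Q", image, section0 + 8)
    if file_size == 0 or start + file_size > len(image):
        return None  # empty size, or the image ends before the file does
    # Unfragmented assumption: the file is simply the next file_size bytes.
    return image[start:start + file_size]
```

For example, `[carve_chm(image, p) for p in find_candidates(image)]` gives one result per ITSF hit, with `None` for hits that fail the checks. Loading the whole image at once keeps the sketch short; a real tool would more likely scan in chunks.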
The CHM file initial header:
| Offset | Size (bytes) | Description |
|---|---|---|
| 0 | 4 | signature, must be 49 54 53 46 hex ("ITSF") |
| 4 | 4 | version of the Microsoft Help CHM file, 3 in most cases |
| 8 | 4 | total initial header size |
| 12 | 4 | must be 1 |
| 16 | 4 | timestamp, big-endian DWORD; contains seconds (MSB) and fractional seconds (second byte) |
| 20 | 4 | Windows Language ID, for example 0x0409 = LANG_ENGLISH/SUBLANG_ENGLISH_US |
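To show how the mixed byte orders in this table are read, here is a Python sketch that decodes the listed fields. It treats the timestamp as a raw big-endian DWORD and only splits out the two bytes the table mentions, and it splits the Language ID into primary language (low 10 bits) and sublanguage (next 6 bits), following the usual Windows LANGID layout. `describe_initial_header` is a hypothetical helper, and the sample timestamp is made up.

```python
import struct

def describe_initial_header(data: bytes) -> dict:
    """Decode the fixed ITSF fields at offsets 0-23 (hypothetical helper)."""
    if len(data) < 24:
        raise ValueError("need at least 24 bytes")
    # Offsets 0-15: signature plus three little-endian DWORDs.
    signature, version, header_size, must_be_one = struct.unpack_from("<4sIII", data, 0)
    # Offset 16: the timestamp is the one big-endian field in this table.
    (timestamp,) = struct.unpack_from(">I", data, 16)
    # Offset 20: Windows Language ID, little-endian like the other fields.
    (lang_id,) = struct.unpack_from("<I", data, 20)
    return {
        "signature": signature,
        "version": version,
        "header_size": header_size,
        "must_be_one": must_be_one,
        "timestamp_raw": timestamp,
        "timestamp_seconds": timestamp >> 24,            # MSB, per the table
        "timestamp_fraction": (timestamp >> 16) & 0xFF,  # second byte, per the table
        "primary_language": lang_id & 0x3FF,             # 0x09 = LANG_ENGLISH
        "sublanguage": (lang_id >> 10) & 0x3F,           # 0x01 = SUBLANG_ENGLISH_US
    }

# Synthetic 24-byte header prefix; the timestamp value is invented for illustration.
sample = (b"ITSF" + struct.pack("<III", 3, 96, 1)
          + struct.pack(">I", 0x12345678) + struct.pack("<I", 0x0409))
info = describe_initial_header(sample)
print(hex(info["primary_language"]), hex(info["sublanguage"]))  # 0x9 0x1
```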
Header Section 0:
| Offset | Size (bytes) | Description |
|---|---|---|
| 0 | 4 | signature, DWORD 0x01FE (bytes on disk: FE 01 00 00) |
| 4 | 4 | must be zero |
| 8 | 8 | CHM file size, little-endian |
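A small Python sketch of validating Header Section 0 and extracting the size. `read_section0_size` is a hypothetical helper that expects a buffer positioned at the start of the section.

```python
import struct

def read_section0_size(section: bytes) -> int:
    """Validate Header Section 0 and return the CHM file size (hypothetical helper)."""
    if len(section) < 16:
        raise ValueError("Header Section 0 needs at least 16 bytes")
    # Offset 0: DWORD signature, offset 4: DWORD zero, offset 8: QWORD file size.
    signature, zero, file_size = struct.unpack_from("<IIQ", section, 0)
    if signature != 0x01FE or zero != 0:
        raise ValueError("not a Header Section 0")
    return file_size

# The DWORD 0x01FE is written little-endian, which is why a hex viewer shows FE 01 00 00.
print(struct.pack("<I", 0x01FE).hex(" "))  # fe 01 00 00
```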
Active@ File Recovery Custom Scripting Example
This example performs some validation of the CHM header's parameters in addition to extracting the file size. The syntax of the signature definition language is documented separately.
```
[CHM_HEADER]
DESCRIPTION=Microsoft CHM Help
EXTENSION=chm
BEGIN=CHM_BEGIN
SCRIPT=CHM_SCRIPT

[CHM_BEGIN]
ITSF=0|0

[CHM_SCRIPT]
version = read(dword, 4)
if (version == 0) goto exit
header = read(dword, 8)
if (header <= 1Ch) goto exit
temp = read(qword, header)
if (temp != 1FEh) goto exit
temp = sum(header, 8)
size = read(qword, temp)
temp = sum(header, 10h)
if (size > temp) goto exit
size = 0
```
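For experimenting outside Active@ File Recovery, the checks in CHM_SCRIPT can be approximated in Python as follows. This is a hypothetical port, not the tool's implementation: it assumes `read(dword|qword, offset)` is a little-endian read relative to the ITSF hit, that `sum(a, b)` is plain addition, that `size` starts at 0, and that `goto exit` ends the script with the current `size`, where 0 means the candidate is rejected.

```python
import struct

def chm_script_size(data: bytes) -> int:
    """Hypothetical Python port of CHM_SCRIPT; `data` starts at the ITSF hit.

    Returns the detected file size, or 0 if the candidate is rejected.
    """
    def read(fmt: str, offset: int) -> int:
        # Stand-in for read(dword|qword, offset): little-endian, relative to the hit.
        return struct.unpack_from(fmt, data, offset)[0]

    try:
        version = read("<I", 4)             # version = read(dword, 4)
        if version == 0:
            return 0                        # goto exit before size is set
        header = read("<I", 8)              # header = read(dword, 8)
        if header <= 0x1C:
            return 0
        if read("<Q", header) != 0x1FE:     # temp = read(qword, header)
            return 0
        size = read("<Q", header + 8)       # size = read(qword, sum(header, 8))
        if size > header + 0x10:            # if (size > sum(header, 10h)) goto exit
            return size
        return 0                            # size = 0
    except struct.error:
        return 0                            # a read ran past the end of the buffer
```

The final comparison mirrors `if (size > temp) goto exit`: a reported size that is not larger than the initial header plus 0x10 bytes is treated as implausible.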