CAB Signature Format: Documentation & Recovery Example

CAB (Microsoft Compressed Archive): cabinet files start with the signature MSCF (characters 'M', 'S', 'C', 'F', or bytes 0x4D, 0x53, 0x43, 0x46), which identifies the file as a cabinet file. The file version, two bytes at offset 24, must be 0x03, 0x01 (minor version 3 followed by major version 1, or 0x0103 when read as a little-endian word). The file size is stored in 4 bytes at offset 8; however, if an additional header exists, this value does not cover the whole file. The two bytes at offset 30 hold flags, and the third bit (mask 0x0004) indicates whether an additional header exists. If it does, the size of the CAB archive file is calculated as the additional data offset plus the additional data size. The additional header is located at offset 36, and its size field (4 bytes, little-endian) must be 20; this serves as a checkpoint. The additional data offset is stored at offset 44 (from the beginning of the file) and consists of 4 bytes (little-endian, low byte first). The additional data size is stored at offset 48 (from the beginning of the file) and also consists of 4 bytes (little-endian, low byte first).
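
The rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the recovery tool's implementation: the names cab_total_size, CAB_MAGIC and FLAG_RESERVE_PRESENT are hypothetical, and falling back to the size at offset 8 when the additional header is absent or fails the size check is an assumption.

```python
import struct

CAB_MAGIC = b"MSCF"
FLAG_RESERVE_PRESENT = 0x0004  # third bit of the flags word at offset 30


def cab_total_size(buf):
    """Return the total CAB size in bytes, or None if buf does not look like a CAB."""
    if len(buf) < 36 or buf[:4] != CAB_MAGIC:
        return None
    # Version bytes at offset 24: minor (0x03) followed by major (0x01).
    if buf[24:26] != b"\x03\x01":
        return None
    (size,) = struct.unpack_from("<I", buf, 8)    # size field at offset 8
    (flags,) = struct.unpack_from("<H", buf, 30)  # flags at offset 30
    if flags & FLAG_RESERVE_PRESENT and len(buf) >= 52:
        # Four little-endian dwords at offsets 36, 40, 44 and 48.
        header_size, _, data_offset, data_size = struct.unpack_from("<IIII", buf, 36)
        if header_size == 20:  # checkpoint value
            size = data_offset + data_size
    return size
```

Only the first 52 bytes of a candidate file are needed, so the function can be called on a small buffer read at the position where the MSCF signature was found.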

Let's examine the example

When inspecting a file's binary data in a hex viewer such as Active@ Disk Editor, we can see that it starts with the signature MSCF (hex: 4D 53 43 46). The version check confirms that it is a valid CAB archive (2 bytes at offset 24: 0x03 0x01). The file size is 779 bytes (4 bytes at offset 8, hex: 0B 03 00 00); however, the flags field at offset 30 (hex: 04 00) has its third bit (mask 0x0004) set, which means that we must account for additional data. The additional header has the valid size 20 (hex: 14 00 00 00 at offset 36). The additional data offset is 779 (hex: 0B 03 00 00 at offset 44, little-endian). The additional data size is 5,968 (hex: 50 17 00 00 at offset 48, little-endian). The total file size is 779 + 5,968 = 6,747 bytes. Thus, reading 6,747 consecutive bytes starting from the position of the detected MSCF header gives us all of the CAB file data.
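
The arithmetic in this example can be checked with a few lines of Python. The byte strings are copied from the hex values quoted above; the variable names are illustrative only.

```python
# Field values quoted in the example, decoded as little-endian integers.
cb_cabinet = int.from_bytes(bytes.fromhex("0B 03 00 00"), "little")    # offset 8
flags = int.from_bytes(bytes.fromhex("04 00"), "little")               # offset 30
reserve_size = int.from_bytes(bytes.fromhex("14 00 00 00"), "little")  # offset 36
extra_offset = int.from_bytes(bytes.fromhex("0B 03 00 00"), "little")  # offset 44
extra_size = int.from_bytes(bytes.fromhex("50 17 00 00"), "little")    # offset 48

has_extra_header = bool(flags & 0x0004)  # third bit of the flags word
total = extra_offset + extra_size

print(cb_cabinet, reserve_size, extra_offset, extra_size, total)
# prints: 779 20 779 5968 6747
```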

CAB Signature inspection

More info:

The CFHEADER structure provides information about this cabinet file:

  u1  signature[4];     /* cabinet file signature; contains the characters 'M','S','C','F' (bytes 0x4D, 0x53, 0x43, 0x46). */
                        /* This field is used to assure that the file is a cabinet file. */
  u4  reserved1;        /* reserved */
  u4  cbCabinet;        /* size of this cabinet file in bytes */
  u4  reserved2;        /* reserved */
  u4  coffFiles;        /* offset of the first CFFILE entry */
  u4  reserved3;        /* reserved */
  u1  versionMinor;     /* cabinet file format version, minor */
  u1  versionMajor;     /* cabinet file format version, major */
  u2  cFolders;         /* number of CFFOLDER entries in this cabinet */
  u2  cFiles;           /* number of CFFILE entries in this cabinet */
  u2  flags;            /* cabinet file option indicators */
  u2  setID;            /* must be the same for all cabinets in a set */
  u2  iCabinet;         /* number of this cabinet file in a set */
  u2  cbCFHeader;       /* (optional) size of per-cabinet reserved area */
  u1  cbCFFolder;       /* (optional) size of per-folder reserved area */
  u1  cbCFData;         /* (optional) size of per-datablock reserved area */
  u1  szCabinetPrev[];  /* (optional) name of previous cabinet file */
  u1  szDiskPrev[];     /* (optional) name of previous disk */
  u1  szCabinetNext[];  /* (optional) name of next cabinet file */
  u1  szDiskNext[];     /* (optional) name of next disk */
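
As a rough illustration, the fixed part of CFHEADER (the first 36 bytes, through iCabinet) maps onto a Python struct format string. The names CFHEADER_FORMAT, CFHeader and parse_cfheader are hypothetical. The optional fields are left out of this sketch because they are present only when the corresponding flag bits are set.

```python
import struct
from collections import namedtuple

# Little-endian layout of the fixed CFHEADER fields, in declaration order:
# 4-byte signature, five u4 fields, two u1 version bytes, five u2 fields.
CFHEADER_FORMAT = "<4sIIIIIBBHHHHH"

CFHeader = namedtuple(
    "CFHeader",
    "signature reserved1 cbCabinet reserved2 coffFiles reserved3 "
    "versionMinor versionMajor cFolders cFiles flags setID iCabinet",
)


def parse_cfheader(buf):
    """Unpack the fixed 36-byte part of CFHEADER from the start of buf."""
    return CFHeader._make(struct.unpack_from(CFHEADER_FORMAT, buf, 0))
```

With this layout, cbCabinet lands at offset 8, coffFiles at offset 16, the version bytes at offsets 24 and 25, and flags at offset 30, matching the offsets used elsewhere on this page.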

Active@ File Recovery Custom Scripting Example

This example goes beyond simple file size extraction and performs validation calculations on critical CAB header parameters.
The syntax of the signature definition language is described in a separate section of the documentation.

DESCRIPTION=Microsoft Compressed Archive CAB


	version = read(word, 24)
	if (version != 103h) goto exit
	folders = read(word, 26)
	folders = mul(folders, 8)
	folders = sum(folders, 36)
	files = read(word, 28)
	files = mul(files, 16)
	files = sum(files, folders)
	temp = read(dword, 16)
	if (temp < folders) goto exit
	temp = read(dword, 8)
	if (temp <= files) goto exit
	flags = read(word, 30)
	flags = and(flags, 4)
	if (flags == 0) goto skip
	flags = read(dword, 36)
	if (flags != 20) goto skip
	flags = read(dword, 44)
	if (flags < temp) goto skip
	size = flags
	temp = read(dword, 48)
	size = sum(temp, size)
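
For readers unfamiliar with the scripting language, the checks performed by the script can be paraphrased in Python. This is a loose translation, not the product's code. The listing above does not show where the skip and exit labels lead, so their behavior here is assumed: exit is taken to mean rejecting the candidate (returning None), and skip is taken to mean falling back to the size read from offset 8. The function name cab_script_size and the helper names are hypothetical, and the sketch does not test the MSCF signature itself.

```python
import struct


def _u16(buf, off):
    return struct.unpack_from("<H", buf, off)[0]


def _u32(buf, off):
    return struct.unpack_from("<I", buf, off)[0]


def cab_script_size(buf):
    """Mirror the script's checks; return a size in bytes or None (assumed 'exit')."""
    if len(buf) < 52:                         # simplification: always need 52 bytes
        return None
    if _u16(buf, 24) != 0x0103:               # version word must be 103h
        return None
    folders_end = 36 + _u16(buf, 26) * 8      # header plus 8 bytes per folder entry
    files_end = folders_end + _u16(buf, 28) * 16  # plus 16 bytes per file entry
    if _u32(buf, 16) < folders_end:           # first-file offset too small
        return None
    cabinet_size = _u32(buf, 8)
    if cabinet_size <= files_end:             # declared size cannot hold the entries
        return None
    if not (_u16(buf, 30) & 4):               # no additional header: assumed 'skip'
        return cabinet_size
    if _u32(buf, 36) != 20:                   # checkpoint failed: assumed 'skip'
        return cabinet_size
    extra_offset = _u32(buf, 44)
    if extra_offset < cabinet_size:           # additional data overlaps: assumed 'skip'
        return cabinet_size
    return extra_offset + _u32(buf, 48)       # offset plus size of additional data
```

Applied to the header values from the example earlier on this page, the function returns 6747.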