PST Signature Format: Documentation & Recovery Example

MS-PST (Microsoft Outlook Personal Information Storage) files start with the signature !BDN (bytes 0x21, 0x42, 0x44, 0x4E). A second "magic number", SM (bytes 0x53, 0x4D), is located at offset 8. The file format is given by the 16-bit value at offset 10: 0x17 (23 decimal) means a Unicode PST file, while 0x0E or 0x0F (14 or 15 decimal) means an ANSI PST file. The file size is stored in the header at offset 168 (ANSI) or 184 (Unicode), in little-endian (low byte first) byte order.

Let's examine the example

When inspecting the binary data of example.pst in any hex viewer, such as Active@ Disk Editor, we can see that the file starts with the signature !BDN. At offset 8 we can verify the SM "magic number", at offset 10 we can identify the Unicode file format, and at offset 184 (for a Unicode file) we can finally read the file size. The hex bytes 00 24 04 00, interpreted in little-endian (low byte first) order, form the value 0x00042400, which gives a file length of 271,360 bytes. Thus, reading 271,360 consecutive bytes starting from the position of the detected !BDN header yields the complete PST file data (provided that the PST file is not fragmented).
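The conversion described above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the helper name classify_version is made up for illustration, and the four size bytes are the ones quoted in the example.

```python
def classify_version(w_ver: int) -> str:
    """Map the 16-bit value found at offset 10 to a PST flavour."""
    if w_ver in (0x0E, 0x0F):   # 14 or 15 decimal
        return "ANSI"
    if w_ver == 0x17:           # 23 decimal
        return "Unicode"
    return "unknown"

# The four bytes seen at offset 184 in the example, in file order.
size_bytes = bytes.fromhex("00 24 04 00")

# Little-endian: the first byte is the least significant one.
size = int.from_bytes(size_bytes, byteorder="little")

print(classify_version(0x17))  # Unicode
print(hex(size))               # 0x42400
print(size)                    # 271360
```

Reading the same bytes as big-endian would give 0x00240400 (2,360,320), so getting the byte order right matters for the recovered file length.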

Figure: MS-PST header signature inspection

More info:

MS-PST Header

The HEADER structure is located at the beginning of the PST file (absolute file offset 0), and contains metadata about the PST file:

dwMagic      (4 bytes): MUST be { 0x21, 0x42, 0x44, 0x4E } ("!BDN").
dwCRCPartial (4 bytes): The 32-bit CRC value of the 471 bytes of data starting from wMagicClient (offset 0x0008).
wMagicClient (2 bytes): MUST be { 0x53, 0x4D } ("SM").
wVer         (2 bytes): File format version. This value MUST be 14 or 15 if the file is an ANSI PST file, and MUST be 23 if the file is a Unicode PST file.
wVerClient   (2 bytes): Client file format version. The version that corresponds to the format described in this document is 19. Creators of a new PST file based on this document SHOULD initialize this value to 19.
bPlatformCreate (1 byte): This value MUST be set to 0x01.
bPlatformAccess (1 byte): This value MUST be set to 0x01.
dwReserved   (8 bytes)
bidUnused    (8 bytes; Unicode only): Unused padding added when the Unicode PST file format was created.
bidNextB     (4 bytes; ANSI only): Next BID. This value is the monotonic counter that indicates the BID to be assigned for the next allocated block. BID values advance in increments of 4. For more details, see section 2.2.2.2.
bidNextP     (Unicode: 8 bytes; ANSI: 4 bytes): Next page BID. Pages have a special counter for allocating bidIndex values. The value of bidIndex for BIDs for pages is allocated from this counter.
dwUnique     (4 bytes): This is a monotonically-increasing value that is modified every time the PST file's HEADER structure is modified. The function of this value is to provide a unique value, and to ensure that the HEADER CRCs are different after each header modification.
rgnid[]      (128 bytes): A fixed array of 32 NIDs, each corresponding to one of the 32 possible NID_TYPEs (for example NID_TYPE_NORMAL_FOLDER, NID_TYPE_SEARCH_FOLDER, NID_TYPE_NORMAL_MESSAGE, NID_TYPE_ASSOC_MESSAGE).
qwUnused     (8 bytes): Unused space; MUST be set to zero. Unicode PST file format only.
root         (Unicode: 72 bytes; ANSI: 40 bytes): A ROOT structure (section 2.2.2.5).
dwAlign      (4 bytes): Unused alignment bytes; MUST be set to zero. Unicode PST file format only.
rgbFM        (128 bytes): Deprecated FMap. This is no longer used and MUST be filled with 0xFF. Readers SHOULD ignore the value of these bytes.
rgbFP        (128 bytes): Deprecated FPMap. This is no longer used and MUST be filled with 0xFF. Readers SHOULD ignore the value of these bytes.
bSentinel    (1 byte): MUST be set to 0x80.
bCryptMethod (1 byte): Indicates how the data within the PST file is encoded. MUST be set to one of the pre-defined values (NDB_CRYPT_NONE, NDB_CRYPT_PERMUTE, NDB_CRYPT_CYCLIC).
rgbReserved  (2 bytes): Reserved; MUST be set to zero.
bidNextB     (8 bytes; Unicode only): Next BID. Indicates the next available BID value. This value is the monotonic counter that indicates the BID to be assigned for the next allocated block. BID values advance in increments of 4. For more details, see section 2.2.2.2.
dwCRCFull    (4 bytes): The 32-bit CRC value of the 516 bytes of data starting from wMagicClient to bidNextB, inclusive. Unicode PST file format only.
ullReserved  (8 bytes): Reserved; MUST be set to zero. ANSI PST file format only.
dwReserved   (4 bytes): Reserved; MUST be set to zero. ANSI PST file format only.
rgbReserved2 (3 bytes)
bReserved    (1 byte) 
rgbReserved3 (32 bytes) 
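As a cross-check, the offsets quoted earlier (184 and 168 for the file size, 516 bytes for dwCRCFull) can be derived by summing the field sizes listed above. The sketch below does that arithmetic. Two things in it are assumptions rather than statements from the list: that the file-size field sits 4 bytes into the ROOT structure (after a 4-byte reserved field), and the labels "dwReserved (tail)" and "<end>", which are made up here to keep dictionary keys unique.

```python
# (field name, size in a Unicode PST, size in an ANSI PST); 0 = field absent.
FIELDS = [
    ("dwMagic", 4, 4), ("dwCRCPartial", 4, 4), ("wMagicClient", 2, 2),
    ("wVer", 2, 2), ("wVerClient", 2, 2),
    ("bPlatformCreate", 1, 1), ("bPlatformAccess", 1, 1),
    ("dwReserved", 8, 8),
    ("bidUnused", 8, 0),
    ("bidNextB", 0, 4),            # ANSI position of bidNextB
    ("bidNextP", 8, 4),
    ("dwUnique", 4, 4), ("rgnid", 128, 128),
    ("qwUnused", 8, 0),
    ("root", 72, 40),
    ("dwAlign", 4, 0),
    ("rgbFM", 128, 128), ("rgbFP", 128, 128),
    ("bSentinel", 1, 1), ("bCryptMethod", 1, 1), ("rgbReserved", 2, 2),
    ("bidNextB", 8, 0),            # Unicode position of bidNextB
    ("dwCRCFull", 4, 0),
    ("ullReserved", 0, 8),
    ("dwReserved (tail)", 0, 4),   # hypothetical label for the ANSI-only field
    ("rgbReserved2", 3, 3), ("bReserved", 1, 1), ("rgbReserved3", 32, 32),
]

def offsets(column: int) -> dict:
    """Return {field name: byte offset} for column 1 (Unicode) or 2 (ANSI)."""
    result, pos = {}, 0
    for row in FIELDS:
        size = row[column]
        if size:                   # skip fields absent in this format
            result[row[0]] = pos
        pos += size
    result["<end>"] = pos          # total header length
    return result

unicode_off = offsets(1)
ansi_off = offsets(2)

# Assumption: ROOT starts with a 4-byte reserved field, then the file size.
SIZE_FIELD_IN_ROOT = 4

print(unicode_off["root"] + SIZE_FIELD_IN_ROOT)                # 184
print(ansi_off["root"] + SIZE_FIELD_IN_ROOT)                   # 168
print(unicode_off["dwCRCFull"] - unicode_off["wMagicClient"])  # 516
print(unicode_off["<end>"], ansi_off["<end>"])                 # 564 512
```

The relative order of the ANSI bidNextB and bidNextP fields does not change any of the printed values, since both fields sit before rgnid.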

Read the full Microsoft Outlook PST format specification at
http://msdn.microsoft.com/en-us/library/office/gg615595(v=office.14).aspx

Active@ File Recovery Custom Scripting Example

This example goes beyond simple file size extraction: it first validates several critical PST header parameters, and then reads the file size from the offset that matches the Unicode or ANSI file format.
The syntax of the signature definition language is documented separately.

[PST_HEADER]
DESCRIPTION=Outlook Archive
EXTENSION=pst
BEGIN=PST_BEGIN
SCRIPT=PST_SCRIPT


[PST_BEGIN]
!BDN=0|0
SM=8|8
\x13\x00=12|12
\x01\x01=14|14

[PST_SCRIPT]
	   data = read(word, 10)
	   if (data == 0Eh) goto valid
	   if (data == 0Fh) goto valid
	   if (data != 17h) goto exit
	   size = read(dword, 184)
	   goto exit
valid:
	   size = read(dword, 168)
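
For readers who want to experiment outside Active@ File Recovery, the same checks can be approximated in Python. This is a rough sketch and not the product's implementation. It assumes that each [PST_BEGIN] entry means "these bytes must appear at this offset", that a script which exits without setting size rejects the candidate, and the function names pst_size_at and carve_first_pst are invented for this example.

```python
import struct

def pst_size_at(buf: bytes, pos: int = 0):
    """Return the PST file size recorded in the header at pos, or None."""
    h = buf[pos:pos + 188]
    if len(h) < 188:
        return None
    # [PST_BEGIN]: fixed byte patterns at offsets 0, 8, 12 and 14.
    if h[0:4] != b"!BDN" or h[8:10] != b"SM":
        return None
    if h[12:14] != b"\x13\x00" or h[14:16] != b"\x01\x01":
        return None
    # [PST_SCRIPT]: pick the size offset from the format version word.
    (w_ver,) = struct.unpack_from("<H", h, 10)
    if w_ver in (0x0E, 0x0F):                      # ANSI
        return struct.unpack_from("<I", h, 168)[0]
    if w_ver == 0x17:                              # Unicode
        return struct.unpack_from("<I", h, 184)[0]
    return None                                    # unknown version

def carve_first_pst(image: bytes):
    """Return the bytes of the first unfragmented PST found in image."""
    pos = image.find(b"!BDN")
    while pos != -1:
        size = pst_size_at(image, pos)
        if size and pos + size <= len(image):
            return image[pos:pos + size]
        pos = image.find(b"!BDN", pos + 1)
    return None

# Synthetic 300-byte "PST" with only the checked header fields filled in.
fake = bytearray(300)
fake[0:4] = b"!BDN"
fake[8:10] = b"SM"
struct.pack_into("<H", fake, 10, 0x17)   # Unicode
fake[12:14] = b"\x13\x00"
fake[14:16] = b"\x01\x01"
struct.pack_into("<I", fake, 184, 300)   # recorded file size

image = b"\x00" * 64 + bytes(fake) + b"\xff" * 32
carved = carve_first_pst(image)
print(len(carved))  # 300
```

Like the script, this sketch reads only the low 32 bits of the size field at offset 184, so it is suitable only for files smaller than 4 GiB.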