Most disassembly tools perform either a linear sweep retrieval or a recursive traversal retrieval. Linear sweep starts at the beginning of each executable section, disassembles the instruction at the first offset, and continues at the offset immediately following the end of each retrieved instruction. Recursive traversal has a formal definition, but put simply, it performs a piece-wise linear sweep over a series of program blocks, which are contiguous (non-branching) instruction segments. When a branch instruction is discovered, an attempt is made to determine its targets and, if any are found, each target is recursively disassembled. I mentioned in an earlier post that this tool can handle aliased instructions by design. This ability affords us several benefits as a disassembler, one of which is the ability to perform both linear sweep and recursive traversal simultaneously.
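To make the difference between the two retrieval methods concrete, here is a minimal sketch in Python. It is not this tool's code: the four-opcode instruction set, the `decode` helper, and the choice to skip one byte on a decode failure are all assumptions made up for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy instruction set (NOT x86): opcode -> (mnemonic, length in bytes).
TOY_ISA = {
    0x00: ("nop", 1),
    0x01: ("jmp", 2),  # unconditional; the operand byte is an absolute target offset
    0x02: ("jz", 2),   # conditional; either falls through or goes to the operand
    0x03: ("ret", 1),
}

def decode(code: bytes, off: int) -> Optional[Tuple[str, int, Optional[int]]]:
    """Decode one instruction at `off`; return (mnemonic, length, target) or None."""
    if off < 0 or off >= len(code) or code[off] not in TOY_ISA:
        return None
    name, length = TOY_ISA[code[off]]
    if off + length > len(code):
        return None
    target = code[off + 1] if length == 2 else None
    return name, length, target

def linear_sweep(code: bytes, start: int = 0) -> Dict[int, str]:
    """Disassemble front to back, resuming right after each retrieved instruction."""
    found: Dict[int, str] = {}
    off = start
    while off < len(code):
        ins = decode(code, off)
        if ins is None:
            off += 1  # assumption: step over a byte that does not decode
            continue
        found[off] = ins[0]
        off += ins[1]
    return found

def recursive_traversal(code: bytes, entry: int) -> Dict[int, str]:
    """Sweep each block linearly, queueing every branch target that is found."""
    found: Dict[int, str] = {}
    worklist = [entry]
    while worklist:
        off = worklist.pop()
        while off not in found:
            ins = decode(code, off)
            if ins is None:
                break
            name, length, target = ins
            found[off] = name
            if target is not None:
                worklist.append(target)
            if name in ("jmp", "ret"):
                break  # control never falls through, so the block ends here
            off += length
    return found

# Offsets 2-3 hold data that happens to look like a `jz` instruction.
sample = bytes([0x01, 0x04, 0x02, 0xFF, 0x00, 0x03])
print(linear_sweep(sample))            # {0: 'jmp', 2: 'jz', 4: 'nop', 5: 'ret'}
print(recursive_traversal(sample, 0))  # {0: 'jmp', 4: 'nop', 5: 'ret'}
```

Linear sweep covers every byte but decodes the data at offset 2 as code, while recursive traversal only reports what it can reach from the entry offset. Both results are keyed by offset, so they can be kept side by side.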
The PE parser provides us with a number of disassembly targets, some of which are definitely correct and some of which carry no such guarantee. The wonder of hybrid disassembly is that we don’t have to pick one method or the other; we can start disassembling from anywhere in the file, using either retrieval method.
Windows PE files present several useful addresses to disassemble from:
- First address in an executable section
- Exported functions
- TLS initializers
- Program Entry Point
The first address in a section is useful for linear sweep retrieval, since it allows disassembly of the largest amount of program code. Following this, we try to select the addresses of exported functions (assuming any exist). The header field that locates these offsets does not necessarily have to be valid, so we only select them if the header passes a sanity check. TLS initializers must be sane in order to execute, so we rely on the addresses listed for them as well. Finally, the program entry point must also be valid, except when TLS initializers are present, since those run before the entry point is reached. The latter three target sets are all disassembled via recursive traversal.
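The selection order above can be sketched as a small function that builds a list of (address, method) pairs. This is only an illustration: `ParsedPE`, its fields, and `collect_targets` are hypothetical names, not this tool's actual parser interface, and the sanity check is reduced to a single boolean.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class ParsedPE:
    """Hypothetical summary of what a PE parser might hand to the disassembler."""
    exec_section_starts: List[int] = field(default_factory=list)
    exports: List[int] = field(default_factory=list)
    export_header_sane: bool = False  # result of the header sanity check
    tls_initializers: List[int] = field(default_factory=list)
    entry_point: Optional[int] = None

def collect_targets(pe: ParsedPE) -> List[Tuple[int, str]]:
    """Return (address, retrieval method) pairs in the order described above."""
    # Section starts are swept linearly to cover as much code as possible.
    targets = [(addr, "linear") for addr in pe.exec_section_starts]
    # Exports are only used when the header describing them looks sane.
    if pe.export_header_sane:
        targets += [(addr, "recursive") for addr in pe.exports]
    # TLS initializers and the entry point are followed recursively.
    targets += [(addr, "recursive") for addr in pe.tls_initializers]
    if pe.entry_point is not None:
        targets.append((pe.entry_point, "recursive"))
    return targets

pe = ParsedPE(
    exec_section_starts=[0x1000],
    exports=[0x1100],
    export_header_sane=False,
    tls_initializers=[0x1200],
    entry_point=0x1300,
)
for addr, method in collect_targets(pe):
    print(hex(addr), method)
```

Because the export header fails its check in this example, the export at 0x1100 is left out and only the section start, the TLS initializer, and the entry point are queued.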