Primary Methods for Reverse Engineering PE Files (.exe Files)

Eshan Harshana Agalawatta
May 18, 2021

Reverse engineering, also called back engineering, is the process by which a man-made object is deconstructed to reveal its design and architecture, or to extract knowledge from the object. In reverse engineering, there are 5 major steps.

Software reverse engineering is the process of discovering the technological principles of a system by analyzing its structure, function, operation, and behavior. The following are the most widely used methods of software reverse engineering in the Windows environment.

Disassembling / Decompiling 

Debugging 

Hex-editing

Unpacking

File analysis and monitoring

Registry monitoring

This article discusses only the most important of these reverse engineering methods.

Disassembling

Disassembling is the process of transforming machine language into readable assembly instructions. The main task of a disassembler is to identify the byte sequences that correspond to assembly instructions; it then performs a simple one-to-one mapping of processor instruction codes to instruction mnemonics. There are two main flavors of x86 assembly syntax: Intel and AT&T. The code is the same in both; only the way it is displayed differs. The figures below show source code and the equivalent assembly instructions.

An assembly instruction has two parts. The opcode is the part of the instruction that tells the processor what to do (for example, MOV or PUSH). The operand is the part of the instruction that holds the data to be acted on, or the register or memory location where that data is found (for example, eax, 0 or esp, 10h).

Decompilers differ from disassemblers in one important respect: a decompiler generates much higher-level text, which is more concise and much easier to read than a disassembler's output.

Popular disassembler tools - IDA Pro, CFF Explorer, Hiew
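To make the opcode/operand split and the two syntax flavors concrete, here is a minimal sketch of a table-driven decoder. It is a toy that handles only four 32-bit x86 encodings (push, pop, ret, and the register-to-register form of mov); the function names `decode_one`, `intel`, `att`, and `disassemble` are made up for this illustration and do not belong to any real tool.

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_one(code, off):
    """Return (length, mnemonic, operands); operands are in Intel order (destination first)."""
    op = code[off]
    if 0x50 <= op <= 0x57:                      # push r32
        return 1, "push", [REGS[op - 0x50]]
    if 0x58 <= op <= 0x5F:                      # pop r32
        return 1, "pop", [REGS[op - 0x58]]
    if op == 0xC3:                              # ret
        return 1, "ret", []
    if op == 0x89 and off + 1 < len(code) and code[off + 1] >> 6 == 3:
        modrm = code[off + 1]                   # mov r/m32, r32 (register form only)
        return 2, "mov", [REGS[modrm & 7], REGS[(modrm >> 3) & 7]]
    raise ValueError(f"byte 0x{op:02x} is outside this toy subset")


def intel(mnemonic, operands):
    """Intel flavor: destination first, bare register names."""
    return f"{mnemonic} {', '.join(operands)}".strip()


def att(mnemonic, operands):
    """AT&T flavor: source first, '%' before registers, 'l' suffix for 32-bit operands."""
    ops = ", ".join("%" + reg for reg in reversed(operands))
    suffix = "l" if operands else ""
    return f"{mnemonic}{suffix} {ops}".strip()


def disassemble(code):
    """Decode every instruction and render it in both flavors."""
    off, rows = 0, []
    while off < len(code):
        length, mnemonic, operands = decode_one(code, off)
        rows.append((intel(mnemonic, operands), att(mnemonic, operands)))
        off += length
    return rows


if __name__ == "__main__":
    # A typical function prologue/epilogue: 55 89 E5 5D C3
    for intel_text, att_text in disassemble(bytes.fromhex("5589e55dc3")):
        print(f"{intel_text:<14} | {att_text}")
```

The same bytes produce `mov ebp, esp` in Intel syntax and `movl %esp, %ebp` in AT&T syntax: the instruction is identical, only the rendering differs.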

Disassembling techniques can be categorized into two principal classes.

Static Disassembling: The binary file is not executed while it is disassembled. The disassembler processes the complete instruction stream of the .exe file at once, so the time the process takes depends on the size of the executable file. Static disassembling commonly uses two algorithms.

Linear Sweep Algorithm: This algorithm disassembles the machine code in the executable's PE sections sequentially. It starts with the first byte in the .text section and decodes one instruction after another until an illegal instruction is encountered. It does not follow control flow such as branches, which is its main weakness: it is susceptible to bytes intentionally planted in the instruction stream to derail it. It also cannot distinguish code from data in a binary file, because it decodes every byte as code as long as it looks like a legitimate instruction byte, so it ends up interpreting many data bytes as assembly instructions.

Tools that use this algorithm: gdb, WinDbg, objdump
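The weakness of linear sweep can be reproduced with a few bytes. The sketch below is a toy decoder for a handful of x86 opcodes, not the implementation used by any of the tools listed; `decode`, `linear_sweep`, and `SAMPLE` are names invented for the illustration. The sample places a junk byte (0xB8, the first byte of a five-byte mov) right after a jump that skips over it. Unlike the description above, this sketch emits an undecodable byte as data and keeps going rather than stopping; that is a choice made for the sketch.

```python
import struct

REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode(code, off):
    """Decode one instruction from a tiny x86 subset; return (length, text) or None."""
    op = code[off]
    if op == 0x90:
        return 1, "nop"
    if op == 0xC3:
        return 1, "ret"
    if 0xB8 <= op <= 0xBF and off + 5 <= len(code):      # mov r32, imm32
        imm = struct.unpack_from("<I", code, off + 1)[0]
        return 5, f"mov {REGS[op - 0xB8]}, 0x{imm:x}"
    if op == 0xEB and off + 2 <= len(code):              # jmp short rel8
        rel = struct.unpack_from("<b", code, off + 1)[0]
        return 2, f"jmp short 0x{off + 2 + rel:x}"
    return None


def linear_sweep(code):
    """Decode from the first byte to the last, ignoring where jumps lead."""
    off, listing = 0, []
    while off < len(code):
        ins = decode(code, off)
        if ins is None:
            # Sketch-level choice: emit the byte as data and move on.
            listing.append((off, f"db 0x{code[off]:02x}"))
            off += 1
            continue
        length, text = ins
        listing.append((off, text))
        off += length
    return listing


# jmp +1 ; junk byte 0xB8 ; nop ; nop ; nop ; nop ; ret
SAMPLE = bytes.fromhex("eb01b890909090c3")

if __name__ == "__main__":
    for off, text in linear_sweep(SAMPLE):
        print(f"{off:04x}  {text}")
```

The sweep decodes the junk byte as the start of `mov eax, 0x90909090` and swallows the four real `nop` instructions that the jump actually lands on.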

Recursive Descent (Traversal) Algorithm: A more complex and more effective approach than linear sweep. Code is not disassembled linearly; instead, the algorithm follows the program's control flow. For example, when the disassembler identifies a branch instruction, it determines the addresses where the blocks reachable from that branch begin, and then disassembles those blocks.

Tools that use this algorithm: IDA, distorm3, OllyDbg
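A minimal sketch of the control-flow idea, assuming a toy decoder that knows only five x86 opcodes (nop, ret, jmp short, je, and mov eax, imm32). The names `decode`, `recursive_descent`, and `SAMPLE` are hypothetical, and real tools handle far more cases (calls, indirect jumps, jump tables). The sample byte string plants a junk byte after a jump that skips over it.

```python
import struct


def decode(code, off):
    """Decode one instruction; return (length, text, kind, target) or None."""
    op = code[off]
    if op == 0x90:
        return 1, "nop", "seq", None
    if op == 0xC3:
        return 1, "ret", "ret", None
    if op in (0xEB, 0x74) and off + 2 <= len(code):      # jmp short / je rel8
        rel = struct.unpack_from("<b", code, off + 1)[0]
        target = off + 2 + rel
        if op == 0xEB:
            return 2, f"jmp short 0x{target:x}", "jmp", target
        return 2, f"je 0x{target:x}", "jcc", target
    if op == 0xB8 and off + 5 <= len(code):              # mov eax, imm32
        imm = struct.unpack_from("<I", code, off + 1)[0]
        return 5, f"mov eax, 0x{imm:x}", "seq", None
    return None


def recursive_descent(code, entry=0):
    """Disassemble only the bytes reachable by following control flow from entry."""
    listing = {}
    worklist = [entry]
    while worklist:
        off = worklist.pop()
        while 0 <= off < len(code) and off not in listing:
            ins = decode(code, off)
            if ins is None:
                break                      # unknown byte: stop this path
            length, text, kind, target = ins
            listing[off] = text
            if kind == "ret":
                break                      # the path ends here
            if kind == "jmp":
                off = target               # unconditional: continue at the target only
                continue
            if kind == "jcc":
                worklist.append(target)    # conditional: target now, fall-through next
            off += length
    return sorted(listing.items())


# jmp +1 ; junk byte 0xB8 ; nop ; nop ; nop ; nop ; ret
SAMPLE = bytes.fromhex("eb01b890909090c3")

if __name__ == "__main__":
    for off, text in recursive_descent(SAMPLE):
        print(f"{off:04x}  {text}")
```

Because the traversal continues at the jump target, the junk byte at offset 2 is never decoded and the four `nop` instructions are listed correctly. For a conditional branch both the target and the fall-through address are explored.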

Dynamic Disassembling: The binary file is executed, and its execution is monitored by an external tool (a debugger) to identify the instructions and their behavior. Because the program runs on particular sets of inputs, some instruction streams in the binary may never be executed and are therefore not disassembled. The speed of disassembly is not affected by the size of the executable file, because only the parts involved in the actual execution are disassembled.
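The dependence on input can be illustrated with a toy model. The sketch below does not run real x86 code; `PROGRAM` is an invented five-instruction program for a made-up machine, and `run_and_trace` is a hypothetical helper that records which instructions were actually executed for a given input.

```python
# Toy program for an invented machine: (mnemonic, argument) pairs.
PROGRAM = [
    ("jz", 3),                        # 0: jump to 3 if the input is zero
    ("note", "input was non-zero"),   # 1
    ("halt", None),                   # 2
    ("note", "input was zero"),       # 3
    ("halt", None),                   # 4
]


def run_and_trace(program, value):
    """Execute the toy program and return the instructions that actually ran."""
    pc, executed = 0, []
    while True:
        op, arg = program[pc]
        executed.append((pc, op, arg))
        if op == "halt":
            return executed
        # Only a taken 'jz' changes the flow; everything else falls through.
        pc = arg if (op == "jz" and value == 0) else pc + 1


if __name__ == "__main__":
    for value in (0, 7):
        seen = [pc for pc, _, _ in run_and_trace(PROGRAM, value)]
        print(f"input={value}: executed addresses {seen}")
```

Each run sees only three of the five instructions. Covering the whole program requires several runs with inputs that drive execution down different paths.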

Debuggers

Debuggers are an extended version of disassemblers. A debugger expands the functionality of a disassembler by showing the CPU registers, the memory map, a hex dump of the program, a view of the stack, and so on. Debuggers also allow setting breakpoints and editing the code while it runs. There are two types of debuggers.

1. Assembly-level debuggers (low-level debuggers): Operate on assembly code. Example: OllyDbg, WinDbg.

2. Source-level debuggers: Work on source code and are integrated with the integrated development environment (IDE). They are used to fix bugs in software under development. Example: Visual Studio debugger, Code::Blocks debugger

Assembly-level Debuggers

These debuggers use the dynamic disassembling method. They disassemble the binary data of a compiled program and let the reverser step through the code one instruction at a time, using breakpoints, with the ability to inspect or edit the results. There are two ways to debug an executable.

Starting an executable program with the debugger: When the program starts, it is loaded into RAM and stops running immediately before its entry point executes. The reverser can then control the program.

Attaching a debugger to an executable that is already running: All of the program's threads are paused so that it can be debugged. This is mostly used in analyzing malware.

This article discusses only assembly-level debuggers.
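The article does not describe how breakpoints work internally. One common mechanism on x86 is the software breakpoint: the debugger saves the byte at the chosen address and overwrites it with the one-byte INT3 instruction (0xCC), then restores the original byte later. The sketch below models only that bookkeeping on a local `bytearray`; `ToyDebugger` is a hypothetical class, and a real debugger would write into the target process through the operating system's debugging interface.

```python
INT3 = 0xCC  # x86 one-byte breakpoint instruction


class ToyDebugger:
    """Models software-breakpoint bookkeeping on a private copy of the code bytes."""

    def __init__(self, memory):
        self.memory = bytearray(memory)
        self.saved = {}  # address -> original byte

    def set_breakpoint(self, addr):
        """Remember the original byte, then overwrite it with INT3."""
        if addr not in self.saved:
            self.saved[addr] = self.memory[addr]
            self.memory[addr] = INT3

    def clear_breakpoint(self, addr):
        """Put the original byte back so the instruction can run normally."""
        self.memory[addr] = self.saved.pop(addr)


if __name__ == "__main__":
    dbg = ToyDebugger(bytes.fromhex("5589e55dc3"))   # push ebp; mov ebp, esp; pop ebp; ret
    dbg.set_breakpoint(1)
    print(dbg.memory.hex())    # the byte at offset 1 is now cc
    dbg.clear_breakpoint(1)
    print(dbg.memory.hex())    # original bytes restored
```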

Hex-Editing

In computing, a hex dump is a hexadecimal view of computer data, from RAM or from a computer file or storage device. In a hex dump, each byte (8 bits) is represented as a two-digit hexadecimal number. Hex dumps are commonly organized into rows of 8 or 16 bytes, sometimes separated by whitespace. A hex editor is a computer program that allows manipulation of the fundamental binary data that constitutes a computer file. When an .exe file is opened in a hex editor, it is normally shown in 3 columns:

Data position (offset) 

Binary data in hex 

Binary data in character encoding (Example: ASCII, …)

Data position - This column holds an 8-digit hexadecimal address that shows where the row of bytes is located within the file.

Binary data in hex - The middle column is the binary data as-is, displayed byte by byte as hex pairs. Here the hex values, which correspond to the binary data of the .exe file, can be edited and saved as a new file. Only limited changes are possible this way.

Binary data in character encoding - This column interprets each byte of the file as a text character according to a character encoding. Non-printable characters are often displayed as dots ".". This column sometimes reveals hard-coded strings inside binary files, for example IP addresses, URLs, and message bodies.
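The three columns can be reproduced in a few lines. The sketch below is a minimal illustration rather than a replacement for a hex editor: `hexdump` formats bytes as offset, hex pairs, and ASCII, and `patch` returns a modified copy with some bytes overwritten. Both function names and the sample bytes are made up for the example.

```python
def hexdump(data, width=16):
    """Format bytes as rows of: 8-digit offset, hex pairs, printable-ASCII view."""
    pad = width * 3 - 1              # width of a full hex column, used to align short rows
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hex_col = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as dots, as most hex editors do.
        text_col = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{off:08x}  {hex_col:<{pad}}  {text_col}")
    return lines


def patch(data, offset, new_bytes):
    """Return a copy of data with new_bytes written at offset (the length is unchanged)."""
    if offset < 0 or offset + len(new_bytes) > len(data):
        raise ValueError("patch would fall outside the file")
    patched = bytearray(data)
    patched[offset:offset + len(new_bytes)] = new_bytes
    return bytes(patched)


if __name__ == "__main__":
    # Made-up bytes: an 'MZ' signature followed by a hard-coded string.
    sample = b"MZ\x90\x00http://example.test\x00"
    print("\n".join(hexdump(sample)))
    print("\n".join(hexdump(patch(sample, 4, b"HTTP"))))
```

Overwriting bytes in place keeps every later offset valid, which is one reason hex editing is limited to small changes: inserting or deleting bytes would shift everything that follows.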

Other reverse engineering methods are not discussed in this article.

Conclusions

Reverse engineering methods have some limitations. This article focused mainly on disassembly as a reverse engineering method. In the disassembly process:

It is impossible to restore an application fully to its original state before compilation. A disassembler operating on machine code cannot produce the original comments or textual identifiers such as variable and label names.

A single disassembly error can cause many subsequent bytes to be interpreted incorrectly (because many disassemblers decode machine code sequentially). In addition, obfuscation can make an application very difficult to disassemble.

