Memory corruption exploits have historically been one of the strongest accessories in a good red teamer's toolkit. They present an easy win for offensive security engineers, as well as adversaries, by allowing the attacker to execute payloads without relying on any user interaction.
Fortunately for defenders, but unfortunately for researchers and adversaries, these types of exploits have become increasingly difficult to execute, thanks largely to a wide array of operating system mitigations implemented directly within the systems we use every day. This vast apparatus of mitigations makes formerly trivial exploitation expensive and arduous on modern hardware and software.
This two-part blog series walks through the evolution of exploit development and vulnerability research on Windows systems. It addresses questions such as “How does this affect the landscape of future breaches?” and “Is the price for developing a reliable, portable and effective binary exploit still worth it?”
How Did We Get Here?
From its inception, computing garnered curiosity, which eventually led to the discovery of the “computer bug,” or unintended behavior from systems as a result of user interaction. This, in turn, led to the use of these bugs by bad actors with malign intent and launched the era of binary exploitation. Since then, security researchers, red teamers and adversaries alike have never looked back. The onset of binary exploitation has led vendors, most notably Microsoft and Apple (with a special mention to grsecurity on Linux who led the charge over two decades ago), to thwart these exploits with various mitigations. These exploitation mitigations, many of which are enabled by default, have reduced the impact of modern exploitation. Akin to the massive use of Active Directory in enterprise environments, which has forced red team research to place heavy focus on Microsoft products, adversaries and researchers have made Windows a focal point, due to its widespread use in both corporate and non-corporate environments. As a result, this blog will be Windows-centric, focusing on both user-mode and kernel-mode mitigations.
Vulnerability Classes: Then and Now
Researchers and adversaries have always had to answer an age-old question when it comes to binary exploitation: “How can code be executed on a target without any user interaction?” The answer came in the form of various vulnerability classes. Although not an exhaustive list, some common vulnerabilities include:
- Classic stack overflow (yes, even in 2020): This is the ability to overwrite existing content on the stack and use a controlled write to locate and corrupt the return address of a function to jump to an arbitrary location.
- Use-after-free: An object is allocated in memory on the heap in user mode (or in kernel-mode pool memory). A premature “free” of this object occurs, though a reference/handle to this freed object remains. Using another primitive, a new object is created in the freed object’s place and the reference to the old object is utilized to execute or otherwise modify the new object, which is acting in place of the old object. The expectation is that these unexpected changes to the new object somehow result in privilege escalation or other malicious capability.
- Arbitrary writes: This is the ability to arbitrarily write data, such as one or more pointers, to an arbitrary location. This also could be the result of another vulnerability class. Arbitrary write primitives can also be leveraged as arbitrary read primitives depending on the precision one has over the write primitive.
- Type-confusion: An object is of one type, but later on that type is referenced as another type. Due to the layout in memory of various data types, this can lead to unexpected behavior.
Exploit Mitigations: Then and Now
While security researchers and adversaries historically have had the upper hand with various methods of delivering payloads through vulnerabilities, vendors slowly started to level the playing field by implementing various mitigations, in hopes of eliminating bug classes completely or breaking common exploitation methods. At the very least, the hope is that a mitigation will make the technique too expensive, or unreliable, for mass-market use, such as in a drive-by exploit kit. Exploit mitigations, as defined here, have come a long way since the early days of Windows. Legacy mitigations, the initial mitigations released on Microsoft operating systems, will be addressed first. Contemporary mitigations, which comprise more prevalent and documented instruments of exploit thwarting, will be the second pillar outlined in this series. Lastly, mitigations that are less documented and not as widely adopted, referred to here as “Modern,” or cutting-edge mitigations, will wrap up the series.
Legacy Mitigation #1: DEP a.k.a No-eXecute (NX)
Data Execution Prevention (DEP), also referred to as No-eXecute (NX), was one of the first mitigations that forced researchers and adversaries to adopt additional methods of exploitation. DEP prevents arbitrary code from being executed in non-executable portions of memory. It was introduced in both user mode and kernel mode with Windows XP SP2, although only for the user-mode heap and stack, and for the kernel-mode stack plus pageable kernel memory (paged pool). It took many more releases, up to and including Windows 8, for most kernel-mode heap memory, including resident memory (nonpaged pool), to become non-executable. Although considered to be an “older” mitigation, it remains one that all vulnerability researchers and adversaries have to take into consideration.
DEP’s implementation in kernel mode and in user mode is very similar in that DEP is enforced on a per-page basis via a page table entry. A page table entry, or PTE, refers to the lowest-level entry in the paging structures used for virtual memory translation. PTEs, at a very high level, contain bits that are responsible for enforcing various permissions and properties for a given range of virtual addresses. Each chunk of virtual memory, referred to as a page (and typically 4KB), is marked as executable or non-executable via its page table entry in the kernel; under DEP, data pages such as the stack and heap are intended to be writable but not executable. Utilizing the !address and !pte commands in WinDbg can provide greater insight into DEP’s implementation.
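To make the per-page permission bits concrete, here is a minimal Python sketch that decodes a few flags from a raw 64-bit PTE value. The bit positions follow the x86-64 4KB page table entry layout; the two sample PTE values are made up purely for illustration and do not come from a real system.

```python
# Flag positions in an x86-64 page table entry (4 KB pages).
PTE_PRESENT = 1 << 0   # P: the page is mapped
PTE_WRITE = 1 << 1     # R/W: writable when set
PTE_USER = 1 << 2      # U/S: user-mode accessible when set
PTE_NX = 1 << 63       # NX/XD: execution is disabled when set


def describe_pte(pte: int) -> dict:
    """Return the permission-related properties encoded in a raw PTE."""
    return {
        "present": bool(pte & PTE_PRESENT),
        "writable": bool(pte & PTE_WRITE),
        "user": bool(pte & PTE_USER),
        "executable": not (pte & PTE_NX),
    }


# Hypothetical values, invented for this example only.
data_page = 0x8000000012345867  # writable user page with NX set (stack/heap-like)
code_page = 0x0000000012345025  # read-only user page with NX clear (code-like)

for name, value in (("data", data_page), ("code", code_page)):
    print(name, describe_pte(value))
```

Under DEP, a data page looks like the first value (writable, not executable), while a code page looks like the second (executable, not writable).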
Before kernel-mode DEP was extended to cover the resident kernel heap on Windows operating systems, the PTEs for such allocations, which come from the NonPagedPool, were marked as RWX, meaning that this type of kernel-mode memory was both executable and writable. Resident memory refers to the fact that memory owned by this allocation type will never be “paged out” of memory, meaning this type of virtual memory will always be mapped to a valid physical address.
With the release of Windows 8, the NonPagedPoolNx pool became the default kernel-mode heap for resident memory allocations. It retains all of the properties of NonPagedPool but is non-executable. Just as for user-mode addresses, the executable bit is enforced by the page table entry of a kernel-mode virtual address.
User-mode DEP can be bypassed with common exploitation techniques such as return-oriented programming, call-oriented programming and jump-oriented programming. These “code-reuse” techniques use pointers from different modules loaded at runtime to dynamically call Windows API functions such as VirtualProtect() or WriteProcessMemory(), either to change the permissions of memory pages to RWX or to write shellcode to an already existing executable memory region. In addition to altering the permissions of memory, it is also possible to utilize VirtualAlloc() or similar routines to allocate executable memory.
Kernel-mode DEP can be bypassed by using an arbitrary read/write primitive to extract the page table entry control bits for a particular page in memory and modify them to allow both write and execute access. It can also be bypassed by redirecting execution flow into user-mode memory that has already been marked as RWX, since by default, kernel-mode code can call into user-mode code at will.
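The PTE-modification bypass boils down to two pieces of arithmetic: finding where a page's PTE lives, and computing the new control bits. The Python sketch below only models that arithmetic; it does not read or write any memory. It assumes the fixed PTE base used by older x64 Windows builds (newer builds randomize this base, so it would have to be leaked first), and the function names are hypothetical.

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF
PTE_WRITE = 1 << 1    # R/W bit
PTE_NX = 1 << 63      # no-execute bit

# Assumption: fixed PTE base found on older x64 Windows builds.
LEGACY_PTE_BASE = 0xFFFFF68000000000


def pte_address(virtual_address: int, pte_base: int = LEGACY_PTE_BASE) -> int:
    """Compute the virtual address of the PTE that maps virtual_address."""
    # Each 4 KB page has one 8-byte PTE: (va >> 12) * 8 == (va >> 9) & ~7.
    return ((virtual_address >> 9) & 0x7FFFFFFFF8) + pte_base


def make_writable_and_executable(pte: int) -> int:
    """Return the PTE value with R/W set and NX cleared."""
    return ((pte | PTE_WRITE) & ~PTE_NX) & MASK_64


# Hypothetical PTE for a non-executable, read-only kernel page.
original = 0x8000000012345861
patched = make_writable_and_executable(original)
print(hex(pte_address(0x1000)), hex(patched))
```

With an arbitrary read primitive the attacker would fetch the value at `pte_address(...)`, and with an arbitrary write primitive store the patched value back.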
Legacy Mitigation #2: ASLR/kASLR
With the addition of DEP, vulnerability researchers and adversaries quickly adopted code-reuse techniques. The implementation of Address Space Layout Randomization (ASLR) and Kernel Address Space Layout Randomization (KASLR) made exploitation less straightforward. ASLR and its kernel-mode implementation KASLR randomize the base addresses of various DLLs, modules, and structures. For instance, one particular version of Windows 10 loads the kernel, before reboot, at the virtual memory address fffff800`0fe00000. Upon reboot, the kernel is loaded at a different virtual address, fffff800`07000000.
Historically, before the implementation of ASLR, defeating DEP was as trivial as disassembling an application or DLL into its raw assembly instructions and utilizing pointers to these instructions, which were static before ASLR, to bypass DEP. However, with the implementation of ASLR, one of three actions is generally required:
- Utilize DLLs and applications that are not compiled with ASLR
- Utilize an out-of-bounds vulnerability or some other type of information/memory leak
- Brute force the address space (not feasible on 64-bit systems)
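The reason a leak is enough to defeat ASLR is that only the base address moves; offsets inside a module stay constant. The following Python sketch uses the two kernel base addresses quoted earlier and a made-up gadget offset (`GADGET_OFFSET` is hypothetical) to show a hard-coded pointer going stale after a reboot, and the same gadget being recomputed from a leaked base.

```python
# Kernel base addresses quoted in the text, before and after a reboot.
BASE_BEFORE_REBOOT = 0xFFFFF8000FE00000
BASE_AFTER_REBOOT = 0xFFFFF80007000000

# Hypothetical offset of a useful instruction sequence inside the image.
GADGET_OFFSET = 0x1234


def rebase(leaked_base: int, offset: int) -> int:
    """Recompute an absolute address from a leaked module base."""
    return leaked_base + offset


# A pointer hard-coded from the first boot no longer points at the gadget.
stale_pointer = rebase(BASE_BEFORE_REBOOT, GADGET_OFFSET)
fresh_pointer = rebase(BASE_AFTER_REBOOT, GADGET_OFFSET)

print(hex(stale_pointer), hex(fresh_pointer))
print("slide between boots:", hex(BASE_BEFORE_REBOOT - BASE_AFTER_REBOOT))
```

This is why an information leak, rather than the gadget offsets themselves, is the scarce resource once ASLR is in place.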
Because Windows only performs ASLR on a per-boot basis, all processes share the same address space layout once the system has started. Therefore, ASLR is not effective against a local attacker that has already achieved code execution. Similarly, because the kernel provides introspection APIs to non-privileged users, which return kernel memory addresses, KASLR isn’t effective against this class of attack either. For this reason, ASLR and KASLR on Windows are only effective mitigations against remote exploitation vectors. With the rise of the “local remote” attacker, however, it was recognized that KASLR was ineffective against remote attackers that first achieved user-mode remote code execution (RCE) because, as mentioned, certain Windows API functions, such as EnumDeviceDrivers() or NtQuerySystemInformation(), can be leveraged to enumerate the base addresses of all loaded kernel modules.
Since a local remote attacker would first begin with a user-mode RCE targeting a browser or similar application, Microsoft began heavily enforcing that such applications run in sandboxed environments and introduced Mandatory Integrity Control (MIC), and later, AppContainer, as a way to lower the privileges of these applications by, among other things, running them with a low integrity level. Then, in Windows 8.1, Microsoft restricted access to such introspection API functions to processes running at medium integrity level and above.
Therefore, a low integrity level process, such as a browser sandbox, will require an information leak vulnerability to circumvent KASLR.
Several other primitives for leaking the base address of the kernel have been mitigated throughout various builds of Windows 10. Notably, the Hardware Abstraction Layer (HAL) heap, which contains multiple pointers to the kernel, was also located at a fixed location. This was because the HAL heap is needed very early in the boot process, even before the actual Windows memory manager has initialized. At the time, the best solution was to reserve memory for the HAL heap at a perfectly fixed location. This was mitigated with the Windows 10 Creators Update (RS2) build.
Even though ASLR is almost as old as DEP and they are both among some of the first mitigations implemented, they must be taken into consideration during modern exploitation.
Contemporary Mitigation #1: CFG/kCFG
Control Flow Guard (CFG), and its implementation in the kernel known as kCFG, is Microsoft’s version of Control Flow Integrity (CFI). CFG works by performing checks on indirect function calls made inside of modules and applications compiled with CFG. Additionally, the Windows kernel has been compiled with kCFG starting with the Windows 10 1703 (RS2) release. Note, however, that in order for kCFG to be enabled, VBS (Virtualization Based Security) needs to be enabled. VBS will be discussed in more detail in part 2 of this blog series.
For efficiency, indirect calls that are protected by CFG are validated using a bitmap, with a set of bits indicating whether a target is “valid” or “invalid.” A target is considered “valid” if it represents the starting location of a function within a module loaded in the process. This means that the bitmap represents the entire process address space. Each module that is compiled with CFG has its own set of bits in the bitmap, based on where it was loaded in memory. As described in the ASLR section, Windows only randomizes the address space per boot, so this bitmap is typically mostly shared among all processes, which saves significant amounts of memory.
At a very high level, indirect user-mode function calls are passed to a guard_check_icall function (or guard_dispatch_icall in other cases). This function then dereferences the function pointer _guard_check_icall_fptr and jumps to its target, which is the function LdrpValidateUserCallTargetES (or LdrpValidateUserCallTarget in other cases).
A series of bitwise operations and assembly functions are performed, which results in checking the bitmap to determine if the function within the indirect function call is a valid function within the bitmap. An invalid function will result in a process termination.
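The bitmap lookup can be illustrated with a deliberately simplified Python model. It assumes one bit per 16-byte slot of address space and a toy address range; the real CFG bitmap encoding and the LdrpValidateUserCallTarget routine are more involved, and the class and function names here are hypothetical.

```python
SLOT_SIZE = 16  # assumption: one bitmap bit per 16-byte slot (simplified)


class CfgBitmap:
    """Toy model of a bitmap recording valid indirect call targets."""

    def __init__(self, address_space_size: int) -> None:
        slots = address_space_size // SLOT_SIZE
        self.bits = bytearray((slots + 7) // 8)

    def mark_valid(self, address: int) -> None:
        # Record that a function starts at this address.
        slot = address // SLOT_SIZE
        self.bits[slot // 8] |= 1 << (slot % 8)

    def is_valid(self, address: int) -> bool:
        # In this model, targets must be slot-aligned and inside the range.
        if address % SLOT_SIZE:
            return False
        slot = address // SLOT_SIZE
        if slot // 8 >= len(self.bits):
            return False
        return bool(self.bits[slot // 8] & (1 << (slot % 8)))


def guarded_indirect_call(bitmap: CfgBitmap, target: int, functions: dict):
    """Validate the target against the bitmap before dispatching to it."""
    if not bitmap.is_valid(target):
        raise RuntimeError("invalid indirect call target: process would be terminated")
    return functions[target]()


bitmap = CfgBitmap(0x10000)
functions = {0x1000: lambda: "legitimate function"}
bitmap.mark_valid(0x1000)

print(guarded_indirect_call(bitmap, 0x1000, functions))
```

Calling `guarded_indirect_call(bitmap, 0x1008, functions)` raises, which models an overwritten function pointer that lands in the middle of a function rather than at its start.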
kCFG has a very similar implementation in that indirect function calls are checked by kCFG. Most notably, this “breaks” the primitive that adversaries and researchers have used to execute code in the context of the kernel by invoking nt!KeQueryIntervalProfile, which performs an indirect function call on 64-bit systems.
kCFG uses a slightly modified form of CFG, in that the bitmap is stored inside the variable nt!guard_icall_bitmap. Additionally, nt!_guard_dispatch_icall starts the routine to validate a target, and no other function calls are needed. CFG addresses the fact that, at some point during the exploit development lifecycle, a function pointer may need to be overwritten to point to a different function that is beneficial to the adversary (such as VirtualProtect).
CFG is a forward-edge CFI mitigation. This means that it does not take ret instructions into account, which are a backward-edge case. Since CFG doesn’t check return addresses, CFG could be bypassed by utilizing an information leak, which may allow an action such as parsing the Thread Environment Block (TEB) to leak the stack. Utilizing this knowledge, it may be possible to overwrite the return address of a function on the stack with malign intent.
CFG has been found, over time, to have a few shortcomings. For example, note that modules make use of an Import Address Table (IAT) for imports, such as Windows API functions. IAT entries are essentially virtual addresses within a specific module that point to Windows API functions. The IAT is read-only by default and generally cannot be modified. Microsoft has deemed these functions “safe” due to their read-only state, meaning CFG/kCFG does not protect these functions. If an adversary could modify an entry in the IAT, or add a malicious one, it would be possible to call a user-defined pointer.
Additionally, adversaries could leverage additional OS functions for code execution. By design, CFG/kCFG only validates that a function begins at the location indicated by the bitmap, not that a function is what it claims to be. If an adversary or researcher could locate additional functions marked as valid in the CFG/kCFG bitmap, it may be possible to overwrite a function’s pointer with another function’s pointer to “proxy” code execution. This could lead to, for example, a type-confusion attack, where a different, unexpected function is now running with the parameters/objects of the original expected function.
As mentioned earlier, kCFG is only enabled when VBS is enabled. One interesting characteristic of kCFG is that even when VBS is not enabled, kCFG dispatch functions and routines are still present and function calls are still passed through them. With or without VBS enabled, kCFG performs a bitwise check on the “upper” bits of a virtual address to determine if an address is sign-extended (that is, a kernel-mode address). If a user-mode address is detected, regardless of whether HVCI is enabled, kCFG will cause a bug check of KERNEL_SECURITY_CHECK_FAILURE. This is one mitigation against kernel-mode code being coerced into calling user-mode code, which we saw was a potential technique to bypass DEP. In the next section, we’ll talk about Supervisor Mode Execution Prevention (SMEP), which also mitigates this attack.
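The “upper bits” check can be sketched in a few lines of Python. This is a simplified model that assumes 48-bit virtual addresses, where kernel-mode addresses have bits 47 through 63 all set; it is not the actual kCFG dispatch routine, and the function names are hypothetical.

```python
def is_kernel_mode_address(address: int) -> bool:
    """True when bits 47..63 are all set (sign-extended upper half)."""
    # Assumption: 48-bit virtual addressing, so 17 upper bits must be ones.
    return (address >> 47) == 0x1FFFF


def check_call_target(address: int) -> None:
    """Model the outcome of dispatching an indirect call in the kernel."""
    if not is_kernel_mode_address(address):
        # On a real system this would be a bug check, not an exception.
        raise RuntimeError("KERNEL_SECURITY_CHECK_FAILURE")


kernel_target = 0xFFFFF8000FE00000  # kernel base address quoted earlier
user_target = 0x00007FF712340000    # hypothetical user-mode address

check_call_target(kernel_target)    # passes silently
print(is_kernel_mode_address(kernel_target), is_kernel_mode_address(user_target))
```

Passing `user_target` to `check_call_target` raises, which mirrors the bug check that stops kernel-mode code from being redirected to user-mode shellcode.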
It is also worth noting that the kCFG bitmap is protected by HVCI, or Hypervisor-Protected Code Integrity. HVCI will be referenced in the second part of this blog series.
Contemporary Mitigation #2: SMEP
Supervisor Mode Execution Prevention (SMEP) is a hardware-based CPU mitigation that was implemented specifically against kernel exploits. When NonPagedPoolNx was introduced, researchers and adversaries could no longer write shellcode directly to kernel-mode memory and execute it. This led to the idea that a Windows API function like VirtualAlloc() could be used to allocate shellcode in user mode, with the returned pointer to the shellcode then passed back to kernel mode. The kernel would then execute the user-mode code “in context” of the kernel, meaning the shellcode would run with full kernel privileges.
SMEP works to mitigate this attack by disallowing execution of user-mode code from the kernel. More specifically, x86-based CPUs have an internal state known as the Current Privilege Level (CPL). These CPUs support four privilege levels, known as rings. Windows only utilizes two of these rings: ring 3, for anything residing in user mode, and ring 0, for anything residing in kernel mode. SMEP prevents code belonging to CPL 3 from being executed in the context of CPL 0.
SMEP is enabled via the 20th bit of the CR4 control register. A control register is a register used to change or enable certain features of the CPU, such as the implementation of virtual memory through paging. Although SMEP is enabled via the 20th bit of the CR4 register, it is then enforced through the PTE of a memory address. SMEP enforcement works by checking the User vs. Supervisor (U/S) bit of a PTE for any paging structure. If the bit is set to U(ser), the page is treated as a user-mode page. If the bit is cleared, meaning the bit is represented as S(upervisor), the page is treated as a supervisor (kernel-mode) page.
As Alex Ionescu explained at Infiltrate 2015, if only one of the paging entries is set to “S,” SMEP won’t cause a crash. This realization is important, as SMEP can be bypassed by “tricking” the CPU into executing shellcode from user mode through an arbitrary write.
First, locate the PTE for an allocation in user mode, and then clear the U/S bit to cause it to be set to S. When this occurs, the CPU will treat this user-mode address as a kernel-mode page, allowing execution to occur.
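The enforcement rule and the U/S bypass can be modeled together in a short Python sketch. It treats a page as a user page only when the U/S bit is set in every paging-structure entry on the translation path, which is why clearing the bit in a single entry is enough. The entry values and function names are hypothetical, and no real paging structures are touched.

```python
CR4_SMEP = 1 << 20   # SMEP enable bit in CR4
PTE_USER = 1 << 2    # U/S bit in a paging-structure entry


def is_user_page(entries: list) -> bool:
    """A page is a user page only if U/S is set at every paging level."""
    return all(entry & PTE_USER for entry in entries)


def smep_blocks_execution(cr4: int, cpl: int, entries: list) -> bool:
    """True when an instruction fetch at this CPL would fault due to SMEP."""
    return bool(cr4 & CR4_SMEP) and cpl == 0 and is_user_page(entries)


# Hypothetical entries (PML4E, PDPTE, PDE, PTE) for a user-mode allocation.
entries = [0x67, 0x67, 0x67, 0x67]
cr4 = 0x170678  # hypothetical CR4 value with bit 20 set

before = smep_blocks_execution(cr4, 0, entries)

# The arbitrary write from the text: clear U/S in the PTE only.
entries[3] &= ~PTE_USER
after = smep_blocks_execution(cr4, 0, entries)

print(before, after)
```

Before the write, a ring 0 fetch from the page is blocked; after it, the same fetch is allowed because the CPU no longer classifies the page as user-mode.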
An older technique for bypassing SMEP is to disable it systemwide by leveraging ROP. An adversary could find a kernel-mode ROP gadget that allows overwriting the CR4 register with a value in which the 20th bit, which enables SMEP, is cleared.
The downside to this method is that you must use kernel-mode ROP gadgets to keep execution in the kernel, in order to adhere to SMEP’s rules. Additionally, as with all code-reuse attacks, the offset between gadgets may change between versions of Windows, and control of the stack is a must for ROP to work. A protection called HyperGuard, which is beyond the scope of this blog, also protects against CR4 modification on modern systems.
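The value a CR4-overwriting ROP chain would need is simple bit arithmetic, sketched below in Python. The starting CR4 value is hypothetical and chosen only for illustration; a real chain would first have to read or know the current CR4 contents so that the other feature bits stay unchanged.

```python
CR4_SMEP = 1 << 20  # bit 20 of CR4 controls SMEP


def smep_enabled(cr4: int) -> bool:
    """Report whether the SMEP bit is set in a CR4 value."""
    return bool(cr4 & CR4_SMEP)


def cr4_without_smep(cr4: int) -> int:
    """Return the same CR4 value with only the SMEP bit cleared."""
    return cr4 & ~CR4_SMEP


current_cr4 = 0x170678  # hypothetical value with SMEP enabled
target_cr4 = cr4_without_smep(current_cr4)

print(hex(current_cr4), smep_enabled(current_cr4))
print(hex(target_cr4), smep_enabled(target_cr4))
```

Only bit 20 differs between the two values, which keeps every other CPU feature controlled by CR4 in its original state.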
kd> bp nt!part_2
In this blog, legacy mitigations were revisited along with contemporary mitigations such as CFG and SMEP that look to challenge vulnerability researchers and raise the bar and quality of exploits. These topics set the stage for more modern mitigations, such as ACG, XFG, CET, and VBS, which add complexity — increasing the impact of exploitation and challenging readers to become more inquisitive about the return on investment of modern exploit development. Read “The Current State of Exploit Development, Part 2” where we discuss more contemporary exploitation paths and the many mitigations Microsoft has developed to deal with them.