Stack Cutting

This example builds on Simple Loader (Hooking) to push sensitive Win32 API calls through a stack-cutting call proxy.

The Proxy

proxy.c is our call proxy. It's position-independent code. Its only job is to munge the stack, call a Win32 API function with the right arguments, fix the stack, and return the result.

Our goal, in this fixing process, is to create the illusion of a full stack (e.g., RtlUserThreadStart on down) without the walked frames containing our DLL loader, injected DLL, and other things in between. I call this stack cutting, because I'm cutting the bad frames out.

System Informer screenshot comparing the callstack of Simple Loader to this project using test.x64.dll

Finding a Valid Frame

For this to work, we need a valid stack frame to point things to. In this example, I simply grabbed the return address and frame pointer for our DLL loader's caller. I then propagate these to eventually reach the proxy function.

This depends on the loader executing from a context where sane frames come before it.

If this loader is fired via CreateRemoteThread, we may find ourselves in a context where we don't have a good frame behind us. If we spam a return address without a good frame pointer, our stack unwinding becomes less than predictable. This implementation detects this situation and, when the caller context frame pointer is NULL, opts to NULL both the return address and the frame pointer. This won't give us the illusion of a complete callstack, but in many situations the stack unwinding should stop at our proxy function, giving us a truncated stack.
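The fallback described above can be sketched as a small decision function. This is a minimal illustration, not the code from loader.c: the type spoof_ctx_t and the function make_spoof_ctx are hypothetical names, and how the caller's return address and frame pointer are captured is left out.

```c
#include <stddef.h>

/* Hypothetical container for the values handed to the call proxy. */
typedef struct {
    void *ret_addr;    /* return address to plant in the proxy's frame */
    void *frame_addr;  /* frame pointer to plant in the proxy's frame  */
} spoof_ctx_t;

/*
 * Decide what the proxy should spoof with. A return address without a
 * usable frame pointer leaves the unwinder guessing, so when the caller's
 * frame pointer is NULL both values are cleared and the walk is left to
 * end at the proxy (a truncated stack).
 */
spoof_ctx_t make_spoof_ctx(void *caller_ret, void *caller_frame) {
    spoof_ctx_t ctx = { caller_ret, caller_frame };
    if (caller_frame == NULL) {
        ctx.ret_addr   = NULL;
        ctx.frame_addr = NULL;
    }
    return ctx;
}
```

The point of clearing both values together is that the two are only meaningful as a pair.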

Hiding the Call Proxy

Even with a valid frame and some stack munging, we still have a problem. Our proxy function WILL show up in the callstack. It's position-independent code. We want to place it somewhere more advantageous than private memory. My solution was to find a code cave (e.g., slack space between the end of a module's .text section and the next page boundary) to store the PIC. Here, I try the executable module itself, and if its cave is not big enough I try kernel32. If neither works, I just VirtualAlloc unbacked memory. I wouldn't choose kernel32 for production use, but this code demonstrates the pattern of trying modules until you find one you like.
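The code cave arithmetic and the try-modules-in-order pattern can be sketched as follows. This is a simplified model under stated assumptions: module_text_t, cave_size, and find_cave are hypothetical names, the RVA and size are assumed to have been read from the module's .text section header already, and the sketch ignores the memory protection changes needed before anything can be written into the cave.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one module's .text section. */
typedef struct {
    uintptr_t base;        /* module load address              */
    uint32_t  text_rva;    /* RVA where .text starts           */
    uint32_t  text_vsize;  /* bytes of .text actually occupied */
} module_text_t;

/* Bytes between the end of .text and the next page boundary.
 * page must be non-zero (e.g., 4096). */
size_t cave_size(const module_text_t *m, size_t page) {
    size_t end = (size_t)m->text_rva + m->text_vsize;
    return (page - (end % page)) % page;
}

/* Return the cave address in the first module with enough slack, or 0
 * so the caller can fall back to a fresh allocation. */
uintptr_t find_cave(const module_text_t *mods, size_t count,
                    size_t need, size_t page) {
    for (size_t i = 0; i < count; i++) {
        if (cave_size(&mods[i], page) >= need)
            return mods[i].base + mods[i].text_rva + mods[i].text_vsize;
    }
    return 0;
}
```

Ordering the candidate list is the design choice that matters here: the first module with enough slack wins, so preferred modules go first.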

loader.c is where the call proxy is set up, and setting it up is the FIRST action taken by the loader. Better, I also proxy the follow-on VirtualAlloc made by loader.c (for our PICOs and DLL) through this function.

Cutting the Stack

When I started this project, I sought to reproduce Mariusz Banach's results from ThreadStackSpoofer. I expected that I could set my call proxy's stack-stored frame address and return address to 0 and this would stop the stack walking. Better, I hypothesized that if I updated both to VALID values, much further up the stack, I could create the illusion of a complete callstack.
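The hypothesis above rests on an idealized model in which a stack walker simply follows saved frame pointers. A toy version of that model, with hypothetical names (frame_t, walk, zero_frame, splice_frame) and no relation to how StackWalk64 is actually implemented, looks like this:

```c
#include <stddef.h>

/* Classic frame-pointer layout: saved frame pointer, then return address. */
typedef struct frame {
    struct frame *prev;  /* caller's saved frame pointer */
    void         *ret;   /* return address into caller   */
} frame_t;

/* Follow the chain, recording return addresses until a NULL frame
 * pointer or a NULL return address ends the walk. */
size_t walk(const frame_t *f, void **out, size_t max) {
    size_t n = 0;
    while (f != NULL && f->ret != NULL && n < max) {
        out[n++] = f->ret;
        f = f->prev;
    }
    return n;
}

/* Truncate: the walk stops when it reaches the proxy's frame. */
void zero_frame(frame_t *proxy) {
    proxy->prev = NULL;
    proxy->ret  = NULL;
}

/* Cut: copy the saved values from a frame further up the stack, so the
 * frames between the proxy and that frame's caller drop out of the walk. */
void splice_frame(frame_t *proxy, const frame_t *donor) {
    proxy->prev = donor->prev;
    proxy->ret  = donor->ret;
}
```

In this model, splicing the loader's saved values into the proxy's frame makes the walk jump from the proxy straight to the loader's caller.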

My hypothesis didn't go as planned.

For x86 things worked as expected. x64? Not so much.

Why? The call proxy lives in a position-independent context. Its stack does not unwind the same way as a function baked into a compiled executable. Why? Because there's no .pdata section with UNWIND_INFO structures to assist this process. Why do we have a .pdata section? Because unwinding an x64 stack without it requires a lot of guesswork. Commonly, this guesswork is carried out by APIs like StackWalk64.
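To make the role of .pdata concrete, here is a toy lookup over a sorted function table. The struct func_entry_t is a hypothetical stand-in loosely modeled on the begin/end/unwind-info triple that .pdata entries carry, and lookup_entry is an illustrative name, not a Windows API. An address inside position-independent code has no entry, which is the case where a walker has to fall back to guessing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: one function's address range plus a
 * reference to its unwind description. */
typedef struct {
    uint32_t begin;       /* RVA of the function's first byte     */
    uint32_t end;         /* RVA one past the function's last byte */
    uint32_t unwind_rva;  /* RVA of the unwind description         */
} func_entry_t;

/* Binary search a table sorted by begin. Returns the entry covering
 * rva, or NULL when no function claims that address. */
const func_entry_t *lookup_entry(const func_entry_t *tab, size_t n,
                                 uint32_t rva) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rva < tab[mid].begin)
            hi = mid;
        else if (rva >= tab[mid].end)
            lo = mid + 1;
        else
            return &tab[mid];
    }
    return NULL;
}
```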

To fool the stack walking algorithm (when there's no .pdata), I found I had to set the frame address and return address as I did before BUT I also had to place the desired return address at the top (lowest memory address) of my proxy function's frame.

This works, but there's another problem. This space at the top of the frame is the x64 shadow space, for the callee function to use (as it wishes), to save register content. If the callee overwrites our spammed return value, the illusion breaks. Sometimes, it breaks in a way where the unwinding just terminates at our proxy function. Sometimes, it breaks in a way that's more suspicious. It depends on what the callee puts into that first slot.

Sometimes, we can work around the above problem. For example, KERNEL32$Sleep stomps this slot with the rbx register content. This is easy to work with. We just write our desired return address to rbx and KERNEL32$Sleep will put it where we want it.

But some functions (e.g., VirtualAlloc) stomp this slot with a register meant for argument passing. We can't overwrite that. Fortunately, in the VirtualAlloc case, the result is to terminate our stack unwinding at our proxy function. One way to work around this might be to call another function that cooperates better (e.g., directly going to NtAllocateVirtualMemory).

I share the above to warn that this implementation is by no means universal and it doesn't give the stack illusion with every Win32 API call. It requires some quality time in a debugger validating the result.

Hooking Win32 APIs

The hooking part of this tradecraft expands on Simple Loader (Hooking) and follows the pattern there to intercept a few Win32 APIs of interest. Specifically, I intercept VirtualAlloc, VirtualProtect, LoadLibraryA, Sleep, and MessageBoxA. The only task for these hooks is to push the functions and their arguments through the call proxy. Certainly, I could mix other things in here, like masking the DLL content before running these functions. That's on the table.
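The shape of such a hook can be sketched in portable C. Everything here is a hypothetical model: proxy_call_t, proxy_invoke, hook_sleep, and fake_sleep are illustrative names, the real proxy is position-independent code that also rewrites its own stack frame, and fake_sleep stands in for the Win32 API so the sketch has no Windows dependency.

```c
#include <stdint.h>

#define PROXY_MAX_ARGS 4

/* Generic target signature: up to four pointer-sized arguments. */
typedef uintptr_t (*proxy_target_t)(uintptr_t, uintptr_t,
                                    uintptr_t, uintptr_t);

/* Hypothetical package a hook hands to the proxy. */
typedef struct {
    proxy_target_t function;              /* API to call       */
    uintptr_t      args[PROXY_MAX_ARGS];  /* arguments, padded */
} proxy_call_t;

/* Stand-in for the proxy: only the dispatch, none of the stack work. */
uintptr_t proxy_invoke(const proxy_call_t *call) {
    return call->function(call->args[0], call->args[1],
                          call->args[2], call->args[3]);
}

/* Stand-in for the real API; records what it was asked to do. */
uintptr_t last_sleep_ms = 0;

uintptr_t fake_sleep(uintptr_t ms, uintptr_t a2, uintptr_t a3, uintptr_t a4) {
    (void)a2; (void)a3; (void)a4;
    last_sleep_ms = ms;
    return 0;
}

/* Hook body: package the target and its arguments, then let the proxy
 * make the call. */
uintptr_t hook_sleep(proxy_target_t target, uint32_t ms) {
    proxy_call_t call = { target, { ms, 0, 0, 0 } };
    return proxy_invoke(&call);
}
```

Keeping the hook this thin leaves room to add other steps around the proxied call later without touching the proxy itself.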

One place where I did go a little crazy: I opted to abuse the Crystal Palace import command to bring a bunch of symbols into the hook PICO. Specifically, I've disguised two pointers (SpoofReturnAddr and SpoofFrameAddr) as functions to allow populating them via import. The comments in hook.c explain this in more detail.

Conversation

  • Writing a Debugger From Scratch - DbgRs Part 6 - Stacks (2023) by Tim Misiak. A gentle introduction to stack unwinding, Frame Pointer Omission, and using .pdata/UNWIND info written from the context of writing a debugger.
  • ThreadStackSpoofer (2021) by Mariusz Banach is a similar POC to hook the Sleep function, stomp the return address to 0, call SleepEx, and return.

License

This project is licensed under the GNU General Public License version 2 (GPLv2) or later.