x86 Nirvana Hooks & Manual Syscall Detection
While researching an offensive capability related to syscalls and trying to decide whether to publish my work, I decided that if I were to release my offensive research, I would first like to publish something that would help detect the technique I was researching. With that goal in mind, I set out to detect manual syscalls in x86.
Existing Published Syscall Research Overwhelmingly Focused on x64
The first problem I ran into was that there is not a tremendous amount of recent information or examples describing x86 syscalls. The overwhelming majority of recent research has focused on executing syscalls in x64. This makes sense in a world where 32-bit systems and applications are dwindling. On the other hand, should we ignore their existence? Some examples of recent x64 syscall research include:
I did manage to find some resources that would help in making syscalls from x86, but nothing for x86 as convenient as the SysWhispers project:
The direct-syscall repository, which uses the Heaven's Gate technique, was the closest example of making a syscall from a 32-bit process that could be made to work somewhat like SysWhispers (with a fair amount of work). Most of the work around x86 syscalls seems to revolve around performing manual syscalls. I did come up with something, but I'll discuss it in a future blog post. For now, let's stay on topic: detecting syscalls in 32-bit processes.
How to Instrument an x86 Process
One of the first projects I ran across while looking into how to detect syscalls made from x86 processes was Jack Ullrich's (@winternal_t) syscall-detect project. In fact, the project I am releasing with this blog post is based on syscall-detect, which I have heavily modified to improve it and make it work with 32-bit processes. He has an excellent blog post on how his tool works, which I recommend reading, located here. As he points out in his blog, his technique is based on Nirvana Hooks, a technique Alex Ionescu presented in 2015 at RECON in a talk titled Hooking Nirvana: Stealthy Instrumentation Hooks (talk, slides). Alex's talk, and the majority of the examples I found, again focused on x64 processes:
- Hooking Via InstrumentationCallback
- Weaponizing Mapping Injection with Instrumentation Callback for stealthier process injection
- Windows x64 System Services Hooks and Advanced Debugging
I did find a couple of examples of Nirvana Hooks in x86, but the information was limited, and I ran into some trouble putting it to use.
- Windows 10 Hooking Nirvana explained
- Instrumentationcallback and advanced debugging
What I did find was enough to set me on the path to understanding how Instrumentation Callbacks (Nirvana Hooks) work in x86. The remainder of this blog post details what I discovered and how I implemented a manual syscall detection tool that works in both x86 and x64.
How Instrumentation Callbacks Work in x86
There were not many good examples of how Nirvana Hooks work in x86 processes. I wanted to understand exactly how the Instrumentation Callback routine is called and which values are stored where when the Instrumentation Callback is made. I started by searching for more information. The Wolf's IT Thoughts blog post titled Windows 10 Hooking Nirvana explained (archive.org) contained a lot of good information that pointed me in the right direction. The "assumed" assessment of what the Wow64SetupForInstrumentationReturn function does in Wolf's blog is quite accurate but was too high-level for me. I wanted to understand what was going on in more detail. With that goal, I fired up IDA, Cutter, and WinDbg and got going.
In the following sections, I describe the basic sequence of events that redirects code execution to a Nirvana Hook Instrumentation Callback. I then dig deeper into several elements to help the reader better understand what takes place in the background when Nirvana Hooks are enabled. Finally, I describe and provide code that demonstrates functioning Nirvana Hooks in an x86 process.
Path to Nirvana
The road to executing a Nirvana Hook's InstrumentationCallback routine starts in the Wow64SystemServiceEx function. In a Wow64 process, this function is responsible for resolving and triggering syscalls. At the end of this function, the third element of the Wow64Info structure (which I'm calling InstrumentationCallbackAddr) is checked, and if it is set, the Wow64SetupForInstrumentationReturn function is called with that value as its argument.
Figure 01: Wow64SetupForInstrumentationReturn Function Call
What is Wow64Info?
Another mysterious structure I knew nothing about was the WOW64INFO structure. Before discussing how Wow64SetupForInstrumentationReturn works, I will first discuss this structure to help explain what it is and what information it contains. Others have reversed this structure [16][19][20]. Their work was extremely helpful, but I wanted to understand the structure better. Also, something about the previous reversing didn't add up with what I observed while reversing wow64.dll. After spending many hours reversing functions in wow64.dll and wow64cpu.dll, I came to understand the structure better, and what I came up with was slightly different from what others had found before. Perhaps it has been restructured at some point. The following is how I understand the structure is currently laid out:
Code 01: Reversed Wow64Info Structure.
This structure can be found in two places: it sits just past the 32-bit PEB structure, and its location is also stored in 64-bit Thread Local Storage slot 10 (TEB.TlsSlots[10]). After reversing it and cleaning it up a bit, the following code is responsible for storing the location of the WOW64INFO structure. It is part of the Wow64LdrpInitialize function located in wow64.dll.
Figure 02: Code that Sets Wow64InfoPointer
As my comments show, the pTEB pointer is reused and changed from the 64-bit TEB to the 32-bit TEB. As a result, instead of pointing to TEB->NtTib.Self, it points to TEB->ProcessEnvironmentBlock. 0x480 happens to be the size of the PEB structure in a 32-bit process, as demonstrated in the following truncated WinDbg output.
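The offset arithmetic above is simple enough to capture in a helper. This is a minimal sketch assuming, as the WinDbg output on the author's system shows, that the 32-bit PEB is 0x480 bytes; the function name wow64info_address is hypothetical, and the PEB size could differ on other Windows builds.

```c
#include <stdint.h>

/* Size of the 32-bit PEB as reported by WinDbg (Code 02). Assumed constant
 * for the Windows build examined in this post. */
#define PEB32_SIZE 0x480u

/* Returns the address of WOW64INFO, which sits immediately after the 32-bit
 * PEB. Addresses are 32-bit because they live in the WoW64 process's 32-bit
 * address space. */
static uint32_t wow64info_address(uint32_t peb32_address)
{
    return peb32_address + PEB32_SIZE;
}
```

With the PEB address from Figure 03, 0x02c31000, the helper returns 0x02c31480, which is where the dump in Figure 03 begins.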
Code 02: Truncated dump of the PEB32 structure in WinDbg demonstrating the structures size.
Going a step further, we can dump the memory at this location to view the values stored in the structure. In the following screenshot, 0x02c31000 is the address of the PEB structure in a 32-bit process:
Figure 03: Dump of Wow64Info Structure at PEB32+0x480
Next, let’s fill in the values for the elements of the WOW64INFO structure:
| Element | Value |
|---|---|
| NativeCpuArch | 0x8664 (IMAGE_FILE_MACHINE_AMD64 [21]) |
| EmulatedCpuArch | 0x014C (IMAGE_FILE_MACHINE_I386 [21]) |
Table 01: Wow64Info Initial Values
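For readers who want to inspect this memory from C, the following is a partial, hypothetical declaration covering only the first three elements named in this post (PageSize, CpuFlags, and InstrumentationCallbackAddr), under the assumption that each is a 32-bit value laid out in that order. It is not the full reversed structure from Code 01: the members that follow, including NativeCpuArch and EmulatedCpuArch, are deliberately omitted because their offsets are not reproduced here. The machine-type constants are the standard PE/COFF values that winnt.h also defines.

```c
#include <stddef.h>
#include <stdint.h>

/* Machine types from the PE/COFF specification (same values as winnt.h). */
#define IMAGE_FILE_MACHINE_I386  0x014Cu
#define IMAGE_FILE_MACHINE_AMD64 0x8664u

/* Hypothetical partial view of the start of WOW64INFO. Only the first three
 * elements are declared; later members are intentionally left out. */
typedef struct WOW64INFO_PREFIX {
    uint32_t PageSize;                    /* element 1: 0x1000 */
    uint32_t CpuFlags;                    /* element 2: observed as 0x00000001 */
    uint32_t InstrumentationCallbackAddr; /* element 3: x86 callback, 0 if unset */
} WOW64INFO_PREFIX;
```

Under these assumptions the callback address lives 8 bytes into the structure, which is the offset the later sketches rely on.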
WOW64INFO Structure Value Initialization
The next task was to determine how each element of the WOW64INFO structure is initialized. The first element, PageSize, is initialized in the wow64!ProcessInit function. The following reversed code sets its value to 0x1000 (4096), the size of a memory page:
Figure 04: Wow64Info->PageSize Set to 4096 Bytes
We will skip ahead to the last two elements, NativeCpuArch and EmulatedCpuArch, and come back to CpuFlags last. Both NativeCpuArch and EmulatedCpuArch are initialized in the ntdll!Wow64LdrpInitialize function. NativeCpuArch is initialized to 0x8664, which corresponds to IMAGE_FILE_MACHINE_AMD64 [21], and EmulatedCpuArch is initialized to 0x014C, which corresponds to IMAGE_FILE_MACHINE_I386 [21]. The following code is responsible for the assignments; the variable v31 contains 0x014C:
Figure 05: Wow64Info->NativeCpuArch & Wow64Info->EmulatedCpuArch Values Populated
Finally, the CpuFlags element. This was a bit more difficult to track down because it is actually initialized from an entirely different DLL. This value appears to be initialized by the wow64cpu!BTCpuProcessInit function. The following code seems to be responsible; if anyone knows of another location, please contact me:
Figure 05: Wow64Info->CpuFlags set to 0x00000001
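Pulling the three initialization sites together, the sketch below models the resulting values in a single function. The struct is a logical model that lists fields in the order this post names them, not a claim about the real memory layout, and init_wow64info_model is a hypothetical name; in Windows itself these assignments happen in three different modules.

```c
#include <stdint.h>

/* Logical model of the WOW64INFO values discussed in this post.
 * Field order here is illustrative, NOT the real memory layout. */
typedef struct wow64info_model {
    uint32_t PageSize;
    uint32_t CpuFlags;
    uint32_t InstrumentationCallbackAddr;
    uint16_t NativeCpuArch;
    uint16_t EmulatedCpuArch;
} wow64info_model;

/* Applies, in one place, the values observed during reversing. */
static void init_wow64info_model(wow64info_model *info)
{
    info->PageSize = 0x1000;               /* wow64!ProcessInit: one 4 KiB page */
    info->CpuFlags = 0x00000001;           /* wow64cpu!BTCpuProcessInit (observed) */
    info->InstrumentationCallbackAddr = 0; /* no callback until one is registered */
    info->NativeCpuArch = 0x8664;          /* IMAGE_FILE_MACHINE_AMD64 */
    info->EmulatedCpuArch = 0x014C;        /* IMAGE_FILE_MACHINE_I386 */
}
```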
How Does Windows Check Whether Callbacks Are Enabled?
Now, armed with a little information about the WOW64INFO structure, let's look at what happens in the wow64!Wow64SystemServiceEx function. As previously stated, this function resolves and initiates a call that ends up switching the CPU from x86 to x64 mode to execute a syscall. The interesting part of Wow64SystemServiceEx is toward the end of the function. The specific instructions are displayed in Figure 06:
Figure 06: Checking for and Initiating the Nirvana Hook Instrumentation Callback
The value of WOW64INFO->InstrumentationCallbackAddr (my name for it, anyway) is checked. If it is nonzero, it is passed to the wow64!Wow64SetupForInstrumentationReturn function as an argument and later used to redirect execution. This means that the address of our Nirvana Hook callback is stored in the WOW64INFO structure that lives just beyond the PEB structure of a 32-bit process.
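The check shown in Figure 06 can be modeled in portable C. This sketch assumes that InstrumentationCallbackAddr, the third element, is a 32-bit value preceded by two 32-bit elements, which places it 8 bytes into WOW64INFO, i.e. at PEB32 + 0x488. The function check_instrumentation_callback and the record_setup stub, which stands in for Wow64SetupForInstrumentationReturn, are hypothetical; they operate on a copy of memory rather than a live process.

```c
#include <stdint.h>
#include <string.h>

#define PEB32_SIZE                0x480u /* size of the 32-bit PEB (Code 02) */
#define WOW64INFO_CALLBACK_OFFSET 8u     /* assumed: third 32-bit element */

typedef void (*setup_fn)(uint32_t callback_addr);

/* Hypothetical stub standing in for Wow64SetupForInstrumentationReturn. */
static uint32_t last_setup_arg;
static void record_setup(uint32_t callback_addr)
{
    last_setup_arg = callback_addr;
}

/* Models the tail of Wow64SystemServiceEx. peb32_image is a copy of 32-bit
 * process memory starting at the PEB and must be at least PEB32_SIZE + 12
 * bytes long. Returns 1 if the setup routine was called, 0 otherwise. */
static int check_instrumentation_callback(const uint8_t *peb32_image, setup_fn setup)
{
    uint32_t callback;
    memcpy(&callback, peb32_image + PEB32_SIZE + WOW64INFO_CALLBACK_OFFSET,
           sizeof callback);
    if (callback == 0)
        return 0;    /* no callback registered: the syscall returns normally */
    setup(callback); /* execution will be redirected to the callback */
    return 1;
}
```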
Instrumentation Callback Setup
Now that we know how the Instrumentation Callback function address is obtained, let's look at the wow64!Wow64SetupForInstrumentationReturn function. The following steps describe what this function does:
- Gets the current CpuArea and stores it.
- Checks whether the first element of the stored CpuArea equals 0x014C (IMAGE_FILE_MACHINE_I386 [21]).
- If that condition is true, sets ContextFlags to 0x10003 in an empty CONTEXT variable. A ContextFlags value of 0x10003 (CONTEXT_CONTROL | CONTEXT_INTEGER) copies the following registers:
  - EBP, EIP, SegCs, EFlags, ESP, SegSs, EDI, ESI, EBX, ECX, EDX, EAX
- Passes the same CONTEXT variable to the CpuGetContext function, which populates the requested values.
- Copies CONTEXT->Eip into CONTEXT->Ecx. This means that the address to which our instrumentation must return execution will be available to our Instrumentation Callback code in the ECX register.
- Sets CONTEXT->ContextFlags to 0x10002 (CONTEXT_INTEGER), which only sets EDI, ESI, EBX, ECX, EDX, and EAX.
- Calls CpuSetContext to update the CPU context. The relevant code responsible for copying the registers can be found in the ntdll!RtlpCopyLegacyContextX86 function.
Figure 07: Code in ntdll!RtlpCopyLegacyContextX86 Function Responsible for Copying Registers
- Calls CpuSetInstructionPointer to redirect execution to a1, which is the address obtained from the WOW64INFO structure and passed to Wow64SetupForInstrumentationReturn from the Wow64SystemServiceEx function.
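The two ContextFlags values used in these steps decompose into the standard x86 flag groups from winnt.h: CONTEXT_CONTROL selects the control registers and CONTEXT_INTEGER selects the general-purpose registers. The short sketch below spells out that arithmetic; context_includes is a hypothetical helper name.

```c
#include <stdint.h>

/* x86 CONTEXT flag values as defined in winnt.h. */
#define CONTEXT_i386    0x00010000u
#define CONTEXT_CONTROL (CONTEXT_i386 | 0x01u) /* Ebp, Eip, SegCs, EFlags, Esp, SegSs */
#define CONTEXT_INTEGER (CONTEXT_i386 | 0x02u) /* Edi, Esi, Ebx, Edx, Ecx, Eax */

/* Returns nonzero if every bit of the register group is present in flags. */
static int context_includes(uint32_t flags, uint32_t group)
{
    return (flags & group) == group;
}
```

So 0x10003 requests both groups when reading the context, while 0x10002 writes back only the integer registers, which is why EIP, ESP, and EFlags are left alone by CpuSetContext.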
Figure 08 is a screenshot of the cleaned-up code responsible for the process described in the previous list, along with the notes from my attempt to reverse engineer it.
Figure 08: Wow64SetupForInstrumentationReturn with Notes
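To make the register shuffle concrete, here is a toy simulation of the steps listed above. It is a model, not the real Wow64SetupForInstrumentationReturn: cpu32 is a hypothetical stand-in for the emulated CPU state (segment registers omitted), and the sketch simply returns when the machine check fails, since the real function's behavior on that path is not covered here.

```c
#include <stdint.h>

#define CONTEXT_i386    0x00010000u
#define CONTEXT_CONTROL (CONTEXT_i386 | 0x01u)
#define CONTEXT_INTEGER (CONTEXT_i386 | 0x02u)

/* Hypothetical register file standing in for the emulated x86 CPU. */
typedef struct cpu32 {
    uint32_t Eax, Ebx, Ecx, Edx, Esi, Edi; /* integer group */
    uint32_t Ebp, Eip, Esp, EFlags;        /* control group (segments omitted) */
} cpu32;

/* Copies the register groups selected by flags from src to dst. */
static void copy_groups(cpu32 *dst, const cpu32 *src, uint32_t flags)
{
    if ((flags & CONTEXT_CONTROL) == CONTEXT_CONTROL) {
        dst->Ebp = src->Ebp; dst->Eip = src->Eip;
        dst->Esp = src->Esp; dst->EFlags = src->EFlags;
    }
    if ((flags & CONTEXT_INTEGER) == CONTEXT_INTEGER) {
        dst->Eax = src->Eax; dst->Ebx = src->Ebx; dst->Ecx = src->Ecx;
        dst->Edx = src->Edx; dst->Esi = src->Esi; dst->Edi = src->Edi;
    }
}

/* Model of the setup sequence: save EIP into ECX, then jump to callback. */
static void setup_for_instrumentation_return(cpu32 *cpu, uint16_t machine,
                                             uint32_t callback)
{
    if (machine != 0x014C)  /* IMAGE_FILE_MACHINE_I386 */
        return;
    cpu32 ctx = {0};
    copy_groups(&ctx, cpu, CONTEXT_CONTROL | CONTEXT_INTEGER); /* CpuGetContext, 0x10003 */
    ctx.Ecx = ctx.Eip;                        /* return address handed over in ECX */
    copy_groups(cpu, &ctx, CONTEXT_INTEGER);  /* CpuSetContext, 0x10002 */
    cpu->Eip = callback;                      /* CpuSetInstructionPointer */
}
```

After the call, ECX holds the interrupted EIP and EIP points at the callback, while ESP, EFlags, and the remaining registers are untouched.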
We now understand where the callback address is stored, how the CPU context is manipulated, and how code execution is redirected to our Nirvana Hook callback. All that remains is to write a callback that will:
- Preserve the registers, including the saved EIP address that is now stored in the ECX register.
- Preserve the CPU flags.
- Perform whatever processing we require, without triggering any new syscalls until a flag disabling instrumentation has been set.
- Restore the CPU flags.
- Restore the CPU registers.
- Redirect execution back to the original EIP address stored in the ECX register.
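The register and flag steps must be handled by the assembly proxy, but the guard logic in the middle of this list can be sketched in portable C. In this model, instrumentation_disabled is a hypothetical per-thread flag standing in for the TEB's InstrumentationCallbackDisabled field, and analyze_syscall_return is a placeholder for real analysis; neither touches actual CPU state. `_Thread_local` requires C11.

```c
#include <stdint.h>

/* Hypothetical per-thread flag modeling TEB->InstrumentationCallbackDisabled. */
static _Thread_local int instrumentation_disabled = 0;
static unsigned processed_count = 0;

/* Placeholder analysis; a real tool would inspect return_address here and
 * might call APIs that themselves issue syscalls. */
static void analyze_syscall_return(uintptr_t return_address)
{
    (void)return_address;
    processed_count++;
}

/* Model of the callback body. Returns the address execution must resume at:
 * the saved EIP that the proxy received in ECX. */
static uintptr_t instrumentation_callback(uintptr_t return_address)
{
    if (instrumentation_disabled)  /* re-entered from our own syscall: pass through */
        return return_address;
    instrumentation_disabled = 1;  /* set BEFORE anything that might make a syscall */
    analyze_syscall_return(return_address);
    instrumentation_disabled = 0;
    return return_address;
}
```

Setting the flag first is what prevents a syscall made during analysis from re-entering the callback and recursing.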
Using Nirvana Hooks
In this section, I detail the modifications I made to Jack Ullrich's Syscall-Detect [8] project to use Nirvana Hooks to detect manual syscalls made from user space instead of kernel space. It was the first project I found when looking into detecting manual syscalls, and it sent me down the rabbit hole that resulted in the research for this blog post. I was attracted by its use of RtlCaptureContext and RtlRestoreContext to capture and restore the CPU context. It is an awesome project, with only one problem: I was dealing with a 32-bit application but could only find 64-bit solutions. On top of that, RtlCaptureContext and RtlRestoreContext are not available in 32-bit processes.
Project Architecture and Target Changes
I made the following changes to the Platform Target and Configuration Type:
- I wrote my version of the instrumentation to support both x86 and x64 CPUs.
- It can be compiled either as a DLL, so that it can be loaded into another process as Jack Ullrich demonstrated, or as an EXE that can be run for demonstration and testing purposes.
To facilitate logging in applications that do not or cannot use a console, I changed the logging facility from the console to DebugView [23].
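One way to send log output to DebugView is OutputDebugStringA, whose messages DebugView captures. The helper below is a minimal sketch under that assumption, not the project's actual logging code; debug_log is a hypothetical name, and on non-Windows builds it falls back to stderr so it can be compiled and tried anywhere.

```c
#include <stdarg.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Formats a message and routes it to the debugger output stream shown by
 * DebugView (Windows) or to stderr (elsewhere). Messages longer than the
 * buffer are truncated. Returns the formatted length, or -1 on error. */
static int debug_log(const char *fmt, ...)
{
    char buf[512];
    va_list args;
    va_start(args, fmt);
    int n = vsnprintf(buf, sizeof buf, fmt, args);
    va_end(args);
    if (n < 0)
        return -1;
#ifdef _WIN32
    OutputDebugStringA(buf);
#else
    fputs(buf, stderr);
#endif
    return n;
}
```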
x86 Specific Changes
Working top to bottom through the source, I will point out the notable changes I needed to make to support x86. One change I will not discuss in detail, but which is worth pointing out, is the pointer types: I chose them specifically so that pointers are 32-bit or 64-bit depending on the project's target CPU architecture.
New Sanity Checks
RIP does not exist in 32-bit mode. To account for this, I changed the RIP_SANITY_CHECK macro and renamed its arguments to be more architecture agnostic.
Code 03: Changes to the sanity check macro to make it more architecture agnostic.
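A macro along these lines captures the idea of an architecture-agnostic check. This is a hypothetical reconstruction, not the project's Code 03: IP_SANITY_CHECK is an illustrative name, the half-open range is my choice of boundary, and using uintptr_t makes the comparison 32-bit in an x86 build and 64-bit in an x64 build, in the spirit of the pointer-type changes mentioned above.

```c
#include <stdint.h>

/* Hypothetical architecture-agnostic check: true when the instruction
 * pointer ip lies inside the module mapped at base with size bytes,
 * i.e. in the half-open range [base, base + size). */
#define IP_SANITY_CHECK(ip, base, size)               \
    ((uintptr_t)(ip) >= (uintptr_t)(base) &&          \
     (uintptr_t)(ip) <  (uintptr_t)(base) + (uintptr_t)(size))
```

A return address outside every loaded module's range (or outside ntdll when only ntdll should issue syscalls) is the signal that a syscall was made manually.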
New InstrumentationCallback Function Declaration
Because RtlCaptureContext and RtlRestoreContext are not supported in 32-bit processes, I had to modify the InstrumentationCallback function declaration so that it differs based on the CPU architecture the project is compiled for.
Code 04: Architecture specific InstrumentationCallback function declaration.
Again, because RtlCaptureContext and RtlRestoreContext are unavailable, the registers are saved and restored entirely in the assembly code. This means that the variables set in the next section are purely for show when running in 32-bit mode. I also chose to restore all of the registers so that they are set exactly as they were when the Nirvana Hook was triggered.
Code 05: Restoring register values from saved locations in the TEB.
New SetInstrumentationCallbackHook Function
To support both 32-bit and 64-bit, I borrowed and modified a portion of the function that ScyllaHide [22] uses to set the Instrumentation Callback hook.
Code 06: New SetInstrumentationCallbackHook function to handle both x86 and x64 processes.
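On x64, the hook is commonly installed by calling NtSetInformationProcess with the ProcessInstrumentationCallback information class (40) and a PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION buffer, as declared in the phnt headers. The sketch below mirrors that buffer with fixed-width types so it compiles without Windows headers; make_callback_info is a hypothetical helper, and the x86-specific handling from Code 06 is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* PROCESSINFOCLASS value for ProcessInstrumentationCallback (phnt). */
#define ProcessInstrumentationCallback 40

/* Mirror of PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION from phnt. */
typedef struct instrumentation_callback_info {
    uint32_t Version;  /* 0 */
    uint32_t Reserved; /* 0 */
    void    *Callback; /* address of the assembly proxy */
} instrumentation_callback_info;

/* Builds the buffer that would be passed, together with its size, to
 * NtSetInformationProcess(GetCurrentProcess(), ProcessInstrumentationCallback, ...). */
static instrumentation_callback_info make_callback_info(void *proxy)
{
    instrumentation_callback_info info = { 0, 0, proxy };
    return info;
}
```

Passing a NULL callback in the same buffer is the usual way to remove the hook again.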
Because I had verified that the value of EIP is stored in the ECX register before the Instrumentation Callback is made, writing some assembly to save the current register values and restore them before returning execution to its normal flow was fairly simple. I also added a check of the InstrumentationCallbackDisabled flag so that execution resumes without calling the InstrumentationCallback function if a syscall is already being instrumented. This is the code I ended up with:
Code 07: x86 Assembly InstrumentationCallbackProxy
x64 Specific Changes
Little of the core functionality changed between the original and the x64 version of the code. The main functional differences are in the assembly language used in the callback. Most notably:
- I added a routine to check the InstrumentationCallbackDisabled flag and resume execution if a syscall is already being instrumented.
- I fixed the call to RtlCaptureContext to include shadow space on the stack before making the call.
- Since I modified the InstrumentationCallback function, I needed to provide the additional arguments in the RDX and R8 registers.
Code 08: x64 Assembly InstrumentationCallbackProxy
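The shadow-space fix comes down to stack arithmetic. Under the x64 Windows calling convention, the caller must reserve 32 bytes of shadow (home) space above the return address and keep RSP 16-byte aligned at the CALL, and the CONTEXT record passed to RtlCaptureContext must itself be 16-byte aligned. The hypothetical helper below computes a suitable RSP under those rules; it is a sketch of the reasoning, not the project's assembly.

```c
#include <stdint.h>

#define SHADOW_SPACE 32u /* home space for the four register arguments */

/* Reserves a 16-byte aligned buffer of buffer_size bytes below rsp, stores
 * its address in *buffer_addr, and returns the RSP to use at the CALL so
 * that the 32-byte shadow space occupies [RSP, RSP + 32) directly below
 * the buffer. */
static uint64_t rsp_before_call(uint64_t rsp, uint64_t buffer_size,
                                uint64_t *buffer_addr)
{
    uint64_t buf = (rsp - buffer_size) & ~(uint64_t)0xF; /* align down to 16 */
    *buffer_addr = buf;
    return buf - SHADOW_SPACE; /* 32 is a multiple of 16, so still aligned */
}
```

For example, with RSP = 0x14FF38 and the 0x4D0-byte x64 CONTEXT, the buffer lands at 0x14FA60 and the call is made with RSP = 0x14FA40.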
I am releasing the code I wrote for this research on GitHub. You can view the project here:
The following video shows the DLL version of the project being loaded into another project that makes a manual syscall, demonstrating the tool's ability to detect manual syscalls in x86.
Figure 08: Demo of inst-callback loaded into another project.