Locating Win32 API Function Addresses
Welcome back old readers, hello new readers! It has been a little over a month since my last blog post. Life is full and keeps me busy! In the last blog post I wrote about encoding function names using a hashing routine. As a reminder, encoding the function names served the purpose of saving space as well as keeping the function names out of the shellcode as easily detectable strings.
In this installment of the shellcode writing series, we will examine a routine that uses what we learned in the previous blog posts. To find the address of a Win32 API function we need two things: the base address of kernel32.dll and the stored hash of the function’s name. To keep things simple and consistent, we will focus on the function from the previous blog post, LoadLibraryA, whose name hashes to 0xEC0E4E8E.
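As a refresher, here is a minimal Python sketch of the ROR-13 hash that the compute_hash code performs in the listings below. It is only a model for checking hash values offline, not part of the shellcode, and the names ror32 and hash_function_name are made up for this example.

```python
def ror32(value, count):
    """Rotate a 32-bit value right by `count` bits, like the ROR instruction."""
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def hash_function_name(name):
    """ROR-13 hash of an ASCII export name, mirroring compute_hash_again."""
    h = 0                                  # xor edi, edi
    for ch in name.encode("ascii"):        # lodsb; the NUL terminator ends the loop
        h = ror32(h, 13)                   # ror edi, 0x0D
        h = (h + ch) & 0xFFFFFFFF          # add edi, eax (32-bit wraparound)
    return h


if __name__ == "__main__":
    print(hex(hash_function_name("LoadLibraryA")))  # prints 0xec0e4e8e
```

Note that the rotate happens before each character is added, matching the order of the ror and add instructions.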
This post is going to get a bit into the weeds. We will explain how the assembly code navigates the PE file header to find the export directory structure, where the list of exported functions resides. We will then explain how the code steps through each of the exported function names, hashes them, and checks for a match. This is a fairly involved post, so do not be discouraged if you need to read through it a few times. If it helps, put the code into a debugger and step through it instruction by instruction to see exactly what is happening.
Win32 API Locating Assembly, in 32-Bit
Enough yammering, time for some assembly code. The following assembly program locates the address of the LoadLibraryA function and stores it on the stack at [EBP+0x04], just above the frame pointer. The code is a bit on the long side. Do not worry; we will break down what each section is doing after the main listing.
```nasm
 1  [SECTION .text]
 2
 3  BITS 32
 4
 5  _start:
 6      jmp main
 7
 8  ; Constants
 9  win32_library_hashes:
10      call win32_library_hashes_return
11      ; LoadLibraryA
12      dd 0xEC0E4E8E
13
14  ; ======== Function: find_kernel32
15  find_kernel32:
16      push esi
17      xor eax, eax
18      mov eax, [fs:eax+0x30]      ; PEB
19      mov eax, [eax+0x0C]         ; PEB->Ldr
20      mov esi, [eax+0x1C]         ; First InInitializationOrderModuleList entry
21      lodsd                       ; EAX = next entry (Flink)
22      mov eax, [eax+0x08]         ; DllBase of that module
23      pop esi
24      ret
25
26  ; ======= Function: find_function
27  find_function:
28      pushad
29      mov ebp, [esp+0x24]         ; kernel32.dll base address (argument)
30      mov eax, [ebp+0x3C]         ; e_lfanew: offset of the PE header
31      mov edx, [ebp+eax+0x78]     ; Export directory RVA
32      add edx, ebp                ; EDX -> _IMAGE_EXPORT_DIRECTORY
33      mov ecx, [edx+0x18]         ; NumberOfNames
34      mov ebx, [edx+0x20]         ; AddressOfNames RVA
35      add ebx, ebp                ; EBX -> list of name RVAs
36  find_function_loop:
37      jecxz find_function_finished
38      dec ecx
39      mov esi, [ebx+ecx*4]
40      add esi, ebp                ; ESI -> exported function name
41  compute_hash:
42      xor edi, edi
43      xor eax, eax
44      cld
45  compute_hash_again:
46      lodsb
47      test al, al
48      jz compute_hash_finished
49      ror edi, 0x0D
50      add edi, eax
51      jmp compute_hash_again
52  compute_hash_finished:
53
54  find_function_compare:
55      cmp edi, [esp+0x28]         ; Compare with the hash being searched for
56      jnz find_function_loop
57      mov ebx, [edx+0x24]         ; AddressOfNameOrdinals RVA
58      add ebx, ebp
59      mov cx, [ebx+2*ecx]         ; Index into AddressOfFunctions
60      mov ebx, [edx+0x1C]         ; AddressOfFunctions RVA
61      add ebx, ebp
62      mov eax, [ebx+4*ecx]        ; Function RVA
63      add eax, ebp                ; Function address
64      mov [esp+0x1C], eax         ; Overwrite the saved EAX so POPAD returns it
65  find_function_finished:
66      popad
67      ret
68
69  ; ======== Function: resolve_symbols_for_dll
70  resolve_symbols_for_dll:
71      lodsd                       ; EAX = next hash, ESI += 4
72      push eax
73      push edx
74      call find_function
75      mov [edi], eax
76      add esp, 0x08
77      add edi, 0x04
78      cmp esi, ecx
79      jne resolve_symbols_for_dll
80  resolve_symbols_for_dll_finished:
81      ret
82
83  main:
84      sub esp, 0x88               ; Allocate space on stack for function addresses
85      mov ebp, esp                ; Set ebp as frame ptr for relative offset on stack
86      call find_kernel32          ; Find base address of kernel32.dll
87      mov edx, eax                ; Store base address of kernel32.dll in EDX
88      jmp win32_library_hashes    ; Near jump: the target is too far back for a short jump
89  win32_library_hashes_return:
90      pop esi
91      lea edi, [ebp+0x04]         ; This is where we store our function addresses
92      mov ecx, esi
93      add ecx, 0x04               ; ECX marks the end of the kernel32 hash list (4 bytes per hash)
94      call resolve_symbols_for_dll
95      int3                        ; Stop here instead of running off the end of the code
```
Code Listing 1: Full 32-Bit Function Locating Assembly Listing
The Main Function
Setup the Stack and Storage
We start on line 6 with a jump to the main function on line 83. The first couple of lines in main make some space on the stack and set up a frame pointer, EBP, which is used throughout the assembly code to reference stored data. This is important because once we find the value or address of something, we need a way to reference it later.
Find the Base Address of Kernel32
On line 86 we call the find_kernel32 function to find the base address of kernel32.dll in memory. If you have not read the previous blog posts and would like to understand how the find_kernel32 function works, check out this blog entry. Once the base address is located, it is stored in the EDX register on line 87 for safekeeping.
Get the Location of the Encoded Function Names
On line 88, execution jumps to win32_library_hashes, which in turn calls win32_library_hashes_return. A CALL instruction pushes the address of whatever follows it, and here that is where the hashed function names (the dd values) are stored. Line 90 then pops that address into ESI. This trick is explained in the previous blog post; the assembly code is simply taking advantage of the behavior of a CALL instruction to obtain the memory address where the hashed function names are stored.
Lines 91 through 93 set up our storage locations. ESI already points to the hashed function names. EDI is pointed at [EBP+0x04], where the resolved function addresses will be stored, and ECX is set to ESI + 4 to mark the end of the hash list, which here holds a single 4-byte hash.
This is where things start to get more interesting. On line 94 the resolve_symbols_for_dll function is called. Remember that the base address of kernel32.dll is currently stored in the EDX register; this will be important later.
For the moment, we will focus on just this function and ignore the call to the find_function function. We’ll reference Code Listing 2 to avoid scrolling back-and-forth too much.
On line 3, the LODSD instruction loads the DWORD stored at ESI into the EAX register and advances ESI by four bytes. If our code were looking for more than one function, ESI would already be pointing at the next encoded function name.
EAX, which now contains the hash of our first function name, and EDX, which contains the base address of kernel32.dll, are then pushed to the stack on lines 4 and 5.
We will skip over the call to find_function on line 6 for now. All that matters here is that, when it returns, EAX contains the address of the function we are looking for.
On line 7, the address of the resolved function is stored at the address where EDI currently points.
Line 8 restores the stack to its state before EAX and EDX were pushed to it.
On line 9, EDI is incremented to point to the next location where a function address can be stored.
On lines 10 and 11, ESI and ECX are compared to see whether the end of the hash list has been reached. If it has not, the function loops back to resolve the next hash; once the end is reached, the function returns.
```nasm
 1  ; ======== Function: resolve_symbols_for_dll
 2  resolve_symbols_for_dll:
 3      lodsd
 4      push eax
 5      push edx
 6      call find_function
 7      mov [edi], eax
 8      add esp, 0x08
 9      add edi, 0x04
10      cmp esi, ecx
11      jne resolve_symbols_for_dll
12  resolve_symbols_for_dll_finished:
13      ret
```
Code Listing 2: 32-Bit Resolve Symbols Function
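The loop in Code Listing 2 is essentially a do-while over a packed array of 4-byte hashes. The Python sketch below models it under a few assumptions: the hash list is a bytes object standing in for the data ESI points to, find_function is any callable taking (base, hash) and returning an address, and the export addresses in the usage example are made up.

```python
import struct


def resolve_symbols_for_dll(hash_list, base, find_function):
    """Model of Code Listing 2: resolve every 4-byte hash in hash_list."""
    resolved = []              # stands in for the stack slots EDI walks through
    esi = 0                    # offset of the next hash
    ecx = len(hash_list)       # end of the list (ESI + 4 * number of hashes)
    while True:
        (target_hash,) = struct.unpack_from("<I", hash_list, esi)  # lodsd
        esi += 4
        resolved.append(find_function(base, target_hash))         # call, then mov [edi], eax
        if esi == ecx:                                              # cmp esi, ecx / jne
            return resolved


if __name__ == "__main__":
    fake_exports = {0xEC0E4E8E: 0x1000, 0x16B3FE72: 0x2000}  # hypothetical RVAs
    hashes = struct.pack("<2I", 0xEC0E4E8E, 0x16B3FE72)
    addresses = resolve_symbols_for_dll(
        hashes, 0x10000000, lambda base, h: base + fake_exports[h])
    print([hex(a) for a in addresses])  # prints ['0x10001000', '0x10002000']
```

Like the assembly, the sketch checks for the end of the list only after resolving a hash, so it assumes the list holds at least one entry.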
Finding the Function Addresses
Now to cover the two most important sections of assembly code in this installment. The first, in Code Listing 3, locates the _IMAGE_EXPORT_DIRECTORY by navigating the PE file structure. Once that structure is located, the second section, in Code Listing 4, iterates through the list of exported function names to find the one that matches the function being searched for. Finally, the located address is saved where it can later be referenced to make function calls.
Locating the Exported Function Names
The first section of code, in Code Listing 3, locates the _IMAGE_EXPORT_DIRECTORY structure. From that structure it collects two values: the number of functions exported by name, stored in the NumberOfNames field, and the RVA of the list of exported function names, stored in the AddressOfNames field. These two values are needed to iterate through the exported function names and find the one that matches what is being searched for. Once the matching name is located, the AddressOfNameOrdinals and AddressOfFunctions lists are used to obtain the exported function’s address.
```nasm
 1  ; ======= Function: find_function
 2  find_function:
 3      pushad
 4      mov ebp, [esp+0x24]
 5      mov eax, [ebp+0x3C]
 6      mov edx, [ebp+eax+0x78]
 7      add edx, ebp
 8      mov ecx, [edx+0x18]
 9      mov ebx, [edx+0x20]
10      add ebx, ebp
```
Code Listing 3: 32-Bit Find Function: Locate _IMAGE_EXPORT_DIRECTORY
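find_function reads its arguments and later writes its result relative to ESP after PUSHAD, so it helps to see the whole stack frame at once. The following Python sketch computes that frame from the PUSHAD push order and the caller’s push eax / push edx / call sequence in Code Listing 2. It is purely illustrative, and the function name is made up.

```python
# PUSHAD pushes these registers in this order; the last one pushed (EDI) lands at [ESP].
PUSHAD_ORDER = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def find_function_frame():
    """ESP-relative layout of the stack right after PUSHAD in find_function."""
    frame = {}
    for slot, reg in enumerate(reversed(PUSHAD_ORDER)):
        frame[slot * 4] = "saved " + reg              # 4 bytes per 32-bit register
    frame[0x20] = "return address"                    # pushed by CALL find_function
    frame[0x24] = "kernel32 base (EDX, pushed second)"
    frame[0x28] = "target hash (EAX, pushed first)"
    return frame


if __name__ == "__main__":
    for offset, contents in sorted(find_function_frame().items()):
        print(f"[esp+0x{offset:02X}] {contents}")
```

The output shows that [ESP+0x24] and [ESP+0x28] are the caller’s two arguments, and that [ESP+0x1C] is the slot PUSHAD used for EAX, which is why writing the function address there makes POPAD hand it back in EAX.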
The following step-by-step walk-through will refer to line numbers from Code Listing 3:
On line 3, a PUSHAD instruction stores all of the general-purpose registers on the stack; once the function completes, a POPAD instruction restores them. PUSHAD pushes the registers in the order EAX, ECX, EDX, EBX, original ESP, EBP, ESI, EDI, so EDI ends up at [ESP] and EAX at [ESP+0x1C].
The base address of kernel32.dll sits on the stack at ESP + 0x24 (36 bytes): 32 bytes of registers saved by PUSHAD plus the 4-byte return address. Line 4 moves it into the EBP register.
The MOV instruction on line 5 loads the DWORD located at offset 0x3C from the base address of kernel32.dll into the EAX register. According to the PE Structure diagram, offset 0x3C of a PE file (the e_lfanew field of the DOS header) holds the offset of the PE header. EAX now contains the offset from the base of kernel32.dll to the PE header.
The MOV instruction on line 6 computes the address EBP (kernel32.dll’s base address) + EAX (PE header offset) + 0x78 and loads the DWORD stored there into the EDX register. Looking back at the PE Structure diagram, that location holds the RVA of the export table, the first entry of the optional header’s DataDirectory array. EDX now contains the offset of the export table from the base of kernel32.dll.
EBP (kernel32.dll’s Base Address) is added to EDX (The offset to the _IMAGE_EXPORT_DIRECTORY) on line 7. EDX now points to the _IMAGE_EXPORT_DIRECTORY structure. The structure’s layout can be seen in Figure 1.
Figure 1: Export Table Structure
On line 8, the DWORD at EDX + 0x18 is loaded into ECX. According to the _IMAGE_EXPORT_DIRECTORY structure in Figure 1, the NumberOfNames field is located at offset 0x18. ECX now holds the number of functions exported by name.
On line 9, the DWORD at EDX + 0x20 is loaded into EBX. According to the _IMAGE_EXPORT_DIRECTORY structure in Figure 1, the AddressOfNames field is located at offset 0x20. It contains the relative offset (RVA) from the base of kernel32.dll to a list of RVAs, one for each exported function name.
On line 10, EBP is added to EBX. EBX now points to the list of exported function name RVAs.
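The pointer chasing in Code Listing 3 can be mirrored in Python over a byte buffer. This sketch assumes image is a copy of the module as it is laid out in memory (for example, dumped from a debugger), so RVAs can be used directly as offsets; a PE file read straight from disk would first need its RVAs translated through the section table. The helper names are hypothetical, and the comments refer to the line numbers of Code Listing 3.

```python
import struct

E_LFANEW = 0x3C               # IMAGE_DOS_HEADER.e_lfanew
EXPORT_ENTRY_PE32 = 0x78      # PE header + 0x78 -> DataDirectory[0].VirtualAddress (32-bit)
NUMBER_OF_NAMES = 0x18        # IMAGE_EXPORT_DIRECTORY field offsets
ADDRESS_OF_NAMES = 0x20


def read_dword(image, offset):
    """Little-endian 32-bit read, like a 32-bit MOV from memory."""
    return struct.unpack_from("<I", image, offset)[0]


def locate_export_names(image):
    """Return (export directory RVA, NumberOfNames, AddressOfNames RVA)."""
    pe_header = read_dword(image, E_LFANEW)                              # line 5
    export_dir = read_dword(image, pe_header + EXPORT_ENTRY_PE32)        # line 6
    number_of_names = read_dword(image, export_dir + NUMBER_OF_NAMES)    # line 8
    names_rva = read_dword(image, export_dir + ADDRESS_OF_NAMES)         # line 9
    return export_dir, number_of_names, names_rva
```

Because image is indexed from the module’s base, the "add ..., ebp" steps on lines 7 and 10 are implicit here: an RVA is already a valid index into the buffer.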
Iterating the Function Names to Find a Match
The next section of code iterates through the AddressOfNames list pointed to by EBX. It works through the list backwards, hashing each of the names and comparing the result to the hash stored in our assembly code. When a match is found, the function’s address is written into the stack slot where PUSHAD saved EAX, so the final POPAD returns it in EAX. EDI is free to hold the hash calculated for the current function name because PUSHAD already saved the caller’s EDI, which still points to the storage location, and POPAD restores it. The following step-by-step walk-through refers to the code in Code Listing 4.
```nasm
 1  find_function_loop:
 2      jecxz find_function_finished
 3      dec ecx
 4      mov esi, [ebx+ecx*4]
 5      add esi, ebp
 6  compute_hash:
 7      xor edi, edi
 8      xor eax, eax
 9      cld
10  compute_hash_again:
11      lodsb
12      test al, al
13      jz compute_hash_finished
14      ror edi, 0x0D
15      add edi, eax
16      jmp compute_hash_again
17  compute_hash_finished:
18
19  find_function_compare:
20      cmp edi, [esp+0x28]
21      jnz find_function_loop
22      mov ebx, [edx+0x24]
23      add ebx, ebp
24      mov cx, [ebx+2*ecx]
25      mov ebx, [edx+0x1C]
26      add ebx, ebp
27      mov eax, [ebx+4*ecx]
28      add eax, ebp
29      mov [esp+0x1C], eax
30  find_function_finished:
31      popad
32      ret
```
Code Listing 4: 32-Bit Iterate Function List
Line 2 checks to see if ECX is zero. If ECX has reached zero the function will finish by restoring the registers and returning.
On line 3, ECX is decreased by one. Because ECX counts down from NumberOfNames, the code iterates the list in reverse.
On line 4, the DWORD at EBX + ECX*4 (an entry in the list of name RVAs) is loaded into ESI. On the first pass, ESI now contains the offset from the base of kernel32.dll to the last string in the list.
EBP (kernel32.dll’s base address) is added to ESI on line 5. ESI now points to the current exported function name, a NULL-terminated string.
On lines 7 through 18, the hash of the function name is calculated and stored in EDI. This process is covered in detail in the previous blog post, so refer to it if you need to.
On line 20, the computed hash in EDI is compared with the value stored at [ESP + 0x28]. That value is the hash we are searching for; it was pushed to the stack before the find_function function was called.
If the values do not match, line 21 jumps back to process the next function name. If they match, execution continues.
The next few steps (lines 22 through 28) can be a bit difficult to follow, so please refer to Figure 2 to help follow along. On line 22, the DWORD at EDX + 0x24 is loaded into EBX. According to the _IMAGE_EXPORT_DIRECTORY structure in Figure 1, the AddressOfNameOrdinals field is located at offset 0x24. It contains the RVA of a list of 16-bit ordinal values, one for each exported name.
Figure 2: Export Table Visualization
On line 23, EBP (kernel32.dll’s base address) is added to EBX. EBX now points to the list of ordinals referenced by the AddressOfNameOrdinals field.
On line 24, the 16-bit value at EBX + ECX*2, the ordinal entry belonging to the matching name, is loaded into CX. CX now contains the index into the AddressOfFunctions list of the function being searched for. Line 27 can use all of ECX as that index because the upper half of ECX is already zero: it held a name index, and kernel32.dll exports far fewer than 0x10000 names.
On line 25, the DWORD at EDX + 0x1C is loaded into EBX. According to Figure 1, offset 0x1C holds the AddressOfFunctions field, so EBX now contains the RVA of the list of function addresses.
On line 26, EBP (kernel32.dll’s Base Address) is added to EBX to make EBX point to the list of function addresses.
On line 27, the DWORD at EBX + ECX*4 is loaded into EAX. EAX now contains the offset (RVA) from the base of kernel32.dll to the function that is being searched for.
On line 28, EBP (kernel32.dll’s base address) is added to EAX. EAX now contains the address of the function that is being searched for.
On line 29, the address is stored at [ESP + 0x1C], the slot where PUSHAD saved EAX at the beginning of the function.
On line 31, the POPAD instruction restores the registers from the stack. Because the saved EAX slot was overwritten, EAX now holds the function’s address when execution returns to the resolve_symbols_for_dll function, which stores it at [EDI].
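Put together, Code Listing 4 performs the lookup sketched below in Python. The sketch works on already-parsed lists rather than raw memory, and hash_fn is a hypothetical parameter standing in for the ROR-13 routine from the previous post. The made-up export data in the usage example deliberately lists names and functions in different orders to show why the ordinal list is needed.

```python
def find_function(base, names, name_ordinals, functions, target_hash, hash_fn):
    """Model of Code Listing 4 over parsed export tables.

    names         -- exported names, in AddressOfNames order
    name_ordinals -- AddressOfNameOrdinals: one AddressOfFunctions index per name
    functions     -- AddressOfFunctions: one RVA per index
    """
    ecx = len(names)                           # NumberOfNames
    while ecx != 0:                            # jecxz find_function_finished
        ecx -= 1                               # dec ecx: walk the names backwards
        if hash_fn(names[ecx]) != target_hash:
            continue                           # jnz find_function_loop
        index = name_ordinals[ecx]             # mov cx, [ebx+2*ecx]
        return base + functions[index]         # mov eax, [ebx+4*ecx] / add eax, ebp
    return None                                # no match; the assembly has no explicit "not found"


if __name__ == "__main__":
    names = ["AddAtomA", "CloseHandle", "LoadLibraryA"]  # hypothetical exports
    name_ordinals = [2, 0, 1]
    functions = [0x1100, 0x2200, 0x3300]
    # An identity "hash" compares names directly, which is handy for checking the indexing.
    address = find_function(0x10000000, names, name_ordinals, functions,
                            "LoadLibraryA", hash_fn=lambda name: name)
    print(hex(address))  # prints 0x10002200
```

Here LoadLibraryA is name number 2, its ordinal entry is 1, and AddressOfFunctions[1] holds its RVA, so indexing AddressOfFunctions with the name index directly would have returned the wrong function.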
Now, Again But 64-Bit
I’m going to dispense with the play-by-play for the 64-bit version of this assembly code, but I will point out a few of the differences. The following assembly program locates the addresses of two functions: LoadLibraryA and CreateProcessA.
```nasm
[SECTION .text]
BITS 64

_start:
    jmp main

; Constants
win32_library_hashes:
    call win32_library_hashes_return
    ; LoadLibraryA          R13
    dd 0xEC0E4E8E
    ; CreateProcessA        R13 + 0x08
    dd 0x16B3FE72

; ======== Function: find_kernel32
find_kernel32:
    push rsi
    mov rax, [gs:0x60]
    mov rax, [rax+0x18]
    mov rax, [rax+0x20]
    mov rax, [rax]
    mov rax, [rax]
    mov r11, [rax+0x20]         ; Kernel32 Base Stored in R11
    pop rsi
    ret

; ======= Function: find_function
find_function:
    mov eax, [r11+0x3C]
    mov edx, [r11+rax+0x88]
    add rdx, r11                ; RDX now points to the IMAGE_EXPORT_DIRECTORY structure
    mov ecx, [rdx+0x18]         ; ECX = Number of named exported functions
    mov ebx, [rdx+0x20]
    add rbx, r11                ; RBX = List of exported named functions
find_function_loop:
    jecxz find_function_finished
    dec ecx                     ; Going backwards
    lea rsi, [rbx+rcx*4]        ; Point RSI at offset value of the next function name
    mov esi, [rsi]              ; Put the offset value into ESI
    add rsi, r11                ; RSI now points to the exported function name
compute_hash:
    xor edi, edi                ; Zero EDI
    xor eax, eax                ; Zero EAX
    cld                         ; Reset direction flag
compute_hash_again:
    mov al, [rsi]               ; Place the next character of the function name into AL
    inc rsi                     ; Point RSI to the next character of the function name
    test al, al                 ; Test to see if the NULL terminator has been reached
    jz compute_hash_finished
    ror edi, 0x0D               ; Rotate the bits of EDI right 13 bits
    add edi, eax                ; Add EAX to EDI
    jmp compute_hash_again
compute_hash_finished:
find_function_compare:
    cmp edi, r12d               ; Compare the calculated hash to the stored hash
    jnz find_function_loop
    mov ebx, [rdx+0x24]         ; EBX contains the offset to the AddressOfNameOrdinals list
    add rbx, r11                ; RBX points to the AddressOfNameOrdinals list
    mov cx, [rbx+2*rcx]         ; CX contains the AddressOfFunctions index for the matched name
    mov ebx, [rdx+0x1C]         ; EBX contains the offset to the AddressOfFunctions list
    add rbx, r11                ; RBX points to the AddressOfFunctions list
    mov eax, [rbx+4*rcx]        ; EAX contains the offset of the desired function
    add rax, r11                ; RAX contains the address of the desired function
find_function_finished:
    ret

; ======== Function: resolve_symbols_for_dll
resolve_symbols_for_dll:
    mov r12d, [r8]              ; Move the next function hash into R12D (full 64-bit address in R8)
    add r8, 0x04                ; Point R8 to the next function hash
    call find_function
    mov [r15], rax              ; Store the resolved function address
    add r15, 0x08               ; Point to the next free space
    cmp r9, r8                  ; Check to see if the end of the hash list was reached
    jne resolve_symbols_for_dll
resolve_symbols_for_dll_finished:
    ret

main:
    sub rsp, 0x110              ; Allocate space on stack for function addresses
    mov rbp, rsp                ; Set RBP as frame pointer for relative offsets on the stack
    call find_kernel32          ; Find base address of kernel32.dll
    jmp win32_library_hashes
win32_library_hashes_return:
    pop r8                      ; R8 is the hash list location
    mov r9, r8
    add r9, 0x08                ; R9 marks the end of the hash list
    lea r15, [rbp+0x10]         ; This will be a working address used to store our function addresses
    mov r13, r15                ; R13 will be used to reference the stored function addresses
    call resolve_symbols_for_dll
    int3
```
When writing 64-bit assembly, a few instructions no longer exist or are no longer useful. PUSHAD and POPAD are not valid in 64-bit mode, so we needed to find an alternative. LODSB is still valid in 64-bit mode (it reads from [RSI] and increments RSI), but this listing replaces it with an explicit MOV AL, [RSI] and INC RSI sequence, which does exactly the same thing, only with more bytes and instructions.
64-bit assembly has many more registers available: R8 through R15 have been added, which gives us 8 new places to store things we need. With this additional storage, it was fairly easy to compensate for the missing PUSHAD and POPAD instructions. There are some considerations, however. To do anything with the function addresses we find, we will later need to call those functions, and 64-bit Windows does not use stdcall; it uses the Microsoft x64 calling convention. According to the section on parameter passing in that documentation, RCX, RDX, R8, and R9 carry the first four integer or pointer parameters, so R8 and R9 are used for the 3rd and 4th parameters. We should avoid storing anything long-term in them, and the same goes for the RCX and RDX registers.
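The register assignments described above can be summarized in a small Python sketch. It encodes the Microsoft x64 convention for integer and pointer arguments (the first four in RCX, RDX, R8, and R9, the rest on the stack above the caller’s 32-byte shadow space) along with its volatile integer registers; floating-point arguments, which use XMM registers, are left out, and the names are made up for this example.

```python
ARG_REGISTERS = ["RCX", "RDX", "R8", "R9"]
# Integer registers a called function may clobber; RBX, RBP, RSI, RDI, and R12-R15 survive calls.
VOLATILE_REGISTERS = {"RAX", "RCX", "RDX", "R8", "R9", "R10", "R11"}


def argument_location(index):
    """Where the integer/pointer argument at 0-based `index` is at the moment of the CALL."""
    if index < len(ARG_REGISTERS):
        return ARG_REGISTERS[index]
    # The 5th argument onward sits above the 32 (0x20) bytes of shadow space.
    return f"[rsp+0x{0x20 + 8 * (index - 4):X}]"


if __name__ == "__main__":
    print([argument_location(i) for i in range(6)])
    # prints ['RCX', 'RDX', 'R8', 'R9', '[rsp+0x20]', '[rsp+0x28]']
    print("R11" in VOLATILE_REGISTERS)  # prints True
```

The last line is worth noting for the 64-bit listing above, which keeps the kernel32.dll base in R11: like R8 and R9, R11 is volatile, so it would need to be saved elsewhere before calling any of the resolved functions.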
PE File Offset Differences
The offset of the PE header is still stored at offset 0x3C from the base of kernel32.dll. From there, finding the offset of the _IMAGE_EXPORT_DIRECTORY is a little different. Examining the WinDbg dump of the _IMAGE_DOS_HEADER and _IMAGE_NT_HEADERS64 structures in Figure 3, we can see that the layout is slightly different: the DataDirectory array of _IMAGE_DATA_DIRECTORY entries is located at offset 0x70 within the OptionalHeader, which itself starts at offset 0x18. That means the total offset needs to be 0x88, not 0x78 as it was in 32-bit mode. The good news is that the layout of the _IMAGE_EXPORT_DIRECTORY is still the same as in Figure 1.
Figure 3: Finding the Export Table in WinDbg
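The 0x78 versus 0x88 difference comes from the optional header format, which a parser can detect from the optional header’s Magic field (0x10B for PE32, 0x20B for PE32+). The Python sketch below, with hypothetical helper names and a synthetic header rather than a real DLL, computes the offset of the export directory entry from the PE header either way: 0x18 to skip the 4-byte signature and the 20-byte file header, plus 0x60 or 0x70 to reach the DataDirectory array.

```python
import struct

OPTIONAL_HEADER = 0x18          # "PE\0\0" signature (4 bytes) + IMAGE_FILE_HEADER (20 bytes)
DATA_DIRECTORY_OFFSET = {
    0x10B: 0x60,                # IMAGE_OPTIONAL_HEADER32 (PE32)
    0x20B: 0x70,                # IMAGE_OPTIONAL_HEADER64 (PE32+)
}


def export_entry_offset(image):
    """Offset from the PE header to DataDirectory[0] (the export directory entry)."""
    pe_header = struct.unpack_from("<I", image, 0x3C)[0]               # e_lfanew
    magic = struct.unpack_from("<H", image, pe_header + OPTIONAL_HEADER)[0]
    if magic not in DATA_DIRECTORY_OFFSET:
        raise ValueError(f"unexpected optional header magic 0x{magic:X}")
    return OPTIONAL_HEADER + DATA_DIRECTORY_OFFSET[magic]


if __name__ == "__main__":
    image = bytearray(0x200)                      # synthetic header, not a real DLL
    struct.pack_into("<I", image, 0x3C, 0x80)     # e_lfanew
    for magic in (0x10B, 0x20B):
        struct.pack_into("<H", image, 0x80 + OPTIONAL_HEADER, magic)
        print(hex(magic), hex(export_entry_offset(image)))
    # prints "0x10b 0x78" then "0x20b 0x88"
```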
It is all clear as mud, right? That’s OK if it is; it took me a while and a lot of reading to understand what was going on. Spend some time reading about the PE header and find whatever information you can on the various structures. Run the code in a debugger and see what is happening. WinDbg is excellent for getting a better understanding of the structures. The following, in no particular order, are some links I collected along the way while I was trying to understand what the above assembly code does. I hope they will help you on your journey of learning: