Syscalls on Windows 11
Reversing the Windows 11 Syscall Handler
Author: Andrea Di Dio
If you have any further questions or suggestions after reading this writeup feel free to contact me at a.didio@student.vu.nl or on Twitter (@hammertux). I will try to answer any questions or adopt any suggestions :)
Introduction
The aim of this post is to briefly explain the kernel routines involved in a Windows system call (in Windows terminology, a system service). To retrieve this information, I used Ghidra to reverse engineer the relevant parts of the ntoskrnl.exe binary and WinDbg to step through the kernel execution of a system service.
Syscalls (aka System Services)
Relevant Components
- ntdll.dll: special system support library which resides in user space and offers system service dispatch stubs to the Windows executive system services.
- Relevant MSRs:
  - STAR (0xc0000081)
  - LSTAR (0xc0000082)
  - SFMASK (0xc0000084)
- Kernel Processor Control Region (KPCR): Contains information about the current processor. Each processor has its own KPCR. It holds plenty of metadata, including a pointer (Prcb) to a KPRCB (see below).
- Kernel Processor Control Block (KPRCB): Massive structure which contains more information about the state of a processor. Notably, it holds a pointer (CurrentThread) to a KTHREAD structure that holds the state of the currently running kernel thread. The offset of this member is hardcoded to be KPCR + 0x188.
The syscall life-cycle
Syscall Dispatch:
- The user issues a call to a specific syscall (e.g., NtCreateFile()).
- This invokes the ntdll!Nt* stub, which stores the corresponding syscall number in EAX (for NtCreateFile() this is 0x55) and executes the syscall instruction to trap into the kernel.
- When the syscall instruction is executed:
  - The Code Segment (CS) selector is loaded from bits 32 to 47 of STAR, which Windows sets to 0x0010 (KGDT64_R0_CODE).
  - The Stack Segment (SS) selector is the value in bits 32 to 47 of STAR plus 8, which gives 0x0018 (KGDT64_R0_DATA).
  - The Program Counter (RIP) is saved in RCX and the new value (i.e., the address of the syscall handler) is loaded from LSTAR. This resolves to either the nt!KiSystemCall64 or the nt!KiSystemCall64Shadow function.
  - The processor flags (RFLAGS) are saved in R11 and then masked with SFMASK: every flag whose bit is set in SFMASK is cleared. Windows sets SFMASK to 0x4700 (Trap Flag, Interrupt Flag, Direction Flag, and Nested Task Flag).
  - The Stack Pointer (RSP) and all other segments (DS, ES, FS, and GS) keep their current user-space values.
- In the nt!KiSystemCall64 syscall handler, after the swapgs instruction, GS points to the Kernel Processor Control Region (KPCR).
- The current RSP is saved into the UserRsp field of the KPCR.
- The new stack pointer is loaded from the RspBase field of the KPRCB.
- Now that the kernel stack is loaded, the function builds a trap frame. This includes storing SegSs set to KGDT64_R3_DATA (0x2B), Rsp from the UserRsp field in the PCR, EFlags from R11, SegCs set to KGDT64_R3_CODE (0x33), and Rip from RCX.
- Load RCX from R10 to comply with the x64 ABI, which requires the first argument of any function to be in RCX. The ntdll stub parks the first argument in R10 because the syscall instruction overwrites RCX with the return address.
- Flush uarch buffers and temporarily disable SMAP with the stac instruction.
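The register changes performed by the syscall instruction itself can be modelled in a few lines. The following is only a sketch of the steps listed above, not kernel code: the function name, the dictionary layout, and the example addresses are hypothetical, and the RPL masking of CS follows the processor manuals rather than anything observed in ntoskrnl.exe.

```python
def syscall_transition(star, lstar, sfmask, next_rip, rflags):
    """Model the register state right after `syscall` (illustrative only)."""
    selector = (star >> 32) & 0xFFFF   # STAR bits 32..47
    return {
        "cs": selector & 0xFFFC,       # the CPU forces RPL 0 on the new CS
        "ss": selector + 8,            # SS is always the next GDT entry
        "rcx": next_rip,               # return address saved for sysret
        "r11": rflags,                 # original flags saved for sysret
        "rip": lstar,                  # handler address, e.g. KiSystemCall64
        "rflags": rflags & ~sfmask,    # flags set in SFMASK are cleared
    }


# Hypothetical values: only 0x0010 and 0x4700 come from the text above.
state = syscall_transition(
    star=0x0010 << 32,
    lstar=0xFFFFF80000001000,
    sfmask=0x4700,
    next_rip=0x00007FF000001014,
    rflags=0x246,                      # IF set, as in normal user code
)
print(hex(state["cs"]), hex(state["ss"]), hex(state["rflags"]))
```

With interrupts enabled in user mode (bit 0x200 set), the masked RFLAGS value no longer has the Interrupt Flag, which is why the handler starts with interrupts disabled.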
Returning to Userland:
Calls:
nt!KiSystemCall64
- Sets up the kernel stack and executes the stac instruction to disable SMAP.
nt!KiSystemServiceUser
- Retrieves the KTHREAD structure's address from the GS register and stores it in RBX.
- Sets the FirstArgument (from RCX) and SystemCallNumber (from EAX) fields of the KTHREAD structure.
nt!KiSystemServiceStart
- Carves two fields out of the syscall number: the Table Identifier (bits [12-13]) and the System Call Number (bits [0-11]). The first can only have the value 00 (for native syscalls, i.e., those coming from ntdll.dll) or 01 (for GUI functions, i.e., those coming from win32u.dll) as there are two 'syscall tables'. After this function, the table identifier is in EDI, already scaled to a byte offset of 0x00 or 0x20, and the true syscall number is in EAX.
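The split performed by nt!KiSystemServiceStart can be sketched as follows. This is an illustration under assumptions: the function name is hypothetical, and the shift-by-7-then-mask step is one way to turn bit 12 directly into the 0x20 offset that the later checks on EDI rely on, not a transcription of the disassembly.

```python
def split_syscall_number(eax):
    """Return (table_offset, service_number) for a raw syscall number."""
    table_offset = (eax >> 7) & 0x20   # bit 12 becomes 0x20, else 0x00
    service_number = eax & 0xFFF       # bits [0-11]
    return table_offset, service_number


# 0x55 is the NtCreateFile() number quoted earlier; 0x1023 is a made-up GUI number.
print(split_syscall_number(0x55))
print(split_syscall_number(0x1023))
```

A native call such as 0x55 yields a table offset of 0, while any number with bit 12 set yields 0x20 and therefore selects the GUI table.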
nt!KiSystemServiceRepeat
- Loads two Service Descriptor Tables (SDT) into R10 and R11, namely nt!KeServiceDescriptorTable and nt!KeServiceDescriptorTableShadow. The first contains the System Service Table, a.k.a. System Service Dispatch Table or SSDT (nt!KiServiceTable), for native system calls, while the latter holds the same table plus the System Service Table for GUI functions (win32k!W32pServiceTable).
- Checks RBX (where the current KTHREAD is stored) to see if the GuiThread bit is set in the flag member at KTHREAD + 0x78 (test dword ptr [RBX + 0x78], 80h). Note that on the first call to this function, the GuiThread flag is not set, as the kernel cannot know whether the current thread is executing a GUI function or not.
- If the GuiThread flag is not set, i.e., a native syscall has been issued from a thread, the code jumps to a check that compares EAX with the value stored at [R10 + RDI + 0x10] (i.e., the System Call Number against [nt!KeServiceDescriptorTable + Table Identifier + 0x10]). This essentially checks that the System Call Number is below nt!KiServiceTable->ServiceLimit.
- If the check fails (i.e., the syscall number is outside the valid range), the code tests whether EDI == 0x20 to determine if the function is a GUI function. If it is, the thread is converted to a GUI thread (nt!KiConvertToGuiThread) and the code jumps back to the start of nt!KiSystemServiceRepeat. If EDI != 0x20, the syscall number is out of range and the routine exits the syscall processing.
- In the case of a GUI thread, the code checks for another flag in the KTHREAD structure (test dword ptr [RBX + 0x78], 200000h), namely the RestrictedGuiThread flag. If this flag is set, the nt!KeServiceDescriptorTableFilter SSDT is loaded into R10. If not, the normal GUI SSDT is loaded (nt!KeServiceDescriptorTableShadow).
- After this code block, R10 is incremented by RDI (Table Identifier), meaning that it will hold either nt!KeServiceDescriptorTable + 0x00 (i.e., nt!KiServiceTable) for native syscalls, or nt!KeServiceDescriptorTableShadow + 0x20 (i.e., win32k!W32pServiceTable) for GUI functions (disregarding win32k filters for simplicity).
- The entry in the SSDT for the corresponding System Call Number is loaded into R11 with the instruction movsxd R11, dword ptr [R10 + RAX * 0x4], i.e., R11 = [SSDT + System Call Number * 4], where the multiplication by 4 is needed because each entry in the SSDT is 4 bytes long. Each entry packs two values. The lowest-order nibble (bits [0-3]) holds the number of arguments passed via the stack (following the x64 convention, the first 4 arguments are passed via registers RCX, RDX, R8, and R9, meaning that this part of the SSDT entry is 0 for all syscalls with at most 4 arguments). The rest of the entry (bits [4-31]) contains the Relative Virtual Address (RVA), relative to the SSDT, of the correct native function.
- At this point, the last operation this function executes is to add the RVA of the function to the base of the correct SSDT (add R10, R11), which resolves to the address of the internal routine corresponding to the syscall issued in userland. If the function is a GUI function, the last step is to load the Thread Environment Block (TEB) into R11 with the instruction mov R11, qword ptr [RBX + 0xf0]. If we are dealing with a normal NT syscall, we jump into the middle of the next function (nt!KiSystemServiceGdiTebAccess) to skip some GUI-related checks.
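The range check and the SSDT entry decoding described for nt!KiSystemServiceRepeat can be condensed into a small model. Everything named here is hypothetical (the function, the toy table, the base address). Obtaining the RVA by shifting the entry right by four bits is my reading of "bits [4-31]" and is an assumption rather than something the steps above spell out.

```python
def resolve_service(table_base, entries, service_limit, number):
    """Return (routine_address, stack_arg_count), or None if out of range."""
    if number >= service_limit:        # number must be below ServiceLimit
        return None
    entry = entries[number]            # signed 32-bit value, 4 bytes per entry
    stack_args = entry & 0xF           # bits [0-3]: arguments on the stack
    rva = entry >> 4                   # bits [4-31]: offset from the table base
    return table_base + rva, stack_args


# Toy table: two made-up services at offsets 0x1234 and 0x2000 from the base.
toy_base = 0x140000000
toy_entries = [(0x1234 << 4) | 0, (0x2000 << 4) | 3]

print(resolve_service(toy_base, toy_entries, len(toy_entries), 1))
print(resolve_service(toy_base, toy_entries, len(toy_entries), 2))
```

Python's right shift on integers is arithmetic, so an entry with the top bit set would produce a negative offset, which mirrors what the sign-extending movsxd load makes possible for routines located below the table.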
nt!KiSystemServiceGdiTebAccess
- In the case of a GUI function, at the start of the function, there is a check to see if Teb->GDIBatchCount is 0 or not (cmp dword ptr [R11 + 0x1740], 0x0). If this field is 0, it skips the next 19 instructions and jumps to the same point as for a normal NT syscall.
- Following the code path for normal NT syscalls, the function once again checks whether the syscall has arguments on the stack (and EAX, 0xF). If there are no arguments passed on the stack for the syscall, the function jumps directly to nt!KiSystemServiceCopyEnd. Otherwise, the code makes space on the stack for the maximum number of arguments and multiplies the number of arguments by 8 (shl EAX, 0x3).
- Checks that the provided arguments are in the address range reserved for user space by comparing RSI to nt!MmUserProbeAddress.
- Finally, the code loads the address of the nt!KiSystemServiceCopyEnd routine into R11 and subtracts EAX (number of args * 8) from R11. This works because the nt!KiSystemServiceCopyStart routine immediately precedes the nt!KiSystemServiceCopyEnd routine and only contains mov instructions, which are 4 bytes each. Each argument needs two mov instructions (8 bytes of code) to copy the user data into kernel space, so the subtraction lands exactly on the first mov pair required for the given argument count. The code then jumps to the value stored in R11.
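The jump-target computation described above can be written out as arithmetic. This is a sketch with hypothetical names and a made-up address for nt!KiSystemServiceCopyEnd. It only restates the reasoning that two 4-byte mov instructions per argument make the code offset equal to the argument count times 8.

```python
MOV_SIZE = 4        # bytes per mov instruction in the copy routine
MOVS_PER_ARG = 2    # one load from user space, one store to kernel space


def copy_entry_point(copy_end, ssdt_entry):
    """Return the address jumped to for a given SSDT entry."""
    stack_args = ssdt_entry & 0xF              # and EAX, 0xF
    if stack_args == 0:
        return copy_end                        # nothing to copy
    code_bytes = stack_args << 3               # shl EAX, 0x3
    assert code_bytes == stack_args * MOV_SIZE * MOVS_PER_ARG
    return copy_end - code_bytes               # lands inside the copy routine


# Hypothetical address for the end of the unrolled copy code.
copy_end = 0xFFFFF80000002000
for entry in (0x0, 0x1, 0x3):
    print(hex(copy_entry_point(copy_end, entry)))
```

Because the copy routine is fully unrolled, entering it further from its end simply executes more mov pairs, so no loop counter is needed.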
nt!KiSystemServiceCopyStart
- This function is only ever executed when the system call has arguments passed on the stack, i.e., when the syscall has more than 4 arguments.
- The sole purpose of this function is to copy the arguments from the user-space stack (in RSI) to the kernel-space buffer (in RDI).
nt!KiSystemServiceCopyEnd
- In this function, the kernel system service routine is finally called, e.g., nt!NtCreateFile(). This is done by copying R10 into RAX and calling through RAX.
nt!KiSystemServiceExit
- Cleanup and swapgs / sysret
WinDbg Flow:
Tracing a function from a usermode application in the kernel
Setup:
- Compile the uspace binary without ASLR (/DYNAMICBASE:NO).
- Note down the address of the main function of the uspace binary. It can be found with a static analysis tool like Ghidra or IDA.
In kd:
!gflag +ksl #loads kern symbols
sxe ld <uspace_binary> #breaks whenever the application <uspace_binary> is started
g #resume execution of target machine
Open the uspace_binary on the target machine. This will hit the breakpoint.
!gflag -ksl
!process -1 0 #Check that the current process is running
.process #Set the process to be the implicit process
bp <addr_main> #Set a breakpoint in the main function of the uspace_binary
g # Resume execution
Conclusion
In this short post we have explored the kernel routines which are involved in handling a system service on Windows 11 by reversing parts of the binary in order to uncover what happens when a userspace process issues such a system service.