PatchGuard Analysis - Part 3

August 8, 2025·0xbekoo

Triggering a check

As we saw earlier, several methods are used to set up the PatchGuard contexts. In this section, we will see how these contexts are triggered.

DPC Execution

The most frequent way to trigger a check is to use a DPC. The routine set as DeferredRoutine is picked from the following:

0 CmpEnableLazyFlushDpcRoutine
1 ExpCenturyDpcRoutine
2 ExpTimeZoneDpcRoutine
3 ExpTimeRefreshDpcRoutine
4 CmpLazyFlushDpcRoutine
5 ExpTimerDpcRoutine
6 IopTimerDispatch
7 IopIrpStackProfilerDpcRoutine
8 KiBalanceSetManagerDeferredRoutine
9 PopThermalZoneDpc
10 KiTimerDispatch OR KiDpcDispatch
11 KiTimerDispatch OR KiDpcDispatch
12 KiTimerDispatch OR KiDpcDispatch

The functions at indexes 0 to 9 use an exception handler to fire the check. KiTimerDispatch and KiDpcDispatch invoke the check directly, without any exception handler. In other words, the routines at indexes 0 to 9 still do their regular job, and PatchGuard hijacks them to hide itself.

Non-Canonical DeferredContext Pointer

The first objective is to determine whether the queued DPC is a PatchGuard DPC or an ordinary one when one of these routines is called. All of these functions take a pointer to a DPC structure as a parameter, and it is used to determine whether the DPC comes from PatchGuard.

There is also a trick. On x64 systems, virtual addresses must obey certain rules, and PatchGuard deliberately breaks one of them. Schematically, addresses look like this:

  • Canonical (normal): 0xFFFF803C98DABD1 (with the leading 0xFFFF)
  • Non-canonical (abnormal): 0x803C98DABD1 (without the leading 0xFFFF)

PatchGuard breaks the rule on purpose and stores a non-canonical address. The check for a PatchGuard DPC is therefore based on this non-canonical address.

The check looks at the KDPC.DeferredContext argument and tests whether it holds a canonical address. The check is really simple. Here’s a snippet from ExpCenturyDpcRoutine at 0x1403556C5:

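mov rax, rbx    ; pointer under test (DeferredContext)
sar rax, 2Fh    ; arithmetic shift by 47 keeps only the sign-extended upper bits
inc rax         ; a canonical pointer gives -1 or 0 here, so 0 or 1 after inc
cmp rax, 1
jbe ContextIsNotPatchGuard ; unsigned <= 1 means canonical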

mov eax,1
jmp Exit

ContextIsNotPatchGuard:
xor eax,eax

Exit:
ret
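
To make the test above easier to read, here is a minimal C sketch of the same logic. It assumes 48-bit virtual addresses, where bits 63 through 47 of a canonical pointer are either all zero or all one. The helper names is_canonical and context_is_patchguard are mine and do not exist in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Equivalent of "sar rax, 2Fh / inc rax / cmp rax, 1 / jbe":
 * bits 63..47 must be all zeros (user half) or all ones (kernel half). */
static bool is_canonical(uint64_t address)
{
    uint64_t top = address >> 47;   /* the 17 upper bits, 0 .. 0x1FFFF */
    return top == 0 || top == 0x1FFFF;
}

/* Hypothetical helper: a non-canonical DeferredContext marks a PatchGuard DPC. */
static bool context_is_patchguard(uint64_t deferred_context)
{
    return !is_canonical(deferred_context);
}
```

An ordinary kernel pointer such as 0xFFFFF80012345678 passes the canonical test, while the same value with its upper bits altered does not.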

If the DeferredContext parameter mentioned above holds a non-canonical address, KiCustomAccessRoutineX is called. ExpCenturyDpcRoutine contains an example of this call.

There are ten KiCustomAccessRoutineX functions, numbered 0 to 9. Each one calls the KiCustomRecurseRoutineX function (also numbered 0 to 9) that has the same index. We will look at these functions in the next section.


Triggering the Exception Handler

As mentioned, a KiCustomAccessRoutineX function calls a KiCustomRecurseRoutineX function with two parameters: a counter and the non-canonical DeferredContext. The counter is the value of the two lowest bits of the deferred context, plus one.

These KiCustomRecurseRoutineX functions perform a simple task in a circle: each one decrements the counter and calls the next function until the counter reaches zero. The counter handed over by KiCustomAccessRoutineX is decremented in every KiCustomRecurseRoutineX function; KiCustomRecurseRoutine0 is one example.

In pseudocode:

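// Counter derived from the two lowest bits of the non-canonical DeferredContext
int counter = (DeferredContext & 0x3) + 1;

void KiCustomRecurseRoutine1(int count);
void KiCustomRecurseRoutine2(int count);

void KiCustomRecurseRoutine0(int count) {
	if (--count == 0) {
		// BOOM! Dereference an invalid pointer to raise the exception
		*(volatile int*)0xDEADBEEF = 0;
	} else {
		KiCustomRecurseRoutine1(count);
	}
}

void KiCustomRecurseRoutine1(int count) {
	if (--count == 0) {
		// BOOM! Dereference an invalid pointer to raise the exception
		*(volatile int*)0xDEADBEEF = 0;
	} else {
		KiCustomRecurseRoutine2(count);
	}
}
...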

Until the counter reaches zero, another KiCustomRecurseRoutineX is called. The idea is that PatchGuard keeps decrementing it until, eventually, an invalid pointer is dereferenced. Depending on the original function, a combination of try/except/finally handlers eventually leads to the decryption of the PatchGuard Context Structure.
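
The chain can be simulated without touching any invalid memory. The sketch below is only a model: recurse_counter and faulting_routine are hypothetical names, and it assumes that routine 9 hands over to routine 0 (the "circle" described above), which I have not verified for every build.

```c
#include <stdint.h>

#define RECURSE_ROUTINE_COUNT 10  /* KiCustomRecurseRoutine0..9 */

/* Counter handed to the first recurse routine: two lowest bits plus one (1..4). */
static int recurse_counter(uint64_t deferred_context)
{
    return (int)(deferred_context & 0x3) + 1;
}

/* Walks the chain from routine `start` and returns the index of the routine
 * whose decrement reaches zero, i.e. the one that dereferences the bad pointer.
 * Assumption: the chain wraps around from routine 9 to routine 0. */
static int faulting_routine(int start, int counter)
{
    int index = start;
    while (--counter != 0) {
        index = (index + 1) % RECURSE_ROUTINE_COUNT;
    }
    return index;
}
```

With a counter of 1 the first routine faults immediately; with a counter of 4 the fault happens three routines further down the chain.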


PatchGuard Context Decryption

The exception handler is responsible for decrypting the first layer of the PatchGuard Context Structure. There are roughly two layers of decryption:

  • First Layer

The first layer of decryption covers the whole context structure. Several different code paths implement it, as summarized in the following list:

Index  Routine                             First-layer encryption
0      CmpEnableLazyFlushDpcRoutine        Method 1
1      ExpCenturyDpcRoutine                Method 1
2      ExpTimeZoneDpcRoutine               Method 1
3      ExpTimeRefreshDpcRoutine            Method 2
4      CmpLazyFlushDpcRoutine              Method 1
5      ExpTimerDpcRoutine                  Method 2
6      IopTimerDispatch                    Method 2
7      IopIrpStackProfilerDpcRoutine       Method 1
8      KiBalanceSetManagerDeferredRoutine  Method 1
9      PopThermalZoneDpc                   Method 2
10-12  KiTimerDispatch                     Method 1 + hardcoded key
10-12  KiDpcDispatch                       No first-layer encryption

These encryption/decryption routines use random values from KiWaitNever and KiWaitAlways. These global variables hold random values that are generated at boot time and used by KiInitPatchGuardContext to encrypt the PatchGuard Context Structure. You can check an example at address 0x1406C8E10 in PopThermalZoneDpc. Also, don’t forget that an attacker who wants to interact with the structure can use these global variables too.


  • Before the Second Layer: Layer One and a Half

Before applying the second layer of decryption, PatchGuard rewrites the four bytes at the beginning of the context structure. Ironically, these bytes are themselves code: 2E 48 31 11 encodes xor qword ptr cs:[rcx], rdx, the first instruction of the routine that will decrypt the context structure. For instance, we can see it at 0x1406C587C in ExpCenturyDpcRoutine:

mov byte ptr [r11], 2Eh
mov byte ptr [r11+1], 48h
mov byte ptr [r11+2], 31h
mov byte ptr [r11+3], 11h

At 0x1406C9064 (PopThermalZoneDpc), it instead XORs two hardcoded values:

*PatchGuardCtx = 0xAD1B6FF5 ^ 0xBC2A27DB; // Result = 0x1131482E

In ExpTimeZoneDpcRoutine, it writes a 32-bit value directly and rotates it (0x1406C9508):

mov     qword ptr [rbp+38h], 31482E11h
mov     rdx, [rbp+38h]
shl     edx, 18h
mov     rcx, [rbp+38h]
shr     rcx, 8
or      rcx, rdx
mov     [rbp+38h], rcx ; 0x1131482E
mov     rax, [rbp+38h] 
mov     [r11], eax

; Call the start of the context, which now begins with the bytes of 0x1131482E
xor     r9d, r9d
xor     r8d, r8d
mov     rdx, [rbp+40h]
mov     rcx, r11
mov     rax, r11
call    _guard_dispatch_icall_no_overrides

At this point, the use of XOR is typical of just-in-time code, and since the surrounding code is not very clear, this is a possibility. Otherwise, these “tricks” were introduced deliberately to prevent some magic values from being searchable in the code, but that does not sound difficult to overcome.
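
The three variants end up writing the same four bytes. Here is a small C sketch that checks this with the constants quoted above; the function names are mine, and the byte order is little-endian, as on x64.

```c
#include <stdint.h>

/* Variant 1: four single-byte stores (ExpCenturyDpcRoutine), assembled
 * in little-endian order. */
static uint32_t prologue_from_bytes(void)
{
    const uint8_t bytes[4] = { 0x2E, 0x48, 0x31, 0x11 };
    return (uint32_t)bytes[0]
         | (uint32_t)bytes[1] << 8
         | (uint32_t)bytes[2] << 16
         | (uint32_t)bytes[3] << 24;
}

/* Variant 2: XOR of two hardcoded constants (PopThermalZoneDpc). */
static uint32_t prologue_from_xor(void)
{
    return 0xAD1B6FF5u ^ 0xBC2A27DBu;
}

/* Variant 3: 32-bit rotate right by 8, as done with shl 18h / shr 8 / or
 * (ExpTimeZoneDpcRoutine). */
static uint32_t prologue_from_rotation(void)
{
    uint32_t value = 0x31482E11u;
    return (value >> 8) | (value << 24);
}
```

All three functions return 0x1131482E, so a signature scan for that single constant misses at least two of the three forms.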


  • Second Layer

This layer involves CmpAppendDllSection. Recall this function from the first section of the PatchGuard Context Structure: that section holds the code of CmpAppendDllSection. The function is called directly at the end of the previous decryption layer.

Essentially, this function has two parts. In the first part, it rewrites its own instructions and decrypts the very next ones:

CmpAppendDllSection proc near
; rcx points to the address of the current instruction
xor [rcx], rdx
xor [rcx+8], rdx
xor [rcx+10h], rdx
xor [rcx+18h], rdx
xor [rcx+20h], rdx
xor [rcx+28h], rdx
xor [rcx+30h], rdx
xor [rcx+38h], rdx
...

The last part is the decryption loop for the whole context structure:

loc_140BFC65F:
xor [rdx+rcx*8+0C0h], rax
ror rax, cl
btc rax, rax
loop loc_140BFC65F

After that, the context structure is ready to use.
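
To see what the loop does to the key, here is a C model of the four instructions. It is a sketch under assumptions: I take rdx as the base of the context, rcx as the number of qwords to process and rax as the initial key, and the names second_layer_pass and ror64 are mine. Because the key stream does not depend on the data, running the same pass twice with the same key restores the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit rotate right; the CPU masks the count to 6 bits for 64-bit operands. */
static uint64_t ror64(uint64_t value, unsigned count)
{
    count &= 63;
    return count ? (value >> count) | (value << (64 - count)) : value;
}

/* context:     qword array starting at rdx (assumption)
 * qword_count: initial rcx (assumption)
 * key:         initial rax (assumption)
 * Returns the key left in rax when the loop ends. */
static uint64_t second_layer_pass(uint64_t *context, size_t qword_count, uint64_t key)
{
    const size_t first = 0xC0 / sizeof(uint64_t);   /* 0xC0 bytes = 24 qwords */
    for (size_t rcx = qword_count; rcx != 0; rcx--) {
        context[first + rcx] ^= key;        /* xor [rdx+rcx*8+0C0h], rax */
        key = ror64(key, (unsigned)rcx);    /* ror rax, cl */
        key ^= 1ULL << (key & 63);          /* btc rax, rax */
    }
    return key;
}
```

Note that the loop walks downwards from the last qword and never touches the first 0xC8 bytes, which matches the idea that the beginning of the structure is handled by the self-rewriting xor sequence shown earlier.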

Passing Control to Verification Routine

After the decryption, two functions are called. The first one is called directly from the structure. This function is a copy of sub_140BCADF0 and does two things:

  • Verify the integrity of the PatchGuard Context structure and 47 Routines

Its main job is to verify the integrity of the structure and of 47 routines. For instance, the first code to be checked is the epilogue of ExpWorkerThread, which calls KeBugCheckEx at 0x1402C375B:

loc_1402C375B:
mov     r9, rsi
mov     [rsp+78h+BugCheckParameter4], 0FFFFFFFFFFFFFFFFh ; BugCheckParameter4
mov     r8, r15         ; BugCheckParameter2
mov     edx, 5          ; BugCheckParameter1
mov     ecx, 0E4h       ; BugCheckCode
call    KeBugCheckEx

The second check covers the exception handler of ExpWorkerThread, and the last check covers KeIpiGenericCall.


  • Initialize WORK_QUEUE_ITEM Structure

As we can see at 0x140BCBB8C, it initializes a WORK_QUEUE_ITEM structure:

StubIndex = Ctx + *(Ctx + 2064);
if ( (*(Ctx + 2520) & 0x8000000) != 0 ) {
    GetRandomValue = __rdtsc();
    RotatedValue = __ROR8__(GetRandomValue, 3); // Rotate the random value right by 3 bits
    HashMix = ((RotatedValue ^ GetRandomValue) * 0x7010008004002001uLL) >> 64; // Upper half of the 128-bit product
    StubIndex = KiMachineCheckControl + 16 * (((RotatedValue ^ GetRandomValue) ^ HashMix) & 0xF);
}
...

*(Ctx + 1992) = StubIndex; /* 0x140BCBB8C */
*(Ctx + 2000) = v137;
*(Ctx + 1976) = 0;
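
The stub selection in the decompiled code above boils down to a small hash of the timestamp counter. The sketch below reproduces it in portable C so the arithmetic can be checked; it takes the timestamp as a parameter instead of calling __rdtsc, the function names are mine, and unsigned __int128 is a GCC/Clang extension that I use to get the upper half of the product.

```c
#include <stdint.h>

/* __ROR8__(value, 3): 64-bit rotate right by 3. */
static uint64_t ror64_by3(uint64_t value)
{
    return (value >> 3) | (value << 61);
}

/* Returns the stub slot (0..15) derived from a timestamp value. */
static unsigned stub_slot_from_tsc(uint64_t tsc)
{
    uint64_t mixed = ror64_by3(tsc) ^ tsc;
    /* Upper 64 bits of the 128-bit product, as the ">> 64" suggests. */
    uint64_t high = (uint64_t)(((unsigned __int128)mixed * 0x7010008004002001ULL) >> 64);
    return (unsigned)((mixed ^ high) & 0xF);
}

/* Byte offset of the chosen stub from KiMachineCheckControl (16 bytes per slot). */
static uint64_t stub_offset_from_tsc(uint64_t tsc)
{
    return 16u * (uint64_t)stub_slot_from_tsc(tsc);
}
```

Whatever the timestamp is, the result stays within sixteen 16-byte slots after KiMachineCheckControl.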

The WorkerRoutine is picked from three stubs that call a verification routine as a work item. The three stubs are:

  1. As we saw previously (see method 7), a random stub picked from the KiMachineCheckControl array if method 7 is selected. In this case, the Parameter field points to the PatchGuard context.
  2. The copy of FsRtlUninitializeSmallMcb in the PatchGuard context structure. In this case, the Parameter is also the PatchGuard context.
  3. sub_1401812E0, which is only a stub that calls the deferred routine of a DPC passed as a parameter.

The second call is to ExQueueWorkItem, which receives the initialized WORK_QUEUE_ITEM structure as a parameter. We can see it at 0x140520E05 in RtlpComputeEpilogueOffset.

RtlpComputeEpilogueOffset is called by the functions involved in the DPC execution method.

Notice that it is also called by FsRtlTruncateSmallMcb, where it is called twice after the KiCustomAccessRoutine0 routine (check 0x1406C9B20 and 0x14068FC7E).

System Thread Method

As we discussed before, PatchGuard creates a system thread in method 3. Pg_InitMethod3SystemThread is called directly in KiInitPatchGuardContext.

Triggering the Exception Handler

In this method, PsCreateSystemThread is called via an exception handler. The code that triggers this exception handler is actually rather strange.

We can start our journey at address 0x140BD7F02. Here, KiInitPatchGuardContext executes the cpuid instruction:

loc_140BD7F02:
sti
xor     ecx, ecx
mov     eax, 80000008h
cpuid
mov     edi, eax
shr     edi, 8          ; Shift right by 8 bits
mov     [r14+940h], dil ; Store the result in the structure.

First, eax is loaded with 0x80000008 and cpuid is executed. The result is then shifted 8 bits to the right, and its low byte is stored at offset 0x940 of the structure.

The value 0x80000008 selects the leaf that returns linear/physical address size data. From Felix Cloutier[5]:

“Two types of information are returned: basic and extended function information. If a value entered for CPUID.EAX is higher than the maximum input value for basic or extended function for that processor then the data for the highest basic information leaf is returned. For example, using some Intel processors, the following is true: … CPUID.EAX = 80000008H ( Returns linear/physical address size data. )”

I wanted to see the result myself, so I wrote a small driver in MASM assembly:

extern DbgPrintEx:PROC

.data
	UnloadedMsg db "Driver Unloaded!",0

.code

UnloadDriver PROC
	sub rsp,28h
	lea r8,[UnloadedMsg]
	xor rdx,rdx
	xor rcx,rcx
	call DbgPrintEx
	add rsp,28h

	xor rax,rax
	ret
UnloadDriver ENDP

DriverEntry PROC
	; /* Prepare UnloadDriver */
	mov rax,rcx
	lea rcx,[UnloadDriver]
	mov qword ptr [rax+68h],rcx

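	; Throwaway test only: cpuid overwrites rbx and edi is used as scratch,
	; although rbx and rdi are nonvolatile in the x64 calling convention.
	mov eax,80000008h
	cpuid

	mov edi,eax
	shr edi,8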

	xor rax,rax
	ret
DriverEntry ENDP
END

Here’s the result: eax comes back as 0x302d. The low byte (0x2d) is the number of physical address bits and the next byte (0x30) is the number of linear address bits. After the shift, edi holds 0x30:

kd> t
cpuid!DriverEntry+0x17:
fffff800`76591035 c1ef08          shr     edi,8
kd> r edi
edi=302d
kd> t
cpuid!DriverEntry+0x1a:
fffff800`76591038 33c0            xor     eax,eax
kd> r edi 
edi=30

So, offset 0x940 contains the value 0x30.

This value is used at 0x140BFAEAD to trigger the exception in Pg_InitMethod3SystemThread:

loc_140BFAE88:
...
mov     al, [rsi+940h]  ; Get 0x30
dec     al
movzx   r10d, al        ; r10d -> 0x2F
mov     r11d, 3Fh ; '?'
sub     r11d, r10d      ; Result: 0x10
...
div     r11

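The decremented value is subtracted from 0x3F, so r11 is 0 only when the decremented address size equals 0x3F; in that case, the div raises a divide error and the exception is triggered. On this machine r11 is 0x10, so I forced it to 0 in the debugger to follow the exception path: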

Breakpoint 0 hit
nt!KiFilterFiberContext+0x29443:
fffff804`a6f5e973 49f7f3          div     rax,r11

kd> r r11
r11=0000000000000010
kd> r r11 = 0 

kd> g
Breakpoint 1 hit
nt!KiFilterFiberContext+0x2949b:
fffff804`a6f5e9cb 803dbf70370000  cmp     byte ptr [nt!KdDebuggerNotPresent (fffff804`a72d5a91)],0

The faulting div transfers control to the handler at loc_140BFAF3A, which then calls PsCreateSystemThread.
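
The arithmetic that feeds the div can be reproduced in a few lines of C. This sketch models only the instructions shown above (the elided lines may change the picture), and the function names are mine.

```c
#include <stdint.h>

/* Bits 15:8 of CPUID.80000008H:EAX hold the linear address width,
 * which is what "shr edi, 8" followed by the byte store keeps. */
static uint8_t linear_address_bits(uint32_t cpuid_eax)
{
    return (uint8_t)(cpuid_eax >> 8);
}

/* dec al / movzx r10d, al / mov r11d, 3Fh / sub r11d, r10d */
static uint32_t divisor_from_width(uint8_t width)
{
    uint8_t al = (uint8_t)(width - 1);
    return 0x3Fu - al;
}
```

With the 0x302d value returned on my test machine, the width is 0x30 and the divisor is 0x10, which matches the r11 value in the debugger output. The divisor only becomes 0 for a width of 0x40.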

The value stored at offset 0x940 is also used by FsRtlMdlReadCompleteDevEx. I have not covered this function in this article, but it is a PatchGuard routine and contains a lot of code. It seems that the value is used for the same purpose at 0x140BC462C.

The registers used for this computation are prepared at 0x140BC45B8:

loc_140BC45B8:
mov     r13d, 28h ; '('
lea     rcx, [r12+918h]
mov     r8d, r13d
lea     rdx, [rbp+8D0h+var_668]
lea     r9d, [r13-23h]
lea     r10d, [r13-27h] ; R10D: 0x1

However, I was not sure where this leads.

Creating New Thread

The system thread is created at address 0x140BFAF46.

Recall that the KI_FILTER_FIBER_PARAM structure contains the pointer to PsCreateSystemThread that PatchGuard uses. The StartContext parameter given to PsCreateSystemThread is a pointer to a new type of structure, which can be defined as follows:

struct pg_StartContext
{
	ULONG64 pEvent;  // Just a pointer to the event in the very same structure
	ULONG64 Random_ShouldRunKeRundownApcQueues;  // set at 0x140BFAE17
	ULONG64 unknown_0x10;
	KEVENT_ Event;
};

The event object is initialized before the exception handler, and in sub_14068F650, KeWaitForSingleObject waits at 0x14068F6F3 for this object to be signaled.

Note that there is no timeout (it is set to 0) the first time this method is used. Pg_InitMethod3SystemThread returns a pointer to the structure, and the event is signaled at the end of KiInitPatchGuardContext, at 0x140BF9E3F.

The decryption process then starts.

The Decryption Process

The decryption process is basically the same as the one used by DPCs: a two-stage decryption with an additional hard-coded prologue. The first stage uses KiWaitNever and KiWaitAlways, and the second stage is performed by the copy of CmpAppendDllSection, just like in the DPC case, which eventually calls the verification routine.

Post verification for this case only

Once the verification routine has ended, the context is put back into a waiting state with either KeDelayExecutionThread or KeWaitForSingleObject, but this time with a timeout set between 2 minutes and 2 minutes 10 seconds. From FsRtlMdlReadCompleteDevEx:

loc_140BC26F1:
	...
	mov r10, [rbp+8D0h+arg_8] ; Get Timeout

[...]

loc_140BC274A:
	mov r9, [rbp+8D0h+var_950] ; Get the address of KeDelayExecutionThread

loc_140BC274E:
	test r10, r10 ; If the timeout is 0, then execute KeDelayExecutionThread
	jz short loc_140BC2770

	; Call KeWaitForSingleObject
	mov rax, [rbp+8D0h+var_8F0] ; Get the address of the function
	lea r8, [rbp+8D0h+var_850]
	mov edx, r14d
	mov [rsp+9D0h+BugCheckParameter4], r10 ; Pass the timeout
	mov rcx, rsi
	call KeGuardDispatchICall

loc_140BC2770:
	; Call KeDelayExecutionThread
	xor edx, edx
	test r11, r11
	jnz short loc_140BC278A
	lea r8, [rbp+8D0h+var_850]
	xor ecx, ecx
	mov rax, r9 ; Pass the address KeDelayExecutionThread to rax
	call KeGuardDispatchICall

The timeout is initialized at 0x140BC24E1:

loc_140BC24E1:
...
mov r10, [rsi+0AB8h]
mov ecx, 2
mov eax, [rsi+9DCh]
mov r14d, [rsi+804h]
mov r11, [rsi+0A40h]
mov r12d, [rsi+828h]
mov [rbp+8D0h+arg_8], r10

The addresses of these functions are loaded at 0x140BC252F, and the randomized variables used for these calls are prepared at the same address:

loc_140BC252F:
mov rax, [rsi+2C8h] ; Get the Address of KeWaitForSingleObject
mov r9, [rsi+170h]  ; Get the address of KeDelayExecutionThread
mov [rbp+8D0h+var_8D8], rax
mov rax, [rsi+340h]
mov [rbp+8D0h+var_8F0], rax
mov [rbp+8D0h+var_950], r9

APC insertion

As we saw previously, the fourth method inserts an APC into a system thread's queue. The system thread must have the pointer to PopIrpWorkerControl as its StartAddress parameter (see 0x140BFB2AF).

Also recall that the KernelRoutine parameter given to KiInsertQueueApc is KiDispatchCallout.

Similar to the DPC and system thread methods, this method uses a two-stage decryption process and rewrites the beginning of the context with a fixed XOR value. This approach is quite fast because APCs are delivered quickly. However, unlike the previous methods, it waits during verification to make sure a minimum time has passed (between 2 minutes and 2 minutes 10 seconds) before continuing.

KiSwInterruptDispatch Method

This method uses the Global PatchGuard Context Structure, which is in cleartext. This means that there is no decryption process and the verification routine is called directly at some point in KiSwInterrupt.

Breadcrumbs

Breadcrumbs methods are special because they work automatically without specific code to start their checks. However, they do not run all the time. For instance, recall CcInitializeBcbProfiler. It either queues a work item using a related function or continues running on its own. The other two verification functions, PspProcessDelete and KiInitializeUserApc, only use a simple timer (just a time counter stored in a global variable, not the TIMER structure) to control when they run.
