Analyzing 0x9E BSOD–Part II

Continuing some tips from the previous blog on troubleshooting a 0x9E BSOD, I mentioned that we should use hang dump analysis tactics to this particular blue screen.

Using the same kernel dump from the blog, we checked that !exqueue /f showed several Pending I/O’s. This is a key symptom to any hang dump and besides that we need to understand what does it mean? Here it is a good read from msdn – http://msdn.microsoft.com/en-us/library/windows/hardware/ff564587(v=vs.85).aspx .

Good extract to understand it:

A DPC that needs to initiate a processing task that requires lengthy processing or that makes a blocking call should delegate the processing of that task to one or more work items. While a DPC runs, all threads are prevented from running. Additionally, a DPC, which runs at IRQL = DISPATCH_LEVEL, must not make blocking calls. However, the system worker thread that processes a work item runs at IRQL = PASSIVE_LEVEL. Thus, the work item can contain blocking calls. For example, a system worker thread can wait on a dispatcher object.

Because the pool of system worker threads is a limited resource, WorkItem and WorkItemEx routines can be used only for operations that take a short period of time. If one of these routines runs for too long (if it contains an indefinite loop, for example) or waits for too long, the system can deadlock. Therefore, if a driver requires long periods of delayed processing, it should instead call PsCreateSystemThread to create its own system thread.”

If you check the output from the exqueue /f, you notice a lot of PENDING IO which means there are no more System Worker Threads to process those, we need to check how long the threads in WAIT State are there.

Looking at the output

6: kd> !exqueue
**** Critical WorkQueue ( Threads: 26/512, Concurrency: 0/12 )
THREAD fffffa8024c5b360  Cid 0004.001c  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa8024c4b040  Cid 0004.0020  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa8024c4bb50  Cid 0004.0024  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa8024c4b660  Cid 0004.0028  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa8024c4a040  Cid 0004.002c  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa8024c4ab50  Cid 0004.0030  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa8024c4a660  Cid 0004.0034  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa8024c49040  Cid 0004.0038  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa8024c49b50  Cid 0004.003c  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa8024c49660  Cid 0004.0040  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa803d6d3b50  Cid 0004.2e10  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa8040e58b50  Cid 0004.24e8  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa804cfbdb50  Cid 0004.1e7c  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa80352a8710  Cid 0004.2d9c  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa8024c89b50  Cid 0004.1fdc  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa8036e74850  Cid 0004.3214  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa80316fdb50  Cid 0004.2028  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa8030f53b50  Cid 0004.2e34  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa8040c2db50  Cid 0004.32ec  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa8033f90670  Cid 0004.1a60  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa802fe4ba80  Cid 0004.1b20  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa80479ea040  Cid 0004.14f0  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa80311d32f0  Cid 0004.1a1c  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa803b49fb50  Cid 0004.2d54  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa8030da6640  Cid 0004.182c  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT
THREAD fffffa803c989b50  Cid 0004.26ec  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT

Dumping the first thread, but you should examine all of them to check if they have the same behavior.

6: kd> !thread fffffa8024c5b360
THREAD fffffa8024c5b360  Cid 0004.001c  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (Executive) KernelMode Non-Alertable
fffff8800252ddf0  NotificationEvent
IRP List:
fffffa802fd05790: (0006,0430) Flags: 00060043  Mdl: fffffa8055b40bd0
Not impersonating
DeviceMap                 fffff8a0000087d0
Owning Process            fffffa8024c39040       Image:         System
Attached Process          N/A            Image:         N/A
Wait Start TickCount      10307049       Ticks: 933617 (0:04:02:44.518)
Context Switch Count      949185         IdealProcessor: 1
UserTime                  00:00:00.000
KernelTime                00:00:58.983
Win32 Start Address nt!ExpWorkerThread (0xfffff800064ce4a8)
Stack Init fffff8800252edb0 Current fffff8800252d790
Base fffff8800252f000 Limit fffff88002529000 Call 0
Priority 16 BasePriority 13 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Args to Child                                                           : Call Site
fffff880`0252d7d0 fffff800`064baf72 : fffff880`0252de01 fffffa80`24c5b360 00000000`00000000 00000000`00000000 : nt!KiSwapContext+0x7a
fffff880`0252d910 fffff800`064cc39f : fffff880`0252de80 fffffa80`53852030 00000000`00000000 fffff880`0252db40 : nt!KiCommitThreadWait+0x1d2
fffff880`0252d9a0 fffff880`0182309f : fffffa80`53a53100 00000000`00000000 fffffa80`53a45700 fffff880`0141c600 : nt!KeWaitForSingleObject+0x19f
fffff880`0252da40 fffff880`0182bd74 : fffff880`0252de80 fffffa80`2fd05790 00000000`00000000 00000000`00000000 : Ntfs!NtfsNonCachedIo+0x23f
fffff880`0252dc10 fffff880`018271b5 : fffff880`0252de80 fffffa80`2fd05790 fffffa80`53a45720 00000000`03e92000 : Ntfs!NtfsNonCachedUsaWrite+0x64
fffff880`0252dca0 fffff880`01827e73 : fffff880`0252de80 fffffa80`2fd05790 fffff880`0252e000 00000000`00001000 : Ntfs!NtfsCommonWrite+0x2ca4
fffff880`0252de50 fffff880`0141dbcf : fffffa80`2fd05b30 fffffa80`2fd05790 fffffa80`54349ad0 00000000`00000000 : Ntfs!NtfsFsdWrite+0x1c3
fffff880`0252e0d0 fffff880`0141c6df : fffffa80`3073e470 fffffa80`55548570 fffffa80`3073e400 fffffa80`2fd05790 : fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x24f
fffff880`0252e160 fffff880`0141dbcf : fffffa80`2fd05b78 fffff880`0141e329 fffff880`0252e200 00000000`000c1004 : fltmgr!FltpDispatch+0xcf
fffff880`0252e1c0 fffff880`0141c6df : fffffa80`53a42380 fffffa80`53a45630 fffffa80`53a42300 fffffa80`2fd05790 : fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x24f
fffff880`0252e250 fffff800`064aa8ff : fffffa80`2fd05790 fffff880`0252e7e0 fffff880`0252e460 fffff880`009f0180 : fltmgr!FltpDispatch+0xcf
fffff880`0252e2b0 fffff800`06508adb : 00000000`03e92000 fffff880`0252e7e0 fffffa80`53a45570 00000000`00000000 : nt!IoSynchronousPageWrite+0x24f
fffff880`0252e330 fffff800`06507168 : fffff8a0`020b2490 fffff8a0`020b2498 fffffa80`53a45570 fffffa80`53a45570 : nt!MiFlushSectionInternal+0xb7b
fffff880`0252e570 fffff800`065066cc : 00000000`00001000 00000000`00001000 00000000`00001000 00000004`f47d2500 : nt!MmFlushSection+0x1f4
fffff880`0252e630 fffff880`0182b70b : fffffa80`53a4df00 00000004`00000000 00000000`00001000 00000000`00000000 : nt!CcFlushCache+0x7bc
fffff880`0252e730 fffff880`018d6dfb : fffff8a0`02090e20 00000004`f47d266e 00000000`00000000 fffff880`00000000 : Ntfs!LfsFlushLfcb+0x647
fffff880`0252e8b0 fffff880`018d8f10 : fffff8a0`020c7760 00000004`f47d266e fffff880`0252ec58 fffff880`0252ec58 : Ntfs!LfsFlushToLsnPriv+0x143
fffff880`0252e940 fffff800`06507a31 : fffff8a0`020c7760 00000004`f47d266e 00000004`f47d266e fffffa80`53a69790 : Ntfs!LfsFlushToLsn+0xa0
fffff880`0252e970 fffff800`065061d2 : fffffa80`53a69790 fffffa80`3241dd80 fffffa80`00000000 fffff880`0252eaa8 : nt!CcAcquireByteRangeForWrite+0x7e7
fffff880`0252ea50 fffff800`06509e46 : fffffa80`53a5d240 00000000`00000001 fffffa80`00000001 fffffa80`00001000 : nt!CcFlushCache+0x2c2
fffff880`0252eb50 fffff800`0650a814 : fffff800`066ce900 fffff880`0252ec58 fffffa80`30d198f0 00000002`c4310000 : nt!CcWriteBehind+0x1c6
fffff880`0252ec00 fffff800`064ce5b9 : fffffa80`24c66530 fffff800`067be300 fffff800`066ce920 00000000`00000000 : nt!CcWorkerThread+0x1c8
fffff880`0252ecb0 fffff800`06763d6a : 00000000`00000000 fffffa80`24c5b360 00000000`00000080 fffffa80`24c39040 : nt!ExpWorkerThread+0x111
fffff880`0252ed40 fffff800`064b6266 : fffff880`021b7180 fffffa80`24c5b360 fffff880`021c21c0 00000000`00000000 : nt!PspSystemThreadStartup+0x5a
fffff880`0252ed80 00000000`00000000 : fffff880`0252f000 fffff880`02529000 fffff880`0252dd30 00000000`00000000 : nt!KiStartSystemThread+0x16

We notice that the threads are doing IO against the file system and there is a IRP ( IO Request Packet ) associate with that. Another piece of information is the tick count which is absurdly high!!!  And if you dump all other threads in the output you see that the WAIT time is very similar. Another tip is to dump the SYSTEM process and look for threads with that high WAIT time associated with a Worker Thread.

Now you can dump the IRP associated with that thread:

6: kd> !irp fffffa802fd05790
Irp is active with 12 stacks 7 is current (= 0xfffffa802fd05a10)
Mdl=fffffa8055b40bd0: No System Buffer: Thread fffffa8024c5b360:  Irp stack trace.
cmd  flg cl Device   File     Completion-Context
[  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
[  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
[  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
[  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
[  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
[  0, 0]   0  0 00000000 00000000 00000000-00000000   

            Args: 00000000 00000000 00000000 00000000
>[  4,34]  1c e0 fffffa80537a1790 00000000 fffff880012831b0-00000000 Success Error Cancel
\Driver\Disk    partmgr!PmReadWriteCompletion
Args: 00001000 00000000 bbd82000 00000000
[  4, 0]  1c e0 fffffa80537a2040 00000000 fffff88001293010-fffffa80537e0e20 Success Error Cancel
\Driver\partmgr    volmgr!VmpReadWriteCompletionRoutine
Args: 613ffaf49d 00000000 bbd82000 00000000
[  4, 0]   c e0 fffffa80537e0cd0 00000000 fffff880011c41b0-fffffa803283a5d0 Success Error Cancel
\Driver\volmgr    volsnap!VspWriteContextCompletionRoutine
Args: 00001000 00000000 613ffaf49b 00000000
[  4, 0]   c e1 fffffa8053852030 00000000 fffff88001823fe4-fffff8800252dde8 Success Error Cancel pending
           \Driver\volsnap    Ntfs!NtfsMasterIrpSyncCompletionRoutine
Args: 00001000 00000000 bbd72000 00000000
[  4, 0]   0 e0 fffffa8053a55030 fffffa8053a45630 fffff88001420e40-fffffa8055548570 Success Error Cancel
\FileSystem\Ntfs    fltmgr!FltpPassThroughCompletion
Args: 00001000 00000000 03e92000 00000000
[  4, 0]   0  1 fffffa8053a42380 fffffa8053a45630 00000000-00000000    pending
\FileSystem\FltMgr
Args: 00001000 00000000 03e92000 00000000

We actually see there is some pending IO !!!

For the next blog we are going to chase down the IO and recognize in which part of the IO stack it is actually pending so we can address it.

Thanks for the company and keep debugging on,

Alessandro

Advertisements

About smartwindows

Support professional for Microsoft technologies with interest in Performance and Debugging
This entry was posted in BSOD, Hang, Performance and tagged . Bookmark the permalink.

One Response to Analyzing 0x9E BSOD–Part II

  1. Pingback: Analyzing 0x9E BSOD–Part III | SmartWindows

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s