Bugcheck 0x124: Hardware error has occurred

There are some bugcheck codes that scream the root cause right in their description, and 0x124 is one of them.

13: kd> !analyze -show 0x124
WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error condition.
Arguments:
Arg1: 0000000000000000, Machine Check Exception
Arg2: 0000000000000000, Address of the WHEA_ERROR_RECORD structure.
Arg3: 0000000000000000, High order 32-bits of the MCi_STATUS value.
Arg4: 0000000000000000, Low order 32-bits of the MCi_STATUS value.

Whenever you see this bugcheck you can be sure that some hardware component failed, and the task is to find out which one. Luckily this is rather simple today with WHEA (Windows Hardware Error Architecture).

Suppose you received the following bugcheck:

13: kd> .bugcheck
Bugcheck code 00000124
Arguments 00000000`00000005 fffffa80`0d514028 00000000`00000000 00000000`00000000
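To make the meaning of the first two arguments concrete, here is a minimal Python sketch that parses the Arguments line and maps parameter 1 to an error source. The table follows the values documented for bugcheck 0x124 (it leaves out the Itanium-only entries), and the helper names parse_bugcheck_args and describe_0x124 are made up for this illustration. In the example above parameter 1 is 5, which is a generic hardware error rather than a Machine Check Exception.

```python
import re

# Parameter 1 values for bugcheck 0x124 (WHEA error source types).
WHEA_ERROR_SOURCES = {
    0x0: "Machine check exception (MCE)",
    0x1: "Corrected machine check (CMC)",
    0x2: "Corrected platform error (CPE)",
    0x3: "Non-maskable interrupt (NMI)",
    0x4: "Uncorrectable PCI Express error",
    0x5: "Generic hardware error",
    0x6: "Initialization (INIT) error",
    0x7: "BOOT error",
    0x8: "SCI generic error",
}


def parse_bugcheck_args(line):
    """Return the four bugcheck arguments from an 'Arguments ...' line as ints."""
    tokens = re.findall(r"[0-9a-fA-F]{8}`[0-9a-fA-F]{8}", line)
    if len(tokens) != 4:
        raise ValueError("expected four 64-bit arguments, got %d" % len(tokens))
    # WinDbg prints 64-bit values with a backtick between the two halves.
    return [int(t.replace("`", ""), 16) for t in tokens]


def describe_0x124(line):
    """Return (error source description, suggested !errrec command)."""
    arg1, arg2, _, _ = parse_bugcheck_args(line)
    source = WHEA_ERROR_SOURCES.get(arg1, "unknown error source")
    return source, "!errrec %016x" % arg2


line = "Arguments 00000000`00000005 fffffa80`0d514028 00000000`00000000 00000000`00000000"
print(describe_0x124(line))
```

The second element of the result is simply the command used in the next step, built from parameter 2.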

The first step is to dump the error record structure, whose address is the second bugcheck argument, with the !errrec extension.

13: kd> !errrec fffffa800d514028
===============================================================================
Common Platform Error Record @ fffffa800d514028
-------------------------------------------------------------------------------
Record Id     : 01cfa323af4475e4
Severity      : Fatal (1)
Length        : 408
Creator       : Microsoft
Notify Type   : Generic
Timestamp     : 7/20/2014 23:56:37 (UTC)
Flags         : 0x00000000

===============================================================================
Section 0     : PCI Express
-------------------------------------------------------------------------------
Descriptor    @ fffffa800d5140a8
Section       @ fffffa800d5140f0
Offset        : 200
Length        : 208
Flags         : 0x00000001 Primary
Severity      : Fatal

Port Type     : Root Port
Version       : 1.0
Command/Status: 0x0546/0x4010
Device Id     :
VenId:DevId : 8086:3410 --> DEVICE REPORTING THE PROBLEM
Class code  : 060400
Function No : 0x00
Device No   : 0x09
Segment     : 0x0000
Primary Bus : 0x00
Second. Bus : 0x07
Slot        : 0x0000
Sec. Status   : 0x6000
Bridge Ctl.   : 0x0007
Express Capability Information @ fffffa800d514124
Device Caps : 00008021 Role-Based Error Reporting: 1
Device Ctl  : 012c UR FE nf ce
Dev Status  : 0004 ur FE nf ce
Root Ctl   : 000e FS NFS cs
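The flag strings next to Device Ctl and Dev Status can be read by hand. The low four bits of the PCI Express Device Status register are Correctable, Non-Fatal, Fatal and Unsupported Request detected, and the same bit positions in Device Control are the matching reporting enables. The sketch below decodes them in Python. It assumes, judging from the output above, that the debugger prints a flag in upper case when the bit is set and in lower case when it is clear. The function names are hypothetical.

```python
# Low four bits of the PCI Express Device Status register.
# Device Control uses the same bit positions for the reporting enables.
DEVICE_STATUS_BITS = [
    (0x1, "ce", "Correctable Error Detected"),
    (0x2, "nf", "Non-Fatal Error Detected"),
    (0x4, "fe", "Fatal Error Detected"),
    (0x8, "ur", "Unsupported Request Detected"),
]


def decode_device_status(value):
    """Return the names of the error bits that are set in Device Status."""
    return [name for mask, _, name in DEVICE_STATUS_BITS if value & mask]


def format_like_errrec(value):
    """Mimic the debugger's flag string (assumed: upper case = set)."""
    parts = []
    for mask, short, _ in reversed(DEVICE_STATUS_BITS):
        parts.append(short.upper() if value & mask else short)
    return " ".join(parts)


print(decode_device_status(0x0004))
print(format_like_errrec(0x0004))
print(format_like_errrec(0x012C))
```

For the record above, Dev Status 0004 has only the Fatal Error Detected bit set, which agrees with the Fatal severity of the section.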

If you want to confirm that the record belongs to this crash, compare the timestamp in the record with the bugcheck time reported by .time.

With the vendor and device ID you can locate the device in the output of the !pcitree extension.

Here is the section with the device:

(d=9,  f=0) 80863410 devext 0xfffffa800a2061b0 devstack 0xfffffa800a206060 0604 Bridge/PCI to PCI
Bus 0x7 (FDO Ext fffffa800a227e00)
(d=0,  f=0) 111d8061 devext 0xfffffa800a276b60 devstack 0xfffffa800a276a10 0604 Bridge/PCI to PCI
Bus 0x8 (FDO Ext fffffa800a266850)
(d=2,  f=0) 111d8061 devext 0xfffffa800a27bb60 devstack 0xfffffa800a27ba10 0604 Bridge/PCI to PCI
Bus 0x9 (FDO Ext fffffa800a273850)
(d=0,  f=0) 14e41639 devext 0xfffffa800a2801b0 devstack 0xfffffa800a280060 0200 Network Controller/Ethernet
(d=0,  f=1) 14e41639 devext 0xfffffa800a2811b0 devstack 0xfffffa800a281060 0200 Network Controller/Ethernet
(d=4,  f=0) 111d8061 devext 0xfffffa800a27cb60 devstack 0xfffffa800a27ca10 0604 Bridge/PCI to PCI
Bus 0xa (FDO Ext fffffa800a27c7b0)
(d=0,  f=0) 14e41639 devext 0xfffffa800a284b60 devstack 0xfffffa800a284a10 0200 Network Controller/Ethernet
(d=0,  f=1) 14e41639 devext 0xfffffa800a285b60 devstack 0xfffffa800a285a10 0200 Network Controller/Ethernet

In this case it is a PCI bridge, and the devices attached to it (network controllers) tell you which PCI bridge you are investigating.
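On a large server the !pcitree output can be long, so if you saved it to a text file, searching it with a script can help. The following is only a sketch: the regular expression is written against the line format shown above, it assumes the d= and f= values are hexadecimal, and the name find_devices is hypothetical. Note that !pcitree prints the vendor and device ID as one 8-digit value (80863410 for 8086:3410).

```python
import re

# Matches lines such as:
# (d=9,  f=0) 80863410 devext 0x... devstack 0x... 0604 Bridge/PCI to PCI
PCITREE_LINE = re.compile(
    r"\(d=(?P<dev>[0-9a-f]+),\s*f=(?P<func>[0-9a-f]+)\)\s+"
    r"(?P<ven>[0-9a-f]{4})(?P<did>[0-9a-f]{4})\s+"
    r"devext\s+(?P<devext>0x[0-9a-f]+)\s+devstack\s+(?P<devstack>0x[0-9a-f]+)\s+"
    r"(?P<cls>[0-9a-f]{4})\s+(?P<desc>.+)"
)


def find_devices(pcitree_text, vendor, device):
    """Return every !pcitree entry whose vendor and device ID match."""
    matches = []
    for line in pcitree_text.splitlines():
        m = PCITREE_LINE.search(line)
        if m and int(m["ven"], 16) == vendor and int(m["did"], 16) == device:
            matches.append({
                "device": int(m["dev"], 16),      # assumed to be hexadecimal
                "function": int(m["func"], 16),
                "devext": m["devext"],
                "description": m["desc"].strip(),
            })
    return matches


sample = """\
(d=9,  f=0) 80863410 devext 0xfffffa800a2061b0 devstack 0xfffffa800a206060 0604 Bridge/PCI to PCI
Bus 0x7 (FDO Ext fffffa800a227e00)
(d=0,  f=0) 111d8061 devext 0xfffffa800a276b60 devstack 0xfffffa800a276a10 0604 Bridge/PCI to PCI
"""

for hit in find_devices(sample, 0x8086, 0x3410):
    print(hit)
```

The device and function numbers of the hit (9 and 0) can then be checked against Device No and Function No in the error record.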

It is worth noting that WHEA entries are also recorded in the Event Viewer, where they are much easier to browse: Applications and Services Logs\Microsoft\Windows\Kernel-WHEA.
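If you prefer to pull those entries from a script instead of the Event Viewer UI, the wevtutil command line tool can query the log. The sketch below wraps it in Python. The channel name Microsoft-Windows-Kernel-WHEA/Errors is an assumption on my part, so confirm it with "wevtutil el" on the machine you are checking. The function names are hypothetical.

```python
import subprocess
import sys

# Assumed channel name; confirm it on the target machine with `wevtutil el`.
WHEA_CHANNEL = "Microsoft-Windows-Kernel-WHEA/Errors"


def build_query(count=5):
    """Build a wevtutil command that returns the newest `count` events as text."""
    return ["wevtutil", "qe", WHEA_CHANNEL, "/c:%d" % count, "/rd:true", "/f:text"]


def read_whea_events(count=5):
    """Run the query and return its text output (Windows only)."""
    if sys.platform != "win32":
        raise RuntimeError("wevtutil is only available on Windows")
    result = subprocess.run(
        build_query(count), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__" and sys.platform == "win32":
    print(read_whea_events())
```

Here /c limits the number of events, /rd:true returns the newest first, and /f:text asks for readable text instead of XML.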

Before replacing any hardware, make sure the devices and the motherboard are running the latest firmware from the vendor. If the issue continues, you can isolate the faulty hardware behind the bridge by removing one device at a time.

This technique is similar to the one in a previous post of mine about a PCI error, and you can use it to further identify the issue.

Thanks and good hunting,

Alessandro


About smartwindows

Support professional for Microsoft technologies with interest in Performance and Debugging