Recently I noticed that every few days the kernel ring buffer (dmesg) on my work laptop shows the following error:
[Jun11 23:49] mce: [Hardware Error]: Machine check events logged
However, when I look in /var/log/ there is no file named mcelog. Some older posts on the Internet recommend redirecting mcelog's output to a file, e.g. /usr/sbin/mcelog > mcelog.out, but this didn't work for me.
Make sure you have the mcelog package (as it is called in Arch) installed. To enable the daemon in systemd:
systemctl enable mcelog
When running mcelog on a Linux machine that uses systemd instead of the old syslog, you need to make some changes to /etc/mcelog/mcelog.conf.
What led me astray was the Archwiki page on MCE Handling, which recommends uncommenting the line:
syslog = yes
If you are running systemd, you do NOT want the above setting! The problem is that under systemd, system logging is handled by the journal (systemd-journald, which you read with journalctl) rather than by the old syslog. You can follow the other suggestions in the Archwiki to run mcelog as a daemon (daemon = yes), but make sure the syslog lines stay commented out. You also need to specify an output log file for MCE errors by uncommenting the following in /etc/mcelog/mcelog.conf:
logfile = /var/log/mcelog
Also uncomment the following in /etc/mcelog/mcelog.conf:
run-credentials-user = root
Restart the mcelog service:
systemctl restart mcelog
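If you want to double-check the config edits above before restarting, a small script can scan the file. This is a minimal Python sketch and is not part of mcelog. The names check_conf, parse_global_options and EXPECTED are hypothetical. It also assumes that mcelog.conf uses simple "key = value" lines with '#' comments, and that the options discussed here are global ones that appear before the first [section] header.

```python
# Hypothetical checker for the mcelog.conf edits described in this post.
# Assumes "key = value" lines, '#' comments, and that the options of
# interest appear before the first [section] header.

# Settings suggested above for a systemd machine.
EXPECTED = {
    "daemon": "yes",
    "logfile": "/var/log/mcelog",
    "run-credentials-user": "root",
}


def parse_global_options(text):
    """Return the active (uncommented) key/value pairs before any [section]."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or commented-out setting
        if line.startswith("["):
            break  # section-specific options start here
        key, sep, value = line.partition("=")
        if sep:
            options[key.strip()] = value.strip()
    return options


def check_conf(text):
    """Return a list of problems; an empty list means the settings look right."""
    options = parse_global_options(text)
    # Any active syslog* option set to yes should be commented out.
    problems = [
        f"{key} = {value} should be commented out"
        for key, value in options.items()
        if key.startswith("syslog") and value == "yes"
    ]
    for key, wanted in EXPECTED.items():
        if options.get(key) != wanted:
            problems.append(f"expected '{key} = {wanted}', found {options.get(key)!r}")
    return problems


# Example using the settings from this post.
SAMPLE = """
# syslog = yes
daemon = yes
logfile = /var/log/mcelog
run-credentials-user = root
"""

print(check_conf(SAMPLE))  # []
```

To check the real file, pass it the contents of /etc/mcelog/mcelog.conf instead of SAMPLE.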
Next time a Machine Check Event occurs, it will be written to /var/log/mcelog. Here is some sample output:
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 5
MISC b8a0000086 ADDR ffb07500
TIME 1434034181 Thu Jun 11 23:49:41 2015
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee0000000040110a MCGSTATUS 0
MCGCAP c07 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 69
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 5
MISC 78a0000086 ADDR ffb07500
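mcelog already decodes each record. If you want to pull events out of /var/log/mcelog with a script, though, the records are easy to split on the "Hardware event" header line. Below is a minimal Python sketch that assumes the plain-text layout shown above. The helper names parse_records and status_flags are hypothetical. The bit positions are the architectural MCi_STATUS flags from Intel's Software Developer's Manual (Vol. 3, machine-check chapter), and only bits 63 to 57 are decoded. For the first event it reports VAL, OVER, UC, MISCV, ADDRV and PCC. This lines up with the "MCi status" lines mcelog printed, although mcelog does not print a separate line for VAL.

```python
import re

# Architectural flag bits of IA32_MCi_STATUS (Intel SDM Vol. 3,
# machine-check architecture chapter). Lower bits are not decoded here.
STATUS_FLAGS = [
    (63, "VAL"),    # register contents valid
    (62, "OVER"),   # error overflow
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error reporting enabled
    (59, "MISCV"),  # MCi_MISC register valid
    (58, "ADDRV"),  # MCi_ADDR register valid
    (57, "PCC"),    # processor context corrupt
]

# "NAME value" pairs to pick out of each record, named as in the log.
FIELD_RE = re.compile(
    r"\b(CPU|BANK|MISC|ADDR|TIME|STATUS|MCGSTATUS|MCGCAP|APICID|SOCKETID) (\S+)"
)


def parse_records(text):
    """Split mcelog text output into one dict per 'Hardware event' record."""
    records = []
    for line in text.splitlines():
        if line.startswith("Hardware event"):
            records.append({})  # each event starts with this header line
        elif records:
            records[-1].update(FIELD_RE.findall(line))
    return records


def status_flags(status_hex):
    """Return the names of the flag bits set in a hex MCi_STATUS string."""
    status = int(status_hex, 16)
    return [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]


# Trimmed excerpt of the sample output (decoded text lines omitted).
SAMPLE = """\
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 5
MISC b8a0000086 ADDR ffb07500
TIME 1434034181 Thu Jun 11 23:49:41 2015
STATUS ee0000000040110a MCGSTATUS 0
MCGCAP c07 APICID 0 SOCKETID 0
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 5
MISC 78a0000086 ADDR ffb07500
"""

for rec in parse_records(SAMPLE):
    flags = status_flags(rec["STATUS"]) if "STATUS" in rec else []
    print(rec.get("CPU"), rec.get("BANK"), rec.get("ADDR"), flags)
```

The second record in the sample has no STATUS line, so the loop prints an empty flag list for it.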
This appears to indicate an error in the CPU's level-2 cache, going by the "Generic CACHE Level-2 Generic Error" line.
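The "Generic CACHE Level-2 Generic Error" and "corrected filtering" lines both come from the low 16 bits of STATUS, the MCA error code. In this log that code is 0x110a. Below is a minimal Python sketch of that decoding. It assumes the cache-hierarchy encoding 000F 0001 RRRR TTLL from Intel's Software Developer's Manual. The function and dictionary names are hypothetical and are not mcelog's own, and the request labels only approximate mcelog's wording.

```python
# Labels for the cache-hierarchy compound error code 000F 0001 RRRR TTLL
# (Intel SDM Vol. 3, machine-check architecture chapter).
TRANSACTION_TYPE = {0b00: "Instruction", 0b01: "Data", 0b10: "Generic"}
CACHE_LEVEL = {0b00: "Level-0", 0b01: "Level-1", 0b10: "Level-2", 0b11: "Generic"}
REQUEST = {
    0b0000: "Generic Error",
    0b0001: "Generic Read",
    0b0010: "Generic Write",
    0b0011: "Data Read",
    0b0100: "Data Write",
    0b0101: "Instruction Fetch",
    0b0110: "Prefetch",
    0b0111: "Eviction",
    0b1000: "Snoop",
}


def decode_cache_error(status):
    """Decode bits 15:0 of an MCi_STATUS value as a cache-hierarchy error.

    Returns None when the MCA error code belongs to another error class.
    """
    code = status & 0xFFFF
    # Bits 15:13 must be 000 and bits 11:8 must be 0001; bit 12 (F) is the
    # corrected-error filtering flag, so it is masked out of the class test.
    if (code & 0xEF00) != 0x0100:
        return None
    return {
        "type": TRANSACTION_TYPE.get((code >> 2) & 0b11, "Reserved"),
        "level": CACHE_LEVEL[code & 0b11],
        "request": REQUEST.get((code >> 4) & 0b1111, "Reserved"),
        "filtered": bool(code & 0x1000),
    }


# STATUS value from the sample log; its MCA error code is 0x110a.
print(decode_cache_error(0xEE0000000040110A))
```

For the sample STATUS this yields a Generic, Level-2, Generic Error result with the filtering bit set. That matches the two lines mcelog printed.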
Here is my /etc/mcelog/mcelog.conf file:
Thanks for the info. Very neat and concise.