author: Thomas Mingarelli <firstname.lastname@example.org>, 2009-06-04 19:50:45 +0000
committer: Wim Van Sebroeck <email@example.com>, 2009-06-18 07:32:06 +0000
[WATCHDOG] hpwdt: Add NMI sourcing
Add NMI sourcing functionality (can only be active if nmi_watchdog is inactive).

Signed-off-by: Thomas Mingarelli <firstname.lastname@example.org>
Signed-off-by: Wim Van Sebroeck <email@example.com>
Diffstat (limited to 'Documentation/watchdog')
1 files changed, 84 insertions, 0 deletions
diff --git a/Documentation/watchdog/hpwdt.txt b/Documentation/watchdog/hpwdt.txt
new file mode 100644
@@ -0,0 +1,84 @@
+Last reviewed: 06/02/2009
+ HP iLO2 NMI Watchdog Driver
+ NMI sourcing for iLO2 based ProLiant Servers
+ Documentation and Driver by
+ Thomas Mingarelli <firstname.lastname@example.org>
+ The HP iLO2 NMI Watchdog driver is a kernel module that provides basic
+ watchdog functionality and the added benefit of NMI sourcing. Both the
+ watchdog functionality and the NMI sourcing capability need to be enabled
+ by the user. Remember that the two modes are not dependent on one another.
+ A user can have the NMI sourcing without the watchdog timer and vice versa.
+ Watchdog functionality is enabled like any other common watchdog driver. That
+ is, an application needs to be started that kicks off the watchdog timer. A
+ basic application exists in the Documentation/watchdog/src directory called
+ watchdog-test.c. Simply compile the C file and kick it off. If the system
+ gets into a bad state and hangs, the HP ProLiant iLO 2 timer register will
+ not be updated in a timely fashion and a hardware system reset (also known as
+ an Automatic Server Recovery (ASR)) event will occur.
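The keepalive pattern described above can be sketched as follows. This is a minimal illustration of the generic Linux watchdog interface, not a copy of watchdog-test.c; the helper names (wdt_open, wdt_kick, wdt_kick_loop, wdt_close) are hypothetical, and the sketch assumes the driver honours the standard WDIOC_KEEPALIVE ioctl from <linux/watchdog.h>.

```c
#include <fcntl.h>
#include <linux/watchdog.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Open the watchdog device. With the generic watchdog API, opening the
 * device is what arms the timer. Returns a file descriptor, or -1. */
int wdt_open(const char *path)
{
    return open(path, O_WRONLY);
}

/* Reset ("kick") the timer so that it does not expire.
 * Returns 0 on success, -1 on failure. */
int wdt_kick(int fd)
{
    int dummy = 0;

    return ioctl(fd, WDIOC_KEEPALIVE, &dummy) == -1 ? -1 : 0;
}

/* Kick the timer up to `count` times, sleeping `interval` seconds after
 * each kick. Stops at the first failure and returns the number of
 * successful kicks. */
int wdt_kick_loop(int fd, int count, unsigned int interval)
{
    int done = 0;

    for (int i = 0; i < count; i++) {
        if (wdt_kick(fd) != 0)
            break;
        done++;
        sleep(interval);
    }
    return done;
}

/* Close the device. Drivers that support "magic close" stop the timer
 * when 'V' is written just before close(); this is an assumption about
 * the driver, and it has no effect when nowayout is set. */
void wdt_close(int fd)
{
    if (write(fd, "V", 1) != 1) {
        /* Nothing useful to do here; the timer may stay armed. */
    }
    close(fd);
}
```

A caller would typically open /dev/watchdog, call wdt_kick_loop() with an interval comfortably shorter than the configured timer value, and call wdt_close() on a clean exit. If the process hangs or is killed without disarming, the kicks stop and the hardware reset follows.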
+ The hpwdt driver also has three (3) module parameters. They are the following:
+ soft_margin - allows the user to set the watchdog timer value
+ allow_kdump - allows the user to save off a kernel dump image after an NMI
+ nowayout - basic watchdog parameter that does not allow the timer to
+ be stopped once it has been started, so an impending ASR cannot be escaped.
+ NOTE: More information about watchdog drivers in general, including the ioctl
+ interface to /dev/watchdog, can be found in
+ Documentation/watchdog/watchdog-api.txt and Documentation/IPMI.txt.
+ The NMI sourcing capability is disabled when the driver discovers that the
+ nmi_watchdog is turned on (nmi_watchdog=1). This is because the Linux kernel
+ cannot distinguish between "NMI Watchdog Ticks" and "HW generated NMI events".
+ With nmi_watchdog on, the hpwdt NMI handler code would be called each time
+ the NMI signal fires, which could amount to several thousand NMIs in a
+ matter of seconds. If a user sees the Linux kernel's "dazed and confused"
+ message in the logs or if the system gets into a hung state, then the user
+ should reboot with nmi_watchdog=0.
+ 1. If the kernel has not been booted with nmi_watchdog turned off, then
+ edit /boot/grub/menu.lst and place nmi_watchdog=0 at the end of the
+ currently booting kernel line.
+ 2. Reboot the server.
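After the reboot it can be useful to confirm that the option actually reached the kernel. The sketch below checks a kernel command line string (for example, the contents of /proc/cmdline) for the exact token nmi_watchdog=0. The function name is hypothetical, and the check only looks for the presence of the token; it does not model how the kernel resolves repeated or conflicting options.

```c
#include <string.h>

/* Return 1 if the space-separated kernel command line contains the exact
 * token "nmi_watchdog=0", otherwise 0. A trailing newline, as found in
 * /proc/cmdline, is accepted as a token terminator. */
int cmdline_has_nmi_watchdog_off(const char *cmdline)
{
    static const char token[] = "nmi_watchdog=0";
    const size_t len = sizeof(token) - 1;
    const char *p = cmdline;

    while ((p = strstr(p, token)) != NULL) {
        /* The match must begin at the start of a token ... */
        int starts_token = (p == cmdline) || p[-1] == ' ';
        /* ... and end at the end of one, so "nmi_watchdog=01" is rejected. */
        int ends_token = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';

        if (starts_token && ends_token)
            return 1;
        p += len;
    }
    return 0;
}
```

A small wrapper could read /proc/cmdline into a buffer with fgets() and pass it to this function before loading the hpwdt module.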
+ Now, the hpwdt can successfully receive and source the NMI and provide a log
+ message that details the reason for the NMI (as determined by the HP BIOS).
+ Below is a list of NMIs the HP BIOS understands along with the associated
+ code (reason):
+ No source found               00h
+ Uncorrectable Memory Error    01h
+ ASR NMI                       1Bh
+ PCI Parity Error              20h
+ NMI Button Press              27h
+ SB_BUS_NMI                    28h
+ ILO Doorbell NMI              29h
+ ILO IOP NMI                   2Ah
+ ILO Watchdog NMI              2Bh
+ Proc Throt NMI                2Ch
+ Front Side Bus NMI            2Dh
+ PCI Express Error             2Fh
+ DMA controller NMI            30h
+ Hypertransport/CSI Error      31h
+ -- Tom Mingarelli
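For readers who post-process logs, the reason codes listed in the documentation above can be turned into a simple lookup. The sketch below is illustrative only: the struct and function names are hypothetical and are not taken from the hpwdt driver, and the table contains just the codes listed in the document.

```c
#include <stddef.h>

/* One entry per NMI reason code listed in hpwdt.txt. */
struct nmi_reason {
    unsigned char code;
    const char *text;
};

static const struct nmi_reason nmi_reasons[] = {
    { 0x00, "No source found" },
    { 0x01, "Uncorrectable Memory Error" },
    { 0x1B, "ASR NMI" },
    { 0x20, "PCI Parity Error" },
    { 0x27, "NMI Button Press" },
    { 0x28, "SB_BUS_NMI" },
    { 0x29, "ILO Doorbell NMI" },
    { 0x2A, "ILO IOP NMI" },
    { 0x2B, "ILO Watchdog NMI" },
    { 0x2C, "Proc Throt NMI" },
    { 0x2D, "Front Side Bus NMI" },
    { 0x2F, "PCI Express Error" },
    { 0x30, "DMA controller NMI" },
    { 0x31, "Hypertransport/CSI Error" },
};

/* Return the description for a reason code, or NULL if the code is not
 * in the table above. */
const char *nmi_reason_text(unsigned char code)
{
    for (size_t i = 0; i < sizeof(nmi_reasons) / sizeof(nmi_reasons[0]); i++) {
        if (nmi_reasons[i].code == code)
            return nmi_reasons[i].text;
    }
    return NULL;
}
```

Returning NULL for unlisted codes (for example 2Eh, which the list skips) lets the caller decide how to report a value the table does not cover.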