-
Notifications
You must be signed in to change notification settings - Fork 0
Import of safte-monitor from http://oss.metaparadigm.com/safte-monitor/
andrewbasterfield/safte-monitor
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
safte-monitor - Linux SAF-TE SCSI enclosure monitor safte-monitor reads disk enclosure status information from SAF-TE capable enclosures (SCSI Accessible Fault Tolerant Enclosures). SAF-TE is common on many intelligent SCSI disk enclosures and some rackmount servers with SCSI hotswap drive bays. safte-monitor can monitor multiple SAF-TE devices and will automatically probe and detect them. The information retreived includes power supply, fan, temperature, audible alarm, drive faults, array critical / failed / rebuilding state and door lock status. safte-monitor logs changes in the status of these enclosure elements to syslog and can optionally execute an alert help program with details of the component failure. This could send a pager message for example. Temperature alert limits can also be set. safte-monitor is specifcally useful if you have equipment deployed in remote locations or unattended over weekends when no-one may hear an audible alarm. By default safte-monitor will run in the background and log to syslog. It can be run in one-shot scan mode to print found SAF-TE device info using the -p flag. It will also respond to web requests on port 8123 and display HTML output of enclosure status. usage: ./safte-monitor [-h] [-p] [-n] [-a] [-T] [-t <max_temp>] \ [-A <alert_prog>] -h show this help message -p print - print device scan information then exit -T log temperature changes -N alert for non critical state changes -A <f> program to run for alerts -t <n> max temperature (default 35.0 celcius) -n numeric sg device names eg. /dev/sg0 (default) -a alpha sg device names eg. /dev/sga By default temperatures and temperature limits are in Celcius. This can be changed to Farenheit by removing the -DUSE_CELCIUS from the Makefile. Example alert helper program: ----------------------------- #!/bin/sh echo ALERT device=$1 message=$2 system=$3 partno=$4 code=$5 | \ mail -s 'safte alert' root Example output from a scan: --------------------------- # ./safte-monitor -p SAF-TE Device ESG-SHV SCA HSBP M10 (0:0:6:0) no. of fans = 0 no. of power supplies = 1 no. of device slots = 4 door lock installed = 0 no. of temp sensors = 1 audible alarm = 0 no. of thermostats = 0 power supply 0 is okay and on device slot 0 disk present,active,unconfigured device slot 1 insert ready device slot 2 insert ready device slot 3 insert ready temp sensor 0 is 25.0 celcius and okay overall temperature is okay SAF-TE Device CNSi G8324 (2:0:0:50) no. of fans = 4 no. of power supplies = 2 no. of device slots = 6 door lock installed = 1 no. of temp sensors = 3 audible alarm = 1 no. of thermostats = 0 power supply 0 is okay and on power supply 1 is okay and on fan 0 is operational fan 1 is operational fan 2 is operational fan 3 is operational device slot 0 disk not present,no error device slot 1 disk not present,no error device slot 2 disk not present,no error device slot 3 disk present,no error device slot 4 disk present,no error device slot 5 disk present,no error door lock is locked speaker is off temp sensor 0 is 25.6 celcius and okay temp sensor 1 is 25.6 celcius and okay temp sensor 2 is 21.7 celcius and okay overall temperature is okay Example syslog messages: ------------------------ a temperature change (if option is selected to log temp changes) SAF-TE Device CNSi G8324 (2:0:0:52): temp sensor 2 changed from '23.9 degrees' to '22.8 degrees' a power supply failing SAF-TE Device CNSi G8324 (2:0:0:51): power supply 1 changed from 'okay and on' to 'malfunctioning and commanded on' SAF-TE Device CNSi G8324 (2:0:0:51): ALERT power supply 1 malfunctioning and commanded on the power supply back up again SAF-TE Device CNSi G8324 (2:0:0:51): power supply 1 changed from 'malfunctioning and commanded on' to 'okay and on' a drive in an array fails SAF-TE Device CNSi G8324 (2:0:0:51): device slot 4 changed from 'disk present,no error' to 'disk present,critical array' SAF-TE Device CNSi G8324 (2:0:0:51): ALERT device slot 4 is disk present,critical array SAF-TE Device CNSi G8324 (2:0:0:51): device slot 5 changed from 'disk present,no error' to 'disk present,faulty' SAF-TE Device CNSi G8324 (2:0:0:51): ALERT device slot 5 is disk present,faulty after a rebuild has been started SAF-TE Device CNSi G8324 (2:0:0:51): device slot 4 changed from 'disk present,critical array' to 'disk present,rebuilding,critical array' SAF-TE Device CNSi G8324 (2:0:0:51): ALERT device slot 4 is disk present,rebuilding,critical array SAF-TE Device CNSi G8324 (2:0:0:51): device slot 5 changed from 'disk present,faulty' to 'disk present,rebuilding,critical array' SAF-TE Device CNSi G8324 (2:0:0:51): ALERT device slot 5 is disk present,rebuilding,critical array and now the array is okay again SAF-TE Device CNSi G8324 (2:0:0:51): device slot 4 changed from 'disk present,rebuilding,critical array' to 'disk present,no error' SAF-TE Device CNSi G8324 (2:0:0:51): device slot 5 changed from 'disk present,rebuilding,critical array' to 'disk present,no error' Bugs ---- * Sometimes doesn't probe devices on first startup if sg module isn't loaded Please send bug reports to michael@metaparadigm.com Anonymous CVS ------------- # export CVSROOT=:pserver:anoncvs@cvs.metaparadigm.com:/cvsroot # cvs login Logging in to :pserver:anoncvs@cvs.metaparadigm.com:2401/cvsroot CVS password: <enter 'anoncvs'> # cvs co safte-monitor To do ----- * Make celcius/farenheit a command line option. * Set temp limits seperately for different enclosures or individual sensors. * implement SNMP traps for alerts - this could be done with an external alert helper program. * Add a remote interface for checking status (SNMP) * Add mon compatible service test agent * Finish off autoconfigure support * Make safte-monitor dynamically rescan for safte devices after startup to eliminate the need for a restart to the daemon * Integrate with software RAID The SAF-TE spec can be found here: http://www.intel.com/design/servers/ipmi/saf-te.htm Copyright Metaparadigm Pte. Ltd. 2001-2005. Author: Michael Clark <michael@metaparadigm.com> This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation.
About
Import of safte-monitor from http://oss.metaparadigm.com/safte-monitor/
Resources
Stars
Watchers
Forks
Packages 0
No packages published