Monitoring HDD temperature
While the Antec P180 case is a favourite amongst enthusiasts, it really is a poor option for a RAID array, the bottom cage especially. It houses 4 drives and has a single fan in between the cage and the power supply, with limited space for a lot of cabling. (In my case the stock fan was replaced with a Scythe SFLEX fluid bearing fan for lower noise. The server and desktop under my desk are virtually noiseless, as I like them, but that’s another post.) The fan also has no grilles on either side. I have already lost one 500GB Seagate drive (full of data) when a cable got stuck in the fan and caused the drives to overheat. The bottom cage has no ventilation at all when the fan stops working for one reason or another. The case really should give you the ability to add another fan in front of the cage, as with the center cage. Perhaps I can jury-rig something.
That loss prompted me to create a 3x500GB Linux software RAID5, using all WD RE2 drives this time around. So far so good, until last night, when on a whim I checked the HDD temperatures (as I do from time to time, call me paranoid) and found the 3 drives in the bottom cage sitting at 60C!! I dove under my desk and found that the bottom fan was not working. It looks like the power connector got bumped out when I tidied up the cabling with velcro ties recently after upgrading some hardware. No idea how long they’ve been running like that, could be up to a few days 😦 Short of replacing the case with something better suited, I thought some proactive monitoring was in order for the short term. I installed hddtemp with yum (its output needs no parsing; otherwise I could have just used smartmontools instead) and whipped up a quick script which I run through cron every 15 minutes:
[root@gatekeeper bin]# crontab -l
15,30,45,59 * * * * /usr/local/bin/monitorhddtemp.sh
[root@gatekeeper bin]# cat monitorhddtemp.sh
#!/bin/bash
HDDS="/dev/sda /dev/sdb /dev/sdc /dev/sdd"
HDT=/usr/sbin/hddtemp
LOG=/usr/bin/logger
DOWN=/sbin/shutdown
ALERT_LEVEL=40
SHUTDOWN_LEVEL=55
for disk in $HDDS
do
  if [ -b $disk ]; then
    HDTEMP=$($HDT -n $disk)
    if [ $HDTEMP -ge $ALERT_LEVEL ]; then
      $LOG "hard disk : $disk temperature $HDTEMP°C crossed its alert limit"
      echo "hard disk : $disk temperature $HDTEMP°C crossed its alert limit" | mail -s "HDD TEMPERATURE WARNING" your@email.here
    fi
    if [ $HDTEMP -ge $SHUTDOWN_LEVEL ]; then
      $LOG "System going down as hard disk : $disk temperature $HDTEMP°C crossed its critical limit"
      sync;sync
      $DOWN -h 0
    fi
  fi
done
This script emails me when the temperature of any listed drive reaches the alert level (40C) and shuts the machine down when a drive reaches 55C, the manufacturer’s operating limit.
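One thing the threshold checks rely on is that hddtemp always prints a plain integer. If it ever prints something else (I'm assuming that can happen, for instance when a drive has no readable sensor), the numeric comparison in the shell fails and neither check fires. Here is a minimal sketch of a guard for that. The function name classify_temp and the four result words are my own invention for illustration, not part of hddtemp or of the script as I run it:

```shell
#!/bin/bash
# Hypothetical helper: classify one temperature reading.
# Usage: classify_temp READING ALERT_LEVEL SHUTDOWN_LEVEL
# Prints one of: ok, alert, critical, invalid
classify_temp() {
    local temp="$1"
    local alert="$2"
    local critical="$3"

    # Reject empty strings and anything containing a non-digit,
    # so the integer comparisons below never see bad input.
    case "$temp" in
        ''|*[!0-9]*)
            echo invalid
            return 1
            ;;
    esac

    if [ "$temp" -ge "$critical" ]; then
        echo critical
    elif [ "$temp" -ge "$alert" ]; then
        echo alert
    else
        echo ok
    fi
}
```

In the loop, the result of something like classify_temp "$($HDT -n "$disk")" "$ALERT_LEVEL" "$SHUTDOWN_LEVEL" could then drive a case statement, with the invalid branch sending its own warning mail so that a drive that stops reporting does not go unnoticed.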