Thermal Shutdown and disk/by-id

Postby Beeblebear » August 1st, 2012, 10:40 am

Last night, I posted a thread about modifying the Thermal Shutdown script to use UUIDs instead of kernel device names (hda, sda, etc.), so that drives being assigned different names at boot-up couldn't mess things up:

http://forum.havetheknowhow.com/viewtopic.php?p=1991#p1991

I have, however, thought of a better solution, which makes the output for logging and email content much more readable and informative.

To do this, I have used the contents of the /dev/disk/by-id directory.

Firstly, in a terminal, I enter:

Code:

~$ ls -l /dev/disk/by-id
total 0
lrwxrwxrwx 1 root root  9 Aug  1 09:13 ata-Hitachi_HDS724040ALE640_PK1310PAG0VMBJ -> ../../sdb
lrwxrwxrwx 1 root root 10 Aug  1 09:13 ata-Hitachi_HDS724040ALE640_PK1310PAG0VMBJ-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  9 Aug  1 09:13 ata-MATSHITABD-MLT_UJ240AS_WJ42_003694 -> ../../sr0
lrwxrwxrwx 1 root root  9 Aug  1 09:13 ata-OCZ-NOCTI_OCZ-F412PBYMZ7MZ4E6W -> ../../sdc
lrwxrwxrwx 1 root root 10 Aug  1 09:13 ata-OCZ-NOCTI_OCZ-F412PBYMZ7MZ4E6W-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  9 Aug  1 09:13 ata-SAMSUNG_SSD_830_Series_S0XZNEAC711934 -> ../../sda
lrwxrwxrwx 1 root root 10 Aug  1 09:13 ata-SAMSUNG_SSD_830_Series_S0XZNEAC711934-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Aug  1 09:13 ata-SAMSUNG_SSD_830_Series_S0XZNEAC711934-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Aug  1 09:13 ata-SAMSUNG_SSD_830_Series_S0XZNEAC711934-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Aug  1 09:13 dm-name-Server-root -> ../../dm-0
lrwxrwxrwx 1 root root 10 Aug  1 09:13 dm-name-Server-swap_1 -> ../../dm-1
lrwxrwxrwx 1 root root 10 Aug  1 09:13 dm-name-Server-System -> ../../dm-2
lrwxrwxrwx 1 root root 10 Aug  1 09:13 dm-uuid-LVM-Z8LZg70hTKbj7AoTEU12IP81IeP5fgLdb9h3Rs23fJ2io8zxPjbpedP4eUrC3OVw -> ../../dm-2
lrwxrwxrwx 1 root root 10 Aug  1 09:13 dm-uuid-LVM-Z8LZg70hTKbj7AoTEU12IP81IeP5fgLdG5s3dBd4rFdt34hdJoyhJxB5oCw7l6RI -> ../../dm-1
lrwxrwxrwx 1 root root 10 Aug  1 09:13 dm-uuid-LVM-Z8LZg70hTKbj7AoTEU12IP81IeP5fgLdVJQezcxe7NQVplQeVfQFMlqY4dAwq73D -> ../../dm-0
lrwxrwxrwx 1 root root  9 Aug  1 09:13 scsi-SATA_Hitachi_HDS7240_PK1310PAG0VMBJ -> ../../sdb
lrwxrwxrwx 1 root root 10 Aug  1 09:13 scsi-SATA_Hitachi_HDS7240_PK1310PAG0VMBJ-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  9 Aug  1 09:13 scsi-SATA_OCZ-NOCTI_OCZ-F412PBYMZ7MZ4E6W -> ../../sdc
lrwxrwxrwx 1 root root 10 Aug  1 09:13 scsi-SATA_OCZ-NOCTI_OCZ-F412PBYMZ7MZ4E6W-part1 -> ../../sdc1
lrwxrwxrwx 1 root root  9 Aug  1 09:13 scsi-SATA_SAMSUNG_SSD_830S0XZNEAC711934 -> ../../sda
lrwxrwxrwx 1 root root 10 Aug  1 09:13 scsi-SATA_SAMSUNG_SSD_830S0XZNEAC711934-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Aug  1 09:13 scsi-SATA_SAMSUNG_SSD_830S0XZNEAC711934-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Aug  1 09:13 scsi-SATA_SAMSUNG_SSD_830S0XZNEAC711934-part3 -> ../../sda3
lrwxrwxrwx 1 root root  9 Aug  1 09:13 wwn-0x5000cca22bc063f2 -> ../../sdb
lrwxrwxrwx 1 root root 10 Aug  1 09:13 wwn-0x5000cca22bc063f2-part1 -> ../../sdb1
lrwxrwxrwx 1 root root  9 Aug  1 09:13 wwn-0x5002538043584d30 -> ../../sda
lrwxrwxrwx 1 root root 10 Aug  1 09:13 wwn-0x5002538043584d30-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Aug  1 09:13 wwn-0x5002538043584d30-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Aug  1 09:13 wwn-0x5002538043584d30-part3 -> ../../sda3
lrwxrwxrwx 1 root root  9 Aug  1 09:13 wwn-0x5e83a97edd3455aa -> ../../sdc
lrwxrwxrwx 1 root root 10 Aug  1 09:13 wwn-0x5e83a97edd3455aa-part1 -> ../../sdc1
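
As the listing shows, each physical drive turns up several times (ata-, scsi- and wwn- aliases), along with its partitions and the LVM device-mapper entries. Here is a minimal sketch of a helper that trims this down to one ata-/scsi- name per whole disk and shows the kernel device it currently points at. The function name list_disk_ids is my own, and it assumes GNU readlink -f as found on Ubuntu. Optical drives such as sr0 will still appear.

```shell
#!/bin/bash
# List whole-disk entries in a by-id directory and the kernel device each one
# currently points at. Partition (-partN), LVM (dm-*) and wwn-* aliases are
# skipped, leaving the ata-/scsi- style names.
# The directory argument is optional and defaults to /dev/disk/by-id.
list_disk_ids() {
  local dir=${1:-/dev/disk/by-id}
  local link name
  for link in "$dir"/*; do
    [ -L "$link" ] || continue          # only symlinks (also skips an unmatched glob)
    name=${link##*/}                    # strip the directory part
    case $name in
      *-part[0-9]*|dm-*|wwn-*) continue ;;
    esac
    # readlink -f turns ../../sdb into an absolute path such as /dev/sdb
    printf '%s -> %s\n' "$name" "$(readlink -f "$link")"
  done
}
```

Paste the function into a terminal and run list_disk_ids to get one line per drive, e.g. the Hitachi's scsi- name followed by -> /dev/sdb.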


For my purposes I'm after the ID of the physical drive sdb, so the line:

lrwxrwxrwx 1 root root 9 Aug 1 09:13 scsi-SATA_Hitachi_HDS7240_PK1310PAG0VMBJ -> ../../sdb

is what I'm looking for.

The path to this symbolic link is therefore:

/dev/disk/by-id/scsi-SATA_Hitachi_HDS7240_PK1310PAG0VMBJ

So the modified Thermal Shutdown script for this method will look like:

Code:

#!/bin/bash

#PURPOSE: Script to check temperature of installed hard drives and report/hibernate if specified temperatures exceeded
#
# Modified for this server!!
#
# AUTHOR: feedback[AT]HaveTheKnowHow[DOT]com

# Expects two or three arguments:
#    1. Warning temperature
#    2. Critical (hibernate) temperature
#    3. Optional: if present, check only the drive with this /dev/disk/by-id name
#    eg. using ./DriveTemps.sh 35 45
#    will warn when the temperature of one or more drives reaches 35 degrees and hibernate when any one of them hits 45
#    eg. using ./DriveTemps.sh 35 45 scsi-SATA_Hitachi_HDS7240_PK1310PAG0VMBJ
#    will warn when the temperature of that drive reaches 35 degrees and hibernate when it hits 45

# NOTES:
#  Change the log path "/home/server/Logs/DriveWarning.Log" as required
#  Substitute "******@******" with your own email address in the line which starts "/usr/sbin/ssmtp"
#  Set MyList to the space-separated /dev/disk/by-id names of the drives to check. In this case I'm checking a single drive

# Assumes /usr/sbin/smartctl -n standby -a /dev/disk/by-id/$i returns the string 'Temperature_Celsius' somewhere

echo "JOB RUN AT $(date)"
echo '============================'
echo ''
echo 'Drive Warning Limit set to =>' $1
echo 'Drive Shutdown Limit set to =>' $2
echo ''
echo ''

if [ $# -eq 2 ]
 then
  MyList='scsi-SATA_Hitachi_HDS7240_PK1310PAG0VMBJ'
  echo 'Testing all drives'
  else
   MyList=$3
   echo 'Testing only drive with ID: '$3
fi

echo ''

for i in $MyList
 do
  echo 'Drive /dev/disk/by-id/'$i
  /usr/sbin/smartctl -n standby -a /dev/disk/by-id/$i | grep Temperature_Celsius
 done

echo ''
echo ''

for i in $MyList
 do
  #Check state of drive 'active/idle' or 'standby'
  stra=$(/sbin/hdparm -C /dev/disk/by-id/$i | grep 'drive' | awk '{print $4}')

  echo 'Testing Drive with ID: '$i

  if [ "${stra}" = 'standby' ]
   then
    echo '    Drive with ID: '$i' is in standby'
    echo ''
   else

    # Raw temperature is field 10 of the Temperature_Celsius attribute line
    str2=$(/usr/sbin/smartctl -n standby -a /dev/disk/by-id/$i | grep Temperature_Celsius | awk '{print $10}')

   if [ "${str2}" -ge "$1" ]
    then

     echo '========================================'                             >>/home/server/Logs/DriveWarning.Log
     echo $(date)                                                                >>/home/server/Logs/DriveWarning.Log
     echo ''                                                                     >>/home/server/Logs/DriveWarning.Log
     echo 'WARNING: TEMPERATURE FOR DRIVE with ID: '$i 'EXCEEDED' $1 '=>' $str2  >>/home/server/Logs/DriveWarning.Log
     echo ''                                                                     >>/home/server/Logs/DriveWarning.Log
     echo '========================================'                             >>/home/server/Logs/DriveWarning.Log

     echo '========================================'
     echo $(date)
     echo ''
     echo 'WARNING: TEMPERATURE FOR DRIVE with ID: '$i 'EXCEEDED' $1 '=>' $str2
     echo ''
     echo '========================================'

    fi

    if [ "${str2}" -ge "$2" ]
     then

      echo '========================================'                             >>/home/server/Logs/DriveWarning.Log
      echo $(date)                                                                >>/home/server/Logs/DriveWarning.Log
      echo ''                                                                     >>/home/server/Logs/DriveWarning.Log
      echo 'CRITICAL: TEMPERATURE FOR DRIVE with ID: '$i 'EXCEEDED' $2 '=>' $str2 >>/home/server/Logs/DriveWarning.Log
      echo ''                                                                     >>/home/server/Logs/DriveWarning.Log
      echo '========================================'                             >>/home/server/Logs/DriveWarning.Log

      echo '========================================'
      echo $(date)
      echo ''
      echo 'CRITICAL: TEMPERATURE FOR DRIVE with ID: '$i 'EXCEEDED' $2 '=>' $str2
      echo ''
      echo '========================================'

      # Send the email before hibernating, otherwise it only goes out after the machine resumes
      /usr/sbin/ssmtp ******@****** </home/server/Logs/DriveWarning.Log
      echo 'Email Sent.....'
      /usr/sbin/pm-hibernate
      exit
     else

      echo ''
      echo '    Temperature of Drive with ID: '$i' is OK at =>' $str2
      echo ''
    fi
   fi
  done

echo 'All Drives are within limits'
echo ''
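
The loop scrapes two values out of command output: the fourth field of hdparm -C and the tenth field of smartctl's Temperature_Celsius line. Here is a sketch of pulling those into small functions so they can be checked against captured output. The names parse_state and parse_temp are hypothetical, and the sketch assumes the line formats shown in this thread. Unlike grep piped into awk, each function stops at the first matching line, so a stray second match can't hand two numbers to the [ test.

```shell
#!/bin/bash
# Both functions read command output on stdin and print one value.

# smartctl -a prints e.g.
#   194 Temperature_Celsius 0x0002 139 139 000 Old_age Always - 43 (Min/Max 22/47)
# Field 10 is the raw temperature (43 here).
parse_temp() {
  awk '/Temperature_Celsius/ {print $10; exit}'
}

# hdparm -C prints e.g. " drive state is:  standby"
# Field 4 is the state: standby or active/idle.
parse_state() {
  awk '/drive state is:/ {print $4; exit}'
}
```

In the script these would be used as stra=$(/sbin/hdparm -C /dev/disk/by-id/$i | parse_state) and str2=$(/usr/sbin/smartctl -n standby -a /dev/disk/by-id/$i | parse_temp).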

As you can see, the script is set to hibernate the system instead of shutting it down, and it writes all output (Warning as well as Critical) to the log file.
To enable the hibernation feature, just install pm-utils:

Code:

sudo apt-get install pm-utils
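
If pm-utils is missing, the hibernate call in the critical branch would fail and the hot drive would keep running. Here is a hedged sketch of a guard that falls back to a plain halt, which is what the script did before it was changed to hibernate. The function name power_down_cmd is hypothetical. It only checks that the hibernate program exists and is executable, not that hibernation actually works on your hardware.

```shell
#!/bin/bash
# Print the command to run when the critical limit is reached: the hibernate
# program if it is installed and executable, otherwise a normal halt.
# The optional argument overrides the path that is checked.
power_down_cmd() {
  local hib=${1:-/usr/sbin/pm-hibernate}
  if [ -x "$hib" ]; then
    echo "$hib"
  else
    echo '/sbin/shutdown -h now'
  fi
}
```

In the critical branch, $(power_down_cmd) could then replace the bare /usr/sbin/pm-hibernate line. The expansion is left unquoted on purpose, so that /sbin/shutdown -h now splits into a command and its arguments.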

Running the script should give a similar result to this:

Code:

server@Server:~/Scripts$ sudo ./DriveTempShutdown.sh 40 55
[sudo] password for server:
JOB RUN AT Sat Aug  4 19:32:44 BST 2012
============================

Drive Warning Limit set to => 40
Drive Shutdown Limit set to => 55


Testing all drives

Drive /dev/disk/by-id/scsi-SATA_Hitachi_HDS7240_PK1310PAG0VMBJ
194 Temperature_Celsius     0x0002   139   139   000    Old_age   Always       -       43 (Min/Max 22/47)


Testing Drive with ID: scsi-SATA_Hitachi_HDS7240_PK1310PAG0VMBJ
========================================
Sat Aug 4 19:32:47 BST 2012

WARNING: TEMPERATURE FOR DRIVE with ID: scsi-SATA_Hitachi_HDS7240_PK1310PAG0VMBJ EXCEEDED 40 => 43

========================================

    Temperature of Drive with ID: scsi-SATA_Hitachi_HDS7240_PK1310PAG0VMBJ is OK at => 43

All Drives are within limits

All working!
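
One quirk shows up in the sample run. The drive is flagged as over the warning limit and then reported as OK at the same temperature, because the OK message lives in the else branch of the critical test. Here is a sketch of one way to give each drive a single verdict by testing the critical limit first. The function name classify_temp is hypothetical.

```shell
#!/bin/bash
# Give each reading exactly one verdict by testing the critical limit first.
# Arguments: temperature, warning limit, critical limit.
# Prints CRITICAL, WARNING or OK, or UNKNOWN when the reading is not a
# whole number (for example when smartctl printed no Temperature_Celsius line).
classify_temp() {
  local temp=$1 warn=$2 crit=$3
  case $temp in
    ''|*[!0-9]*) echo UNKNOWN; return ;;
  esac
  if [ "$temp" -ge "$crit" ]; then
    echo CRITICAL
  elif [ "$temp" -ge "$warn" ]; then
    echo WARNING
  else
    echo OK
  fi
}
```

Inside the loop, a case statement on the output of classify_temp "$str2" "$1" "$2" could then replace the two separate if tests. With this arrangement a critical reading is no longer also logged as a warning, which may or may not be what you want.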

I hope that this is as useful to others as I find it. Changing to the disk/by-id/ symlinks stops the script from breaking when the hardware changes, and also tells you exactly which drive has overheated.
Further improvements could perhaps be made by grepping the kernel device name (sd#) from the by-id link, i.e. scsi-SATA_Hitachi_HDS7240_PK1310PAG0VMBJ -> ../../sdb. Maybe something along the lines of ls -l /dev/disk/by-id | grep $i would work? Giving the volume label would also be useful; perhaps this could be grepped from /dev/disk/by-label using the result of that step (i.e. sdb):

Code:

server@Server:~/Scripts$ ls -l /dev/disk/by-label
total 0
lrwxrwxrwx 1 root root 10 Aug  4 18:46 4TB_Storage -> ../../sdb1
lrwxrwxrwx 1 root root 10 Aug  4 18:46 Recordings -> ../../sdc1
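
Both ideas can be sketched without grepping ls output. readlink -f resolves the by-id link straight to /dev/sdb, and any by-label link that resolves to that disk or one of its numbered partitions carries a label for it. The function name describe_drive is hypothetical. The partition matching assumes SATA-style names (sdb1, sdb2, ...); NVMe names such as nvme0n1p1 would need a different pattern.

```shell
#!/bin/bash
# Print the kernel device behind a by-id name, plus any volume labels found on
# that disk or its partitions. Directories default to the real /dev/disk paths.
describe_drive() {
  local id=$1
  local byid=${2:-/dev/disk/by-id}
  local bylabel=${3:-/dev/disk/by-label}
  local dev link target

  # -e follows the symlink, so this fails if the ID is unknown or dangling
  if [ ! -e "$byid/$id" ]; then
    echo "No such drive: $id" >&2
    return 1
  fi
  dev=$(readlink -f "$byid/$id")           # e.g. /dev/sdb
  printf 'Drive %s is %s\n' "$id" "$dev"

  for link in "$bylabel"/*; do
    [ -L "$link" ] || continue
    target=$(readlink -f "$link")          # e.g. /dev/sdb1
    case $target in
      # the whole disk, or the disk name followed by a partition number
      "$dev"|"$dev"[0-9]*) printf '  label %s on %s\n' "${link##*/}" "$target" ;;
    esac
  done
}
```

Given the listings in this thread, describe_drive scsi-SATA_Hitachi_HDS7240_PK1310PAG0VMBJ should report /dev/sdb and the 4TB_Storage label on /dev/sdb1, and its output could be appended to the warning log alongside the ID.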

This is, of course, all very unnecessary and tends to make everything more complicated, but every little bit of info helps us to deal with problems more quickly.
Food for thought, anyway.


Re: Thermal Shutdown and disk/by-id

Postby Ian » August 5th, 2012, 5:16 pm

Nice tip, thank you

