Wednesday, November 27, 2013

Alarm on high non-global zone memory usage

We use a lot of Solaris non-global zones for various reasons. Our tools admin enabled memory usage alarms on these non-global (ng) zones, and they alerted incorrectly because the internal resource stats are misleading. We still use native resource counters to alarm on global zones, but scripted the following bit of Perl to alarm when an ng zone exceeds a defined threshold:

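use strict;
use warnings;

# Alarm when a zone's memory usage (percent) rises above this value
my $ram_threshold = 90;

# prstat -Z prints a per-zone summary; keep the MEMORY (%) and ZONE columns
# of every non-global zone as "memory,zone"
my $zones = `prstat -Z 1 1 | /usr/sfw/bin/ggrep -A 99 ZONEID | egrep -vi 'total|global|zoneid' | awk '{ print \$5","\$8 }'`;
my @zoneSplit = split("\n", $zones);

foreach my $zone (@zoneSplit) {
    my @zone_info = split(",", $zone);
    my $zone_ram  = substr($zone_info[0], 0, -1);   # strip the trailing "%"
    my $zone_name = $zone_info[1];

    # Compare as numbers; as strings, "100" would sort below "90"
    if ($zone_ram > $ram_threshold) {
        # $ConsoleMessage is not declared in this script; it is assumed to be
        # supplied by the monitoring tool that runs it
        $ConsoleMessage->Severity("Normal");
        $ConsoleMessage->Application("zone_pool");
        $ConsoleMessage->Object("object");
        $ConsoleMessage->MsgText("$zone_name memory exceeded $ram_threshold%, value is $zone_ram%");
        $ConsoleMessage->Send();
    }
}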
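
The threshold check only works with a numeric comparison, since compared as strings "100" sorts below "90" and "9.5" sorts above it. Here is a minimal shell sketch of the same filter run over canned prstat -Z style text so it can be tried away from a Solaris box. The sample rows, the zone names and the zones_over_threshold function name are all made up for illustration; on a real host you would pipe prstat -Z 1 1 into the function instead.

```shell
#!/bin/bash
# Print "zone,percent" for every non-global zone whose MEMORY column
# exceeds the threshold given as $1. Reads prstat -Z style text on stdin.
zones_over_threshold() {
  awk -v limit="$1" '
    $1 == "ZONEID" { in_zones = 1; next }   # zone summary starts after this header
    /^Total:/      { in_zones = 0 }         # footer line ends the zone summary
    in_zones && NF >= 8 && $8 != "global" {
      pct = $5
      sub(/%$/, "", pct)                    # "9.5%" -> "9.5"
      if (pct + 0 > limit + 0)              # force a numeric comparison
        print $8 "," pct
    }
  '
}

# Hypothetical sample data, not captured from a real host.
sample='   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
  1234 root      10M    8M sleep   59    0   0:00:01 0.1% foo/1
ZONEID    NPROC  SWAP   RSS MEMORY      TIME  CPU ZONE
     0       50  500M  400M   9.5%   1:00:00 1.0% global
     1       20  900M  950M    95%   0:30:00 0.5% zoneA
     2       10  100M   90M   9.5%   0:10:00 0.2% zoneB
     3       10  100M   90M   100%   0:10:00 0.2% zoneC
Total: 90 processes, 300 lwps, load averages: 0.10, 0.10, 0.10'

printf '%s\n' "$sample" | zones_over_threshold 90
```

With the sample above only zoneA and zoneC are reported; zoneB at 9.5% stays quiet, which is exactly the case a string comparison gets wrong.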

Tuesday, November 19, 2013

Catalogue your UNIX/Linux servers via a shell script

Taking my last post one step further, I've written a script that cuts up my entire UNIX/Linux server list and outputs flat files listing Solaris servers, Linux servers, V100 and V125 Sun Fire servers, Dell and HP ProLiant servers, online and offline servers, and servers by business unit.

This is incredibly useful for automating a lot of my work, as I often need to (for example) disable a service on my Dell Linux servers but not on my HP Linux servers. The same goes for configuration changes, etc.

For those wondering, yes, I have set up a Puppet master server, and hope to manage all 370+ branch servers with it soon.

Here's the script:


#!/bin/bash

# Delete our server lists
rm -f ../server_lists/*

# Remove any non branch servers from the list

cat serverlist | egrep -vi '^cusrv|^musrv|^dusrv' > serverlist.tmp
mv serverlist.tmp serverlist

for x in $(cat serverlist)

do
  echo ""
  echo ${x}:

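  # Ping host (Solaris syntax: the trailing 2 is a timeout in seconds,
  # and a reachable host is reported as "<host> is alive")

  hostAlive=$(ping ${x} 2)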

  case "$hostAlive" in

    *alive*)  echo "Host is online, checking OS..."
  
          # Add to online list
          echo ${x} >> "../server_lists/online"
  
          # Determine Business Unit (first two characters; awk strings are 1-indexed)
          bu=$(echo ${x} | awk '{print substr($0,1,2)}')

          case $bu in

            cg|CG)  # Add to CG list 
                echo ${x} >> "../server_lists/cg"
                ;;

            cd|CD)  # Add to CD list

                echo ${x} >> "../server_lists/cd"
                ;;

            tl|TL)  # Add to TL list

                echo ${x} >> "../server_lists/tl"
                ;;

            mp|MP)  # Add to MP list

                echo ${x} >> "../server_lists/mp"
                ;;

            mt|MT)  # Add to MT list

                echo ${x} >> "../server_lists/mt"
                ;;

            cy|CY)  # Add to CY list

                echo ${x} >> "../server_lists/cy"
                ;;

            *)    # Add to other list

                echo ${x} >> "../server_lists/other"
                ;;
          esac

          # Check operating system

          os=$(ssh -oStrictHostKeyChecking=no root@${x} "uname")

          case $os in


            SunOS)  echo "SunOS detected"


            # Add to Solaris list

            echo ${x} >> "../server_lists/solaris"  

            # Check hardware

            hardware=$(ssh -oStrictHostKeyChecking=no root@${x} "/usr/sbin/prtdiag | head -1 | cut -d: -f2")

            case $hardware in

              *V100*)  echo "V100 detected"
                  echo ${x} >> "../server_lists/v100"  
                  ;;
            
              *V125*)  echo "V125 detected"
                  echo ${x} >> "../server_lists/v125"
                  ;;
            esac
            ;;

            Linux)  echo "Linux detected"


                # Add to Linux list

                echo ${x} >> "../server_lists/linux"

                # Check hardware

                hardware=$(ssh -oStrictHostKeyChecking=no root@${x} "dmidecode | grep -i vendor | cut -d: -f2")
                # ANOTHER METHOD - "lshal | grep system\.hardware\.product | cut -d\' -f2")

                case $hardware in

                  *HP*)  echo "HP ProLiant detected"
                      echo ${x} >> "../server_lists/proliant"
                      ;;

                  *Dell*) echo "Dell detected"

                      echo ${x} >> "../server_lists/dell"
                      ;;
                esac  
                ;;

            *)      echo "Some other os detected, not interested"

                ;;

          esac

          ;;

    *)    echo "Unable to reach host"

        # Add to offline list (covers "unknown host", "no answer" and any other failure)

        echo ${x} >> "../server_lists/offline"
        ;;
  esac
done

The output looks like:

# ls -la
drwxr-xr-x   2 root     other        512 Nov 19 16:01 ./
drwxr-xr-x  15 root     other        512 Nov 19 11:47 ../
-rw-r--r--   1 root     other         84 Nov 19 15:57 cd
-rw-r--r--   1 root     other         62 Nov 19 15:57 cg
-rw-r--r--   1 root     other        532 Nov 19 16:00 dell
-rw-r--r--   1 root     other       1738 Nov 19 16:10 linux
-rw-r--r--   1 root     other       1617 Nov 19 16:00 mp
-rw-r--r--   1 root     other         56 Nov 19 16:00 mt
-rw-r--r--   1 root     other       1272 Nov 19 16:02 offline
-rw-r--r--   1 root     other       8732 Nov 19 16:10 online
-rw-r--r--   1 root     other       1175 Nov 19 16:10 proliant
-rw-r--r--   1 root     other       6963 Nov 19 16:10 solaris
-rw-r--r--   1 root     other       6913 Nov 19 16:10 tl
-rw-r--r--   1 root     other       6498 Nov 19 16:10 v100
-rw-r--r--   1 root     other        465 Nov 19 16:10 v125
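
The business-unit files in that listing come from the first two characters of each hostname. As a rough sketch, the same mapping can be written without awk. The bu_list_for function name and the hostname in the example call are made up, and unlike the script above this version lower-cases the prefix first, so a mixed-case prefix such as "Cg" lands in the cg list rather than in other.

```shell
#!/bin/bash
# Map a hostname to the name of its business-unit list file.
bu_list_for() {
  local prefix
  # First two characters, lower-cased so "CG..." and "cg..." land together.
  prefix=$(printf '%s' "$1" | cut -c1-2 | tr '[:upper:]' '[:lower:]')
  case $prefix in
    cg|cd|tl|mp|mt|cy) echo "$prefix" ;;
    *)                 echo "other" ;;
  esac
}

bu_list_for CGsrv001   # hypothetical hostname
```

Inside the main loop this would be used as echo ${x} >> "../server_lists/$(bu_list_for ${x})".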

Now if I want to run a command against all v125 UNIX servers for example, I can do something like:

# for x in $(cat v125); do ssh root@${x} "foo"; done
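
If you run loops like this a lot, it may be worth wrapping them in a small function. The sketch below is one way to do it, not the way I actually run things: run_on_list and the RUNNER variable are hypothetical names, and RUNNER exists only so the loop can be dry-run with echo instead of ssh.

```shell
#!/bin/bash
# Run a command on every host in a list file.
# RUNNER is a hypothetical hook so the loop can be dry-run without ssh.
RUNNER=${RUNNER:-ssh}

run_on_list() {
  local list=$1 host
  shift
  while IFS= read -r host; do
    [ -z "$host" ] && continue            # skip blank lines
    # </dev/null keeps the remote command from eating the rest of the list
    "$RUNNER" "root@${host}" "$@" </dev/null || echo "FAILED: ${host}" >&2
  done < "$list"
}

# Dry run against a throwaway list (hostnames are made up).
tmp=$(mktemp)
printf 'host1\nhost2\n' > "$tmp"
RUNNER=echo run_on_list "$tmp" uptime
rm -f "$tmp"
```

The dry run only prints the ssh arguments it would have used. Drop the RUNNER=echo prefix and point it at a real list, e.g. run_on_list v125 "foo", to run it for real.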

Shell script to determine OS

I look after hundreds of UNIX and Linux machines, and often need to roll a change out to them, but the change might be slightly different based on the server's operating system. I often use the following script to determine the OS, and execute the appropriate commands on each server (populate serverlist with your list of servers to check):

#!/bin/bash

for x in $(cat serverlist)
do
        echo Checking ${x}:

        # Determine OS

        os=$(ssh -oStrictHostKeyChecking=no root@${x} "uname")

        case $os in

        SunOS)  echo "SunOS detected, doing something.."
                ssh -oStrictHostKeyChecking=no root@${x} "something Solaris specific"
                ;;

        Linux)  echo "Linux detected, doing something"
                ssh -oStrictHostKeyChecking=no root@${x} "something Linux specific"
                ;;

        *)      echo "Something unheard of"
                ;;

        esac

done
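
One thing to watch: if ssh fails, $os comes back empty and falls into the catch-all branch, so an unreachable host looks the same as an unknown OS. A small sketch of telling the two apart follows; classify_os is a made-up helper name, not something from the script above.

```shell
#!/bin/bash
# Turn the output of a remote `uname` into a label the caller can act on.
classify_os() {
  case $1 in
    SunOS) echo solaris ;;
    Linux) echo linux ;;
    "")    echo unreachable ;;   # ssh failed or returned nothing
    *)     echo unknown ;;
  esac
}

# Local demonstration; the result depends on the machine running it.
classify_os "$(uname)"
```

In the loop it would be called as classify_os "$(ssh -oStrictHostKeyChecking=no root@${x} uname 2>/dev/null)", with a case on the label in place of the case on $os.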

Thursday, November 14, 2013

Motivation via /etc/motd

It's about the time of year when everything and everyone bugs me. To amplify this typical November feeling, the company I work for was purchased a couple of years back, and is really ramping up the outsourcing (sorry... cost reduction initiative). So things generally suck anyhow.

To combat the feelings of despair and total lack of motivation, I thought it would be a nice idea to remind myself how many days are left until I go on Christmas break. My main Ops server doesn't have a lot of GNU packages (like GNU date), so I'll use Perl.



Script looks like this:


#!/usr/bin/perl
use strict;
use warnings;
use Time::Local;
use POSIX;

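# Current time as epoch seconds (round trip through localtime)
my @today = localtime();
my $time = timelocal(@today);

# Start of the break: sec, min, hour, day, month (0-based, so 11 = December), year
my @holiday = (0, 30, 15, 23, 11, 2013);
my $holidayTime = timelocal(@holiday);

# Seconds -> minutes -> hours -> days, rounded down to whole days
print "\nDays Until Holidays = " . floor((((($holidayTime - $time) / 60) / 60) / 24)) . "\n\n";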

exit 0;
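
The arithmetic itself is just the difference between two epoch timestamps divided by 86400 and rounded down. For the curious, here is the same calculation sketched in bash, given the two timestamps as arguments. days_until is a made-up name, and getting the timestamps in the first place is left out (without GNU date, something like perl -e 'print time' is one option).

```shell
#!/bin/bash
# Whole days from now_epoch to target_epoch, rounded down (floor),
# matching floor(($holidayTime - $time) / 86400) in the Perl version.
days_until() {
  local now=$1 target=$2 diff days
  diff=$(( target - now ))
  days=$(( diff / 86400 ))
  # Shell division truncates toward zero; step down once for negative remainders.
  if (( diff % 86400 < 0 )); then
    days=$(( days - 1 ))
  fi
  echo "$days"
}

days_until 0 86399     # less than one full day
days_until 0 172800    # exactly two days
days_until 86400 0     # target already passed
```

The three calls print 0, 2 and -1. The extra step for negative remainders only matters once the holiday has started, but it keeps the result in line with floor().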


Crontab looks like this:

# Update MOTD

0 1 * * * /usr/bin/perl /usr/local/scripts/motd/motd.pl > /etc/motd 2>&1


Output looks like this: