Wednesday, January 22, 2014

Ops Center - Orphaned LDom after live migration

UPDATE: We recently upgraded to 12.2, and I can confirm this issue is still present. The work-around listed below still works, thankfully...

Recently I was charged with upgrading the system firmware across our T4-4 fleet. One of the provisos was that no LDom could be shut down during the upgrade. This of course meant I needed to lean heavily on Oracle Enterprise Ops Center's "live migration" feature, which at best is more of a warm migration.

We are still running 12c Update 2 (not for long, hopefully), so there were many issues encountered. One of the more annoying was that after each migration, the LDom appeared to no longer be associated with a CDom. It was still associated, and nothing was actually broken; the BUI just showed it sitting at the top of the All Assets tree. And given its apparent location, I was unable to migrate it again until the hierarchy display was resolved.

An easy fix I discovered was restarting the proxy service on our colocated EC/Proxy server.

# /opt/SUNWxvmoc/bin/proxyadm stop -vw
# /opt/SUNWxvmoc/bin/proxyadm start -vw

Refresh the browser, and it's fixed. I logged an SR about this, and apparently it is resolved in Update 4 and/or the upcoming "Diamond" release. The engineer I spoke to did mention that a proxyadm restart fixes a lot of assorted BUI display issues...

Extending a Linux Virtual Disk

Something that popped up at work the other day; figured it was worth sharing given it's probably a fairly common task for a sysadmin these days.
Check current disk size/free space
# df -h

Shut the machine down
# shutdown -h now

Using Microsoft Virtual Machine Manager (2008 R2):
- Right click the machine, and click on Properties
- Select the Hardware Configuration tab
- Select the disk from the left hand pane
- check Expand virtual hard disk, enter the new disk size and click OK

It will take a minute to rewrite the configuration and expand the current fixed vhd. Once complete, power the machine on again.

Delete and recreate the partition so it spans the expanded disk. The data survives as long as the new partition starts where the old one did:
# fdisk /dev/sda (the disk, not the partition)
d (delete partition)
n (create new partition)
p (primary)
1 (partition 1)
Enter (use default first cylinder - should match the old starting point)
Enter (use default last cylinder - end of disk)
w (write new partition table)
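If you find yourself doing this often, the same answer sequence can be fed to fdisk non-interactively from a script. A minimal sketch, assuming a hypothetical device /dev/sdX (the two blank lines accept the default first and last cylinders):

```shell
# Each line answers one fdisk prompt in order:
# delete, new, primary, partition 1, default start, default end, write.
printf 'd\nn\np\n1\n\n\nw\n'     # | fdisk /dev/sdX  <- uncomment the pipe to run for real
```

Left as a bare printf here so you can inspect the answer sequence before pointing it at a real disk.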

Reboot machine:
# shutdown -r now

Resize the filesystem to fill the partition:
# resize2fs /dev/sda1

Confirm your changes have worked:
# df -h

...and do the happy dance.

Thursday, January 16, 2014

Solaris 9 shell script to find why a directory keeps filling up

We have thresholds set on certain file systems at work, and one continues to be exceeded at random intervals in the night. Rather than just bumping the threshold up, I thought I'd script something to help understand what's happening.

The threshold was < 6GB available space on /tmp (hence the 6291456 KB in the if statement). Every run it compares the snapshot from 10 minutes ago (an ls of /tmp) to the current one, and appends the difference to a file. Once available space drops below 6GB, I receive an email with the last 50 lines of that difference file. It's not the cleanest code, but you get the picture:
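For reference, the 6 GB threshold converts to the KB figure used in the script like so:

```shell
# 6 GB expressed in KB, the unit df -k reports available space in
expr 6 \* 1024 \* 1024
# 6291456
```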


#!/bin/ksh
# Compare the current /tmp listing against the one taken last run,
# and append the difference to /var/tmp/de.tmp.diff.

if [ -f /var/tmp/de.tmp.1 ]; then
        ls -lh /tmp > /var/tmp/de.tmp.2
        date >> /var/tmp/de.tmp.diff
        echo '' >> /var/tmp/de.tmp.diff
        diff /var/tmp/de.tmp.1 /var/tmp/de.tmp.2 >> /var/tmp/de.tmp.diff
        echo '' >> /var/tmp/de.tmp.diff
        echo '-----------------------------------------------' >> /var/tmp/de.tmp.diff
        rm -f /var/tmp/de.tmp.1 /var/tmp/de.tmp.2
fi

# Take the snapshot for the next run (also bootstraps the first run)
ls -lh /tmp > /var/tmp/de.tmp.1

# Fourth column of df -k output is available space in KB
avail=$(df -k /tmp | tail -1 | awk '{print $4}')

# 6291456 KB = 6 GB
if [ "$avail" -lt 6291456 ]; then
        emailMessage=$(tail -50 /var/tmp/de.tmp.diff)
        # From/recipient addresses below are placeholders - substitute your own
        echo "$emailMessage" | mailx -s "/tmp on myserver less than 6291456 KB ($avail)" -r ops@example.com admin@example.com
fi


Once I'd tested it a few times, I created a cron entry to mirror our alarm policy (execute every 10 minutes):

0,10,20,30,40,50 * * * * /usr/local/bin/

Now to wait for it to happen, and analyse the email and /var/tmp/de.tmp.diff file.
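If you want to see the snapshot-and-diff technique in action without waiting for the alarm, it can be exercised against a scratch directory. The paths and file names below are purely illustrative:

```shell
# Take a listing, change the directory, take another, and diff them.
dir=/tmp/de.demo.$$
mkdir -p "$dir"
ls -l "$dir" > /tmp/de.demo.snap1
touch "$dir/newfile"                  # simulate something landing in the directory
ls -l "$dir" > /tmp/de.demo.snap2
diff /tmp/de.demo.snap1 /tmp/de.demo.snap2 || true   # the new file shows up as an added ("> ") line
rm -rf "$dir" /tmp/de.demo.snap1 /tmp/de.demo.snap2  # clean up
```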