Friday, November 7, 2014

Convert a list of decimal numbers into hex (Solaris/UNIX)

Another one-liner that came in handy today:

for x in $(cat my_dec_list); do echo "obase=16; ${x}" | bc; done >> my_hex_list
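If bc isn't available, printf can do the same conversion; a small sketch as a function (the function name is mine):

```shell
# dec2hex: read decimal numbers on stdin, print uppercase hex, one per line
dec2hex() {
    while read -r x; do
        printf '%X\n' "${x}"
    done
}

# e.g.: dec2hex < my_dec_list >> my_hex_list
```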

Total size of all disks presented to a UNIX server

I needed a way to add up the total storage presented to a few UNIX machines. Here's what I came up with:

# iostat -En | grep Size | awk '{print $6}' | sed 's/GB//' | awk '{s+=$1} END {print "Total Storage: " s "GB"}'

The result is:

Total Storage: 8398.69GB
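The summing idiom generalizes to any column of "NNNGB" values; a small sketch (the function name is mine):

```shell
# sum_gb: read sizes like "500.5GB" one per line, print the total
sum_gb() {
    awk '{ gsub(/GB/, "", $1); s += $1 } END { printf "Total Storage: %.2fGB\n", s }'
}
```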

Wednesday, October 8, 2014

Send an email when a Windows process completes

I was running a process that would take a couple of days to complete and wanted to be notified via email when it finished. The following batch file did the trick. Note: you will need to download and extract the open-source blat mail tool ( http://sourceforge.net/projects/blat/ ). The script uses a stop file, so that once the process completes and it has emailed you, it won't keep emailing you (if, for example, you have the check scheduled every 5 minutes):


@ECHO OFF

IF EXIST c:\scripts\script.stop (
    echo Stop file exists, exiting...
    exit /b
)

REM find returns 0 when the process is listed, 1 when it is not
tasklist /FI "IMAGENAME eq your_process.exe" 2>NUL | find /I /N "your_process">NUL

REM %ERRORLEVEL% inside a parenthesised block is expanded at parse time, so
REM test it with "if errorlevel" instead (and never SET ERRORLEVEL, which
REM masks the dynamic variable)
if errorlevel 1 (
    c:\scripts\blat.exe -server YOUR_EMAIL_SERVER -port MAIL_PORT -f FROM_EMAIL_ADDRESS -t YOUR_EMAIL_ADDRESS -s "YOUR SUBJECT" -body "YOUR BODY OF TEXT"
    echo. 2>c:\scripts\script.stop
) else (
    echo Program is running
)

Wednesday, September 24, 2014

Move a file into a directory of the same name

A need arose recently at $work for a script to search a directory for all PST files, create a directory with the same name as each file (excluding the file extension), and move each file into its matching directory.

This was accomplished using PowerShell:

Get-ChildItem . *.pst | ForEach-Object {
    $cur = $_.BaseName
    New-Item -Type Directory $cur
    Move-Item $_.Name $cur
}

Display the start and finish time of a batch file

Not an overly difficult task, but something useful I now use when a batch file is required in Windows land.

To display the start time of a script:

@ECHO OFF
setlocal enableDelayedExpansion

echo Started: !DATE! - !TIME!

The enableDelayedExpansion line stops the script from expanding the date and time variables when the script is parsed, and instead expands them at execution time. So you can add the above to the start of the script, and the following at the end, to display the date and time just before execution completes:

echo Finished: !DATE! - !TIME!

Tuesday, September 2, 2014

Display the second instance of a word

A quick post to share a handy command that displays the second instance of a word. For this example I was trying to quickly get the IP address for a list of switches by using nslookup and printing only the device name and its IP (not other details like the DNS server).

for x in `cat /tmp/switch.list`; do echo ${x}:; nslookup ${x} | awk '/Address/{i++}i==2'; done
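The awk part works by counting /Address/ matches and printing only once the counter hits 2 (skipping the DNS server's own Address line); a sketch on illustrative nslookup-style output:

```shell
# second_address: print lines from the second "Address" match onward
second_address() {
    awk '/Address/{ i++ } i == 2'
}
```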

Wednesday, July 16, 2014

Delegating Failover Cluster access

Recently I had to delegate Failover Cluster access to a non-Domain-Admin user. I couldn't find a lot of information on the subject, so after a little trial and error, I found the following works fine:

1) Right-click the cluster in Failover Cluster Manager and select Properties

2) Select the Cluster Permissions tab

3) Add the user or group that requires access

4) Select the newly added user/group, check "Read", then click OK

5) RDP to each of the cluster nodes and grant this user (or group) local Administrator access.

6) Do the happy dance.
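If you'd rather script step 4, the FailoverClusters PowerShell module appears to offer the same read-only grant; a hedged sketch (the group name is a placeholder):

```powershell
Import-Module FailoverClusters

# Grant read-only cluster access to a (placeholder) domain group
Grant-ClusterAccess -User "YOURDOMAIN\ClusterViewers" -ReadOnly
```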

Monday, May 26, 2014

in.dhcpd[1099]: [ID 603263 daemon.notice] No more IP addresses

I came across this one at $work on a Solaris 9 server, and I must admit I was a little stumped. The full error is something like:

in.dhcpd[1099]: [ID 603263 daemon.notice] No more IP addresses on 192.168.254.0 network ()

It appeared when an additional SunRay terminal was plugged into the network. These are SunRay servers, so I checked how utadm was configured:


# utadm -l
LAN connections: off
Subnetwork: 192.168.254.0
        Interface=      dmfe1 (192.168.254.1)
        Netmask=        255.255.255.0
        Broadcast=      192.168.254.255
        Router=         192.168.254.1
        AuthSrvr=       192.168.254.1
        FirmwareSrvr=   192.168.254.1
        NewTver=        3.1_120879-06_2007.03.13.15.14
        IP assignment=  10/239 (192.168.254.100)


Notice the last line: it specifies the network, 192.168.254.100.

So I checked DHCP leases:

#  pntadm -P 192.168.254.100

I found a bunch of leases displaying a lease expiration of 10+ years ago. They looked like:


0100144F6F70EE  00      192.168.254.194 10.16.16.40     04/02/2004       SunRay-dmfe1


So the fix was to remove the leases that are no longer in use and re-add the addresses to the DHCP table for new leases:

(Note: 192.168.254.100 is the network, and SunRay-dmfe1 is the DHCP macro name)

Ping the list to make sure they're not active:

# for x in $(pntadm -P 192.168.254.100 | grep 2004 | cut -f 3); do echo ${x}:; ping ${x} 2; done

If no replies, delete and re-add the IP address:

# for x in $(pntadm -P 192.168.254.100 | grep 2004 | cut -f 3); do echo Removing ${x}:; pntadm -D ${x} 192.168.254.100; echo Adding ${x}:; pntadm -A ${x} -m SunRay-dmfe1 192.168.254.100; done
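The extraction step both loops rely on (grab the third tab-separated field of pntadm lines dated 2004) can be sketched and sanity-checked on sample data (the function name is mine):

```shell
# expired_ips: print the client IP (3rd tab-separated field) of
# pntadm -P lines whose lease dates from 2004
expired_ips() {
    grep 2004 | cut -f 3
}
```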

Had the site plug the new terminal back in to the network and it now works!

Happy days...


Tuesday, April 15, 2014

RDP to Windows 2012 Server - The Local Security Authority cannot be contacted.

I recently came across a problem where if you attempted to RDP to a Windows 2012 server you received the following error:

An authentication error has occurred.
The Local Security Authority cannot be contacted

Remote computer: hostname


Turns out by default we were building 2012 servers with NLA (Network Level Authentication) turned on. This caused a few issues with RDP connections from our VPN support accounts, and of course with RDP'ing to machines when your password has expired (or is set to 'User must change password at next logon').

Now this is a security vs. convenience trade-off so you need to decide if turning this off is the right thing to do in your environment. For us, turning it off on a couple of key management servers would reduce the nightmare of admins being prevented from logging in when their passwords expire.


To do this, press Windows+R (to get a Run box) and execute sysdm.cpl. This opens the System Properties screen. Click the Remote tab at the top, uncheck "Allow connections only from computers running Remote Desktop with Network Level Authentication (Recommended)" and click OK.


Wednesday, April 2, 2014

Deploying the Puppet Agent on Solaris 9

I manage ~300 UNIX/Linux servers responsible for serving up POS sessions at our stores. Around 200 of these run Solaris 9. The following is how I automated the deployment of the Puppet agent onto those Solaris 9 servers. If anyone finds this useful but has some questions, feel free to ask. There's not a lot of information out there on Solaris 9 Puppet deployment, so this did take a little research to accomplish.

A few points:

* Our Solaris 9 servers' hostnames do not match their DNS names.
* We use an authenticated proxy, so the local script sets the http_proxy environment variable (pkgutil likes to check its inventory before installing)
* I have included all of the files referenced below in one bundle.tar
* The patch 113713-29 was required for OpenCSW to work (to allow me to install puppet)
* During the rollout phase, I turned auto-signing on, on my Puppet Master. To do this, add the following line to /etc/puppet/puppet.conf on the master server, under [master]:
autosign = true
* Both scripts below are required to install the puppet agent.


---- remote-solaris9-install.sh ----

#!/bin/bash
#
# Puppet OpenSource Client Installation script for Solaris 9
# Author: Daniel Eather
# Usage: for x in $(cat server_list); do ./remote-solaris9-install.sh ${x}; done

# Check exactly one argument (the hostname) is supplied
if [ "$#" -ne 1 ]
        then
                echo "ERROR: You must supply only the hostname as an argument."
                exit 1
fi

# Store and convert hostname to lower case (required for custom certname variable in agent's puppet.conf)
HOSTNAME=$1
HOSTNAME=`echo $HOSTNAME | tr '[:upper:]' '[:lower:]'`

# Time stamp
timeStamp=`/bin/date \+\%Y\%m\%d\%H\%M\%S`

# Check Puppet Agent is not already installed
checkPuppet=$(ssh root@${HOSTNAME} "ls /opt/csw/bin/puppet")
if [ -n "$checkPuppet" ]
        then
                echo "Puppet Agent is already installed on this system."
                checkPuppet=""
                exit 1
        else
                echo "Puppet Agent not found on system."
fi


# Generate default puppet config file
echo "# Puppet OpenSource Client Configuration File" > /tmp/puppet.conf.tmp
echo "# Author: Daniel Eather" >> /tmp/puppet.conf.tmp
echo "# Date:   ${timeStamp}" >> /tmp/puppet.conf.tmp
echo "" >> /tmp/puppet.conf.tmp
echo "[agent]" >> /tmp/puppet.conf.tmp
echo "    certname=${HOSTNAME}" >> /tmp/puppet.conf.tmp
echo "    node_name=${HOSTNAME}" >> /tmp/puppet.conf.tmp

# Copy over required files
scp 113713-29.zip root@${HOSTNAME}:/var/spool/pkg/
scp pkgutil-sparc.pkg root@${HOSTNAME}:/tmp/
scp local-solaris9-install.sh root@${HOSTNAME}:/tmp/
scp /tmp/puppet.conf.tmp root@${HOSTNAME}:/tmp/
scp sol9_puppet_dep.tar root@${HOSTNAME}:/tmp/

# Execute install script locally on server
ssh root@${HOSTNAME} "chmod +x /tmp/local-solaris9-install.sh"
ssh root@${HOSTNAME} "/tmp/local-solaris9-install.sh"


---- local-solaris9-install.sh ----

#!/bin/bash

# Set http proxy environment variable
export http_proxy=http://YOUR_DOMAIN\\USERNAME:PASSWORD@PROXY_ADDRESS:PROXY_PORT

# Unpack and install patch 113713-29
cd /var/spool/pkg; unzip /var/spool/pkg/113713-29.zip
yes | patchadd /var/spool/pkg/113713-29

# Install pkgutil
yes | pkgadd -d /tmp/pkgutil-sparc.pkg all

# Unpack install files for puppet, to save downloading them each time
mv /tmp/sol9_puppet_dep.tar /var/opt/csw/pkgutil/packages/
cd /var/opt/csw/pkgutil/packages/; tar xvf sol9_puppet_dep.tar

# Install Puppet
/opt/csw/bin/pkgutil -i -y  puppet

# Drop in puppet agent configuration file
mv /tmp/puppet.conf.tmp /etc/puppet/puppet.conf

# Connect to Puppet master and authenticate
/opt/csw/bin/puppet agent --waitforcert 60 -t

# Start Puppet
/etc/init.d/cswpuppetd start

# Cleanup large dependency file
rm -f /var/opt/csw/pkgutil/packages/sol9_puppet_dep.tar

Wednesday, March 12, 2014

Check multiple websites for an instance of a word

Just a quick and simple script that downloads a webpage and checks it for a specific word (like ERROR). If the word is found, the script displays it on the console and logs it to a file (website_errors). As stated in the usage, you can run this in a for loop over a list of servers. I've elected to keep each page's source; if you don't want to, just omit "-O ${HOSTNAME}.source" from the wget command.

#!/bin/bash
#
# Author: Daniel Eather
# Date: 20140311
# Description: This script will download the main index.cgi page and check for the keyword ERROR.
#
# Usage: for x in $(cat server_list); do ./check_website.sh ${x}; done
#

HOSTNAME=$1

echo "Checking ${HOSTNAME}..."
wget -q -O ${HOSTNAME}.source "http://${HOSTNAME}/cgi-bin/index.cgi?A=blah&B=blah"
ERROR_STATE=`grep ERROR ${HOSTNAME}.source`
if [ "$ERROR_STATE" ]
then
        echo "${HOSTNAME}: ERROR Detected with index.cgi file!" | tee -a website_errors
fi
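The grep-into-variable test can also be reduced to grep's exit status with -q; a sketch (the function name is mine):

```shell
# has_error: succeed (exit 0) if the keyword ERROR appears in the file
has_error() {
    grep -q ERROR "$1"
}

# e.g.: if has_error ${HOSTNAME}.source; then echo "ERROR detected"; fi
```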

Wednesday, February 12, 2014

WARNING: Restarting the Agent of the Control Domain will stop this Guest (it might get even fully destroyed) !!!

Yikes, not the type of warning you want to see after upgrading your Oracle Enterprise Manager - Ops Center to Update 4! I received this error after upgrading my EC/Proxy and Agents to 12c U4, along with OCDoctor 4.26. Here is the error in its entirety (unique identifiers stripped for privacy):

---------------
Checking guest [LDOM_NAME] ...


ERROR: Critical issue detected !!!


The following config file does not list this Control Domain as current owner of guest [LDOM_NAME]:


File: /var/mnt/virtlibs/XXXXXXXXX/XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXXX/guest-domain.properties


Content:


#Store the owner during create itself. DO NOT HAND EDIT.

#Thu Aug 15 09:51:57 EST 2013
owner.hostname=[INCORRECT_CDOM_NAME]
owner.vsc.id=[INCORRECT_CDOM_IDENTIFIER]

WARNING: Restarting the Agent of the Control Domain will stop this Guest (it might get even fully destroyed) !!!


This issue needs to get corrected immediately (Bug 16935439) 


There is a related config file 'guest-domain-FO.properties' which is listing a wrong owner as well:


File: /var/mnt/virtlibs/XXXXXXXXX/XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXXX/guest-domain-FO.properties


Content:


#The future owner file. DO NOT HAND EDIT.

#Thu Aug 15 09:51:58 EST 2013
previous.owner.processing=false
owner.vsc.id=[INCORRECT_CDOM_IDENTIFIER]

This file should get corrected as well!

----------------------------------------

Turns out there are two files within each logical domain's LDom metadata directory that dictate which Control Domain currently owns the LDom, and which Control Domain will be the "future owner".

The source code comments define the future owner as:


/* Key under which the owner of this virtserver shall be stored
   on the device rather than the database.
   This key shall be having the latest and current owner always
   and any differences in the ownership shall result in the
   destruction of the instantiated virtserver runtime objects on
   any given xVM Server system.


So basically, if there's an internal decision to make a CDom the owner of the LDom's configuration (rather than the EC and its database), ownership will go to the CDom whose key is listed in this guest-domain-FO.properties file.

Knowing all this, we need to back up these two files and modify them to reflect the current CDom owner, and that CDom's "vsc.id", or key. To get the current CDom's key, I listed all .properties files for each LDom to find one that was set correctly. Something like the following should list everything:

# find /var/mnt/virtlibs/XXXXXXXXX/ -name guest-domain-FO.properties | xargs cat
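Once the correct owner and key are known, the edit itself can be scripted; a hedged sketch (fix_owner and its arguments are illustrative, and the file is backed up first):

```shell
# fix_owner: back up a guest-domain properties file, then rewrite the
# owner.hostname and owner.vsc.id entries with the supplied values
fix_owner() {
    f=$1; cdom=$2; key=$3
    cp "$f" "$f.bak"
    sed "s/^owner\.hostname=.*/owner.hostname=${cdom}/; s/^owner\.vsc\.id=.*/owner.vsc.id=${key}/" "$f.bak" > "$f"
}
```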

Modify these files and run your OCDoctor --troubleshoot again, and the issue should now be gone! The bug that the error quotes (16935439) is not public, but I did manage to get the following information from Oracle:

BUG 16935439 ---> Bug 17995697 - WRONG METADATA AFTER LDOM MIGRATION FAILURE (Related bug)

So I can only assume a failed migration has caused this condition. Hopefully it's resolved in Update 4, but I think I'll keep an eye on it following any future failed migration tasks.

Friday, February 7, 2014

Updating OCDoctor on Guest Domains Lacking Internet Connectivity

I've recently been upgrading Oracle Enterprise Manager Ops Center from Update 2 to Update 4. One of my many post-checks was to run OCDoctor on every system, be it LDom, CDom, EC or Proxy Controller. Our guest domains can't reach the internet, so I needed a quick way to push out and install the latest OCDoctor. The following one-liner enumerates the list of LDoms (dropping the last character of each name to match the hostname), copies over the bundle, and installs it.

# for x in $(ldm list | egrep -v NAME\|primary | awk '{print $1}' | sed 's/.$//'); do echo ${x}:; scp /var/tmp/OCDoctor-4.26.zip root@${x}:/var/opt/sun/xvm/; ssh root@${x} "cd /var/opt/sun/xvm; mv OCDoctor OCDoctor-20140207; unzip OCDoctor-4.26.zip"; done
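The name-mangling part of that one-liner (filter out the header and the primary domain, take the first column, drop the trailing character) can be checked in isolation (the function name and sample ldm output are illustrative):

```shell
# ldom_hostnames: from `ldm list` output, print each guest LDom name
# minus its trailing character
ldom_hostnames() {
    egrep -v 'NAME|primary' | awk '{print $1}' | sed 's/.$//'
}
```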

Wednesday, January 22, 2014

Ops Center - Orphaned LDom after live migration

UPDATE: We recently upgraded to 12.2 and I can confirm the issue is still present there. The work-around listed below thankfully still works...

Recently I was charged with upgrading the system firmware across our T4-4 fleet. One of the provisos was that no LDom could be shut down during the upgrade. This of course meant I needed to lean heavily on Oracle Enterprise Ops Center's "live migration" feature, which at best is more of a warm migration.

We are still running 12c Update 2 (not for long, hopefully), so there were many issues encountered. One of the more annoying ones was that after each migration, the LDom appeared to no longer be associated with a CDom. It was still associated, and nothing was actually broken; the view from the BUI just showed it sitting at the top of the All Assets tree. And given its apparent location, I was unable to migrate it again until the hierarchy display was resolved.

An easy fix I discovered was restarting the proxy service on our colocated EC/Proxy server.

# /opt/SUNWxvmoc/bin/proxyadm stop -vw
# /opt/SUNWxvmoc/bin/proxyadm start -vw

Refresh the browser, and it's fixed. I logged an SR about this, and apparently it is resolved in Update 4 and/or the upcoming "Diamond" release. The engineer I spoke to did mention that a proxyadm restart fixes a lot of BUI display issues...


Extending a Linux Virtual Disk

Something that popped up at work the other day, figured it was worth sharing given it's probably a fairly common task for a Sysadmin these days.
 
Check current disk size/free space
# df -h

 
Shut the machine down
# shutdown -h now

Using Microsoft Virtual Machine Manager (2008 R2):
- Right click the machine, and click on Properties
- Select the Hardware Configuration tab
- Select the disk from the left hand pane
- check Expand virtual hard disk, enter the new disk size and click OK

It will take a minute to rewrite the configuration and expand the current fixed vhd. Once complete, power the machine on again.


Delete and recreate the partition (note: fdisk operates on the whole disk, e.g. /dev/sda, not on a partition like /dev/sda1; the new partition must start at the same sector as the old one for the data to survive, which the defaults normally ensure):
# fdisk /dev/sda
d (delete partition)
n (create new partition)
p (primary)
1 (partition 1)
Enter (use default)
Enter (use default)
w (write new partition table)

Reboot machine:
# shutdown -r now

Resize the filesystem (pass the partition, e.g. /dev/sda1):
# resize2fs /dev/sda1

Confirm your changes have worked:
# df -h

...do the happy dance

Thursday, January 16, 2014

Solaris 9 shell script to find why a directory keeps filling up

We have thresholds set on certain file systems at work, and one continues to be exceeded at random intervals in the night. Rather than just bumping the threshold up, I thought I'd script something up to help understand what's happening.

The threshold was < 6 GB of available space on /tmp (hence the 6291456 KB figure in the if statement). Every run, the script compares the last 10-minute snapshot (an ls of /tmp) to the current snapshot and appends the difference to a file. Once free space drops below 6 GB, I receive an email with the last 50 lines of that difference file. It's not the cleanest code, but you get the picture:
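For reference, the threshold figure is just 6 GB expressed in the 1 KB blocks that df -k reports:

```shell
# 6 GB in 1 KB blocks, as compared against df -k output
echo $((6 * 1024 * 1024))   # 6291456
```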

#!/bin/bash

if [ -f /var/tmp/de.tmp.1 ]

then
        ls -lh /tmp > /var/tmp/de.tmp.2
        date >> /var/tmp/de.tmp.diff
        echo -e '\n' >> /var/tmp/de.tmp.diff
        diff /var/tmp/de.tmp.1 /var/tmp/de.tmp.2 >> /var/tmp/de.tmp.diff
        echo -e '\n-----------------------------------------------\n' >> /var/tmp/de.tmp.diff
        rm -f /var/tmp/de.tmp.1
        rm -f /var/tmp/de.tmp.2
else
        ls -lh /tmp > /var/tmp/de.tmp.1
fi

avail=$(df -k | grep \/tmp | awk '{print $4}')

if [ $avail -lt 6291456 ]
then
        emailMessage=$(tail -50 /var/tmp/de.tmp.diff)
        echo "$emailMessage" | mailx -s "/tmp on myserver less than 6291456 KB ($avail)" -r monitor@company.com myemail@company.com

fi


Once I had tested it a few times, I created a cron entry to mirror our alarm policy (execute every 10 minutes of every hour):

0,10,20,30,40,50 * * * * /usr/local/bin/tmp_space_alert.sh

Now to wait for it to happen, and analyse the email and /var/tmp/de.tmp.diff file.