Tag Archives: FunWith

Fun with APC UPS

About two years ago I bought two APC Back-UPS BX units. Everything worked fine and they helped me survive one or two power cuts. Occasionally one of them sent a message about a power outage whereas the other remained silent. Still, everything worked and I didn’t pay attention. After a while these messages stopped appearing.

A few days ago I wanted to look at the battery state and was quite surprised to get only an error message about a lost connection to the UPS. The UPS was not visible on the USB bus, and neither disconnecting and reconnecting the UPS nor rebooting the attached computer helped. I had to switch off the UPS, really pull the plug and wait some time. Afterwards everything worked fine again. Of course the other one had the same problem :-(.

And the moral of this story: Missing messages might have a reason and if something can fail, write a test and let nagios (or icinga or check_mk or whatever) tell you that there is something wrong.
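
A minimal sketch of such a test, assuming apcupsd and its apcaccess tool are in use (the post does not say which UPS software is running here). The function name check_ups_status is made up for this example. It reads the output of "apcaccess status" on stdin and returns nagios-style exit codes:

```shell
#!/bin/sh
# Sketch only: assumes apcupsd's "apcaccess status" output, which contains
# a line such as "STATUS   : ONLINE" (or COMMLOST when the UPS is gone).
check_ups_status() {
    # Pick the value of the STATUS line and strip surrounding whitespace.
    status=$(sed -n '/^STATUS[[:space:]]*:/{s/^STATUS[[:space:]]*:[[:space:]]*//;s/[[:space:]]*$//;p;}' | head -n 1)
    case "$status" in
        "")
            echo "CRITICAL - no STATUS line, is the UPS daemon answering?"
            return 2 ;;
        *COMMLOST*)
            echo "CRITICAL - lost connection to UPS"
            return 2 ;;
        *ONLINE*)
            echo "OK - UPS status: $status"
            return 0 ;;
        *)
            # e.g. ONBATT: worth a look, but the UPS is at least talking
            echo "WARNING - UPS status: $status"
            return 1 ;;
    esac
}
```

In an nrpe check this could be called as "apcaccess status 2>/dev/null | check_ups_status" followed by "exit $?". With a different UPS daemon the parsing would have to be adapted.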

Fun with Hetzner: How long does it take to fix a bug?

For some time now I have had a CX11 instance running at Hetzner. It is connected via vpn to my internal network. Only IPv4 traffic was sent over that connection and everything was fine. Until an idea occurred to me: Why not use IPv6 over that tunnel as well? I mean, I did that on other servers, so why not on this one? Copy and paste of some lines of configuration and voilà, I could "ping6" from the vpn server to the client. That was easy …

But suddenly my nagios went red because the external IPv6 connection was gone. Did I get some routing wrong? After stopping openvpn everything was fine again. So I deactivated IPv6 and wondered why this tiny little server behaves differently from any other.

After some time I found this article, and I can confirm that about two years after the bug was reported, Hetzner has still not fixed it. Well done Hetzner!

Fun with openvpn: how to jam a line

After upgrading my internet connection from something like 50 Mbit/s download and 5 Mbit/s upload to 100 Mbit/s download and 50 Mbit/s upload, my openvpn connection to an external server became really slow. Before the upgrade, a ping reply came back after about 40ms; after the upgrade it took about 1000ms and from time to time even 10000ms. Yes, really 10s to get a reply, though I had not changed any openvpn configuration.

The solution: Never ever use TCP for an openvpn connection, always use UDP!

There are more options to tune the connection (for example, do not use compression nowadays), but really: do not use TCP!
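
To find configurations that still ask for TCP, something like the following sketch may help. The function name openvpn_uses_tcp is made up for this example, and it only looks at the "proto" directive. A protocol given at the end of a "remote" line is not detected. OpenVPN falls back to UDP when no protocol is configured at all.

```shell
#!/bin/sh
# Sketch only: succeeds (exit 0) if the given OpenVPN config file contains
# an active "proto tcp..." line (tcp, tcp-client, tcp-server, tcp4, ...).
# Lines commented out with "#" or ";" do not match.
openvpn_uses_tcp() {
    # $1: path to an OpenVPN config file
    grep -Eq '^[[:space:]]*proto[[:space:]]+tcp' "$1"
}
```

A loop such as "for f in /etc/openvpn/*.conf; do openvpn_uses_tcp "$f" && echo "$f uses TCP"; done" would then list the candidates for a switch to "proto udp". The path is just the usual default and may differ on your system.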

Fun with puppet: Is puppet really running?

I am using puppet to configure most of my machines. Unfortunately I am not perfect and introduce errors in my modules. Of course I only test such modules on machines that are not affected. On an affected machine puppet starts running, works on some modules, detects an error and stops. So sometimes I have a happily running puppet that does only half of the tasks it should do. Using stages in puppet I can hopefully detect such situations.

First I define stages in my manifest/nodes.pp:

stage { 'start':
  before => Stage['main'],
}
stage { 'last': }
Stage['main'] -> Stage['last']

class { 'createstamp':
  stage => 'last',
}

class { 'resolv_conf':
  stage => 'start',
}

I have one stage start that is executed at the beginning and one stage last that shall be done when everything else is ready. Everything else will run in stage main.
At the moment I only have one module resolv_conf at the beginning. DNS should always work as expected. The only module in the last stage is createstamp that just creates a temporary file containing a time stamp.


class createstamp {
  file { 'stamp':
    path   => '/usr/local/nagios/createStamp',
    ensure => file,
    mode   => '0644',
    owner  => 'root',
    group  => 'root',
    source => [
      'puppet:///modules/createstamp/stamp'
    ],
  }
}

The file in this module will be created on the puppetmaster with a cronjob that runs every two hours:

#!/bin/bash
# Write the number of seconds since 2000-01-01 00:00:00 UTC into the stamp file.
STAMPFILE=/etc/puppet/code/environments/production/modules/createstamp/files/stamp
s2000=$(date +%s --date="Jan 1 00:00:00 UTC 2000")
now=$(date +%s)
echo $((now - s2000)) > "$STAMPFILE"

Now I just have to check this file with nagios and a custom nrpe check like:

#!/bin/sh
# nrpe check: complain if the stamp file is missing or older than 60000s.
STAMPFILE=/usr/local/nagios/createStamp
s2000=$(date +%s --date="Jan 1 00:00:00 UTC 2000")
if [ ! -f "$STAMPFILE" ]; then
  echo "CRITICAL - no stampfile available here"
  exit 2
fi
now=$(date +%s)
stampTime=$(cat "$STAMPFILE")
# age of the stamp in seconds
diff=$((now - s2000 - stampTime))
if [ "$diff" -gt 60000 ]; then
  echo "CRITICAL - stamp too old: $now / $((now - s2000)) $stampTime"
  exit 2
fi
echo "OK - stamp ok $now / $((now - s2000)) $stampTime"
exit 0

In this case I wait for 60000s before nagios complains. This is because some external machines run nagios only every 8h, so I wait a bit more than 16h (two such intervals plus some slack) before everything goes red.
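
The numbers in the stamp file are seconds since 2000-01-01 00:00:00 UTC, which is not easy to read when debugging. A small sketch to turn a stamp back into a date could look like this. The helper names stamp_to_unix and stamp_to_date are made up for this example, and like the scripts above it assumes GNU date.

```shell
#!/bin/sh
# Sketch only: convert a stamp (seconds since 2000-01-01 00:00:00 UTC)
# back into Unix time and into a readable UTC date.

# 946684800 is the Unix time of 2000-01-01 00:00:00 UTC
EPOCH_2000=946684800

stamp_to_unix() {
    # $1: stamp value as written by the cronjob
    echo $(( $1 + EPOCH_2000 ))
}

stamp_to_date() {
    # $1: stamp value; the "-d @seconds" form needs GNU date
    date -u -d "@$(stamp_to_unix "$1")" '+%Y-%m-%d %H:%M:%S UTC'
}
```

On a client, stamp_to_date "$(cat /usr/local/nagios/createStamp)" would then show when the puppetmaster last wrote the stamp that reached this machine.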

Fun with puppet — runinterval

Note to my future self: The default interval between two puppet runs is 30min or 1800s. If this is too short, you can add something like:

runinterval = 28800

to the [main] section of the puppet client configuration.

If you want to do this automagically, just run the command

puppet config set runinterval 28800

on each client.

Another command you might want to remember:

puppet agent --configprint runinterval
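
Since all these intervals are given in seconds, a tiny helper for sanity checks might be handy. This is only a sketch, and the function name secs_to_human is made up for this example:

```shell
#!/bin/sh
# Sketch only: print a duration given in seconds as hours and minutes.
secs_to_human() {
    # $1: duration in seconds
    printf '%dh %02dmin\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 ))
}

secs_to_human 1800     # default runinterval
secs_to_human 28800    # runinterval used above
secs_to_human 60000    # threshold of the stamp check
```

This prints "0h 30min", "8h 00min" and "16h 40min". Combined with the command above, secs_to_human "$(puppet agent --configprint runinterval)" shows the configured interval of a client in readable form.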