tip – blog.alteholz.eu

23 March 2020

About two years ago I bought two APC Back-UPS BX units. Everything worked fine and they helped me survive one or two power cuts. Occasionally one of them sent a message about a power outage whereas the other remained silent. Yet everything worked fine and I didn’t pay attention. After a while these messages stopped appearing.

A few days ago I wanted to look at the battery state and was quite surprised to get only an error message about a lost connection to the UPS. The UPS was not visible on the USB bus, and neither disconnecting and reconnecting the UPS nor rebooting the attached computer helped. I had to switch off the UPS, really pull the plug and wait some time. Afterwards everything worked fine again. Of course the other one had the same problem :-(.

And the moral of this story: Missing messages might have a reason and if something can fail, write a test and let nagios (or icinga or check_mk or whatever) tell you that there is something wrong.
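A minimal sketch of such a test, assuming the UPS is monitored by apcupsd and that "apcaccess status" prints a "STATUS : ..." line (COMMLOST when the link is gone). The function name check_ups_status is made up for this example. It treats a missing STATUS line as critical too, because that usually means the daemon cannot be reached at all.

```shell
#!/bin/sh
# Sketch of a nagios/NRPE check for the link to the UPS.
# Assumption: apcupsd is installed and "apcaccess status" prints a line
# such as "STATUS   : ONLINE" or "STATUS   : COMMLOST".

# Reads "apcaccess status" output on stdin, prints a nagios status line
# and returns the matching nagios exit code (0 = OK, 2 = CRITICAL).
check_ups_status() {
    # Strip trailing blanks, then keep only the value of the STATUS line.
    status=$(sed -n -e 's/[[:space:]]*$//' \
                    -e 's/^STATUS[[:space:]]*:[[:space:]]*//p' | head -n 1)
    case "$status" in
        "")
            echo "CRITICAL - no STATUS line, UPS daemon not reachable?"
            return 2 ;;
        *COMMLOST*)
            echo "CRITICAL - connection to UPS lost ($status)"
            return 2 ;;
        *)
            echo "OK - UPS status: $status"
            return 0 ;;
    esac
}

# Intended use at the end of the check script:
#   apcaccess status 2>/dev/null | check_ups_status; exit $?
```

This only watches the connection. A UPS that is running on battery still counts as OK here, because that situation already produces its own messages.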

4 October 2019

Fun with Hetzner: How long does it take to fix a bug?

For some time I have had a CX11 instance running at Hetzner. It is connected via VPN to my internal network. Only IPv4 traffic was sent over that connection and everything was fine. Until an idea occurred to me: Why not use IPv6 over that tunnel as well? I mean, I did that on other servers, so why not on this one? Copy and paste of some lines of configuration and voila, I could “ping6” from the VPN server to the client. That was easy …

But suddenly my nagios went red because the external IPv6 connection was gone. Did I get some routing wrong? After stopping openvpn everything was fine again. So I deactivated IPv6 and wondered why this tiny little server behaves differently from any other.

After some time I found this article and I can confirm that Hetzner has not been able to fix their bug about two years after it was reported. Well done Hetzner!

23 April 2019

Fun with openvpn: how to jam a line

After upgrading my internet connection from something like 50Mbit/s download and 5Mbit/s upload to 100Mbit/s download and 50Mbit/s upload, my openvpn connection to an external server became really slow. Before the upgrade, a ping reply arrived after about 40ms; after the upgrade it took about 1000ms and from time to time even 10000ms. Yes, really 10s to get the reply, though I did not change any openvpn configuration.

The solution: Never ever use TCP for an openvpn connection, always use UDP!

There are more options to tune the connection (for example, don’t use compression nowadays), but really: do not use TCP!
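To make sure no config slips back to TCP, a small check could look at the "proto" directive. This is only a sketch: the function name check_openvpn_proto is invented here, and it relies on OpenVPN falling back to UDP when no "proto" line is given.

```shell
#!/bin/sh
# Sketch: warn when an OpenVPN config selects TCP as transport.
# Assumption: the last uncommented "proto" directive wins and a missing
# directive means the UDP default.

# Reads an OpenVPN config on stdin; returns 1 (nagios WARNING) for TCP.
check_openvpn_proto() {
    # Commented lines start with "#" or ";" and so never have $1 == "proto".
    proto=$(awk '$1 == "proto" { p = $2 } END { print p }')
    [ -n "$proto" ] || proto="udp (default)"
    case "$proto" in
        tcp*)
            echo "WARNING - tunnel uses $proto"
            return 1 ;;
        *)
            echo "OK - tunnel uses $proto"
            return 0 ;;
    esac
}

# Intended use (the path is a placeholder):
#   check_openvpn_proto < /etc/openvpn/client.conf; exit $?
```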

2 January 2019

Fun with puppet: Is puppet really running?

I am using puppet to configure most of my machines. Unfortunately I am not perfect and introduce errors in my modules. Of course I only test such modules on machines that are not affected. On an affected machine puppet starts running, works on some modules, detects an error and stops. So sometimes I have a happily running puppet that does only half of the tasks it should do. Using stages in puppet I can hopefully detect such situations.

First I define stages in my manifest/nodes.pp:
stage { 'start': before => Stage['main'], }
stage { 'last': }
Stage['main'] -> Stage['last']

class { 'createstamp':
  stage => 'last',
}

class { 'resolv_conf':
  stage => 'start',
}

I have one stage start that is executed at the beginning and one stage last that shall be done when everything else is ready. Everything else will run in stage main.
At the moment I only have one module resolv_conf at the beginning. DNS should always work as expected. The only module in the last stage is createstamp that just creates a temporary file containing a time stamp.

class createstamp {
  file { 'stamp':
    path   => "/usr/local/nagios/createStamp",
    ensure => file,
    mode   => '0644',
    owner  => 'root',
    group  => 'root',
    source => [ "puppet:///modules/createstamp/stamp" ],
  }
}

The file in this module is created on the puppetmaster by a cronjob that runs every two hours:
#!/bin/bash
# Write the number of seconds since 2000-01-01 00:00:00 UTC into the stamp file.
STAMPFILE=/etc/puppet/code/environments/production/modules/createstamp/files/stamp
s2000=$(date +%s --date="Jan 1 00:00:00 UTC 2000")
now=$(date +%s)
echo $((now-s2000)) > "$STAMPFILE"

Now I just have to check this file with nagios and a custom nrpe check like:
#!/bin/sh
STAMPFILE=/usr/local/nagios/createStamp
s2000=$(date +%s --date="Jan 1 00:00:00 UTC 2000")

# Without a stamp file puppet never reached the last stage on this machine.
if [ ! -f "$STAMPFILE" ]; then
    echo "CRITICAL - no stampfile available here"
    exit 2
fi

now=$(date +%s)
stampTime=$(cat "$STAMPFILE")
# Age of the stamp in seconds; both values count from 2000-01-01 UTC.
diff=$((now - s2000 - stampTime))

if [ "$diff" -gt 60000 ]; then
    echo "CRITICAL - stamp too old: $now / $((now - s2000)) $stampTime"
    exit 2
fi

echo "OK - stamp ok $now / $((now - s2000)) $stampTime"
exit 0

In this case I wait for 60000s (a bit more than 16h) before nagios complains. This is due to some external machines running nagios only every 8h, so two such intervals may pass before everything goes red.

7 July 2018

Import git repository from alioth-archive.debian.org to salsa.debian.org

All repositories that had not been migrated before the shutdown of alioth are still available at the alioth archive. There you can find a compressed tarfile of the bare repository.

So to move such a repository to salsa …

… create the new repository on salsa
… download your file from alioth-archive
wget https://alioth-archive.debian.org/git/debian-iot/duktape.git.tar.xz
… unpack it
tar -Jxf duktape.git.tar.xz
… cd to your bare repository
cd duktape.git
… push your repository to salsa
git push --mirror git@salsa.debian.org:debian-iot-team/duktape.git

Voila, your new repository is ready to be used.

1 May 2018

Fun with puppet — runinterval

Notice to my future self: The default interval between two runs of puppet is 30min or 1800s. In case this is too short you can add something like:
runinterval = 28800
to the [main] section of the puppet client configuration.

If you want to do this automagically, just run the command
puppet config set runinterval 28800
on each client.

Another command you might want to remember:
puppet agent --configprint runinterval

29 April 2018

Fun with broken harddisks

Today I needed to replace a faulty harddisk, which had a GPT, in a software RAID1. A GPT is a GUID Partition Table and is normally needed for partitions > 2TB. But wait, my external harddisk has 4TB and it uses an MBR (Master Boot Record)!?

In an MBR the partition size is stored in four bytes, which could have 0xFFFFFFFF as a maximum value. This would be 4294967295 in decimal. But the partition size is not given in bytes but in sectors. On Linux systems the sector size of an attached harddisk can be found in /sys/block/sd[X]/queue/hw_sector_size.
root@server:~ # cat /sys/block/sdd/queue/hw_sector_size
512
This is the normal sector size of a harddisk, so 4294967295 sectors of 512 bytes result in a maximum partition size of 2TB.

Luckily some external harddisks have a sector size of 4096 bytes.
root@server:~ # cat /sys/block/sda/queue/hw_sector_size
4096

This results in a maximum partition size of 16TB.
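The arithmetic can be checked directly in the shell. The helper name mbr_max_bytes is only for illustration, and the sketch assumes a shell with 64-bit arithmetic, as on any current Linux system.

```shell
#!/bin/sh
# Largest partition size one MBR entry can describe for a given sector size.
# The four-byte size field holds at most 0xFFFFFFFF = 4294967295 sectors.
mbr_max_bytes() {
    sector_size=$1
    echo $(( 4294967295 * sector_size ))
}

for size in 512 4096; do
    echo "$size-byte sectors: $(mbr_max_bytes "$size") bytes"
done
# prints:
#   512-byte sectors: 2199023255040 bytes
#   4096-byte sectors: 17592186040320 bytes
```

Both values are one sector short of 2^41 and 2^44 bytes, which is where the 2TB and 16TB limits come from.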

Anyway, my disk had a GPT and after installing the new harddisk, it had to get a copy of the GPT of the first one. This can be done with sgdisk, which is part of the gdisk package on Debian systems. So after doing apt-get install gdisk one can run:
sgdisk --replicate=/dev/sdb /dev/sda
In this case /dev/sda is the source disk and /dev/sdb is the new one.

You can see the GPT with:
sgdisk -p /dev/sda
sgdisk -p /dev/sdb

Due to the cloning, both disks have the same GUID and to avoid hassle, the new one needs a new GUID. This is done with:

sgdisk -G /dev/sdb

The structure of the software raid can be seen in /proc/mdstat. In my case I have three md devices: md0, md1 and md2
On my system md0 currently has only one active member /dev/sda2. So /dev/sdb2 has to be added:
mdadm /dev/md0 --manage --add /dev/sdb2
As this is just a small partition, it took only a few seconds and syslog showed:
[ 5881.551829] md: bind
[ 5881.581014] RAID1 conf printout:
[ 5881.581020] --- wd:1 rd:2
[ 5881.581026] disk 0, wo:0, o:1, dev:sda2
[ 5881.581030] disk 1, wo:1, o:1, dev:sdb2
[ 5881.581174] md: recovery of RAID array md0
[ 5881.581180] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 5881.581186] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[ 5881.581195] md: using 128k window, over a total of 499988k.
[ 5889.511049] md: md0: recovery done.
[ 5889.614014] RAID1 conf printout:
[ 5889.614020] --- wd:2 rd:2
[ 5889.614026] disk 0, wo:0, o:1, dev:sda2
[ 5889.614031] disk 1, wo:0, o:1, dev:sdb2


The same needs to be done for the other partitions:
mdadm /dev/md1 --manage --add /dev/sdb3
mdadm /dev/md2 --manage --add /dev/sdb4

They are way bigger and recovery of the RAID takes a bit longer. But finally everything is done and nagios switches back from red to green. Mission accomplished!
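The red/green state mentioned above could come from a check like the following. It is a sketch, not the check I actually run: the function name check_mdstat is invented, and it relies on /proc/mdstat marking a missing member with "_" in the status brackets, for example [U_].

```shell
#!/bin/sh
# Sketch of a nagios/NRPE check for degraded software RAID arrays.
# Assumption: /proc/mdstat has one "mdX : ..." header line per array and a
# following line that ends in something like "[2/1] [U_]".

# Reads /proc/mdstat content on stdin, prints a nagios status line and
# returns 0 (OK) or 2 (CRITICAL).
check_mdstat() {
    degraded=$(awk '
        /^md[0-9]+ :/     { name = $1 }          # remember the current array
        /\[[U_]*_[U_]*\]/ { printf "%s ", name } # "_" marks a missing member
    ')
    if [ -n "$degraded" ]; then
        echo "CRITICAL - degraded arrays: ${degraded% }"
        return 2
    fi
    echo "OK - all md arrays complete"
    return 0
}

# Intended use at the end of the check script:
#   check_mdstat < /proc/mdstat; exit $?
```

While an array is still recovering, the new member shows up as "_", so the check stays red until the resync is finished.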




		
24 April 2018

bind: rndc addzone and also-notify
	


	
	
Notice to my future self: If you add zones to bind by rndc addzone, please remember that those zones are stored in /var/cache/bind/*.nzf. If you have to change your nameservers, you also need to adapt the also-notify list in all zones. If you forget one zone and there is one unused IP address in that list, all slaves will get the notification and start the transfer, but the update won’t happen and the old data remains on the slave.
This sounds really crazy, but think about April 2018, when the challenge for your letsencrypt certificate was added to the master server but never reached the slaves. The log was full of:


ERROR: Challenge is invalid! (returned: invalid) (result: {
  "type": "dns-01",
  "status": "invalid",
  "error": {
    "type": "urn:acme:error:unauthorized",
    "detail": "Incorrect TXT record \"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\" found at _acme-challenge.xxxxxxxxx",
    "status": 403
  },
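A stale slave like this can be caught by comparing SOA serials. The following is a sketch under assumptions: dig is installed, "dig +short SOA" prints one line whose third field is the serial, and the function names as well as the zone and server names in the usage comment are placeholders.

```shell
#!/bin/sh
# Sketch: detect a slave that still serves old zone data.
# Assumption: "dig +short SOA <zone> @<server>" prints
#   "<mname> <rname> <serial> <refresh> <retry> <expire> <minimum>".

# Extract the serial from a "dig +short SOA" line read on stdin.
soa_serial() {
    awk 'NR == 1 { print $3 }'
}

# Compare the master's serial ($1) with one slave's serial ($2).
# Returns nagios codes: 0 = OK, 2 = CRITICAL, 3 = UNKNOWN.
compare_serials() {
    master=$1
    slave=$2
    if [ -z "$master" ] || [ -z "$slave" ]; then
        echo "UNKNOWN - could not read a serial"
        return 3
    fi
    if [ "$master" != "$slave" ]; then
        echo "CRITICAL - slave serial $slave differs from master serial $master"
        return 2
    fi
    echo "OK - serial $master on master and slave"
    return 0
}

# Intended use (hypothetical names):
#   m=$(dig +short SOA example.org @ns-master.example.org | soa_serial)
#   s=$(dig +short SOA example.org @ns-slave.example.org | soa_serial)
#   compare_serials "$m" "$s"; exit $?
```

Right after a change the serials differ for a short moment, so a check like this should only alert after a few retries.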


	


	



		
19 March 2017

Let other devices use my own NTP server
	


	
	
I have these fine set-top boxes here that try to synchronize their time with some external NTP servers.
The names of the NTP servers are coded into the firmware and cannot be changed in the network settings menu. They are called ntp1.technibutler.de, ntp2.technibutler.de and ntp3.technibutler.de. Though they are already Stratum 2 servers, I would rather use my own, local DCF77 radio clock. Obviously it makes no sense to contact some server in the wide internet to get information that is already available locally.
Luckily those servers are just used for time synchronization and nobody wants to get web pages from them or send emails to them. So all that needs to be done is to redefine their address resolution in DNS.
In a first step, I configure my own DNS server. The examples below are config files for bind9. Any other DNS server should work as well; just pretend that you are authorized to answer queries for the technibutler NTP servers. As long as there is no DNSSEC or secure NTP involved, everything is fine.
First I need to define the different zones. As there might be other services within the technibutler.de zone that I still want to use, I will define an extra zone for each hostname of the NTP servers.
;
$TTL    86400
@       IN      SOA     ntp1.technibutler.de. redefined-dns.alteholz.de. (
                              1         ; Serial
                         604800         ; Refresh
                          86400         ; Retry
                        2419200         ; Expire
                          86400 )       ; Negative Cache TTL
;
@       IN      NS      localhost.
@       IN      A       10.10.10.1

;
$TTL    86400
@       IN      SOA     ntp2.technibutler.de. redefined-dns.alteholz.de. (
                              1         ; Serial
                         604800         ; Refresh
                          86400         ; Retry
                        2419200         ; Expire
                          86400 )       ; Negative Cache TTL
;
@       IN      NS      localhost.
@       IN      A       10.10.10.1

;
$TTL    86400
@       IN      SOA     ntp3.technibutler.de. redefined-dns.alteholz.de. (
                              1         ; Serial
                         604800         ; Refresh
                          86400         ; Retry
                        2419200         ; Expire
                          86400 )       ; Negative Cache TTL
;
@       IN      NS      localhost.
@       IN      A       10.10.10.1

I store those configs in /etc/bind/redefined/db.ntp1.technibutler.de, /etc/bind/redefined/db.ntp2.technibutler.de and /etc/bind/redefined/db.ntp3.technibutler.de. The only IP address that is needed in these files is the actual IP address of my local NTP server. As I have only one, all NTP servers from technibutler.de need to point to this address.
Now I have to tell bind that it is the master for these zones. This is done in /etc/bind/redefined/redefined-zones.conf:
zone "ntp1.technibutler.de" {
   type master;
   file "/etc/bind/redefined/db.ntp1.technibutler.de";
};

zone "ntp2.technibutler.de" {
   type master;
   file "/etc/bind/redefined/db.ntp2.technibutler.de";
};

zone "ntp3.technibutler.de" {
   type master;
   file "/etc/bind/redefined/db.ntp3.technibutler.de";
};

And last but not least I have to tell bind9 to load this config during startup. So I add a line:
include "/etc/bind/redefined/redefined-zones.conf";

at the beginning of /etc/bind/named.conf.local
And voila, before that configuration:
$ nslookup ntp1.technibutler.de
Server:         10.10.10.254
Address:        10.10.10.254#53

Non-authoritative answer:
Name:   ntp1.technibutler.de
Address: 62.138.2.9

and after that configuration:
$ nslookup ntp1.technibutler.de
Server:         10.10.10.254
Address:        10.10.10.254#53

Non-authoritative answer:
Name:   ntp1.technibutler.de
Address: 10.10.10.1

After the configuration of your DNS server is done, you just need to point the set-top boxes or any other device in your home network to your own DNS server. You can either deliver this information via “option domain-name-servers” with DHCP, or manually put your DNS server in the network settings of your device.
	


	


	

Tag: tip

Problem with go test

Fun with APC USV