TDC: Test-driven configuration, the first module

The first module I want to develop with TDC is just a small one. I want to distribute email certificates obtained by Lets Encrypt to some email servers:

 class email_cert {

 }

I am using the puppet module puppet-tdc , that is at the moment only available on github.

The first things to add are some tests:

 class email_cert {
  class{'tdc':
  }
  class{'tdc::test_directory':
     directory    => ['/etc/email-cert'],
  }
  class{'tdc::test_file':
     file    => ['/etc/email-cert/mail-fullchain.pem', '/etc/email-cert/mail-privkey.pem'],
  }
 }

I am using the new class tdc and want to check whether the directory for the certificates is available and whether the certificates itself are there.

Great, after awhile my nagios shows lots of red stuff. Every server that uses my new class email_cert is automatically creating tests for nagios and distributes them. All of them fail.

So now it is time for the real data:

class email_cert {

  class{'tdc':
  }
  class{'tdc::test_directory':
     directory    => ['/etc/email-cert'],
  }
  class{'tdc::test_file':
     file    => ['/etc/email-cert/mail-fullchain.pem', '/etc/email-cert/mail-privkey.pem'],
  }

  file { [ '/etc/email-cert', 
         ]:
          ensure => 'directory',
          mode   => '750',
          owner  => 'Debian-exim',
          group  => 'nagios',
       }
  file { 'cert-fullchain.pem': 
         path    => "/etc/email-cert/mail-fullchain.pem", 
         ensure  => file, 
         mode    => '640',
         owner   => 'Debian-exim',
         group   => 'nagios',
         source  => "puppet:///modules/email_cert/cert/mail-$fqdn-fullchain.pem"
  } 
  file { 'cert-privkey.pem': 
         path    => "/etc/email-cert/mail-privkey.pem", 
         ensure  => file, 
         mode    => '640',
         owner   => 'Debian-exim',
         group   => 'nagios',
         source  => "puppet:///modules/email_cert/cert/mail-$fqdn-privkey.pem"
  } 
 }

After waiting some time, my nagios calms down and everything is green again. According to aNag, I also got 30 new tests.

Of course I had to change the config of nrpe and nagios a bit. In nrpe_local.cfg I had to add:

include_dir=/usr/local/nagios/tdc/config

in order to let nrpe know those newly created tests on the puppet agent.

I also had to tell the nagios server via its puppet module, that there is a new directory containing config data:

      file { '/etc/nagios4/conf.d/tdc':
          ensure => 'directory',
          source => 'puppet:///modules/nagios4_server/tdc',
          recurse => 'remote',
          path => '/etc/nagios4/conf.d/tdc',
          owner => 'root',
          group => 'root',
          mode  => '0755',
          notify  => Service["nagios4"],
        }

TDC: Test-driven configuration, the first steps

During TDD (Test-driven development) you have some kind of specifications that your new module needs to fulfill. Hence you know what functions need to be implemented and what kind of tests you need to write.

During TDC things are similar. You know the software you are going to install. Hence you know how this software works, what configuration is needed and what kind of features you and your users want to use in the beginning.

This results in a number of similar tests for each software.

  1. In a first step you need to check whether …
    • … the binaries are available
    • … the common config files are available
    • … the directories are available
    • … all files and directories have the correct owner and access permissions

    In case you install a package from your distribution, a set of predefined config files should be included in the delivery and your test should be green immediately after installation.

    In case you do a manual installation you need to fiddle around with the configuration files. But as you thought before what config is needed, this task should be an easy one.

    These tests are not only important for the initial installation, but also later on, when you upgrade your system or just install a newer release.
    Often enough names of binaries have been changed (apache -> apache2), the path of config files has been changed, the name of the package has been changed ) or even some binaries are now part of a different package.

    If a test fails, you immediately know that something changed and your configuration (or only the tests) needs to be adapted.

  2. If all files are in the proper place, in the second step you need to check whether their contents is fine as well. Some software has its own tool to check the config files.

    For example “apache2ctl configtest” does a syntax check of the Apache config. Nagios does such checks with “nagios -v nagios.cfg”.

  3. The third step would be to check whether the newly installed software is running. You also need to check whether the correct number of processes is running. For example exim4 basically starts a process for each incoming email and normaly (at least in my case) there are up to five processes running. In case you have 128 exim4 processes running, this might be an indicator that there is something wrong with your incoming email.

As written in the previous article, I am doing configuration management with puppet, so the corresponding puppet module will appear in this repository.

TDC: Test-driven configuration

Is there anything more terrible than getting a phone call and someone complaining about a printer that does not work?
Yes, you have to admit that something is broken and you are not able to repair it immediately.

So, how can you avoid such situations?

For some time now in software development you can find a concept called TDD (Test-driven development). Basically you repeat three steps:

  1. develop a test for a new function; as this function does not exist yet, this test has to fail in the beginning
  2. develop the function; the aim in this step is to pass the test and don’t break any other test
  3. refactor code; code can be optimized and improved, do not break any test in this step

Would it be possible to introduce something similar for the configuration of computers/software? Analogous to TDD one could call this TDC (Test-driven configuration).

There is only one relevant Google hit for this keyword. It is an article written by David Lutz from 2011. Other Keywords like “Test-driven Sysadmin” show among others some slides of Johan van den Dorpe which led to Behavioral Driven Development (BDD) and some software called cucumber. Another software that is mentioned is babushka.

Unfortunately these software does not fit into existing solutions for configuration management like puppet or testing like nagios. They are independent software packages that need their own configuration, which might lead to some kind of synchronization error between both worlds.

Is this really such a weird idea that just a few people thought about this?

Anyway, I will also give it a try and I am curious where all of this is going.

In Wikipedia there is a huge list of configuration management software. As all my systems are running on Debian, I will limit my work on software that is part of Debian and has an arbitrary popcon value of at least 1000. So my main target will be puppet. Other software I will look into are ansible and salt. Due to their trademark policy, chef won’t be part of Debian soon, so I won’t consider it.

In Wikipedia one can also see an even bigger list of network monitoring system. My main target on this list will be nagios. I will also look into zabbix and prometheus.

Finally I found a tweet from Jaana Dogan:
“Configuration testing is quite underrated in an industry where the majority of work is becoming configuration.”