Clustering

Merlin allows clustering of multiple Naemon instances, adding redundancy and load balancing to the Naemon monitoring infrastructure.

Merlin works in two main ways:

  • Sends Naemon events, such as check results, between nodes over a TCP connection

  • Keeps track of the Naemon configuration and syncs it across nodes over SSH

A single Merlin node can have up to 65534 neighbours, which makes it possible to build a large monitoring infrastructure.

Node types

Merlin has three different node types:

Peer

A peer is a redundant and load-balanced node. A collection of peers, called a peergroup, divides the check load equally. If you have three nodes in a peergroup, each peer executes roughly 33% of the configured Naemon checks. If one of the nodes goes down, Merlin redistributes the check load to the two online peers, so that each executes 50% of the checks.
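
The arithmetic behind this even split can be sketched in shell. The function name `share_per_peer` is hypothetical and not part of Merlin; it only reproduces the percentages quoted above and assumes a perfectly even split, whereas real clusters distribute whole checks and will deviate slightly.

```shell
#!/usr/bin/env bash
# share_per_peer ONLINE_PEERS
# Prints the percentage of checks each online peer would execute,
# assuming a perfectly even split across the peergroup.
share_per_peer() {
    local online=$1
    if [ "$online" -lt 1 ]; then
        echo "need at least one online peer" >&2
        return 1
    fi
    awk -v n="$online" 'BEGIN { printf "%.2f\n", 100 / n }'
}

share_per_peer 3   # three peers online: prints 33.33
share_per_peer 2   # one of the three peers is down: prints 50.00
```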

Peers will always run the same Naemon configuration.

Poller

A poller is responsible for checking hosts and services belonging to one or more hostgroups. This makes it possible to put a Merlin node in a remote network. Pollers can also be peered, so that multiple nodes share responsibility for the same set of hostgroups.

A poller only has a subset of the full Naemon configuration (based on the hostgroup assignment), and will never know of any objects not belonging to the specified hostgroups.

It is possible to set up multiple pollergroups which are responsible for different sets of hostgroups.

Master

A poller must connect to each node in the master peergroup. The master peergroup keeps track of the full Naemon configuration set. Masters split the Naemon configuration for pollers to use.

Cluster setup walkthrough

In this walkthrough we’ll create a cluster with three nodes in total: a master peergroup with two masters, and a single remote poller.

By the end we should have a node structure that looks like the following:

+----------+    +----------+
| master01 |----| master02 |
+----------+    +----------+
   |
   |
   | HOSTGROUP: pollergroup  +----------+
   = ------------------------| poller01 |
                             +----------+

Preparation

To begin, prepare three machines (master01, master02, poller01) with Naemon and Merlin installed as per the installation instructions. Make sure that no firewalls block port 22 (for SSH) or port 15551 (Merlin’s default TCP port). Ensure that you have the root password for all machines at hand and that SSH login on the root account is enabled.
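
One way to confirm that the two ports are reachable before continuing is a small check like the following. This is a sketch only: the function names `port_open` and `check_nodes` are hypothetical, it relies on bash’s `/dev/tcp` redirection and the coreutils `timeout` command being available, and a successful TCP connection says nothing about whether Merlin itself is configured correctly.

```shell
#!/usr/bin/env bash
# Ports named in the text: 22 (SSH) and 15551 (Merlin's default TCP port).
PORTS="22 15551"

# port_open HOST PORT
# Succeeds if a TCP connection can be opened within 3 seconds.
# Uses bash's /dev/tcp redirection, so it needs bash rather than plain sh.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# check_nodes HOST...
# Prints one line per host/port pair.
check_nodes() {
    local host port
    for host in "$@"; do
        for port in $PORTS; do
            if port_open "$host" "$port"; then
                echo "$host:$port open"
            else
                echo "$host:$port blocked or closed"
            fi
        done
    done
}

# Example, run from master01 with the placeholders used in this walkthrough:
# check_nodes IP-OF-MASTER02 IP-OF-POLLER01
```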

Adding a peer

We’ll start by peering master01 and master02. To begin, we need to ensure that passwordless SSH connections are possible between the nodes. Merlin includes a convenience script to set up and install SSH keys across the nodes, which we’ll use below.

[root@master01 ~]# mon sshkey push IP-OF-MASTER02

[root@master02 ~]# mon sshkey push IP-OF-MASTER01

With the above, we should be able to SSH between both nodes, both as the root user and the naemon user.
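
To verify this without being dropped into an interactive session, a check along these lines may help. The helper name `can_ssh` is hypothetical; `BatchMode`, `ConnectTimeout` and `-n` are standard OpenSSH options. Which remote account to test for the naemon user depends on your installation and is left as an assumption for you to fill in.

```shell
#!/usr/bin/env bash
# can_ssh TARGET
# Succeeds only if key-based login to TARGET (user@host) works without any
# prompt. BatchMode=yes makes ssh fail instead of asking for a password,
# and -n keeps ssh from reading standard input.
can_ssh() {
    ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null
}

# Example, run on master01 (IP-OF-MASTER02 is the placeholder used above):
# can_ssh root@IP-OF-MASTER02 && echo "passwordless SSH works"
```

Run the same check from master02 towards master01, and repeat it as the naemon user, before moving on.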

Now we set up the peers in each node’s Merlin configuration. We’ll use another mon tool to adjust Merlin’s configuration file.

Start by adding master01 to the Merlin config on master02.

[root@master02 ~]# mon node add master01 type=peer address=IP-OF-MASTER01

Then, on master01, add master02 to the Merlin config and push the Naemon configuration to master02. We always push configuration as the naemon user. Finally, restart Naemon and Merlin.

[root@master01 ~]# mon node add master02 type=peer address=IP-OF-MASTER02
[root@master01 ~]# su naemon -c "mon oconf push master02"
[root@master01 ~]# mon restart

At this point we should have a healthy cluster, and we can use a few more mon tools to look at our cluster state. The two nodes are now load-balanced, sharing the check executions equally.

[root@master01 ~]# mon node status
Total checks (host / service): 4 / 21

#00 0/1:1 local ipc: ACTIVE - 0.000s latency
Uptime: 15m 47s. Connected: 15m 48s. Last alive: 8s ago
Host checks (handled, expired, total)   : 2, 0, 4 (50.00% : 50.00%)
Service checks (handled, expired, total): 11, 0, 21 (52.38% : 52.38%)

#01 1/1:1 peer master02: ACTIVE - 0.000s latency - (UNENCRYPTED)
Uptime: 15m 48s. Connected: 15m 48s. Last alive: 8s ago
Host checks (handled, expired, total)   : 2, 0, 4 (50.00% : 50.00%)
Service checks (handled, expired, total): 10, 0, 21 (47.62% : 47.62%)

[root@master01 ~]# mon node tree
 +-----+    +----------+
 | ipc |----| master02 |
 +-----+    +----------+
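
For scripting or quick comparisons, the status text can be condensed with awk. The sketch below assumes the output layout shown above (it may differ between Merlin versions), and the function names `summarize_status` and `sample_status` are hypothetical; the sample data is copied from this walkthrough so the block runs without a cluster.

```shell
#!/usr/bin/env bash
# summarize_status: reads `mon node status` text on stdin and prints one
# line per node and check type:
#   <type> <name> <state> <host|service> handled=N expired=N total=N
summarize_status() {
    awk '
        /^#[0-9]+ / {
            # Node header, e.g. "#01 1/1:1 peer master02: ACTIVE - ..."
            type = $3; name = $4; state = $5
            sub(/:$/, "", name)
        }
        /^(Host|Service) checks/ {
            kind = tolower($1)
            counts = $0
            sub(/^.*\) *: */, "", counts)   # drop the label up to the colon
            split(counts, n, /[, ]+/)      # n[1]=handled n[2]=expired n[3]=total
            printf "%s %s %s %s handled=%s expired=%s total=%s\n",
                type, name, state, kind, n[1], n[2], n[3]
        }
    '
}

# Sample input copied from the status output in this walkthrough.
sample_status() {
    cat <<'EOF'
#00 0/1:1 local ipc: ACTIVE - 0.000s latency
Uptime: 15m 47s. Connected: 15m 48s. Last alive: 8s ago
Host checks (handled, expired, total)   : 2, 0, 4 (50.00% : 50.00%)
Service checks (handled, expired, total): 11, 0, 21 (52.38% : 52.38%)

#01 1/1:1 peer master02: ACTIVE - 0.000s latency - (UNENCRYPTED)
Uptime: 15m 48s. Connected: 15m 48s. Last alive: 8s ago
Host checks (handled, expired, total)   : 2, 0, 4 (50.00% : 50.00%)
Service checks (handled, expired, total): 10, 0, 21 (47.62% : 47.62%)
EOF
}

sample_status | summarize_status
```

On a live node the same function could be fed with `mon node status | summarize_status`.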

Adding a poller

Now that we have two peers in the master peergroup, we can add a poller. We must first decide which hostgroup(s) the poller should be responsible for. Before getting started with the setup, ensure that the hostgroup has already been added to the masters’ Naemon configuration. In our example we’ll use a hostgroup called pollergroup.

SSH connections must be possible between the poller and both masters, in both directions, so we start by adding SSH keys.

[root@poller01 ~]# mon sshkey push IP-OF-MASTER01
[root@poller01 ~]# mon sshkey push IP-OF-MASTER02
[root@master01 ~]# mon sshkey push IP-OF-POLLER01

[root@master02 ~]# mon sshkey push IP-OF-POLLER01

With the above we have ensured that the poller can SSH to both masters, and that both masters can SSH to the poller. We now add the poller to both masters and then restart them. This ensures that Merlin prepares a subset of the Naemon configuration for poller01.

[root@master01 ~]# mon node add poller01 type=poller hostgroup=pollergroup address=IP-OF-POLLER01
[root@master01 ~]# mon restart

[root@master02 ~]# mon node add poller01 type=poller hostgroup=pollergroup address=IP-OF-POLLER01
[root@master02 ~]# mon restart

On the poller, we now need to add both masters to the Merlin configuration.

[root@poller01 ~]# mon node add master01 type=master address=IP-OF-MASTER01
[root@poller01 ~]# mon node add master02 type=master address=IP-OF-MASTER02

Finally, on one of the masters, do the initial configuration push to the poller manually.

[root@master01 ~]# su naemon -c "mon oconf push poller01"

We have now added a poller, and we can use the mon tools again to view the state of our cluster.

[root@master01 ~]# mon node tree
 +-----+    +----------+
 | ipc |----| master02 |
 +-----+    +----------+
    |
    |
    | HOSTGROUP: pollergroup  +----------+
    = ------------------------| poller01 |
                              +----------+


[root@master01 ~]# mon node status
Total checks (host / service): 5 / 29

#00 0/1:1 local ipc: ACTIVE - 0.000s latency
Uptime: 12m 24s. Connected: 12m 25s. Last alive: 2s ago
Host checks (handled, expired, total)   : 3, 0, 4 (75.00% : 60.00%)
Service checks (handled, expired, total): 11, 0, 21 (52.38% : 37.93%)

#01 1/1:1 peer master02: ACTIVE - 0.000s latency - (UNENCRYPTED)
Uptime: 12m 22s. Connected: 12m 17s. Last alive: 2s ago
Host checks (handled, expired, total)   : 1, 0, 4 (25.00% : 20.00%)
Service checks (handled, expired, total): 10, 0, 21 (47.62% : 34.48%)

#02 0/0:0 poller poller01: ACTIVE - 0.000s latency - (UNENCRYPTED)
Uptime: 3m 39s. Connected: 3m 39s. Last alive: 4s ago
Host checks (handled, expired, total)   : 1, 0, 1 (100.00% : 20.00%)
Service checks (handled, expired, total): 8, 0, 8 (100.00% : 27.59%)
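
The two percentages on each check line are not explained by the output itself. Judging only from the numbers above, the first appears to be the node’s handled checks relative to the total on its own line, and the second relative to the cluster-wide total on the first line (5 host checks, 29 service checks). This reading is an inference from the sample rather than a documented definition, and the `pct` helper below is hypothetical.

```shell
#!/usr/bin/env bash
# pct HANDLED TOTAL
# Prints HANDLED as a percentage of TOTAL with two decimals.
pct() {
    if [ "$2" -eq 0 ]; then
        echo "total must be non-zero" >&2
        return 1
    fi
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f%%\n", 100 * a / b }'
}

pct 3 4     # master01 host checks vs. the 4 on its own line: 75.00%
pct 3 5     # master01 host checks vs. the cluster-wide 5:    60.00%
pct 11 29   # master01 service checks vs. the cluster-wide 29: 37.93%
pct 8 29    # poller01 service checks vs. the cluster-wide 29: 27.59%
```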