geneos-good-practice

Geneos Level 2 Monitoring Best Practice Guide

Document Controls

Version 1.2

31st July 2014

Revision History

Version	Date	Author	Notes
1.0	05/12/2013	JH	First Draft
1.1	08/01/2014	JH	Revision after initial feedback
1.2	31/07/2014	JH	Copy to new company template

Introduction
- Purpose
- Scope
- Audience
- Background
Process Monitoring
- Process Monitoring Usage
- Process Monitoring Configuration
  - Single Instance Processes
  - Clustered Processes
Log File Monitoring
Appendix
- Process Configuration
  - Processes Rules

Introduction

Purpose

This document describes best practices for implementing Level 2 Monitoring, which encompasses process and log file monitoring.

Scope

This document is restricted to the configuration needed to implement best practices for process and log file monitoring. It does not explain the general configuration of the Process or FKM Plugins.

Audience

The intended audience for this document is any ITRS personnel or any ITRS client interested in the implementation of best practices for Level 2 monitoring.

Background

The practices in this document draw on the accumulated experience of the Professional Services department in implementing Geneos environments for many different clients. They are considered to provide the most reliable solution in terms of maintenance and scalability.

Process Monitoring

Application processes can be configured as Single Instance, Active – Passive or Active – Active processes, depending on the needs and/or requirements of the application or organisation.

The following procedures detail the way in which these differences can be accommodated within the configuration in the most straightforward manner, both for monitoring and configuration purposes.

This section covers best practices for monitoring processes. The specific configuration it describes relates mostly to Active – Passive and Active – Active processes. There is no attempt here to provide best practice for general process monitoring or configuration (e.g. use of Samplers, Sampler Includes, Process Descriptors, Variables etc.), as this will depend on individual requirements.

Process Monitoring Usage

Once process monitoring has been configured as detailed in the Process Monitoring Configuration section, the Active Console displays information differently for each process type.

Single Instance Processes

For Single Instance processes, the Active Console Metrics view shows the Instance Count for each process.

[Figure: Active Console Metrics view showing the Instance Count for Single Instance processes]

The default Rule for the instanceCount cell will set the severity of the cell to:

OK if the value is equal to 1
Warning if the value is greater than 1
Critical if the value is less than 1
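The three thresholds above can be illustrated with a minimal Python sketch. This is not Geneos rule syntax; the function name `single_instance_severity` and the string severities are assumptions made purely for illustration of the logic described.

```python
def single_instance_severity(instance_count: int) -> str:
    """Map an instanceCount value to a severity, following the thresholds listed above.

    Illustrative only: the function name and return values are hypothetical.
    """
    if instance_count == 1:
        return "OK"
    if instance_count > 1:
        return "Warning"
    return "Critical"  # fewer than 1 instance, i.e. the process is not running


# Show the severity for a stopped, a normal and a duplicated process.
for count in (0, 1, 2):
    print(count, single_instance_severity(count))
```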

Active – Passive Processes

For Active – Passive processes, the Active Console Metrics view shows the Instance Count, but will additionally show the Cluster Count for each process.

For Active – Passive processes it can be expected that the Cluster Count will be either 1 or 0.

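[Figure: Active Console Metrics view showing the Instance Count and Cluster Count for Active – Passive processes]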

The processName is identical for all instances of a clustered process and this allows instances of the process to be matched across multiple servers.

Each process row contains values for the instanceCount, i.e. the number of instances of the process running on that server, and the clusterCount, i.e. the number of instances of that process that are active across all servers.

In the example above:

The clusterCount is 1 and the instanceCount is 1 for the Primary FX Pricer processes, and the clusterCount is 1 and the instanceCount is 0 for the Secondary FX Pricer processes. This is how Active – Passive processes are expected to run and the severity is set to OK accordingly.
The clusterCount is 2 and the instanceCount is 1 for both the Primary and the Secondary FX Connect processes. This is not how Active – Passive processes are expected to run and the severity is set to Critical accordingly.
The clusterCount is 1 and the instanceCount is 0 for the Primary FX Options processes, and the clusterCount is 1 and the instanceCount is 1 for the Secondary FX Options processes. This is how Active – Passive processes are expected to run when failed over and the severity is set to OK accordingly.
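The relationship between instanceCount and clusterCount in these three cases can be sketched in Python. The sketch assumes that clusterCount is the sum of the instanceCount values for rows sharing the same processName, which is consistent with the figures quoted but is not stated as a formula in this guide. The server labels, the `rows` data and the `cluster_counts` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows: (server, processName, instanceCount),
# mirroring the three cases described above.
rows = [
    ("primary", "FX Pricer", 1),
    ("secondary", "FX Pricer", 0),
    ("primary", "FX Connect", 1),
    ("secondary", "FX Connect", 1),
    ("primary", "FX Options", 0),
    ("secondary", "FX Options", 1),
]


def cluster_counts(rows):
    """Sum instanceCount per processName across all servers (assumed definition)."""
    totals = defaultdict(int)
    for _server, process_name, instance_count in rows:
        totals[process_name] += instance_count
    return dict(totals)


print(cluster_counts(rows))
```

Matching on processName is what allows rows from different servers to be grouped, which is why the name must be identical for every instance of a clustered process.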

The Rule for clustered processes is not associated with the instanceCount column, but instead is associated with the clusterCount column.

For Active – Passive processes, the default Rule will set the severity of the cell to:

OK if the clusterCount value is 1
Critical if the clusterCount value is not equal to 1

This allows the user to be informed in the event that both processes are down and also in the event that both processes are up.
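As an illustration of the Active – Passive rule, the following Python sketch applies it to the cluster counts from the cases discussed earlier in this section. It is not Geneos rule syntax, and the function name `active_passive_severity` is an assumption.

```python
def active_passive_severity(cluster_count: int) -> str:
    """Return OK only when exactly one instance is active across the cluster.

    Illustrative only: the function name and return values are hypothetical.
    """
    return "OK" if cluster_count == 1 else "Critical"


# 0: both processes down, 1: normal or failed over, 2: both processes up.
for count in (0, 1, 2):
    print(count, active_passive_severity(count))
```

Because the check is on clusterCount rather than instanceCount, a failed-over cluster (Primary 0, Secondary 1) is reported as OK, while both the "all down" and "all up" conditions are reported as Critical.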

Active – Active Processes

For Active – Active processes, the Active Console Metrics view shows the Instance Count, but will additionally show the Cluster Count for each process in the same manner as for the Active – Passive processes.

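[Figure: Active Console Metrics view showing the Instance Count and Cluster Count for Active – Active processes]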

In the example above:

The clusterCount is 1 and the instanceCount is 1 for the Primary FX Pricer processes, and the clusterCount is 1 and the instanceCount is 0 for the Secondary FX Pricer processes. This implies that the Active – Active process is not running correctly. Although one of the clustered processes has failed, the application is still available, though there might be load issues. To indicate that attention is needed, the severity is set to Warning.
The clusterCount is 2 and the instanceCount is 1 for both the Primary and the Secondary FX Connect processes. This is how Active – Active processes are expected to run and the severity is set to OK accordingly.
The clusterCount is 0 and the instanceCount is 0 for both the Primary and the Secondary FX Options processes. This indicates that the application has failed and the severity is set to Critical accordingly.

The Rule for clustered processes is not associated with the instanceCount column, but instead is associated with the clusterCount column.

The severity for cells is based on the values assigned to two variables:

activeActiveCritical: the number of processes at which the severity is set to Critical (usually 0)
activeActiveOK: the number of processes expected to exist in the cluster

The default Rule will set the severity of the cell to:

OK if the clusterCount value is equal to the value set for the activeActiveOK variable
Warning if the clusterCount value is less than the value set for the activeActiveOK variable
Critical if the clusterCount value is equal to the value set for the activeActiveCritical variable
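The interaction of the two variables can be sketched in Python. This is not Geneos rule syntax. The sketch makes three assumptions that are not stated in this guide: the Critical check is evaluated first (a clusterCount of 0 would otherwise also satisfy the Warning condition), the default values 2 and 0 match a two-node cluster, and a clusterCount above activeActiveOK is left undefined because the rule description does not cover it.

```python
from typing import Optional


def active_active_severity(
    cluster_count: int,
    active_active_ok: int = 2,        # assumed default: two-node cluster
    active_active_critical: int = 0,  # "usually 0" according to the text
) -> Optional[str]:
    """Map a clusterCount value to a severity for an Active - Active cluster.

    Illustrative only: names, defaults and evaluation order are assumptions.
    """
    if cluster_count == active_active_critical:
        return "Critical"  # checked first so it is not masked by Warning
    if cluster_count == active_active_ok:
        return "OK"
    if cluster_count < active_active_ok:
        return "Warning"
    return None  # more instances than expected: not covered by the rule text


for count in (0, 1, 2):
    print(count, active_active_severity(count))
```

With these defaults, the sketch assigns Warning to a clusterCount of 1, OK to 2 and Critical to 0, in line with the three cases described earlier in this section.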

Process Monitoring Configuration

This document does not describe how to configure individual processes, as this information is expected to be either already known or available from other sources.

Where there are specific differences or additions to process configuration to accommodate the required type of process monitoring, these details will be included.