
(12.06.100-en) UMS HA Health Check - Analyse Your IGEL UMS High Availability and Distributed UMS Systems

With the UMS HA Health Check, you can perform an overall check of your multi-instance IGEL Universal Management Suite (UMS) installations. It checks whether the components of a High Availability (HA) system or a Distributed UMS interact properly, in particular, whether they can exchange messages and data.

The permission to use the UMS HA Health Check can be set under System > Administrator accounts; see General Administrator Rights.


Menu path: Menu bar > Help > UMS HA Health Check


To check your HA environment / Distributed UMS, proceed as follows:

  1. Make sure the servers and the components installed on them are in normal operational mode.

  2. In the menu bar, go to Help > UMS HA Health Check.

  3. Disable the checkbox Clear cached performance data before check if you want the cached data from previous runs to be included in the analysis.


    After the necessary data have been collected and analyzed, a window opens that presents the results and the corresponding recommendations in several tabs. Each tab has a Show Details button that opens a detailed analysis report in HTML format. Each tab and the HTML report are described below.

Messaging

This check detects whether the components are running and can exchange messages. It performs a ping test between the components of a High Availability installation on each server. The list shows the result for each combination of components, including the transfer time. For UMS HA, the transfer time indicates whether ActiveMQ messaging within the subnet is working.

If you have a Distributed UMS installation, you can ignore the results under Messaging, since the UMS HA Health Check mainly checks the performance of ActiveMQ messaging in a UMS High Availability installation (within the subnet). For the Distributed UMS, the Messaging tab shows the messaging delay over the database, which is approximately 30 seconds.

You can currently also ignore:

  • the Messaging results of the UMS HA Health Check if your UMS HA without IGEL UMS Load Balancers is installed in different subnets or in a cloud environment

  • error messages for Watchdogs if you have a UMS HA without IGEL UMS Load Balancers

Messaging between the components usually fails for one of the following reasons:

  • One of the components is not running at all.

  • The necessary ports, 61616 and 6155, are not open in the firewall. See IGEL UMS Communication Ports.

  • The system time on the servers differs significantly.

    To avoid problems with your HA installation, make sure that the time on the servers of the HA network does not differ by more than one minute. After each manual change of the system time, restart the HA services on the relevant server.

  • The IGEL network token differs between the components. For example, this can happen if a new IGEL network token is generated when further UMS Servers / UMS Load Balancers are installed in an HA network, instead of reusing the network token created during the installation of the first UMS Server.
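
The first three causes can be pre-checked by hand before running the check again. The following Python sketch is not part of the UMS; it is a minimal illustration under stated assumptions. It tests, from one server, whether TCP ports 61616 and 6155 accept connections on the other components, and whether a set of clock offsets stays within the one-minute limit. The host names passed on the command line and the source of the clock offsets (for example, your own NTP monitoring) are assumptions.

```python
import socket
import sys
from typing import Iterable

# Ports named above for messaging in a UMS HA installation.
HA_MESSAGING_PORTS = (61616, 6155)

# Maximum clock difference between HA servers recommended above: one minute.
MAX_CLOCK_SKEW_S = 60.0


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and unresolvable names all end up here.
        return False


def clock_skew_ok(offsets_s: Iterable[float], max_skew_s: float = MAX_CLOCK_SKEW_S) -> bool:
    """Return True if the servers' clocks differ from each other by at most max_skew_s.

    Each value is one server's clock offset in seconds relative to a common
    reference (assumption: you obtain these offsets yourself, e.g. via NTP).
    """
    offsets = list(offsets_s)
    if len(offsets) < 2:
        return True
    return max(offsets) - min(offsets) <= max_skew_s


if __name__ == "__main__":
    # Usage: python ha_precheck.py <host> [<host> ...]  (host names are your own)
    for host in sys.argv[1:]:
        for port in HA_MESSAGING_PORTS:
            state = "open" if tcp_port_open(host, port) else "NOT reachable"
            print(f"{host}:{port} -> {state}")
```

A port reported as open only proves that something is listening on it; it does not confirm that the IGEL network tokens match, which is the fourth cause listed above.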

WebDAV

This check examines whether the UMS Servers can exchange files via WebDAV. WebDAV is required for synchronizing files between the UMS Servers. See also Which Files Are Automatically Synchronized between the IGEL UMS Servers?.

Possible reasons for failure are the following:

  • One of the components is not running at all.

  • WebDAV port 8443 is not open in the firewall.
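
To tell a blocked port apart from a service that is not answering, the following Python sketch attempts a TLS handshake with port 8443 on a UMS Server. It assumes (this article does not state it) that the UMS Server serves WebDAV over TLS on that port. Certificate verification is disabled deliberately, because the goal is only to see whether a TLS service responds; the function name is hypothetical.

```python
import socket
import ssl
import sys

WEBDAV_PORT = 8443  # WebDAV port named above


def tls_handshake_ok(host: str, port: int = WEBDAV_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TLS handshake with host:port completes within timeout.

    Certificate verification is disabled on purpose: this only tests that a
    TLS service answers on the port, not that its certificate is trusted.
    """
    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before CERT_NONE is set
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                # wrap_socket performs the handshake; version() is set afterwards.
                return tls.version() is not None
    except OSError:
        # ssl.SSLError is a subclass of OSError, so handshake failures,
        # refused connections and timeouts are all reported as False.
        return False


if __name__ == "__main__":
    # Usage: python webdav_check.py <host> [<host> ...]
    for host in sys.argv[1:]:
        print(f"{host}:{WEBDAV_PORT} TLS handshake ok: {tls_handshake_ok(host)}")
```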

Port 30001

Port 30001 is used for connections between the devices and the UMS Load Balancer. Since the test cannot simulate a device, the UMS Servers instead try to connect to the UMS Load Balancer via port 30001.

Possible reasons for failure are the following:

  • One of the components is not running at all.

  • Port 30001 is not open in the firewall.

Port 30002

Port 30002 is used by the UMS Load Balancer for forwarding requests from the device to the UMS Server.

Possible reasons for failure are the following:

  • One of the components is not running at all.

  • Port 30002 is not open in the firewall.
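
The port 30001 and port 30002 tests run in opposite directions. The following Python sketch enumerates which component has to reach which, based only on the description above: for port 30001, each UMS Server stands in for a device and connects to the UMS Load Balancer; for port 30002, the UMS Load Balancer connects to the UMS Server. It assumes that every Load Balancer forwards to every UMS Server. The names PortCheck and device_port_checks are hypothetical, and this is not how IGEL implements the check.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PortCheck:
    source: str  # component that opens the TCP connection
    target: str  # component that must accept it
    port: int


def device_port_checks(servers: List[str], load_balancers: List[str]) -> List[PortCheck]:
    """List the connections to test for ports 30001 and 30002.

    Assumption: every UMS Load Balancer forwards to every UMS Server.
    """
    checks: List[PortCheck] = []
    for lb in load_balancers:
        for server in servers:
            # 30001: devices connect to the Load Balancer; the UMS Server
            # stands in for a device because a device cannot be simulated.
            checks.append(PortCheck(source=server, target=lb, port=30001))
            # 30002: the Load Balancer forwards device requests to the UMS Server.
            checks.append(PortCheck(source=lb, target=server, port=30002))
    return checks


if __name__ == "__main__":
    # Hypothetical component names.
    for check in device_port_checks(["ums-server-1", "ums-server-2"], ["ums-lb-1"]):
        print(f"from {check.source}: connect to {check.target}:{check.port}")
```

Each entry has to be tested from its source host, because a firewall rule may allow a connection in one direction while blocking the other.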

Certificates

This check compares the certificates stored on the UMS Server with those stored on the UMS Load Balancer.

A possible reason for failure can be the following:

  • Communication between the components fails because of differing IGEL network tokens; see the section "Messaging" above.
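
If the certificate check fails, a manual comparison can help narrow down which certificates differ. The following Python sketch assumes that you have already exported the certificates from the UMS Server and from the UMS Load Balancer in PEM format (exporting them is outside this sketch) and that SHA-256 fingerprints are compared; the UMS may use a different hash algorithm in its own report. The function names are hypothetical.

```python
import hashlib
import ssl
from typing import Dict, Iterable, Set


def sha256_fingerprint(der_cert: bytes) -> str:
    """Return the SHA-256 fingerprint of DER-encoded certificate bytes as AA:BB:... hex."""
    digest = hashlib.sha256(der_cert).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def pem_fingerprint(pem_cert: str) -> str:
    """Decode one PEM certificate to DER and return its SHA-256 fingerprint."""
    return sha256_fingerprint(ssl.PEM_cert_to_DER_cert(pem_cert))


def compare_fingerprints(server_fps: Iterable[str], lb_fps: Iterable[str]) -> Dict[str, Set[str]]:
    """Report fingerprints present on only one side; two empty sets mean a match."""
    server_set, lb_set = set(server_fps), set(lb_fps)
    return {
        "only_on_server": server_set - lb_set,
        "only_on_load_balancer": lb_set - server_set,
    }
```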

More Checks

If other problems are detected, the corresponding results and recommendations are displayed here.

Detailed Report

Clicking Show Details generates a detailed report in HTML format that provides additional information:

Roles: Based on the results, the check shows which roles are possible for the servers.


Config Info: Shows the configuration information as provided by the processes. For a UMS Load Balancer (i.e. the UMS broker process), the servers known to this Load Balancer are shown.

Process Info: Provides an overview of the processes.

Certificate Fingerprints: Shows the fingerprints of the certificates stored in the database on the UMS Server and in the tc.keystore file on the UMS Load Balancer.