Overview

We use a set of services to monitor the state of NDIP and to trigger alerts when problems occur. Monitoring covers system services, Docker containers, disk space, web services, system tests, and tool tests.

Below is a list of services that are used to provide monitoring:

| Service | Host | Configuration |
| --- | --- | --- |
| Node Exporters | on each VM we deploy | https://github.com/neutrons/post_processing_agent |
| Prometheus Stack | prometheus_push_gateway | Ansible playbook |
| Slack | slack.com | Ansible playbook |

Further details about each service are provided in the corresponding subsections.

What is monitored

| Metric | Source |
| --- | --- |
| Systemd services | Node Exporter |
| Disk space | Node Exporter |
| Docker response time & number of running containers | Push Gateway, Docker metrics |
| Web services | Black Box |
| System tests | Push Gateway |
| Tool tests | Push Gateway |
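
To illustrate how a test result might reach the Push Gateway, here is a minimal Python sketch. It is not the actual NDIP implementation: the metric name `ndip_system_test_success`, the job name `system_tests`, the `test` label, and the gateway URL are all hypothetical. It relies only on the standard Pushgateway HTTP API (`PUT /metrics/job/<job>`) and the Prometheus text exposition format.

```python
import urllib.request
from urllib.parse import quote


def _escape(value):
    """Escape a label value for the Prometheus text exposition format."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_metric(name, value, labels=None):
    """Render a single gauge sample as exposition-format text."""
    label_str = ""
    if labels:
        pairs = ",".join(f'{k}="{_escape(v)}"' for k, v in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    # The payload must end with a newline, otherwise the gateway rejects it.
    return f"# TYPE {name} gauge\n{name}{label_str} {value}\n"


def pushgateway_url(base_url, job):
    """Build the push URL for a job (assumes the job name contains no '/')."""
    return base_url.rstrip("/") + "/metrics/job/" + quote(job, safe="")


def push_test_result(base_url, job, test_name, passed, timeout=5.0):
    """Push 1 (passed) or 0 (failed) for one test; returns the HTTP status."""
    # Hypothetical metric name, chosen for illustration only.
    body = format_metric("ndip_system_test_success", 1 if passed else 0,
                         {"test": test_name})
    request = urllib.request.Request(
        pushgateway_url(base_url, job),
        data=body.encode("utf-8"),
        method="PUT",  # PUT replaces all metrics in this job's group
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A test runner could call, for example, `push_test_result("http://localhost:9091", "system_tests", "login", passed=True)` after each test, where the host and port are placeholders for the real gateway address. Prometheus then scrapes the value from the gateway, and an alert rule can fire when it drops to 0.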

See the Alert rules for more details.