Prometheus Stack
Prometheus is an open-source systems monitoring and alerting toolkit that collects and stores metrics as time series data. Each metric sample is stored with the timestamp at which it was recorded, along with optional key-value pairs called labels.
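To make the data model concrete, here is a minimal sketch of a single sample as a plain Python object, rendered as a line in the Prometheus text exposition format. The class name `Sample` and the metric and label names are hypothetical and used only for illustration; this is not how Prometheus stores data internally.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """One point of a time series (illustrative model only)."""

    name: str                 # metric name, e.g. "http_requests_total"
    value: float              # the measured value
    timestamp_ms: int         # when the value was recorded (Unix time, ms)
    labels: dict = field(default_factory=dict)  # optional key-value pairs

    def to_exposition(self) -> str:
        """Render as one text-format line: name{labels} value timestamp.

        Label values are not escaped here, so this sketch assumes they
        contain no quotes, backslashes, or newlines.
        """
        label_str = ",".join(
            f'{key}="{val}"' for key, val in sorted(self.labels.items())
        )
        selector = f"{self.name}{{{label_str}}}" if label_str else self.name
        return f"{selector} {self.value} {self.timestamp_ms}"


sample = Sample(
    name="http_requests_total",
    value=1027,
    timestamp_ms=1395066363000,
    labels={"method": "GET", "code": "200"},
)
print(sample.to_exposition())
```

The metric name together with its label set identifies one time series; samples with the same name but different labels belong to different series.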
The stack consists of four main services:
- Prometheus – Collects metrics and triggers alerts.
- AlertManager – Sends alerts to users.
- Prometheus Push Gateway – Allows metrics to be pushed via a REST API.
- Prometheus Black Box – Allows black-box probing of endpoints over HTTP, HTTPS, DNS, TCP, ICMP, and gRPC.
Deployment
The Prometheus Stack runs on a Kubernetes cluster (currently a single node) in the ORC cloud. It is deployed via a CI/CD job using a Helm chart.
To run this job, go to GitLab Pipelines and execute the prometheus or prometheus push gateway job stage.
- The SSH key for the VM can be found in GitLab CI/CD Variables. You may need to request access.
- The Kubernetes config file is also available in GitLab CI/CD Variables.
Host
The Kubernetes cluster is provisioned through our infrastructure repository. The VM's IP address can be found in the Ansible variable prometheus_push_gateway.
The current infrastructure repository lacks proper support and documentation. In the future, it should be replaced by a monorepo or transitioned to a production Kubernetes cluster.
Deployment Details
Deployment is managed through our deployment monorepo in the monitoring folder.
There are three key files to update:
Take a look at how the current alerts are created, and modify existing ones or add new ones in a similar manner.
Prometheus Push Gateway
The Pushgateway is an intermediary service for pushing metrics from jobs that cannot be scraped directly.
Push Gateway runs as a web service (see address here), providing a dashboard to view the pushed metrics. Instead of each job exposing its own /metrics endpoint, jobs push their metrics to the gateway, which Prometheus then scrapes.
We use it to send results from system and tool tests using the prometheus_client Python package.
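As an illustration of how a test result could be pushed, here is a minimal sketch using `prometheus_client`. The gateway address, the job name `system_tests`, and the metric names are assumptions made for this example; check the existing test code for the names actually in use.

```python
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

# Hypothetical address; replace with the real Push Gateway host and port.
PUSHGATEWAY_ADDRESS = "pushgateway.example.internal:9091"


def build_test_registry(
    test_name: str, passed: bool, duration_seconds: float
) -> CollectorRegistry:
    """Collect the result of one test run in a fresh, isolated registry."""
    registry = CollectorRegistry()
    success = Gauge(
        "system_test_success",  # hypothetical metric name
        "1 if the test passed, 0 otherwise",
        ["test"],
        registry=registry,
    )
    duration = Gauge(
        "system_test_duration_seconds",  # hypothetical metric name
        "Wall-clock duration of the test run",
        ["test"],
        registry=registry,
    )
    success.labels(test=test_name).set(1 if passed else 0)
    duration.labels(test=test_name).set(duration_seconds)
    return registry


def push_test_result(
    test_name: str, passed: bool, duration_seconds: float
) -> None:
    """Push the result; replaces earlier metrics pushed under the same job."""
    registry = build_test_registry(test_name, passed, duration_seconds)
    push_to_gateway(PUSHGATEWAY_ADDRESS, job="system_tests", registry=registry)
```

A test runner could then call `push_test_result("smoke", passed=True, duration_seconds=12.5)` at the end of a run. Using a dedicated `CollectorRegistry` keeps the pushed payload limited to the metrics defined here rather than everything in the process-wide default registry.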
Prometheus Black Box
The Blackbox Exporter allows black-box probing of endpoints over HTTP, HTTPS, DNS, TCP, ICMP, and gRPC. It is installed alongside the rest of the Prometheus stack and is used to monitor web services that do not expose metrics (or as a supplement to existing metrics). For example, it can check whether a service returns an HTTP status code 200.
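For debugging, a probe can also be triggered by hand through the exporter's `/probe` endpoint. The sketch below assumes a hypothetical exporter address and a module named `http_2xx` (the name used in the exporter's example configuration); the modules actually available depend on the deployed Blackbox configuration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical address; replace with the real Blackbox Exporter host and port.
BLACKBOX_ADDRESS = "http://blackbox.example.internal:9115"


def build_probe_url(
    target: str, module: str = "http_2xx", exporter: str = BLACKBOX_ADDRESS
) -> str:
    """Build the /probe URL asking the exporter to check `target`."""
    query = urlencode({"target": target, "module": module})
    return f"{exporter}/probe?{query}"


def parse_probe_success(metrics_text: str) -> bool:
    """Return True if the probe output contains `probe_success 1`."""
    for line in metrics_text.splitlines():
        # Skip "# HELP" / "# TYPE" comments and unrelated metrics.
        if line.startswith("probe_success "):
            return float(line.split()[1]) == 1.0
    return False


def probe(target: str, module: str = "http_2xx") -> bool:
    """Run one probe through the exporter and report whether it succeeded."""
    with urlopen(build_probe_url(target, module), timeout=10) as response:
        return parse_probe_success(response.read().decode("utf-8"))
```

Calling `probe("https://example.org")` would return `True` when the exporter reports `probe_success 1` for that target. In normal operation Prometheus issues these probe requests itself on every scrape, so this helper is only useful for manual checks.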