Prometheus Monitoring Study Notes (11): Common Prometheus Alert Rules
huty
2022-12-02
Configuring alert rules
Step 1: Edit Prometheus's prometheus.yml file and add the path of the alert rule files under rule_files.
With the example below, Prometheus loads every file ending in .yml in the /etc/prometheus/rules/ directory as alert rules:
```yaml
rule_files:
  - "/etc/prometheus/rules/*.yml"
```
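The glob above matches only the .yml extension, so rule files saved as .yaml are skipped. A minimal sketch that loads both extensions and also sets evaluation_interval explicitly: rules are only evaluated once per interval (Prometheus defaults to 1m), so a `for: 30s` clause is effectively rounded up to the next evaluation. The 15s value is an assumption; choose one that suits your setup.

```yaml
global:
  # How often alert rules are evaluated (Prometheus uses 1m if this is unset).
  evaluation_interval: 15s

rule_files:
  # Load rule files with either extension from the rules directory.
  - "/etc/prometheus/rules/*.yml"
  - "/etc/prometheus/rules/*.yaml"
```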
Step 2: Edit the corresponding alert rule file
Alert rule templates
Reference site:
Common alert rules
Note: type="xxx" in the rules below is a custom label. Change it to match the labels you actually attach to each scrape target.
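For a selector such as `up{type="node"}` to match anything, the scrape targets must carry that label. A minimal sketch, assuming static targets; the job name and the address 192.168.1.10:9100 are hypothetical. Labels set under static_configs are attached to every series scraped from those targets, including the synthetic `up` series the rules below rely on.

```yaml
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["192.168.1.10:9100"]   # hypothetical node_exporter address
        labels:
          type: "node"                    # custom label matched by up{type="node"}
```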
Prometheus Server
```yaml
groups:
  - name: Prometheus
    rules:
      - alert: Prometheus connection failed
        expr: up{type="prometheus"} == 0
        for: 30s
        labels:
          severity: emergency
        annotations:
          summary: Prometheus ({{ $labels.instance }}) connection failed
          description: "Prometheus {{ $labels.instance }} connection failed!"
```
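One extra rule that fits this group, offered as a suggestion rather than part of the original list: Prometheus exposes `prometheus_config_last_reload_successful`, which is 0 when the most recent configuration reload failed (for example after a syntax error in prometheus.yml or in a rule file). The severity and `for` values are arbitrary choices.

```yaml
# Append this entry to the rules: list of the Prometheus group.
# Fires when the last attempt to reload the configuration failed.
- alert: Prometheus configuration reload failed
  expr: prometheus_config_last_reload_successful == 0
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: Prometheus ({{ $labels.instance }}) configuration reload failed
    description: "Prometheus {{ $labels.instance }} failed to reload its configuration!"
```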
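Blackbox monitoring
```yaml
groups:
  - name: Blackbox
    rules:
      # probe_success is 0 when blackbox_exporter's probe of the target failed.
      - alert: Blackbox probe failed
        expr: probe_success == 0
        for: 30s
        labels:
          severity: emergency
        annotations:
          summary: Blackbox probe ({{ $labels.instance }}) failed
          description: "Blackbox probe of {{ $labels.instance }} failed!"
```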
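For HTTPS targets, blackbox_exporter also exports `probe_ssl_earliest_cert_expiry`, a Unix timestamp of the earliest certificate expiry in the chain. A sketch of an expiry warning, assuming an http probe module pointed at TLS endpoints; the 30-day threshold is an arbitrary choice.

```yaml
# Append this entry to the rules: list of the Blackbox group.
# Subtracting time() from the expiry timestamp gives the seconds left; 86400 * 30 is 30 days.
- alert: Blackbox SSL certificate expiring soon
  expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
  for: 0m
  labels:
    severity: warning
  annotations:
    summary: SSL certificate of ({{ $labels.instance }}) expires soon
    description: "SSL certificate of {{ $labels.instance }} expires in less than 30 days!"
```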
Linux hosts (Node)
```yaml
groups:
  - name: Node
    rules:
      - alert: Host connection failed
        expr: up{type="node"} == 0
        for: 30s
        labels:
          severity: emergency
        annotations:
          summary: Host ({{ $labels.instance }}) connection failed
          description: "Host {{ $labels.instance }} connection failed!"
      # Share of CPU time spent in non-idle modes, averaged over all cores of each instance.
      - alert: Host CPU usage too high
        expr: sum by (instance) (avg by (mode, instance) (rate(node_cpu_seconds_total{mode!="idle"}[2m]))) > 0.8
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: Host ({{ $labels.instance }}) CPU usage too high
          description: "Host {{ $labels.instance }} CPU usage is above 80%!"
      # Available memory as a percentage of total memory.
      - alert: Host out of memory
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 20
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host ({{ $labels.instance }}) out of memory
          description: "Host {{ $labels.instance }} has less than 20% memory available!"
      # Less than 20% free space on any filesystem that is not mounted read-only.
      - alert: Host out of disk space
        expr: (node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes < 20 and ON (instance, device, mountpoint) node_filesystem_readonly == 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host ({{ $labels.instance }}) out of disk space
          description: "Host {{ $labels.instance }} has less than 20% disk space left!"
```
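A fixed 20% threshold can fire late on a disk that is filling quickly. A sketch using PromQL's `predict_linear()` to warn when a filesystem is projected to run out of space within 24 hours, based on the trend over the last 6 hours; the two windows and the tmpfs exclusion are assumptions to tune.

```yaml
# Append this entry to the rules: list of the Node group.
# predict_linear() extrapolates the 6h trend of available bytes 24h (24 * 3600 s) ahead.
- alert: Host disk predicted to fill within 24h
  expr: predict_linear(node_filesystem_avail_bytes{fstype!~"tmpfs"}[6h], 24 * 3600) < 0 and ON (instance, device, mountpoint) node_filesystem_readonly == 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: Host ({{ $labels.instance }}) disk predicted to fill within 24h
    description: "A filesystem on host {{ $labels.instance }} is projected to run out of space within 24 hours!"
```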
Docker
```yaml
groups:
  - name: Docker
    rules:
      - alert: Docker connection failed
        expr: up{type="docker"} == 0
        for: 30s
        labels:
          severity: emergency
        annotations:
          summary: Docker ({{ $labels.instance }}) connection failed
          description: "Docker {{ $labels.instance }} connection failed!"
```
MySQL
```yaml
groups:
  - name: MySQL
    rules:
      - alert: MySQL connection failed
        expr: up{type="mysql"} == 0
        for: 30s
        labels:
          severity: emergency
        annotations:
          summary: MySQL ({{ $labels.instance }}) connection failed
          description: "MySQL {{ $labels.instance }} connection failed!"
```
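`up{type="mysql"} == 0` only fires when Prometheus cannot scrape the exporter itself. If mysqld_exporter is running but cannot reach MySQL, `up` stays at 1 and the exporter reports `mysql_up` as 0 instead. A sketch of a rule for that case, assuming the targets are mysqld_exporter instances:

```yaml
# Append this entry to the rules: list of the MySQL group.
# mysql_up is reported by mysqld_exporter: 0 means the exporter could not connect to MySQL.
- alert: MySQL server down
  expr: mysql_up{type="mysql"} == 0
  for: 30s
  labels:
    severity: emergency
  annotations:
    summary: MySQL ({{ $labels.instance }}) server down
    description: "The exporter on {{ $labels.instance }} cannot connect to MySQL!"
```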
MongoDB
```yaml
groups:
  - name: MongoDB
    rules:
      - alert: MongoDB connection failed
        expr: up{type="mongodb"} == 0
        for: 30s
        labels:
          severity: emergency
        annotations:
          summary: MongoDB ({{ $labels.instance }}) connection failed
          description: "MongoDB {{ $labels.instance }} connection failed!"
```
Redis
```yaml
groups:
  - name: Redis
    rules:
      - alert: Redis connection failed
        expr: up{type="redis"} == 0
        for: 30s
        labels:
          severity: emergency
        annotations:
          summary: Redis ({{ $labels.instance }}) connection failed
          description: "Redis {{ $labels.instance }} connection failed!"
```
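As with MySQL, `up` only reflects whether the exporter answered the scrape. Assuming the targets are oliver006/redis_exporter instances, which report `redis_up` as 0 when they cannot reach Redis, a sketch of the matching rule:

```yaml
# Append this entry to the rules: list of the Redis group.
# redis_up comes from redis_exporter: 0 means the exporter could not connect to Redis.
- alert: Redis server down
  expr: redis_up{type="redis"} == 0
  for: 30s
  labels:
    severity: emergency
  annotations:
    summary: Redis ({{ $labels.instance }}) server down
    description: "The exporter on {{ $labels.instance }} cannot connect to Redis!"
```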
RabbitMQ
```yaml
groups:
  - name: RabbitMQ
    rules:
      - alert: RabbitMQ connection failed
        expr: up{type="rabbitmq"} == 0
        for: 30s
        labels:
          severity: emergency
        annotations:
          summary: RabbitMQ ({{ $labels.instance }}) connection failed
          description: "RabbitMQ {{ $labels.instance }} connection failed!"
```
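If the targets are kbudde/rabbitmq_exporter instances (an assumption; RabbitMQ's built-in Prometheus plugin exposes different metrics), that exporter reports `rabbitmq_up`, which drops to 0 when it cannot query the RabbitMQ management API even though the scrape itself succeeds. A sketch; check your exporter's metric list before relying on it:

```yaml
# Append this entry to the rules: list of the RabbitMQ group.
# Assumes kbudde/rabbitmq_exporter: rabbitmq_up is 0 when RabbitMQ could not be queried.
- alert: RabbitMQ server down
  expr: rabbitmq_up{type="rabbitmq"} == 0
  for: 30s
  labels:
    severity: emergency
  annotations:
    summary: RabbitMQ ({{ $labels.instance }}) server down
    description: "The exporter on {{ $labels.instance }} cannot query RabbitMQ!"
```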
ElasticSearch
```yaml
groups:
  - name: ElasticSearch
    rules:
      - alert: ElasticSearch connection failed
        expr: up{type="elasticsearch"} == 0
        for: 30s
        labels:
          severity: emergency
        annotations:
          summary: ElasticSearch ({{ $labels.instance }}) connection failed
          description: "ElasticSearch {{ $labels.instance }} connection failed!"
```
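A reachable cluster can still be unhealthy. Assuming the targets are prometheus-community/elasticsearch_exporter instances, the exporter publishes `elasticsearch_cluster_health_status` with a `color` label set to 1 for the current status; a sketch that fires when the cluster turns red (at least one primary shard unassigned):

```yaml
# Append this entry to the rules: list of the ElasticSearch group.
# elasticsearch_cluster_health_status{color="red"} is 1 while the cluster status is red.
- alert: ElasticSearch cluster red
  expr: elasticsearch_cluster_health_status{type="elasticsearch", color="red"} == 1
  for: 0m
  labels:
    severity: emergency
  annotations:
    summary: ElasticSearch ({{ $labels.instance }}) cluster status is red
    description: "ElasticSearch cluster reported by {{ $labels.instance }} is red!"
```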
Zookeeper
```yaml
groups:
  - name: Zookeeper
    rules:
      - alert: Zookeeper connection failed
        expr: up{type="zookeeper"} == 0
        for: 30s
        labels:
          severity: emergency
        annotations:
          summary: Zookeeper ({{ $labels.instance }}) connection failed
          description: "Zookeeper {{ $labels.instance }} connection failed!"
```
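Every connection rule in these notes relies on `up{...} == 0`, which only works while the target is still being scraped. If a target drops out of the scrape configuration or service discovery, its `up` series disappears and the rule silently stops matching. A sketch using `absent()` to catch that for the Zookeeper targets; the same pattern works for the other `type` values. Since `absent()` returns a series without an `instance` label, the annotations do not reference one.

```yaml
# Append this entry to the rules: list of the Zookeeper group.
# absent() returns 1 only when no up{type="zookeeper"} series exists at all.
- alert: Zookeeper target missing
  expr: absent(up{type="zookeeper"})
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: No Zookeeper targets are being scraped
    description: "Prometheus has no up series with type=zookeeper!"
```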