Prometheus Monitoring Study Notes (11): Common Prometheus Alerting Rules
huty
December 2, 2022
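Configuring Alerting Rules
Step 1: Edit Prometheus's prometheus.yml file and list the paths of the alerting rule files under rule_files.
For example, with the configuration below, Prometheus loads every file ending in .yml in the /etc/prometheus/rules/ directory as a rule file: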
rule_files:
  - "/etc/prometheus/rules/*.yml" 
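For context, here is a minimal sketch of how rule_files might sit next to the other settings that affect alerting in prometheus.yml. The evaluation_interval value and the Alertmanager address are assumptions for illustration, not values taken from these notes.

```yaml
global:
  # How often rule files are evaluated. 15s is an assumed value;
  # Prometheus defaults to 1m when this is not set.
  evaluation_interval: 15s

rule_files:
  - "/etc/prometheus/rules/*.yml"

alerting:
  alertmanagers:
    - static_configs:
        # Assumed Alertmanager address; replace with your own.
        - targets: ["localhost:9093"]
```

A rule's `for` duration is only checked at each evaluation, so with a long evaluation_interval a `for: 30s` rule may take noticeably longer than 30 seconds to move from pending to firing.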
Step 2: Edit the corresponding alerting rule file
Alerting rule template
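The template itself is not reproduced here. As a rough sketch, the rules in these notes all follow the shape below; the group name, alert name, and expression are hypothetical placeholders.

```yaml
groups:
- name: ExampleGroup            # hypothetical group name
  rules:
  - alert: ExampleAlert         # hypothetical alert name
    expr: up == 0               # PromQL expression; the alert is active while it returns results
    for: 30s                    # how long the expression must stay true before the alert fires
    labels:
      severity: warning         # extra labels attached to the alert
    annotations:
      summary: "Short summary for {{ $labels.instance }}"
      description: "Longer description for {{ $labels.instance }}"
```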
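Reference site:
Common alerting rules
Note: the type="xxx" matcher in the alerting rules below refers to a custom label; change it to match your own setup before use.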
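How the custom type label gets attached is not shown in this note. One possible way, sketched below, is to set it per target in scrape_configs; the job names and target addresses are placeholders.

```yaml
scrape_configs:
  - job_name: node                         # hypothetical job name
    static_configs:
      - targets: ["192.0.2.10:9100"]       # placeholder node_exporter address
        labels:
          type: node                       # matched by up{type="node"}
  - job_name: mysql                        # hypothetical job name
    static_configs:
      - targets: ["192.0.2.20:9104"]       # placeholder mysqld_exporter address
        labels:
          type: mysql                      # matched by up{type="mysql"}
```

Labels set this way are added to every series scraped from the target, including up, which is why the rules can select on them.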
Prometheus Server
groups:
- name: Prometheus
  rules:
  - alert: PrometheusConnectionFailed
    expr: up{type="prometheus"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: "Prometheus ({{ $labels.instance }}) connection failed"
      description: "Connection to Prometheus {{ $labels.instance }} failed!"
Blackbox Monitoring
groups:
- name: Blackbox
  rules:
  - alert: BlackboxProbeFailed
    expr: probe_success == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: "Blackbox probe ({{ $labels.instance }}) failed"
      description: "Blackbox probe of {{ $labels.instance }} failed!"
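The rule above relies on the instance label identifying the probed target, which only holds if the scrape job relabels it that way. The sketch below follows the relabeling pattern commonly used with the blackbox exporter; the module name, target URL, and exporter address are assumptions.

```yaml
scrape_configs:
  - job_name: blackbox                     # hypothetical job name
    metrics_path: /probe
    params:
      module: [http_2xx]                   # assumed module defined in the exporter's config
    static_configs:
      - targets:
          - https://example.com            # placeholder target to probe
    relabel_configs:
      # Pass the target as the ?target= query parameter to the exporter.
      - source_labels: [__address__]
        target_label: __param_target
      # Use the probed target as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to the blackbox exporter itself.
      - target_label: __address__
        replacement: 127.0.0.1:9115        # assumed blackbox exporter address
```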
Linux Hosts (Node)
groups:
- name: Node
  rules:
  - alert: HostConnectionFailed
    expr: up{type="node"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: "Host ({{ $labels.instance }}) connection failed"
      description: "Connection to host {{ $labels.instance }} failed!"
  - alert: HostHighCpuUsage
    expr: sum by (instance) (avg by (mode, instance) (rate(node_cpu_seconds_total{mode!="idle"}[2m]))) > 0.8
    # for: 0m fires as soon as the expression is true; raise it to ignore short spikes
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: "Host ({{ $labels.instance }}) CPU usage is too high"
      description: "CPU usage on host {{ $labels.instance }} is above 80%!"
  - alert: HostMemoryLow
    expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 20
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Host ({{ $labels.instance }}) is low on memory"
      description: "Less than 20% of memory is available on host {{ $labels.instance }}!"
  - alert: HostDiskSpaceLow
    expr: (node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes < 20 and ON (instance, device, mountpoint) node_filesystem_readonly == 0
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Host ({{ $labels.instance }}) is low on disk space"
      description: "Less than 20% of disk space is left on host {{ $labels.instance }} (mountpoint {{ $labels.mountpoint }})!"
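The disk rule above matches every filesystem that node_exporter reports, which can include pseudo filesystems such as tmpfs. As one possible refinement, the sketch below filters on the fstype label; the list of excluded types and the group and alert names are assumptions to adjust for your hosts.

```yaml
groups:
- name: NodeDiskSketch                     # hypothetical group name
  rules:
  - alert: HostDiskSpaceLowFiltered        # hypothetical alert name
    # Same 20% threshold as the rule above, but skips the listed
    # filesystem types (an assumed list, not taken from these notes).
    expr: |
      (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay|squashfs"} * 100)
      / node_filesystem_size_bytes < 20
      and on (instance, device, mountpoint) node_filesystem_readonly == 0
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Host ({{ $labels.instance }}) is low on disk space"
      description: "Less than 20% left on {{ $labels.mountpoint }} of host {{ $labels.instance }}"
```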
Docker
groups:
- name: Docker
  rules:
  - alert: DockerConnectionFailed
    expr: up{type="docker"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: "Docker ({{ $labels.instance }}) connection failed"
      description: "Connection to Docker {{ $labels.instance }} failed!"
MySQL
groups:
- name: MySQL
  rules:
  - alert: MySQLConnectionFailed
    expr: up{type="mysql"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: "MySQL ({{ $labels.instance }}) connection failed"
      description: "Connection to MySQL {{ $labels.instance }} failed!"
MongoDB
groups:
- name: MongoDB
  rules:
  - alert: MongoDBConnectionFailed
    expr: up{type="mongodb"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: "MongoDB ({{ $labels.instance }}) connection failed"
      description: "Connection to MongoDB {{ $labels.instance }} failed!"
Redis
groups:
- name: Redis
  rules:
  - alert: RedisConnectionFailed
    expr: up{type="redis"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: "Redis ({{ $labels.instance }}) connection failed"
      description: "Connection to Redis {{ $labels.instance }} failed!"
RabbitMQ
groups:
- name: RabbitMQ
  rules:
  - alert: RabbitMQConnectionFailed
    expr: up{type="rabbitmq"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: "RabbitMQ ({{ $labels.instance }}) connection failed"
      description: "Connection to RabbitMQ {{ $labels.instance }} failed!"
Elasticsearch
groups:
- name: ElasticSearch
  rules:
  - alert: ElasticsearchConnectionFailed
    expr: up{type="elasticsearch"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: "Elasticsearch ({{ $labels.instance }}) connection failed"
      description: "Connection to Elasticsearch {{ $labels.instance }} failed!"
ZooKeeper
groups:
- name: Zookeeper
  rules:
  - alert: ZooKeeperConnectionFailed
    expr: up{type="zookeeper"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: "ZooKeeper ({{ $labels.instance }}) connection failed"
      description: "Connection to ZooKeeper {{ $labels.instance }} failed!"
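Most of the connection rules above differ only in the value of the type label. If per-service rules are not needed, they could be collapsed into a single rule along the lines of the sketch below; the group and alert names are hypothetical, and it assumes every monitored target carries a type label.

```yaml
groups:
- name: TargetDown                         # hypothetical group name
  rules:
  - alert: TargetConnectionFailed          # hypothetical alert name
    # Matches any scrape target that has a non-empty custom type label.
    expr: up{type!=""} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: "{{ $labels.type }} ({{ $labels.instance }}) connection failed"
      description: "Connection to {{ $labels.type }} {{ $labels.instance }} failed!"
```

Loading this alongside the per-service rules would raise two alerts for the same outage, so it is meant as an alternative rather than an addition. The annotations print the raw label value (for example mysql), not the display names used in the separate rules.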