Prometheus Monitoring Study Notes (11): A Collection of Common Prometheus Alerting Rules

huty · December 2, 2022

Configuring alerting rules

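Step 1: Edit Prometheus's prometheus.yml file and list the path of the alerting rule files under rule_files.
With the example below, Prometheus loads every file in /etc/prometheus/rules/ whose name matches *.yml as a rule file (files ending in .yaml are not matched by this pattern):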

rule_files:
  - "/etc/prometheus/rules/*.yml"
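Each rule_files entry is a glob, so a directory that mixes both extensions needs one pattern per extension. A minimal sketch, assuming the same rules directory; the evaluation_interval value is an illustrative assumption, not something the original setup specifies:

```yaml
global:
  # How often Prometheus evaluates the loaded rules (assumed value).
  evaluation_interval: 15s

rule_files:
  # One glob per extension; each pattern is expanded separately.
  - "/etc/prometheus/rules/*.yml"
  - "/etc/prometheus/rules/*.yaml"
```

Prometheus only picks up changes to rule files after a configuration reload, and running promtool check rules against a file before reloading is a cheap way to catch syntax errors.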

Step 2: Edit the corresponding alerting rule file
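As a rough orientation, a rule file is a list of named groups, each holding a list of rules. The sketch below shows the general shape only; the group name, alert name, and threshold are hypothetical placeholders:

```yaml
groups:
- name: example                     # hypothetical group name, unique within the file
  rules:
  - alert: ExampleTargetDown        # hypothetical alert name
    expr: up == 0                   # PromQL; one alert per series the expression returns
    for: 1m                         # the condition must hold this long before the alert fires
    labels:
      severity: warning             # extra labels attached to the alert, used for routing
    annotations:
      summary: "Target {{ $labels.instance }} is down"
      description: "up is {{ $value }} on {{ $labels.instance }}."
```

Annotations are Go templates: $labels holds the label set of the alerting series and $value holds the expression's value at evaluation time.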

Alerting rule template

Reference site:

Common alerting rules

Note: type="xxx" in the rules below is a custom label; change it to match the labels attached to your own scrape targets.
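A custom label such as type is usually attached to targets in the scrape configuration. A minimal sketch, assuming a statically configured node_exporter target; the job name and address are placeholders:

```yaml
scrape_configs:
  - job_name: "node"                    # hypothetical job name
    static_configs:
      - targets: ["192.0.2.10:9100"]    # placeholder address; 9100 is node_exporter's default port
        labels:
          type: "node"                  # custom label matched by up{type="node"} in the rules
```

Every series scraped from that target, including the synthetic up series, then carries type="node".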

Prometheus Server

groups:
- name: Prometheus
  rules:
  - alert: PrometheusConnectionFailed
    expr: up{type="prometheus"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: "Prometheus ({{ $labels.instance }}) connection failed"
      description: "Connection to Prometheus {{ $labels.instance }} failed!"

Blackbox monitoring

groups:
- name: Blackbox
  rules:
  - alert: BlackboxProbeFailed
    expr: probe_success == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: "Blackbox probe ({{ $labels.instance }}) failed"
      description: "Blackbox probe of {{ $labels.instance }} failed!"
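probe_success only says whether a probe passed. The blackbox exporter also exposes probe_duration_seconds and, for TLS targets, probe_ssl_earliest_cert_expiry, which allow rules like the sketch below. The group name, alert names, and thresholds (1 second, 30 days) are assumptions to tune per environment:

```yaml
groups:
- name: BlackboxExtra                   # hypothetical group name
  rules:
  - alert: BlackboxSlowProbe
    expr: probe_duration_seconds > 1    # assumed threshold: probe took longer than 1s
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Blackbox probe ({{ $labels.instance }}) is slow"
      description: "Probe of {{ $labels.instance }} took {{ $value | humanizeDuration }}."
  - alert: BlackboxCertExpiringSoon
    # Seconds until the earliest certificate in the chain expires; 30 days is an assumed threshold.
    expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: "TLS certificate for {{ $labels.instance }} expires soon"
      description: "Certificate for {{ $labels.instance }} expires in {{ $value | humanizeDuration }}."
```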

Linux hosts (Node)

groups:
- name: Node
  rules:
  - alert: HostConnectionFailed
    expr: up{type="node"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: "Host ({{ $labels.instance }}) connection failed"
      description: "Connection to host {{ $labels.instance }} failed!"
  - alert: HostCpuUsageHigh
    # Average each non-idle mode across cores, then sum the modes: the busy fraction of the CPU.
    expr: sum by (instance) (avg by (mode, instance) (rate(node_cpu_seconds_total{mode!="idle"}[2m]))) > 0.8
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: "Host ({{ $labels.instance }}) CPU usage is high"
      description: "CPU usage on host {{ $labels.instance }} is above 80%!"
  - alert: HostOutOfMemory
    expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 20
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Host ({{ $labels.instance }}) is low on memory"
      description: "Host {{ $labels.instance }} has less than 20% memory available!"
  - alert: HostOutOfDiskSpace
    # Ignore read-only filesystems, which cannot be cleaned up anyway.
    expr: (node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes < 20 and ON (instance, device, mountpoint) node_filesystem_readonly == 0
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Host ({{ $labels.instance }}) is low on disk space"
      description: "Host {{ $labels.instance }} has less than 20% disk space left!"
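Two variations on the host rules that may be worth considering, offered as a sketch rather than a drop-in replacement. The CPU rule derives usage from the idle mode and waits 5 minutes so that short spikes do not fire it. The disk rule skips pseudo filesystems and names the mountpoint in the message. The group name, alert names, the 5m window, and the fstype exclusion list are all assumptions:

```yaml
groups:
- name: NodeVariants                    # hypothetical group name
  rules:
  - alert: HostCpuUsageHighSustained
    # 100 minus the average idle percentage across all cores of an instance.
    expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 80
    for: 5m                             # assumed: require 5 minutes of sustained usage
    labels:
      severity: warning
    annotations:
      summary: "Host ({{ $labels.instance }}) CPU usage is high"
      description: "CPU usage on {{ $labels.instance }} is {{ $value | humanize }}%."
  - alert: HostDiskSpaceLowOnMountpoint
    # Exclude tmpfs/overlay (assumed list); keep only writable filesystems.
    expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} * 100) / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"} < 20 and on (instance, device, mountpoint) node_filesystem_readonly == 0
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Host ({{ $labels.instance }}) is low on disk space"
      description: "{{ $labels.mountpoint }} on {{ $labels.instance }} has {{ $value | humanize }}% space left."
```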

Docker

groups:
- name: Docker
  rules:
  - alert: DockerConnectionFailed
    expr: up{type="docker"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: "Docker ({{ $labels.instance }}) connection failed"
      description: "Connection to Docker {{ $labels.instance }} failed!"

MySQL

groups:
- name: MySQL
  rules:
  - alert: MySQLConnectionFailed
    expr: up{type="mysql"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: "MySQL ({{ $labels.instance }}) connection failed"
      description: "Connection to MySQL {{ $labels.instance }} failed!"
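Note that up reports whether Prometheus could scrape the exporter, not whether the database behind it is reachable. Assuming the metrics come from mysqld_exporter, its mysql_up gauge covers the second case. A sketch with a hypothetical group name and alert name:

```yaml
groups:
- name: MySQLAvailability               # hypothetical group name
  rules:
  - alert: MySQLServerDown
    # mysql_up is 0 when the exporter is running but cannot reach the MySQL server.
    expr: mysql_up{type="mysql"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: "MySQL server behind ({{ $labels.instance }}) is down"
      description: "The exporter at {{ $labels.instance }} is reachable but cannot connect to MySQL."
```

Other exporters follow a similar convention (redis_exporter exposes redis_up, for example), so the same pattern may apply to the sections below, depending on which exporter is deployed.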

MongoDB

groups:
- name: MongoDB
  rules:
  - alert: MongoDBConnectionFailed
    expr: up{type="mongodb"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: "MongoDB ({{ $labels.instance }}) connection failed"
      description: "Connection to MongoDB {{ $labels.instance }} failed!"

Redis

groups:
- name: Redis
  rules:
  - alert: RedisConnectionFailed
    expr: up{type="redis"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: "Redis ({{ $labels.instance }}) connection failed"
      description: "Connection to Redis {{ $labels.instance }} failed!"

RabbitMQ

groups:
- name: RabbitMQ
  rules:
  - alert: RabbitMQConnectionFailed
    expr: up{type="rabbitmq"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: "RabbitMQ ({{ $labels.instance }}) connection failed"
      description: "Connection to RabbitMQ {{ $labels.instance }} failed!"

ElasticSearch

groups:
- name: ElasticSearch
  rules:
  - alert: ElasticSearchConnectionFailed
    expr: up{type="elasticsearch"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: "ElasticSearch ({{ $labels.instance }}) connection failed"
      description: "Connection to ElasticSearch {{ $labels.instance }} failed!"
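A cluster can be reachable and still unhealthy. Assuming the metrics come from the prometheus-community elasticsearch_exporter, cluster health is exposed as elasticsearch_cluster_health_status with one series per color, which allows a rule along these lines. The group name, alert name, and 1m delay are assumptions:

```yaml
groups:
- name: ElasticSearchHealth             # hypothetical group name
  rules:
  - alert: ElasticSearchClusterRed
    # The series for a given color is 1 while the cluster is in that state.
    expr: elasticsearch_cluster_health_status{color="red"} == 1
    for: 1m
    labels:
      severity: emergency
    annotations:
      summary: "ElasticSearch cluster {{ $labels.cluster }} is red"
      description: "Cluster {{ $labels.cluster }} (seen via {{ $labels.instance }}) reports red health status."
```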

Zookeeper

groups:
- name: Zookeeper
  rules:
  - alert: ZookeeperConnectionFailed
    expr: up{type="zookeeper"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: "Zookeeper ({{ $labels.instance }}) connection failed"
      description: "Connection to Zookeeper {{ $labels.instance }} failed!"

Article link: https://hty1024.com/archives/prometheus-jian-kong-fang-an-xue-xi-bi-ji--shi-yi-prometheus-chang-yong-gao-jing-gui-ze-zheng-li