
Prometheus Monitoring Study Notes (11): Common Prometheus Alerting Rules

huty · December 2, 2022

Configuring Alerting Rules

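Step 1: Edit Prometheus's prometheus.yml file and set the path of the alerting rule files under rule_files.
For example, with the configuration below, Prometheus loads every file in the /etc/prometheus/rules/ directory whose name matches *.yml as alerting rules: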

rule_files:
  - "/etc/prometheus/rules/*.yml" 
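The pattern above matches only files ending in .yml. As a minimal sketch, assuming some rule files in the same directory use the .yaml extension instead, a second pattern can be listed next to the first:

```yaml
rule_files:
  - "/etc/prometheus/rules/*.yml"
  # Assumption: some rule files are saved with the .yaml extension.
  - "/etc/prometheus/rules/*.yaml"
```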

Step 2: Edit the corresponding alerting rule file

Alerting Rule Template
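A minimal sketch of a rule file, using only the fields that appear in the rules later in this post. The group name, alert name, expression, and the file name /etc/prometheus/rules/example.yml are placeholders for illustration, not values from a real setup:

```yaml
# /etc/prometheus/rules/example.yml (hypothetical file name)
groups:
- name: ExampleGroup                # placeholder: name of the rule group
  rules:
  - alert: Example alert            # placeholder: name of the alert
    expr: up == 0                   # PromQL expression; the alert fires for every series it returns
    for: 30s                        # how long the condition must hold before the alert fires
    labels:
      severity: warning             # extra labels attached to the alert
    annotations:
      summary: Example ({{ $labels.instance }}) is down
      description: "Example {{ $labels.instance }} is down!"
```

The annotations are templated, so {{ $labels.instance }} is replaced with the instance label of the series that triggered the alert.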

Reference site:

Common Alerting Rules

Note: type="xxx" in the alerting rules below is a custom label; adjust it to match your own environment before use.
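One way such a custom label could be attached is through the labels field of a scrape target in prometheus.yml. The sketch below is an assumption about the setup, not the author's actual configuration; the job name and target address are placeholders:

```yaml
scrape_configs:
  - job_name: "node"                    # hypothetical job name
    static_configs:
      - targets: ["192.0.2.10:9100"]    # placeholder address of a node exporter
        labels:
          type: "node"                  # custom label matched by up{type="node"}
```

Every series scraped from that target, including up, then carries type="node", which is what the selectors in the rules below rely on.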

Prometheus Server

groups:
- name: Prometheus
  rules:
  - alert: Prometheus connection failed
    expr: up{type="prometheus"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: Prometheus ({{ $labels.instance }}) connection failed
      description: "Prometheus {{ $labels.instance }} connection failed!"

Blackbox Monitoring

groups:
- name: Blackbox
  rules:
  - alert: Blackbox probe failed
    expr: probe_success == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: Blackbox probe ({{ $labels.instance }}) failed
      description: "Blackbox probe of {{ $labels.instance }} failed!"
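For context, probe_success is produced when Prometheus scrapes the Blackbox exporter's /probe endpoint. The sketch below follows the commonly used relabeling pattern for that exporter; the module name http_2xx, the probed URL, and the exporter address 127.0.0.1:9115 are assumptions and need to match your own deployment:

```yaml
scrape_configs:
  - job_name: "blackbox"                # hypothetical job name
    metrics_path: /probe
    params:
      module: [http_2xx]                # assumed module defined in the exporter's config
    static_configs:
      - targets:
          - https://example.com         # placeholder: endpoint to probe
    relabel_configs:
      # Pass the listed target to the exporter as the ?target= parameter.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed endpoint as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to the Blackbox exporter (assumed address).
      - target_label: __address__
        replacement: 127.0.0.1:9115
```

With this relabeling, {{ $labels.instance }} in the rule above resolves to the probed endpoint rather than to the exporter itself.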

Linux Host (Node)

groups:
- name: Node
  rules:
  - alert: Host connection failed
    expr: up{type="node"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: Host ({{ $labels.instance }}) connection failed
      description: "Host {{ $labels.instance }} connection failed!"
  - alert: Host CPU load too high
    expr: sum by (instance) (avg by (mode, instance) (rate(node_cpu_seconds_total{mode!="idle"}[2m]))) > 0.8
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: Host ({{ $labels.instance }}) CPU load too high
      description: "Host {{ $labels.instance }} CPU load is above 80%!"
  - alert: Host out of memory
    expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 20
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Host ({{ $labels.instance }}) out of memory
      description: "Host {{ $labels.instance }} has less than 20% memory available!"
  - alert: Host out of disk space
    expr: (node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes < 20 and ON (instance, device, mountpoint) node_filesystem_readonly == 0
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Host ({{ $labels.instance }}) out of disk space
      description: "Host {{ $labels.instance }} has less than 20% disk space remaining!"
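The descriptions above hard-code the threshold. As a sketch of a possible variation (not part of the original rules), the annotation template can also print the measured value through $value, shown here for the memory rule:

```yaml
groups:
- name: Node
  rules:
  - alert: Host out of memory
    expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 20
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Host ({{ $labels.instance }}) out of memory
      # $value holds the result of the expression, here the percentage of memory still available.
      description: 'Host {{ $labels.instance }} has {{ printf "%.1f" $value }}% memory available (threshold: 20%).'
```

The description is single-quoted in YAML so that the double quotes inside the template do not need escaping.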

Docker

groups:
- name: Docker
  rules:
  - alert: Docker connection failed
    expr: up{type="docker"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: Docker ({{ $labels.instance }}) connection failed
      description: "Docker {{ $labels.instance }} connection failed!"

MySQL

groups:
- name: MySQL
  rules:
  - alert: MySQL connection failed
    expr: up{type="mysql"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: MySQL ({{ $labels.instance }}) connection failed
      description: "MySQL {{ $labels.instance }} connection failed!"

MongoDB

groups:
- name: MongoDB
  rules:
  - alert: MongoDB connection failed
    expr: up{type="mongodb"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: MongoDB ({{ $labels.instance }}) connection failed
      description: "MongoDB {{ $labels.instance }} connection failed!"

Redis

groups:
- name: Redis
  rules:
  - alert: Redis connection failed
    expr: up{type="redis"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: Redis ({{ $labels.instance }}) connection failed
      description: "Redis {{ $labels.instance }} connection failed!"

RabbitMQ

groups:
- name: RabbitMQ
  rules:
  - alert: RabbitMQ connection failed
    expr: up{type="rabbitmq"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: RabbitMQ ({{ $labels.instance }}) connection failed
      description: "RabbitMQ {{ $labels.instance }} connection failed!"

ElasticSearch

groups:
- name: ElasticSearch
  rules:
  - alert: ElasticSearch connection failed
    expr: up{type="elasticsearch"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: ElasticSearch ({{ $labels.instance }}) connection failed
      description: "ElasticSearch {{ $labels.instance }} connection failed!"

Zookeeper

groups:
- name: Zookeeper
  rules:
  - alert: Zookeeper connection failed
    expr: up{type="zookeeper"} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: Zookeeper ({{ $labels.instance }}) connection failed
      description: "Zookeeper {{ $labels.instance }} connection failed!"
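The connection-failure rules above differ only in the value of the type label. As a possible alternative (a sketch, not the author's configuration), a single rule could cover all of them, assuming every monitored target carries a non-empty type label; the group and alert names here are hypothetical:

```yaml
groups:
- name: TargetDown                      # hypothetical group name
  rules:
  - alert: Target connection failed     # hypothetical alert name
    # Assumption: every monitored target has the custom type label set.
    expr: up{type!=""} == 0
    for: 30s
    labels:
      severity: emergency
    annotations:
      summary: "{{ $labels.type }} ({{ $labels.instance }}) connection failed"
      description: "{{ $labels.type }} {{ $labels.instance }} connection failed!"
```

The trade-off is that per-service rules allow different for durations and severities per service, while the single rule is shorter and automatically covers newly added types.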
