Hands-On: Building a Monitoring System with Prometheus

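Preface: Prometheus has become the de facto standard for monitoring. This article walks through building a Prometheus monitoring system, following the material in two reference books, 《Prometheus監控技術與實踐》 and 《深入淺出Prometheus》.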

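Prometheus is an open-source monitoring and alerting system built on a time-series database. Any account of Prometheus has to start with SoundCloud, an online music-sharing platform that is roughly a YouTube for audio. As SoundCloud moved further into microservices, it ended up running hundreds or thousands of services. The traditional monitoring tools it used, StatsD and Graphite, had many limitations, so in 2012 the company began building an entirely new monitoring system. One of Prometheus's original authors is Matt T. Proud, who also joined SoundCloud in 2012. Before that, Matt had worked at Google. He drew on Google's cluster manager Borg and its monitoring system Borgmon to build the open-source Prometheus, which, like many Google projects, is written in Go.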

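Naturally, as a monitoring solution for microservice architectures, Prometheus is closely tied to containers. On August 9, 2006, Eric Schmidt first introduced the concept of cloud computing at a search engine conference, and cloud computing grew rapidly over the following decade. In 2013, Matt Stine of Pivotal introduced the concept of cloud native. Cloud native combines microservice architecture, DevOps, and agile infrastructure built on containers, and it helps companies deliver software quickly, continuously, reliably and at scale. To unify cloud computing interfaces and related standards, the Cloud Native Computing Foundation (CNCF), part of the Linux Foundation, was founded in July 2015. The first project to join the CNCF was Google's Kubernetes, and Prometheus was the second (in 2016).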

Prometheus is now widely used to monitor Kubernetes clusters. If you are interested in its history, see the talk The History of Prometheus at SoundCloud, given by SoundCloud engineer Tobias Schmidt at PromCon 2016.

Reference books: 《Prometheus監控技術與實踐》 and 《深入淺出Prometheus》
Links: https://pan.baidu.com/s/1qmdi... extraction code: n82k
https://pan.baidu.com/s/1pM7_... extraction code: 6eiw

1. Prometheus Overview

SoundCloud's official blog has a post, Prometheus: Monitoring at SoundCloud, that explains why the company needed to build a new monitoring system. According to the post, the system had to meet these four requirements:

A multi-dimensional data model, so that data can be sliced and diced at will, along dimensions like instance, service, endpoint, and method.
Operational simplicity, so that you can spin up a monitoring server where and when you want, even on your local workstation, without setting up a distributed storage backend or reconfiguring the world.
Scalable data collection and decentralized architecture, so that you can reliably monitor the many instances of your services, and independent teams can set up independent monitoring servers.
Finally, a powerful query language that leverages the data model for meaningful alerting (including easy silencing) and graphing (for dashboards and for ad-hoc exploration).

In short, the four features are:

A multi-dimensional data model
Easy deployment and maintenance
Flexible data collection
A powerful query language

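A multi-dimensional data model and a powerful query language are exactly what a time-series database requires. So Prometheus is not only a monitoring system but also a time-series database. Why doesn't Prometheus just use an existing time-series database as its storage backend? Because SoundCloud wanted more than time-series features: it also wanted a system that was very easy to deploy and maintain. The popular time-series databases (see the appendix) either have too many components or heavy external dependencies. Druid, for example, is made up of Historical, MiddleManager, Broker, Coordinator, Overlord and Router components. It also depends on ZooKeeper, deep storage (HDFS, S3, etc.) and a metadata store (PostgreSQL or MySQL), which makes it expensive to deploy and maintain. Prometheus, by contrast, has a decentralized architecture. Each server runs on its own and needs no external distributed storage, so you can have a working monitoring system up in a few minutes.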

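Prometheus also collects data flexibly. To collect metrics from a target, you typically install a collection component on or next to it, called an exporter. The exporter gathers the target's metrics and exposes them on an HTTP endpoint, which Prometheus then scrapes. In other words, Prometheus pulls data, unlike the traditional push model. Prometheus can still support push: you push data to the Pushgateway, and Prometheus pulls it from there. Existing exporters already cover most third-party systems, such as Docker, HAProxy, StatsD and JMX, and the official website maintains a list of exporters.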
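To make the pull model concrete, here is a minimal sketch of an exporter written with only the Python standard library. It is not an official client library. The metric name `demo_scrapes_total`, the helpers `serve` and `scrape`, and the default port 8000 are all hypothetical. The server exposes one counter on `/metrics` in the Prometheus text format. `scrape` performs a single HTTP GET on that endpoint, which is what Prometheus does on every scrape interval.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical example metric: a counter of how often /metrics was requested.
_lock = threading.Lock()
_scrapes_total = 0  # a counter: it only ever goes up


def render_metrics(value: int) -> str:
    """Render one counter in the Prometheus text exposition format."""
    return (
        "# HELP demo_scrapes_total Number of times /metrics was requested.\n"
        "# TYPE demo_scrapes_total counter\n"
        f"demo_scrapes_total {value}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        global _scrapes_total
        if self.path != "/metrics":
            self.send_error(404)
            return
        with _lock:
            _scrapes_total += 1
            value = _scrapes_total
        body = render_metrics(value).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence the default per-request logging


def serve(port: int = 8000) -> HTTPServer:
    """Start the exporter in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def scrape(port: int, host: str = "127.0.0.1") -> str:
    """Mimic one Prometheus scrape: a plain HTTP GET on /metrics."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

To let a real Prometheus server scrape this sketch, call `serve(8000)` from a long-running process and add `localhost:8000` to a `static_configs` target list.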

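Beyond these four core features, Prometheus has added more and more advanced features as it has matured. These include service discovery, richer graphing, external storage, powerful alerting rules and many notification channels. The figure below shows the overall architecture of Prometheus (image source):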

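As the figure shows, the Prometheus ecosystem has several key components: the Prometheus server, the Pushgateway, Alertmanager, the Web UI and so on. Most of these components are optional. The core component is the Prometheus server, which scrapes and stores metric data, answers queries, and generates alerts. Next, we will install the Prometheus server.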

2. Installing the Prometheus Server

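Prometheus can be installed in many ways, including Docker, Ansible, Chef, Puppet and SaltStack. This section covers the two simplest. The first is the precompiled binary, which works out of the box. The second is the Docker image. For more installation methods, see here.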

2.1 Using the Prebuilt Binary

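First, find the latest release and its download URL on the download page of the official website. At the time of writing, the latest version is 2.4.3 (October 2018). Run the following commands to download and extract it: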

```shell
$ wget https://github.com/prometheus/prometheus/releases/download/v2.4.3/prometheus-2.4.3.linux-amd64.tar.gz
$ tar xvfz prometheus-2.4.3.linux-amd64.tar.gz
```

Then change into the extracted directory and check the Prometheus version:

```shell
$ cd prometheus-2.4.3.linux-amd64
$ ./prometheus --version
prometheus, version 2.4.3 (branch: HEAD, revision: 167a4b4e73a8eca8df648d2d2043e21bdb9a7449)
  build user:       root@1e42b46043e9
  build date:       20181004-08:42:02
  go version:       go1.11.1
```

Start the Prometheus server:

```shell
$ ./prometheus --config.file=prometheus.yml
```

2.2 Using the Docker Image

Installing Prometheus with Docker is even simpler. Just run:

```shell
$ sudo docker run -d -p 9090:9090 prom/prometheus
```

In practice, we usually also specify where the configuration file lives:

```shell
$ sudo docker run -d -p 9090:9090 \
    -v ~/docker/prometheus/:/etc/prometheus/ \
    prom/prometheus
```

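We keep the configuration file locally at ~/docker/prometheus/prometheus.yml so it is easy to view and edit. The -v option mounts that local directory at /etc/prometheus/, the directory from which Prometheus loads its configuration by default inside the container. If you are not sure where the default configuration file is, first run the command above without -v. Then use the docker inspect command to see the default arguments the container runs with (the Args field below):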

```shell
$ sudo docker inspect 0c
[...]
        "Id": "0c4c2d0eed938395bcecf1e8bb4b6b87091fc4e6385ce5b404b6bb7419010f46",
        "Created": "2018-10-15T22:27:34.56050369Z",
        "Path": "/bin/prometheus",
        "Args": [
            "--config.file=/etc/prometheus/prometheus.yml",
            "--storage.tsdb.path=/prometheus",
            "--web.console.libraries=/usr/share/prometheus/console_libraries",
            "--web.console.templates=/usr/share/prometheus/consoles"
        ],
[...]
```

2.3 Configuring Prometheus

As the previous two sections showed, Prometheus reads a configuration file specified with the --config.file flag, and the file is in YAML format. Here is the default prometheus.yml:

```shell
/etc/prometheus $ cat prometheus.yml
```

```yaml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']
```

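The default configuration file has four main blocks:

global: global Prometheus settings. For example, scrape_interval sets how often Prometheus scrapes its targets, and evaluation_interval sets how often rules, including alerting rules, are evaluated;
alerting: Alertmanager settings, which we will look at later;
rule_files: files containing rules such as alerting rules, which we will also look at later;
scrape_configs: the targets Prometheus should scrape. A job named prometheus is configured by default because Prometheus exposes its own metrics over HTTP when it starts, so it effectively monitors itself. This is not very useful in a real deployment, but it is a good example for learning Prometheus. Visit http://localhost:9090/metrics to see which metrics Prometheus exposes;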
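The `global` values are Prometheus durations: an integer followed by a unit such as `ms`, `s`, `m`, `h`, `d`, `w` or `y`. The sketch below converts them to seconds, estimates how many samples one time series collects per day at a given `scrape_interval`, and checks that `scrape_timeout` is not longer than `scrape_interval`, a combination Prometheus rejects. The function names are hypothetical. The sketch also assumes the single-unit form used in the default file, such as `15s`, and does not parse combined forms.

```python
import re

# Milliseconds per unit for the single-unit duration form, e.g. "15s" or "1m".
_MS_PER_UNIT = {
    "ms": 1,
    "s": 1_000,
    "m": 60_000,
    "h": 3_600_000,
    "d": 86_400_000,
    "w": 604_800_000,
    "y": 31_536_000_000,  # a year counted as 365 days
}
_DURATION = re.compile(r"^(\d+)(ms|s|m|h|d|w|y)$")


def parse_duration(text: str) -> float:
    """Convert a duration string such as '15s' to seconds."""
    match = _DURATION.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _MS_PER_UNIT[unit] / 1000


def samples_per_day(scrape_interval: str) -> int:
    """How many samples a single time series collects in 24 hours."""
    return int(parse_duration("1d") // parse_duration(scrape_interval))


def timeout_ok(scrape_interval: str, scrape_timeout: str) -> bool:
    """A scrape_timeout longer than scrape_interval is rejected by Prometheus."""
    return parse_duration(scrape_timeout) <= parse_duration(scrape_interval)
```

At the default interval of 15s, each series collects 86400 / 15 = 5760 samples per day. Because that number scales with every series you scrape, scrape_interval is one of the main levers on storage size.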

For more configuration options, see here.

3. Learning PromQL

With Prometheus installed, we can start exploring it. Prometheus provides a web UI for this. Visit http://localhost:9090/, which redirects to the Graph page by default:

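The first time you open this page, you may not know where to start, so look at the other menus first. Alerts shows all defined alerting rules. Status shows various kinds of status information: Runtime & Build Information, Command-Line Flags, Configuration, Rules, Targets, Service Discovery and more.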

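The Graph page is in fact the most powerful feature of Prometheus. Here you can query monitoring data with a special expression language called PromQL (Prometheus Query Language). You can run PromQL queries on the Graph page and also through the Prometheus HTTP API. Results can be shown as a list or as a graph, which correspond to the Console and Graph tabs in the figure above.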
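The HTTP API mentioned above serves instant queries at `/api/v1/query`. The PromQL expression goes in the `query` parameter, and an optional `time` parameter sets the evaluation time. A successful response is JSON. For a plain selector, `data.resultType` is `vector`, and each result element has a `metric` label set and a `value` pair of the form `[timestamp, "value as a string"]`. The sketch below only builds the URL and parses such a response, and it leaves out the HTTP call itself. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str, at=None) -> str:
    """Build a GET URL for the instant-query endpoint /api/v1/query."""
    params = {"query": promql}
    if at is not None:
        params["time"] = str(at)  # evaluation time; the server uses "now" if omitted
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


def parse_vector(payload: str):
    """Return [(labels, value), ...] from an instant-vector JSON response."""
    doc = json.loads(payload)
    if doc.get("status") != "success":
        raise RuntimeError(doc.get("error", "query failed"))
    data = doc["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each value is [unix_timestamp, "value as string"]; convert the string to float.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]
```

In practice, you would fetch the URL with any HTTP client, for example `urllib.request.urlopen(instant_query_url("http://localhost:9090", "promhttp_metric_handler_requests_total"))`, and pass the response body to `parse_vector`.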

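As mentioned above, Prometheus exposes many metrics about itself, and you can query them on the Graph page as well. Open the drop-down next to the Execute button to see a list of metric names, and pick one, for example promhttp_metric_handler_requests_total. This metric counts requests to the /metrics endpoint, which is the endpoint Prometheus scrapes to collect its own metrics. In the Console tab, the result looks like this: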

In the configuration file, scrape_interval is 15s, so Prometheus requests /metrics every 15 seconds. If you refresh the page after 15 seconds, you will see that the value has increased. The increase is even easier to see in the Graph tab:
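Because promhttp_metric_handler_requests_total is a counter, its raw value is rarely interesting. What matters is how fast it grows, which in PromQL is usually expressed as `rate(promhttp_metric_handler_requests_total[1m])`. The sketch below is a deliberately simplified version of that idea, and the function name and input shape (a list of `(timestamp, value)` samples) are assumptions. It adds up the increases between consecutive samples, treats any drop as a counter reset, and divides by the elapsed time. The real `rate()` also extrapolates to the edges of the range window, so its results will differ slightly.

```python
def simple_rate(samples):
    """Per-second increase of a counter over (timestamp_seconds, value) samples.

    Simplified: unlike PromQL's rate(), it does not extrapolate to the window edges.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so the whole current
        # value counts as new increase since the reset.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed
```

For a counter that goes up by one on every 15-second scrape, this returns 1/15, or about 0.067 per second.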

3.1 Data Model

To learn PromQL, we first need to understand the Prometheus data model. A data point in Prometheus consists of a metric name and N labels (N >= 0), for example:

```
promhttp_metric_handler_requests_total{code="200",instance="192.168.0.107:9090",job="prometheus"} 106
```

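This data point has the metric name promhttp_metric_handler_requests_total, three labels (code, instance and job) and the value 106. As mentioned, Prometheus is a time-series database: data points with the same metric name and the same set of labels form one time series. To map this onto a traditional database, think of the metric name as the table name, the labels as columns, and the timestamp as the primary key, plus one float64 column that holds the value. All sample values in Prometheus are stored as float64.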
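To tie the table analogy to the sample line above, the sketch below splits a text-format sample into its metric name, label set and value. It then builds a key that identifies the time series: the name plus the labels sorted by label name, so label order does not matter. The function names are hypothetical. The parser assumes simple label values with no escaped quotes and no trailing timestamp, both of which the real text format allows.

```python
import re

# Shape: name{labels} value. Labels are optional; escapes and timestamps are not handled.
_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$")
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line: str):
    """Split one text-format sample into (metric name, labels, value)."""
    match = _SAMPLE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, label_text, value = match.groups()
    labels = dict(_LABEL.findall(label_text or ""))
    return name, labels, float(value)  # every sample value is a float64


def series_key(name: str, labels: dict) -> tuple:
    """Same name and same label set means the same time series, in any order."""
    return (name,) + tuple(sorted(labels.items()))
```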

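This data model is fairly similar to the one used by OpenTSDB. See the official documentation, Data model, for details. The official site also gives guidance on naming metrics and labels; see Metric and label naming.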

Every value stored in Prometheus is a single float64 number, but metrics can still be divided by type into four categories:

Counter
Gauge
Histogram
Summary

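Counter is used for counting things such as requests, completed tasks or errors. Its value only goes up, and it resets to zero only when the process restarts. Gauge is an ordinary value that can go up or down, such as temperature or memory usage. Histogram samples observations such as request durations or response sizes. It counts them in configurable buckets and also provides the count and sum of all observations. Summary is very similar to Histogram and also tracks the size of events, but it additionally computes quantiles over the observations. For example, the 0.95 quantile is the value that 95% of the observed samples are less than or equal to. See the official documentation, Metric types, for more. Summary and Histogram are the more advanced types and are easy to confuse, so also see Histograms and summaries.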
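The difference between the two is easiest to see in code. The sketch below is a toy, not the client-library implementation, and the class names, metric name and bucket bounds are hypothetical. The histogram keeps cumulative buckets labelled `le` ("less than or equal") plus `_sum` and `_count`, which is how histograms appear in the text format. The summary keeps every observation and returns an exact nearest-rank quantile. Real client libraries that support quantiles typically use streaming approximations so that memory stays bounded.

```python
import math


class Histogram:
    """Toy histogram with cumulative buckets, exposed like the text format."""

    def __init__(self, name, bounds):
        self.name = name
        self.bounds = sorted(bounds) + [math.inf]  # the +Inf bucket is always present
        self.bucket_counts = [0] * len(self.bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        # Cumulative: a value is counted in every bucket whose upper bound (le) >= value.
        for i, bound in enumerate(self.bounds):
            if value <= bound:
                self.bucket_counts[i] += 1
        self.sum += value
        self.count += 1

    def expose(self):
        lines = [f"# TYPE {self.name} histogram"]
        for bound, n in zip(self.bounds, self.bucket_counts):
            le = "+Inf" if math.isinf(bound) else f"{bound:g}"
            lines.append(f'{self.name}_bucket{{le="{le}"}} {n}')
        lines.append(f"{self.name}_sum {self.sum}")
        lines.append(f"{self.name}_count {self.count}")
        return "\n".join(lines)


class NaiveSummary:
    """Toy summary: keeps all observations and reports exact quantiles."""

    def __init__(self):
        self.observations = []

    def observe(self, value):
        self.observations.append(value)

    def quantile(self, q):
        """Nearest-rank quantile: at least a fraction q of observations are <= result."""
        if not 0 <= q <= 1:
            raise ValueError("q must be in [0, 1]")
        if not self.observations:
            return math.nan
        ordered = sorted(self.observations)
        rank = max(1, math.ceil(q * len(ordered)))
        return ordered[rank - 1]
```

Because histogram buckets are cumulative counters, Prometheus can estimate quantiles from them on the server side with `histogram_quantile()`. A summary's quantiles are computed in the client and cannot be meaningfully aggregated across instances.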

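These four types are distinguished only on the side that provides the metrics, that is, the exporter described above. If you write your own exporter, or use the Prometheus client libraries to expose metrics from an existing system, you need to decide which type each metric should be. If you only use ready-made exporters and query their metrics in Prometheus, you can mostly ignore this. Understanding the metric types still helps you write correct and sensible PromQL.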