Kubernetes (k8s) health checks: livenessProbe and readinessProbe

2023-06-01 15:00:36

1. System environment

This article is based on Kubernetes 1.21.9 running on CentOS 7.4.

Server OS version | Docker version | Kubernetes (k8s) cluster version | CPU architecture
CentOS Linux release 7.4.1708 (Core) | Docker version 20.10.12 | v1.21.9 | x86_64

Kubernetes cluster architecture: k8scloude1 is the master node; k8scloude2 and k8scloude3 are worker nodes.

Server / IP | OS version | CPU architecture | Processes | Role
k8scloude1/192.168.110.130 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker,kube-apiserver,etcd,kube-scheduler,kube-controller-manager,kubelet,kube-proxy,coredns,calico | k8s master node
k8scloude2/192.168.110.129 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker,kubelet,kube-proxy,calico | k8s worker node
k8scloude3/192.168.110.128 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker,kubelet,kube-proxy,calico | k8s worker node

2. Introduction

In Kubernetes, keeping applications highly available and stable is critical. To that end, Kubernetes provides mechanisms for monitoring container state and automatically restarting or removing unhealthy containers. Liveness probes and readiness probes are two of them.

This article introduces livenessProbe and readinessProbe in Kubernetes and walks through examples of how to use them.

Using liveness and readiness probes assumes you already have a working Kubernetes cluster. For how to install and deploy a Kubernetes (k8s) cluster, see the blog post 《Centos7 安裝部署Kubernetes(k8s)叢集》 https://www.cnblogs.com/renshengdezheli/p/16686769.html.

3. Overview of Kubernetes health checks

Kubernetes supports three kinds of health-check probes: livenessProbe, readinessProbe, and startupProbe. These probes periodically check whether the service inside a container is healthy.

  • livenessProbe: checks whether the container is still running properly. If the service inside the container stops responding, Kubernetes marks it Unhealthy and the kubelet restarts the container to recover (the Pod itself is not recreated; the RESTARTS counter increases while the Pod keeps its name and IP). Probe methods: exec command, httpGet, tcpSocket.
  • readinessProbe: checks whether the container is ready to receive traffic. While the container is not ready, Kubernetes marks it Not Ready and removes it from the Service endpoints. Nothing is restarted; user requests are simply no longer forwarded to this Pod (this requires a Service). Probe methods: exec command, httpGet, tcpSocket.
  • startupProbe: checks whether the application inside the container has finished starting. It only runs during startup; once it succeeds it does not run again and the liveness/readiness probes take over (a minimal sketch follows this list).
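The examples in this article only use livenessProbe and readinessProbe. For reference, a minimal sketch of a startupProbe might look like the following; the Pod name, image, and timing values here are illustrative assumptions, not something deployed on the cluster above:

apiVersion: v1
kind: Pod
metadata:
  #hypothetical name, not used elsewhere in this article
  name: startup-demo
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
    startupProbe:
      httpGet:
        path: /index.html
        port: 80
      #give the application up to 30 * 10 = 300 seconds to start;
      #liveness and readiness probes only begin once this probe has succeeded
      failureThreshold: 30
      periodSeconds: 10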

In this article we focus on livenessProbe and readinessProbe.

4. Creating a Pod without any probes

Create a directory for the YAML files and a namespace:

[root@k8scloude1 ~]# mkdir probe

[root@k8scloude1 ~]# kubectl create ns probe
namespace/probe created

[root@k8scloude1 ~]# kubens probe
Context "kubernetes-admin@kubernetes" modified.
Active namespace is "probe".
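kubens is a small third-party helper for switching the active namespace. If it is not installed, the same switch can be done with plain kubectl; this is just an equivalent sketch, not part of the original session:

# equivalent to "kubens probe": make "probe" the default namespace for the current context
kubectl config set-context --current --namespace=probe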

There are no Pods yet:

[root@k8scloude1 ~]# cd probe/

[root@k8scloude1 probe]# pwd
/root/probe

[root@k8scloude1 probe]# kubectl get pod
No resources found in probe namespace.

First create an ordinary Pod named liveness-exec, running a container from the busybox image. The container executes the command in args: touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 6000

[root@k8scloude1 probe]# vim pod.yaml
[root@k8scloude1 probe]# cat pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  #terminationGracePeriodSeconds: 0 means the container is shut down immediately when it receives a termination signal, without being given time to finish in-flight work
  terminationGracePeriodSeconds: 0
  containers:
  - name: liveness
    image: busybox
    imagePullPolicy: IfNotPresent
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 6000

#First create an ordinary Pod
[root@k8scloude1 probe]# kubectl apply -f pod.yaml 
pod/liveness-exec created

Check the Pod:

[root@k8scloude1 probe]# kubectl get pod -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
liveness-exec   1/1     Running   0          6s    10.244.112.176   k8scloude2   <none>           <none>

Look at the /tmp directory inside the Pod:

[root@k8scloude1 probe]# kubectl exec -it liveness-exec -- ls /tmp

After the Pod has been running for 30 seconds, /tmp/healthy is deleted and the Pod keeps running for another 6000 seconds. The idea is that the Pod is healthy while /tmp/healthy exists and unhealthy once it does not, but since there is no probe yet, the Pod simply stays in the Running state.

[root@k8scloude1 probe]# kubectl exec -it liveness-exec -- ls /tmp

[root@k8scloude1 probe]# kubectl get pod -o wide
NAME            READY   STATUS    RESTARTS   AGE     IP               NODE         NOMINATED NODE   READINESS GATES
liveness-exec   1/1     Running   0          3m29s   10.244.112.176   k8scloude2   <none>           <none>

Delete the Pod and add a probe:

[root@k8scloude1 probe]# kubectl delete -f pod.yaml 
pod "liveness-exec" deleted

[root@k8scloude1 probe]# kubectl get pod -o wide
No resources found in probe namespace.

5. Adding a livenessProbe

5.1 livenessProbe using the exec command method

Create a Pod with a livenessProbe.

The Pod is named liveness-exec and runs a container from the busybox image. The container executes the command in args: touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600.

The Pod also defines a livenessProbe. The probe uses exec to check whether the file /tmp/healthy exists. If the file exists, Kubernetes considers the container healthy; otherwise Kubernetes restarts the container.

The liveness probe starts 5 seconds after the container starts and then runs every 5 seconds.

[root@k8scloude1 probe]# vim podprobe.yaml 

#Now add a health check using the exec command method
[root@k8scloude1 probe]# cat podprobe.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: liveness
    image: busybox
    imagePullPolicy: IfNotPresent
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      #no probing during the first 5 seconds after the container starts
      initialDelaySeconds: 5
      #probe every 5 seconds
      periodSeconds: 5      

[root@k8scloude1 probe]# kubectl apply -f podprobe.yaml 
pod/liveness-exec created

Watch the /tmp directory inside the Pod and the Pod status:

[root@k8scloude1 probe]# kubectl exec -it liveness-exec -- ls /tmp
healthy

[root@k8scloude1 probe]# kubectl get pod -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
liveness-exec   1/1     Running   0          18s   10.244.112.177   k8scloude2   <none>           <none>

[root@k8scloude1 probe]# kubectl exec -it liveness-exec -- ls /tmp
healthy

[root@k8scloude1 probe]# kubectl exec -it liveness-exec -- ls /tmp

[root@k8scloude1 probe]# kubectl get pod -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
liveness-exec   1/1     Running   0          36s   10.244.112.177   k8scloude2   <none>           <none>

[root@k8scloude1 probe]# kubectl get pod -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
liveness-exec   1/1     Running   0          43s   10.244.112.177   k8scloude2   <none>           <none>

[root@k8scloude1 probe]# kubectl get pod -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
liveness-exec   1/1     Running   1          50s   10.244.112.177   k8scloude2   <none>           <none>

With the probe in place, once /tmp/healthy no longer exists the livenessProbe fails and the container gets restarted. Without the grace period terminationGracePeriodSeconds: 0, the first restart typically shows up at around the 75-second mark.

[root@k8scloude1 probe]# kubectl get pod -o wide
NAME            READY   STATUS    RESTARTS   AGE     IP               NODE         NOMINATED NODE   READINESS GATES
liveness-exec   1/1     Running   3          2m58s   10.244.112.177   k8scloude2   <none>           <none>
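To see why the container keeps restarting, the probe failures are recorded as events. A quick way to inspect them (commands only, output omitted here) is:

# the Events section at the end lists warnings such as "Liveness probe failed"
kubectl describe pod liveness-exec

# or look at the namespace events directly
kubectl get ev | grep liveness-exec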

Delete the Pod:

[root@k8scloude1 probe]# kubectl delete -f podprobe.yaml 
pod "liveness-exec" deleted

[root@k8scloude1 probe]# kubectl get pod -o wide
No resources found in probe namespace.

5.2 livenessProbe using the httpGet method

This creates a Pod named liveness-httpget, running a container from the nginx image. The container defines an HTTP GET liveness probe that checks whether Nginx's default home page /index.html can be fetched successfully. If it cannot, Kubernetes considers the container unhealthy and restarts it.

The liveness probe starts 10 seconds after the container starts and then runs every 10 seconds. failureThreshold: 3 means the probe must fail 3 consecutive times before the container is considered unhealthy, successThreshold: 1 means a single success is enough to consider it healthy again, and timeoutSeconds: 10 sets the probe request timeout to 10 seconds.

[root@k8scloude1 probe]# vim podprobehttpget.yaml 

#The httpGet method
[root@k8scloude1 probe]# cat podprobehttpget.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-httpget
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /index.html
        port: 80
        scheme: HTTP
      #no probing during the first 10 seconds after the container starts
      initialDelaySeconds: 10
      #probe every 10 seconds
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 10

[root@k8scloude1 probe]# kubectl apply -f podprobehttpget.yaml 
pod/liveness-httpget created

[root@k8scloude1 probe]# kubectl get pod -o wide
NAME               READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
liveness-httpget   1/1     Running   0          6s    10.244.112.178   k8scloude2   <none>           <none>

Check the /usr/share/nginx/html/index.html file:

[root@k8scloude1 probe]# kubectl exec -it liveness-httpget -- ls /usr/share/nginx/html/index.html
/usr/share/nginx/html/index.html

[root@k8scloude1 probe]# kubectl get pod -o wide
NAME               READY   STATUS    RESTARTS   AGE    IP               NODE         NOMINATED NODE   READINESS GATES
liveness-httpget   1/1     Running   0          2m3s   10.244.112.178   k8scloude2   <none>           <none>

Delete the /usr/share/nginx/html/index.html file:

[root@k8scloude1 probe]# kubectl exec -it liveness-httpget -- rm /usr/share/nginx/html/index.html

[root@k8scloude1 probe]# kubectl exec -it liveness-httpget -- ls /usr/share/nginx/html/index.html
ls: cannot access '/usr/share/nginx/html/index.html': No such file or directory
command terminated with exit code 2

Watch the Pod status and the /usr/share/nginx/html/index.html file. The probe fetches /index.html over port 80; when the file can no longer be served, the livenessProbe fails and the container is restarted, which also restores index.html from the image.

[root@k8scloude1 probe]# kubectl get pod -o wide
NAME               READY   STATUS    RESTARTS   AGE     IP               NODE         NOMINATED NODE   READINESS GATES
liveness-httpget   1/1     Running   1          2m43s   10.244.112.178   k8scloude2   <none>           <none>

[root@k8scloude1 probe]# kubectl get pod -o wide
NAME               READY   STATUS    RESTARTS   AGE     IP               NODE         NOMINATED NODE   READINESS GATES
liveness-httpget   1/1     Running   1          2m46s   10.244.112.178   k8scloude2   <none>           <none>

[root@k8scloude1 probe]# kubectl exec -it liveness-httpget -- ls /usr/share/nginx/html/index.html
/usr/share/nginx/html/index.html

#The probe fetches /usr/share/nginx/html/index.html over port 80; when it cannot be served, the livenessProbe restarts the container, which restores the file from the image
[root@k8scloude1 probe]# kubectl exec -it liveness-httpget -- ls /usr/share/nginx/html/index.html
/usr/share/nginx/html/index.html
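To see roughly what the kubelet's httpGet probe sees, you can issue the same HTTP GET manually from a cluster node against the Pod IP shown above (adjust the IP to your own environment; this was not part of the original session):

# a 2xx/3xx status means the liveness check passes; once index.html is deleted
# nginx returns an error status and the probe starts failing
curl -I http://10.244.112.178/index.html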

Delete the Pod:

[root@k8scloude1 probe]# kubectl delete -f podprobehttpget.yaml 
pod "liveness-httpget" deleted

[root@k8scloude1 probe]# kubectl get pod -o wide
No resources found in probe namespace.

5.3 livenessProbe using the tcpSocket method

This creates a Pod named liveness-tcpsocket, running a container from the nginx image. The container defines a TCP socket liveness probe that checks whether a connection can be made to port 8080. If the connection fails, Kubernetes considers the container unhealthy and restarts it.

The liveness probe starts 10 seconds after the container starts and then runs every 10 seconds. failureThreshold: 3 means the probe must fail 3 consecutive times before the container is considered unhealthy, successThreshold: 1 means a single success is enough to consider it healthy again, and timeoutSeconds: 10 sets the probe request timeout to 10 seconds.

[root@k8scloude1 probe]# vim podprobetcpsocket.yaml 

#The tcpSocket method:
[root@k8scloude1 probe]# cat podprobetcpsocket.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-tcpsocket
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      tcpSocket:
        port: 8080
      #no probing during the first 10 seconds after the container starts
      initialDelaySeconds: 10
      #probe every 10 seconds
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 10

[root@k8scloude1 probe]# kubectl apply -f podprobetcpsocket.yaml 
pod/liveness-tcpsocket created

Watch the Pod status. nginx listens on port 80, but the probe checks port 8080, so the probe is bound to fail and the livenessProbe keeps restarting the container:

[root@k8scloude1 probe]# kubectl get pod -o wide
NAME                 READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
liveness-tcpsocket   1/1     Running   0          10s   10.244.112.179   k8scloude2   <none>           <none>

[root@k8scloude1 probe]# kubectl get pod -o wide
NAME                 READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
liveness-tcpsocket   1/1     Running   1          55s   10.244.112.179   k8scloude2   <none>           <none>

Delete the Pod:

[root@k8scloude1 probe]# kubectl delete -f podprobetcpsocket.yaml 
pod "liveness-tcpsocket" deleted

Next, add a readinessProbe.

6. readinessProbe

Because a readiness probe does not restart anything, it only stops user requests from being forwarded to the Pod. To demonstrate this, we create three Pods and a Service that forwards user requests to them.

TIP: in vim, you can check whether text is aligned in columns with :set cuc, and turn the column highlight off again with :set nocuc.

Create the Pod. The readinessProbe checks the file /tmp/healthy: if /tmp/healthy exists the Pod is considered ready, otherwise it is not. The lifecycle postStart hook creates /tmp/healthy right after the container starts.

[root@k8scloude1 probe]# vim podreadinessprobecommand.yaml 

[root@k8scloude1 probe]# cat podreadinessprobecommand.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: readiness
  name: readiness-exec
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: readiness
    image: nginx
    imagePullPolicy: IfNotPresent
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      #no probing during the first 5 seconds after the container starts
      initialDelaySeconds: 5
      #probe every 5 seconds
      periodSeconds: 5
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh","-c","touch /tmp/healthy"]

Create three Pods with different names:

[root@k8scloude1 probe]# kubectl apply -f podreadinessprobecommand.yaml 
pod/readiness-exec created

[root@k8scloude1 probe]# sed 's/readiness-exec/readiness-exec2/' podreadinessprobecommand.yaml | kubectl apply -f -
pod/readiness-exec2 created

[root@k8scloude1 probe]# sed 's/readiness-exec/readiness-exec3/' podreadinessprobecommand.yaml | kubectl apply -f -
pod/readiness-exec3 created

Check the Pods' labels:
[root@k8scloude1 probe]# kubectl get pod -o wide --show-labels
NAME              READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES   LABELS
readiness-exec    1/1     Running   0          23s   10.244.112.182   k8scloude2   <none>           <none>            test=readiness
readiness-exec2   1/1     Running   0          15s   10.244.251.236   k8scloude3   <none>           <none>            test=readiness
readiness-exec3   0/1     Running   0          9s    10.244.112.183   k8scloude2   <none>           <none>            test=readiness

All three Pods carry the same label:

[root@k8scloude1 probe]# kubectl get pod -o wide --show-labels
NAME              READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES   LABELS
readiness-exec    1/1     Running   0          26s   10.244.112.182   k8scloude2   <none>           <none>            test=readiness
readiness-exec2   1/1     Running   0          18s   10.244.251.236   k8scloude3   <none>           <none>            test=readiness
readiness-exec3   1/1     Running   0          12s   10.244.112.183   k8scloude2   <none>           <none>            test=readiness

To tell the three Pods apart, modify the nginx index file in each:

[root@k8scloude1 probe]# kubectl exec -it readiness-exec -- sh -c "echo 111 > /usr/share/nginx/html/index.html"

[root@k8scloude1 probe]# kubectl exec -it readiness-exec2 -- sh -c "echo 222 > /usr/share/nginx/html/index.html"

[root@k8scloude1 probe]# kubectl exec -it readiness-exec3 -- sh -c "echo 333 > /usr/share/nginx/html/index.html"

Create a Service that forwards user requests to these three Pods:

[root@k8scloude1 probe]# kubectl expose --name=svc1 pod readiness-exec --port=80
service/svc1 exposed

The test=readiness label, which is the Service's selector, matches all three Pods:

[root@k8scloude1 probe]# kubectl get svc -o wide
NAME   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE   SELECTOR
svc1   ClusterIP   10.101.38.121   <none>        80/TCP    23s   test=readiness

[root@k8scloude1 probe]# kubectl get pod --show-labels
NAME              READY   STATUS    RESTARTS   AGE     LABELS
readiness-exec    1/1     Running   0          7m14s   test=readiness
readiness-exec2   1/1     Running   0          7m6s    test=readiness
readiness-exec3   1/1     Running   0          7m      test=readiness

Access the Service; requests are distributed across all three Pods:

[root@k8scloude1 probe]# while true ; do curl -s 10.101.38.121 ; sleep 1 ; done
333
111
333
222
111
......

Delete the probe file from Pod readiness-exec2:

[root@k8scloude1 probe]# kubectl exec -it readiness-exec2 -- rm /tmp/healthy

Because the probe can no longer find /tmp/healthy, readiness-exec2's READY column drops to 0/1, but its STATUS stays Running and you can still exec into the readiness-exec2 Pod. Since a readinessProbe only stops user requests from being forwarded to the unhealthy Pod, the Pod is not deleted.

[root@k8scloude1 probe]# kubectl get pod --show-labels
NAME              READY   STATUS    RESTARTS   AGE   LABELS
readiness-exec    1/1     Running   0          10m   test=readiness
readiness-exec2   0/1     Running   0          10m   test=readiness
readiness-exec3   1/1     Running   0          10m   test=readiness

[root@k8scloude1 probe]# kubectl exec -it readiness-exec2 -- bash
root@readiness-exec2:/# exit
exit
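Another way to confirm this behavior (not shown in the original session) is to look at the Service's endpoints: while the readiness probe is failing, the Pod's IP is removed from the endpoint list, and it is added back once the probe succeeds again.

# while readiness-exec2 is not Ready, its IP (10.244.251.236 above) is missing from this list
kubectl get endpoints svc1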

kubectl get ev (list events) shows warnings such as "88s Warning Unhealthy pod/readiness-exec2 Readiness probe failed: cat: /tmp/healthy: No such file or directory":

[root@k8scloude1 probe]# kubectl get ev
LAST SEEN   TYPE      REASON      OBJECT                MESSAGE
......
32m         Normal    Pulled      pod/readiness-exec2   Container image "nginx" already present on machine
32m         Normal    Created     pod/readiness-exec2   Created container readiness
32m         Normal    Started     pod/readiness-exec2   Started container readiness
15m         Normal    Killing     pod/readiness-exec2   Stopping container readiness
13m         Normal    Scheduled   pod/readiness-exec2   Successfully assigned probe/readiness-exec2 to k8scloude3
13m         Normal    Pulled      pod/readiness-exec2   Container image "nginx" already present on machine
13m         Normal    Created     pod/readiness-exec2   Created container readiness
13m         Normal    Started     pod/readiness-exec2   Started container readiness
88s         Warning   Unhealthy   pod/readiness-exec2   Readiness probe failed: cat: /tmp/healthy: No such file or directory
32m         Normal    Scheduled   pod/readiness-exec3   Successfully assigned probe/readiness-exec3 to k8scloude3
32m         Normal    Pulled      pod/readiness-exec3   Container image "nginx" already present on machine
32m         Normal    Created     pod/readiness-exec3   Created container readiness
32m         Normal    Started     pod/readiness-exec3   Started container readiness
15m         Normal    Killing     pod/readiness-exec3   Stopping container readiness
13m         Normal    Scheduled   pod/readiness-exec3   Successfully assigned probe/readiness-exec3 to k8scloude2
13m         Normal    Pulled      pod/readiness-exec3   Container image "nginx" already present on machine
13m         Normal    Created     pod/readiness-exec3   Created container readiness
13m         Normal    Started     pod/readiness-exec3   Started container readiness

Access the Service again; requests are now only forwarded to 111 and 333, which shows the readiness probe is taking effect.

[root@k8scloude1 probe]# while true ; do curl -s 10.101.38.121 ; sleep 1 ; done
111
333
333
333
111
......
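To bring readiness-exec2 back into rotation, it is enough to recreate the file the probe looks for; after the next successful probe the Pod returns to 1/1 READY and the Service starts forwarding requests to it again. A quick sketch, not part of the original session:

# recreate the file checked by the readinessProbe
kubectl exec -it readiness-exec2 -- touch /tmp/healthy

# READY should return to 1/1 within one probe period (5 seconds here)
kubectl get pod readiness-exec2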

7. Summary

With this article you should now know how to use livenessProbe and readinessProbe to monitor the health of containers in Kubernetes. By periodically checking command exit codes, HTTP responses, and TCP connections, Kubernetes can automatically restart unhealthy containers or stop sending traffic to Pods that are not ready, improving the availability and stability of your applications.