Pod (8): Pod scheduling, assigning Pods to nodes

2022-11-06 21:03:04

1. System environment

Server OS version | Docker version | Kubernetes (k8s) cluster version | CPU architecture
CentOS Linux release 7.4.1708 (Core) | Docker version 20.10.12 | v1.21.9 | x86_64

Kubernetes cluster architecture: k8scloude1 is the master node; k8scloude2 and k8scloude3 are worker nodes.

Server | OS version | CPU architecture | Processes | Role
k8scloude1/192.168.110.130 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker, kube-apiserver, etcd, kube-scheduler, kube-controller-manager, kubelet, kube-proxy, coredns, calico | k8s master node
k8scloude2/192.168.110.129 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker, kubelet, kube-proxy, calico | k8s worker node
k8scloude3/192.168.110.128 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker, kubelet, kube-proxy, calico | k8s worker node

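2. Preface

This article covers Pod scheduling, that is, how to make a Pod run on a specific node of a Kubernetes cluster.

Pod scheduling presupposes a working Kubernetes cluster. For installing and deploying a Kubernetes (k8s) cluster, see the blog post "Installing and deploying a Kubernetes (k8s) cluster on CentOS 7": https://www.cnblogs.com/renshengdezheli/p/16686769.html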

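3. Pod scheduling

3.1 Overview of Pod scheduling

You can constrain a Pod so that it is restricted to particular nodes, or so that it prefers particular nodes. There are several ways to do this, and the recommended approaches all use label selectors to make the selection. Often such constraints are unnecessary, because the scheduler automatically makes a reasonable placement (for example, spreading Pods across nodes and not placing a Pod on a node with insufficient free resources). In some circumstances, however, you may want more control over which node a Pod is deployed to, for example to make sure a Pod ends up on a machine with an attached SSD, or to place Pods from two different services that communicate heavily in the same availability zone.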

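You can use any of the following methods to control how Kubernetes schedules a particular Pod:

  • nodeSelector matched against node labels
  • Affinity and anti-affinity
  • The nodeName field
  • Pod topology spread constraints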

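3.2 Automatic Pod scheduling

If you do not specify manually which node a Pod runs on, k8s schedules the Pod automatically. When choosing the node, it takes the following into account:

  • The list of Pods waiting to be scheduled
  • The list of available nodes
  • The scheduling algorithm: node filtering followed by node scoring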
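The two phases named above, filtering and scoring, can be illustrated with a small simulation. This is only a sketch under stated assumptions: the Node and Pod classes, the CPU figures, and the "most CPU left over" scoring rule are invented for illustration and are not the real kube-scheduler plugins.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu_millicores: int   # hypothetical free capacity
    schedulable: bool = True   # False models a node that takes no app Pods


@dataclass
class Pod:
    name: str
    cpu_request_millicores: int


def filter_nodes(pod, nodes):
    """Filtering phase: keep only nodes that are able to run the Pod at all."""
    return [
        n for n in nodes
        if n.schedulable and n.free_cpu_millicores >= pod.cpu_request_millicores
    ]


def score_node(pod, node):
    """Scoring phase (toy rule): prefer the node with the most CPU left over."""
    return node.free_cpu_millicores - pod.cpu_request_millicores


def schedule(pod, nodes):
    """Return the chosen node name, or None if the Pod would stay Pending."""
    candidates = filter_nodes(pod, nodes)
    if not candidates:
        return None
    return max(candidates, key=lambda n: score_node(pod, n)).name


nodes = [
    Node("k8scloude1", 4000, schedulable=False),  # master, excluded in this sketch
    Node("k8scloude2", 1500),
    Node("k8scloude3", 3000),
]
print(schedule(Pod("demo", 500), nodes))   # k8scloude3
print(schedule(Pod("huge", 8000), nodes))  # None
```

A Pod for which the filtering phase leaves no candidate is not rejected; it simply waits in the Pending state until some node passes the filter.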

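3.2.1 Creating three Pods that use host port 80

Look at the description of the hostPort field: hostPort maps a Pod port to the node, that is, it exposes the Pod's port on the node.

#Host port mapping: hostPort: 80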
[root@k8scloude1 pod]# kubectl explain pods.spec.containers.ports.hostPort
KIND:     Pod
VERSION:  v1

FIELD:    hostPort <integer>

DESCRIPTION:
     Number of port to expose on the host. If specified, this must be a valid
     port number, 0 < x < 65536. If HostNetwork is specified, this must match
     ContainerPort. Most containers do not need this.

Create the first Pod. hostPort: 80 maps port 80 of the container to port 80 of the node.

[root@k8scloude1 pod]# vim schedulepod.yaml

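#kind: Pod sets the resource type to Pod   labels sets the Pod labels   name under metadata sets the Pod name   everything under containers defines the containers
#image sets the image name   imagePullPolicy sets the image pull policy   name under containers sets the container name
#resources sets the container resources (CPU, memory, etc.)   env sets environment variables in the container   dnsPolicy sets the DNS policy
#restartPolicy is the container restart policy   ports sets the container ports   containerPort is the container port   hostPort is the port on the node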
[root@k8scloude1 pod]# cat schedulepod.yaml 
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod
  name: pod
  namespace: pod
spec:
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

[root@k8scloude1 pod]# kubectl apply -f schedulepod.yaml 
pod/pod created

[root@k8scloude1 pod]# kubectl get pods
NAME   READY   STATUS    RESTARTS   AGE
pod    1/1     Running   0          6s

The Pod was created successfully.

Next, create the second Pod. hostPort: 80 again maps container port 80 to node port 80; the two Pods differ only in their names.

[root@k8scloude1 pod]# cp schedulepod.yaml schedulepod1.yaml 

[root@k8scloude1 pod]# vim schedulepod1.yaml 

[root@k8scloude1 pod]# cat schedulepod1.yaml 
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

[root@k8scloude1 pod]# kubectl apply -f schedulepod1.yaml 
pod/pod1 created

[root@k8scloude1 pod]# kubectl get pods
NAME   READY   STATUS    RESTARTS   AGE
pod    1/1     Running   0          11m
pod1   1/1     Running   0          5s

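The second Pod was created successfully. Now create the third Pod.

As described at the start, k8scloude1 is the master node and k8scloude2 and k8scloude3 are worker nodes, so the k8s cluster has only two workers. The master node does not run application Pods by default, and host port 80 is already in use on both worker nodes, so pod2 cannot be scheduled.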

[root@k8scloude1 pod]# sed 's/pod1/pod2/' schedulepod1.yaml | kubectl apply -f -
pod/pod2 created

#Host port 80 is already in use on both worker nodes, so pod2 cannot be scheduled
[root@k8scloude1 pod]# kubectl get pods
NAME   READY   STATUS    RESTARTS   AGE
pod    1/1     Running   0          16m
pod1   1/1     Running   0          5m28s
pod2   0/1     Pending   0          5s

Check how the Pods are distributed across the k8s cluster; the NODE column shows which node each Pod runs on.

[root@k8scloude1 pod]# kubectl get pods
NAME   READY   STATUS    RESTARTS   AGE
pod    1/1     Running   0          18m
pod1   1/1     Running   0          7m28s

[root@k8scloude1 pod]# kubectl get pods -o wide
NAME   READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
pod    1/1     Running   0          29m   10.244.251.208   k8scloude3   <none>           <none>
pod1   1/1     Running   0          18m   10.244.112.156   k8scloude2   <none>           <none>
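The outcome above (two Pods running, the third Pending) follows from the fact that a given host port can be claimed only once per node. A minimal sketch of that constraint, assuming a simple first-fit placement in place of the scheduler's real scoring; the function name and the node order are hypothetical:

```python
def place_with_host_port(pod_names, worker_nodes, host_port):
    """Assign each Pod to the first worker whose host port is still free.

    Returns a dict mapping Pod name to node name, or to None when no node
    has the port free and the Pod would remain Pending.
    """
    used_ports = {node: set() for node in worker_nodes}
    placement = {}
    for pod in pod_names:
        placement[pod] = None
        for node in worker_nodes:
            if host_port not in used_ports[node]:
                used_ports[node].add(host_port)  # the port is now taken on this node
                placement[pod] = node
                break
    return placement


result = place_with_host_port(
    ["pod", "pod1", "pod2"], ["k8scloude3", "k8scloude2"], host_port=80
)
print(result)  # {'pod': 'k8scloude3', 'pod1': 'k8scloude2', 'pod2': None}
```

With two workers and one port, at most two such Pods can run; adding a worker node or dropping hostPort would let the third Pod be scheduled.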

Delete the Pods

[root@k8scloude1 pod]# kubectl delete pod pod2 
pod "pod2" deleted

[root@k8scloude1 pod]# kubectl delete pod pod1 pod
pod "pod1" deleted
pod "pod" deleted

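The three Pods above were all scheduled automatically by k8s. Next we specify manually which node a Pod runs on.

3.3 Using the nodeName field to choose the node for a Pod

Setting the nodeName field is a more direct way to choose the node for a Pod. nodeName is a field in the Pod spec. If nodeName is not empty, the scheduler ignores the Pod, and the kubelet on the named node tries to place the Pod on that node. A nodeName rule takes precedence over nodeSelector and over affinity and anti-affinity rules.

Selecting a node with nodeName has some limitations:

  • If the named node does not exist, the Pod will not run, and in some cases it may be deleted automatically.
  • If the named node cannot provide the resources the Pod needs, the Pod fails, and the failure reason indicates why, for example insufficient memory or CPU.
  • Node names in cloud environments are not always predictable or stable.

Create a Pod. nodeName: k8scloude3 means the Pod must run on the node named k8scloude3.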

[root@k8scloude1 pod]# vim schedulepod2.yaml 

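#kind: Pod sets the resource type to Pod   labels sets the Pod labels   name under metadata sets the Pod name   everything under containers defines the containers
#image sets the image name   imagePullPolicy sets the image pull policy   name under containers sets the container name
#resources sets the container resources (CPU, memory, etc.)   env sets environment variables in the container   dnsPolicy sets the DNS policy
#restartPolicy is the container restart policy   ports sets the container ports   containerPort is the container port   hostPort is the port on the node
#nodeName: k8scloude3 makes the Pod run on k8scloude3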
[root@k8scloude1 pod]# cat schedulepod2.yaml 
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  nodeName: k8scloude3
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

[root@k8scloude1 pod]# kubectl apply -f schedulepod2.yaml 
pod/pod1 created

The Pod is running on node k8scloude3.

[root@k8scloude1 pod]# kubectl get pod -o wide
NAME   READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
pod1   1/1     Running   0          7s    10.244.251.209   k8scloude3   <none>           <none>

[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.

[root@k8scloude1 pod]# kubectl get pods
No resources found in pod namespace.

Create a Pod. nodeName: k8scloude1 makes the Pod run on node k8scloude1.

[root@k8scloude1 pod]# vim schedulepod3.yaml 

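#kind: Pod sets the resource type to Pod   labels sets the Pod labels   name under metadata sets the Pod name   everything under containers defines the containers
#image sets the image name   imagePullPolicy sets the image pull policy   name under containers sets the container name
#resources sets the container resources (CPU, memory, etc.)   env sets environment variables in the container   dnsPolicy sets the DNS policy
#restartPolicy is the container restart policy   ports sets the container ports   containerPort is the container port   hostPort is the port on the node
#nodeName: k8scloude1 makes the Pod run on node k8scloude1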
[root@k8scloude1 pod]# cat schedulepod3.yaml 
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  nodeName: k8scloude1
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

[root@k8scloude1 pod]# kubectl apply -f schedulepod3.yaml 
pod/pod1 created

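The Pod is running on k8scloude1. Note that k8scloude1 is the master node, which normally does not run application Pods, and that it carries a taint. Pods generally do not run on tainted nodes: when the scheduler is asked to place a Pod without a matching toleration on such a node, the Pod stays Pending. nodeName, however, bypasses the scheduler, so the scheduling taint is not evaluated and the Pod can run normally on a tainted node such as the master.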

[root@k8scloude1 pod]# kubectl get pods -o wide
NAME   READY   STATUS    RESTARTS   AGE   IP              NODE         NOMINATED NODE   READINESS GATES
pod1   1/1     Running   0          47s   10.244.158.81   k8scloude1   <none>           <none>

[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod1" force deleted
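The difference between the scheduler path and the nodeName path can be sketched as follows. The taint key and effect below are an assumption (the article only states that k8scloude1 is tainted), and the function names are hypothetical; the sketch models only NoSchedule-style taints, which are enforced by the scheduler.

```python
from typing import Optional

# Assumed taint for illustration; the article only says that k8scloude1 is tainted.
MASTER_TAINT = ("node-role.kubernetes.io/master", "NoSchedule")

NODE_TAINTS = {
    "k8scloude1": [MASTER_TAINT],
    "k8scloude2": [],
    "k8scloude3": [],
}


def scheduler_accepts(node: str, tolerations=()) -> bool:
    """Scheduler filter: every NoSchedule taint on the node must be tolerated."""
    return all(
        taint in tolerations
        for taint in NODE_TAINTS[node]
        if taint[1] == "NoSchedule"
    )


def place(target_node: str, use_node_name: bool, tolerations=()) -> Optional[str]:
    """Return the node the Pod lands on, or None if it stays Pending."""
    if use_node_name:
        # spec.nodeName is set: the scheduler skips the Pod, so no taint filter runs.
        return target_node
    # Otherwise the scheduler decides and applies the taint filter.
    return target_node if scheduler_accepts(target_node, tolerations) else None


print(place("k8scloude1", use_node_name=True))    # k8scloude1
print(place("k8scloude1", use_node_name=False))   # None
print(place("k8scloude1", use_node_name=False, tolerations=[MASTER_TAINT]))  # k8scloude1
```

The last call shows the usual alternative to nodeName: giving the Pod a toleration lets the scheduler itself place it on the tainted node.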

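3.4 Using node labels and nodeSelector to choose the node for a Pod

Like many other Kubernetes objects, nodes have labels. You can add labels manually, and Kubernetes also adds a set of standard labels to every node in the cluster.

By labelling nodes you can steer Pods to specific nodes or groups of nodes. You can use this to ensure that particular Pods run only on nodes with certain isolation, security, or regulatory properties.
nodeSelector is the simplest recommended form of node selection constraint. Add the nodeSelector field to the Pod spec and list the node labels you want the target node to have. Kubernetes only schedules the Pod onto nodes that have every label you specify.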
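The "every label you specify" rule amounts to a subset check on key-value pairs. A minimal sketch, assuming plain dictionaries for node labels; the disktype=ssd label is hypothetical and does not exist on the cluster in this article:

```python
def matches_node_selector(node_labels, node_selector):
    """True when the node carries every key=value pair listed in nodeSelector."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


def eligible_nodes(nodes, node_selector):
    """Names of the nodes that satisfy the selector, in insertion order."""
    return [
        name for name, labels in nodes.items()
        if matches_node_selector(labels, node_selector)
    ]


nodes = {
    "k8scloude1": {"kubernetes.io/os": "linux"},
    "k8scloude2": {"kubernetes.io/os": "linux", "disktype": "ssd"},  # hypothetical label
    "k8scloude3": {"kubernetes.io/os": "linux"},
}

print(eligible_nodes(nodes, {"kubernetes.io/os": "linux"}))
# ['k8scloude1', 'k8scloude2', 'k8scloude3']
print(eligible_nodes(nodes, {"kubernetes.io/os": "linux", "disktype": "ssd"}))
# ['k8scloude2']
```

Passing the label check only makes a node a candidate; other filters such as taints and host port availability still apply.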

3.4.1 Viewing labels

View the labels of the nodes. Labels are key-value pairs of the form xxxx/yyyy.aaaa=456123,xxxx1/yyyy1.aaaa=456123. The --show-labels option displays them.

[root@k8scloude1 pod]# kubectl get nodes --show-labels
NAME         STATUS   ROLES                  AGE    VERSION   LABELS
k8scloude1   Ready    control-plane,master   7d1h   v1.21.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
k8scloude2   Ready    <none>                 7d     v1.21.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude2,kubernetes.io/os=linux
k8scloude3   Ready    <none>                 7d     v1.21.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude3,kubernetes.io/os=linux

View the labels of the namespaces

[root@k8scloude1 pod]# kubectl get ns --show-labels
NAME              STATUS   AGE    LABELS
default           Active   7d1h   kubernetes.io/metadata.name=default
kube-node-lease   Active   7d1h   kubernetes.io/metadata.name=kube-node-lease
kube-public       Active   7d1h   kubernetes.io/metadata.name=kube-public
kube-system       Active   7d1h   kubernetes.io/metadata.name=kube-system
ns1               Active   6d5h   kubernetes.io/metadata.name=ns1
ns2               Active   6d5h   kubernetes.io/metadata.name=ns2
pod               Active   4d2h   kubernetes.io/metadata.name=pod

View the labels of the Pods

[root@k8scloude1 pod]# kubectl get pod -A --show-labels 
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE    LABELS
kube-system   calico-kube-controllers-6b9fbfff44-4jzkj   1/1     Running   12         7d     k8s-app=calico-kube-controllers,pod-template-hash=6b9fbfff44
kube-system   calico-node-bdlgm                          1/1     Running   7          7d     controller-revision-hash=6b57d9cd54,k8s-app=calico-node,pod-template-generation=1
kube-system   calico-node-hx8bk                          1/1     Running   7          7d     controller-revision-hash=6b57d9cd54,k8s-app=calico-node,pod-template-generation=1
kube-system   calico-node-nsbfs                          1/1     Running   7          7d     controller-revision-hash=6b57d9cd54,k8s-app=calico-node,pod-template-generation=1
kube-system   coredns-545d6fc579-7wm95                   1/1     Running   7          7d1h   k8s-app=kube-dns,pod-template-hash=545d6fc579
kube-system   coredns-545d6fc579-87q8j                   1/1     Running   7          7d1h   k8s-app=kube-dns,pod-template-hash=545d6fc579
kube-system   etcd-k8scloude1                            1/1     Running   7          7d1h   component=etcd,tier=control-plane
kube-system   kube-apiserver-k8scloude1                  1/1     Running   11         7d1h   component=kube-apiserver,tier=control-plane
kube-system   kube-controller-manager-k8scloude1         1/1     Running   7          7d1h   component=kube-controller-manager,tier=control-plane
kube-system   kube-proxy-599xh                           1/1     Running   7          7d1h   controller-revision-hash=6795549d44,k8s-app=kube-proxy,pod-template-generation=1
kube-system   kube-proxy-lpj8z                           1/1     Running   7          7d1h   controller-revision-hash=6795549d44,k8s-app=kube-proxy,pod-template-generation=1
kube-system   kube-proxy-zxlk9                           1/1     Running   7          7d1h   controller-revision-hash=6795549d44,k8s-app=kube-proxy,pod-template-generation=1
kube-system   kube-scheduler-k8scloude1                  1/1     Running   7          7d1h   component=kube-scheduler,tier=control-plane
kube-system   metrics-server-bcfb98c76-k5dmj             1/1     Running   6          6d5h   k8s-app=metrics-server,pod-template-hash=bcfb98c76

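3.4.2 Creating labels

Take the label node-role.kubernetes.io/control-plane= as an example: the key is node-role.kubernetes.io/control-plane and the value is empty.

Syntax for creating a label: kubectl label <object type> <object name> <key>=<value>

Set a label on node k8scloude2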

[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude2
node/k8scloude2 labeled

[root@k8scloude1 pod]# kubectl get nodes --show-labels
NAME         STATUS   ROLES                  AGE    VERSION   LABELS
k8scloude1   Ready    control-plane,master   7d1h   v1.21.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
k8scloude2   Ready    <none>                 7d1h   v1.21.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,k8snodename=k8scloude2,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude2,kubernetes.io/os=linux
k8scloude3   Ready    <none>                 7d1h   v1.21.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude3,kubernetes.io/os=linux

Remove the label from node k8scloude2

[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename-
node/k8scloude2 labeled

[root@k8scloude1 pod]# kubectl get nodes --show-labels
NAME         STATUS   ROLES                  AGE    VERSION   LABELS
k8scloude1   Ready    control-plane,master   7d1h   v1.21.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
k8scloude2   Ready    <none>                 7d1h   v1.21.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude2,kubernetes.io/os=linux
k8scloude3   Ready    <none>                 7d1h   v1.21.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude3,kubernetes.io/os=linux

List the nodes that have the label k8snodename=k8scloude2

[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude2

#List the nodes that have the label k8snodename=k8scloude2
[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude2
NAME         STATUS   ROLES    AGE    VERSION
k8scloude2   Ready    <none>   7d1h   v1.21.0

[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename-
node/k8scloude2 labeled

Set a label on all nodes

[root@k8scloude1 pod]# kubectl label nodes --all k8snodename=cloude
node/k8scloude1 labeled
node/k8scloude2 labeled
node/k8scloude3 labeled

List the nodes that have the label k8snodename=cloude

#List the nodes that have the label k8snodename=cloude
[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=cloude
NAME         STATUS   ROLES                  AGE    VERSION
k8scloude1   Ready    control-plane,master   7d1h   v1.21.0
k8scloude2   Ready    <none>                 7d1h   v1.21.0
k8scloude3   Ready    <none>                 7d1h   v1.21.0

#Remove the label
[root@k8scloude1 pod]# kubectl label nodes --all k8snodename-
node/k8scloude1 labeled
node/k8scloude2 labeled
node/k8scloude3 labeled

[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=cloude
No resources found

The --overwrite option: overwriting a label

[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude2
node/k8scloude2 labeled

#Overwriting a label fails without --overwrite
[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude
error: 'k8snodename' already has a value (k8scloude2), and --overwrite is false

#Overwrite the label with the --overwrite option
[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude --overwrite
node/k8scloude2 labeled

[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude2
No resources found

[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude
NAME         STATUS   ROLES    AGE    VERSION
k8scloude2   Ready    <none>   7d1h   v1.21.0

[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename-
node/k8scloude2 labeled
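The three label operations shown so far (set with key=value, remove with key-, and replace with --overwrite) can be modelled on a plain dictionary. This is a simulation of the behaviour seen in the transcripts above, not kubectl's implementation; apply_label is a hypothetical helper.

```python
def apply_label(labels, arg, overwrite=False):
    """Apply one `kubectl label` style argument to a dict of labels.

    'key=value' sets a label and 'key-' removes it. A new dict is returned.
    Changing an existing value without overwrite=True raises ValueError,
    mirroring the error message shown in the transcript.
    """
    updated = dict(labels)
    if arg.endswith("-") and "=" not in arg:
        updated.pop(arg[:-1], None)  # remove the label if it is present
        return updated
    key, value = arg.split("=", 1)
    if key in updated and updated[key] != value and not overwrite:
        raise ValueError(
            f"'{key}' already has a value ({updated[key]}), and --overwrite is false"
        )
    updated[key] = value
    return updated


labels = apply_label({}, "k8snodename=k8scloude2")
print(labels)                                                         # {'k8snodename': 'k8scloude2'}
print(apply_label(labels, "k8snodename=k8scloude", overwrite=True))   # {'k8snodename': 'k8scloude'}
print(apply_label(labels, "k8snodename-"))                            # {}
```

Requiring an explicit flag before an existing value is replaced protects against accidentally changing a label that running workloads select on.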

Tip: if you do not want control-plane to appear in the ROLES column for k8scloude1, remove the corresponding label: kubectl label nodes k8scloude1 node-role.kubernetes.io/control-plane-

[root@k8scloude1 pod]# kubectl get nodes --show-labels
NAME         STATUS   ROLES                  AGE    VERSION   LABELS
k8scloude1   Ready    control-plane,master   7d1h   v1.21.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
k8scloude2   Ready    <none>                 7d1h   v1.21.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude2,kubernetes.io/os=linux
k8scloude3   Ready    <none>                 7d1h   v1.21.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8scloude3,kubernetes.io/os=linux

[root@k8scloude1 pod]# kubectl label nodes k8scloude1 node-role.kubernetes.io/control-plane-

3.4.3 Using labels to control which node a Pod runs on

Label node k8scloude2 with k8snodename=k8scloude2

[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename=k8scloude2
node/k8scloude2 labeled

[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude2
NAME         STATUS   ROLES    AGE    VERSION
k8scloude2   Ready    <none>   7d1h   v1.21.0

[root@k8scloude1 pod]# kubectl get pods
No resources found in pod namespace.

Create a Pod. The nodeSelector entry k8snodename: k8scloude2 makes the Pod run on a node labelled k8snodename=k8scloude2.

[root@k8scloude1 pod]# vim schedulepod4.yaml

[root@k8scloude1 pod]# cat schedulepod4.yaml 
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  nodeSelector:
    k8snodename: k8scloude2
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

[root@k8scloude1 pod]# kubectl apply -f schedulepod4.yaml 
pod/pod1 created

The Pod is running on node k8scloude2.

[root@k8scloude1 pod]# kubectl get pod -o wide
NAME   READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
pod1   1/1     Running   0          21s   10.244.112.158   k8scloude2   <none>           <none>

Delete the Pod and remove the label

[root@k8scloude1 pod]# kubectl get pod --show-labels
NAME   READY   STATUS    RESTARTS   AGE   LABELS
pod1   1/1     Running   0          32m   run=pod1

[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod1" force deleted

[root@k8scloude1 pod]# kubectl get pod --show-labels
No resources found in pod namespace.

[root@k8scloude1 pod]# kubectl label nodes k8scloude2 k8snodename-
node/k8scloude2 labeled

[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude2
No resources found

[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude
No resources found

Note: if two nodes carry the same label, both are scored, and the Pod runs on the node with the higher score.

Label the master node of the k8s cluster

[root@k8scloude1 pod]# kubectl label nodes k8scloude1 k8snodename=k8scloude1
node/k8scloude1 labeled

[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude1
NAME         STATUS   ROLES                  AGE    VERSION
k8scloude1   Ready    control-plane,master   7d2h   v1.21.0

Create a Pod. The nodeSelector entry k8snodename: k8scloude1 makes the Pod target a node labelled k8snodename=k8scloude1.

[root@k8scloude1 pod]# vim schedulepod5.yaml 

[root@k8scloude1 pod]# cat schedulepod5.yaml 
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  nodeSelector:
    k8snodename: k8scloude1
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

[root@k8scloude1 pod]# kubectl apply -f schedulepod5.yaml 
pod/pod1 created

Because k8scloude1 has a taint, the scheduler cannot place the Pod on k8scloude1, and the Pod stays in the Pending state.

[root@k8scloude1 pod]# kubectl get pod
NAME   READY   STATUS    RESTARTS   AGE
pod1   0/1     Pending   0          9s

Delete the Pod and remove the label

[root@k8scloude1 pod]# kubectl delete pod pod1 
pod "pod1" deleted

[root@k8scloude1 pod]# kubectl get pod
No resources found in pod namespace.

[root@k8scloude1 pod]# kubectl label nodes k8scloude1 k8snodename-
node/k8scloude1 labeled

[root@k8scloude1 pod]# kubectl get nodes -l k8snodename=k8scloude1
No resources found

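3.5 Scheduling Pods with affinity and anti-affinity

nodeSelector is the simplest way to constrain Pods to nodes with specific labels. Affinity and anti-affinity expand the types of constraints you can define. Some of the benefits are:

  • The affinity/anti-affinity language is more expressive. nodeSelector can only select nodes that have all the specified labels. Affinity/anti-affinity gives you more control over the selection logic.

  • You can mark a rule as a "soft requirement" or "preference", so that the scheduler still schedules the Pod even if it cannot find a matching node.

  • You can constrain a Pod using the labels of other Pods running on the node (or in another topology domain), instead of only the node's own labels. This lets you define rules for which Pods may be placed together.

The affinity feature consists of two types of affinity:

  • Node affinity works like the nodeSelector field but is more expressive and lets you specify soft rules.
  • Inter-Pod affinity/anti-affinity lets you constrain Pods against the labels of other Pods.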

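Node affinity is conceptually similar to nodeSelector: it lets you constrain which nodes a Pod can be scheduled on, based on node labels. There are two types of node affinity:

  • requiredDuringSchedulingIgnoredDuringExecution: the scheduler can only schedule the Pod if the rule is met. This works like nodeSelector, but with a more expressive syntax.
  • preferredDuringSchedulingIgnoredDuringExecution: the scheduler tries to find a node that meets the rule. If no matching node is available, the scheduler still schedules the Pod.

In both types, IgnoredDuringExecution means that if the node labels change after Kubernetes has scheduled the Pod, the Pod continues to run.

You can specify node affinity using the .spec.affinity.nodeAffinity field in the Pod spec.

View the description of the nodeAffinity field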

[root@k8scloude1 pod]# kubectl explain pods.spec.affinity.nodeAffinity 
KIND:     Pod
VERSION:  v1

RESOURCE: nodeAffinity <Object>

DESCRIPTION:
     Describes node affinity scheduling rules for the pod.

     Node affinity is a group of node affinity scheduling rules.

FIELDS:
#Soft policy
   preferredDuringSchedulingIgnoredDuringExecution	<[]Object>
     The scheduler will prefer to schedule pods to nodes that satisfy the
     affinity expressions specified by this field, but it may choose a node that
     violates one or more of the expressions. The node that is most preferred is
     the one with the greatest sum of weights, i.e. for each node that meets all
     of the scheduling requirements (resource request, requiredDuringScheduling
     affinity expressions, etc.), compute a sum by iterating through the
     elements of this field and adding "weight" to the sum if the node matches
     the corresponding matchExpressions; the node(s) with the highest sum are
     the most preferred.

#Hard policy
   requiredDuringSchedulingIgnoredDuringExecution	<Object>
     If the affinity requirements specified by this field are not met at
     scheduling time, the pod will not be scheduled onto the node. If the
     affinity requirements specified by this field cease to be met at some point
     during pod execution (e.g. due to an update), the system may or may not try
     to eventually evict the pod from its node.

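3.5.1 Using the hard policy requiredDuringSchedulingIgnoredDuringExecution

Create a Pod. The requiredDuringSchedulingIgnoredDuringExecution rule below means: the node must have a label with the key kubernetes.io/hostname, and the value of that label must be k8scloude2 or k8scloude3.

You can use the operator field to specify the logical operator Kubernetes uses when interpreting the rules. The available operators are In, NotIn, Exists, DoesNotExist, Gt and Lt. NotIn and DoesNotExist let you define node anti-affinity behaviour. Alternatively, you can use node taints to repel Pods from specific nodes.

Note:

  • If you specify both nodeSelector and nodeAffinity, both must be satisfied for the Pod to be scheduled onto a candidate node.
  • If you specify multiple nodeSelectorTerms associated with nodeAffinity types, the Pod can be scheduled onto a node as soon as one of the nodeSelectorTerms is satisfied.
  • If you specify multiple matchExpressions within a single nodeSelectorTerms entry, the Pod can be scheduled onto a node only if all the matchExpressions are satisfied.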
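The operators and the OR/AND rules in the notes above can be sketched as a small evaluator: terms are ORed, and the expressions inside one term are ANDed. This is an illustrative model using plain dictionaries, not the scheduler's code; the function names are hypothetical.

```python
def match_expression(labels, expr):
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op, values = expr["key"], expr["operator"], expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    if op in ("Gt", "Lt"):
        if key not in labels:
            return False
        try:
            actual, bound = int(labels[key]), int(values[0])
        except ValueError:
            return False  # non-integer values never match Gt/Lt
        return actual > bound if op == "Gt" else actual < bound
    raise ValueError(f"unknown operator: {op}")


def node_matches_terms(labels, node_selector_terms):
    """Terms are ORed; the matchExpressions inside one term are ANDed."""
    for term in node_selector_terms:
        expressions = term.get("matchExpressions", [])
        # An empty term is treated as matching nothing.
        if expressions and all(match_expression(labels, e) for e in expressions):
            return True
    return False


nodes = {
    name: {"kubernetes.io/hostname": name}
    for name in ("k8scloude1", "k8scloude2", "k8scloude3")
}
terms = [{"matchExpressions": [{
    "key": "kubernetes.io/hostname",
    "operator": "In",
    "values": ["k8scloude2", "k8scloude3"],
}]}]

print([n for n, labels in nodes.items() if node_matches_terms(labels, terms)])
# ['k8scloude2', 'k8scloude3']
```

If the values list named only hosts that do not exist in the cluster, the result would be empty, which corresponds to a Pod that stays Pending under a hard policy.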

[root@k8scloude1 pod]# vim requiredDuringSchedule.yaml

 #Hard policy
[root@k8scloude1 pod]# cat requiredDuringSchedule.yaml 
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values: 
            - k8scloude2
            - k8scloude3
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

[root@k8scloude1 pod]# kubectl apply -f requiredDuringSchedule.yaml 
pod/pod1 created

The Pod is running on node k8scloude3, one of the two nodes the rule allows.

[root@k8scloude1 pod]# kubectl get pods
NAME   READY   STATUS    RESTARTS   AGE
pod1   1/1     Running   0          6s

[root@k8scloude1 pod]# kubectl get pods -o wide
NAME   READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
pod1   1/1     Running   0          10s   10.244.251.212   k8scloude3   <none>           <none>

[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod1" force deleted

Create a Pod. This time the requiredDuringSchedulingIgnoredDuringExecution rule means: the node must have a label with the key kubernetes.io/hostname, and the value of that label must be k8scloude4 or k8scloude5.

[root@k8scloude1 pod]# vim requiredDuringSchedule1.yaml 

[root@k8scloude1 pod]# cat requiredDuringSchedule1.yaml 
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values: 
            - k8scloude4
            - k8scloude5
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

[root@k8scloude1 pod]# kubectl apply -f requiredDuringSchedule1.yaml 
pod/pod1 created

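Because requiredDuringSchedulingIgnoredDuringExecution is a hard policy and the cluster has no node named k8scloude4 or k8scloude5, no node satisfies the rule. The Pod object is created but cannot be scheduled, so it stays Pending.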

[root@k8scloude1 pod]# kubectl get pods -o wide
NAME   READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
pod1   0/1     Pending   0          7s    <none>   <none>   <none>           <none>

[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod1" force deleted

3.5.2 Using the soft policy preferredDuringSchedulingIgnoredDuringExecution

Label the nodes

[root@k8scloude1 pod]# kubectl label nodes k8scloude2 xx=72
node/k8scloude2 labeled

[root@k8scloude1 pod]# kubectl label nodes k8scloude3 xx=59
node/k8scloude3 labeled

Create a Pod. The preferredDuringSchedulingIgnoredDuringExecution rule below means: the node should preferably have a label with the key xx and a value greater than 60.

[root@k8scloude1 pod]# vim preferredDuringSchedule.yaml 

[root@k8scloude1 pod]# cat preferredDuringSchedule.yaml 
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 2
        preference:
          matchExpressions:
          - key: xx
            operator: Gt
            values:
            - "60"
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

[root@k8scloude1 pod]# kubectl apply -f preferredDuringSchedule.yaml 
pod/pod1 created

The Pod is running on k8scloude2, because k8scloude2 has the label xx=72 and 72 is greater than 60.

[root@k8scloude1 pod]# kubectl get pods -o wide
NAME   READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
pod1   1/1     Running   0          13s   10.244.112.159   k8scloude2   <none>           <none>

[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod1" force deleted

Create a Pod. The preferredDuringSchedulingIgnoredDuringExecution rule below means: the node should preferably have a label with the key xx and a value greater than 600.

[root@k8scloude1 pod]# vim preferredDuringSchedule1.yaml 

[root@k8scloude1 pod]# cat preferredDuringSchedule1.yaml 
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 2
        preference:
          matchExpressions:
          - key: xx
            operator: Gt
            values:
            - "600"
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

[root@k8scloude1 pod]# kubectl apply -f preferredDuringSchedule1.yaml 
pod/pod1 created

Because preferredDuringSchedulingIgnoredDuringExecution is a soft policy, the Pod is still created and scheduled successfully even though neither k8scloude2 nor k8scloude3 satisfies xx>600.

[root@k8scloude1 pod]# kubectl get pods -o wide
NAME   READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
pod1   1/1     Running   0          7s    10.244.251.213   k8scloude3   <none>           <none>

[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod1" force deleted

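3.5.3 Node affinity weight

You can specify a weight between 1 and 100 for each instance of the preferredDuringSchedulingIgnoredDuringExecution affinity type. When the scheduler finds nodes that meet all the other scheduling requirements of the Pod, it iterates through every preference rule the node satisfies and adds up the weight values of the matching expressions. The final sum is added to the score from the node's other priority functions. When the scheduler makes a scheduling decision for the Pod, the node with the highest total score has the highest priority.

Label the nodes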

[root@k8scloude1 pod]# kubectl label nodes k8scloude2 yy=59
node/k8scloude2 labeled

[root@k8scloude1 pod]# kubectl label nodes k8scloude3 yy=72
node/k8scloude3 labeled

Create a Pod. preferredDuringSchedulingIgnoredDuringExecution specifies two soft rules with different weights: weight: 2 and weight: 10.

[root@k8scloude1 pod]# vim preferredDuringSchedule2.yaml 

[root@k8scloude1 pod]# cat preferredDuringSchedule2.yaml 
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
  namespace: pod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 2
        preference:
          matchExpressions:
          - key: xx
            operator: Gt
            values:
            - "60"
      - weight: 10
        preference:
          matchExpressions:
          - key: yy
            operator: Gt
            values:
            - "60"
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: pod1
    resources: {}
    ports:
    - name: http
      containerPort: 80
      protocol: TCP
      hostPort: 80
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

[root@k8scloude1 pod]# kubectl apply -f preferredDuringSchedule2.yaml 
pod/pod1 created

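Both worker nodes are candidates. k8scloude2 (xx=72, yy=59) matches only the xx>60 rule, which has weight 2, while k8scloude3 (xx=59, yy=72) matches only the yy>60 rule, which has weight 10. The yy>60 rule carries the larger weight, so the Pod runs on k8scloude3.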

[root@k8scloude1 pod]# kubectl get pods -o wide
NAME   READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
pod1   1/1     Running   0          10s   10.244.251.214   k8scloude3   <none>           <none>

[root@k8scloude1 pod]# kubectl delete pod pod1 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "pod1" force deleted
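The weight arithmetic behind this result can be reproduced with a few lines. The sketch below covers only the affinity part of the score and only the Gt operator used in this section; the real scheduler combines this sum with scores from its other plugins. The helper names are hypothetical, and the label values are the ones applied to k8scloude2 and k8scloude3 above.

```python
def label_greater_than(labels, key, bound):
    """Gt semantics: the label must exist, be an integer, and exceed the bound."""
    value = labels.get(key)
    return value is not None and value.isdigit() and int(value) > bound


def affinity_score(labels, preferences):
    """Sum the weights of all preferred rules that the node satisfies."""
    return sum(
        weight
        for weight, key, bound in preferences
        if label_greater_than(labels, key, bound)
    )


node_labels = {
    "k8scloude2": {"xx": "72", "yy": "59"},
    "k8scloude3": {"xx": "59", "yy": "72"},
}
# (weight, label key, lower bound), as in preferredDuringSchedule2.yaml
preferences = [(2, "xx", 60), (10, "yy", 60)]

scores = {name: affinity_score(labels, preferences) for name, labels in node_labels.items()}
print(scores)                       # {'k8scloude2': 2, 'k8scloude3': 10}
print(max(scores, key=scores.get))  # k8scloude3
```

A node that satisfied both rules would score 12, so weights express relative importance and do not make a rule mandatory.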

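3.6 Pod topology spread constraints

You can use topology spread constraints to control how Pods are spread across failure domains in the cluster. Examples of failure domains are regions, zones, nodes, and other user-defined topology domains. Doing so can help to improve performance, achieve high availability, or improve resource utilization.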
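To give a feel for how such a constraint limits placement, here is a rough sketch of the skew check. It assumes the maxSkew setting of the Kubernetes topologySpreadConstraints field, which this article does not cover; the zone names and Pod counts are invented, and the helper names are hypothetical.

```python
from collections import Counter


def skew_if_placed(pod_domains, all_domains, candidate):
    """Skew of the candidate domain after adding one Pod to it.

    Computed as (matching Pods in the candidate domain + 1) minus the
    smallest current count across all domains.
    """
    counts = Counter({domain: 0 for domain in all_domains})
    counts.update(pod_domains)  # count the Pods already running per domain
    return counts[candidate] + 1 - min(counts.values())


def allowed_domains(pod_domains, all_domains, max_skew):
    """Domains where one more Pod keeps the skew within max_skew."""
    return [
        domain for domain in all_domains
        if skew_if_placed(pod_domains, all_domains, domain) <= max_skew
    ]


# Two matching Pods already run in zoneA and one in zoneB (invented data).
existing = ["zoneA", "zoneA", "zoneB"]
print(allowed_domains(existing, ["zoneA", "zoneB"], max_skew=1))  # ['zoneB']
```

With a maximum skew of 1, the next Pod may only go to zoneB, which keeps the two zones within one Pod of each other.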