K8S Performance Tuning - tw511教學網

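Preface

This is the first article in the K8S performance tuning series: best practices for OS sysctl tuning parameters.

Parameter overview

Overview of the sysctl tuning parameters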

# Kubernetes Settings
vm.max_map_count = 262144
kernel.softlockup_panic = 1
kernel.softlockup_all_cpu_backtrace = 1
net.ipv4.ip_local_reserved_ports = 30000-32767

# Increase the number of connections
net.core.somaxconn = 32768

# Maximum Socket Receive Buffer
net.core.rmem_max = 16777216

# Maximum Socket Send Buffer
net.core.wmem_max = 16777216

# Increase the maximum total buffer-space allocatable
net.ipv4.tcp_wmem = 4096 87380 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216

# Increase the number of outstanding syn requests allowed
net.ipv4.tcp_max_syn_backlog = 8096


# For persistent HTTP connections
net.ipv4.tcp_slow_start_after_idle = 0

# Allow to reuse TIME_WAIT sockets for new connections
# when it is safe from protocol viewpoint
net.ipv4.tcp_tw_reuse = 1

# Max number of packets that can be queued on interface input
# If kernel is receiving packets faster than can be processed
# this queue increases
net.core.netdev_max_backlog = 16384

# Increase size of file handles and inode cache
fs.file-max = 2097152

# Max number of inotify instances and watches for a user
# Since dockerd runs as a single user, the default instances value of 128 per user is too low
# e.g. uses of inotify: nginx ingress controller, kubectl logs -f
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288

# Additional sysctl flags that kubelet expects
vm.overcommit_memory = 1
kernel.panic = 10
kernel.panic_on_oops = 1

# Prevent docker from changing iptables: https://github.com/kubernetes/kubernetes/issues/40182
net.ipv4.ip_forward=1
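
To check whether a node actually runs with the values listed above, the list can be parsed and compared with the live kernel values under `/proc/sys`. The following is a minimal sketch, not part of the original article: the function names (`parse_sysctl_conf`, `proc_path`, `read_live`, `find_drift`) are hypothetical, and the key-to-path mapping assumes simple keys where every dot is a path separator (interface names that themselves contain dots would need special handling).

```python
from pathlib import Path


def parse_sysctl_conf(text):
    """Parse sysctl.conf-style text into {key: value}, skipping comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        # Collapse whitespace so "4096  87380" and "4096\t87380" compare equal.
        settings[key.strip()] = " ".join(value.split())
    return settings


def proc_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its procfs file (simple keys only)."""
    return Path(root) / key.replace(".", "/")


def read_live(key, root="/proc/sys"):
    """Return the current value, or None if the key is missing or unreadable."""
    try:
        return " ".join(proc_path(key, root).read_text().split())
    except OSError:
        return None


def find_drift(desired, reader=read_live):
    """Return {key: (wanted, actual)} for every key whose live value differs."""
    drift = {}
    for key, wanted in desired.items():
        actual = reader(key)
        if actual != wanted:
            drift[key] = (wanted, actual)
    return drift
```

On a Linux node, `find_drift(parse_sysctl_conf(open(path).read()))` would list every parameter that is missing or differs from the file at `path`; the `reader` argument exists only so the comparison can be exercised without touching `/proc`.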

On AWS, additionally add the following:

# AWS settings
# Issue #23395
net.ipv4.neigh.default.gc_thresh1=0

If IPv6 is enabled, additionally add the following:

# Enable IPv6 forwarding for network plugins that don't do it themselves
net.ipv6.conf.all.forwarding=1
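
The conditional additions for AWS and IPv6 can be folded into one rendering step. This is a sketch under stated assumptions: `render_sysctl_conf` and its `on_aws` / `ipv6` flags are hypothetical names chosen for illustration, and `base` stands for the common settings listed earlier, given as a dict.

```python
# Extra settings taken from the two conditional sections of the article.
AWS_SETTINGS = {"net.ipv4.neigh.default.gc_thresh1": "0"}
IPV6_SETTINGS = {"net.ipv6.conf.all.forwarding": "1"}


def render_sysctl_conf(base, on_aws=False, ipv6=False):
    """Return sysctl.conf text: base settings plus the optional extras."""
    merged = dict(base)  # copy so the caller's dict is not modified
    if on_aws:
        merged.update(AWS_SETTINGS)
    if ipv6:
        merged.update(IPV6_SETTINGS)
    return "".join(f"{key} = {value}\n" for key, value in merged.items())
```

The resulting text could then be written to a drop-in file such as `/etc/sysctl.d/99-kubernetes.conf` (the file name is an assumption) and loaded with `sysctl --system`.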

Parameter explanations

Category	Kernel parameter	Description	Reference
Kubernetes	`vm.max_map_count = 262144`	Limits the number of VMAs (virtual memory areas) a process may own. A larger value is very useful for elasticsearch, mongo, and other mmap users.	ES Configuration
Kubernetes	`kernel.softlockup_panic = 1`	Makes the kernel panic when a soft lockup is detected. Used to deal with the kernel soft-lockup bugs seen with K8S.	root cause kernel soft lockups · Issue #37853 · kubernetes/kubernetes (github.com)
Kubernetes	`kernel.softlockup_all_cpu_backtrace = 1`	Dumps a backtrace of all CPUs when a soft lockup is detected. Used to deal with the kernel soft-lockup bugs seen with K8S.	root cause kernel soft lockups · Issue #37853 · kubernetes/kubernetes (github.com)
Kubernetes	`net.ipv4.ip_local_reserved_ports = 30000-32767`	Reserves the default K8S NodePort range so that the kernel does not hand these ports out as ephemeral ports.	service-node-port-range and ip_local_port_range collision · Issue #6342 · kubernetes/kops (github.com)
Network	`net.core.somaxconn = 32768`	Upper limit of the backlog for a listening socket. The backlog is the socket's listen queue: a request that has not yet been handled or established waits there. Raising it increases the number of connections that can be queued.	Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
Network	`net.core.rmem_max = 16777216`	Maximum socket receive buffer size, in bytes.	Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
Network	`net.core.wmem_max = 16777216`	Maximum socket send buffer size, in bytes.	Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
Network	`net.ipv4.tcp_wmem = 4096 87380 16777216` `net.ipv4.tcp_rmem = 4096 87380 16777216`	Raises the maximum buffer space that can be allocated per TCP socket. The three values are the minimum, default, and maximum sizes in bytes.	Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
Network	`net.ipv4.tcp_max_syn_backlog = 8096`	Length of the queue of connections that have not yet been acknowledged by the client (SYN requests); the default is 1024. Raises the number of outstanding SYN requests allowed.	Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
Network	`net.ipv4.tcp_slow_start_after_idle = 0`	For persistent HTTP connections: disables slow start after idle, so the congestion window of a connection is not reset after it has been idle.	Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
Network	`net.ipv4.tcp_tw_reuse = 1`	Allows sockets in the TIME_WAIT state to be reused for new TCP connections when this is safe from the protocol's point of view. 0 means disabled (the historical default).	Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
Network	`net.core.netdev_max_backlog = 16384`	When the NIC receives packets faster than the kernel can process them, the packets wait in a queue. This parameter sets the maximum length of that queue.	Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
File system	`fs.file-max = 2097152`	Maximum number of file handles allowed system-wide, i.e. the number of files that can be open on the Linux system. Increases the size of file handles and inode cache.	Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
File system	`fs.inotify.max_user_instances = 8192` `fs.inotify.max_user_watches = 524288`	Maximum number of inotify instances and watches per user. Because dockerd runs as a single user, the default of 128 instances per user is too low. Examples of inotify users: nginx ingress controller, `kubectl logs -f`.	Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
kubelet	`vm.overcommit_memory = 1`	Memory allocation policy. With 1, the kernel always allows an allocation, regardless of the current memory state.	Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
kubelet	`kernel.panic = 10`	Reboots automatically after a kernel panic, after waiting 10 seconds.	Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
kubelet	`kernel.panic_on_oops = 1`	Calls panic() when an Oops occurs.	Image: We should tweak our sysctls · Issue #261 · kubernetes-retired/kube-deploy (github.com)
Network	`net.ipv4.ip_forward=1`	Enables IP forwarding; also prevents docker from changing iptables.	Upgrading docker 1.13 on nodes causes outbound container traffic to stop working · Issue #40182 · kubernetes/kubernetes (github.com)
Network	`net.ipv4.neigh.default.gc_thresh1=0`	Fixes the AWS `arp_cache: neighbor table overflow!` error.	arp_cache: neighbor table overflow! · Issue #4533 · kubernetes/kops (github.com)
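
The `net.ipv4.ip_local_reserved_ports` row in the table guards against a collision between the NodePort range and the kernel's ephemeral port range (`net.ipv4.ip_local_port_range`). A small sketch of that check follows; the helper names are hypothetical, and the default NodePort range of 30000-32767 is taken from the table.

```python
def parse_reserved(spec):
    """Parse "30000-32767,8080" into a list of inclusive (low, high) tuples."""
    ranges = []
    for part in spec.replace(" ", "").split(","):
        if not part:
            continue
        low, _, high = part.partition("-")
        ranges.append((int(low), int(high or low)))
    return ranges


def unprotected_nodeports(local_range, nodeports=(30000, 32767), reserved=""):
    """Return NodePort numbers the kernel may still assign as ephemeral ports.

    local_range: (low, high) from net.ipv4.ip_local_port_range
    reserved:    value of net.ipv4.ip_local_reserved_ports
    """
    low = max(local_range[0], nodeports[0])
    high = min(local_range[1], nodeports[1])
    blocked = parse_reserved(reserved)
    return [
        port
        for port in range(low, high + 1)  # empty when the ranges do not overlap
        if not any(a <= port <= b for a, b in blocked)
    ]
```

With an ephemeral range of 32768-60999 the result is empty even without a reservation. The reservation matters once the ephemeral range has been widened, for example to 1024-65535, where it keeps the NodePort range out of automatic port assignment.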

EOF

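Where three walk together, one of them can be my teacher; knowledge shared belongs to everyone. This article was written by the 東風微鳴 tech blog, EWhisper.cn.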