Distributed Storage Systems: Scaling MDS in a Ceph Cluster

2022-10-10 12:00:26

  In the previous post we looked at using CephFS; for a refresher see https://www.cnblogs.com/qiuhom-1874/p/16758866.html. Today let's talk about scaling the MDS component.

  We know the MDS is the daemon that exists to make CephFS work; its job is to manage the file system's metadata. This means that when a client accesses data on CephFS, it first contacts the MDS to look up metadata; the MDS reads that metadata from the metadata pool and returns it to the client, i.e. the metadata pool is operated on only by the MDS. In other words, the MDS is the sole entry point for accessing CephFS. That raises a question: if the Ceph cluster runs only one MDS daemon while many clients access CephFS, that MDS is bound to become a bottleneck, so to improve CephFS performance we must provide multiple MDS daemons for clients to use. How do we scale the MDS, then? As noted earlier, the MDS manages the file system's metadata and persists it into a dedicated pool on the RADOS cluster, which turns the MDS from a stateful service into a stateless one; for the MDS itself, scaling out is just a matter of running more daemons. Because of how file system metadata behaves, however, we cannot scale it the way we scale ordinary stateless applications. For example, if two MDS daemons in the cluster operate on the same file in the same pool at the same time, and at merge time it turns out one deleted the file while the other modified it, the file system is corrupted. If two MDS daemons touching the same file must stay synchronized and consistent, how is that any different from simply running replicas? Client read requests can indeed be spread across multiple MDS daemons, but what about writes? If a write goes to MDS a, what does MDS b do? b can only sync from a (or a has to write through to b), so write requests are not actually load-balanced, and as the number of clients grows the bottleneck remains.

  To spread a file system's read and write load, the distributed file system community came up with partitioning the namespace: the file system's root tree and its hot subtrees are placed on different metadata servers for load balancing, which gives metadata storage the possibility of linear scaling. Put simply, each MDS is responsible only for the metadata of one subtree.

  Metadata partitioning

  Hint: as shown above, a file system can be split into multiple subtrees, with each MDS responsible for only one of them, so that metadata reads and writes are spread across the MDSs.

  Common metadata partitioning schemes

  1. Static subtree partitioning: the administrator manually assigns a given subtree to a given metadata server. Mounting an NFS export onto a subdirectory is an example of this approach; by associating a subdirectory with another partition, the load on the current file system is reduced.

  2. Static hash partitioning: with multiple directories available, the directory a file lands in is not chosen by the administrator but computed from the file name, for example by consistent hashing or by hashing and taking a modulus; whichever directory the hash maps to is where the file is stored, which spreads the load of those subdirectories across the file system.

  3. Lazy hybrid partitioning: a combination of the static hash approach and the traditional file system approach.

  4. Dynamic subtree partitioning: subtrees are reassigned dynamically according to the file system's load, and this is how CephFS implements multiple active MDS daemons. In multi-active MDS mode, CephFS splits the file system namespace into multiple subtrees and distributes them across multiple MDS daemons; the load-balancing strategies for writes and reads are subtree splitting and directory replication respectively: directories with a heavy write load are split into multiple subdirectories to spread the load, while directories with a heavy read load get multiple replicas to balance it. Subtree partitioning and migration is a decentralized process: each MDS makes an independent migration decision every 10 seconds, the MDSs do not share a single consistent view of the namespace, and there is no global scheduler making centralized decisions for the MDS cluster. Instead the MDSs exchange heartbeats (HB) and load information with one another to decide whether to migrate, how to partition the namespace, and whether a directory should be split into subtrees. The administrator can also configure how CephFS computes load and thereby influence the balancing decisions; currently CephFS supports decisions based on CPU load, file system load, or a mix of the two, as sketched below.
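
  As a hedged illustration only (the option name comes from the MDS balancer configuration; verify it and its values on your release), the load-calculation policy could be selected in ceph.conf along these lines:

[mds]
# 0 = hybrid (default), 1 = request rate/latency based, 2 = CPU-load based
mds_bal_mode = 0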

  Dynamic subtree partitioning relies on shared storage to migrate hot loads between MDSs, so Ceph keeps the MDS metadata in a dedicated pool on the backing RADOS cluster, and that pool can be shared by multiple MDSs. The MDS does not access metadata directly against RADOS; instead it maintains an in-memory cache for hot metadata, and metadata stays in memory until the related journal entries expire.
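
  The size of that in-memory cache is tunable; a minimal sketch, assuming the mds_cache_memory_limit option (value in bytes) available on recent releases:

[mds]
mds_cache_memory_limit = 1073741824   # roughly 1 GiB of metadata cache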

  CephFS uses a metadata journal for fault tolerance

  後設資料紀錄檔資訊流式儲存於CephFS後設資料儲存池中的後設資料紀錄檔檔案上,類似於LFS(Log-Structured File System)和WAFL( Write Anywhere File Layout)的工作機制, CephFS後設資料紀錄檔檔案的體積可以無限增長以確保紀錄檔資訊能順序寫入RADOS,並額外賦予守護行程修剪冗餘或不相關紀錄檔條目的能力;
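
  How aggressively the journal is trimmed is also tunable; as an assumption-labelled example (check the option and its default on your release), the number of journal segments kept before trimming can be set in ceph.conf:

[mds]
mds_log_max_segments = 128   # journal segments to keep before old entries are trimmed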

  Multi MDS

  Every CephFS has a human-readable file system name and an identifier called the FSCID, and by default each CephFS is configured with a single Active MDS daemon. The maximum number of MDS daemons that can be Active at the same time is set by the max_mds parameter, which controls the number of available ranks; the default is 1. A rank is an available slot number for an Active MDS on the CephFS, ranging from 0 to max_mds-1; each rank corresponds to one Active ceph-mds daemon that manages the metadata of one subtree of the CephFS directory hierarchy, so a max_mds of 1 means only rank 0 is available. A freshly started ceph-mds daemon holds no rank; the MON assigns one to it as needed. A ceph-mds daemon can hold only one rank at a time and releases it when the daemon terminates, i.e. once assigned, a rank is held exclusively. A rank is in one of three states: Up, the rank has been taken over by a ceph-mds daemon; Failed, the rank is not held by any ceph-mds daemon; Damaged, the rank's metadata is corrupted or missing, and the rank cannot be assigned to any other MDS daemon until the administrator fixes the problem and runs "ceph mds repaired" on it.
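
  A quick way to inspect these values on a running cluster (assuming the file system is named cephfs, as in the examples below):

ceph fs get cephfs | grep max_mds    # the configured number of ranks for this file system
# a Damaged rank becomes assignable again only after repair, via: ceph mds repaired <rank>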

  Check the MDS status of the Ceph cluster

[root@ceph-admin ~]# ceph mds stat
cephfs-1/1/1 up  {0=ceph-mon02=up:active}
[root@ceph-admin ~]# 

  Hint: the cluster currently has one MDS, running on node ceph-mon02 and in the up:active state.

  Deploy additional MDS daemons

[root@ceph-admin ~]# ceph-deploy mds create ceph-mon01 ceph-mon03 
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.1): /usr/bin/ceph-deploy mds create ceph-mon01 ceph-mon03
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : False
[ceph_deploy.cli][INFO  ]  subcommand                    : create
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f9478f34830>
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  func                          : <function mds at 0x7f947918d050>
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  mds                           : [('ceph-mon01', 'ceph-mon01'), ('ceph-mon03', 'ceph-mon03')]
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy][ERROR ] ConfigError: Cannot load config: [Errno 2] No such file or directory: 'ceph.conf'; has `ceph-deploy new` been run in this directory?

[root@ceph-admin ~]# su - cephadm
Last login: Thu Sep 29 23:09:04 CST 2022 on pts/0
[cephadm@ceph-admin ~]$ ls
cephadm@ceph-mgr01  cephadm@ceph-mgr02  cephadm@ceph-mon01  cephadm@ceph-mon02  cephadm@ceph-mon03  ceph-cluster
[cephadm@ceph-admin ~]$ cd ceph-cluster/
[cephadm@ceph-admin ceph-cluster]$ ls
ceph.bootstrap-mds.keyring  ceph.bootstrap-osd.keyring  ceph.client.admin.keyring  ceph-deploy-ceph.log
ceph.bootstrap-mgr.keyring  ceph.bootstrap-rgw.keyring  ceph.conf                  ceph.mon.keyring
[cephadm@ceph-admin ceph-cluster]$ ceph-deploy mds create ceph-mon01 ceph-mon03 
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/cephadm/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.1): /bin/ceph-deploy mds create ceph-mon01 ceph-mon03
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : False
[ceph_deploy.cli][INFO  ]  subcommand                    : create
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f2c575ba7e8>
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  func                          : <function mds at 0x7f2c57813050>
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  mds                           : [('ceph-mon01', 'ceph-mon01'), ('ceph-mon03', 'ceph-mon03')]
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.mds][DEBUG ] Deploying mds, cluster ceph hosts ceph-mon01:ceph-mon01 ceph-mon03:ceph-mon03
[ceph-mon01][DEBUG ] connection detected need for sudo
[ceph-mon01][DEBUG ] connected to host: ceph-mon01 
[ceph-mon01][DEBUG ] detect platform information from remote host
[ceph-mon01][DEBUG ] detect machine type
[ceph_deploy.mds][INFO  ] Distro info: CentOS Linux 7.9.2009 Core
[ceph_deploy.mds][DEBUG ] remote host will use systemd
[ceph_deploy.mds][DEBUG ] deploying mds bootstrap to ceph-mon01
[ceph-mon01][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.mds][ERROR ] RuntimeError: config file /etc/ceph/ceph.conf exists with different content; use --overwrite-conf to overwrite
[ceph-mon03][DEBUG ] connection detected need for sudo
[ceph-mon03][DEBUG ] connected to host: ceph-mon03 
[ceph-mon03][DEBUG ] detect platform information from remote host
[ceph-mon03][DEBUG ] detect machine type
[ceph_deploy.mds][INFO  ] Distro info: CentOS Linux 7.9.2009 Core
[ceph_deploy.mds][DEBUG ] remote host will use systemd
[ceph_deploy.mds][DEBUG ] deploying mds bootstrap to ceph-mon03
[ceph-mon03][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph-mon03][WARNIN] mds keyring does not exist yet, creating one
[ceph-mon03][DEBUG ] create a keyring file
[ceph-mon03][DEBUG ] create path if it doesn't exist
[ceph-mon03][INFO  ] Running command: sudo ceph --cluster ceph --name client.bootstrap-mds --keyring /var/lib/ceph/bootstrap-mds/ceph.keyring auth get-or-create mds.ceph-mon03 osd allow rwx mds allow mon allow profile mds -o /var/lib/ceph/mds/ceph-ceph-mon03/keyring
[ceph-mon03][INFO  ] Running command: sudo systemctl enable ceph-mds@ceph-mon03
[ceph-mon03][WARNIN] Created symlink from /etc/systemd/system/ceph-mds.target.wants/[email protected] to /usr/lib/systemd/system/[email protected].
[ceph-mon03][INFO  ] Running command: sudo systemctl start ceph-mds@ceph-mon03
[ceph-mon03][INFO  ] Running command: sudo systemctl enable ceph.target
[ceph_deploy][ERROR ] GenericError: Failed to create 1 MDSs

[cephadm@ceph-admin ceph-cluster]$ 

  Hint: two errors occurred here. The first is that ceph.conf could not be found; the fix is to switch to the cephadm user (and its ceph-cluster working directory) before running ceph-deploy mds create. The second tells us that the config file on the remote host differs from the local one; the fix is to either push the local config to the cluster hosts, or pull the config from a cluster host, redistribute it, and then deploy the MDS.

  Compare the local config file with the one on a cluster host

[cephadm@ceph-admin ceph-cluster]$ cat /etc/ceph/ceph.conf    
[global]
fsid = 7fd4a619-9767-4b46-9cee-78b9dfe88f34
mon_initial_members = ceph-mon01
mon_host = 192.168.0.71
public_network = 192.168.0.0/24
cluster_network = 172.16.30.0/24
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

[cephadm@ceph-admin ceph-cluster]$ ssh ceph-mon01 'cat /etc/ceph/ceph.conf'
[global]
fsid = 7fd4a619-9767-4b46-9cee-78b9dfe88f34
mon_initial_members = ceph-mon01
mon_host = 192.168.0.71
public_network = 192.168.0.0/24
cluster_network = 172.16.30.0/24
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

[client]
rgw_frontends = "civetweb port=8080"
[cephadm@ceph-admin ceph-cluster]$ 

  Hint: the config file on ceph-mon01 has an extra [client] section compared to the local one.

  Pull the config file from ceph-mon01 to the local directory

[cephadm@ceph-admin ceph-cluster]$ ceph-deploy config pull ceph-mon01                  
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/cephadm/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.1): /bin/ceph-deploy config pull ceph-mon01
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : False
[ceph_deploy.cli][INFO  ]  subcommand                    : pull
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f966fb478c0>
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  client                        : ['ceph-mon01']
[ceph_deploy.cli][INFO  ]  func                          : <function config at 0x7f966fd76cf8>
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.config][DEBUG ] Checking ceph-mon01 for /etc/ceph/ceph.conf
[ceph-mon01][DEBUG ] connection detected need for sudo
[ceph-mon01][DEBUG ] connected to host: ceph-mon01 
[ceph-mon01][DEBUG ] detect platform information from remote host
[ceph-mon01][DEBUG ] detect machine type
[ceph-mon01][DEBUG ] fetch remote file
[ceph_deploy.config][DEBUG ] Got /etc/ceph/ceph.conf from ceph-mon01
[ceph_deploy.config][ERROR ] local config file ceph.conf exists with different content; use --overwrite-conf to overwrite
[ceph_deploy.config][ERROR ] Unable to pull /etc/ceph/ceph.conf from ceph-mon01
[ceph_deploy][ERROR ] GenericError: Failed to fetch config from 1 hosts

[cephadm@ceph-admin ceph-cluster]$ ceph-deploy --overwrite-conf config pull ceph-mon01
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/cephadm/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.1): /bin/ceph-deploy --overwrite-conf config pull ceph-mon01
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : True
[ceph_deploy.cli][INFO  ]  subcommand                    : pull
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fa2f65438c0>
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  client                        : ['ceph-mon01']
[ceph_deploy.cli][INFO  ]  func                          : <function config at 0x7fa2f6772cf8>
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.config][DEBUG ] Checking ceph-mon01 for /etc/ceph/ceph.conf
[ceph-mon01][DEBUG ] connection detected need for sudo
[ceph-mon01][DEBUG ] connected to host: ceph-mon01 
[ceph-mon01][DEBUG ] detect platform information from remote host
[ceph-mon01][DEBUG ] detect machine type
[ceph-mon01][DEBUG ] fetch remote file
[ceph_deploy.config][DEBUG ] Got /etc/ceph/ceph.conf from ceph-mon01
[cephadm@ceph-admin ceph-cluster]$ ls
ceph.bootstrap-mds.keyring  ceph.bootstrap-osd.keyring  ceph.client.admin.keyring  ceph-deploy-ceph.log
ceph.bootstrap-mgr.keyring  ceph.bootstrap-rgw.keyring  ceph.conf                  ceph.mon.keyring
[cephadm@ceph-admin ceph-cluster]$ cat ceph.conf 
[global]
fsid = 7fd4a619-9767-4b46-9cee-78b9dfe88f34
mon_initial_members = ceph-mon01
mon_host = 192.168.0.71
public_network = 192.168.0.0/24
cluster_network = 172.16.30.0/24
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

[client]
rgw_frontends = "civetweb port=8080"
[cephadm@ceph-admin ceph-cluster]$ 

  Hint: since a local config file already exists, the --overwrite-conf option is needed to force it to be overwritten.

  Push the local config file out to all cluster hosts again

[cephadm@ceph-admin ceph-cluster]$ ceph-deploy --overwrite-conf config push ceph-mon01 ceph-mon02 ceph-mon03 ceph-mgr01 ceph-mgr02 
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/cephadm/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.1): /bin/ceph-deploy --overwrite-conf config push ceph-mon01 ceph-mon02 ceph-mon03 ceph-mgr01 ceph-mgr02
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : True
[ceph_deploy.cli][INFO  ]  subcommand                    : push
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fcf983488c0>
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  client                        : ['ceph-mon01', 'ceph-mon02', 'ceph-mon03', 'ceph-mgr01', 'ceph-mgr02']
[ceph_deploy.cli][INFO  ]  func                          : <function config at 0x7fcf98577cf8>
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.config][DEBUG ] Pushing config to ceph-mon01
[ceph-mon01][DEBUG ] connection detected need for sudo
[ceph-mon01][DEBUG ] connected to host: ceph-mon01 
[ceph-mon01][DEBUG ] detect platform information from remote host
[ceph-mon01][DEBUG ] detect machine type
[ceph-mon01][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.config][DEBUG ] Pushing config to ceph-mon02
[ceph-mon02][DEBUG ] connection detected need for sudo
[ceph-mon02][DEBUG ] connected to host: ceph-mon02 
[ceph-mon02][DEBUG ] detect platform information from remote host
[ceph-mon02][DEBUG ] detect machine type
[ceph-mon02][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.config][DEBUG ] Pushing config to ceph-mon03
[ceph-mon03][DEBUG ] connection detected need for sudo
[ceph-mon03][DEBUG ] connected to host: ceph-mon03 
[ceph-mon03][DEBUG ] detect platform information from remote host
[ceph-mon03][DEBUG ] detect machine type
[ceph-mon03][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.config][DEBUG ] Pushing config to ceph-mgr01
[ceph-mgr01][DEBUG ] connection detected need for sudo
[ceph-mgr01][DEBUG ] connected to host: ceph-mgr01 
[ceph-mgr01][DEBUG ] detect platform information from remote host
[ceph-mgr01][DEBUG ] detect machine type
[ceph-mgr01][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.config][DEBUG ] Pushing config to ceph-mgr02
[ceph-mgr02][DEBUG ] connection detected need for sudo
[ceph-mgr02][DEBUG ] connected to host: ceph-mgr02 
[ceph-mgr02][DEBUG ] detect platform information from remote host
[ceph-mgr02][DEBUG ] detect machine type
[ceph-mgr02][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[cephadm@ceph-admin ceph-cluster]$

  Deploy the MDS daemons again

[cephadm@ceph-admin ceph-cluster]$ ceph-deploy mds create ceph-mon01 ceph-mon03
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/cephadm/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.1): /bin/ceph-deploy mds create ceph-mon01 ceph-mon03
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : False
[ceph_deploy.cli][INFO  ]  subcommand                    : create
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fc39019c7e8>
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  func                          : <function mds at 0x7fc3903f5050>
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  mds                           : [('ceph-mon01', 'ceph-mon01'), ('ceph-mon03', 'ceph-mon03')]
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.mds][DEBUG ] Deploying mds, cluster ceph hosts ceph-mon01:ceph-mon01 ceph-mon03:ceph-mon03
[ceph-mon01][DEBUG ] connection detected need for sudo
[ceph-mon01][DEBUG ] connected to host: ceph-mon01 
[ceph-mon01][DEBUG ] detect platform information from remote host
[ceph-mon01][DEBUG ] detect machine type
[ceph_deploy.mds][INFO  ] Distro info: CentOS Linux 7.9.2009 Core
[ceph_deploy.mds][DEBUG ] remote host will use systemd
[ceph_deploy.mds][DEBUG ] deploying mds bootstrap to ceph-mon01
[ceph-mon01][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph-mon01][WARNIN] mds keyring does not exist yet, creating one
[ceph-mon01][DEBUG ] create a keyring file
[ceph-mon01][DEBUG ] create path if it doesn't exist
[ceph-mon01][INFO  ] Running command: sudo ceph --cluster ceph --name client.bootstrap-mds --keyring /var/lib/ceph/bootstrap-mds/ceph.keyring auth get-or-create mds.ceph-mon01 osd allow rwx mds allow mon allow profile mds -o /var/lib/ceph/mds/ceph-ceph-mon01/keyring
[ceph-mon01][INFO  ] Running command: sudo systemctl enable ceph-mds@ceph-mon01
[ceph-mon01][WARNIN] Created symlink from /etc/systemd/system/ceph-mds.target.wants/[email protected] to /usr/lib/systemd/system/[email protected].
[ceph-mon01][INFO  ] Running command: sudo systemctl start ceph-mds@ceph-mon01
[ceph-mon01][INFO  ] Running command: sudo systemctl enable ceph.target
[ceph-mon03][DEBUG ] connection detected need for sudo
[ceph-mon03][DEBUG ] connected to host: ceph-mon03 
[ceph-mon03][DEBUG ] detect platform information from remote host
[ceph-mon03][DEBUG ] detect machine type
[ceph_deploy.mds][INFO  ] Distro info: CentOS Linux 7.9.2009 Core
[ceph_deploy.mds][DEBUG ] remote host will use systemd
[ceph_deploy.mds][DEBUG ] deploying mds bootstrap to ceph-mon03
[ceph-mon03][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph-mon03][DEBUG ] create path if it doesn't exist
[ceph-mon03][INFO  ] Running command: sudo ceph --cluster ceph --name client.bootstrap-mds --keyring /var/lib/ceph/bootstrap-mds/ceph.keyring auth get-or-create mds.ceph-mon03 osd allow rwx mds allow mon allow profile mds -o /var/lib/ceph/mds/ceph-ceph-mon03/keyring
[ceph-mon03][INFO  ] Running command: sudo systemctl enable ceph-mds@ceph-mon03
[ceph-mon03][INFO  ] Running command: sudo systemctl start ceph-mds@ceph-mon03
[ceph-mon03][INFO  ] Running command: sudo systemctl enable ceph.target
[cephadm@ceph-admin ceph-cluster]$

  Check MDS status

[cephadm@ceph-admin ceph-cluster]$ ceph mds stat
cephfs-1/1/1 up  {0=ceph-mon02=up:active}, 2 up:standby
[cephadm@ceph-admin ceph-cluster]$ ceph fs status cephfs
cephfs - 0 clients
======
+------+--------+------------+---------------+-------+-------+
| Rank | State  |    MDS     |    Activity   |  dns  |  inos |
+------+--------+------------+---------------+-------+-------+
|  0   | active | ceph-mon02 | Reqs:    0 /s |   18  |   17  |
+------+--------+------------+---------------+-------+-------+
+---------------------+----------+-------+-------+
|         Pool        |   type   |  used | avail |
+---------------------+----------+-------+-------+
| cephfs-metadatapool | metadata | 59.8k |  280G |
|   cephfs-datapool   |   data   | 3391k |  280G |
+---------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|  ceph-mon03 |
|  ceph-mon01 |
+-------------+
MDS version: ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
[cephadm@ceph-admin ceph-cluster]$ 

  Hint: there are now two MDS daemons in standby and one active.

  Managing ranks

  Increase the number of Active MDS daemons; the command format is: ceph fs set <fsname> max_mds <number>

[cephadm@ceph-admin ceph-cluster]$ ceph fs set cephfs max_mds 2
[cephadm@ceph-admin ceph-cluster]$ ceph fs status cephfs
cephfs - 0 clients
======
+------+--------+------------+---------------+-------+-------+
| Rank | State  |    MDS     |    Activity   |  dns  |  inos |
+------+--------+------------+---------------+-------+-------+
|  0   | active | ceph-mon02 | Reqs:    0 /s |   18  |   17  |
|  1   | active | ceph-mon01 | Reqs:    0 /s |   10  |   13  |
+------+--------+------------+---------------+-------+-------+
+---------------------+----------+-------+-------+
|         Pool        |   type   |  used | avail |
+---------------------+----------+-------+-------+
| cephfs-metadatapool | metadata | 61.1k |  280G |
|   cephfs-datapool   |   data   | 3391k |  280G |
+---------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|  ceph-mon03 |
+-------------+
MDS version: ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
[cephadm@ceph-admin ceph-cluster]$ 

  Hint: the number of ranks actually in use in the file system only grows when a standby daemon is available to take the new rank. A multi-active MDS setup still requires standby daemons for HA, so max_mds should always be at least one less than the number of MDS daemons actually available.

  Decreasing the number of Active MDS daemons

  Lowering max_mds only prevents new ranks from being created; it has no immediate effect on existing Active MDSs and the ranks they hold, so after lowering the value the administrator has to manually stop the ranks that are no longer needed. Command format: ceph mds deactivate {System:rank|FSID:rank|rank}

[cephadm@ceph-admin ceph-cluster]$ ceph fs set cephfs max_mds 1
[cephadm@ceph-admin ceph-cluster]$ ceph fs status
cephfs - 0 clients
======
+------+----------+------------+---------------+-------+-------+
| Rank |  State   |    MDS     |    Activity   |  dns  |  inos |
+------+----------+------------+---------------+-------+-------+
|  0   |  active  | ceph-mon02 | Reqs:    0 /s |   18  |   17  |
|  1   | stopping | ceph-mon01 |               |   10  |   13  |
+------+----------+------------+---------------+-------+-------+
+---------------------+----------+-------+-------+
|         Pool        |   type   |  used | avail |
+---------------------+----------+-------+-------+
| cephfs-metadatapool | metadata | 61.6k |  280G |
|   cephfs-datapool   |   data   | 3391k |  280G |
+---------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|  ceph-mon03 |
+-------------+
MDS version: ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
[cephadm@ceph-admin ceph-cluster]$ ceph mds deactivate cephfs:1
Error ENOTSUP: command is obsolete; please check usage and/or man page
[cephadm@ceph-admin ceph-cluster]$ ceph fs status              
cephfs - 0 clients
======
+------+--------+------------+---------------+-------+-------+
| Rank | State  |    MDS     |    Activity   |  dns  |  inos |
+------+--------+------------+---------------+-------+-------+
|  0   | active | ceph-mon02 | Reqs:    0 /s |   18  |   17  |
+------+--------+------------+---------------+-------+-------+
+---------------------+----------+-------+-------+
|         Pool        |   type   |  used | avail |
+---------------------+----------+-------+-------+
| cephfs-metadatapool | metadata | 62.1k |  280G |
|   cephfs-datapool   |   data   | 3391k |  280G |
+---------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|  ceph-mon03 |
|  ceph-mon01 |
+-------------+
MDS version: ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
[cephadm@ceph-admin ceph-cluster]$

  Hint: although ceph mds deactivate reports that the command is obsolete, the surplus rank was still stopped; in this release, simply lowering max_mds is enough for the cluster to wind down the extra rank and return its MDS to standby.

  Manually pinning directory subtrees to a rank

  A CephFS cluster with multiple Active MDSs runs a balancer that distributes metadata load, and this mode is usually good enough for most users. In some scenarios, however, users want to override the dynamic balancer with explicit mappings of metadata to specific ranks so they can distribute application load across the cluster themselves. The mechanism provided for this is called the export pin, implemented as the directory extended attribute ceph.dir.pin. To set the attribute: setfattr -n ceph.dir.pin -v RANK /PATH/TO/DIR. The attribute value (-v) is the rank the directory subtree should be pinned to; the default is -1, meaning the directory is not pinned. Export pins are inherited from the closest parent that has one set, so pinning a directory affects all of its subdirectories.

[cephadm@ceph-admin ceph-cluster]$ sefattr
-bash: sefattr: command not found
[cephadm@ceph-admin ceph-cluster]$ yum provides setfattr
Loaded plugins: fastestmirror
Repository epel is listed more than once in the configuration
Repository epel-debuginfo is listed more than once in the configuration
Repository epel-source is listed more than once in the configuration
Loading mirror speeds from cached hostfile
 * base: mirrors.aliyun.com
 * extras: mirrors.aliyun.com
 * updates: mirrors.aliyun.com
attr-2.4.46-13.el7.x86_64 : Utilities for managing filesystem extended attributes
Repo        : base
Matched from:
Filename    : /usr/bin/setfattr



[cephadm@ceph-admin ceph-cluster]$ 

  Hint: this requires the setfattr command to be present on the system; if it is missing, install the attr package.
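
  Once attr is installed, pinning is just a matter of setting the attribute on a directory of a mounted CephFS. A minimal sketch, assuming a hypothetical mount point /mnt/cephfs with subdirectories app1 and app2:

setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/app1    # pin the app1 subtree (and its children) to rank 1
setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/app2    # pin the app2 subtree to rank 0
setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/app1   # -1 removes the pin again
getfattr -n ceph.dir.pin /mnt/cephfs/app1         # read back the current pin value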

  MDS failover mechanism

  For redundancy, every CephFS should have some number of ceph-mds daemons in the Standby state ready to take over a failed rank. CephFS provides four options that control how Standby MDS daemons behave:

  1. mds_standby_replay: boolean. When true, this MDS daemon continuously reads the metadata journal of a particular Up rank, keeping a warm cache of that rank's metadata so that failover is faster when the rank fails. An Up rank can have only one replay daemon; any extras are automatically demoted to ordinary (non-replay) standby MDSs.

  2. mds_standby_for_name: this MDS daemon will only stand by for the rank held by the MDS with the given name.

  3. mds_standby_for_rank: this MDS daemon will only stand by for the given rank and will not take over any other failed rank. In a setup with multiple CephFS file systems it can be combined with the option below to specify which file system's rank to cover.

  4. mds_standby_for_fscid: works together with mds_standby_for_rank. If mds_standby_for_rank is also set, the daemon stands by for that specific rank of the given fscid; if it is not set, the daemon stands by for any rank of the given fscid.

  Configuring a standby MDS
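
  The settings go into the MDS-specific sections of ceph.conf before it is pushed out to the cluster; a minimal sketch of the intent described below (assuming the daemons are named mds.ceph-mon01 and mds.ceph-mon03, matching the hosts in this cluster):

[mds.ceph-mon03]
mds_standby_replay = true          # continuously replay the journal of the followed rank
mds_standby_for_name = ceph-mon01  # only stand by for the rank held by mds.ceph-mon01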

  Hint: the configuration above has the standby MDS on ceph-mon03 doing a live replay of ceph-mon01's journal, so that when ceph-mon01 fails, ceph-mon03 automatically takes over the rank ceph-mon01 was responsible for.

  Push the configuration to all cluster hosts

[cephadm@ceph-admin ceph-cluster]$ ceph-deploy --overwrite-conf config push ceph-mon01 ceph-mon02 ceph-mon03 ceph-mgr01 ceph-mgr02
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/cephadm/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (2.0.1): /bin/ceph-deploy --overwrite-conf config push ceph-mon01 ceph-mon02 ceph-mon03 ceph-mgr01 ceph-mgr02
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  overwrite_conf                : True
[ceph_deploy.cli][INFO  ]  subcommand                    : push
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f03332968c0>
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  client                        : ['ceph-mon01', 'ceph-mon02', 'ceph-mon03', 'ceph-mgr01', 'ceph-mgr02']
[ceph_deploy.cli][INFO  ]  func                          : <function config at 0x7f03334c5cf8>
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.config][DEBUG ] Pushing config to ceph-mon01
[ceph-mon01][DEBUG ] connection detected need for sudo
[ceph-mon01][DEBUG ] connected to host: ceph-mon01 
[ceph-mon01][DEBUG ] detect platform information from remote host
[ceph-mon01][DEBUG ] detect machine type
[ceph-mon01][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.config][DEBUG ] Pushing config to ceph-mon02
[ceph-mon02][DEBUG ] connection detected need for sudo
[ceph-mon02][DEBUG ] connected to host: ceph-mon02 
[ceph-mon02][DEBUG ] detect platform information from remote host
[ceph-mon02][DEBUG ] detect machine type
[ceph-mon02][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.config][DEBUG ] Pushing config to ceph-mon03
[ceph-mon03][DEBUG ] connection detected need for sudo
[ceph-mon03][DEBUG ] connected to host: ceph-mon03 
[ceph-mon03][DEBUG ] detect platform information from remote host
[ceph-mon03][DEBUG ] detect machine type
[ceph-mon03][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.config][DEBUG ] Pushing config to ceph-mgr01
[ceph-mgr01][DEBUG ] connection detected need for sudo
[ceph-mgr01][DEBUG ] connected to host: ceph-mgr01 
[ceph-mgr01][DEBUG ] detect platform information from remote host
[ceph-mgr01][DEBUG ] detect machine type
[ceph-mgr01][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.config][DEBUG ] Pushing config to ceph-mgr02
[ceph-mgr02][DEBUG ] connection detected need for sudo
[ceph-mgr02][DEBUG ] connected to host: ceph-mgr02 
[ceph-mgr02][DEBUG ] detect platform information from remote host
[ceph-mgr02][DEBUG ] detect machine type
[ceph-mgr02][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[cephadm@ceph-admin ceph-cluster]$ 

  Stop the mds daemon on ceph-mon01 and see whether ceph-mon03 takes over

[cephadm@ceph-admin ceph-cluster]$ ceph fs status cephfs
cephfs - 0 clients
======
+------+--------+------------+---------------+-------+-------+
| Rank | State  |    MDS     |    Activity   |  dns  |  inos |
+------+--------+------------+---------------+-------+-------+
|  0   | active | ceph-mon02 | Reqs:    0 /s |   18  |   17  |
|  1   | active | ceph-mon01 | Reqs:    0 /s |   10  |   13  |
+------+--------+------------+---------------+-------+-------+
+---------------------+----------+-------+-------+
|         Pool        |   type   |  used | avail |
+---------------------+----------+-------+-------+
| cephfs-metadatapool | metadata | 65.3k |  280G |
|   cephfs-datapool   |   data   | 3391k |  280G |
+---------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|  ceph-mon03 |
+-------------+
MDS version: ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
[cephadm@ceph-admin ceph-cluster]$ ssh ceph-mon01 'systemctl stop [email protected]'
Failed to stop [email protected]: Interactive authentication required.
See system logs and 'systemctl status [email protected]' for details.
[cephadm@ceph-admin ceph-cluster]$ ssh ceph-mon01 'sudo systemctl stop [email protected]'
[cephadm@ceph-admin ceph-cluster]$ ceph fs status cephfs
cephfs - 0 clients
======
+------+--------+------------+---------------+-------+-------+
| Rank | State  |    MDS     |    Activity   |  dns  |  inos |
+------+--------+------------+---------------+-------+-------+
|  0   | active | ceph-mon02 | Reqs:    0 /s |   18  |   17  |
|  1   | rejoin | ceph-mon03 |               |    0  |    3  |
+------+--------+------------+---------------+-------+-------+
+---------------------+----------+-------+-------+
|         Pool        |   type   |  used | avail |
+---------------------+----------+-------+-------+
| cephfs-metadatapool | metadata | 65.3k |  280G |
|   cephfs-datapool   |   data   | 3391k |  280G |
+---------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
[cephadm@ceph-admin ceph-cluster]$ ceph fs status cephfs
cephfs - 0 clients
======
+------+--------+------------+---------------+-------+-------+
| Rank | State  |    MDS     |    Activity   |  dns  |  inos |
+------+--------+------------+---------------+-------+-------+
|  0   | active | ceph-mon02 | Reqs:    0 /s |   18  |   17  |
|  1   | active | ceph-mon03 | Reqs:    0 /s |   10  |   13  |
+------+--------+------------+---------------+-------+-------+
+---------------------+----------+-------+-------+
|         Pool        |   type   |  used | avail |
+---------------------+----------+-------+-------+
| cephfs-metadatapool | metadata | 65.3k |  280G |
|   cephfs-datapool   |   data   | 3391k |  280G |
+---------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
[cephadm@ceph-admin ceph-cluster]$ 

  Hint: after ceph-mon01 failed, ceph-mon03 automatically took over the rank that ceph-mon01 had been responsible for.

  Bring ceph-mon01 back up

[cephadm@ceph-admin ceph-cluster]$ ceph fs status 
cephfs - 0 clients
======
+------+--------+------------+---------------+-------+-------+
| Rank | State  |    MDS     |    Activity   |  dns  |  inos |
+------+--------+------------+---------------+-------+-------+
|  0   | active | ceph-mon02 | Reqs:    0 /s |   18  |   17  |
|  1   | active | ceph-mon03 | Reqs:    0 /s |   10  |   13  |
+------+--------+------------+---------------+-------+-------+
+---------------------+----------+-------+-------+
|         Pool        |   type   |  used | avail |
+---------------------+----------+-------+-------+
| cephfs-metadatapool | metadata | 65.3k |  280G |
|   cephfs-datapool   |   data   | 3391k |  280G |
+---------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
[cephadm@ceph-admin ceph-cluster]$ ssh ceph-mon01 'sudo systemctl start [email protected]'
[cephadm@ceph-admin ceph-cluster]$ ceph fs status 
cephfs - 0 clients
======
+------+--------+------------+---------------+-------+-------+
| Rank | State  |    MDS     |    Activity   |  dns  |  inos |
+------+--------+------------+---------------+-------+-------+
|  0   | active | ceph-mon02 | Reqs:    0 /s |   18  |   17  |
|  1   | active | ceph-mon03 | Reqs:    0 /s |   10  |   13  |
+------+--------+------------+---------------+-------+-------+
+---------------------+----------+-------+-------+
|         Pool        |   type   |  used | avail |
+---------------------+----------+-------+-------+
| cephfs-metadatapool | metadata | 65.3k |  280G |
|   cephfs-datapool   |   data   | 3391k |  280G |
+---------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|  ceph-mon01 |
+-------------+
MDS version: ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
[cephadm@ceph-admin ceph-cluster]$ ssh ceph-mon03 'sudo systemctl restart [email protected]'     
[cephadm@ceph-admin ceph-cluster]$ ceph fs status 
cephfs - 0 clients
======
+------+----------------+------------+---------------+-------+-------+
| Rank |     State      |    MDS     |    Activity   |  dns  |  inos |
+------+----------------+------------+---------------+-------+-------+
|  0   |     active     | ceph-mon02 | Reqs:    0 /s |   18  |   17  |
|  1   |     active     | ceph-mon01 | Reqs:    0 /s |   10  |   13  |
| 1-s  | standby-replay | ceph-mon03 | Evts:    0 /s |    0  |    3  |
+------+----------------+------------+---------------+-------+-------+
+---------------------+----------+-------+-------+
|         Pool        |   type   |  used | avail |
+---------------------+----------+-------+-------+
| cephfs-metadatapool | metadata | 65.3k |  280G |
|   cephfs-datapool   |   data   | 3391k |  280G |
+---------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
[cephadm@ceph-admin ceph-cluster]$ 

  Hint: after ceph-mon01 is brought back it does not preempt; it simply becomes a standby. And when ceph-mon03 is later restarted or fails, ceph-mon01 in turn automatically takes over the corresponding rank.