MySQL InnoDB Cluster is MySQL's recently GA'd high-availability solution. Built on MySQL Group Replication together with MySQL Shell and MySQL Router, it makes it easy to stand up a robust HA deployment.
MySQL Shell is the new MySQL client. It supports both the X Protocol and the classic MySQL protocol, is scriptable in JavaScript and Python, and serves as the management tool for building an InnoDB Cluster.
MySQL Router is a lightweight routing middleware that gives applications transparent failover when accessing the cluster.
![MySQL InnoDB Cluster overview](https://s3.ap-northeast-1.wasabisys.com/img.tw511.com/202008/innodb_cluster_overviewr40lgjmidug.png)
The diagram above makes the architecture clear: with MySQL Shell you can set up a highly available MySQL InnoDB Cluster with automatic failover, and during later operations you can also use MySQL Shell to monitor, manage, and maintain the cluster. MySQL Router hides the cluster's internals from the application layer, so applications access the cluster over the ordinary MySQL protocol.
MySQL Group Replication is the recently GA'd replication technology underneath InnoDB Cluster. Its key characteristics: group membership management and failure detection are built in; in single-primary mode a new primary is elected automatically when the current one fails (multi-primary mode is also supported); and the group can only process writes while a majority of its members (a quorum) are reachable.
Three virtual machines are prepared: mysql0, mysql1, and mysql2, with IPs 192.168.7.30, 192.168.7.31, and 192.168.7.32 respectively.
The first step is to install the mysql-community-server, mysql-shell, and mysql-router packages on all three virtual machines.
# Configure the MySQL yum repository
$ yum install -y https://dev.mysql.com/get/mysql57-community-release-el7-8.noarch.rpm
# Install the packages
$ yum install -y mysql-community-server mysql-shell mysql-router
To make sure the three virtual machines can resolve each other by name, configure DNS or write the host-name mappings into the hosts file.
$ cat << EOF >> /etc/hosts
192.168.7.30 mysql0
192.168.7.31 mysql1
192.168.7.32 mysql2
EOF
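A quick way to sanity-check those mappings (a throwaway sketch, not required for the setup): parse hosts-format lines and confirm each name maps to the expected IP.

```python
# Throwaway sketch: parse /etc/hosts-style lines into a name -> IP mapping,
# using exactly the entries appended above.
HOSTS = """\
192.168.7.30 mysql0
192.168.7.31 mysql1
192.168.7.32 mysql2
"""

def parse_hosts(text: str) -> dict:
    mapping = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        parts = line.split()
        if len(parts) >= 2:
            ip, *names = parts
            for name in names:
                mapping[name] = ip
    return mapping

print(parse_hosts(HOSTS))
```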
For convenience in the following steps, change the MySQL root password on all three virtual machines.
# First obtain the initial root password
$ systemctl start mysqld
$ ORIGINAL_ROOT_PASSWORD=$(awk '/temporary password/{print $NF}' /var/log/mysqld.log)
# Change the root password to R00T@mysql, which satisfies the default password-complexity policy
$ MYSQL_PWD="$ORIGINAL_ROOT_PASSWORD" mysql --connect-expired-password -e "ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'R00T@mysql';"
# Also allow root to log in from other hosts
$ MYSQL_PWD="R00T@mysql" mysql --connect-expired-password -e "CREATE USER 'root'@'%' IDENTIFIED WITH mysql_native_password BY 'R00T@mysql';"
$ MYSQL_PWD="R00T@mysql" mysql --connect-expired-password -e "GRANT ALL ON *.* TO 'root'@'%' WITH GRANT OPTION; FLUSH PRIVILEGES;"
MySQL InnoDB Cluster relies on Group Replication underneath. Before the group can be formed, dba.configureLocalInstance must be run on each virtual machine to set the required parameters on the local instance and persist them.
# mysqlsh makes configuring the local instance straightforward
$ cat << EOF > config_local_instance.js
dba.configureLocalInstance('root@localhost:3306', {'password': 'R00T@mysql', 'interactive': false, 'mycnfPath': '/etc/my.cnf'})
EOF
$ mysqlsh --no-password --js --file=config_local_instance.js
# The settings take effect only after a restart
$ systemctl restart mysqld
# Then check the state of the local instance configuration
$ cat << EOF > check_instance_configuration.js
dba.checkInstanceConfiguration('root@localhost:3306', {'password': 'R00T@mysql', 'interactive': false})
EOF
$ mysqlsh --no-password --js --file=check_instance_configuration.js
First, open ports 3306 and 33061 (the latter is used for Group Replication's group communication) in the firewall on every node.
firewall-cmd --zone=public --add-port=3306/tcp --permanent
firewall-cmd --zone=public --add-port=33061/tcp --permanent
firewall-cmd --reload
# Check that the ports were added
firewall-cmd --zone=public --list-port
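The 33061 value is not arbitrary: in this MySQL 5.7-era setup, MySQL Shell's AdminAPI derives the default Group Replication communication port from the server port as port * 10 + 1 (it can be overridden with the localAddress option). A minimal sketch of that rule:

```python
# Sketch of the AdminAPI default for the Group Replication communication port:
# server port * 10 + 1, which must still fit in the 16-bit port range.
def default_gr_port(server_port: int) -> int:
    gr_port = server_port * 10 + 1
    if gr_port > 65535:
        # In this case a port must be chosen explicitly via localAddress
        raise ValueError("derive the GR port manually via the localAddress option")
    return gr_port

print(default_gr_port(3306))  # -> 33061, the port opened above
```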
Then just run the following on the mysql0 virtual machine:
$ cat << EOF > init_cluster.js
shell.connect('root@localhost:3306', 'R00T@mysql')
dba.createCluster('mycluster', {'localAddress': 'mysql0'})
var cluster=dba.getCluster('mycluster')
cluster.addInstance('root@mysql1:3306', {'localAddress': 'mysql1', 'password': 'R00T@mysql'})
cluster.addInstance('root@mysql2:3306', {'localAddress': 'mysql2', 'password': 'R00T@mysql'})
EOF
$ mysqlsh --no-password --js --file=init_cluster.js
Viewing the cluster status in mysqlsh yields output similar to the following:
$ mysqlsh
> shell.connect('root@localhost:3306', 'R00T@mysql')
> var cluster=dba.getCluster('mycluster')
> cluster.status()
{
"clusterName": "mycluster",
"defaultReplicaSet": {
"name": "default",
"primary": "mysql0:3306",
"ssl": "REQUIRED",
"status": "OK",
"statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
"topology": {
"mysql0:3306": {
"address": "mysql0:3306",
"mode": "R/W",
"readReplicas": {},
"role": "HA",
"status": "ONLINE"
},
"mysql1:3306": {
"address": "mysql1:3306",
"mode": "R/O",
"readReplicas": {},
"role": "HA",
"status": "ONLINE"
},
"mysql2:3306": {
"address": "mysql2:3306",
"mode": "R/O",
"readReplicas": {},
"role": "HA",
"status": "ONLINE"
}
},
"topologyMode": "Single-Primary"
},
"groupInformationSourceMember": "mysql0:3306"
}
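The status document is plain JSON, so it is easy to post-process in scripts. The helper below is illustrative only (not part of the AdminAPI); it pulls out the primary, the online members, and the writable node from a status document shaped like the one above, here trimmed to the relevant fields.

```python
import json

# Illustrative helper: summarize the JSON printed by cluster.status().
def summarize(status: dict) -> dict:
    rs = status["defaultReplicaSet"]
    topo = rs["topology"]
    online = sorted(a for a, m in topo.items() if m["status"] == "ONLINE")
    return {
        "primary": rs.get("primary"),
        "online": online,
        "writable": [a for a in online if topo[a]["mode"] == "R/W"],
    }

# Trimmed-down version of the status output shown above
status = json.loads("""{
  "clusterName": "mycluster",
  "defaultReplicaSet": {
    "name": "default",
    "primary": "mysql0:3306",
    "status": "OK",
    "topology": {
      "mysql0:3306": {"mode": "R/W", "status": "ONLINE"},
      "mysql1:3306": {"mode": "R/O", "status": "ONLINE"},
      "mysql2:3306": {"mode": "R/O", "status": "ONLINE"}
    }
  }
}""")
print(summarize(status))
```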
As shown, one node is read-write (R/W) and the others are read-only (R/O).
To simulate a failure of the mysql0 node, stop the mysqld service on mysql0: systemctl stop mysqld.
Check the cluster status on the mysql1 node:
...
"statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
"topology": {
"mysql0:3306": {
"address": "mysql0:3306",
"mode": "n/a",
"readReplicas": {},
"role": "HA",
"shellConnectError": "MySQL Error 2003 (HY000): Can't connect to MySQL server on 'mysql0' (111)",
"status": "(MISSING)"
},
"mysql1:3306": {
"address": "mysql1:3306",
"mode": "R/W",
"readReplicas": {},
"role": "HA",
"status": "ONLINE"
},
...
At this point mysql1 has been promoted to the read-write primary, while mysql2 remains read-only.
Now start the mysqld service on mysql0 again: systemctl start mysqld.
The recovered node must then be rejoined with a rejoinInstance operation:
$ mysqlsh
> shell.connect('root@localhost:3306', 'R00T@mysql')
> var cluster=dba.getCluster('mycluster')
> cluster.status()
...
"statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
"topology": {
"mysql0:3306": {
"address": "mysql0:3306",
"mode": "R/W",
"readReplicas": {},
"role": "HA",
"status": "(MISSING)"
},
...
> cluster.rejoinInstance('root@mysql0:3306', {'password': 'R00T@mysql'})
> cluster.status()
...
"statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
"topology": {
"mysql0:3306": {
"address": "mysql0:3306",
"mode": "R/O",
"readReplicas": {},
"role": "HA",
"status": "ONLINE"
},
"mysql1:3306": {
"address": "mysql1:3306",
"mode": "R/W",
"readReplicas": {},
"role": "HA",
"status": "ONLINE"
},
...
As shown, mysql0 has rejoined the cluster as a read-only secondary.
If every node in the cluster goes offline, fetching the cluster object fails outright. Here is how to recover the cluster:
> var cluster=dba.getCluster('mycluster')
Dba.getCluster: This function is not available through a session to a standalone instance (metadata exists, instance belongs to that metadata, but GR is not active) (RuntimeError)
Running the rebootClusterFromCompleteOutage command restores the cluster (the following was executed on mysql2):
> dba.rebootClusterFromCompleteOutage('mycluster')
Restoring the cluster 'mycluster' from complete outage...
The instance 'mysql0:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y
The instance 'mysql1:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y
NOTE: Group Replication will communicate with other members using 'mysql2:33061'. Use the localAddress option to override.
...
When some nodes become UNREACHABLE, the cluster cannot reach a decision and the following situation arises: only one active node remains, and it can serve only queries. Writes are impossible, and a write attempt will hang.
{
"clusterName": "mycluster",
"defaultReplicaSet": {
"name": "default",
"primary": "mysql2:3306",
"ssl": "DISABLED",
"status": "NO_QUORUM",
"statusText": "Cluster has no quorum as visible from 'mysql2:3306' and cannot process write transactions. 2 members are not active",
"topology": {
"mysql0:3306": {
"address": "mysql0:3306",
"mode": "n/a",
"readReplicas": {},
"role": "HA",
"shellConnectError": "MySQL Error 2003 (HY000): Can't connect to MySQL server on 'mysql0' (111)",
"status": "(MISSING)"
},
"mysql1:3306": {
"address": "mysql1:3306",
"mode": "n/a",
"readReplicas": {},
"role": "HA",
"shellConnectError": "MySQL Error 2003 (HY000): Can't connect to MySQL server on 'mysql1' (110)",
"status": "UNREACHABLE"
},
"mysql2:3306": {
"address": "mysql2:3306",
"mode": "R/O",
"readReplicas": {},
"role": "HA",
"status": "ONLINE"
}
},
...
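The NO_QUORUM status follows directly from Group Replication's majority rule: the group can only make progress while more than half of its members are reachable. With three members the cluster tolerates one failure, but once two members are gone the single survivor is below the majority threshold and blocks writes. The arithmetic:

```python
# Quorum arithmetic behind the NO_QUORUM status above.
def has_quorum(total_members: int, active_members: int) -> bool:
    majority = total_members // 2 + 1  # more than half the group
    return active_members >= majority

print(has_quorum(3, 2))  # two of three reachable: writes still possible
print(has_quorum(3, 1))  # only mysql2 reachable: NO_QUORUM, writes hang
```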
To repair this state, run forceQuorumUsingPartitionOf, pointing it at the currently active node (if several are active, choose the primary node). The active node can then serve both reads and writes, after which the remaining nodes can be rejoined to the cluster.
> cluster.forceQuorumUsingPartitionOf('root@mysql2:3306')
Restoring cluster 'mycluster' from loss of quorum, by using the partition composed of [mysql2:3306]
Restoring the InnoDB cluster ...
Please provide the password for 'root@mysql2:3306': **********
Save password for 'root@mysql2:3306'? [Y]es/[N]o/Ne[v]er (default No): y
The InnoDB cluster was successfully restored using the partition from the instance 'root@mysql2:3306'.
WARNING: To avoid a split-brain scenario, ensure that all other members of the cluster are removed or joined back to the group that was restored.
> cluster.status()
...
"status": "OK_NO_TOLERANCE",
"statusText": "Cluster is NOT tolerant to any failures. 2 members are not active",
"topology": {
"mysql0:3306": {
"address": "mysql0:3306",
"mode": "n/a",
"readReplicas": {},
"role": "HA",
"shellConnectError": "MySQL Error 2003 (HY000): Can't connect to MySQL server on 'mysql0' (111)",
"status": "(MISSING)"
},
"mysql1:3306": {
"address": "mysql1:3306",
"mode": "n/a",
"readReplicas": {},
"role": "HA",
"shellConnectError": "MySQL Error 2003 (HY000): Can't connect to MySQL server on 'mysql1' (113)",
"status": "(MISSING)"
},
"mysql2:3306": {
"address": "mysql2:3306",
"mode": "R/W",
"readReplicas": {},
"role": "HA",
"status": "ONLINE"
}
...
The node status changes from UNREACHABLE to (MISSING), and the cluster status changes from NO_QUORUM to OK_NO_TOLERANCE. Write operations are now possible again.
Once the failed nodes are repaired, bring them back in:
> cluster.rejoinInstance('root@mysql0:3306', {'password': 'R00T@mysql'})
> cluster.rejoinInstance('root@mysql1:3306', {'password': 'R00T@mysql'})
# A node that was UNREACHABLE may throw an error when rejoining; this appears to be a bug. Try removing it first and then adding it back
> cluster.removeInstance('root@mysql1:3306', {'password': 'R00T@mysql', 'force': true})
> cluster.addInstance('root@mysql1:3306', {'localAddress': 'mysql1', 'password': 'R00T@mysql'})
First, open ports 6446 and 6447 (MySQL Router's default read-write and read-only ports) in the firewall on every node.
firewall-cmd --zone=public --add-port=6446/tcp --permanent
firewall-cmd --zone=public --add-port=6447/tcp --permanent
firewall-cmd --reload
# Check that the ports were added
firewall-cmd --zone=public --list-port
To hide the cluster's internals from the application layer, we can additionally deploy mysql-router on all three virtual machines.
# Bootstrap the mysql-router configuration from the current cluster metadata; note the password R00T@mysql is URL-encoded as R00T%40mysql
$ mysqlrouter --bootstrap root:R00T%40mysql@mysql2:3306 --user=mysqlrouter
# Enable and restart the mysqlrouter service
$ systemctl enable mysqlrouter
$ systemctl restart mysqlrouter
During mysqlrouter --bootstrap, the router fetches the cluster metadata from the given URI; if that URI points to a read-only node, the read-write node is found automatically.
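The bootstrap writes the router's configuration file (typically /etc/mysqlrouter/mysqlrouter.conf). The excerpt below is illustrative only: directive names vary somewhat across Router versions, and the generated metadata user name here is hypothetical. It shows the essential shape: a metadata cache tracking the cluster, a read-write route on 6446 targeting the PRIMARY, and a read-only route on 6447 targeting the SECONDARYs.

```ini
# Illustrative excerpt of a bootstrap-generated mysqlrouter.conf
[metadata_cache:mycluster]
router_id=1
user=mysql_router1_example   # hypothetical generated metadata account
metadata_cluster=mycluster

[routing:mycluster_default_rw]
bind_port=6446
destinations=metadata-cache://mycluster/default?role=PRIMARY
routing_strategy=first-available
protocol=classic

[routing:mycluster_default_ro]
bind_port=6447
destinations=metadata-cache://mycluster/default?role=SECONDARY
routing_strategy=round-robin
protocol=classic
```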
Use keepalived to make the routing layer itself highly available.
Install keepalived:
yum install keepalived
# Back up the configuration file
cp /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak
Edit the configuration file /etc/keepalived/keepalived.conf:
# MASTER node
global_defs {
    router_id MYSQL_ROUTER  # same router_id on every node
    vrrp_skip_check_adv_addr
    vrrp_strict
    vrrp_garp_interval 0
    vrrp_gna_interval 0
}
vrrp_script check_mysqlrouter {
    script "/usr/bin/killall -0 /usr/bin/mysqlrouter"  # check that mysqlrouter is running
    interval 2
    weight 2
    fall 2
}
vrrp_instance VI_1 {
    state MASTER  # primary node
    interface ens192  # NIC the VIP is bound to
    virtual_router_id 33  # same virtual router ID on every node
    priority 102  # higher number means higher priority
    advert_int 1  # advertisement interval, 1s
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.7.33  # VIP
    }
    track_script {
        check_mysqlrouter  # health-check script
    }
}
# BACKUP node (the settings that differ)
state BACKUP  # backup node
priority 101  # lower than the MASTER
Configure the firewall to allow VRRP traffic:
firewall-cmd --add-rich-rule='rule protocol value="vrrp" accept' --permanent
firewall-cmd --reload
Restart keepalived with systemctl restart keepalived, then run ip a to verify that the VIP is bound.
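With the VIP in place, applications no longer need to know about any individual cluster member: they connect to the VIP on the router's ports. A tiny illustrative helper (the function name is my own, not from any library) encoding which port a client should pick, using the VIP and ports from this setup:

```python
# VIP and MySQL Router ports from the setup above
VIP = "192.168.7.33"
RW_PORT, RO_PORT = 6446, 6447  # router read-write / read-only ports

def connect_params(read_only: bool = False) -> dict:
    """Host/port an application passes to its MySQL driver."""
    return {"host": VIP, "port": RO_PORT if read_only else RW_PORT}

print(connect_params())               # writes go through the R/W routing port
print(connect_params(read_only=True))  # reads can use the R/O routing port
```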