Analyzing MGR's Flow Control Mechanism from the Source Code

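Group Replication uses a shared-nothing architecture: every node keeps a full copy of the data.

Although it supports multi-primary writes, the throughput of the whole system is actually determined by the node with the weakest processing capacity.

If processing capacity varies across nodes, transactions pile up on the slower nodes.

Suppose a fast node fails while such a backlog exists. Can the slow node, which still has a backlog of transactions, take over application traffic at that point?

If it accepts traffic right away, without waiting for the backlog to be applied, clients will read stale data, and write conflicts also become more likely.

Why are write conflicts more likely? A write based on stale data carries a snapshot_version smaller than the snapshot_version of the corresponding record in the conflict detection database, so conflict detection fails.

If, on the other hand, the node waits until the backlog has been applied before accepting traffic, the availability of the database service suffers.

To avoid this dilemma, Group Replication introduced a flow control mechanism.

In the implementation, the flow control module periodically checks the transaction backlog on each node and triggers flow control once the backlog exceeds a threshold.

Flow control uses each node's certification and apply statistics from the previous period to decide the write quota of the current node (note: the current node, not the other nodes) for the next period.

Transactions beyond the write quota are blocked and can only proceed in the next period.

Next, let's walk through the source code to see how flow control is implemented.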

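This article covers the following:

1. When flow control is triggered.
2. How the quota is calculated.
3. A quantitative analysis of the quota calculation based on a concrete case.
4. When the quota takes effect.
5. Flow control parameters.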

When Flow Control Is Triggered

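By default, a node sends its status information once per second (the status is sent from flow_control_step, and the interval is determined by group_replication_flow_control_period).

When status information from a group member is received, Flow_control_module::handle_stats_data is called to process it.

Let's look at what Flow_control_module::handle_stats_data does.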

int Flow_control_module::handle_stats_data(const uchar *data, size_t len,
                                           const std::string &member_id) {
  DBUG_TRACE;
  int error = 0;
  Pipeline_stats_member_message message(data, len);

  m_flow_control_module_info_lock->wrlock();
  // m_info is a map, defined as std::map<std::string, Pipeline_member_stats>.
  // The key is the node's address and the value is the node's status information.
  Flow_control_module_info::iterator it = m_info.find(member_id);
  // If m_info has no entry for member_id yet, insert one.
  if (it == m_info.end()) {
    Pipeline_member_stats stats;

    std::pair<Flow_control_module_info::iterator, bool> ret = m_info.insert(
        std::pair<std::string, Pipeline_member_stats>(member_id, stats));
    error = !ret.second;
    it = ret.first;
  }
  // Update the node's statistics
  it->second.update_member_stats(message, m_stamp);

  // Check whether flow control is needed
  if (it->second.is_flow_control_needed()) {
    ++m_holds_in_period;
#ifndef NDEBUG
    it->second.debug(it->first.c_str(), m_quota_size.load(),
                     m_quota_used.load());
#endif
  }

  m_flow_control_module_info_lock->unlock();
  return error;
}

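First, the function checks whether the node's status information already exists in m_info, and inserts an entry if it does not.

Next, update_member_stats updates the node's statistics.

The updated statistics consist of two parts:

Current values: for example m_transactions_waiting_certification (transactions currently waiting for certification) and m_transactions_waiting_apply (transactions currently waiting to be applied).
Deltas for the previous period: for example m_delta_transactions_certified (transactions certified during the previous period).

m_delta_transactions_certified = m_transactions_certified (the current sample) - previous_transactions_certified (the previous sample).

Finally, is_flow_control_needed determines whether flow control is needed. If it is, m_holds_in_period is incremented by 1.

On a debug build with log_error_verbosity set to 3, the following message is written to the error log whenever flow control is needed.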

[Note] [MY-011726] [Repl] Plugin group_replication reported: 'Flow control - update member stats: 127.0.0.1:33071 stats certifier_queue 0, applier_queue 20 certified 387797 (308), applied 387786 (289), local 0 (0), quota 400 (274) mode=1'

So when is flow control triggered?

Let's look at is_flow_control_needed.

bool Pipeline_member_stats::is_flow_control_needed() {
  return (m_flow_control_mode == FCM_QUOTA) &&
         (m_transactions_waiting_certification >
              get_flow_control_certifier_threshold_var() ||
          m_transactions_waiting_apply >
              get_flow_control_applier_threshold_var());
}

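So flow control is triggered when the following conditions hold:

1. group_replication_flow_control_mode is set to QUOTA.
2. The number of transactions currently waiting for certification is greater than group_replication_flow_control_certifier_threshold. This number is exposed as COUNT_TRANSACTIONS_IN_QUEUE in performance_schema.replication_group_member_stats.
3. The number of transactions currently waiting to be applied is greater than group_replication_flow_control_applier_threshold. This number is exposed as COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE in performance_schema.replication_group_member_stats.

Condition 1 is mandatory; beyond that, either condition 2 or condition 3 is sufficient.

When flow control is needed, m_holds_in_period is incremented by 1.

m_holds_in_period is consumed in Flow_control_module::flow_control_step.

Flow_control_module::flow_control_step, in turn, is called from Certifier_broadcast_thread::dispatcher(), which runs once per second.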

void Certifier_broadcast_thread::dispatcher() {
  ...
  while (!aborted) {
    ...
    applier_module->run_flow_control_step();
    ...
    struct timespec abstime;
    // Set the timeout to 1s.
    set_timespec(&abstime, 1);
    mysql_cond_timedwait(&broadcast_dispatcher_cond, &broadcast_dispatcher_lock,
                         &abstime);
    mysql_mutex_unlock(&broadcast_dispatcher_lock);

    broadcast_counter++;
  }
}

void run_flow_control_step() override {
  flow_control_module.flow_control_step(&pipeline_stats_member_collector);
}

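How the Quota Is Calculated

Next, let's focus on flow_control_step.

This function is the core of the whole flow control module.

Its main job is to compute m_quota_size and m_quota_used.

m_quota_size determines how many transactions may be committed in the next period. This is what we call the quota.

m_quota_used counts the transactions already committed in the next period. It is reset to 0 in this function.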

void Flow_control_module::flow_control_step(
    Pipeline_stats_member_collector *member) {
  // seconds_to_skip is effectively group_replication_flow_control_period (it is assigned below).
  // Although flow_control_step is called once per second, the effective interval is group_replication_flow_control_period.
  if (--seconds_to_skip > 0) return;

  // holds is m_holds_in_period
  int32 holds = m_holds_in_period.exchange(0);
  // get_flow_control_mode_var() is group_replication_flow_control_mode
  Flow_control_mode fcm =
      static_cast<Flow_control_mode>(get_flow_control_mode_var());
  // get_flow_control_period_var() is group_replication_flow_control_period
  seconds_to_skip = get_flow_control_period_var();
  // Counter
  m_stamp++;
  // Send the current node's status information
  member->send_stats_member_message(fcm);

  switch (fcm) {
    case FCM_QUOTA: {
      // get_flow_control_hold_percent_var() is group_replication_flow_control_hold_percent, 10 by default,
      // so HOLD_FACTOR defaults to 0.9
      double HOLD_FACTOR =
          1.0 -
          static_cast<double>(get_flow_control_hold_percent_var()) / 100.0;
      // get_flow_control_release_percent_var() is group_replication_flow_control_release_percent, 50 by default,
      // so RELEASE_FACTOR defaults to 1.5
      double RELEASE_FACTOR =
          1.0 +
          static_cast<double>(get_flow_control_release_percent_var()) / 100.0;
      // get_flow_control_member_quota_percent_var() is group_replication_flow_control_member_quota_percent, 0 by default,
      // so TARGET_FACTOR defaults to 0
      double TARGET_FACTOR =
          static_cast<double>(get_flow_control_member_quota_percent_var()) /
          100.0;
      // get_flow_control_max_quota_var() is group_replication_flow_control_max_quota, 0 by default
      int64 max_quota = static_cast<int64>(get_flow_control_max_quota_var());

      // Copy the previous period's m_quota_size and m_quota_used into quota_size and quota_used, resetting the members to 0
      int64 quota_size = m_quota_size.exchange(0);
      int64 quota_used = m_quota_used.exchange(0);
      int64 extra_quota = (quota_size > 0 && quota_used > quota_size)
                              ? quota_used - quota_size
                              : 0;

      if (extra_quota > 0) {
        mysql_mutex_lock(&m_flow_control_lock);
        // Broadcast on the condition variable to release transactions waiting in do_wait()
        mysql_cond_broadcast(&m_flow_control_cond);
        mysql_mutex_unlock(&m_flow_control_lock);
      }
      // m_holds_in_period greater than 0 means flow control is needed
      if (holds > 0) {
        uint num_writing_members = 0, num_non_recovering_members = 0;
        // MAXTPS is the maximum INT value, i.e. 2147483647
        int64 min_certifier_capacity = MAXTPS, min_applier_capacity = MAXTPS,
              safe_capacity = MAXTPS;

        m_flow_control_module_info_lock->rdlock();
        Flow_control_module_info::iterator it = m_info.begin();
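        // Iterate over the status information of all nodes
        while (it != m_info.end()) {
          // This debug block is not in the original source. It is added here to show
          // each node's status at the moment flow control is triggered.
#ifndef NDEBUG
          it->second.debug(it->first.c_str(), quota_size, quota_used);
#endif
          if (it->second.get_stamp() < (m_stamp - 10)) {
            // Drop the node's status information if it has not been updated in the last 10 periods
            m_info.erase(it++);
          } else {
            if (it->second.get_flow_control_mode() == FCM_QUOTA) {
              // If group_replication_flow_control_certifier_threshold is greater than 0,
              // and the number of transactions certified in the previous period is greater than 0,
              // and the number of transactions waiting for certification exceeds group_replication_flow_control_certifier_threshold,
              // and the number of transactions certified in the previous period is less than min_certifier_capacity,
              // then min_certifier_capacity is set to the number of transactions certified in the previous period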
              if (get_flow_control_certifier_threshold_var() > 0 &&
                  it->second.get_delta_transactions_certified() > 0 &&
                  it->second.get_transactions_waiting_certification() -
                          get_flow_control_certifier_threshold_var() >
                      0 &&
                  min_certifier_capacity >
                      it->second.get_delta_transactions_certified()) {
                min_certifier_capacity =
                    it->second.get_delta_transactions_certified();
              }

              if (it->second.get_delta_transactions_certified() > 0)
                // safe_capacity is the smaller of safe_capacity and it->second.get_delta_transactions_certified()
                safe_capacity =
                    std::min(safe_capacity,
                             it->second.get_delta_transactions_certified());


              // Same logic as for the certifier, applied to the applier
              if (get_flow_control_applier_threshold_var() > 0 &&
                  it->second.get_delta_transactions_applied() > 0 &&
                  it->second.get_transactions_waiting_apply() -
                          get_flow_control_applier_threshold_var() >
                      0) {
                if (min_applier_capacity >
                    it->second.get_delta_transactions_applied())
                  min_applier_capacity =
                      it->second.get_delta_transactions_applied();

                if (it->second.get_delta_transactions_applied() > 0)
                  // If transactions were applied in the previous period, the node is not a recovering node
                  num_non_recovering_members++;
              }

              if (it->second.get_delta_transactions_applied() > 0)
                // safe_capacity is the smaller of safe_capacity and it->second.get_delta_transactions_applied()
                safe_capacity = std::min(
                    safe_capacity, it->second.get_delta_transactions_applied());

              if (it->second.get_delta_transactions_local() > 0)
                // Local transactions in the previous period mean the node is receiving writes
                num_writing_members++;
            }
            ++it;
          }
        }
        m_flow_control_module_info_lock->unlock();

        num_writing_members = num_writing_members > 0 ? num_writing_members : 1;
        // min_capacity is the smaller of min_certifier_capacity and min_applier_capacity
        int64 min_capacity = (min_certifier_capacity > 0 &&
                              min_certifier_capacity < min_applier_capacity)
                                 ? min_certifier_capacity
                                 : min_applier_capacity;

        // lim_throttle is the minimum quota
        int64 lim_throttle = static_cast<int64>(
            0.05 * std::min(get_flow_control_certifier_threshold_var(),
                            get_flow_control_applier_threshold_var()));
        // get_flow_control_min_recovery_quota_var() is group_replication_flow_control_min_recovery_quota
        if (get_flow_control_min_recovery_quota_var() > 0 &&
            num_non_recovering_members == 0)
          lim_throttle = get_flow_control_min_recovery_quota_var();
        // get_flow_control_min_quota_var() is group_replication_flow_control_min_quota
        if (get_flow_control_min_quota_var() > 0)
          lim_throttle = get_flow_control_min_quota_var();

        // min_capacity must not be too small: it cannot go below lim_throttle
        min_capacity =
            std::max(std::min(min_capacity, safe_capacity), lim_throttle);

        // HOLD_FACTOR defaults to 0.9
        quota_size = static_cast<int64>(min_capacity * HOLD_FACTOR);

        // max_quota comes from group_replication_flow_control_max_quota; quota_size cannot exceed it
        if (max_quota > 0) quota_size = std::min(quota_size, max_quota);

        // num_writing_members is the number of nodes that actually performed writes
        if (num_writing_members > 1) {
          // If group_replication_flow_control_member_quota_percent is not set, split quota_size evenly across the writing nodes
          if (get_flow_control_member_quota_percent_var() == 0)
            quota_size /= num_writing_members;
          else
          // If it is set, this node's quota_size is quota_size * group_replication_flow_control_member_quota_percent / 100
            quota_size = static_cast<int64>(static_cast<double>(quota_size) *
                                            TARGET_FACTOR);
        }
        // Subtract the quota overused in the previous period from quota_size
        quota_size =
            (quota_size - extra_quota > 1) ? quota_size - extra_quota : 1;
#ifndef NDEBUG
        LogPluginErr(INFORMATION_LEVEL, ER_GRP_RPL_FLOW_CONTROL_STATS,
                     quota_size, get_flow_control_period_var(),
                     num_writing_members, num_non_recovering_members,
                     min_capacity, lim_throttle);
#endif
      } else {
        // The m_holds_in_period = 0 case; RELEASE_FACTOR defaults to 1.5
        if (quota_size > 0 && get_flow_control_release_percent_var() > 0 &&
            (quota_size * RELEASE_FACTOR) < MAXTPS) {
          // After flow control ends, quota_size = the previous period's quota_size * 1.5
          int64 quota_size_next =
              static_cast<int64>(quota_size * RELEASE_FACTOR);
          quota_size =
              quota_size_next > quota_size ? quota_size_next : quota_size + 1;
        } else
          quota_size = 0;
      }

      if (max_quota > 0)
        // quota_size is the smaller of quota_size and max_quota
        quota_size =
            std::min(quota_size > 0 ? quota_size : max_quota, max_quota);
      // Finally, store quota_size in m_quota_size and reset m_quota_used to 0
      m_quota_size.store(quota_size);
      m_quota_used.store(0);
      break;
    }

    // If group_replication_flow_control_mode is DISABLED,
    // m_quota_size and m_quota_used are set to 0, which disables flow control.
    case FCM_DISABLED:
      m_quota_size.store(0);
      m_quota_used.store(0);
      break;

    default:
      assert(0);
  }

  if (local_member_info->get_recovery_status() ==
      Group_member_info::MEMBER_IN_RECOVERY) {
    applier_module->get_pipeline_stats_member_collector()
        ->compute_transactions_deltas_during_recovery();
  }
}

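The logic looks somewhat complicated.

Next, let's work through a concrete example to see how flow_control_step behaves.

Quantitative Analysis Based on a Case

The test cluster consists of three nodes: 127.0.0.1:33061, 127.0.0.1:33071 and 127.0.0.1:33081.

It runs in multi-primary mode.

sysbench runs an insert test (oltp_insert) against 127.0.0.1:33061.

To make flow control easier to trigger, group_replication_flow_control_applier_threshold on 127.0.0.1:33061 is set to 10.

Below is the log output on 127.0.0.1:33061 when flow control is triggered.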

[Note] [MY-011726] [Repl] Plugin group_replication reported: 'Flow control - update member stats: 127.0.0.1:33061 stats certifier_queue 0, applier_queue 0 certified 7841 (177), applied 0 (0), local 7851 (177), quota 146 (156) mode=1'
[Note] [MY-011726] [Repl] Plugin group_replication reported: 'Flow control - update member stats: 127.0.0.1:33071 stats certifier_queue 0, applier_queue 0 certified 7997 (186), applied 8000 (218), local 0 (0), quota 146 (156) mode=1'
[Note] [MY-011726] [Repl] Plugin group_replication reported: 'Flow control - update member stats: 127.0.0.1:33081 stats certifier_queue 0, applier_queue 15 certified 7911 (177), applied 7897 (195), local 0 (0), quota 146 (156) mode=1'
[Note] [MY-011727] [Repl] Plugin group_replication reported: 'Flow control: throttling to 149 commits per 1 sec, with 1 writing and 1 non-recovering members, min capacity 177, lim throttle 0'

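Taking the status data of 127.0.0.1:33081 as an example, here is what each field in the output means:

certifier_queue 0: length of the certification queue.
applier_queue 15: length of the applier queue.
certified 7911 (177): 7911 is the total number of certified transactions; 177 is the number certified during the previous period (m_delta_transactions_certified).
applied 7897 (195): 7897 is the total number of applied transactions; 195 is the number applied during the previous period (m_delta_transactions_applied).
local 0 (0): number of local transactions. The 0 in parentheses is the number of local transactions in the previous period (m_delta_transactions_local).
quota 146 (156): 146 is the previous period's quota_size and 156 is the previous period's quota_used.
mode=1: a mode of 1 means flow control is enabled.

Because the applier_queue length on 127.0.0.1:33081 (15) exceeds the group_replication_flow_control_applier_threshold configured on 127.0.0.1:33061 (10), flow control is triggered.

Once it is triggered, flow_control_step computes m_quota_size for the next period.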

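1. Iterate over the status information of every node. The cluster throughput (min_capacity) is the minimum of m_delta_transactions_certified and m_delta_transactions_applied across all nodes, skipping values of 0. In this example, min_capacity = min(177, 186, 218, 177, 195) = 177.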

2. min_capacity must not be too small: it cannot go below lim_throttle. lim_throttle is determined as follows:

The initial value is 0.05 * min(group_replication_flow_control_applier_threshold, group_replication_flow_control_certifier_threshold). In this example, lim_throttle = 0.05 * min(10, 25000) = 0.5, which the cast to int64 truncates to 0. This matches "lim throttle 0" in the log.

If group_replication_flow_control_min_recovery_quota is set and num_non_recovering_members is 0, group_replication_flow_control_min_recovery_quota is assigned to lim_throttle. When is num_non_recovering_members 0? When a new node is joining and flow control is triggered because too many transactions have piled up in the certification queue.

If group_replication_flow_control_min_quota is set, group_replication_flow_control_min_quota is assigned to lim_throttle.

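3. quota_size = min_capacity * 0.9 = 177 * 0.9 = 159 (159.3 truncated to an integer). The 0.9 here is 1 - group_replication_flow_control_hold_percent / 100. Part of the capacity is held back mainly so that the backlog can be worked off.

4. quota_size must not be too large: it cannot exceed group_replication_flow_control_max_quota.

5. Note that the quota_size computed so far is the throughput of the whole cluster, not of a single node. The simplest way to derive the current node's share is to divide quota_size by the number of nodes that actually performed writes (num_writing_members). How do we tell whether a node actually wrote? It committed local transactions in the previous period, i.e. m_delta_transactions_local > 0. In this example there is only one writing node, so the current node's quota_size equals the cluster's quota_size, 159. Besides this blunt even split, if some nodes should take more writes than others, weights can be set through group_replication_flow_control_member_quota_percent. In that case the current node's throughput is quota_size * group_replication_flow_control_member_quota_percent / 100.

6. Finally, the quota overused in the previous period (extra_quota) is subtracted from the current node's quota_size. The previous period's extra_quota is quota_used - quota_size = 156 - 146 = 10, so the current node's quota_size is 159 - 10 = 149, which matches the log output exactly. Why can the quota be overused at all? This is explained later.

7. When m_holds_in_period drops back to 0, flow control is over. MGR does not lift the quota limit all at once, because a sudden surge of writes could easily cause a spike. Instead it recovers gradually: next period's quota_size = previous period's quota_size * (1 + group_replication_flow_control_release_percent / 100).

8. If group_replication_flow_control_mode is DISABLED, m_quota_size and m_quota_used are set to 0. Setting m_quota_size to 0 effectively disables flow control. The reason is explained later.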
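The arithmetic in steps 2, 3, 4 and 6 can be reproduced in a few lines. The following is only a minimal sketch for the single-writer case, not the plugin's code: the QuotaInputs struct and the next_quota function are hypothetical names introduced here, and the inputs are taken from the log lines quoted in this article.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical input struct for this sketch. Field names follow the article,
// not the plugin's actual classes.
struct QuotaInputs {
  int64_t min_capacity;     // smallest non-zero certified/applied delta
  int64_t lim_throttle;     // floor for min_capacity
  int hold_percent;         // group_replication_flow_control_hold_percent
  int64_t max_quota;        // group_replication_flow_control_max_quota, 0 = no cap
  int64_t prev_quota_size;  // quota_size of the previous period
  int64_t prev_quota_used;  // quota_used of the previous period
};

// Next-period quota for a single writing member while flow control is active.
inline int64_t next_quota(const QuotaInputs &in) {
  assert(in.hold_percent >= 0 && in.hold_percent <= 100);
  const double hold_factor = 1.0 - in.hold_percent / 100.0;
  // Step 2: capacity cannot drop below lim_throttle.
  const int64_t capacity = std::max(in.min_capacity, in.lim_throttle);
  // Step 3: hold back part of the capacity (truncating cast, as in the source).
  int64_t quota = static_cast<int64_t>(capacity * hold_factor);
  // Step 4: optional upper bound.
  if (in.max_quota > 0) quota = std::min(quota, in.max_quota);
  // Step 6: subtract what was overused in the previous period, never below 1.
  const int64_t extra =
      (in.prev_quota_size > 0 && in.prev_quota_used > in.prev_quota_size)
          ? in.prev_quota_used - in.prev_quota_size
          : 0;
  return (quota - extra > 1) ? quota - extra : 1;
}
```

With the values from the first log sample, next_quota(QuotaInputs{177, 0, 10, 0, 146, 156}) returns 149, the same number as in "throttling to 149 commits per 1 sec".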

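When the Quota Takes Effect

Now that m_quota_size for the next period has been computed, when is it used? In the before-commit hook, when a transaction is about to commit and before GCS broadcasts the transaction message.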

int group_replication_trans_before_commit(Trans_param *param) {
  ...
  // Check whether the transaction has to wait
  applier_module->get_flow_control_module()->do_wait();

  // Broadcast the transaction message
  send_error = gcs_module->send_transaction_message(*transaction_msg);
  ...
}

Next, let's look at do_wait.

int32 Flow_control_module::do_wait() {
  DBUG_TRACE;
  // Load m_quota_size first
  int64 quota_size = m_quota_size.load();
  // Increment m_quota_used by 1.
  int64 quota_used = ++m_quota_used;

  if (quota_used > quota_size && quota_size != 0) {
    struct timespec delay;
    set_timespec(&delay, 1);

    mysql_mutex_lock(&m_flow_control_lock);
    mysql_cond_timedwait(&m_flow_control_cond, &m_flow_control_lock, &delay);
    mysql_mutex_unlock(&m_flow_control_lock);
  }

  return 0;
}

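As you can see, when quota_size is 0, do_wait returns immediately without waiting. This is why setting m_quota_size to 0 disables flow control.

If quota_used is greater than quota_size and quota_size is not 0, the quota for the current period has been used up, and mysql_cond_timedwait is called to wait.

mysql_cond_timedwait returns in one of two cases:

1. The m_flow_control_cond signal is received (it is sent from flow_control_step).
2. The wait times out. The timeout here is 1s.

Note that m_quota_used is incremented first and only then compared, and a transaction that has finished waiting still goes on to commit. This is why the quota can be overused.

If the client writes with multiple concurrent threads (and each thread waits for its previous operation to finish before issuing the next), several transactions wait here at the same time, and the number of transactions over the quota is at most the number of concurrent client threads.

So in the example above, quota_used (156) exceeds quota_size (146) by 10, which is exactly the number of concurrent sysbench threads.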
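The increment-then-compare behaviour can be illustrated with a small deterministic model. This is a sketch under simplifying assumptions, not the plugin's code: quota_used_in_period is a hypothetical helper, clients are simulated round-robin instead of with real threads, each client issues its next transaction only after the previous one returns, and the period does not end while clients are still writing.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the value quota_used reaches once every client is parked in the
// wait, for a given quota_size and number of closed-loop clients.
inline int64_t quota_used_in_period(int64_t quota_size, int clients) {
  assert(quota_size > 0 && clients > 0);
  std::vector<bool> blocked(clients, false);
  int64_t quota_used = 0;
  int waiting = 0;
  // Keep cycling over the clients until all of them are waiting.
  while (waiting < clients) {
    for (int c = 0; c < clients; ++c) {
      if (blocked[c]) continue;
      const int64_t used = ++quota_used;  // increment first ...
      if (used > quota_size) {            // ... then compare
        blocked[c] = true;                // this client now waits
        ++waiting;
      }
    }
  }
  return quota_used;
}
```

In this model quota_used_in_period(146, 10) returns 156: each of the 10 clients gets exactly one transaction past the limit before it blocks, which is consistent with the "quota 146 (156)" figures in the log.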

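Next, let's look at how long these 156 transactions waited in do_wait.

The first 146 transactions waited 0.000035s on average, while the last 10 waited 0.558044s on average.

Clearly the last 10 transactions were throttled, and were eventually released by the m_flow_control_cond signal sent from flow_control_step (which runs once per second by default).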

Flow Control Parameters

group_replication_flow_control_mode

Whether flow control is enabled. The default is QUOTA, which throttles based on quotas. Setting it to DISABLED turns flow control off.

group_replication_flow_control_period

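The flow control period. Valid values are 1 to 60, in seconds. The default is 1. Note that the period should be the same on all nodes. Otherwise, the capacity that a node with a shorter period reports for one of its periods is used as the quota for a whole (longer) period on another node.

Consider the following example, where group_replication_flow_control_period is 10 on 127.0.0.1:33061 and 1 on the other two nodes.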

2022-08-27T19:01:50.699939+08:00 63 [Note] [MY-011726] [Repl] Plugin group_replication reported: 'Flow control - update member stats: 127.0.0.1:33061 stats certifier_queue 0, applier_queue 0 certified 217069 (1860), applied 1 (0), local 217070 (1861), quota 28566 (1857) mode=1'
2022-08-27T19:01:50.699955+08:00 63 [Note] [MY-011726] [Repl] Plugin group_replication reported: 'Flow control - update member stats: 127.0.0.1:33071 stats certifier_queue 0, applier_queue 2 certified 218744 (157), applied 218746 (165), local 0 (0), quota 28566 (1857) mode=1'
2022-08-27T19:01:50.699967+08:00 63 [Note] [MY-011726] [Repl] Plugin group_replication reported: 'Flow control - update member stats: 127.0.0.1:33081 stats certifier_queue 16383, applier_queue 0 certified 0 (0), applied 0 (0), local 0 (0), quota 28566 (1857) mode=1'
2022-08-27T19:01:50.699979+08:00 63 [Note] [MY-011727] [Repl] Plugin group_replication reported: 'Flow control: throttling to 141 commits per 10 sec, with 1 writing and 0 non-recovering members, min capacity 157, lim throttle 100'

As a result, 1s worth of capacity on 127.0.0.1:33071 (157 * 0.9, truncated to 141) is used as the 10s quota of 127.0.0.1:33061.

So we observe the following:

Time       TPS
19:01:50   49
19:01:51   93
19:01:52    1
19:01:53    1
19:01:54    1
19:01:55    1
19:01:56    1
19:01:57    1
19:01:58    1
19:01:59    1
19:02:00    1

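127.0.0.1:33061 used up its entire quota within the first two seconds, so every subsequent transaction waits 1s (the mysql_cond_timedwait timeout) before it is processed. Because the test used a single client thread, the TPS is 1.

Why are these transactions not released by the m_flow_control_cond signal in flow_control_step? Because group_replication_flow_control_period is 10 on 127.0.0.1:33061, so the body of flow_control_step runs only once every 10s.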
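The effect of the `if (--seconds_to_skip > 0) return;` guard can be seen in a short sketch. It assumes that seconds_to_skip starts at 1 (the excerpt above does not show its initial value), and active_ticks is a hypothetical helper that only models the countdown, not the quota logic.

```cpp
#include <cassert>
#include <vector>

// Returns the 1-based seconds at which the body of the step actually runs,
// when the step is invoked once per second for total_ticks seconds.
inline std::vector<int> active_ticks(int period, int total_ticks) {
  assert(period >= 1);
  std::vector<int> ticks;
  int seconds_to_skip = 1;  // assumed initial value
  for (int t = 1; t <= total_ticks; ++t) {
    if (--seconds_to_skip > 0) continue;  // same early exit as flow_control_step
    seconds_to_skip = period;             // re-arm the countdown
    ticks.push_back(t);                   // quota is recomputed and waiters may be signalled here
  }
  return ticks;
}
```

With period = 10 and 20 invocations, the body runs only at seconds 1 and 11. In between, a transaction blocked in do_wait can only leave through the 1s timeout, which fits the TPS of 1 observed above.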

group_replication_flow_control_applier_threshold

Flow control is triggered when the number of transactions waiting to be applied exceeds group_replication_flow_control_applier_threshold. The default is 25000.

group_replication_flow_control_certifier_threshold

Flow control is triggered when the number of transactions waiting for certification exceeds group_replication_flow_control_certifier_threshold. The default is 25000.

group_replication_flow_control_min_quota

group_replication_flow_control_min_recovery_quota

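Both parameters set the minimum quota of the current node for the next period. group_replication_flow_control_min_recovery_quota applies only to the distributed recovery phase when a new node joins, whereas group_replication_flow_control_min_quota applies in all cases. If both are set, group_replication_flow_control_min_quota takes precedence. Both default to 0, meaning no explicit minimum is set. In that case the floor falls back to 0.05 * min(group_replication_flow_control_applier_threshold, group_replication_flow_control_certifier_threshold), as shown in flow_control_step.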
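The precedence between the two parameters can be written as a small standalone function. This is a sketch that mirrors the lim_throttle logic quoted from flow_control_step. The function name and its plain-argument signature are introduced here for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Floor for the per-period capacity, following the order of the checks in
// flow_control_step: 5% of the smaller threshold, then the recovery minimum,
// then the general minimum (which wins if both are set).
inline int64_t lim_throttle(int64_t certifier_threshold,
                            int64_t applier_threshold,
                            int64_t min_recovery_quota, int64_t min_quota,
                            unsigned num_non_recovering_members) {
  assert(certifier_threshold >= 0 && applier_threshold >= 0);
  int64_t lim = static_cast<int64_t>(
      0.05 * std::min(certifier_threshold, applier_threshold));
  if (min_recovery_quota > 0 && num_non_recovering_members == 0)
    lim = min_recovery_quota;
  if (min_quota > 0) lim = min_quota;
  return lim;
}
```

For example, lim_throttle(25000, 10, 0, 0, 1) returns 0 (0.5 truncated), while lim_throttle(25000, 25000, 100, 50, 0) returns 50 because the general minimum overrides the recovery minimum.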

group_replication_flow_control_max_quota

The maximum quota of the current node for the next period. The default is 0, meaning no limit.

group_replication_flow_control_member_quota_percent

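The percentage of the quota allocated to the current member. Valid values are 0 to 100. The default is 0, in which case node quota = cluster quota / number of writing nodes in the previous period.

Note that a writing node here means a node that actually performed writes, not simply a PRIMARY node. After all, not every PRIMARY node necessarily receives writes.

Also, the percentages configured on the individual nodes are not required to add up to 100.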
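How the cluster quota is turned into a per-member quota can be sketched as follows. member_quota is a hypothetical helper that mirrors the branch in flow_control_step. As in the quoted source, the percentage only comes into play when more than one member wrote in the previous period.

```cpp
#include <cassert>
#include <cstdint>

// Per-member share of the cluster quota.
// member_quota_percent models group_replication_flow_control_member_quota_percent.
inline int64_t member_quota(int64_t cluster_quota, unsigned num_writing_members,
                            int member_quota_percent) {
  assert(member_quota_percent >= 0 && member_quota_percent <= 100);
  // A single writer keeps the whole cluster quota.
  if (num_writing_members <= 1) return cluster_quota;
  // Not configured: split evenly across the writing members.
  if (member_quota_percent == 0) return cluster_quota / num_writing_members;
  // Configured: take the given percentage of the cluster quota.
  return static_cast<int64_t>(static_cast<double>(cluster_quota) *
                              (member_quota_percent / 100.0));
}
```

With a cluster quota of 159, two writers and no percentage configured, each member gets 79. With the percentage set to 70 on one member, that member gets 111, regardless of what the other member has configured.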

group_replication_flow_control_hold_percent

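The percentage of the quota held in reserve. Valid values are 0 to 100. The default is 10. The reserved capacity lets lagging nodes work off their backlog.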

group_replication_flow_control_release_percent

When flow control ends, throughput is increased gradually to avoid spikes.

Next period's quota_size = previous period's quota_size * (1 + group_replication_flow_control_release_percent / 100). Valid values are 0 to 1000. The default is 50.
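The gradual release can be traced with a sketch of the holds == 0 branch of flow_control_step. release_quota is a hypothetical name, kMaxTps stands in for MAXTPS (2147483647), and the final group_replication_flow_control_max_quota cap is left out for brevity.

```cpp
#include <cassert>
#include <cstdint>

// Quota for the next period once no member requests throttling any more.
// A return value of 0 means the quota is lifted entirely.
inline int64_t release_quota(int64_t quota_size, int release_percent) {
  assert(release_percent >= 0);
  const int64_t kMaxTps = 2147483647;  // stand-in for MAXTPS
  const double release_factor = 1.0 + release_percent / 100.0;
  if (quota_size > 0 && release_percent > 0 &&
      quota_size * release_factor < kMaxTps) {
    const int64_t next = static_cast<int64_t>(quota_size * release_factor);
    // Always grow by at least 1 so that small quotas do not get stuck.
    return next > quota_size ? next : quota_size + 1;
  }
  return 0;
}
```

Starting from the quota of 149 in the earlier example and the default of 50, the quota grows to 223 and then 334 over the following periods. With the percentage set to 0, the function returns 0 and the limit is dropped immediately.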

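Summary

1. From an availability standpoint, turning off flow control in production is not recommended. A primary failure is unlikely, but as Murphy's law says, anything that can go wrong will go wrong. Don't count on luck in production.

2. Flow control throttles traffic on the current node, not on other nodes.

3. Flow control parameters should be consistent across nodes, especially group_replication_flow_control_period.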
