【MySQL】DDL因Waiting for table metadata lock卡住

在資料庫空閒時間，對錶做碎片整理：

alter table my_abc engine=innodb;

發現對談被阻塞，顯示狀態是：

Waiting for table metadata lock

手動斷開alter操作後，通過show processlist檢視：

> show processlist;
+--------+--------+---------------------+-----------+---------+-------+----------+------------------+----------+
| Id     | User   | Host                | db        | Command | Time  | State    | Info             | Progress |
+--------+--------+---------------------+-----------+---------+-------+----------+------------------+----------+
| 489669 | pig    | 119.119.3.221:60226 | pig       | Sleep   |     4 |          | NULL             |    0.000 |
| 489673 | xxx    | 119.119.3.172:41394 | abced_xxx | Sleep   |   120 |          | NULL             |    0.000 |
| 498227 | xxx    | 119.119.3.172:48916 | abced_xxx | Sleep   | 28303 |          | NULL             |    0.000 |
| 502773 | pig    | 119.119.3.172:47752 | pig       | Sleep   | 26139 |          | NULL             |    0.000 |
| 519545 | root   | localhost           | NULL      | Sleep   | 15994 |          | NULL             |    0.000 |
+--------+--------+---------------------+-----------+---------+-------+----------+------------------+----------+

可以看到對談都處於sleep狀態。但是一旦執行alter操作，就會卡在那裡，等待後設資料鎖。

出現Waiting for table metadata lock這種情況多半是和事務有關，要麼是一個長事務在執行，要麼是事務沒有提交造成的

先檢視下是否開啟了自動提交：

> show global variables like '%autocommit%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| autocommit             | ON    |
| wsrep_retry_autocommit | 1     |
+------------------------+-------+
2 rows in set (0.001 sec)

接下來，在檢視一下是否有事務正在執行：

> select * from information_schema.innodb_trx \G;
*************************** 1. row ***************************
                    trx_id: 0
                 trx_state: RUNNING
               trx_started: 2022-08-29 10:10:37
     trx_requested_lock_id: NULL
          trx_wait_started: NULL
                trx_weight: 0
       trx_mysql_thread_id: 498227
                 trx_query: NULL
       trx_operation_state: 
         trx_tables_in_use: 0
         trx_tables_locked: 0
          trx_lock_structs: 0
     trx_lock_memory_bytes: 1128
           trx_rows_locked: 0
         trx_rows_modified: 0
   trx_concurrency_tickets: 0
       trx_isolation_level: REPEATABLE READ
         trx_unique_checks: 1
    trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
          trx_is_read_only: 0
trx_autocommit_non_locking: 0
1 row in set (0.001 sec)

ERROR: No query specified

從這裡看到，的確有一個事務在執行。從上班到下班還沒結束。

這裡就要找開發同學問問應用到底是在幹什麼了，解決了這個問題，順便還解決了最近業務反應有些功能慢的問題。

此外，information_schema.innodb_trx中不會記錄執行失敗的事務，但是在這個執行失敗的事務回滾前，它依然持有metadata lock，所以DDL操作依然會被阻塞。這個時候可以通過查詢performance_schema.events_statements_current表來找到相關的語句和對談資訊，將其殺死。為了減少metadata lock帶來的危害，設定一個合理的lock_wait_timeout比較重要，這個值預設是365天，可以根據自身業務來設定，避免長時間的metadata lock等待。

總結一下：

·長時間執行的事務很危險，需要多關注。

·要關注autocommit是否開啟

·使用事務需要小心，記得 commit，捕獲異常 rollback

·做DDL操作前先檢查一遍innodb_trx

·設定合理的lock_wait_timeout