最近有個同事對字串加索引,加完後,發現多了個奇奇怪怪的數位
執行的SQL如下:
alter table string_index_test add index `idx_name` (`name`) USING BTREE;
這個奇怪數位就是191,它很是疑惑,也沒指定索引的長度
通過檢視MySQL官方檔案
InnoDB has a maximum index length of 767 bytes for tables that use COMPACT or REDUNDANT row format, so for utf8mb3 or utf8mb4 columns, you can index a maximum of 255 or 191 characters, respectively. If you currently have utf8mb3 columns with indexes longer than 191 characters, you must index a smaller number of characters.
In an InnoDB table that uses COMPACT or REDUNDANT row format, these column and index definitions are legal:
col1 VARCHAR(500) CHARACTER SET utf8, INDEX (col1(255))
To use utf8mb4 instead, the index must be smaller:
col1 VARCHAR(500) CHARACTER SET utf8mb4, INDEX (col1(191))
大概意思就是InnoDB最大索引長度為 767 位元組數,用的編碼是utf8mb4,則可以儲存191個字元(767/4 約等於 191),編碼欄位長度超出最大索引長度後MySQL 預設在普通索引追加了191
對字串加索引?
一般情況下,是不建議在字串加索引,佔空間
如果一定要加,建議可以指定長度,前提是字串前面部分割區分度好的話,此時這類索引就叫字首索引
區分度不好的話,很容易發生碰撞,進而引發一系列問題
我們再通過執行計劃來分析一波
上面分別演示了字首索引和普通索引在只有where條件、order by和group by不同執行情況,可以看到Extra的說明,字首索引只有where條件,無法使用覆蓋索引,order by會使用filesort,group by會使用temporary和filesort
總的來說,字首索引無法使用覆蓋索引,進而導致order by和group by要使用檔案排序,甚至臨時表字首索引有這麼些問題,不指定長度?怎麼處理?
準備了單表100W的資料進行測試
使用效能壓力測試工具mysqlslap
效能測試指令碼
mysqlslap -uroot -p --concurrency=100,200 --iterations=1 --number-of-queries=1 --create-schema=test --query=C:\xxx\query.sql
–concurrency=100,200 測試並行的執行緒數/使用者端數,第一次100,第二次200
–iterations=1 指定測試重複次數1次
–number-of-queries=1 指定每個執行緒執行的 SQL 語句數量上限(不精確)
–create-schema=test 指定查詢的資料庫test
1、不加索引
查詢的SQL:SELECT SQL_NO_CACHE * FROM string_index_test WHERE name=‘forlan’;
Benchmark
Average number of seconds to run all queries: 8.328 seconds
Minimum number of seconds to run all queries: 8.328 seconds
Maximum number of seconds to run all queries: 8.328 seconds
Number of clients running queries: 100
Average number of queries per client: 0
Benchmark
Average number of seconds to run all queries: 18.078 seconds
Minimum number of seconds to run all queries: 18.078 seconds
Maximum number of seconds to run all queries: 18.078 seconds
Number of clients running queries: 200
Average number of queries per client: 0
2、加字串索引
alter table string_index_test add index idx_name
(name
) USING BTREE;
查詢的SQL:SELECT SQL_NO_CACHE * FROM string_index_test WHERE name=‘forlan’;
Benchmark
Average number of seconds to run all queries: 0.250 seconds
Minimum number of seconds to run all queries: 0.250 seconds
Maximum number of seconds to run all queries: 0.250 seconds
Number of clients running queries: 100
Average number of queries per client: 0
Benchmark
Average number of seconds to run all queries: 1.438 seconds
Minimum number of seconds to run all queries: 1.438 seconds
Maximum number of seconds to run all queries: 1.438 seconds
Number of clients running queries: 200
Average number of queries per client: 0
3、使用CRC32建立索引
CRC全稱為Cyclic Redundancy Check,又叫回圈冗餘校驗。
CRC32是CRC演演算法的一種,返回值的範圍0~2^32-1,使用bigint儲存
加一個name_crc32列,建立這個列的所有,索引空間小很多,利用整型加速查詢
加索引:alter table string_index_test add index idx_nam_crc32
(name_crc32
) USING BTREE;
查詢的SQL:SELECT SQL_NO_CACHE * FROM string_index_test WHERE name_crc32=CRC32(‘forlan’) and name=‘forlan’;因為CRC32存在發生碰撞,所以加上name條件,才能篩選出正確的資料
Benchmark
Average number of seconds to run all queries: 0.266 seconds
Minimum number of seconds to run all queries: 0.266 seconds
Maximum number of seconds to run all queries: 0.266 seconds
Number of clients running queries: 100
Average number of queries per client: 0
Benchmark
Average number of seconds to run all queries: 0.390 seconds
Minimum number of seconds to run all queries: 0.390 seconds
Maximum number of seconds to run all queries: 0.390 seconds
Number of clients running queries: 200
Average number of queries per client: 0