PostgreSQL的查詢技巧: 零除, GENERATED STORED, COUNT DISTINCT, JOIN和陣列LIKE

2022-07-09 12:00:50

零除的處理

NULLIF(col, 0)可以避免複雜的WHEN...CASE判斷, 例如

ROUND(COUNT(view_50.amount_in)::NUMERIC / NULLIF(COUNT(view_50.amount_out)::NUMERIC, 0),2) AS out_divide_in,

使用 COLA / NULLIF(COLB,0) 後, 如果 COLB 為0, 產生的輸出就是 NULL

GENERATED 欄位, GENERATED..STORED

對於讀多寫少的表, 這是一個高效的效能提升方法, 對已知表可以增加Generated欄位, 這些欄位唯讀, 自動計算賦值, 可以像普通欄位一樣參與查詢, 不需要在查詢中實時計算, 是一種典型的使用空間換時間的優化方式.

ALTER TABLE "bank_card"
    ADD COLUMN "card_num_in" varchar(255) 	GENERATED ALWAYS AS (CASE WHEN direction = 'IN' THEN card_num ELSE NULL END) STORED,
    ADD COLUMN "card_num_out" varchar(255) 	GENERATED ALWAYS AS (CASE WHEN direction = 'OUT' THEN card_num ELSE NULL END) STORED,
    ADD COLUMN "amount_in" numeric(53,2) 	GENERATED ALWAYS AS (CASE WHEN direction = 'IN' THEN amount ELSE NULL END) STORED,
    ADD COLUMN "amount_out" numeric(53,2) 	GENERATED ALWAYS AS (CASE WHEN direction = 'OUT' THEN amount ELSE NULL END) STORED;

注意

  • PostgreSQL 14 只有 STORED 型別的欄位, 還不能使用 VIRTUAL 型別
  • 這樣的欄位是唯讀的, INSERT 的時候不能往這些欄位寫入
  • GENERATED 欄位不帶索引, 如果基於帶索引的欄位建立 GENERATED 欄位, 在 GENERATED 欄位上檢索, 效能可能反而更差, 可以通過給 GENERATED 欄位建索引解決.

COUNT DISTINCT 優化

COUNT DISTINCT 的效能問題

COUNT DISTINCT 的效能是PostgreSQL中長期存在的問題, 在版本14中尚未解決. 在資料量大的時候, 這個查詢會很慢, 千萬級別的表可能需要10秒左右才能返回結果

SELECT
	COUNT(DISTINCT field_1)
FROM
	table_1

原因連結

count(distinct ...) always sorts, rather than using a hash, to do its work. I don't think that there is any fundamental reason that it could not be changed to allow it to use hashing, it just hasn't been done yet. It is complicated by the fact that you can have multiple count() expressions in the same query which demand sorting/grouping on different columns.

PostgreSQL 的 count(distinct ...) 的實現方式是排序而不是使用 hash, 所以速度很慢. 應該要換成 hash 方式, 只是因為各種原因還沒有實現.

規避途徑一: 通過 COUNT 子查詢

使用下面的方式, 查詢時間能縮短一半以上

SELECT
	COUNT(col)
FROM (
	SELECT DISTINCT field_1 AS col FROM table_1
) TEMP

規避途徑二: 通過 COUNT_DISTINCT 擴充套件

針對這個效能問題的擴充套件 count_distinct, 安裝之後可以使用COUNT_DISTINCT()函數代替COUNT(DISTINCT ...), 但是缺點是費記憶體, 而且對引數有長度限制.

規避途徑三: 通過 GROUP BY

使用GROUP BY代替DISTINCT, 下面的例子, 對 field_1 和 field_2 建聯合索引, 速度會非常快

SELECT COUNT(field_2), field_1, field_2
FROM table_1
GROUP BY field_1, field_2

對於複雜場景, 可以對 GROUP BY 之後的結果建立檢視, 而後以子查詢的形式取值

優化JOIN效能

JOIN查詢, 需要限定JOIN的範圍, 例如對於一個翻頁查詢, 需要對翻頁的結果通過JOIN掛接大量屬性的, 翻頁結果通過LEFT JOIN連線到多個屬性表, 就應該將翻頁結果限制數量後, 再進行關聯, 這樣效能會好很多, 例如

Preparing : SELECT
"view_46"."id",
"view_46"."name",
"label_view6"."labels" AS "1___label",
"label_view7"."labels" AS "21022___label",
"label_view8"."labels" AS "21023___label",
"label_view9"."labels" AS "50197___label" 
FROM
    -- 這行是關鍵, 因為主體在ID上有索引, 偏移查詢是很快的, 先限制結果集大小, 然後再進行JOIN
	( SELECT * FROM "view_46" ORDER BY ID ASC LIMIT 10 OFFSET 14270 ) AS "view_46"
	LEFT JOIN "label_view" AS "label_view6" ON (
		"label_view6"."item_type" = '1' 
		AND "label_view6"."item_name" = '1' 
		AND "label_view6"."item_attr" = '2' 
		AND "label_view6"."item_id" = "view_46"."id" :: TEXT 
	)
	LEFT JOIN "label_view" AS "label_view7" ON (
		"label_view7"."item_type" = '1' 
		AND "label_view7"."item_name" = '21022' 
		AND "label_view7"."item_attr" = '2' 
		AND "label_view7"."item_id" = "view_46"."id" :: TEXT 
	)
	LEFT JOIN "label_view" AS "label_view8" ON (
		"label_view8"."item_type" = '1' 
		AND "label_view8"."item_name" = '21023' 
		AND "label_view8"."item_attr" = '2' 
		AND "label_view8"."item_id" = "view_46"."id" :: TEXT 
	)
	LEFT JOIN "label_view" AS "label_view9" ON (
		"label_view9"."item_type" = '1' 
		AND "label_view9"."item_name" = '50197' 
		AND "label_view9"."item_attr" = '2' 
		AND "label_view9"."item_id" = "view_46"."id" :: TEXT 
	) 
ORDER BY
ID ASC

LIKE ARRAY的用法

PostgreSQl 的LIKE用法

LIKE

  • LIKE
  • NOT LIKE
  • LIKE ANY(ARRAY[]) 如果需要相似任意一個引數, 需要使用這個語法
  • NOT LIKE ALL(ARRAY[]) 如果想達到不相似任意一個引數, 需要用這個語法

ILIKE

ILIKE是不區分大小寫的LIKE

  • ILIKE
  • NOT ILIKE
  • ILIKE ANY(ARRAY[])
  • NOT ILIKE ALL(ARRAY[])