[Hands-On Coding] [Stereo Matching Series] Classic AD-Census: (5) Scanline Optimization

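Happy National Day holiday, everyone! After seven carefree days (that is, seven days of looking after the kid), I finally have a moment to sit down and update the blog.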

To download the complete source code, go to: https://github.com/ethan-li-coding/AD-Census
Feel free to join the discussion in the GitHub project!

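Following the previous post on cross-based cost aggregation, this post covers the scanline optimization step of AD-Census. The idea is essentially the same as cost aggregation in SGM; the only change is in how the P1/P2 parameters are set. SGM's strategy for P1 and P2 is admittedly quite simple. Its advantage is robustness: it gives a reasonably good disparity result on most data. Its obvious drawback is that it is hard to find a parameter combination that brings the data of one specific application close to ideal. Because the P1/P2 setting matters a great deal for overall disparity quality, especially at edges, the direction AD-Census takes here has practical value.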

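Let's look directly at what AD-Census scanline optimization achieves:

[Figures: disparity maps after cost computation, cost aggregation, and scanline optimization]

Clearly, the disparity map after scanline optimization is more complete than the one after cost aggregation, with fewer erroneous values. Of course, this does not show that AD-Census's parameter modification is what helps; it only shows that the scanline optimization step itself is effective.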

Now let's walk through the code!


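Algorithm

As before, for the theory behind the algorithm, see the post:

Classic AD-Census: (3) Scanline Optimization

I won't go into the optimization theory again here. It really is the same as the cost aggregation strategy of SGM (Semi-Global Matching), so my earlier posts cover it. AD-Census uses scanline optimization along 4 directions: left, right, up, and down.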

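The change AD-Census makes is in how $P_1$ and $P_2$ are set. In SGM, $P_1$ and $P_2'$ are preset constants, and the $P_2$ actually used is adjusted on the fly from the intensity difference between two neighboring pixels in the left view: $P_2=P_2'/|I_p-I_q|$.

In AD-Census, $P_1$ and $P_2$ depend not only on the color difference between neighboring pixels in the left view, $D_1=D_c(p,p-r)$, but also on the color difference between the neighboring pixels of the corresponding point in the right view, $D_2=D_c(pd,pd-r)$.

(Note 1: AD-Census assumes color input by default, hence a color difference; for grayscale input it becomes an intensity difference. The color difference is defined as $D_c(p_l,p)=\max_{i=R,G,B}|I_i(p_l)-I_i(p)|$, i.e., the largest absolute difference among the three color channels.)
(Note 2: $pd$ is simply the corresponding point in the right view that pixel $p$ maps to under disparity $d$, i.e., $q = p - d$.)
(Note 3: $p - r$ is the previous pixel along the aggregation direction. When aggregating from left to right, $p - r$ is $p - 1$; when aggregating from right to left, $p - r$ is $p + 1$.)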

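The rules are as follows:

$P_1=\Pi_1,\ P_2=\Pi_2, \quad \text{if } D_1<\tau_{SO},\ D_2<\tau_{SO}$
$P_1=\Pi_1/4,\ P_2=\Pi_2/4, \quad \text{if } D_1<\tau_{SO},\ D_2>\tau_{SO}$
$P_1=\Pi_1/4,\ P_2=\Pi_2/4, \quad \text{if } D_1>\tau_{SO},\ D_2<\tau_{SO}$
$P_1=\Pi_1/10,\ P_2=\Pi_2/10, \quad \text{if } D_1>\tau_{SO},\ D_2>\tau_{SO}$

$\Pi_1,\Pi_2$ are preset constants, and $\tau_{SO}$ is a preset color-difference threshold.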
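
The four cases can be condensed into a small helper. The following is a minimal sketch, not the author's code: the names `Penalties` and `SelectPenalties` are hypothetical, and a difference exactly equal to the threshold, which the rule above leaves unspecified, is treated here as "not smaller than the threshold".

```cpp
#include <cassert>

// Hypothetical container for the two penalties of one path step.
struct Penalties {
    float p1;
    float p2;
};

// Chooses P1/P2 from the left-view color difference d1 and the right-view
// color difference d2. pi1/pi2 are the preset penalties, tau_so the threshold.
// Assumption: a difference equal to tau_so counts as "large".
inline Penalties SelectPenalties(int d1, int d2, float pi1, float pi2, int tau_so) {
    assert(d1 >= 0 && d2 >= 0);
    const bool small1 = d1 < tau_so;
    const bool small2 = d2 < tau_so;
    if (small1 && small2) {
        return { pi1, pi2 };            // both views smooth: full penalties
    }
    if (small1 != small2) {
        return { pi1 / 4, pi2 / 4 };    // an edge in exactly one view
    }
    return { pi1 / 10, pi2 / 10 };      // an edge in both views
}
```

With illustrative values `pi1 = 1.0`, `pi2 = 3.0` and `tau_so = 15`, the call `SelectPenalties(3, 20, 1.0f, 3.0f, 15)` falls into the second case and yields penalties of 0.25 and 0.75.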

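Code Implementation

Class Design

Member Functions

As before, we implement this functionality in a class, the scanline optimizer ScanlineOptimizer, placed in the files scanline_optimizer.h/scanline_optimizer.cpp.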

/**
 * \brief Scanline optimizer
 */
class ScanlineOptimizer {
public:
	ScanlineOptimizer();
	~ScanlineOptimizer();
};

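For the public member functions, the first group of interfaces is indispensable: SetData to set the data and SetParam to set the parameters, which together supply the algorithm's inputs. The second group is the optimization interface Optimize.

The individual optimization sub-steps go into the private member functions: horizontal aggregation CostAggregateLeftRight and vertical aggregation CostAggregateUpDown.

A small utility the algorithm needs, the color-distance function ColorDist, is also placed among the private functions.

The declarations of all member functions are as follows: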

public:
	ScanlineOptimizer();
	~ScanlineOptimizer();
	
	/**
	 * \brief Set the data
	 * \param img_left		left image data, three channels
	 * \param img_right 	right image data, three channels
	 * \param cost_init 	initial cost array
	 * \param cost_aggr 	aggregated cost array
	 */
	void SetData(const uint8* img_left, const uint8* img_right, float32* cost_init, float32* cost_aggr);

	/**
	 * \brief Set the parameters
	 * \param width			image width
	 * \param height		image height
	 * \param min_disparity	minimum disparity
	 * \param max_disparity maximum disparity
	 * \param p1			p1
	 * \param p2			p2
	 * \param tso			tso
	 */
	void SetParam(const sint32& width,const sint32& height, const sint32& min_disparity, const sint32& max_disparity, const float32& p1, const float32& p2, const sint32& tso);

	/** \brief Run the optimization */
	void Optimize();

private:
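	/**
	* \brief Left-right path aggregation → ←
	* \param cost_so_src		input, cost data before SO
	* \param cost_so_dst		output, cost data after SO
	* \param is_forward			input, whether the direction is forward (forward is left to right, backward is right to left)
	*/
	void CostAggregateLeftRight(const float32* cost_so_src, float32* cost_so_dst, bool is_forward = true);

	/**
	* \brief Up-down path aggregation ↓ ↑
	* \param cost_so_src		input, cost data before SO
	* \param cost_so_dst		output, cost data after SO
	* \param is_forward			input, whether the direction is forward (forward is top to bottom, backward is bottom to top)
	*/
	void CostAggregateUpDown(const float32* cost_so_src, float32* cost_so_dst, bool is_forward = true);

	/** \brief Compute the color distance */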
	inline sint32 ColorDist(const ADColor& c1, const ADColor& c2) {
		return std::max(abs(c1.r - c2.r), std::max(abs(c1.g - c2.g), abs(c1.b - c2.b)));
	}

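Every function has a clear comment so it can be understood quickly. The color-distance function is an inline function, defined at the point of declaration.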
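
`ColorDist` relies on an `ADColor` type that is not shown in this post. Below is a minimal sketch of what such a type could look like. The field layout and constructor are assumptions, chosen only to be compatible with the three-argument construction used in the aggregation code, and may differ from the definition in the repository. `ColorDistSketch` is a hypothetical free-function version of the member above.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>

// Assumed layout: one byte per channel. The channel order is an assumption;
// the maximum below does not depend on it.
struct ADColor {
    std::uint8_t r, g, b;
    ADColor() : r(0), g(0), b(0) {}
    ADColor(std::uint8_t r_, std::uint8_t g_, std::uint8_t b_) : r(r_), g(g_), b(b_) {}
};

// Largest absolute per-channel difference (the D_c defined earlier).
// The uint8 operands are promoted to int before subtraction, so the
// difference cannot wrap around.
inline int ColorDistSketch(const ADColor& c1, const ADColor& c2) {
    return std::max(std::abs(c1.r - c2.r),
                    std::max(std::abs(c1.g - c2.g), std::abs(c1.b - c2.b)));
}
```

For example, the colors (10, 200, 30) and (15, 180, 90) differ by 5, 20 and 60 in the three channels, so their distance is 60.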

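Member Variables

All member variables are private and used only inside the algorithm. They are the image size, the image data, the cost data (initial/aggregated), and the algorithm parameters.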

private:
	/** \brief Image size */
	sint32	width_;
	sint32	height_;

	/** \brief Image data */
	const uint8* img_left_;
	const uint8* img_right_;
	
	/** \brief Initial cost array */
	float32* cost_init_;
	/** \brief Aggregated cost array */
	float32* cost_aggr_;

	/** \brief Minimum disparity */
	sint32 min_disparity_;
	/** \brief Maximum disparity */
	sint32 max_disparity_;
	/** \brief Initial p1 value */
	float32 so_p1_;
	/** \brief Initial p2 value */
	float32 so_p2_;
	/** \brief tso threshold */
	sint32 so_tso_;

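Class Implementation

SetData and SetParam are simple and short, so I won't walk through them; reading the code is enough. Here I only cover the two sub-steps of scanline optimization, CostAggregateLeftRight and CostAggregateUpDown.

In practice, I simply carried over the SGM cost aggregation code and changed how $P_1$ and $P_2$ are computed, as follows: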

void ScanlineOptimizer::CostAggregateLeftRight(const float32* cost_so_src, float32* cost_so_dst, bool is_forward)
{
	const auto width = width_;
	const auto height = height_;
	const auto min_disparity = min_disparity_;
	const auto max_disparity = max_disparity_;
	const auto p1 = so_p1_;
	const auto p2 = so_p2_;
	const auto tso = so_tso_;
	
	assert(width > 0 && height > 0 && max_disparity > min_disparity);

	// Disparity range
	const sint32 disp_range = max_disparity - min_disparity;

	// Forward  (left -> right): is_forward = true ; direction = 1
	// Backward (right -> left): is_forward = false; direction = -1
	const sint32 direction = is_forward ? 1 : -1;

	// Aggregation
	for (sint32 y = 0; y < height; y++) {
		// The path head is the first column of each row (the last column when direction = -1)
		auto cost_init_row = (is_forward) ? (cost_so_src + y * width * disp_range) : (cost_so_src + y * width * disp_range + (width - 1) * disp_range);
		auto cost_aggr_row = (is_forward) ? (cost_so_dst + y * width * disp_range) : (cost_so_dst + y * width * disp_range + (width - 1) * disp_range);
		auto img_row = (is_forward) ? (img_left_ + y * width * 3) : (img_left_ + y * width * 3 + 3 * (width - 1));
		const auto img_row_r = img_right_ + y * width * 3;
		sint32 x = (is_forward) ? 0 : width - 1;

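		// Current and previous color values along the path
		ADColor color(img_row[0], img_row[1], img_row[2]);
		ADColor color_last = color;

		// Cost array of the previous pixel on the path; the two extra elements (one at each end) avoid out-of-bounds access
		std::vector<float32> cost_last_path(disp_range + 2, Large_Float);

		// Initialization: the aggregated cost of the first pixel equals its initial cost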
		memcpy(cost_aggr_row, cost_init_row, disp_range * sizeof(float32));
		memcpy(&cost_last_path[1], cost_aggr_row, disp_range * sizeof(float32));
		cost_init_row += direction * disp_range;
		cost_aggr_row += direction * disp_range;
		img_row += direction * 3;
		x += direction;

		// Minimum cost of the previous pixel on the path
		float32 mincost_last_path = Large_Float;
		for (auto cost : cost_last_path) {
			mincost_last_path = std::min(mincost_last_path, cost);
		}

		// Aggregate in order, starting from the second pixel along the direction
		for (sint32 j = 0; j < width - 1; j++) {
			color = ADColor(img_row[0], img_row[1], img_row[2]);
			const uint8 d1 = ColorDist(color, color_last);
			uint8 d2 = d1;
			float32 min_cost = Large_Float;
			for (sint32 d = 0; d < disp_range; d++) {
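				// d is an index into the disparity range, so the actual disparity is d + min_disparity
				const sint32 xr = x - (d + min_disparity);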
				if (xr > 0 && xr < width - 1) {
					const ADColor color_r = ADColor(img_row_r[3 * xr], img_row_r[3 * xr + 1], img_row_r[3 * xr + 2]);
					const ADColor color_last_r = ADColor(img_row_r[3 * (xr - direction)],
						img_row_r[3 * (xr - direction) + 1],
						img_row_r[3 * (xr - direction) + 2]);
					d2 = ColorDist(color_r, color_last_r);
				}

				// Compute P1 and P2
				float32 P1(0.0f), P2(0.0f);
				if (d1 < tso && d2 < tso) {
					P1 = p1; P2 = p2;
				}
				else if (d1 < tso && d2 >= tso) {
					P1 = p1 / 4; P2 = p2 / 4;
				}
				else if (d1 >= tso && d2 < tso) {
					P1 = p1 / 4; P2 = p2 / 4;
				}
				else if (d1 >= tso && d2 >= tso) {
					P1 = p1 / 10; P2 = p2 / 10;
				}

				// Standard SGM: Lr(p,d) = C(p,d) + min( Lr(p-r,d), Lr(p-r,d-1) + P1, Lr(p-r,d+1) + P1, min(Lr(p-r))+P2 ) - min(Lr(p-r))
				// Here the sum is halved (cost_s /= 2) instead of subtracting min(Lr(p-r))
				const float32  cost = cost_init_row[d];
				const float32 l1 = cost_last_path[d + 1];
				const float32 l2 = cost_last_path[d] + P1;
				const float32 l3 = cost_last_path[d + 2] + P1;
				const float32 l4 = mincost_last_path + P2;

				float32 cost_s = cost + static_cast<float32>(std::min(std::min(l1, l2), std::min(l3, l4)));
				cost_s /= 2;

				cost_aggr_row[d] = cost_s;
				min_cost = std::min(min_cost, cost_s);
			}

			// Update the previous pixel's minimum cost and cost array
			mincost_last_path = min_cost;
			memcpy(&cost_last_path[1], cost_aggr_row, disp_range * sizeof(float32));

			// Move to the next pixel
			cost_init_row += direction * disp_range;
			cost_aggr_row += direction * disp_range;
			img_row += direction * 3;
			x += direction;

			// Update the previous color value
			color_last = color;
		}
	}
}

If you are not familiar with the aggregation code, see my earlier post:

Coding the Classic SGM: (3) Cost Aggregation
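
As a quick refresher, a single path step of the listing above can be isolated into a small function. This is a sketch under assumptions, not the author's code: `PathStep` and `kLargeCost` are hypothetical names, and the penalties are passed in as fixed values for the whole step, whereas the listing recomputes them for every disparity. Like the listing, it pads the previous pixel's cost array with one large sentinel at each end so that `d - 1` and `d + 1` never need a bounds check, and it halves the sum rather than subtracting the previous minimum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the Large_Float sentinel used in the listing.
constexpr float kLargeCost = 1e30f;

// One step along a path.
//   cost        : matching costs C(p, d) of the current pixel, size n
//   last_padded : path costs of the previous pixel, size n + 2, with a
//                 sentinel in the first and last slot
// Returns the path costs of the current pixel (size n).
inline std::vector<float> PathStep(const std::vector<float>& cost,
                                   const std::vector<float>& last_padded,
                                   float p1, float p2) {
    const std::size_t n = cost.size();
    assert(last_padded.size() == n + 2);
    const float min_last = *std::min_element(last_padded.begin(), last_padded.end());

    std::vector<float> out(n);
    for (std::size_t d = 0; d < n; ++d) {
        const float l1 = last_padded[d + 1];       // same disparity
        const float l2 = last_padded[d] + p1;      // disparity - 1
        const float l3 = last_padded[d + 2] + p1;  // disparity + 1
        const float l4 = min_last + p2;            // any other disparity
        out[d] = (cost[d] + std::min(std::min(l1, l2), std::min(l3, l4))) / 2;
    }
    return out;
}
```

With current costs {1, 2, 3}, previous path costs {4, 2, 6} (padded with a sentinel on each side), `p1 = 1` and `p2 = 3`, the step produces {2, 2, 3}.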

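In this post we focus on how P1 and P2 are computed.

First, when we reach each pixel, we compute its color distance (color difference) $d_1$ to the previous pixel in the left view: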

const uint8 d1 = ColorDist(color, color_last);

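Then, while iterating over each disparity of the pixel, we compute the color distance $d_2$ between the corresponding pixel in the right view and its previous pixel.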

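// d is an index into the disparity range, so the actual disparity is d + min_disparity
const sint32 xr = x - (d + min_disparity);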
if (xr > 0 && xr < width - 1) {
	const ADColor color_r = ADColor(img_row_r[3 * xr], img_row_r[3 * xr + 1], img_row_r[3 * xr + 2]);
	const ADColor color_last_r = ADColor(img_row_r[3 * (xr - direction)],
		img_row_r[3 * (xr - direction) + 1],
		img_row_r[3 * (xr - direction) + 2]);
	d2 = ColorDist(color_r, color_last_r);
}

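Next, depending on how $d_1$ and $d_2$ compare with the threshold, one of the four cases applies, and the values of P1 and P2 are computed accordingly.

// Compute P1 and P2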
float32 P1(0.0f), P2(0.0f);
if (d1 < tso && d2 < tso) {
	P1 = p1; P2 = p2;
}
else if (d1 < tso && d2 >= tso) {
	P1 = p1 / 4; P2 = p2 / 4;
}
else if (d1 >= tso && d2 < tso) {
	P1 = p1 / 4; P2 = p2 / 4;
}
else if (d1 >= tso && d2 >= tso) {
	P1 = p1 / 10; P2 = p2 / 10;
}

Here the lowercase p1 and p2, as well as tso, are all input parameters of the algorithm.

const auto p1 = so_p1_;
const auto p2 = so_p2_;
const auto tso = so_tso_;

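I won't paste the vertical-direction code. Apart from the direction, it is no different from the horizontal one, so it can be written by following the same pattern.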
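
To make the difference concrete: in the `[y][x][d]` layout implied by the pointer arithmetic of the horizontal listing, a horizontal step moves the cost pointer by `disp_range` elements, while a vertical step moves it by `width * disp_range`. The sketch below only illustrates this index arithmetic; `ColumnPathOffsets` is a hypothetical helper and not part of the author's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element offsets into a cost volume stored as [y][x][d], visited when
// walking column x from top to bottom (is_forward) or bottom to top.
inline std::vector<std::size_t> ColumnPathOffsets(int width, int height, int disp_range,
                                                  int x, bool is_forward) {
    assert(width > 0 && height > 0 && disp_range > 0 && x >= 0 && x < width);
    std::vector<std::size_t> offsets;
    offsets.reserve(static_cast<std::size_t>(height));
    const int direction = is_forward ? 1 : -1;
    int y = is_forward ? 0 : height - 1;   // path head: first or last row
    for (int i = 0; i < height; ++i, y += direction) {
        offsets.push_back((static_cast<std::size_t>(y) * width + x) * disp_range);
    }
    return offsets;
}
```

For a 4 x 3 image with `disp_range = 2`, column 1 yields the offsets 2, 10, 18 going down and 18, 10, 2 going up. The image pointer steps analogously by `width * 3` bytes per row. Following the definition $D_2=D_c(pd,pd-r)$, the right-view neighbor in the vertical case would be the pixel in the previous row along the path rather than a horizontal neighbor.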

Inside the public optimization interface Optimize, we only need to call the optimization functions for the four directions in turn.

void ScanlineOptimizer::Optimize()
{
	if (width_ <= 0 || height_ <= 0 ||
		img_left_ == nullptr || img_right_ == nullptr ||
		cost_init_ == nullptr || cost_aggr_ == nullptr) {
		return;
	}
	
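	// 4-direction scanline optimization
	// The first input of this module is the data produced by the preceding cost aggregation step, i.e. cost_aggr_
	// The four directions are optimized in sequence, with cost_init_ and cost_aggr_ alternating as storage for intermediate data, so no extra memory is needed for intermediate results
	// The final output of this module is also cost_aggr_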
	
	// left to right
	CostAggregateLeftRight(cost_aggr_, cost_init_, true);
	// right to left
	CostAggregateLeftRight(cost_init_, cost_aggr_, false);
	// up to down
	CostAggregateUpDown(cost_aggr_, cost_init_, true);
	// down to up
	CostAggregateUpDown(cost_init_, cost_aggr_, false);
}

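A small trick is used here: by alternating between cost_aggr and cost_init, there is no need to allocate a separate cost array for each of the four directions; two cost arrays are enough to complete the whole optimization.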
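
The buffer alternation can be illustrated in isolation. In the sketch below, which is not the author's code, `AddOnePass` is a hypothetical stand-in for one directional pass that reads `src` and writes `dst`; with an even number of passes, the final result lands back in the second buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one directional pass: reads src, writes dst.
inline void AddOnePass(const std::vector<float>& src, std::vector<float>& dst) {
    assert(src.size() == dst.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = src[i] + 1.0f;
    }
}

// Runs four passes while alternating between two buffers.
// On entry, aggr holds the input; on exit, aggr holds the final output and
// init holds the result of the third pass (its original content is lost).
inline void FourPassPingPong(std::vector<float>& init, std::vector<float>& aggr) {
    AddOnePass(aggr, init);  // pass 1: aggr -> init
    AddOnePass(init, aggr);  // pass 2: init -> aggr
    AddOnePass(aggr, init);  // pass 3: aggr -> init
    AddOnePass(init, aggr);  // pass 4: init -> aggr
}
```

One consequence worth keeping in mind: the buffer passed as `cost_init_` serves as scratch space, so its original contents are overwritten by the time `Optimize` returns.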

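Experiments

We ran three experiments: one with scanline optimization in the left-right horizontal directions only, one with the up-down vertical directions only, and one with all four directions. Let's look at the results.

[Figures: disparity maps after cost aggregation, horizontal optimization, vertical optimization, and 4-direction optimization]

Judging by appearance, horizontal-only or vertical-only optimization already improves the disparity map noticeably, but optimizing along a single axis leaves directional streaking, whereas the 4-direction result removes this effect and reaches a better state.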

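Finally, here once more are the experimental figures from the beginning of the post:

[Figures: disparity maps after cost computation, cost aggregation, and scanline optimization]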

That's it for this post. The next one will cover the post-processing part. Thanks for reading!


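About the author:
Ethan Li 李迎鬆 (Zhihu: 李迎鬆)
PhD in Photogrammetry and Remote Sensing, Wuhan University
Main research areas: stereo matching, 3D reconstruction
Recipient of a first-class Surveying and Mapping Science and Technology Progress Award in 2019 (provincial/ministerial level)

Loves 3D, loves sharing, loves open source
GitHub: https://github.com/ethan-li-coding (follows and stars are welcome)

Feel free to get in touch!

Follow the blog so you don't lose track of it. Thanks!
Blog home: https://ethanli.blog.csdn.net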
