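This post is part of my study notes on graph neural networks. It covers ChebyNet, which approximates the graph convolution kernel with Chebyshev polynomials. Feel free to discuss with me in the comments 👏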

Introduction to ChebyNet

ChebyNet Implementation

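Start by normalizing the graph's adjacency matrix to obtain the Laplacian matrix. The available forms are the combinatorial Laplacian and its symmetric and random-walk normalizations: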
$\left\{ \begin{array}{rcl} L &=& D-A \\ L^{sym} &=& D^{-1/2}LD^{-1/2} \\ L^{rw} &=& D^{-1}L \end{array} \right.$
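As a quick sanity check, the three forms above can be computed directly for a tiny graph. The sketch below uses NumPy and a hypothetical 3-node path graph (neither is part of the original code); it only illustrates the formulas and is not how tf_geometric implements them.

```python
import numpy as np

# Hypothetical 3-node path graph 0 - 1 - 2 (illustration only).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])

deg = A.sum(axis=1)                    # node degrees: [1, 2, 1]
D = np.diag(deg)

L = D - A                              # combinatorial Laplacian
D_inv_sqrt = np.diag(deg ** -0.5)
L_sym = D_inv_sqrt @ L @ D_inv_sqrt    # symmetric normalization
L_rw = np.diag(1.0 / deg) @ L          # random-walk normalization

print(L_sym)
print(L_rw)
```

Note that $L^{sym}$ stays symmetric while $L^{rw}$ generally does not, which is one reason the symmetric form is a common default.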
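From the normalized Laplacian, compute the rescaled Laplacian:
$\hat{L}=\frac{2}{\lambda_{max}}L-I_N$
Since the eigenvalues of $L$ lie in $[0, \lambda_{max}]$, this rescaling maps them into $[-1, 1]$, the interval on which the Chebyshev polynomials are defined. In code: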

num_nodes = x.shape[0]
norm_edge_index, norm_edge_weight = chebnet_norm_edge(edge_index, num_nodes, edge_weight, lambda_max, normalization_type=normalization_type)
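To see what the rescaling does, the sketch below builds $\hat{L}$ densely with NumPy for a hypothetical 4-node cycle graph and checks that its eigenvalues fall in $[-1, 1]$. It is an illustration under those assumptions and is independent of `chebnet_norm_edge`, whose internals are not shown in this post.

```python
import numpy as np

# Hypothetical 4-node cycle graph 0 - 1 - 2 - 3 - 0 (illustration only).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
D = np.diag(deg)
D_inv_sqrt = np.diag(deg ** -0.5)
L_sym = D_inv_sqrt @ (D - A) @ D_inv_sqrt   # symmetric normalized Laplacian

# Largest eigenvalue of the (symmetric) Laplacian.
lambda_max = np.linalg.eigvalsh(L_sym).max()

# Rescale so that the spectrum lies in [-1, 1].
L_hat = (2.0 / lambda_max) * L_sym - np.eye(4)
eigs = np.linalg.eigvalsh(L_hat)
print(lambda_max, eigs)
```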

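Next, use the recursive definition of the Chebyshev polynomials to compute the higher-order terms iteratively, which saves a great deal of computation. The final output is the polynomial sum $y=\sigma(\sum\limits_{k=0}^K\theta_kT_k(\hat{L})x)$, which is then used to compute the loss or evaluate the model. Note that in the code below `K` is the number of kernels, so the loop covers orders $0$ to $K-1$: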

T0_x = x
T1_x = x
out = tf.matmul(T0_x, kernel[0])  # zeroth-order term: T_0 x multiplied by its weight matrix

if K > 1:
    T1_x = aggregate_neighbors(x, norm_edge_index, norm_edge_weight, gcn_mapper, sum_reducer, identity_updater)
    out += tf.matmul(T1_x, kernel[1])

# Chebyshev recurrence: T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x)
for i in range(2, K):
    T2_x = aggregate_neighbors(T1_x, norm_edge_index, norm_edge_weight, gcn_mapper, sum_reducer, identity_updater)  # \hat{L} T_{k-1}(\hat{L}) x
    T2_x = 2.0 * T2_x - T0_x
    out += tf.matmul(T2_x, kernel[i])

    T0_x, T1_x = T1_x, T2_x

if bias is not None:
    out += bias

if activation is not None:
    out = activation(out)

return out
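The recurrence $T_k(x)=2xT_{k-1}(x)-T_{k-2}(x)$ with $T_0(x)=1$ and $T_1(x)=x$ is what keeps the cost low: each new term needs only one more multiplication by $\hat{L}$. A minimal dense NumPy sketch of the same loop follows. The function `cheb_conv_dense`, the random inputs and the toy `L_hat` are hypothetical stand-ins for illustration, not tf_geometric code, and bias and activation are omitted.

```python
import numpy as np

def cheb_conv_dense(x, L_hat, kernels):
    """Return sum_k T_k(L_hat) @ x @ kernels[k] via the Chebyshev recurrence."""
    T_prev, T_curr = x, L_hat @ x              # T_0 x and T_1 x
    out = T_prev @ kernels[0]
    if len(kernels) > 1:
        out = out + T_curr @ kernels[1]
    for k in range(2, len(kernels)):
        # T_k x = 2 * L_hat @ (T_{k-1} x) - T_{k-2} x
        T_next = 2.0 * (L_hat @ T_curr) - T_prev
        out = out + T_next @ kernels[k]
        T_prev, T_curr = T_curr, T_next
    return out

rng = np.random.default_rng(0)
L_hat = np.diag([-1.0, 0.0, 0.5, 1.0])         # toy matrix with spectrum in [-1, 1]
x = rng.normal(size=(4, 3))                    # 4 nodes, 3 input features
kernels = [rng.normal(size=(3, 2)) for _ in range(3)]  # 3 terms: orders 0, 1, 2

out = cheb_conv_dense(x, L_hat, kernels)
print(out.shape)  # (4, 2)
```

The result matches evaluating $T_2(\hat{L}) = 2\hat{L}^2 - I$ explicitly, but the recurrence never forms a matrix power, so with a sparse $\hat{L}$ every step is a single sparse product.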

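Building the Model

The core library used in this tutorial is tf_geometric. We use it to load graph data, preprocess it, and build the graph neural network. The ChebNet implementation was described in detail above; LaplacianMaxEigenvalue obtains the largest eigenvalue of the Laplacian matrix. Later on, keras.metrics.Accuracy is used to evaluate model performance: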

import os

os.environ["CUDA_VISIBLE_DEVICES"] = "1"
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tf_geometric.layers.conv.chebnet import chebNet
from tf_geometric.datasets.cora import CoraDataset
from tf_geometric.utils.graph_utils import LaplacianMaxEigenvalue
from tqdm import tqdm

Load the Cora dataset through the graph data interface that ships with tf_geometric:

# Load the Cora dataset
graph, (train_index, valid_index, test_index) = CoraDataset().load_data()

Obtain the largest eigenvalue of the graph Laplacian:

# Get lambda_max
graph_lambda_max = LaplacianMaxEigenvalue(graph.x, graph.edge_index, graph.edge_weight)

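Define the model, adding a Dropout layer from keras.layers that randomly drops units to reduce overfitting. Because Dropout behaves differently during training and prediction, the training argument decides whether it is active: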

num_classes = graph.y.max() + 1  # assumes integer class labels starting at 0
model = chebNet(64, K=3, lambda_max=graph_lambda_max())
fc = tf.keras.Sequential([
    keras.layers.Dropout(0.5),  # randomly drop units to reduce overfitting
    keras.layers.Dense(num_classes)])

def forward(graph, training=False):
    h = model([graph.x, graph.edge_index, graph.edge_weight])
    h = fc(h, training=training)  # the training flag decides whether Dropout is active
    return h
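To make the role of the training flag concrete, here is a small NumPy sketch of inverted dropout, the scheme Keras uses: during training, units are zeroed with probability `rate` and the survivors are scaled by `1 / (1 - rate)`; at inference the input passes through unchanged. The `dropout` function below is a hypothetical re-implementation for illustration, not the Keras layer itself.

```python
import numpy as np

def dropout(h, rate, training, rng):
    """Inverted dropout: zero entries with probability `rate`, rescale the rest."""
    if not training:
        return h                               # inference: identity
    keep = rng.random(h.shape) >= rate         # True for units that survive
    return h * keep / (1.0 - rate)

rng = np.random.default_rng(0)
h = np.ones((4, 5))
train_out = dropout(h, 0.5, training=True, rng=rng)
eval_out = dropout(h, 0.5, training=False, rng=rng)
print(train_out)
print(eval_out)
```

With `rate=0.5` every surviving unit becomes 2.0, so the expected value of each activation is the same in both modes.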

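Training ChebyNet

Training works much like any other TensorFlow model. The main steps are defining the optimizer, computing the loss and gradients, and backpropagating, after which we compute accuracy on the validation and test sets: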

# Define the optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-2)

best_test_acc = tmp_valid_acc = 0
for step in tqdm(range(1, 101)):
    with tf.GradientTape() as tape:
        # Forward pass
        logits = forward(graph, training=True)
        # Compute the loss
        loss = compute_loss(logits, train_index, tape.watched_variables())

    vars = tape.watched_variables()
    grads = tape.gradient(loss, vars)  # compute gradients
    optimizer.apply_gradients(zip(grads, vars))  # gradient-descent update

    valid_acc = evaluate(valid_index)  # accuracy on the validation set
    test_acc = evaluate(test_index)  # accuracy on the test set
    if test_acc > best_test_acc:
        best_test_acc = test_acc
        tmp_valid_acc = valid_acc
    print("step = {}\tloss = {}\tvalid_acc = {}\tbest_test_acc = {}".format(step, loss, tmp_valid_acc, best_test_acc))

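The model loss is computed with the cross-entropy loss function. Recall that loading the Cora dataset returns the whole graph together with train_index, valid_index and test_index. ChebyNet takes the entire graph as input during training, and the loss for each iteration is computed only on the training nodes selected by train_index, so the mask_index passed in here is train_index. Since this is a multi-class task, the node labels have to be converted to one-hot vectors so that their dimensions match the model output. Graph neural networks overfit small datasets very easily, so $L_2$ regularization is used here to reduce overfitting: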

def compute_loss(logits, mask_index, vars):
    masked_logits = tf.gather(logits, mask_index)  # forward-pass predictions, training nodes only
    masked_labels = tf.gather(graph.y, mask_index)  # ground-truth labels, training nodes only
    losses = tf.nn.softmax_cross_entropy_with_logits(
        logits=masked_logits,  # predictions
        labels=tf.one_hot(masked_labels, depth=num_classes)  # one-hot ground-truth labels
    )
    # L_2 regularization on the kernel weights to reduce overfitting
    kernel_vals = [var for var in vars if "kernel" in var.name]
    l2_losses = [tf.nn.l2_loss(kernel_var) for kernel_var in kernel_vals]

    # reduce_mean averages the per-node losses; tf.add_n sums the tensors in the list
    return tf.reduce_mean(losses) + tf.add_n(l2_losses) * 5e-4
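The arithmetic behind this loss can be traced by hand on a toy example. The NumPy sketch below mirrors the masked cross-entropy plus the $L_2$ penalty; the helper `masked_cross_entropy` and all the numbers are hypothetical and chosen only to keep the example small.

```python
import numpy as np

def masked_cross_entropy(logits, labels, mask_index):
    """Mean softmax cross-entropy over the rows selected by mask_index."""
    z = logits[mask_index]
    y = labels[mask_index]
    z = z - z.max(axis=1, keepdims=True)                  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()

logits = np.array([[2.0, 0.0], [0.0, 0.0], [0.0, 3.0]])   # 3 nodes, 2 classes
labels = np.array([0, 1, 0])
train_index = np.array([0, 1])                            # node 2 is not a training node

kernel = np.array([[1.0, -2.0], [0.5, 0.0]])              # stand-in for one kernel variable
l2 = 0.5 * np.sum(kernel ** 2)                            # same definition as tf.nn.l2_loss
loss = masked_cross_entropy(logits, labels, train_index) + 5e-4 * l2
print(loss)
```

Node 2 is predicted wrongly with high confidence, yet it does not affect the loss because it is outside `train_index`.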

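Evaluating ChebyNet

To evaluate the model, pass in the validation or test indices (valid_index or test_index) as mask. tf.gather then picks out the model's predictions and the ground-truth labels for those nodes, and the built-in keras.metrics.Accuracy computes the accuracy: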

def evaluate(mask):
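    logits = forward(graph)  # forward pass, with Dropout disabled
    logits = tf.nn.log_softmax(logits, axis=-1)  # convert logits to log-probabilities
    masked_logits = tf.gather(logits, mask)  # predictions for the selected nodes
    masked_labels = tf.gather(graph.y, mask)  # ground-truth labels for the selected nodes

    # Index of the largest entry in each prediction vector, i.e. the predicted class
    y_pred = tf.argmax(masked_logits, axis=-1, output_type=tf.int32)

    accuracy_m = keras.metrics.Accuracy()
    accuracy_m.update_state(masked_labels, y_pred)
    return accuracy_m.result().numpy()  # return the accuracy as a NumPy value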
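Because log-softmax is monotonic within each row, it does not change which class has the largest score, so the predicted class is the same with or without it. The NumPy sketch below checks this and computes masked accuracy by hand; the logits, labels and `mask` values are made up for illustration.

```python
import numpy as np

logits = np.array([[1.0, 3.0, 0.5],
                   [2.0, 0.1, 0.3],
                   [0.2, 0.4, 0.9],
                   [1.5, 1.0, 0.0]])
labels = np.array([1, 0, 1, 0])
mask = np.array([0, 2, 3])                     # hypothetical evaluation indices

# Row-wise log-softmax, shifted for numerical stability.
shifted = logits - logits.max(axis=1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

y_pred = log_probs[mask].argmax(axis=1)        # predicted class per selected node
accuracy = (y_pred == labels[mask]).mean()     # fraction of correct predictions
print(accuracy)
```

Here two of the three selected nodes are classified correctly, so the accuracy is 2/3.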

Results

 0%|          | 0/100 [00:00<?, ?it/s]step = 1	loss = 1.9817407131195068	valid_acc = 0.7139999866485596	best_test_acc = 0.7089999914169312
  2%|▏         | 2/100 [00:01<00:55,  1.76it/s]step = 2	loss = 1.6069653034210205	valid_acc = 0.75	best_test_acc = 0.7409999966621399
step = 3	loss = 1.2625869512557983	valid_acc = 0.7720000147819519	best_test_acc = 0.7699999809265137
  4%|▍         | 4/100 [00:01<00:48,  1.98it/s]step = 4	loss = 0.9443040490150452	valid_acc = 0.7760000228881836	best_test_acc = 0.7749999761581421
  5%|▌         | 5/100 [00:02<00:46,  2.06it/s]step = 5	loss = 0.7023431062698364	valid_acc = 0.7760000228881836	best_test_acc = 0.7770000100135803
  ...
96	loss = 0.0799005851149559	valid_acc = 0.7940000295639038	best_test_acc = 0.8080000281333923
 96%|█████████▌| 96/100 [00:43<00:01,  2.31it/s]step = 97	loss = 0.0768655389547348	valid_acc = 0.7940000295639038	best_test_acc = 0.8080000281333923
 97%|█████████▋| 97/100 [00:43<00:01,  2.33it/s]step = 98	loss = 0.0834992527961731	valid_acc = 0.7940000295639038	best_test_acc = 0.8080000281333923
 99%|█████████▉| 99/100 [00:44<00:00,  2.34it/s]step = 99	loss = 0.07315651327371597	valid_acc = 0.7940000295639038	best_test_acc = 0.8080000281333923
100%|██████████| 100/100 [00:44<00:00,  2.23it/s]
step = 100	loss = 0.07698118686676025	valid_acc = 0.7940000295639038	best_test_acc = 0.8080000281333923

The complete code is in [demo_chebynet.py].

If this helped, please leave a like and follow 😃