Paper information

Title: DUCK: Rumour Detection on Social Media by Modelling User and Comment Propagation Networks
Authors: Lin Tian, Xiuzhen Zhang, Jey Han Lau
Venue: NAACL 2022
Paper: download
Code: download

1 Introduction

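　　The proposed model investigates how to make full use of user and comment information. Compared with previous methods, it differs in the following ways: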

　　(1) we model comments both as a:

　　　　(i) stream to capture the temporal nature of evolving comments;

　　　　(ii) network by following the conversational structure (see Figure 1 for an illustration);

　　(2) our comment network uses a sequence model to encode a pair of comments before feeding them to a graph network, allowing our model to capture the nuanced characteristics (e.g. agreement or rebuttal) exhibited by a reply;

　　(3) when modelling the users who engage with a story via graph networks, we initialise the user nodes with encodings learned from their profiles and characteristics of their "friends" based on their social networks.

2 Problem Statement

3 Methodology

　　Overall framework:

　　It consists of the following components:

　　(1) comment tree: models the comment network by following the reply-to structure using a combination of BERT and graph attentional networks;
　　(2) comment chain: models the comments as a stream using transformer-based sequence models;
　　(3) user tree: incorporates social relations to model the user network using graph attentional networks;
　　(4) rumour classifier: combines the output from comment tree, comment chain and user tree to classify the source post.

　　Note that the network structure of the user tree differs from that of the comment tree: the former captures both comments and reposts/retweets, whereas the latter considers only comments (Figure 1).

3.1 Comment Tree

　　GNN-based models of the relations between comments typically rely on simple text features (bag-of-words) and ignore the nuanced relations between comments ("stance" or "deny").

　　The paper therefore combines the pretrained language model BERT with GAT to model the comment tree; see Figure 2 for details:

　　First, BERT processes each pair of parent-child posts, and then GAT models the whole conversational structure. (Self-attention between the words of the parent and the child yields a fine-grained analysis of the pair.)

　　Taking the comment tree in Figure 2 as an example, this means we first use BERT to process the following pairs of comments {(0, 0),(0, 1),(0, 2),(2, 6),(2, 7),(6, 9)}:

　　　　$h_{p+q}=\mathrm{BERT}\left(\mathrm{emb}\left([C L S], c_{p},[S E P], c_{q}\right)\right)$

　　where $c$ denotes the text, $emb()$ the embedding function, and $h$ the contextual representation of the [CLS] token produced by BERT.
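　　As a minimal sketch of how the parent-child pairs can be enumerated from a reply tree, the snippet below walks a child-to-parent map and pairs the root with itself, which reproduces the pair list above. The function names `comment_pairs` and `bert_pair_input` and the `parent_of` mapping are hypothetical helpers for illustration, not part of the authors' code; the string layout only mirrors the `[CLS] c_p [SEP] c_q` input, since a real tokenizer would add the special tokens itself.

```python
def comment_pairs(parent_of, root=0):
    """Return (parent, child) id pairs for a reply tree.

    parent_of maps each comment id to the id of the post it replies to.
    The root is paired with itself, matching the (0, 0) pair in the text.
    """
    pairs = [(root, root)]
    for child in sorted(parent_of):
        pairs.append((parent_of[child], child))
    return pairs


def bert_pair_input(parent_text, child_text):
    # Plain-string stand-in for the [CLS] c_p [SEP] c_q input.
    return f"[CLS] {parent_text} [SEP] {child_text}"


# Reply-to edges read off the pairs listed in the text (child -> parent).
parent_of = {1: 0, 2: 0, 6: 2, 7: 2, 9: 6}
pairs = comment_pairs(parent_of)
print(pairs)
```

　　Each pair would then be encoded once by BERT, and the resulting [CLS] vector attached to the child node of the pair.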

　　To model the conversational network structure, the paper uses a graph attention network (GAT). The encoding $h_{i}^{(l+1)}$ of node $i$ at iteration $l+1$ is computed as:

　　　　$\begin{array}{rcl}e_{i j}^{(l)} &=&\operatorname{LR}\left(a^{(l)^{T}}\left(W^{(l)} h_{i}^{(l)} \oplus W^{(l)} h_{j}^{(l)}\right)\right) \\h_{i}^{(l+1)} &=&\sigma\left(\sum\limits _{j \in \mathcal{N}(i)} \operatorname{softmax}\left(e_{i j}^{(l)}\right) z_{j}^{(l)}\right)\end{array}$
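　　The two equations can be traced with a small single-head layer written in plain Python. This is a sketch under several assumptions that the notes do not state: $z_{j}^{(l)}$ is taken to be $W^{(l)} h_{j}^{(l)}$ (the usual GAT convention), LR is read as LeakyReLU with slope 0.2, $\sigma$ is replaced by tanh purely for illustration, and every node lists itself in its own neighbourhood so that no neighbourhood is empty.

```python
import math


def matvec(W, h):
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gat_layer(h, neighbours, W, a):
    """One single-head GAT update.

    h: list of node vectors; neighbours[i]: node ids in N(i);
    W: projection matrix (rows = output dims); a: attention vector
    whose length is twice the output dimension.
    """
    z = [matvec(W, h_i) for h_i in h]  # assumption: z_j = W h_j
    out = []
    for i, nbrs in enumerate(neighbours):
        # e_ij = LeakyReLU(a^T (W h_i concatenated with W h_j))
        e = [leaky_relu(sum(p * q for p, q in zip(a, z[i] + z[j]))) for j in nbrs]
        alpha = softmax(e)  # attention weights over N(i)
        agg = [sum(al * z[j][d] for al, j in zip(alpha, nbrs)) for d in range(len(W))]
        out.append([math.tanh(x) for x in agg])  # tanh stands in for sigma
    return out


# Toy graph: three nodes with self-loops, identity projection, and a
# zero attention vector, so attention is uniform over each neighbourhood.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbours = [[0, 1, 2], [0, 1], [0, 2]]
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.0, 0.0, 0.0, 0.0]
out = gat_layer(h, neighbours, W, a)
```

　　With a zero attention vector the layer reduces to a mean over each neighbourhood followed by the nonlinearity, which makes the toy output easy to check by hand.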

　　To aggregate the node encodings into a graph representation ($z_{c t}$), four methods are explored:

　　root: Uses the root encoding (the source post) to represent the graph:

　　　　$z_{c t}=h_{0}^{L}$

　　$\neg root$: Mean-pooling over all nodes except the root:

　　　　$z_{c t}=\frac{1}{m} \sum_{i=1}^{m} h_{i}^{L}$

　　　　where $m$ is the number of replies/comments.

　　$\Delta$ : Mean-pooling of the root node and its immediate neighbours:

　　　　$z_{c t}=\frac{1}{|\mathcal{N}(0)|} \sum_{i \in \mathcal{N}(0)} h_{i}^{L}$

　　all: Mean-pooling of all nodes:

　　　　$z_{c t}=\frac{1}{m+1} \sum_{i=0}^{m} h_{i}^{L}$
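　　The four aggregation rules differ only in which node encodings are averaged, which a short sketch makes concrete. The function `aggregate` and its method labels are hypothetical names; the caller passes $\mathcal{N}(0)$ explicitly for the $\Delta$ variant, and the example assumes that $\mathcal{N}(0)$ contains the root itself, as the description "the root node and its immediate neighbours" suggests.

```python
def mean_pool(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def aggregate(h, method, root_neighbours=None):
    """Pool final-layer node encodings into one graph vector.

    h[0] is the root (source post) encoding and h[1:] are the m comments.
    """
    if method == "root":
        return list(h[0])
    if method == "not_root":
        return mean_pool(h[1:])
    if method == "delta":
        return mean_pool([h[i] for i in root_neighbours])
    if method == "all":
        return mean_pool(h)
    raise ValueError(f"unknown aggregation method: {method}")


# Toy encodings: one root and three comments, two dimensions each.
h = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
z_root = aggregate(h, "root")
z_not_root = aggregate(h, "not_root")
z_delta = aggregate(h, "delta", root_neighbours=[0, 1, 2])
z_all = aggregate(h, "all")
```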

3.2 Comment Chain

　　Here the posts are modelled as a stream, in the order in which they were posted, rather than as a tree. Three models are considered for the comment chain:

　　(1) one-tier transformer
　　(2) longformer
　　(3) two-tier transformer

3.2.1 One-tier transformer

　　Given a source post $\left(c_{0}\right)$ and its comments $\left(\left\{c_{1}, \ldots, c_{m}\right\}\right)$, we can simply concatenate them into one long string and feed it to BERT:

　　　　$z_{c c}=\operatorname{BERT}\left(\mathrm{emb}\left([C L S], c_{0},[S E P], c_{1}, \ldots, c_{m^{\prime}}\right)\right)$

　　where $m^{\prime}(<m)$ is the number of comments that can be included without exceeding BERT's maximum sequence length (384 in the experiments).
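　　A rough sketch of how $m^{\prime}$ could be determined: add comments in posting order until the next one would push the input past the length budget. Whitespace splitting stands in for BERT's subword tokenizer, and [CLS] and [SEP] are counted as one token each; both are simplifying assumptions, and `select_comments` is a hypothetical helper name.

```python
def select_comments(source, comments, max_len=384, tokenize=str.split):
    """Return m', the number of leading comments that fit in max_len tokens.

    Assumes whitespace tokens in place of subwords and one token each
    for [CLS] and [SEP].
    """
    used = 2 + len(tokenize(source))  # [CLS] + source post + [SEP]
    kept = 0
    for comment in comments:
        cost = len(tokenize(comment))
        if used + cost > max_len:
            break  # later comments are dropped to keep posting order intact
        used += cost
        kept += 1
    return kept


source = "a b c"
comments = ["d e", "f g h", "i"]
m_small = select_comments(source, comments, max_len=9)
m_full = select_comments(source, comments)
```

　　Stopping at the first comment that does not fit, rather than skipping it and trying shorter later ones, keeps the retained comments a contiguous prefix of the stream.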

3.2.2 Longformer

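　　To get around the sequence length limit, the experiments use a Longformer, which can process up to 4096 subwords and therefore allows most, if not all, of the comments to be used.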

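　　Longformer has an architecture similar to the one-tier transformer but uses a sparser attention pattern to process longer sequences more efficiently. We use a pretrained Longformer and follow the same approach as before to model the comment chain: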

　　　　$z_{c c}=\mathrm{LF}\left(\operatorname{emb}\left([C L S], c_{0},[S E P], c_{1}, \ldots, c_{m^{\prime \prime}}\right)\right)$

　　where $m^{\prime \prime} \approx m$.

3.2.3 Two-tier transformer

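　　Another way to address the sequence length limit is to model the comment chain with two tiers of transformers: the first tier processes each post independently, and the second tier processes the sequence of posts using the representations produced by the first.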

　　　　$\begin{array}{rcl}h_{i} &=&\operatorname{BERT}\left(\mathrm{emb}_{1}\left([C L S], c_{i}\right)\right) \\z_{c c} &=&\operatorname{transformer}\left(\operatorname{emb}_{2}([C L S]), h_{0}, h_{1}, \ldots, h_{m}\right)\end{array}$

　　where BERT and transformer denote the first-tier and second-tier transformers respectively. The second-tier transformer has an architecture similar to BERT but only 2 layers, and its parameters are randomly initialised.

3.3 User Tree

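　　We explore three methods for modelling the user network, all based on GAT, and aggregate the node encodings by mean-pooling over all nodes to produce the graph representation: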

　　　　$z_{u t}=\frac{1}{m+1} \sum\limits_{i=0}^{m} h_{i}^{L}$

　　The main difference between the three methods lies in how they initialise the user nodes $\left(h_{i}^{(0)}\right)$:

　　First, $\mathbf{GAT_{\text{rnd}}}$: user nodes are initialised with random vectors.

　　　　$h_{i}^{0}=\operatorname{random}\left[v_{1}, v_{2}, \ldots, v_{d}\right]$

　　Second, $\mathbf{GAT_{\text{prf}}}$: user nodes are initialised from their user profiles: username, user screen name, user description, user account age, etc. The static user node $h_{i}^{0}$ is therefore given by $v_{i} \in \mathbb{R}^{k}$:

　　　　$h_{i}^{0}=\left[v_{1}, v_{2}, \ldots, v_{k}\right]$

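　　Third, $\mathbf{GAT_{\text{prf}+\text{rel}}}$: user node representations are initialised from user features (user profiles) and social relations (based on "follow" relations) via a variational graph auto-encoder (GAE).

　　Note that the user network differs from the social graph: the former captures the users who engage with the source post, whereas the latter is the network of users who follow one another.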

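　　Given the social graph $G_{s}$ constructed from the training data, we can derive an adjacency matrix $\mathrm{A} \in \mathbb{R}^{n \times n}$, where $\mathrm{n}$ is the number of users. Let $X=\left[x_{1}, x_{2}, \ldots, x_{n}\right]$, with $x_{i} \in \mathbb{R}^{k}$, be the input node features. Our goal is to learn a transformation matrix $\mathrm{Z} \in \mathbb{R}^{n \times d}$ that maps the users into a latent space of dimension $d$. We use a two-layer GCN as the encoder: it takes the adjacency matrix $\mathrm{A}$ and the feature matrix $\mathrm{X}$ as input and produces the latent variable $Z$ as output. The decoder is defined by the inner product between the latent variables $\mathrm{Z}$, and its output is a reconstructed adjacency matrix $\hat{A}$. Formally: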

　　　　$\begin{array}{rl}Z &=\operatorname{enc}(\mathbf{X}, \mathbf{A}) =\operatorname{GCN}\left(f\left(\operatorname{GCN}\left(\mathbf{A}, \mathbf{X} ; \theta_{1}\right)\right) ; \theta_{2}\right) \\\hat{A} &=\operatorname{dec}\left(Z, Z^{\top}\right)=\sigma\left(Z Z^{\top}\right)\end{array}$
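　　The decoder half, $\hat{A}=\sigma\left(Z Z^{\top}\right)$, is simple enough to sketch directly: each reconstructed edge probability is the sigmoid of the inner product of two user embeddings. The encoder is left out, and the toy matrix `Z` below is made up for illustration rather than produced by a GCN.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def inner_product_decoder(Z):
    """Reconstruct edge probabilities A_hat[i][j] = sigmoid(Z_i . Z_j)."""
    return [[sigmoid(sum(a * b for a, b in zip(z_i, z_j))) for z_j in Z] for z_i in Z]


# Toy latent matrix for three users in a two-dimensional latent space.
Z = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
A_hat = inner_product_decoder(Z)
```

　　Users 0 and 2 share the same embedding and so receive a higher predicted edge probability than users 0 and 1, whose embeddings are orthogonal.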

　　$h_{i}^{(0)} \in \mathbb{R}^{d}$ is then computed as follows:

　　　　$h_{i}^{(0)}=\left\{\begin{array}{ll}\operatorname{ReLU}\left(W \cdot\left[v_{1}, \ldots, v_{k}\right]\right), & \text { if } \operatorname{user}_{i} \notin G_{s} \\Z_{i}, & \text { if } \operatorname{user}_{i} \in G_{s}\end{array}\right.$

　　where $W$ denotes the parameters of a fully connected layer and $v_{i} \in \mathbb{R}^{k}$ is the user profile vector.
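　　The piecewise rule amounts to a lookup with a fallback: users found in the social graph take their learned embedding $Z_{i}$, and all others are projected from their profile vector through a ReLU layer. In the sketch below, `init_user_node` and the `social_embeddings` dictionary (user id to row of $Z$) are hypothetical names, and the weights are toy values.

```python
def relu_linear(W, v):
    """Compute ReLU(W . v) for a weight matrix with one row per output dim."""
    return [max(0.0, sum(w * x for w, x in zip(row, v))) for row in W]


def init_user_node(user_id, profile, social_embeddings, W):
    """Initial node vector h_i^(0) for one user.

    social_embeddings maps ids of users in G_s to their rows Z_i.
    """
    if user_id in social_embeddings:
        return list(social_embeddings[user_id])
    return relu_linear(W, profile)


# Toy setup: k = 3 profile features projected to d = 2 dimensions.
W = [[1.0, 0.0, -1.0], [-1.0, 0.0, 1.0]]
social_embeddings = {"u1": [0.3, 0.7]}
h_known = init_user_node("u1", [2.0, 5.0, 3.0], social_embeddings, W)
h_unknown = init_user_node("u2", [2.0, 5.0, 3.0], social_embeddings, W)
```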

3.4 Rumour Classifier

　　The graph representations $z_{c t}$, $z_{c c}$ and $z_{u t}$ produced by the comment tree, comment chain and user tree respectively are used for rumour classification:

　　　　$\begin{array}{l}z=z_{c t} \oplus z_{c c} \oplus z_{u t} \\\hat{y}=\operatorname{softmax}\left(W_{c} z+b_{c}\right) \\\mathcal{L}=-\sum\limits _{i=1}^{n} y_{i} \log \left(\hat{y_{i}}\right)\end{array}$

　　where $n$ denotes the number of training instances.
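　　The classifier can be sketched in a few lines: concatenate the three vectors, apply a linear layer and a softmax, and sum the negative log-probability of the true class over the training instances. The zero weights below are chosen only so that the output is easy to verify by hand; `classify` and `batch_loss` are hypothetical names.

```python
import math


def softmax(logits):
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(z_ct, z_cc, z_ut, W_c, b_c):
    """Concatenate the three component vectors and return class probabilities."""
    z = z_ct + z_cc + z_ut  # list concatenation plays the role of the concat operator
    logits = [sum(w * x for w, x in zip(row, z)) + b for row, b in zip(W_c, b_c)]
    return softmax(logits)


def batch_loss(examples):
    """Cross-entropy summed over (one_hot_label, probabilities) pairs."""
    return -sum(
        t * math.log(p)
        for y, y_hat in examples
        for t, p in zip(y, y_hat)
        if t > 0
    )


# Two classes, five concatenated features, all-zero toy weights.
W_c = [[0.0] * 5, [0.0] * 5]
b_c = [0.0, 0.0]
probs = classify([0.5, -0.5], [1.0], [0.0, 2.0], W_c, b_c)
loss = batch_loss([([1, 0], probs)])
```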

4 Experiments and Results

4.1 Datasets

　　Dataset statistics are as follows:

　　We report the average performance based on 5-fold cross-validation.

　　We reserve 20% data as test and split the rest in a ratio of 4:1 for training and development partitions and report the average test performance over 5 runs (initialised with different random seeds).
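　　The hold-out protocol can be illustrated with a small split function: one fifth of the data is reserved for testing and the remainder is divided 4:1 into training and development sets. The shuffling seed and the helper name `split_dataset` are assumptions; the notes only say that the 5 runs are initialised with different random seeds, which this sketch reads as applying to model initialisation rather than to the split itself.

```python
import random


def split_dataset(items, seed=0):
    """Split into train/dev/test with 20% test and a 4:1 train/dev ratio."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = len(shuffled) // 5  # reserve 20% as test
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_dev = len(rest) // 5  # 4:1 split of the remainder
    dev, train = rest[:n_dev], rest[n_dev:]
    return train, dev, test


train, dev, test = split_dataset(range(100))
print(len(train), len(dev), len(test))
```

　　For 100 instances this yields 64 training, 16 development and 20 test instances.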

4.2 Results

　　The experiments mainly address the following questions:

Q1 [Comment tree]: Does incorporating BERT to analyse the relation between parent and child posts help modelling the comment network, and what is the best way to aggregate comment-pair encodings to represent the comment graph?
Q2 [Comment chain]: Does incorporating more comments help rumour detection when modelling them as a stream of posts?
Q3 [User tree]: Can social relations help modelling the user network?
Q4 [Overall performance]: Do the three different components complement each other and how does a combined approach compare to existing rumour detection systems?

4.2.1 Comment Tree

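　　To understand the effect of processing a pair of parent-child posts with BERT, we introduce an alternative ("unpaired") in which BERT processes each post independently and its [CLS] representation is then fed to the GAT.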

　　　　$h_{p}=\operatorname{BERT}\left(\operatorname{emb}\left([C L S], c_{p}\right)\right)$

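　　where $h$ is used as the initial node representation ($h^{(0)}$) in the GAT. The performance of this alternative model ("unpaired") and of the different aggregation methods ("root", "$\neg$root", "$\Delta$" and "all") is reported here.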

　　Comparing the aggregation methods, "all" performs the best, followed by "$\boldsymbol{\Delta}$" and "root" (0.88 vs. 0.87 vs. 0.86 in Twitter16; 0.87 vs. 0.86 vs. 0.85 in CoAID in terms of Macro-F1). We can see that the root and its immediate neighbours contain most of the information, and not including the root node impacts the performance severely (both Twitter16 and CoAID drop to 0.80 with $\neg$root).

　　Does processing the parent-child posts together with BERT help? The answer is evidently yes, as we see a substantial drop in performance when we process the posts independently: "unpaired" produces a macro-F1 of only 0.83 in both Twitter16 and CoAID. Given these results, our full model (DUCK) will be using "all" as the aggregation method for computing the comment graph representation.

4.2.2 Comment Chain

　　Fig. 3 plots the results obtained as we vary the number of comments included, which answers Q2:
