# Paper reading notes: Masked Label Prediction: Unified Message Passing Model for Semi-Supervised Classification

### The paper proposes a unified message passing model (UniMP)

(a) Combining node feature propagation with labels.

UniMP uses both node features and labels during training and inference. An embedding layer converts the observed node labels from one-hot indicator vectors into dense class vectors, which serve as additional node features. A multi-layer Graph Transformer network then takes both the node features and the label embeddings as input and propagates information between nodes, so each node can aggregate both feature and label information from its neighbors.
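As a rough illustration of the label-embedding step, the following numpy sketch (variable names are illustrative, not the paper's code) converts the observed one-hot labels into dense class vectors and adds them to the node features before message passing:

```python
import numpy as np

num_nodes, num_classes, dim = 5, 3, 8
rng = np.random.default_rng(0)

x = rng.normal(size=(num_nodes, dim))            # node features
y = np.array([0, 2, 1, -1, -1])                  # -1 marks unlabeled nodes
label_emb = rng.normal(size=(num_classes, dim))  # learnable in practice

# Embed observed labels into dense class vectors; unlabeled nodes get zeros.
y_dense = np.zeros((num_nodes, dim))
mask = y >= 0
y_dense[mask] = label_emb[y[mask]]

# Fuse label information into the node features.
h = x + y_dense
```

Unlabeled nodes keep their original features, so the model degrades gracefully when no labels are available.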

(b) Masked label prediction.

#### Graph Neural Networks

Feature propagation at layer $l$:

$$H^{(l+1)} = \sigma\left(\hat{A}\, H^{(l)}\, W^{(l)}\right), \qquad \hat{A} = D^{-1/2} A D^{-1/2}$$

where $\hat{A}$ is the normalized adjacency matrix, $A$ is the adjacency matrix, $H^{(l)}$ is the feature representation at layer $l$, $\sigma$ is the activation function, and $W^{(l)}$ is the learnable weight matrix of layer $l$.
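The propagation rule above can be sketched numerically. A minimal numpy example on a toy 3-node graph, using ReLU as the activation $\sigma$ (the graph and weights here are arbitrary illustrations):

```python
import numpy as np

# Toy graph: 3 nodes, undirected edges (0-1, 1-2), self-loops added.
A = np.array([[1., 1., 0.],
              [1., 1., 1.],
              [0., 1., 1.]])
deg = A.sum(axis=1)
A_hat = A / np.sqrt(deg)[:, None] / np.sqrt(deg)[None, :]  # D^{-1/2} A D^{-1/2}

H = np.eye(3)                            # layer-l features (one-hot here)
W = np.full((3, 2), 0.5)                 # layer-l weight matrix
H_next = np.maximum(A_hat @ H @ W, 0.0)  # ReLU as the activation sigma
```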

#### Label propagation algorithms

LPA starts from an initial label matrix $\hat{Y}^{(0)}$, which consists of one-hot label indicator vectors $\hat{y}_i^{(0)}$ for labeled nodes and zero vectors for unlabeled nodes. The simple iterative equation of LPA is:

$$\hat{Y}^{(l+1)} = D^{-1} A\, \hat{Y}^{(l)}$$
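The iteration can be illustrated with a minimal numpy sketch. It also clamps the known labels back after each step, which is a common LPA variant rather than necessarily the exact procedure in the paper:

```python
import numpy as np

# Path graph 0 - 1 - 2; nodes 0 and 2 are labeled, node 1 is not.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
D_inv = np.diag(1.0 / A.sum(axis=1))

# Initial label matrix: one-hot rows for labeled nodes, zero row otherwise.
Y = np.array([[1., 0.],
              [0., 0.],
              [0., 1.]])

for _ in range(10):
    Y = D_inv @ A @ Y    # one LPA iteration: Y <- D^{-1} A Y
    Y[0] = [1., 0.]      # clamp the known labels after each step
    Y[2] = [0., 1.]
```

The unlabeled middle node converges to an even mix of its two labeled neighbors.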

### Algorithm: open-source implementation in torch_geometric.nn

$$\mathbf{x}^{\prime}_i = \mathbf{W}_1 \mathbf{x}_i + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j} \mathbf{W}_2 \mathbf{x}_{j},$$

where the attention coefficients $\alpha_{i,j}$ are computed via multi-head dot-product attention:

$$\alpha_{i,j} = \mathrm{softmax}\left(\frac{(\mathbf{W}_3 \mathbf{x}_i)^{\top} (\mathbf{W}_4 \mathbf{x}_j)}{\sqrt{d}}\right)$$
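For a single head and a single target node, the attention computation can be sketched as follows (illustrative names, not PyG's internals):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
W3 = rng.normal(size=(d, d))
W4 = rng.normal(size=(d, d))
x_i = rng.normal(size=d)
neighbors = rng.normal(size=(3, d))  # x_j for j in N(i)

q = W3 @ x_i                # query from the target node
k = neighbors @ W4.T        # keys from the neighbors
scores = k @ q / np.sqrt(d) # scaled dot products

alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()        # softmax over the neighborhood
```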

• in_channels (int or tuple) – Size of each input sample, or -1 to derive the size from the first input(s) to the forward method. A tuple corresponds to the sizes of source and target dimensionalities.

• out_channels (int) – Size of each output sample.

• heads (int, optional) – Number of multi-head-attentions. (default: 1)

• concat (bool, optional) – If set to False, the multi-head attentions are averaged instead of concatenated. (default: True)

• beta (bool, optional) –

If set, will combine aggregation and skip information via

$$\mathbf{x}^{\prime}_i = \beta_i \mathbf{W}_1 \mathbf{x}_i + (1 - \beta_i) \underbrace{\left(\sum_{j \in \mathcal{N}(i)} \alpha_{i,j} \mathbf{W}_2 \mathbf{x}_j \right)}_{=\mathbf{m}_i}$$

where

$$\beta_i = \mathrm{sigmoid}\left(\mathbf{w}_5^{\top} \left[\mathbf{W}_1 \mathbf{x}_i,\, \mathbf{m}_i,\, \mathbf{W}_1 \mathbf{x}_i - \mathbf{m}_i\right]\right)$$

(default: False)
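The gated skip connection can be sketched for one node (illustrative names; a scalar gate per node, as in the formula):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
d = 4
W1x_i = rng.normal(size=d)   # transformed root features W_1 x_i
m_i = rng.normal(size=d)     # aggregated neighborhood message
w5 = rng.normal(size=3 * d)  # gate weight vector

# beta_i = sigmoid(w_5^T [W_1 x_i, m_i, W_1 x_i - m_i])
beta = sigmoid(w5 @ np.concatenate([W1x_i, m_i, W1x_i - m_i]))

# Gated combination of the skip connection and the aggregated message.
x_out = beta * W1x_i + (1.0 - beta) * m_i
```

The gate lets each node decide how much to trust its own (transformed) features versus the neighborhood aggregate.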

• dropout (float, optional) – Dropout probability of the normalized attention coefficients which exposes each node to a stochastically sampled neighborhood during training. (default: 0)

• edge_dim (int, optional) –

Edge feature dimensionality (in case there are any). Edge features are added to the keys after linear transformation, that is, prior to computing the attention dot product. They are also added to final values after the same linear transformation. The model is:

$$\mathbf{x}^{\prime}_i = \mathbf{W}_1 \mathbf{x}_i + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j} \left(\mathbf{W}_2 \mathbf{x}_j + \mathbf{W}_6 \mathbf{e}_{ij}\right),$$

where

$$\alpha_{i,j} = \mathrm{softmax}\left(\frac{(\mathbf{W}_3 \mathbf{x}_i)^{\top} (\mathbf{W}_4 \mathbf{x}_j + \mathbf{W}_6 \mathbf{e}_{ij})}{\sqrt{d}}\right)$$
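The edge-feature variant can be sketched for one target node; the edge features enter both the keys and the values (illustrative names):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
W2, W3, W4, W6 = (rng.normal(size=(d, d)) for _ in range(4))
x_i = rng.normal(size=d)
x_js = rng.normal(size=(3, d))  # neighbor features
e_ij = rng.normal(size=(3, d))  # edge features, one per neighbor

# Edge features are added to the keys before the attention dot product.
k = x_js @ W4.T + e_ij @ W6.T
scores = k @ (W3 @ x_i) / np.sqrt(d)
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()

# Edge features are also added to the values before aggregation.
v = x_js @ W2.T + e_ij @ W6.T
m_i = alpha @ v
```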

(default: None)

• bias (bool, optional) – If set to False, the layer will not learn an additive bias. (default: True)

• root_weight (bool, optional) – If set to False, the layer will not add the transformed root node features to the output and the option beta is set to False. (default: True)

• **kwargs (optional) – Additional arguments of conv.MessagePassing.

```python
def __init__(
    self,
    in_channels: Union[int, Tuple[int, int]],
    out_channels: int,
    heads: int = 1,
    concat: bool = True,
    beta: bool = False,
    dropout: float = 0.,
    edge_dim: Optional[int] = None,
    bias: bool = True,
    root_weight: bool = True,
    **kwargs,
):
    ...

def forward(self, x: Union[Tensor, PairTensor], edge_index: Adj,
            edge_attr: OptTensor = None, return_attention_weights=None):
    ...
```


THE END