We first map the node features to the embedding space \((1)\),
then perform \(T\) rounds of message passing.
In each round, we compute the message for node \(i\) as the mean of messages from its neighboring nodes \((2)\),
and mean-pool the embeddings of all nodes that share the same program type to obtain the master node embedding \((3)\).
Lastly, we update the node embeddings through a residual connection to mitigate vanishing gradients \((4)\).
After \(T = 5\) steps of message passing, the final embedding of program node \(i\) is denoted \(x^T_i\).
\begin{align}
x^0_i &= \mathrm{MLP}^p_{enc}([x_i, z^p_i, F]) \tag{1} \\
m^t_i &= \frac{1}{|Ne(i)|} \sum_{j \in Ne(i)} \mathrm{MLP}^p_{message}([x^t_i, x^t_j]) \tag{2} \\
c^t_i &= \mathrm{Mean}_{j \in Cl(i)}(\{x^t_j\}) \tag{3} \\
x^{t+1}_i &= x^t_i + \mathrm{MLP}^p_{update}([x^t_i, m^t_i, r_{Cl(i)} c^t_i, F]) \tag{4}
\end{align}
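To make the dataflow concrete, below is a minimal NumPy sketch of Eqs. (1)–(4). The three MLPs, the neighbor sets \(Ne(i)\), the program-type clusters \(Cl(i)\), the feature \(F\), the type embeddings \(z^p_i\), and the gate \(r_{Cl(i)}\) are all hypothetical placeholders with random values and arbitrary dimensions; the sketch only illustrates the update structure, not the trained model.

```python
import numpy as np

# Sketch of the program-node message passing (Eqs. 1-4).
# All weights, features, neighbor sets, and clusters are toy placeholders.
rng = np.random.default_rng(0)

def mlp(in_dim, hidden, out_dim):
    """Toy 2-layer MLP with ReLU, returned as a closure over random weights."""
    W1, b1 = rng.normal(size=(in_dim, hidden)) * 0.1, np.zeros(hidden)
    W2, b2 = rng.normal(size=(hidden, out_dim)) * 0.1, np.zeros(out_dim)
    return lambda v: np.maximum(v @ W1 + b1, 0.0) @ W2 + b2

D = 16                                   # embedding width
N = 6                                    # number of program nodes
x_raw = rng.normal(size=(N, 4))          # raw node features x_i
z_p   = rng.normal(size=(N, 3))          # program-type embeddings z^p_i
F     = rng.normal(size=8)               # shared feature F
Ne = {i: [j for j in range(N) if j != i][:2] for i in range(N)}  # toy neighbors Ne(i)
Cl = {i: list(range(N)) for i in range(N)}                       # toy clusters Cl(i)
r  = np.ones(N)                          # gate r_{Cl(i)}, assumed scalar here

enc     = mlp(4 + 3 + 8, 32, D)          # MLP^p_enc,     Eq. (1)
message = mlp(2 * D, 32, D)              # MLP^p_message, Eq. (2)
update  = mlp(3 * D + 8, 32, D)          # MLP^p_update,  Eq. (4)

# Eq. (1): encode raw features into the embedding space.
x = np.stack([enc(np.concatenate([x_raw[i], z_p[i], F])) for i in range(N)])

T = 5
for _ in range(T):
    # Eq. (2): mean of messages from the neighbors Ne(i).
    m = np.stack([np.mean([message(np.concatenate([x[i], x[j]]))
                           for j in Ne[i]], axis=0) for i in range(N)])
    # Eq. (3): master node embedding = mean over nodes sharing i's program type.
    c = np.stack([np.mean(x[Cl[i]], axis=0) for i in range(N)])
    # Eq. (4): residual update of every node embedding.
    x = x + np.stack([update(np.concatenate([x[i], m[i], r[i] * c[i], F]))
                      for i in range(N)])

print(x.shape)  # final embeddings x^T_i, shape (N, D)
```

In this sketch the residual form of Eq. (4) keeps each update additive, so gradients can flow through the identity path across all \(T\) rounds, which is the motivation stated above.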