# Shallow Node Embeddings

In this tutorial, we will take a closer look at how to learn *shallow node embeddings* in an unsupervised fashion via PyG.

## Introduction

The key difference between *shallow* node embeddings (*e.g.,* `Node2Vec`

) and *deep* node embeddings (*e.g.,* GNNs) is the choice of the encoder \(\textrm{ENC}(v, \mathcal{G}) = \mathbf{z}_v \in \mathbb{R}^d\).
Specifically, shallow node embedding techniques rely on embedding nodes into low-dimensional vectorial representations \(\mathbf{z}_v\) via a *shallow embedding lookup table* such that the likelihood of preserving neighborhoods is maximized, *i.e.* nearby nodes should receive similar embeddings while distant nodes should receive distinct embedding.
These techniques generalize the famous SkipGram model for obtaining low-dimensional word embeddings, in which sequences of words are now interpreted as sequences of nodes, *e.g.*, given via randomly-generated walks:

Specifically, given a *random walk* \(\mathcal{W} = (v_{\pi(1)}, \ldots, v_{\pi_(k)})\) of length \(k\) starting at node \(v \in \mathcal{V}\), the objective is to maximize the likelihood of observing node \(v_{\pi(i)}\) given node \(v\).
This objective can be efficiently trained via stochastic gradient descent in a contrastive learning scenario

in which non-existent walks (so called *negative examples*) are sampled and trained jointly, and \(\sigma\) denotes the \(\textrm{sigmoid}\) function.
Noteworthy, the dot-product \(\mathbf{z}_v^{\top} \mathbf{z}_w\) between the embeddings is usually used to measure similarity, but other similarity measures are applicable as well.

Importantly, shallow node embeddings are trained in an unsupervised fashion, and can eventually be used as input for a given down-stream task, *e.g.*, in node-level tasks \(\mathbf{z}_v\) can directly be used as input to a final classifier.
For edge-level tasks, edge-level representations can be obtained via averaging \(\frac{1}{2} (\mathbf{z}_v + \mathbf{z}_w)\) or via the Hadamard product \(\mathbf{z}_v \odot \mathbf{z}_w\).

Despite the simplicity of node embedding techniques, they are also subject to certain shortcomings. In particular, they fail to incorporate rich feature information attached to nodes and edges, and cannot be trivially applied to unseen graphs as learnable parameters are fixed to the nodes of a particular graph (making this approach transductive by nature and hard-to-scale due to the \(\mathcal{O}(|\mathcal{V}| \cdot d)\) parameter complexity). However, it is still a commonly used technique to preserve structural graph information into fixed-size vectors, and is often times also used to generate inputs to GNNs for further processing in case the initial set of node features is not rich.

## Node2Vec

Note

In this section of the tutorial, we will learn node embeddings for **homogenous graphs** using the `Node2Vec`

module of PyG.
The code is available in examples/node2vec.py and as a Google Colab tutorial notebook.

`Node2Vec`

is a method for learning shallow node embeddings, which allows for flexible
control of random walk procedures based on breadth-first or depth-first samplers.
In particular, its parameter `p`

dictates the likelihood of immediately revisiting a node in the walk, while its parameter `q`

interpolates between breadth-first and depth-first strategies.

To begin the example, let us load in the needed packages and the data that we will be working with:

```
from torch_geometric.nn import Node2Vec
data = Planetoid('./data/Planetoid', name='Cora')[0]
```

We are now ready to initialize our `Node2Vec`

module:

```
import torch
from torch_geometric.nn import Node2Vec
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = Node2Vec(
data.edge_index,
embedding_dim=128,
walks_per_node=10,
walk_length=20,
context_size=10,
p=1.0,
q=1.0,
num_negative_samples=1,
).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
```

`Node2Vec`

takes the graph structure `edge_index`

as input (but none of its feature information), the `embedding_dim`

of the shallow embeddings, and additional parameters to control the random walk and negative sampling procedures.
In particular, `walks_per_node`

and `walk_length`

specify the number of walks to perform for each node and their length, respectively.
The `context_size`

then denotes how many nodes in the walk are actually used for gradient optimization, *i.e* `Node2Vec`

slides over each sampled walk and splits them into windows of size `context_size`

.
As previously mentioned, `p`

and `q`

denote how random walks are generated.
Finally, `num_negative_samples`

specifies how many negative walks we want to generate for each positive walk.

After initializing, we can go ahead and train our `Node2Vec`

model right away.
We start this by creating a data loader that will generate positive and negative random walks for us:

```
loader = model.loader(batch_size=128, shuffle=True, num_workers=4)
```

To generate random walks, we can simply iterate over the data loader, *e.g.*:

```
pos_rw, neg_rw = next(iter(loader))
```

Here, `pos_rw`

will contain the node indices of positive random walks and `neg_rw`

will contain the node indices of negative walks.
In particular, `pos_rw`

is a two-dimensional matrix of shape `[batch_size * walks_per_node * (2 + walk_length - context_size), context_size]`

, and `neg_rw`

is a two-dimensional matrix of shape `[num_negative_samples * pos_rw.size(0), context_size]`

.

Using this `loader`

and the built-in constrastive `loss()`

function, we can define our `train()`

function as follows:

```
def train():
model.train()
total_loss = 0
for pos_rw, neg_rw in loader:
optimizer.zero_grad()
loss = model.loss(pos_rw.to(device), neg_rw.to(device))
loss.backward()
optimizer.step()
total_loss += loss.item()
return total_loss / len(loader)
```

After finishing training, we can obtain the final node embeddings from the model as follows:

```
z = model() # Full node-level embeddings.
z = model(torch.tensor([0, 1, 2])) # Embeddings of first three nodes.
```

## MetaPath2Vec

Note

In this section of the tutorial, we will learn node embeddings for **heterogenous graphs** using the `MetaPath2Vec`

module of PyG.
The code is available as examples/hetero/metapath2vec.py and as a Google Colab tutorial notebook.

An extension of `Node2Vec`

to *heterogeneous graphs* is the `MetaPath2Vec`

model.
`MetaPath2Vec`

works similar to `Node2Vec`

but expects a dictionary of edge indices as input (holding the `edge_index`

for each edge type in the graph), and samples random walks based on a given `metapath`

formulation, *e.g.*,

```
metapath = [
('author', 'writes', 'paper'),
('paper', 'published_in', 'venue'),
('venue', 'publishes', 'paper'),
('paper', 'written_by', 'author'),
]
```

denotes that random walk sampling is performed from author nodes to paper nodes to venue nodes back to paper nodes and author nodes.
Otherwise, initialization and training of the model stays the same as in the `Node2Vec`

case.