Introduction to GraphSAGE
GraphSAGE (Graph SAmple and aggreGatE) is a widely used graph neural network model for node-level representation learning. It computes a node's representation by sampling its neighbors, aggregating their features, and combining the aggregate with the node's own features. Repeating this update over several rounds (layers) lets each node draw on information from a progressively larger neighborhood.
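
To make the sample-and-aggregate idea concrete, here is a minimal sketch of one update round on a toy graph in plain Python. It is a simplification: the learned weight matrix and the nonlinearity of a real GraphSAGE layer are left out, and the function name `sage_mean_step` and the toy data are made up for illustration.

```python
import random

def sage_mean_step(features, neighbors, num_samples, rng):
    """One simplified round: sample neighbors, average them, concatenate with self.

    Assumption: no learned weights or activation, only the data flow.
    """
    updated = {}
    for node, feat in features.items():
        nbrs = neighbors[node]
        # Sample at most num_samples neighbors without replacement.
        sampled = rng.sample(nbrs, min(num_samples, len(nbrs)))
        if sampled:
            agg = [sum(features[n][d] for n in sampled) / len(sampled)
                   for d in range(len(feat))]
        else:
            agg = [0.0] * len(feat)  # isolated node: nothing to aggregate
        # Concatenate the node's own features with the neighbor aggregate.
        updated[node] = feat + agg
    return updated

# Toy graph: node 0 is connected to nodes 1 and 2.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1, 2], 1: [0], 2: [0]}
h1 = sage_mean_step(features, neighbors, num_samples=2, rng=random.Random(0))
print(h1[0])  # [1.0, 0.0, 0.5, 1.0]
```

Applying the step a second time to `h1` would mix in information from nodes two hops away, which is what stacking layers achieves in the full model.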
Implementation steps
Data preparation
In this implementation, we again use the Cora dataset as the test example. Because GraphSAGE mainly updates the features of individual nodes, the dataset needs no special handling; it only has to be in PyG format.
    import torch
    import torch.nn.functional as F
    from torch_geometric.datasets import Planetoid
    from torch_geometric.utils import from_networkx, to_networkx

    # Load the Cora dataset
    dataset = Planetoid(root='./cora', name='Cora')
    data = dataset[0]

    # Round-trip the graph through NetworkX and back into PyG format.
    # node_attrs keeps the features and labels, which would otherwise be dropped.
    # (Planetoid already returns a PyG Data object, so this step is optional.)
    graph = to_networkx(data, node_attrs=['x', 'y'])
    data = from_networkx(graph)

    # Get the number of nodes, the feature dimension, and the number of classes
    num_nodes = data.num_nodes
    num_features = dataset.num_features
    num_classes = dataset.num_classes

    # Create non-overlapping train/validation/test node masks
    data.train_mask = torch.zeros(num_nodes, dtype=torch.bool)
    data.val_mask = torch.zeros(num_nodes, dtype=torch.bool)
    data.test_mask = torch.zeros(num_nodes, dtype=torch.bool)
    data.train_mask[:num_nodes - 2000] = True  # stop before the validation nodes
    data.val_mask[num_nodes - 2000: num_nodes - 1000] = True
    data.test_mask[-1000:] = True
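
A split like this is only meaningful if the three masks do not overlap. The following sketch builds the masks as plain Python lists and checks that every node lands in exactly one split. The helper `make_split_masks` is hypothetical, not part of PyG; the sizes assume Cora's 2708 nodes with 1000 validation and 1000 test nodes.

```python
def make_split_masks(num_nodes, num_val, num_test):
    """Return (train, val, test) boolean lists that partition range(num_nodes)."""
    num_train = num_nodes - num_val - num_test
    if num_train <= 0:
        raise ValueError("validation and test sets leave no training nodes")
    train = [i < num_train for i in range(num_nodes)]
    val = [num_train <= i < num_train + num_val for i in range(num_nodes)]
    test = [i >= num_nodes - num_test for i in range(num_nodes)]
    return train, val, test

# Cora has 2708 nodes; the last 1000 are test nodes, the 1000 before them validation.
train, val, test = make_split_masks(2708, num_val=1000, num_test=1000)

# Every node must belong to exactly one split.
assert all(t + v + s == 1 for t, v, s in zip(train, val, test))
print(sum(train), sum(val), sum(test))  # 708 1000 1000
```

The same check can be run on the tensor masks by summing them elementwise and confirming that every entry equals 1.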
Model implementation
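Next, we define the GraphSAGE model, here built from two stacked SAGEConv layers. Unlike a GCN layer, which averages over a node's neighborhood with degree-normalized weights and a single weight matrix, a GraphSAGE layer aggregates the features of a node's (optionally sampled) neighbors and combines the aggregate with the node's own features through separate weights. In the full-batch setting used here, every neighbor takes part in the aggregation and no sampling takes place.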
    import torch
    import torch.nn.functional as F
    from torch_geometric.nn import SAGEConv

    class GraphSAGE(torch.nn.Module):
        def __init__(self, hidden_channels, num_layers):
            super(GraphSAGE, self).__init__()
            # ModuleList registers every layer's parameters with the model
            self.convs = torch.nn.ModuleList()
            for i in range(num_layers):
                in_channels = hidden_channels if i != 0 else num_features
                out_channels = num_classes if i == num_layers - 1 else hidden_channels
                self.convs.append(SAGEConv(in_channels, out_channels))

        def forward(self, x, edge_index):
            # Hidden layers: aggregation followed by ReLU
            for conv in self.convs[:-1]:
                x = F.relu(conv(x, edge_index))
            # The last layer doesn't use an activation function
            x = self.convs[-1](x, edge_index)
            # Per-class log-probabilities
            return F.log_softmax(x, dim=-1)
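
The channel sizes chosen inside the constructor's loop can be listed explicitly. The sketch below repeats only that sizing logic in plain Python, using Cora's 1433 input features and 7 classes. The helper name `sage_layer_dims` is hypothetical.

```python
def sage_layer_dims(num_features, hidden_channels, num_classes, num_layers):
    """List the (in_channels, out_channels) pair of each layer, first to last."""
    dims = []
    for i in range(num_layers):
        in_channels = num_features if i == 0 else hidden_channels
        out_channels = num_classes if i == num_layers - 1 else hidden_channels
        dims.append((in_channels, out_channels))
    return dims

dims_two = sage_layer_dims(1433, 256, 7, num_layers=2)
dims_three = sage_layer_dims(1433, 256, 7, num_layers=3)
print(dims_two)    # [(1433, 256), (256, 7)]
print(dims_three)  # [(1433, 256), (256, 256), (256, 7)]
```

Each extra layer widens the neighborhood a node can see by one hop, so the two-layer model used in this article mixes information from nodes up to two hops away.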
In the above code, we stack multiple SAGEConv layers, each of which performs the neighbor aggregation. ReLU is applied after every layer except the last, and log-softmax turns the final layer's output into per-class log-probabilities.
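
Because the model returns log-probabilities rather than raw scores, the matching training loss is the negative log-likelihood of the correct class. The sketch below shows both pieces in plain Python for a single node. The function names mirror the PyTorch ones, but these are illustrative reimplementations, not the library code.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)  # subtract the maximum to avoid overflow in exp
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def nll_loss(log_probs, target):
    """Negative log-likelihood of the target class."""
    return -log_probs[target]

logits = [2.0, 0.5, -1.0]  # raw scores of the last layer for one node
log_probs = log_softmax(logits)
loss = nll_loss(log_probs, target=0)

# The exponentiated outputs form a probability distribution.
print(sum(math.exp(lp) for lp in log_probs))
print(loss)
```

Log-softmax does not change which class has the highest score, so taking the argmax of the model output still gives the predicted class.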
Model training
After defining the model, we can train it on the Cora dataset. First we specify the optimizer and the loss function and set the training hyperparameters, such as the number of epochs and the learning rate. Training is full-batch: every epoch processes the whole graph at once, so no batch size is needed.
    # Initialize GraphSAGE and specify parameters
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    num_layers = 2
    hidden_channels = 256
    model = GraphSAGE(hidden_channels, num_layers).to(device)
    data = data.to(device)
    # Adam is assumed here; any torch.optim optimizer can be substituted
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    # The model returns log-probabilities, so NLLLoss is the matching loss
    loss_func = torch.nn.NLLLoss()

    # Training process
    for epoch in range(500):
        model.train()
        optimizer.zero_grad()
        out = model(data.x, data.edge_index)
        loss = loss_func(out[data.train_mask], data.y[data.train_mask])
        loss.backward()
        optimizer.step()

        # Check the test accuracy every 10 epochs
        if epoch % 10 == 0:
            model.eval()
            with torch.no_grad():
                pred = model(data.x, data.edge_index).argmax(dim=1)
            correct = pred[data.test_mask].eq(data.y[data.test_mask]).sum().item()
            acc = correct / data.test_mask.sum().item()
            print("Epoch {:03d}, Train Loss {:.4f}, Test Acc {:.4f}".format(
                epoch, loss.item(), acc))
In the above code, we fit the GraphSAGE model on the labeled training nodes by gradient-based optimization of the loss, and report the accuracy on the test nodes every 10 epochs.
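
The accuracy reported during training only counts the nodes selected by a mask. A minimal sketch of that computation in plain Python follows. The helper `masked_accuracy` and the toy predictions are made up for illustration.

```python
def masked_accuracy(pred, labels, mask):
    """Fraction of correct predictions among the nodes selected by mask."""
    selected = [(p, y) for p, y, m in zip(pred, labels, mask) if m]
    if not selected:
        raise ValueError("mask selects no nodes")
    correct = sum(1 for p, y in selected if p == y)
    return correct / len(selected)

pred = [0, 2, 1, 1, 0]         # predicted class per node
labels = [0, 1, 1, 1, 2]       # true class per node
test_mask = [False, False, True, True, True]

acc = masked_accuracy(pred, labels, test_mask)
print("Test Acc {:.4f}".format(acc))  # Test Acc 0.6667
```

The validation mask created during data preparation is not used by the training loop. One possible use is to apply the same computation to the validation nodes and keep the epoch with the best validation accuracy, so that the test nodes play no part in model selection.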
That concludes this walkthrough of implementing GraphSAGE with PyTorch and PyG. For more on the topic, please see my other related articles!