
PyTorch PyG implementation of EdgePool graph classification

Introduction to EdgePool

EdgePool is a graph pooling method for graph neural networks, used here for graph classification. Its main idea is to coarsen the input graph by contracting edges: pairs of connected nodes are scored and merged, which roughly halves the graph size at each pooling step, reducing space complexity while preserving graph structure and improving classification performance.
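
To make this concrete, here is a minimal sketch of a single EdgePooling step on a hypothetical 4-node toy graph (the graph and feature sizes are made up for illustration):

import torch
from torch_geometric.nn import EdgePooling

# A toy path graph 0-1-2-3 with 8-dimensional node features;
# edge_index stores both directions of every edge, as PyG expects
x = torch.randn(4, 8)
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],
                           [1, 0, 2, 1, 3, 2]])
batch = torch.zeros(4, dtype=torch.long)  # all four nodes belong to graph 0

pool = EdgePooling(in_channels=8)
new_x, new_edge_index, new_batch, unpool_info = pool(x, edge_index, batch)
print(new_x.size(0))  # roughly half as many nodes after edge contraction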

Implementation steps

Data preparation

Generally speaking, when working with larger datasets we need to normalize and clean the data before any semantic analysis or deep learning can happen. For graph datasets, this is typically done with a dedicated framework or tool library such as PyG.

import torch
from torch_geometric.datasets import MNISTSuperpixels
from torch_geometric.loader import DataLoader

# Load the MNIST superpixel graphs; the dataset ships with a fixed
# 60,000-graph training split and a 10,000-graph test split
train_dataset = MNISTSuperpixels(root='./mnist', train=True)
test_dataset = MNISTSuperpixels(root='./mnist', train=False)
# Define hyperparameters
num_features = train_dataset.num_features
num_classes = train_dataset.num_classes
# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
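
As a quick sanity check, one can inspect a single mini-batch: the loader collates 32 graphs into one large disconnected graph, and the batch vector (used later by the pooling layers) maps every node back to the graph it came from.

samples = next(iter(train_loader))
print(samples.num_graphs)   # 32 graphs in the batch
print(samples.x.shape)      # [total_nodes_in_batch, num_features]
print(samples.batch.shape)  # [total_nodes_in_batch]; entry i is the graph id of node i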

Model implementation

When defining the EdgePool model, we interleave graph convolutions with edge-pooling (down-sampling) layers so that the network learns representations at progressively coarser graph resolutions, giving it more expressive power to capture complex relationships.

import torch
import torch.nn.functional as F
from torch.nn import Linear, BatchNorm1d
from torch_geometric.nn import GCNConv, EdgePooling, global_mean_pool

class EdgePool(torch.nn.Module):
    def __init__(self, dataset):
        super(EdgePool, self).__init__()
        # Define the input and output dimensions
        self.input_dim = dataset.num_features
        self.hidden_dim = 128
        self.output_dim = dataset.num_classes
        # Define convolutional, normalization, and pooling layers, etc.
        self.conv1 = GCNConv(self.input_dim, self.hidden_dim)
        self.norm1 = BatchNorm1d(self.hidden_dim)
        self.pool1 = EdgePooling(self.hidden_dim)
        self.conv2 = GCNConv(self.hidden_dim, self.hidden_dim)
        self.norm2 = BatchNorm1d(self.hidden_dim)
        self.pool2 = EdgePooling(self.hidden_dim)
        self.conv3 = GCNConv(self.hidden_dim, self.hidden_dim)
        self.norm3 = BatchNorm1d(self.hidden_dim)
        self.pool3 = EdgePooling(self.hidden_dim)
        self.lin = Linear(self.hidden_dim, self.output_dim)

    def forward(self, x, edge_index, batch):
        x = F.relu(self.norm1(self.conv1(x, edge_index)))
        x, edge_index, batch, _ = self.pool1(x, edge_index, batch)
        x = F.relu(self.norm2(self.conv2(x, edge_index)))
        x, edge_index, batch, _ = self.pool2(x, edge_index, batch)
        x = F.relu(self.norm3(self.conv3(x, edge_index)))
        x, edge_index, batch, _ = self.pool3(x, edge_index, batch)
        x = global_mean_pool(x, batch)
        x = self.lin(x)
        return x

In the code above, we combine several neural-network building blocks such as convolutional, pooling, and fully connected layers to build the EdgePool model. Each GCNConv layer keeps a hidden size of 128; BatchNorm1d speeds up convergence and improves the generalization ability of the network. EdgePooling scores the edges of the current graph and contracts a maximal set of them, merging the two endpoints of each contracted edge into a single node, so each pooling step roughly halves the graph. It returns the coarsened node features, edge indices, and batch vector, plus an unpool_info object that records the mapping from the pooled graph back to the full graph; since we only need a graph-level prediction here, that last return value is discarded.
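
If a task later needs the original resolution (for example, node-level prediction), the unpool_info object can be passed back to EdgePooling's unpool method. A minimal sketch, again on a hypothetical toy graph:

import torch
from torch_geometric.nn import EdgePooling

x = torch.randn(4, 8)
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],
                           [1, 0, 2, 1, 3, 2]])
batch = torch.zeros(4, dtype=torch.long)

pool = EdgePooling(in_channels=8)
pooled_x, pooled_edge_index, pooled_batch, unpool_info = pool(x, edge_index, batch)
# unpool() restores the original node count by copying each merged node's
# features back to the nodes it was contracted from
unpooled_x, unpooled_edge_index, unpooled_batch = pool.unpool(pooled_x, unpool_info)
print(unpooled_x.size(0))  # back to 4 nodes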

Model training

After defining the EdgePool network structure, we need to choose an appropriate optimizer and loss function and set hyperparameters such as the number of training epochs, the batch size, and the learning rate. It is also useful to record log information during training for later tracking and debugging.

# Define the training plan: loss function, optimizer, and number of epochs
train_epochs = 50
learning_rate = 0.01
edge_pool = EdgePool(train_dataset)
criterion = torch.nn.CrossEntropyLoss()
# The original text does not name the optimizer; Adam is a common choice here
optimizer = torch.optim.Adam(edge_pool.parameters(), lr=learning_rate)
losses_per_epoch = []
accuracies_per_epoch = []
for epoch in range(train_epochs):
    running_loss = 0.0
    running_corrects = 0.0
    for samples in train_loader:
        optimizer.zero_grad()
        x, edge_index, batch = samples.x, samples.edge_index, samples.batch
        out = edge_pool(x, edge_index, batch)
        label = samples.y
        loss = criterion(out, label)
        loss.backward()
        optimizer.step()
        # Accumulate dataset-averaged loss and accuracy for this epoch
        running_loss += loss.item() * samples.num_graphs / len(train_loader.dataset)
        pred = out.argmax(dim=1)
        running_corrects += (pred == label).sum().item() / len(train_loader.dataset)
    losses_per_epoch.append(running_loss)
    accuracies_per_epoch.append(running_corrects)
    if (epoch + 1) % 10 == 0:
        print("Train Epoch {}/{} Loss {:.4f} Accuracy {:.4f}".format(
            epoch + 1, train_epochs, running_loss, running_corrects))

During training we iterate over each batch, optimize the model with backpropagation, and update the running loss and accuracy. To make visualization and record-keeping easier, we also store the per-epoch loss and accuracy in the corresponding lists for later analysis.
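
Finally, the test_loader built earlier can be used to measure held-out accuracy with the same forward pass; a minimal evaluation sketch:

# Evaluate on the test split (no gradient tracking needed)
edge_pool.eval()
correct = 0
with torch.no_grad():
    for samples in test_loader:
        out = edge_pool(samples.x, samples.edge_index, samples.batch)
        pred = out.argmax(dim=1)
        correct += (pred == samples.y).sum().item()
print("Test accuracy: {:.4f}".format(correct / len(test_loader.dataset)))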

The above is the detailed walkthrough of implementing EdgePool graph classification with PyTorch PyG.