
How Do I Modify This Pytorch Convolutional Neural Network To Accept A 64 X 64 Image And Properly Output Predictions?

I took this convolutional neural network (CNN) from here. It accepts 32 x 32 images and defaults to 10 classes. However, I have 64 x 64 images with 500 classes. When I pass in 64 x …

Solution 1:

The problem is an incompatible reshape (view) at the end.

You're using a sort of "flattening" at the end, which is different from a "global pooling". Both are valid for CNNs, but only the global poolings are compatible with any image size.

The flattened net (your case)

In your case, with a flatten, you need to keep track of all image dimensions in order to know how to reshape at the end.

So:

  • Enter with 64x64
  • Pool1 to 32x32
  • Pool2 to 16x16
  • Pool3 to 8x8
  • AvgPool to 2x2

Then, at the end you've got a shape of (batch, 128, 2, 2). That is four times as many features as you would get from a 32 x 32 input.

Then, your final reshape should be output = output.view(-1,128*2*2).

This is a different net with a different classification layer, though, because in_features=512.
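To see where 128*2*2 comes from, you can trace the spatial size through each pooling layer. A minimal sketch, assuming the pools from the linked model (three 2x2 max pools and a final 4x4 average pool, no padding):

```python
def after_pool(size, kernel, stride=None):
    # Output spatial size of a pooling layer with no padding
    stride = stride if stride is not None else kernel
    return (size - kernel) // stride + 1

size = 64
for kernel in (2, 2, 2, 4):  # pool1, pool2, pool3, avgpool
    size = after_pool(size, kernel)

flat_features = 128 * size * size  # 128 channels after the last unit
print(size, flat_features)  # 2 512
```

With a 32 x 32 input the same chain ends at 1 x 1, which is why the original classifier used in_features=128.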

The global pooling net

On the other hand, you could use the same model, same layers and same weights for any image size >= 32 if you replace the last pooling with a global pooling:

def flatChannels(x):
    size = x.size()
    return x.view(size[0], size[1], size[2] * size[3])

def globalAvgPool2D(x):
    return flatChannels(x).mean(dim=-1)

def globalMaxPool2D(x):
    # max(dim=-1) returns (values, indices); keep only the values
    return flatChannels(x).max(dim=-1)[0]
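The idea behind these helpers is simply "flatten the spatial dimensions, then reduce", which makes the output independent of H and W. A torch-free NumPy sketch of the same computation (the function names here are mine, for illustration):

```python
import numpy as np

def global_avg_pool_2d(x):
    # x: (batch, channels, H, W) -> (batch, channels)
    b, c, h, w = x.shape
    return x.reshape(b, c, h * w).mean(axis=-1)

def global_max_pool_2d(x):
    # x: (batch, channels, H, W) -> (batch, channels)
    b, c, h, w = x.shape
    return x.reshape(b, c, h * w).max(axis=-1)

x = np.arange(2 * 3 * 4 * 4, dtype=float).reshape(2, 3, 4, 4)
avg = global_avg_pool_2d(x)
mx = global_max_pool_2d(x)
print(avg.shape, mx.shape)  # (2, 3) (2, 3)
```

Whatever the input resolution, the result is always (batch, channels), so the following nn.Linear(128, num_classes) works for any image size the convolutions can handle.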

The ending of the model:

# removed the pool from here to put it in forward
    self.net = nn.Sequential(self.unit1, self.unit2, self.unit3, self.pool1, self.unit4,
                             self.unit5, self.unit6, self.unit7, self.pool2, self.unit8,
                             self.unit9, self.unit10, self.unit11, self.pool3,
                             self.unit12, self.unit13, self.unit14)

    self.fc = nn.Linear(in_features=128, out_features=num_classes)


def forward(self, input):
    output = self.net(input)
    output = globalAvgPool2D(output)  # or globalMaxPool2D
    output = self.fc(output)
    return output

Solution 2:

You need to use the transforms module before training the neural network (here is the link: https://pytorch.org/docs/stable/torchvision/transforms.html).

You have a few options:

  1. transforms.Resize(32),

  2. transforms.RandomResizedCrop(32) - most preferable, because it also augments your data, which helps prevent overfitting to some extent.

  3. transforms.CenterCrop(32), etc.

Moreover, you can compose several transforms into one object via transforms.Compose.
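To illustrate what a center crop does, independent of torchvision, here is a NumPy sketch that cuts the central 32 x 32 region out of a 64 x 64 image array:

```python
import numpy as np

def center_crop(img, out_size):
    # img: (H, W, C); return the central out_size x out_size region
    h, w = img.shape[:2]
    top = (h - out_size) // 2
    left = (w - out_size) // 2
    return img[top:top + out_size, left:left + out_size]

img = np.zeros((64, 64, 3), dtype=np.uint8)
crop = center_crop(img, 32)
print(crop.shape)  # (32, 32, 3)
```

transforms.CenterCrop(32) performs the same operation on PIL images (or tensors, in recent torchvision versions) inside the data-loading pipeline.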

Enjoy.

PS. Of course, you can also refactor your neural network architecture so that it accepts images of size 64 x 64 directly.
