Understanding the basics of CNN with image classification.

A breakthrough in building models for image classification came with the discovery that a convolutional neural network (CNN) could be used to progressively extract higher- and higher-level representations of the image content. Instead of preprocessing the data to derive features like textures and shapes, a CNN takes the image's raw pixel data as input, "learns" how to extract these features, and ultimately infers what object they constitute.

In this article, we will learn the basic concepts of CNNs and then implement them on a multiclass image classification problem. We will also discuss in detail how the accuracy and performance of a model can be further improved.

To start with, the CNN receives an input feature map: a three-dimensional matrix where the size of the first two dimensions corresponds to the length and width of the image in pixels. The size of the third dimension is 3 (corresponding to the 3 channels of a color image: red, green, and blue). The CNN comprises a stack of convolution and pooling modules followed by fully connected layers; these three operations are described below.

1. Convolution + ReLU

Mathematically, the convolution operation is the summation of the element-wise product of two matrices. Take two matrices, X (an image patch) and Y (the filter). If you "convolve the image X using filter Y", you first multiply X and Y element by element, which produces the matrix Z. Finally, you compute the sum of all the elements in Z to get a single scalar; in the example from the original figure, 3 + 4 + 0 + 6 + 0 + 0 + 0 + 45 + 2 = 60.

The convolution (Conv) operation, using an appropriate filter, detects certain features in images, such as horizontal or vertical edges. For example, when the first filter of the original illustration is applied, only the middle two columns of the convolution output are nonzero while the two extreme columns (1 and 4) are zero.
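To make the convolution arithmetic and the vertical-edge example concrete, here is a minimal NumPy sketch. The 6×6 image and the filter values (1, 0, -1 in each row) are illustrative assumptions, not the matrices from the original figures. Like CNN libraries, the sketch computes cross-correlation (the filter is not flipped), and it also applies ReLU to show how negative responses are discarded.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Each output value is the sum of the element-wise product of the kernel
    and the image patch under it (strictly, cross-correlation, which is what
    CNN layers compute).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # element-wise product, then sum
    return out


def relu(x):
    """ReLU: keep positive responses, set negative ones to zero."""
    return np.maximum(x, 0)


# Hypothetical 6x6 grayscale image: bright left half (10), dark right half (0).
image = np.array([[10, 10, 10, 0, 0, 0]] * 6, dtype=float)

# Assumed vertical-edge filter: 1s in the left column, 0s in the middle, -1s on the right.
vertical = np.array([[1, 0, -1]] * 3, dtype=float)

feature_map = conv2d_valid(image, vertical)
print(feature_map)  # each row is [0, 30, 30, 0]: only the middle two columns respond

# On the mirrored image (dark left, bright right) the same filter responds with -30,
# and ReLU removes that negative response entirely.
mirrored = image[:, ::-1]
print(relu(conv2d_valid(mirrored, vertical)))  # all zeros
```

A 6×6 input with a 3×3 filter and no padding gives a 4×4 output, which matches the four-column output described above.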
This pattern (nonzero middle columns, zero outer columns) is an example of vertical edge detection. Similarly, the same filter turned on its side, with the 1s placed in a horizontal row and 0s in the middle row, can be used for horizontal edge detection.

During convolution, an image of size 224×224×3 is convolved with a 3×3 filter (spanning all three channels) with a stride of 1. With one pixel of zero padding on each side, the output size is (224 + 2·1 − 3)/1 + 1 = 224, so the result is a 224×224 array; without padding it would be 222×222. This output is passed through the ReLU activation function, which introduces non-linearity by setting negative values to zero, and produces a 224×224 feature map of the image.

2. Pooling + ReLU

The pooling layer looks at larger regions (containing multiple patches) of the image and captures an aggregate statistic (max, average, etc.) of each region, which makes the network invariant to small local transformations. The two most popular aggregate functions used in pooling are 'max' and 'average'.

Max pooling: if any one of the patches says something strongly about the presence of a certain feature, the pooling layer counts that feature as 'detected'.
Average pooling: if one patch says something very firmly but the other ones disagree, the pooling layer takes the average to find out.

3. Fully Connected (FC) layer

The output of the last pooling layer is flattened into a large vector and fed to fully connected layers. The final FC layer uses a softmax activation function, which outputs a probability between 0 and 1 for each of the classification labels the model is trying to predict (the probabilities sum to 1).

Putting these pieces together gives the final convolutional neural network. For more details on the above, please refer here.

Best practices for training a convolutional neural network:

There are various techniques used when training a CNN model to improve accuracy and avoid overfitting.

1. Regularization.

For better generalizability of the model, a very common regularization technique is used, i.e. adding a regularization term to the objective function.
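As a concrete sketch of "adding a regularization term to the objective", the minimal NumPy function below adds λ times a parameter norm f(θ) to an already computed data loss, with either an L1 penalty (sum of absolute values) or an L2 penalty (sum of squares). The function name, the weight arrays and the loss value are hypothetical and serve only as an illustration.

```python
import numpy as np


def regularized_objective(data_loss, params, lam, norm="l2"):
    """Return data_loss + lam * f(params).

    params: list of weight arrays (the model parameters theta).
    norm="l1": f(theta) = sum of the absolute values of all parameters.
    norm="l2": f(theta) = sum of the squares of all parameters.
    """
    if norm == "l1":
        penalty = sum(np.abs(w).sum() for w in params)
    elif norm == "l2":
        penalty = sum(np.square(w).sum() for w in params)
    else:
        raise ValueError(f"unknown norm: {norm!r}")
    return data_loss + lam * penalty


# Hypothetical parameters and data-loss value, for illustration only.
weights = [np.array([[0.5, -1.0], [2.0, 0.0]]), np.array([-0.5])]
data_loss = 1.2

print(regularized_objective(data_loss, weights, lam=0.01, norm="l1"))  # 1.2 + 0.01 * 4.0
print(regularized_objective(data_loss, weights, lam=0.01, norm="l2"))  # 1.2 + 0.01 * 5.5
```

In Keras the same kind of penalty is usually attached per layer, for example `kernel_regularizer=tf.keras.regularizers.l2(0.01)`, which adds 0.01 times the sum of that layer's squared kernel weights to the training loss.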
The regularization term ensures that the model doesn't capture the 'noise' in the dataset, i.e. that it does not overfit the training data:

Objective function = Loss function (error term) + Regularization term

Hence the objective function can be written as:

$$\text{Objective function} = L(F(x_i), \theta) + \lambda f(\theta)$$

where $L(F(x_i), \theta)$ is the loss function expressed in terms of the model output $F(x_i)$ and the model parameters $\theta$. The second term $\lambda f(\theta)$ has two components: the regularization parameter $\lambda$ and the parameter norm $f(\theta)$.

There are broadly two types of regularization techniques used in CNNs (very similar to the ones used in linear regression):

L1 norm: $f(\theta) = \lVert\theta\rVert_1 = \sum_j |\theta_j|$, the sum of the absolute values of all the model parameters.
L2 norm: $f(\theta) = \lVert\theta\rVert_2^2 = \sum_j \theta_j^2$, the sum of squares of all the model parameters.

2. Dropout.

A dropout operation is performed by multiplying each column of the weight matrix $W^l$ by the corresponding entry of a mask vector $\alpha$ (equivalently, computing $W^l \operatorname{diag}(\alpha)$). For a weight matrix with three columns, $\alpha$ has shape (3, 1). If q, the probability of an entry being 1 (i.e. of keeping a neuron), is about 0.66, the $\alpha$ vector will contain two 1s and one 0 on average. In the case of exactly two 1s, $\alpha$ can be any of the following three: [1 1 0], [1 0 1] or [0 1 1].

One of these vectors is then chosen randomly in each mini-batch. Say that in some mini-batch the mask $\alpha = [1\ 1\ 0]$ is chosen. In the new weight matrix, all elements in the last column then become zero, so the neuron connected through that column is switched off for this mini-batch. Because the dropped neurons are chosen at random rather than by importance, no neuron can rely on the presence of particular other neurons, which forces the network to learn more robust features.

3. Batch Normalization.

This technique allows each layer of a neural network to learn a little more independently of the previous layers. For example, in a feed-forward neural network

$$h_4 = \sigma(W_4 h_3 + b_4) = \sigma(W_4\,\sigma(W_3\,\sigma(W_2\,\sigma(W_1 x + b_1) + b_2) + b_3) + b_4)$$

$h_4$ is a composite function of all the previous layers ($h_1, h_2, h_3$). Hence when we update the weights (say) $W_4$, it affects the output $h_4$, which in turn affects the gradient $\partial L/\partial W_5$.
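This dependency chain can be checked numerically. The minimal NumPy sketch below assumes a sigmoid activation for σ, a tiny random network with hypothetical sizes, and (for layer 5 only) a linear output with squared-error loss, so that ∂L/∂W5 = (W5·h4 + b5 − y)·h4ᵀ has a closed form. Changing only W4 shifts h4, and with it the gradient for W5, even though W5 itself is untouched.

```python
import numpy as np


def sigma(z):
    """Activation assumed to be the logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(W, b, x):
    """h_l = sigma(W_l . h_{l-1} + b_l) for l = 1..4, starting from h_0 = x; returns h_4."""
    h = x
    for l in range(1, 5):
        h = sigma(W[l] @ h + b[l])
    return h


def grad_W5(W5, b5, h4, y):
    """Gradient of L = 0.5 * ||W5 . h4 + b5 - y||^2 with respect to W5.

    Layer 5 is kept linear for simplicity: dL/dW5 = (W5 . h4 + b5 - y) h4^T,
    so the gradient depends directly on h4.
    """
    residual = W5 @ h4 + b5 - y
    return np.outer(residual, h4)


rng = np.random.default_rng(0)
# Hypothetical tiny network: 4 inputs, four hidden layers of width 3, 2 outputs.
x = rng.normal(size=4)
W = {1: rng.normal(size=(3, 4)), 2: rng.normal(size=(3, 3)),
     3: rng.normal(size=(3, 3)), 4: rng.normal(size=(3, 3))}
b = {l: np.zeros(3) for l in range(1, 5)}
W5, b5, y = rng.normal(size=(2, 3)), np.zeros(2), np.array([1.0, 0.0])

h4_before = forward(W, b, x)
g_before = grad_W5(W5, b5, h4_before, y)

# Change only W4, as a stand-in for one gradient update of layer 4.
W_new = dict(W)
W_new[4] = W[4] - 0.5
h4_after = forward(W_new, b, x)
g_after = grad_W5(W5, b5, h4_after, y)

print(h4_before, h4_after)  # h4 (the input to layer 5) has shifted
print(np.abs(g_after - g_before).max())  # so has dL/dW5, although W5 itself is unchanged
```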
Ideally, the updates made to $W_5$ should not be affected by the updates made to $W_4$. Batch normalization addresses this by normalizing the output of each layer, $H^{(l)}$, over each mini-batch: the output is normalized (per feature) using the mean vector $\mu$ and the standard deviation vector $\hat{\sigma}$ computed across the batch,

$$\hat{H}^{(l)} = \frac{H^{(l)} - \mu}{\hat{\sigma}}$$

With these techniques in mind, we will now train our CNN on the CIFAR-10 dataset.

Data:

The CIFAR-10 dataset contains 60,000 RGB images of size (32, 32, 3), split evenly across 10 classes (6,000 images per class); 50,000 images are used for training and 10,000 for testing. The 10 classes are airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck. The dataset can be downloaded directly through the Keras API.

Problem statement:

To experiment with the following hyperparameters and architectures for better accuracy on the CIFAR-10 dataset, and to draw insights from the results:

- Adding and removing dropouts in convolutional layers
- Batch Normalization (BN)
- L2 regularisation
- Increasing the number of convolution layers
- Increasing the number of filters in certain layers

Approach:

To start with, we build a simple model that is trained on the training set and evaluated on the test set, set to run for 100 epochs with the number of classes set to 10. A simple sequential network is built with 2 convolution layers having 32 feature maps each, followed by an activation layer and a pooling layer.

1. Dropouts after Conv and FC layers

A dropout of 0.25 is set after the convolution layers and 0.5 after the FC layers. A training accuracy of 84% and a validation accuracy of 79% are achieved.

2. Remove the dropouts after the convolutional layers (but retain them in the FC layer) and use batch normalization (BN) after every convolutional layer.

Training accuracy ~98% and validation accuracy ~79%. This is now a case of overfitting, as we have removed the dropouts: the high training accuracy shows that the model has learned (largely memorized) the training data.

3. Use dropouts after Conv and FC layers, use BN:

Training accuracy ~89%, validation accuracy ~82%. There is a significant improvement in validation accuracy, with a reduced difference between training and validation accuracy.
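Experiments 1 to 3 differ only in where dropout and batch normalization are applied, so a minimal NumPy sketch of what the two operations do to a layer's output during training may help. It assumes inverted dropout (kept units are rescaled by 1/(1 − rate) during training, which is the behaviour of Keras's `Dropout` layer) and omits the learned scale and shift and the inference-time moving averages that real BN layers such as Keras's `BatchNormalization` add; eps = 1e-5 is an arbitrary choice. The last lines check that masking a column of W^l, as in the dropout explanation, is the same as switching off the matching input neuron.

```python
import numpy as np


def batch_norm(H, eps=1e-5):
    """Normalize a mini-batch H (shape: batch x features) per feature using the
    batch mean mu and standard deviation sigma_hat; eps guards against division by zero."""
    mu = H.mean(axis=0)
    sigma_hat = np.sqrt(H.var(axis=0) + eps)
    return (H - mu) / sigma_hat


def dropout(h, rate, rng):
    """Inverted dropout at training time: zero each unit with probability `rate`
    and scale the survivors by 1 / (1 - rate) so the expected activation is unchanged."""
    keep = rng.random(h.shape) >= rate  # True with probability 1 - rate
    return h * keep / (1.0 - rate)


rng = np.random.default_rng(42)

# Hypothetical batch of 64 layer outputs with 10 features, far from zero mean / unit std.
H = rng.normal(loc=5.0, scale=3.0, size=(64, 10))
H_hat = batch_norm(H)
print(H_hat.mean(axis=0).round(6))  # approximately 0 for every feature
print(H_hat.std(axis=0).round(3))   # approximately 1 for every feature

# Dropout with rate 0.5 (the rate used after the FC layers in the experiments).
out = dropout(np.ones(1000), rate=0.5, rng=rng)
print(out[:10])  # each entry is 0.0 (dropped) or 2.0 (kept and rescaled)

# The mask-on-columns view from the dropout explanation: zeroing column j of W^l is
# the same as switching off input neuron j, i.e. (W^l diag(alpha)) h == W^l (alpha * h).
W_l = rng.normal(size=(4, 3))
h = rng.normal(size=3)
alpha = np.array([1.0, 1.0, 0.0])  # the mask [1 1 0]
print(np.allclose((W_l * alpha) @ h, W_l @ (alpha * h)))  # True
```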
With dropout and BN both in place, the model is able to generalize well.

4. Remove dropouts from Conv layers, use L2 + dropouts in FC, use BN:

Training accuracy ~94%, validation accuracy ~76%. There is a significant gap between training and validation accuracy. L2 regularization only pushes the redundant weights down; here it is not as effective as using dropout alone.

5. Dropouts after Conv layers, L2 in FC, use BN after convolutional layers:

Training accuracy ~86%, validation accuracy ~83%. The gap has reduced and the model is no longer overfitting, but the model needs more capacity to classify images correctly. Hence we add more layers from here on.

6. Add a new convolutional layer to the network.

Along with regularization and dropout, a new convolution layer is added to the network.

Training accuracy ~89%, validation accuracy ~84%. Although both training and validation accuracy increased, the extra layer also increases computational time and resource use.

7. Adding feature maps.

Add more feature maps to the Conv layers: from 32 to 64 and from 64 to 128.

Instead of adding an extra layer, here we add more feature maps to the existing convolutional layers. The choice between the two approaches is situational:

- Add an extra layer when you feel your network needs more abstraction.
- Add more feature maps when the existing network is not able to grasp existing features of an image, such as color and texture, well.

Training accuracy ~92%, validation accuracy ~84%. Although the accuracy improved, the gap between training and validation accuracy still reflects overfitting. When more feature maps are added, the model tends to overfit more than when a new convolutional layer is added. This suggests that the task requires learning to extract more (new) abstract features, by making the network deeper, rather than extracting more of the same features.

Conclusion:

The performance of CNNs depends heavily on multiple hyperparameters: the number of layers, the number of feature maps in each layer, the use of dropouts, batch normalization, etc.
Thus, it's advisable to first fine-tune your model's hyperparameters by conducting many experiments. Once the right set of hyperparameters is found, the model should be trained for a larger number of epochs.

The source code that created this post can be found here. I would be pleased to receive feedback or questions on any of the above.

"Understanding the basics of CNN with image classification" was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.
