The preliminary results of network training on Mark's computer are shown below. The graph plots network accuracy against how long each network trained (the x-axis is epochs trained), with each network reporting accuracy on both the training and validation datasets. The first network trained was "top-level" (the orange and dark blue lines), which took 3h37m to reach 20 epochs; this was before the process was using the GPU. Next, the "test" network was trained once we got the GPU processing all the data, which cut the training time down to a mere 31m! Finally, "non-frozen" used a different training scheme, which will be explained later in this post.
Overfitting
With these two networks, however, it was clear that 20 epochs of training was not necessary. One of the most important qualities of a neural network is its ability to generalize to data it has not previously seen, which is why it is so important to have a training set and a validation set to compare. Take the "test" network, where the training accuracy is shown by the red line and the validation accuracy by the light blue line. The network's accuracy on both sets continues to improve, but its accuracy on the training set surpasses that of the validation set around epoch 4, and continues to improve more rapidly than the validation accuracy through epochs 8-20.
This kind of behaviour points to something called "overfitting": the network is no longer learning general patterns, but instead just memorizing the specific qualities of each image in the training dataset. Because of this, the network's accuracy improves significantly on the training images, but not so much on new images (such as those in the validation set). By epoch 20 the networks have overfit, and the growing discrepancy between training and validation accuracy indicates that these networks would not perform well on new data.
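The divergence described above can be spotted programmatically. Here's a minimal sketch (the per-epoch accuracy values and the 0.05 gap threshold are made up for illustration, loosely mimicking the "test" network's curves) of flagging the first epoch where training accuracy pulls ahead of validation accuracy:

```python
# Hypothetical per-epoch accuracies, roughly shaped like the "test" network's curves.
train_acc = [0.60, 0.70, 0.76, 0.80, 0.84, 0.87, 0.90, 0.92]
val_acc = [0.58, 0.68, 0.74, 0.77, 0.78, 0.79, 0.79, 0.78]

def first_overfit_epoch(train, val, gap=0.05):
    """Return the first epoch (1-indexed) where training accuracy
    exceeds validation accuracy by more than `gap` -- a crude signal
    that the network has started memorizing the training set."""
    for epoch, (t, v) in enumerate(zip(train, val), start=1):
        if t - v > gap:
            return epoch
    return None  # no overfitting signal within the recorded epochs

print(first_overfit_epoch(train_acc, val_acc))  # flags epoch 5 for this data
```

In practice you'd use something like Keras's EarlyStopping callback to halt training automatically, but the idea is the same: watch the train/validation gap, not just the training curve.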
"Non-frozen"
Now, I know you've been dying to find out what those other, significantly steeper pink and green lines are. Well, I won't keep you in suspense any longer: this network is called "non-frozen", a name that refers to the layers in MobileNet V2. With the previous two networks, the only weight updates being made were to the few layers superimposed on top of MobileNet V2 (see this previous blog post for the test architecture used). However, thanks to the power of TensorFlow, it is extremely easy to let backpropagation update the pre-existing MobileNet V2 weights as well. "Un-freezing" the last 100 layers of MobileNet V2 (yes, it's huge) gave us the accuracy seen in the pink and green lines on the graph. And, like the previous networks, even this one showed signs of overfitting after 3 epochs, so we stopped training there.
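For the curious, un-freezing layers in Keras really is just a couple of lines. The sketch below shows the general pattern, not our exact setup: the classifier head, class count, and learning rate here are hypothetical, and only the "freeze all but the last 100 layers" part reflects what's described above.

```python
import tensorflow as tf

# Load MobileNet V2 without its classification head, using ImageNet weights.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")

# Mark the whole base as trainable, then re-freeze everything
# except the last 100 layers so backpropagation only updates those.
base.trainable = True
for layer in base.layers[:-100]:
    layer.trainable = False

# Illustrative head; 10 output classes is a made-up number.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# A small learning rate matters when fine-tuning: large updates
# can destroy the useful pre-trained features.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

The low learning rate is the part people most often get wrong; with the default Adam rate, the un-frozen layers can drift far from their pre-trained values in just a few batches.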
This is just a beginning step, but it is a promising one. Now, we can focus on amassing exorbitant amounts of data from the internet to train the most accurate produce-recognizing system this world has ever seen. Or at least one that's >90% accurate. Whichever comes first.