This post shows how to build a CNN feature extractor (VGG_CNN_S). I had been thinking about this kind of project for a long time, and now I finally had time to make it real. You might want to take a quick look at the previous parts of this deep learning series: Part 1: setting up the environment and Part 2: run a pretrained CNN model.
How does a CNN feature extractor work?
Normally, a neural net classifier works like this:
- Feed in the photo
- Let the data flow through the model layers
- Get a label or probabilities for labels.
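The steps above can be sketched as a plain NumPy toy network (the layer sizes and weights here are made up, just to show the flow from photo to label probabilities):

```python
import numpy as np

# A minimal sketch of the usual "all the way through" pass, using a toy
# two-layer network. Sizes and weights are made-up stand-ins.
rng = np.random.RandomState(0)
W1, b1 = rng.randn(8, 16), np.zeros(16)   # hidden layer
W2, b2 = rng.randn(16, 3), np.zeros(3)    # output layer, 3 labels

def predict_probs(photo):
    """Feed the photo through every layer and return label probabilities."""
    h = np.maximum(photo @ W1 + b1, 0.0)   # hidden activations (ReLU)
    scores = h @ W2 + b2                   # raw class scores
    e = np.exp(scores - scores.max())      # softmax turns scores into probabilities
    return e / e.sum()

photo = rng.randn(8)                       # stand-in for real image data
probs = predict_probs(photo)               # probabilities for the 3 labels
```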
A CNN feature extractor works a bit differently:
- Feed in the photo (as always)
- Let the data flow through some of the layers (not all)
- Get values from intermediate layers (as features)
- Continue with your own application.
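The feature-extractor variant looks almost the same, except we stop partway through and return the intermediate activations. A minimal sketch, again with made-up layer sizes:

```python
import numpy as np

# Sketch of the feature-extractor variant: run the same layers, but stop
# early and hand back the intermediate activations as features.
rng = np.random.RandomState(0)
layers = [(rng.randn(8, 16), np.zeros(16)),   # layer 1
          (rng.randn(16, 4), np.zeros(4)),    # layer 2
          (rng.randn(4, 3), np.zeros(3))]     # output layer (skipped here)

def extract_features(photo, stop_after):
    """Flow the data through the layers, but only up to `stop_after`."""
    h = photo
    for W, b in layers[:stop_after]:
        h = np.maximum(h @ W + b, 0.0)        # ReLU activations
    return h                                   # raw intermediate values

features = extract_features(rng.randn(8), stop_after=2)
# `features` now feeds your own application instead of the final classifier.
```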
So we tap into the middle of the network. Data flows through the network layer by layer, but we do not let it go all the way to the output; instead, we pull the data out while it is still "raw". These intermediate values can be used as features in your own applications.
How was it done?
I started out with a working pipeline, changed minor details, and did a lot of sanity checks. In the end, I created functions to extract features from any layer of any Lasagne model.
Let’s start off by looking at what the model looks like. The data flows downwards in this picture.
The first layer accepts inputs, the last layer outputs label probabilities. The AI magic happens in between. It is interesting to compare the model instance with the model definition. “Just” stack the layers and be done with it 😀
I tapped into the fully connected layers “fc6” and “fc7” (see the model definition) before the output. I used two separate runs to get the features, one run for “fc6” and another for “fc7”. Red arrows in the picture below illustrate where the features were taken out.
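In Lasagne, this kind of tap is expressed with `lasagne.layers.get_output` on the layer of interest and then compiling that expression into a callable. To keep things self-contained, here is the same two-runs-by-layer-name idea in plain NumPy, with a dict of named layers standing in for the model (the names mirror "fc6"/"fc7"; the sizes are made up):

```python
import numpy as np

# A dict of named layers standing in for a Lasagne model definition.
rng = np.random.RandomState(0)
net = {"fc6": (rng.randn(10, 6), np.zeros(6)),
       "fc7": (rng.randn(6, 6), np.zeros(6))}
order = ["fc6", "fc7"]                      # data flows through in this order

def get_layer_output(layer_name, photo):
    """One run per tapped layer: stop once `layer_name` has been computed."""
    h = photo
    for name in order:
        W, b = net[name]
        h = np.maximum(h @ W + b, 0.0)      # ReLU, as in the fc layers
        if name == layer_name:
            return h
    raise KeyError(layer_name)

fc6_features = get_layer_output("fc6", rng.randn(10))   # first run
fc7_features = get_layer_output("fc7", rng.randn(10))   # second run
```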
Both layers output 4096-dimensional sparse features (lots of zeros); only about a quarter of the values were non-zero. These dimensions match the model definition.
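The sparsity is easy to check once you have a feature vector: just count the non-zero entries. Here is that check on random stand-in data (real fc6/fc7 output would give the roughly one-quarter figure mentioned above):

```python
import numpy as np

# Sanity check for sparsity: fraction of non-zero values in a ReLU-style
# 4096-dimensional feature vector. Random stand-in data, not real fc6 output.
rng = np.random.RandomState(0)
features = np.maximum(rng.randn(4096), 0.0)   # ReLU zeroes out the negatives

nonzero_fraction = np.count_nonzero(features) / features.size
print(nonzero_fraction)   # about half for random data; real fc6 gave ~1/4
```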
Developing a custom image recognition application would continue by training a model on these features and the application’s target labels. For best results, the application’s photos should be similar to the photos that were used to train the pretrained model.
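As a sketch of that next step, here is a very simple classifier trained on extracted features. A nearest-centroid rule stands in for a real model, and the features and labels are random placeholders, not actual fc6/fc7 output:

```python
import numpy as np

# Placeholder data: 40 photos' feature vectors and your application's labels.
rng = np.random.RandomState(0)
X = rng.randn(40, 4096)                   # one 4096-dim feature row per photo
y = rng.randint(0, 2, size=40)            # two made-up target labels

# "Training": one mean feature vector (centroid) per label.
centroids = {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def classify(feature_vector):
    """Predict the label whose centroid is closest to the features."""
    return min(centroids,
               key=lambda c: np.linalg.norm(feature_vector - centroids[c]))

prediction = classify(X[0])               # label guess for the first photo
```

In practice you would swap the nearest-centroid rule for whatever model fits your application; the point is that the extracted features are just input vectors from here on.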
Thank you for reading and keep it up!
P.S. It took approximately 20 hours to get everything up and running. Sooo, I hope you learned something and got new ideas out of it! 🙂