What is the Yelp Restaurant Challenge?
The challenge is to correctly predict the labels that are assigned to business establishments (such as restaurants) by users as posted on Yelp.com based on photos taken at that establishment. For example, the establishment corresponding to the picture below has the labels "Takes reservations", "Has Alcohol" and "Has Table Service".
There are 9 labels, each establishment can be associated with one or more labels, and each establishment can be associated with one or more photos. Thus, given the photo(s) of an establishment, the challenge is the predict the labels of that establishment. For more details, visit the Kaggle page of the challenge.
Why would Yelp be interested in this?
Yelp is interested in being able to deduce what labels would be relevant to an establishment from photos without needing users to label them. I saw this as an oppportunity to apply convolutional neural networks to tackle this challenge.
Show me some results!
My best model achieved an F1 Score of 0.75, placing it in the top 50 (peak) in the Kaggle competition leaderboards.
One interesting result was that the model can qualitatively distinguish lunch photos from dinner photos. We define a photo as being in the DINNER cluster if its labels are dinner-related. Similarly for the LUNCH cluster.
Some sample photos belonging to the DINNER cluster as predicted by the model are shown below.
Some sample photos belonging to the LUNCH cluster as predicted by the model are shown below.
Below is a plot of test set pictures according to whether they are DINNER or LUNCH pictures as predicted by the model. The plot is created using the t-SNE method for dimension reduction .
How did you do this?
If you want the full details, please consult my final report or my poster for this project.
The model is a pre-trained VGG-16 network that we perform transfer learning on. The Backpropagation Multi-Label Learning Loss is used to handle the Multi-Label Classification nature of the challenge.
The general idea is to use the convolutional network to predict a label for each photo of an establishment. We then calculate a mean label vector for the establishment from the predicted labels of the establishment's photos. The predicted labels for the establishment come from thresholding the mean label vector.