VGG16-Street10 - Street-Feature Analysis with Keras

About

VGG16-Street10 classifies street images from 10 famous cities around the world, and can also be used for "street-feature analysis".

This repository mainly contains the following three parts:

  1. Dataset: Street10
  2. Street classifier:
    • Jupyter notebooks (and the scripts used in them) for training and testing various CNNs
    • VGG16-Street10, the CNN that achieved the best classification results
  3. Street-feature analyzer:
    • A command-line program that takes street images as input, classifies them, and visualizes their characteristic "features"

What is possible with VGG16-Street10?

Background

Have you ever felt "the atmosphere of a city" when seeing its streets? (Recall returning to your hometown after a long journey abroad...)

For example, even though the two images below seem rather similar, they must evoke quite different impressions for anyone who has lived in both Tokyo and Vancouver (at least they do for me).

Tokyo Vancouver

(Images were downloaded via Google Image Search, with usage rights labeled for noncommercial reuse with modification)

So what's the difference? What makes the left more "Tokyo-like" and the right more "Vancouver-ish"? These questions are why I decided to launch this project.

Methods and details

  1. Dataset

    • How it was created

      • Collection: street images with their locations labeled, gathered via Google Image Search & icrawler
      • Preprocessing: removing duplicates, cropping margins
      • Selection: the author's visual judgment, removing images that seemed incorrectly labeled or did not fit the concept of "street" in this project
        • Yes, this process is admittedly problematic; see the Questions section below
    • Details of dataset: Street10

      • Total number of images: 4,151
      • Data format: .npy files (zipped, 748 MB), each containing an array of 224 x 224 x 3 images
      • Classes (Eastern/Western, 10 cities) and sample counts:
        • Eastern: 1,997 images
          • Beijing: 434 images
          • Kyoto: 553 images
          • Seoul: 353 images
          • Singapore: 275 images
          • Tokyo: 382 images
        • Western: 2,154 images
          • London: 482 images
          • Moscow: 379 images
          • NYC: 503 images
          • Paris: 436 images
          • Vancouver: 354 images
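
As a quick sanity check, the per-city counts above can be encoded directly in code and verified against the stated regional totals; a minimal sketch:

```python
# Per-city sample counts of Street10 (copied from the list above).
CITY_COUNTS = {
    # Eastern
    "Beijing": 434, "Kyoto": 553, "Seoul": 353, "Singapore": 275, "Tokyo": 382,
    # Western
    "London": 482, "Moscow": 379, "NYC": 503, "Paris": 436, "Vancouver": 354,
}
EASTERN = {"Beijing", "Kyoto", "Seoul", "Singapore", "Tokyo"}

eastern_total = sum(n for city, n in CITY_COUNTS.items() if city in EASTERN)
western_total = sum(n for city, n in CITY_COUNTS.items() if city not in EASTERN)

assert eastern_total == 1997                   # matches the Eastern total above
assert western_total == 2154                   # matches the Western total above
assert eastern_total + western_total == 4151   # total number of images
```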
  2. Street Classifier

    • How to train

      • Transfer learning & fine-tuning of VGG16-Places365 (a VGG16-architecture CNN trained on the Places dataset)
        • Compared against: baselines (an SVM on deep features extracted from the last layer of VGG16-Places365), an ImageNet pre-trained CNN, and a randomly-initialized network
      • Computation resources: GPUs provided by Google Colaboratory
    • Details of VGG16-Street10:

      • A CNN fine-tuned from VGG16-Places365
      • Trained layers: the final dense layers & the last convolutional block
      • Number of parameters
        • Trainable: 7,410,572
        • Non-trainable: 7,636,544
      • Outputs
        • Output1: "Eastern or Western classification" (binary class)
        • Output2: "City classification" (10 classes)
      • File type: .h5 file (Keras-format)
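
The two-output design can be sketched with plain NumPy: shared deep features from the conv base feed two independent softmax heads. (This is an illustration, not the actual model code; the feature size of 512 and the raw-NumPy formulation are assumptions made for the sketch.)

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
features = rng.normal(size=(1, 512))          # shared deep features from the conv base

# Two independent dense heads on top of the shared features.
W_region = rng.normal(size=(512, 2)) * 0.01   # Output 1: Eastern/Western
W_city = rng.normal(size=(512, 10)) * 0.01    # Output 2: 10 cities

p_region = softmax(features @ W_region)
p_city = softmax(features @ W_city)

assert p_region.shape == (1, 2) and p_city.shape == (1, 10)
assert np.allclose(p_region.sum(), 1.0) and np.allclose(p_city.sum(), 1.0)
```

In Keras terms this corresponds to a functional-API model with two output layers trained jointly, which is why a single forward pass yields both the binary and the 10-class prediction.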
  3. Street-feature analyzer

    • Method: (Guided) Grad-CAM (Attribution analysis)

    • The objects and structures in the highly-attributed (salient) areas of an input image can be interpreted as distinctive elements of the street, i.e. the elements the CNN relies on to make its prediction
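
The core Grad-CAM computation is simple enough to sketch in NumPy (extracting the conv activations and class-score gradients from the actual Keras model is omitted here; the shapes assume VGG16's last conv block on a 224 x 224 input):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from the last conv layer.

    activations: feature maps A^k, shape (H, W, K)
    gradients:   d(class score)/dA^k, same shape
    """
    weights = gradients.mean(axis=(0, 1))                 # alpha_k: global-average-pooled grads
    cam = np.maximum((activations * weights).sum(-1), 0)  # ReLU(sum_k alpha_k * A^k)
    if cam.max() > 0:
        cam /= cam.max()                                  # scale to [0, 1] for overlaying
    return cam

rng = np.random.default_rng(0)
A = rng.random((14, 14, 512))            # e.g. VGG16 block5 output for a 224x224 input
dy_dA = rng.normal(size=(14, 14, 512))   # stand-in for the backpropagated gradients
heatmap = grad_cam(A, dy_dA)
assert heatmap.shape == (14, 14)
assert 0.0 <= heatmap.min() and heatmap.max() <= 1.0
```

The 14 x 14 heatmap is then upsampled to the input resolution and overlaid on the image; multiplying it element-wise with guided backpropagation gives Guided Grad-CAM.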

Overview of results

  • Classification metrics (accuracy and normalized confusion matrices) on all 421 test samples
(Figures: Output 1, Eastern/Western classification; Output 2, 10-city classification)
  • Results for the example images shown above

    Tokyo:
    - Prediction 1 (Eastern/Western):
      - Eastern: 78.714 %
      - Western: 21.286 %
    - Prediction 2 (top 3 cities):
      - Tokyo: 80.814 %
      - Kyoto: 12.945 %
      - London: 3.568 %

    Vancouver:
    - Prediction 1 (Eastern/Western):
      - Western: 99.856 %
      - Eastern: 0.144 %
    - Prediction 2 (top 3 cities):
      - Vancouver: 94.909 %
      - NYC: 4.815 %
      - London: 0.166 %
Street-feature analysis

For the two images above, VGG16-Street10 makes good predictions!
And from the Guided Grad-CAM results, we can perform "street-feature analysis" along these lines:

  • The Tokyo street has
    • densely crowded objects
    • a lot of advertising displays
    • narrow roads
  • The Vancouver street has
    • more open space and a larger expanse of sky
    • characteristic trash-bin-like objects
    • wider roads

I think this kind of analysis can be useful in city development, landscape conservation, and similar fields.
(Not satisfied with this samplewise analysis alone? See the Questions and Future works sections for other ideas.)

How to use

Set-up

  1. First, make Git, Python, and the libraries below available on your machine. My versions are shown after the colons.
    • Python: sys.version == '3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)]'
    • TensorFlow: tensorflow.__version__ == '1.10.0'
    • Keras: keras.__version__ == '2.2.2'
    • Matplotlib: matplotlib.__version__ == '2.2.2'
    • OpenCV: cv2.__version__ == '3.4.1'
    • Pillow: PIL.__version__ == '5.4.1'
  2. Clone this repository: git clone https://github.com/aviatesk/street-feature-analysis.git
    • Via SSH: git clone git@github.com:aviatesk/street-feature-analysis.git
  3. Run run.py and check if the demo works successfully: python run.py

Street classifying and Street-feature analysis

Now you can classify and analyze any .jpg- or .png-format street image you want. For example, you can reproduce the results above with the following command:

python run.py assets/tokyo.jpg assets/vancouver.jpg

run.py can also take online images given their URLs (guided backpropagation is disabled for online images by default):

python run.py https://wiki.samurai-archives.com/images/b/b9/Shinmachi-nishikikoji.jpg

If you only want to see the Grad-CAM results, you can use the --only-cam option:

python run.py --only-cam assets/tokyo.jpg assets/vancouver.jpg

Run python run.py -h for details on the other options.

Use Street10

You can get Street10 at this link: https://drive.google.com/file/d/1qOIKSr2LA9TbBYf96y1Wr0KrQKBGmHQp/view?usp=sharing (anyone who knows the link can download it).
Unzip Street10.zip and you will get .npy files, each of which stores one city's street images as an n x 224 x 224 x 3 array.

You may find load_data.py useful for loading and inspecting the data, or for converting the data into .png format and building the directory structure needed by keras.preprocessing.image.ImageDataGenerator.flow_from_directory.
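
For illustration, the per-city arrays can be combined into a single labeled dataset along these lines (the file names and the tiny random stand-in arrays are assumptions made for this sketch; load_data.py is the authoritative loader):

```python
import os
import tempfile
import numpy as np

cities = ["tokyo", "vancouver"]  # hypothetical subset; Street10 has 10 such files

with tempfile.TemporaryDirectory() as data_dir:
    # Stand-ins for the unzipped Street10 .npy files (tiny random images here).
    for city in cities:
        imgs = np.random.randint(0, 256, size=(5, 224, 224, 3), dtype=np.uint8)
        np.save(os.path.join(data_dir, city + ".npy"), imgs)

    # Load each city's n x 224 x 224 x 3 array and attach integer labels.
    images, labels = [], []
    for label, city in enumerate(cities):
        arr = np.load(os.path.join(data_dir, city + ".npy"))
        images.append(arr)
        labels.append(np.full(len(arr), label))

X = np.concatenate(images)
y = np.concatenate(labels)
assert X.shape == (10, 224, 224, 3) and y.shape == (10,)
```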

Re-run Jupyter notebooks on Colaboratory with GPU

If you want to re-run the Jupyter notebooks in the notebooks directory, which train and test VGG16-Street10 and many alternative models, follow the steps below.

  1. Make Google Drive available with your Google account
  2. Move this repository directory under your "My Drive" (Upload from browser or via Backup and Sync from Google)
  3. Change the name of the directory from "My Drive"/street-feature-analysis (original) to "My Drive"/prj
  4. Move and rename Street10.zip to "My Drive"/prj/data/processed/processed_data.zip
  5. Open "My Drive"/prj/notebooks/notebooks.ipynb in Colaboratory and run the code cells with the GPU runtime enabled

Questions

Here are two big questions you may have about this project, along with my answers.

  1. To what extent can I trust the dataset Street10?

    • Unfortunately, the quality of Street10 may not be very high, mainly in terms of accuracy and consistency. There are two main reasons. First, I could not completely confirm that each street really belongs to its labeled city during my visual inspection. Second, the concept of "street" in this project is inherently ambiguous; e.g., we can't easily draw the line between a "street" and a "building", or a "road".
    • In conclusion, I don't recommend using Street10 for your own project without double-checking or modifying it.
  2. Is attribution analysis reliable? Is "street-feature analysis" from VGG16-Street10 trustworthy?

    • Some recent research casts fundamental doubt on the reliability of saliency methods, including (Guided) Grad-CAM: https://arxiv.org/pdf/1711.00867.pdf. Worse, (Guided) Grad-CAM is not sufficient for "street-feature analysis" in the first place, mainly because it only shows per-sample attributions for an input image and cannot detect features that are universal across the whole dataset.
    • Research on the interpretability of CNNs' deep features is ongoing, and "street-feature analysis" likewise needs further enhancement.
    • Given all of the above, I must say "street-feature analysis" is still a prototype and not yet applicable to practical use.

If you have any comments or suggestions for this project, you're welcome to open an issue or make a pull request!

Future works

While there is much ongoing research into interpreting neural networks' behavior, including attribution analyses like (Guided) Grad-CAM, an interface combining feature visualization with attribution analysis, as introduced in https://distill.pub/2018/building-blocks/, seems like the natural next step. I may implement such a rich interface for VGG16-Street10 as a future enhancement.

Acknowledgements

This project was carried out as the final assignment for DL4US, an online deep-learning course held by Matsuo Lab at the University of Tokyo. Although almost everything I did for the project is accessible in this repository, the final report I submitted is not available due to restrictions. You can still see the slides from the presentation I gave at the completion ceremony on Dec. 14, 2018, at the University of Tokyo (in Japanese).

The initial inspiration for this project came from my friend Ryohei's Instagram post: https://www.instagram.com/p/Bl7Sd2dnwNt8ZVzTu1Rdbe0NPUayQY48K77u-k0/
Check out his cool, thought-provoking posts!

References

While developing this project, I referred to many online resources; the main ones are listed below. Thanks to all the great authors!

Author

  • KADOWAKI, Shuhei - Undergraduate@Kyoto Univ. - aviatesk

Appendix

Here are batch-wise classification and attribution results for example images from each city (original input images on the upper rows and results on the lower rows; classification results shown in the title position).

  • Western cities

  • Eastern cities
