
Python 3.7.7 · OpenCV · torch · torchvision

Scene Understanding Based on Text Detections

Humans observing images of a place unknowingly deduce a great deal of information.

Automating this process requires the algorithm to understand which details are relevant.

It can be assumed that the relevant information for finding a geographic location in the city is present in street signs.

This code was created by undergrads as part of a half-year-long project, with the goal of finding the geographic location of a place pictured in a set of photos, based on the text present in them (assuming the photos can form a panorama).


Requirements

The project uses CharNet for text detection.

  • fuzzywuzzy 0.18.0
  • python-Levenshtein 0.12.0
  • gmplot 1.4.1
  • googlemaps 4.4.2
  • numpy 1.18.4

CharNet dependencies

  • torch 1.4.0
  • torchvision 0.5.0
  • opencv-python 3.4.2
  • opencv-contrib-python 3.4.2
  • editdistance 0.5.3
  • pyclipper 1.1.0
  • shapely 1.7.0
  • yacs 0.1.7

If you're having problems downloading the correct torch/torchvision versions, try using:

pip install torch===1.4.0 torchvision===0.5.0 -f https://download.pytorch.org/whl/torch_stable.html

Usage Example

To check a single scene, use:

--single_scene ".\scene" --results_dir ".\output"

scene should be a folder with a set of photos.

To check multiple scenes at once, arrange the images in separate folders and pass the parent folder, e.g.:

├── Parent
│   ├── scene 1
│   ├── scene 2
│   ├── scene 3

and then use:

--scenes_dir ".\Parent" --results_dir ".\output"
  • If you know the photos are ordered by name, you can add the option --dont_reorder to vastly improve runtime.
  • If --results_dir isn't given, the output will be generated in the input directory.

Team

Hila Manor and Adir Krayden

Supervised and guided by Elad Hirsch

Videos

A video showcasing an overview of 3 runs of the project:

  1. A simple scene.
  2. A simple scene that nonetheless had problems with Google's Geocoding API.
  3. A complicated scene.

A video showcasing a run that used intersecting locations (close places) to find the location.

The Algorithm

  1. Panorama Creation
    1. Find images order
      • A random input order of the images is assumed and corrected
      • Feature-based matching
    2. Estimate focal length
      • based on homographies
    3. Inverse cylindrical warp
      • Use cylindrical panoramas to enable a 360° field of view (a minimal warp sketch follows this list)
    4. Stitch panorama
      • Stitch with affine transformation to fix ghosting and drift (camera can be hand-held, and not on a tripod)
  2. Signs Extraction
    1. Split the panorama into windows
      • Enables running on weaker GPUs
    2. Extract text using CharNet on each window
      • Words split across window borders trigger a new search in a re-centered window
      • CharNet “fixes” detected text by comparing to a synthetic dictionary
    3. Concatenate words to signs
      • Match geometry: words that are close vertically or horizontally (see the grouping sketch after this list)
      • Match colors: validate that the background colors come from the same distribution
    4. Catalogue signs by grading their similarity to street signs
      • Background color
      • Keyword presence (e.g. avenue, st.)
      • Appearance in online streets list
    5. Filter out similar variations and long signs
  3. Location Search
    1. Query Google’s Geocoding API only for street signs (see the query sketch after this list)
      • This API doesn’t understand points of interest
    2. Query Google’s Places API for each of the other signs individually.
      • This API can’t handle intersecting data (2 businesses in 1 location)
      • Search for geographically close responses
    3. Display options to choose from, and open a marked map
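
A minimal sketch of the inverse cylindrical warp from step 1.3 of Panorama Creation, assuming the focal length f (in pixels) was already estimated from the homographies. This is an illustration, not the project's exact implementation: every pixel of the cylindrical output is mapped back onto the original image plane and sampled with cv2.remap.

# Illustrative sketch only - assumes a pre-estimated focal length f (pixels).
import cv2
import numpy as np

def cylindrical_warp(img, f):
    h, w = img.shape[:2]
    xc, yc = w / 2.0, h / 2.0

    # Coordinates of every pixel in the cylindrical (output) image
    ys, xs = np.indices((h, w), dtype=np.float32)
    theta = (xs - xc) / f    # angle around the cylinder axis
    height = ys - yc         # vertical offset from the image center

    # Unproject from the cylinder back onto the original image plane
    x_map = f * np.tan(theta) + xc
    y_map = height / np.cos(theta) + yc

    return cv2.remap(img, x_map, y_map, interpolation=cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)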
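
A rough sketch of the geometric grouping in step 2.3 of Signs Extraction: CharNet word detections are merged into sign candidates when their boxes are close horizontally or vertically. The box format, gap threshold and greedy merge below are assumptions for illustration, not the project's code (the background-color check is omitted).

# Illustrative sketch only - box format and gap threshold are assumptions.
from dataclasses import dataclass
from typing import List

@dataclass
class Word:
    text: str
    x: float  # top-left corner, in panorama pixels
    y: float
    w: float
    h: float

def boxes_are_close(a, b, gap_ratio=0.6):
    # Two words may belong to the same sign if the gap between their boxes is
    # small relative to the text height, whether side by side or stacked.
    max_gap = gap_ratio * max(a.h, b.h)
    horizontal_gap = max(a.x, b.x) - min(a.x + a.w, b.x + b.w)
    vertical_gap = max(a.y, b.y) - min(a.y + a.h, b.y + b.h)
    return horizontal_gap <= max_gap and vertical_gap <= max_gap

def group_words_into_signs(words: List[Word]) -> List[List[Word]]:
    # Greedily add each word to the first group containing a nearby word,
    # otherwise start a new sign candidate.
    signs = []
    for word in words:
        for sign in signs:
            if any(boxes_are_close(word, other) for other in sign):
                sign.append(word)
                break
        else:
            signs.append([word])
    return signs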
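
A hedged sketch of the queries in step 3 (Location Search), using the googlemaps client listed in the requirements: street-sign text goes to the Geocoding API, and every other sign is sent to the Places API on its own. The API key, the optional city hint and the result handling are placeholders, not the project's actual wrapper.

# Illustrative sketch only - key, city hint and result handling are placeholders.
import googlemaps

gmaps = googlemaps.Client(key="YOUR_API_KEY")  # placeholder key

def locate_street_sign(street_text, city_hint=""):
    # Street names go to the Geocoding API (it doesn't know points of interest).
    results = gmaps.geocode(f"{street_text} {city_hint}".strip())
    return [r["geometry"]["location"] for r in results]

def locate_other_sign(sign_text):
    # Business/POI signs are sent to the Places API, one sign per query.
    response = gmaps.places(sign_text)
    return [r["geometry"]["location"] for r in response.get("results", [])]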

Sources

