WORK IN PROGRESS
An unsupervised method for pose estimation on 2D images using renderings of 3D object models. A detailed write-up explaining the methodology and presenting current results can be found here (1).
Pipeline:
- Render images of N models in M known poses. Collect 'real images' scraped from the web (not included in repo; see arjunkarpur/multi-view-rendering (2))
- Use a network to extract features for real & rendered images (start w/ AlexNet trained on ImageNet)
- Calculate a distance grid between real images and rendered images (dim: #poses x #models)
- Perform pose estimation
- Generate triplets
- Fine tune same network using triplets
- Re-run pose estimation to measure error rates (repeat steps 2-4 w/ new network weights)
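
The distance grid, pose estimation, and triplet steps can be sketched roughly as below. This is a minimal illustration and not the repo's actual code. It assumes one grid per real image, Euclidean distance between feature vectors, nearest-neighbor pose selection, and a "farthest pose of the same model" rule for picking negatives. All function names (`distance_grid`, `estimate_pose`, `make_triplet`) are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_grid(real_feat, rendered_feats):
    """Build a #poses x #models grid of distances for one real image.

    rendered_feats[p][m] is the feature vector of model m rendered in pose p
    (an assumed layout, chosen to match the grid dimensions).
    """
    return [[euclidean(real_feat, feat) for feat in row] for row in rendered_feats]


def estimate_pose(grid):
    """Return (pose_idx, model_idx) of the closest rendered image.

    Assumption: the pose of the nearest rendering is taken as the estimate.
    """
    _, pose, model = min(
        (dist, p, m) for p, row in enumerate(grid) for m, dist in enumerate(row)
    )
    return pose, model


def make_triplet(real_id, grid):
    """Form (anchor, positive, negative) for one real image.

    Assumption: the positive is the nearest rendering, and the negative is the
    same model in the pose farthest from the real image. Needs >= 2 poses.
    """
    pose, model = estimate_pose(grid)
    other_poses = [p for p in range(len(grid)) if p != pose]
    neg_pose = max(other_poses, key=lambda p: grid[p][model])
    return real_id, (pose, model), (neg_pose, model)
```

For example, with 3 poses and 2 models, `grid = distance_grid(real_feat, rendered_feats)` yields a 3 x 2 list of lists, and `make_triplet("real_0", grid)` returns the real image id together with the (pose, model) indices of the chosen positive and negative renderings.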
To-do (w/ priority):
- (1) Change triplet code to find triplets dynamically during training, to reduce training time
- (2) Add in triplet generation using real-to-real comparisons (pos and neg)
- (2) Change distance grid computation code to work with UTCS Condor for faster runtime
- (3) Add commands and detailed run instructions to the README
- (3) Create script to automate pipeline
Links: