
Gradient Ascent - Flipkart GRiD 2020

Submission by Rushabh Musthyala, Archana Swaminathan, and Shanmukh Kali Prasad

Problem Definition

A fashion retailer wants to source ongoing and upcoming fashion trends from major online fashion portals and online magazines in a consumable and actionable format, so that they are able to effectively and efficiently design an upcoming fashion product portfolio.

Deliverables

  1. A mechanism for effectively ranking products on e-Commerce sites
  2. A way to analyse trending and lagging products on fashion portals and magazines
  3. A scalable solution

Subproblems

  1. Scraping data from e-Commerce websites and fashion portals
  2. Cleaning image data to remove unwanted artifacts (extracting only images of shirts)
  3. Learning feature encodings for all of the images
  4. Computing a popularity metric (PM) to effectively combine the rating and number of reviews
  5. Clustering the images based on their encodings to gain insight on what is trending and what is lagging

Requirements

  • Python 3.6
  • Selenium
  • Keras
  • TensorFlow
  • Matplotlib
  • scikit-learn
  • NumPy
  • pandas

1. Web Scraping

  • Scraping was done using Selenium with Python 3 (a minimal sketch appears after this list)

  • We chose six sources to scrape images from, covering multipurpose e-Commerce sites, fashion magazines, catalogues, and fashion shopping sites

    • Vogue India
    • Flipkart
    • Myntra (the above three can be scraped by running data_collection_fkmyvog.py)
    • Amazon (run amazon_data_script.py)
    • Pinterest Women's Fashion catalogue (run pinterest_woman_script.py)
    • Pinterest Men's Fashion catalogue (run pinterest_man_script.py)
  • From sites like Flipkart and Amazon, we extracted the product name, rating, number of reviews and the image

  • From the other sites, we extracted the fashion images

  • The scripts can be adapted to other websites by changing a few variables to match each site's page structure, so this step scales easily

  • All scraped data is loaded into a pandas DataFrame and stored as a CSV
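
As a rough illustration of how each per-site script works, here is a hedged Selenium sketch. The URL and CSS selectors are hypothetical placeholders, not values from the actual scripts; each real script sets them to match its target site's structure.

```python
# Hedged sketch of the per-site scraping pattern; the URL and CSS selectors
# below are hypothetical placeholders, not the project's actual values.
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.example.com/t-shirts")  # placeholder listing page

records = []
for card in driver.find_elements(By.CSS_SELECTOR, "div.product-card"):  # placeholder selector
    records.append({
        "name": card.find_element(By.CSS_SELECTOR, ".title").text,
        "rating": card.find_element(By.CSS_SELECTOR, ".rating").text,
        "num_reviews": card.find_element(By.CSS_SELECTOR, ".reviews").text,
        "image_url": card.find_element(By.TAG_NAME, "img").get_attribute("src"),
    })
driver.quit()

# Convert the scraped data to a pandas DataFrame and store it as a CSV
pd.DataFrame(records).to_csv("scraped_products.csv", index=False)
```

Adapting a script to a new site then comes down to swapping the URL and selectors, which is what makes this step easy to scale.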

2. Downloading the images and Object Detection

  • The images can be downloaded from the image links stored in the CSV by running image_download_script.py
  • Object Detection was done using a pretrained YOLOv3 architecture that was trained with the DeepFashion2 dataset
  • Code is available in this repository (https://github.com/archana1998/Clothing-Detection); clone it and run the new_image_demo.py script
  • The model identifies the "long and short top" object categories and crops the image to the detected bounding box, which contains only the t-shirt
  • The cropped t-shirt image is then saved and used for feature extraction (a sketch of the download-and-crop step follows this list)
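
A minimal sketch of the download-and-crop step is below. The "image_url" column name and the detect_tops() helper are hypothetical stand-ins for illustration; the actual detector is the YOLOv3 model from the Clothing-Detection repository, invoked via new_image_demo.py.

```python
# Hedged sketch of the download-and-crop step; the "image_url" column name
# and the detect_tops() helper are hypothetical stand-ins, not the real code.
import os
from io import BytesIO

import pandas as pd
import requests
from PIL import Image

def detect_tops(image):
    """Stand-in for the YOLOv3 'long/short top' detector, assumed to return
    a list of (x1, y1, x2, y2) bounding boxes for detected tops."""
    raise NotImplementedError

os.makedirs("tshirts", exist_ok=True)
df = pd.read_csv("scraped_products.csv")  # produced by the scraping step
for i, url in enumerate(df["image_url"]):
    image = Image.open(BytesIO(requests.get(url, timeout=10).content))
    for j, (x1, y1, x2, y2) in enumerate(detect_tops(image)):
        # Crop to the detected bounding box so only the t-shirt remains
        image.crop((x1, y1, x2, y2)).save(f"tshirts/{i}_{j}.png")
```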

3. Learning Feature Encodings

  • In order to represent our images for later processing, we needed a way to extract the features from each clothing item
  • We trained a model using the Keras library with a TensorFlow backend (a minimal architecture sketch follows this list)
  • Our model is a convolutional neural network (CNN), an architecture known in computer vision for learning rich features from images
  • We reconstructed some of the images from their encodings and the results were very promising, indicating that our feature encodings/representations are accurate
  • To create the model, run the script encoder_training_script.py
  • Alternatively, download the trained model from - https://drive.google.com/file/d/1_ZRFLLusck_1waFl703PK0oDas7NWo0n/view?usp=sharing
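
For reference, a convolutional autoencoder of the kind described might look like the sketch below. The input size, filter counts, and encoding dimensions are assumptions, not the project's exact architecture; see encoder_training_script.py for that.

```python
# Hedged sketch of a convolutional autoencoder; layer sizes are assumed,
# not taken from encoder_training_script.py.
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(64, 64, 3))  # assumed input size for t-shirt crops
x = layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
x = layers.MaxPooling2D()(x)
encoded = layers.Conv2D(8, 3, activation="relu", padding="same")(x)  # the feature encoding

x = layers.Conv2DTranspose(64, 3, strides=2, activation="relu", padding="same")(encoded)
x = layers.Conv2DTranspose(32, 3, strides=2, activation="relu", padding="same")(x)
decoded = layers.Conv2D(3, 3, activation="sigmoid", padding="same")(x)

autoencoder = keras.Model(inputs, decoded)  # trained to reconstruct its input
encoder = keras.Model(inputs, encoded)      # reused downstream for PM prediction and clustering
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(images, images, epochs=20, batch_size=32)
```

Reconstructing images through the decoder is what lets us check visually that the encodings preserve the information in the original crops.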

4. Computing the Popularity Metric (PM)

  • We wanted to consider both the rating and the number of ratings in order to rank all products effectively
  • We came up with a popularity measure that combines the two in a principled way
  • A Bayesian view of the beta distribution was adopted to derive a formula that gives a PM from a rating and its number of ratings (a sketch follows this list)
  • We loaded in all our e-Commerce data and calculated the feature encodings using the model mentioned earlier
  • We then computed the PM for each product
  • Finally, we trained a model to predict the PM from a set of encodings. We can now compare the predicted performance of different products on e-Commerce sites, which is especially useful for designers who want to know how the public would react to their clothes
  • To create and train the model, run pm_model_train_script.py
  • Alternatively, you can download the trained model from here - https://drive.google.com/file/d/1QiyeRfWD18GAdJl-lUMxjvp6LF_amTl1/view?usp=sharing
  • Once the model is created, you can run pm_predictor_script.py to predict the PM for any input image
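
The README does not spell out the exact PM formula, but a standard Beta-posterior-mean construction consistent with the description looks like the sketch below; the Beta(2, 2) prior is an assumed choice, not the project's stated one.

```python
# Hedged sketch of a Beta-posterior popularity metric; the project's exact
# formula is not given in the README, and alpha=beta=2 is an assumed prior.
def popularity_metric(rating, num_ratings, alpha=2.0, beta=2.0):
    """Combine a 1-5 star rating and its count into a single score in (0, 1)."""
    p = (rating - 1.0) / 4.0      # map the 1-5 rating onto a success rate in [0, 1]
    successes = p * num_ratings   # pseudo-count of "positive" ratings
    # Posterior mean of a Beta(alpha, beta) prior updated with the observations:
    # products with few ratings are pulled toward the prior mean of 0.5.
    return (alpha + successes) / (alpha + beta + num_ratings)

# A 4.8-star product with 3 ratings ranks below a 4.5-star product with 500:
print(popularity_metric(4.8, 3))    # ~0.69
print(popularity_metric(4.5, 500))  # ~0.87
```

Because the prior acts as a smoothing term, a handful of perfect ratings cannot outrank a large body of consistently good ones.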

5. Clustering

  • Using the previously calculated encodings, we clustered a user-selected set of images to visualise which products in that set are trending and which are lagging
  • Five clustering algorithms were tested and evaluated using the silhouette coefficient; K-means gave the best results (a minimal sketch follows this list)
  • We took the largest cluster to represent the most popular/trending styles and the smallest cluster to represent what isn't popular
  • This can be tested by running clustering_script.py
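
A minimal version of this step, assuming the encodings have been flattened and saved to a NumPy array, could look like the following; the "encodings.npy" file and the range of k values are assumptions, not the settings used in clustering_script.py.

```python
# Hedged sketch of the clustering step; "encodings.npy" and the range of
# k values are assumptions, not the project's actual settings.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

encodings = np.load("encodings.npy")               # hypothetical saved encodings
encodings = encodings.reshape(len(encodings), -1)  # flatten to one vector per image

best_k, best_score, best_labels = None, -1.0, None
for k in range(2, 10):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(encodings)
    score = silhouette_score(encodings, labels)    # higher = better-separated clusters
    if score > best_score:
        best_k, best_score, best_labels = k, score, labels

# Largest cluster ~ trending styles; smallest cluster ~ lagging styles
sizes = np.bincount(best_labels)
print(f"k={best_k}: trending cluster {sizes.argmax()}, lagging cluster {sizes.argmin()}")
```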

Running instructions

  1. Create an environment with all the packages and libraries specified in the "requirements" section
  2. Download the "Zipped_final.zip" folder from here - https://drive.google.com/file/d/1WI95J600swejVn2-6vzhFRuxKfZa_gQh/view?usp=sharing
  3. Download the encoder model from here - https://drive.google.com/file/d/1_ZRFLLusck_1waFl703PK0oDas7NWo0n/view?usp=sharing
  4. Download the PM predictor model from here - https://drive.google.com/file/d/1QiyeRfWD18GAdJl-lUMxjvp6LF_amTl1/view?usp=sharing
  5. Run clustering_script.py to replicate the clustering step and to visualise trends and lags in selected images from different sites
  6. Run pm_predictor_script.py to calculate the expected PM of any input image given to the model

Note: These instructions are intended to get someone up and running with the application quickly and easily. Alternatively, you can scrape data and train the models from scratch using the respective scripts available in the repo.