Resume Tailor

Project Description

An unsupervised analysis that combines topic modeling and clustering to tailor an individual's resume toward a new career field while preserving their work history and credentials.


Motivation

While switching careers from mechanical engineering to data science and data engineering, I was initially unsure how to preserve what I had accomplished in my career so far while creating a resume targeted at data science and data engineering roles. Through this project I hope to replace words and phrases in my current resume with similar ones that more closely match those used in the data field, without removing any prior work experience or accomplishments.

Data Sources

Indeed Resume search - entering select terms (mechanical engineer, data scientist, etc.) yields search results of individual resumes that fall within that category

Libraries Utilized

phantomjs, selenium - Spawn a pool of workers to request resumes from Indeed without being flagged as a crawler
beautifulsoup4, requests - Retrieve and extract data from the web
pymongo - Upload retrieved resumes to, and download them from, a MongoDB instance hosted on AWS
nltk, gensim, scikit-learn - Perform data cleansing (stop words, stemming), create the LDA topic model, create TF-IDF matrices, calculate LSI and cosine distance
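
The data cleansing step mentioned above (stop words, stemming) can be illustrated with a small dependency-free sketch. This is not the project's code: the stop-word list, the suffix list, and the function names naive_stem and clean are assumptions made for illustration, and the suffix stripping is only a crude stand-in for a real stemmer such as the ones nltk provides.

```python
import re

# Tiny illustrative stop-word list (assumption); real lists are much larger.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "with", "on"}

# Suffixes removed by the naive stemmer, checked in this order (assumption).
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip at most one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(text):
    """Lowercase, drop punctuation and stop words, then stem each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(token) for token in tokens if token not in STOP_WORDS]


print(clean("Designed and tested the pumps."))
```

Each resume would be reduced to a list of tokens like this before being passed to the topic model or the TF-IDF step.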

Process

Crawl Indeed Resumes to retrieve a collection of resumes matching the select search terms (mechanical engineer, data scientist, data engineer, etc.)
Upload each retrieved resume to MongoDB on AWS, since the data set is quite large (over 10 GB)
Clean the data set (remove stop words, punctuation, etc.; apply stemming)
Create LDA topic model from cleaned corpus of resumes
Cluster corpus of resumes based on their topics
Apply the same pre-processing transformations to the uploaded resume that is to be tailored to the new target career field, and gather its topics
Change the current resume's topics that most closely match the current field to similar (synonymous) topics found in the intended target field
Measure cosine similarity between modified resume and target field resumes using TF-IDF and LSI
Repeat changes to the current resume's wording until it more closely matches the target field resumes in terms of cosine distance
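
The similarity measurement in the last two steps can be sketched as follows. This is a minimal, dependency-free illustration rather than the project's implementation, which lists scikit-learn and gensim for this work. It covers only TF-IDF weighting and cosine similarity (no LDA or LSI), and the function names and the toy token lists are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Smoothed IDF keeps terms found in every document at a nonzero weight.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_similarity_to_targets(resume, targets):
    """Average cosine similarity between one resume and the target resumes."""
    vectors = tfidf_vectors([resume] + targets)
    scores = [cosine_similarity(vectors[0], v) for v in vectors[1:]]
    return sum(scores) / len(scores)


# Toy, already-cleaned token lists (hypothetical data).
targets = [["python", "model", "data"], ["data", "pipeline", "python"]]
original = ["design", "pump", "test"]
modified = ["design", "model", "test", "data"]

before = mean_similarity_to_targets(original, targets)
after = mean_similarity_to_targets(modified, targets)
print(before, after)
```

In this toy run the original resume shares no terms with the target resumes, so its score is zero, while the reworded version scores higher. Repeating the rewording and re-scoring is the loop the final step describes.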

Results

Original presentation delivered on 08/19/2016

bryantbiggs/resume_tailor
