This repository has been archived by the owner on Mar 4, 2019. It is now read-only.

bryantbiggs/resume_tailor

Resume Tailor

Project Description

An unsupervised analysis combining topic modeling and clustering to preserve an individual's work history and credentials while tailoring their resume towards a new career field.


Motivation

While switching careers from mechanical engineering into data science and data engineering, I was initially unsure how to preserve what I had accomplished in my career so far while creating a resume targeted at data science and data engineering roles. Through this project I hope to exchange words and phrases in my current resume for similar ones that more closely match those used in the data field, without removing any prior work experience or accomplishments.


Data Sources

  • Indeed Resume search - entering select terms (mechanical engineer, data scientist, etc.) yields search results of individual resumes that fall within that category

Libraries Utilized

  • phantomjs, selenium - Spawn a pool of workers to request resumes from Indeed without being flagged as a crawler
  • beautifulsoup4, requests - Retrieve and extract data sources from the web
  • pymongo - Upload retrieved resumes to, and download them from, a MongoDB instance hosted on AWS
  • nltk, gensim, scikit-learn - Perform data cleansing (stop-word removal, stemming), create the LDA topic model, create TF-IDF matrices, and calculate LSI and cosine distance
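
To make the cleansing step concrete, here is a minimal sketch using only the Python standard library. It is not the project's code: the stop-word list is a tiny illustrative sample, and `naive_stem` is a hypothetical, crude stand-in for a real stemmer such as the ones nltk provides.

```python
import re

# Tiny illustrative stop-word list (assumption); a real run would use a
# fuller list, for example the one shipped with nltk.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "with", "on"}


def naive_stem(token):
    """Crude suffix stripping; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def clean(text):
    """Lowercase, drop punctuation and stop words, then stem each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


print(clean("Designed and tested pumps for the cooling systems."))
# → ['design', 'test', 'pump', 'cool', 'system']
```

The same function would be applied both to the crawled corpus and to the resume being tailored, so that both end up in the same vocabulary.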

Process

  1. Crawl Indeed Resumes to retrieve a collection of resumes matching the select search terms (mechanical engineer, data scientist, data engineer, etc.)
  2. Upload each retrieved resume to MongoDB on AWS, since the data set is quite large (over 10 GB)
  3. Clean the data set (remove stop words, punctuation, etc., and apply stemming)
  4. Create an LDA topic model from the cleaned corpus of resumes
  5. Cluster the corpus of resumes based on their topics
  6. Apply the same pre-processing transformations to the uploaded resume that is to be tailored to the new target career field, and gather its topics
  7. Change the current resume's topics that most closely match the current field to similar (synonymous) topics found in the intended target field
  8. Measure cosine similarity between the modified resume and target field resumes using TF-IDF and LSI
  9. Continue revising the current resume's wording so that it more closely matches target field resumes in terms of cosine distance
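
Steps 7 to 9 can be illustrated with a small pure-Python sketch, under several assumptions: the token lists and the `swaps` mapping are invented for illustration, the LSI projection is omitted, and TF-IDF is computed by hand rather than with scikit-learn or gensim. It only shows how a wording change can be scored against target field resumes.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF, so terms present in every document keep a non-zero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_similarity(resume, targets):
    """Average cosine similarity of one resume against the target resumes."""
    vecs = tfidf_vectors([resume] + targets)
    return sum(cosine(vecs[0], v) for v in vecs[1:]) / len(targets)


# Hypothetical, already-cleaned token lists (not data from the project).
targets = [
    ["python", "model", "data", "analysis"],
    ["data", "pipeline", "python", "sql"],
]
original = ["matlab", "simulation", "design", "analysis"]

# Hypothetical mapping from current-field terms to target-field terms.
swaps = {"matlab": "python", "simulation": "model"}
tailored = [swaps.get(t, t) for t in original]

before = mean_similarity(original, targets)
after = mean_similarity(tailored, targets)
print(f"before={before:.3f} after={after:.3f}")
```

Cosine distance is one minus this similarity, so a higher score after the swap corresponds to a smaller distance to the target field.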

Results
