This project takes a sample resume, extracts its skills, and uses those skills to find the best matches in a resume dataset.

xilin-tian/Resume_Classification

Introduction

This project focuses on classifying and extracting skills from the resume dataset provided by Marti Palan and Hend Labib on Kaggle. It originated as an internship project but uses a public dataset instead.

Jupyter Notebooks

The first notebook cleans the resume dataset with pandas: it drops the NaN rows and columns, then resets the index. After cleaning, it adds one 'skills' column per skill to the dataset and fills in a binary score for each resume: 1 if the resume contains that skill, 0 otherwise. The result is saved as a CSV.
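A minimal sketch of that cleaning step, assuming a toy skill list and column names ('Resume', 'Category') that stand in for the real dataset:

```python
import pandas as pd

# Hypothetical skill list and resume data; the real notebook loads the
# Kaggle dataset, but the column names here are assumptions.
SKILLS = ["python", "sql", "excel"]

df = pd.DataFrame({
    "Resume": ["Python and SQL developer", None, "Excel analyst"],
    "Category": ["IT", None, "Finance"],
})

# Drop NaN rows, then reset the index so it is contiguous again.
df = df.dropna().reset_index(drop=True)

# One column per skill: 1 if the resume text mentions the skill, else 0.
for skill in SKILLS:
    df[skill] = df["Resume"].str.lower().str.contains(skill).astype(int)

# Persist the cleaned, scored dataset as a CSV.
df.to_csv("resumes_with_skills.csv", index=False)
```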

The second notebook imports the CSV produced by the first one and uses the Sørensen–Dice coefficient (Dice similarity coefficient) to compute a similarity score: a skill counts as a true positive (TP) when both the original resume and the compared resume score 1 for it, as a true negative when both score 0, and as a false positive/false negative (FP/FN) when the two scores differ. The ten highest-scoring resumes are then listed.
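On binary skill vectors this works out to a short function; the example vectors below are made up for illustration:

```python
def dice_similarity(a, b):
    """Sørensen–Dice coefficient for two equal-length binary skill vectors."""
    tp = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)  # skill in both
    mismatch = sum(1 for x, y in zip(a, b) if x != y)       # FP + FN
    if 2 * tp + mismatch == 0:
        return 0.0  # neither resume lists any skill
    return 2 * tp / (2 * tp + mismatch)

sample = [1, 1, 0, 1]   # skills of the original resume
other = [1, 0, 0, 1]    # skills of a compared resume
score = dice_similarity(sample, other)  # 2*2 / (2*2 + 1) = 0.8
```

Computing this score for every resume in the dataset and sorting in descending order gives the top-ten list.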

Secondly, the Category column is extracted as the label and K-means clustering and KNN (with k = 30) are applied to the dataset. However, the accuracy of both algorithms is poor, 1.45% and 13.41% respectively, so the resume dataset needs deeper cleaning and noise removal to improve accuracy.
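A sketch of that experiment with scikit-learn, using synthetic features in place of the real skill vectors and Category labels (the data and dimensions here are assumptions, not the project's):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the skill-vector features and Category labels.
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)

# K-means with k equal to the number of label classes.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# KNN classifier; the notebook used k = 30 neighbors.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
knn = KNeighborsClassifier(n_neighbors=30).fit(X_tr, y_tr)
accuracy = knn.score(X_te, y_te)
```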

To that end, the \x and \n artifacts in the resumes are deleted with re.sub. The first resume then serves as the sample, 30 other resumes are picked from different Categories, and spacy.similarity produces the 'score_Version_1' column; repeating the same steps after removing the stopwords yields the 'score_Version_2' column.
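The cleaning part can be sketched as below; the stopword set and sample text are placeholders (the notebook computes the actual scores with spaCy's Doc.similarity on the cleaned texts):

```python
import re

def clean_resume(text, stopwords=frozenset({"a", "an", "the", "and", "to"})):
    """Strip literal \\xNN escape artifacts and newlines, then drop stopwords."""
    text = re.sub(r"\\x[0-9a-fA-F]{2}", " ", text)  # literal \xNN sequences
    text = re.sub(r"[\n\r]+", " ", text)            # newlines
    text = re.sub(r"\s+", " ", text).strip()        # collapse whitespace
    return " ".join(w for w in text.split() if w.lower() not in stopwords)

raw = "Built\\xe2 a pipeline\nto load the data and train a model"
cleaned = clean_resume(raw)  # "Built pipeline load data train model"
```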

spaCy and regex are used to clean the resume dataset further (removing stopwords and \n), and the similarity scores are then recomputed.

A jsonl file from jobzilla adds the skills to the spaCy entity ruler, which then parses the skills out in sorted order.
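A minimal sketch of that step; the real project loads the patterns from the jobzilla jsonl file (e.g. via the ruler's from_disk), while the inline patterns and SKILL label here are assumptions for illustration:

```python
import spacy

# Hypothetical patterns; the project reads these from a jobzilla .jsonl file.
patterns = [
    {"label": "SKILL", "pattern": "python"},
    {"label": "SKILL", "pattern": [{"LOWER": "machine"}, {"LOWER": "learning"}]},
]

nlp = spacy.blank("en")                    # no pretrained model needed here
ruler = nlp.add_pipe("entity_ruler")       # rule-based entity matcher
ruler.add_patterns(patterns)

doc = nlp("Experienced in machine learning and python scripting.")
# Collect the matched skills in a sorted sequence.
skills = sorted({ent.text.lower() for ent in doc.ents if ent.label_ == "SKILL"})
```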

A random resume is chosen as the sample, then the Sørensen–Dice coefficient and spacy.similarity produce a table of similarity scores sorted in descending order, making it clear which resume in the dataset best matches the sample resume.

The Elbow Method (with Within-Cluster Sum of Squared Errors) and silhouette analysis are used to verify the appropriate k value for k-means clustering; both methods show that 30 is the suitable value, matching the original number of categories in the resume dataset.
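A sketch of that model-selection loop on synthetic data (the blob data and the small k range are stand-ins; the project swept k around 30 on the real skill vectors):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the resume feature vectors.
X, _ = make_blobs(n_samples=600, centers=5, random_state=0)

inertias, silhouettes = {}, {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_                      # within-cluster SSE (elbow)
    silhouettes[k] = silhouette_score(X, km.labels_)

# The elbow is read off the inertia curve; the silhouette peak gives a
# direct numeric suggestion for k.
best_k = max(silhouettes, key=silhouettes.get)
```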

Finally, the earlier code is combined into a list of functions: parse skills, find matching resumes, and parse the university. The code was then ported to VS Code.

Create a package

The functions were extracted into recommend_profiles.py, and an __init__.py was created to turn it into a package, together with a corresponding API and pytest tests for the package.
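A sketch of what such a pytest file might look like; the layout comment and the parse_skills stand-in below are assumptions, since the package's real signatures are not shown in this README:

```python
# test_recommend_profiles.py -- pytest sketch for the package.
#
# Assumed layout (hypothetical):
#   recommend_profiles/
#   ├── __init__.py             # exposes the public API
#   └── recommend_profiles.py   # parse_skills, find_match_resume, ...

def parse_skills(text, known_skills=("python", "sql")):
    """Toy stand-in for the package's skill parser."""
    return sorted(s for s in known_skills if s in text.lower())

def test_parse_skills():
    # pytest discovers test_* functions and runs their assertions.
    assert parse_skills("SQL and Python developer") == ["python", "sql"]
    assert parse_skills("No relevant keywords") == []
```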
