This project uses machine learning models to classify job postings as fraudulent or legitimate.
The dataset used in this project is the "fake_job_postings.csv" file, which contains information about job postings, including the job title, description, employment type, required experience, required education, industry, function, and a label indicating whether the posting is fraudulent. The data is available on Kaggle: https://www.kaggle.com/datasets/shivamb/real-or-fake-fake-jobposting-prediction
The "FakeJobs.ipynb" notebook contains the code used in this project, including data cleaning, text preprocessing, feature engineering, and machine learning modeling. The notebook uses several libraries, including pandas, numpy, matplotlib, seaborn, and nltk.
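As a rough illustration of the text preprocessing step, the sketch below lowercases a job description, strips non-letter characters, and drops stop words. The `clean_text` helper, the small `STOP_WORDS` set, and the sample sentence are hypothetical stand-ins written in plain Python; the notebook itself relies on nltk, and its exact steps may differ.

```python
import re

# Hypothetical, deliberately tiny stop-word list (assumption: the notebook
# most likely uses a fuller list such as the one shipped with nltk).
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "for", "is", "we", "are"}


def clean_text(text):
    """Lowercase, keep letters only, and remove stop words."""
    text = text.lower()
    # Replace anything that is not a lowercase letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


# Made-up example posting text, not taken from the dataset.
sample = "We are HIRING!!! Earn $5000/week from home, no experience needed."
cleaned = clean_text(sample)
print(cleaned)
```

The cleaned string can then be passed to whatever vectorizer or feature extraction step follows.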
This project trains and evaluates four machine learning models: Logistic Regression, Decision Tree, Random Forest, and Naive Bayes. Accuracy is the main metric used to evaluate the models.
The Random Forest model achieved the highest accuracy, 0.97 on the test set, with a recall of 0.83 on the minority class. The Naive Bayes model achieved an accuracy of 0.91.
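Accuracy can look high on an imbalanced dataset even when many minority-class examples are missed, which is why recall on the minority class is worth reading alongside it. The following is a minimal sketch of how the two metrics are computed, using made-up labels rather than the project's data; the `accuracy` and `recall` helpers are illustrative and are not taken from the notebook.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual `positive` examples that were predicted as `positive`."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


# Hypothetical toy labels: 1 = fraudulent (the minority class here), 0 = legitimate.
y_true = [0] * 95 + [1] * 5
# A classifier that catches 3 of the 5 fraudulent postings and raises no false alarms.
y_pred = [0] * 95 + [1, 1, 1, 0, 0]

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred, positive=1)
print(f"accuracy={acc:.2f}, minority-class recall={rec:.2f}")
```

In this toy case the accuracy is 0.98 while the minority-class recall is only 0.60, so two of the five fraudulent postings go undetected despite the high accuracy.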
The results show that the Random Forest model is the best performer for detecting fraudulent job postings, with Naive Bayes used as a feature engineering step. This project provides a useful tool for job seekers and recruiters to identify potentially fake job postings and protect themselves from fraud.