juniorcl/README.md

Hi there 👋! I'm Clébio de Oliveira Júnior

Physics Teacher and Data Scientist

Linkedin Badge   Medium Badge   DEV Badge   Kaggle Badge   GitLab Badge   Gmail Badge  


I'm a physics teacher and a data scientist with a passion for technology.

I currently develop projects focused on solving business problems, from understanding the problem and analyzing the data to extracting insights and implementing the solution. I also keep improving through study activities, such as building a portfolio of data science projects, and I write about the same topic on my Medium blog.

My projects and their solutions are described in more detail in the Data Science Projects section.

Analytics Tools

  • Data Collection and Storage: MySQL and PostgreSQL.

  • Data Processing and Analytics: Jupyter Notebook, Pandas and NumPy.

  • Development: Python, Git and Clean Code.

  • Data Visualization: Seaborn and Matplotlib.

  • Machine Learning Modeling: Classification, Regression, Clustering, Time Series and Neural Networks.

  • Machine Learning Deployment: Flask and Docker.

Data Science Projects

  • Olist is the largest department store in Brazilian marketplaces. This project aims to develop and deploy a model that predicts the number of days until a given product is delivered. (In progress.)

  • To support Airbnb bookings, this data science project aims to create a machine learning model that predicts the first booking of a new user. Unfortunately, the dataset is highly imbalanced, which makes prediction difficult: the best result was an accuracy of 17.48% +/- 0.4%. New, business-guided approaches will therefore be needed to improve the results.

  • To help the sales team, this data science project ranks a customer list to improve cross-selling. With the model's ordering, almost all interested customers (98.31% +/- 0.16%) fall within the first 50% of the list, which halves the cost of calls. For example, if each call costs R$ 15.00, making 20,000 calls costs R$ 300,000.00; with the model, the cost drops to R$ 150,000.00.

  • Financial transaction fraud is one of the biggest problems faced by financial institutions. This project uses data science and machine learning to detect and prevent fraudulent transactions. The model achieved a precision of 96.3% +/- 0.7% and a recall of 76.3% +/- 3.5%. The expected profit for the company is R$ 57,251,574.44.

  • Client churn means lost revenue for the company. In this project, I built a data-driven solution to predict churn and help prevent it. On unseen data, the machine learning model detected 76.5% of the clients who were likely to churn, which represents a recovery of R$ 2,878,197.97 for the company.

  • Cardio Catch Disease is a company specialized in detecting heart disease at early stages. For every 5% of prediction accuracy above 50%, the value charged per client increases by 50%. In this data science project, I created a model with a recognition rate of 71.8% +/- 0.5%; the estimated profit from using this model is about R$ 11,285,500.00.

  • Devising an investment strategy for each store can be difficult. To help stakeholders decide on individual investments for every store in the chain, this data science project created a machine learning model that predicts sales up to six weeks in advance, enabling them to calculate the profit per store and the amount of money available to invest.
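
The cross-selling figures in the list above can be checked with a short sketch. The function names (`recall_at_fraction`, `call_cost`) and the toy scores are hypothetical and only illustrate the idea of ranking customers and calling the top half; the R$ 15.00 cost per call and the R$ 300,000.00 and R$ 150,000.00 totals are the only values taken from the text.

```python
from typing import Sequence


def recall_at_fraction(
    scores: Sequence[float], interested: Sequence[int], fraction: float
) -> float:
    """Share of interested customers found in the top `fraction` of the ranked list."""
    # Rank customers from the highest to the lowest model score.
    ranked = sorted(zip(scores, interested), key=lambda pair: pair[0], reverse=True)
    cutoff = int(len(ranked) * fraction)
    found = sum(label for _, label in ranked[:cutoff])
    total = sum(interested)
    return found / total if total else 0.0


def call_cost(n_calls: int, cost_per_call: float) -> float:
    """Total cost of making `n_calls` calls."""
    return n_calls * cost_per_call


# Toy data (hypothetical): model scores and whether each customer is interested.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
interested = [1, 1, 0, 1, 0, 0, 0, 0]

recall_top_half = recall_at_fraction(scores, interested, 0.5)

# Cost of calling the whole list versus only the top half of it.
full_cost = call_cost(20_000, 15.00)
half_cost = call_cost(10_000, 15.00)

print(recall_top_half, full_cost, half_cost)
```

In this toy ranking, all three interested customers sit in the top half of the list, so calling only that half reaches every one of them at half the cost.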

Data Engineering Projects

  • The Bookclub doesn't collect the data from its website, even though the data is updated with each purchase, sale or exchange that takes place there. This project therefore aims to extract, transform and load (ETL) data from the website books.toscrape into a MySQL database. The ETL is scheduled with Airflow, and both MySQL and the Airflow platform run in Docker.
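
The extract, transform and load steps described above can be sketched roughly as follows. This is not the project's code: the sample HTML is an illustrative assumption about the page markup, the in-memory `sqlite3` database is a stand-in for MySQL so the example runs anywhere, the Airflow scheduling is left out, and all names (`BookParser`, `extract`, `transform`, `load`, the `books` table) are hypothetical.

```python
import sqlite3
from html.parser import HTMLParser

# Illustrative sample (an assumption, not real scraped output).
SAMPLE_HTML = """
<article class="product_pod">
  <h3><a href="a.html" title="Book A">Book A</a></h3>
  <p class="price_color">£51.77</p>
</article>
<article class="product_pod">
  <h3><a href="b.html" title="Book B">Book B</a></h3>
  <p class="price_color">£13.99</p>
</article>
"""


class BookParser(HTMLParser):
    """Collects raw (title, price_text) pairs from the sample markup."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._title = None
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "title" in attrs:
            self._title = attrs["title"]
        elif tag == "p" and attrs.get("class") == "price_color":
            self._in_price = True

    def handle_data(self, data):
        if self._in_price:
            self.rows.append((self._title, data.strip()))
            self._in_price = False


def extract(html):
    """Extract: parse the HTML into raw (title, price_text) rows."""
    parser = BookParser()
    parser.feed(html)
    return parser.rows


def transform(rows):
    """Transform: strip the currency symbol and convert prices to floats."""
    return [(title, float(price.lstrip("£"))) for title, price in rows]


def load(conn, books):
    """Load: write the cleaned rows into a `books` table."""
    conn.execute("CREATE TABLE IF NOT EXISTS books (title TEXT, price REAL)")
    conn.executemany("INSERT INTO books (title, price) VALUES (?, ?)", books)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
load(conn, transform(extract(SAMPLE_HTML)))
stored = conn.execute("SELECT title, price FROM books ORDER BY title").fetchall()
print(stored)
```

Keeping the three steps as separate functions means each one could be wrapped in its own scheduled task later without changing its logic.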

Blog Posts

Pinned

  1. data-science-toolkit Public

    A set of functions to help during the data science project.

    Python 1 2

  2. resume Public

    My resumes

    2

  3. imersao-dados-alura Public

    Event about data science created by Alura

    Jupyter Notebook 1

  4. dotfiles Public

    Scripts used to configure my operating systems

    Shell 2

  5. webscraping-comment-tripadvisor Public

    A study project that aims to build a web scraper for collecting information about comments on the site

    Python 2

  6. lmaoclost/Machine-Health Public

    A study project that predicts disease with Machine Learning, NodeJS and Data Science

    TypeScript 5