data-cleaning

Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also lets you run data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.

data-science data-mining exploratory-data-analysis tabular-data feature-selection data-engineering feature-extraction data-analytics knowledge-discovery data-wrangling data-preprocessing feature-engineering spreadsheets data-exploration data-mining-algorithms data-cleaning data-profiling anomaly-detection data-cleansing correlations

Updated May 20, 2024
C++

scribe-org / Scribe-Data

Wikidata and Wikipedia language data extraction

Updated May 20, 2024
Python

bluestero / urlgenie

Python package to make URL extraction, generalization, validation, and filtration easy.

url-parsing data-processing data-cleaning data-curation generalization data-cleansing data-sanitization url-generalization

Updated May 20, 2024
Python

rchandu23 / PRODIGY_DS_03

Task 3 of Prodigy InfoTech Data Science Internship

machine-learning correlation exploratory-data-analysis jupyter-notebook data-visualization statistical-analysis data-cleaning decision-tree-classifier

Updated May 20, 2024
Jupyter Notebook

Gokulakkrizhna / Airbnb_analysis

This project helps users perform trend analysis, recognize patterns, and derive insights through exploratory data analysis (EDA) of Airbnb data. It includes an interactive Power BI dashboard for analyzing the data.

python data-science pandas-dataframe plotly data-visualization powerbi data-cleaning data-scraping geo-visualization streamlit-dashboard eda-analysis

Updated May 20, 2024
Python

johnkerl / miller

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

Updated May 20, 2024
Go

Yunaeecy / Notebooks

Data analysis

data-science data-visualization data-cleaning

Updated May 20, 2024
Jupyter Notebook

DicksonC96 / Covid-Mobility-Malaysia

A daily auto-updating interactive dashboard that visualizes the impact of community mobility on daily new COVID-19 cases, using data from MOH Malaysia, Google, Apple, Waze, and TonTon. https://public.tableau.com/views/COVID-19MobilityDashboard/MobilityTrends (Tableau) https://datastudio.google.com/reporting/54616e0e-19c9-4097-bc…

python bash automation jupyter-notebook data-visualization tableau data-cleaning malaysia github-actions covid-19 covid19-data data-studio-google covid-mobility

Updated May 20, 2024
Jupyter Notebook

ruchi020897 / Hotel_Booking_Data_Analysis_Using_Python

This project uses Python and libraries such as pandas, matplotlib, and seaborn to examine hotel booking cancellations and related factors. The aim is to make revenue generation more efficient and to provide useful business recommendations.

visualization pandas-dataframe exploratory-data-analysis python3 seaborn data-analysis matplotlib data-cleaning

Updated May 20, 2024
Jupyter Notebook

rakeshbangla41 / 8_Week_SQL_Challenge

Solutions for #8WeekSQLChallenge using MySQL

mysql sql data-transformation data-analytics data-analysis data-cleaning 8weeksqlchallenge

Updated May 20, 2024

johnsonhk88 / Data-Science-Challenge-Coursera-Project-Loan-Default-Prediction

Data Science Challenge from Coursera Project: Loan Default Prediction

data-science machine-learning ai deep-learning random-forest exploratory-data-analysis coursera data-cleaning loan-default-prediction xgboost-classifier ml-evaluation

Updated May 20, 2024
Jupyter Notebook

CambioML / uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering

LLM-based text extraction from unstructured data such as PDFs, Word documents, and HTML pages. Transform and cluster the text into your desired format. Less information loss, more interpretation, and faster R&D!

data-cleaning llm generative-ai

Updated May 20, 2024
Python

mmahdin / Data-Mining-Projects

Data Mining coursework at AmirKabir University of Technology

data-mining regression pca association-rules data-cleaning

Updated May 19, 2024
Jupyter Notebook

unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library

testing schema validation data-validation pandas-dataframe assertions pandas testing-tools data-processing dataframes data-cleaning hypothesis-testing data-verification pandas-validation data-check data-assertions dataframe-schema pandas-validator

Updated May 19, 2024
Python

Asifahmad5848 / Forbes-Billionaires-Analysis

The Forbes Billionaires Analysis project provides a comprehensive exploration of the world's billionaires using data from Forbes. The accompanying Jupyter Notebook (forbes-Billionaires-Analysis.ipynb) contains detailed analysis, visualizations, and insights derived from the Forbes billionaires dataset.

data-science data-transformation eda data-cleaning forbes billionaires

Updated May 19, 2024
Jupyter Notebook

Here are 2,793 public repositories matching this topic...

akanz1 / klib

cleanlab / cleanlab

jpcadena / PrDS_2024__TelcoCustomerChurn

voxel51 / fiftyone

psrc / travel-studies

Desbordante / desbordante-core
