Easy-to-use Python library of customized functions for cleaning and analyzing data.
Updated May 21, 2024 (Python)
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Project repository for predicting the Telco Customer Churn problem (Kaggle)
The open-source tool for building high-quality datasets and computer vision models
Repo for PSRC's Regional Travel Studies, 2014 onward
Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also allows users to run data cleaning scenarios based on these algorithms. Desbordante is available as a console version and as an easy-to-use web application.
Wikidata and Wikipedia language data extraction
Python package to make URL extraction, generalization, validation, and filtration easy.
Task 3 of Prodigy InfoTech Data Science Internship
This project helps users perform trend analysis, recognize patterns, and derive data insights through exploratory data analysis (EDA) of Airbnb data. It includes an interactive Power BI dashboard for analyzing the Airbnb data.
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
A daily auto-updating interactive dashboard project that visualizes the impact of community mobility on daily new COVID-19 cases, using data from MOH Malaysia, Google, Apple, Waze, and TonTon. https://public.tableau.com/views/COVID-19MobilityDashboard/MobilityTrends (Tableau) https://datastudio.google.com/reporting/54616e0e-19c9-4097-bc…
This project utilizes Python and various libraries like pandas, matplotlib, and seaborn to examine hotel booking cancellations and other unrelated factors. The aim is to boost revenue generation efficiency and provide valuable business recommendations.
Solutions for #8WeekSQLChallenge using MySQL
Data Science Challenge from a Coursera project: Loan Default Prediction
LLM-based text extraction from unstructured data such as PDFs, Word documents, and HTML files. Transform and cluster the text into your desired format. Less information loss, more interpretation, and faster R&D!
Data Mining coursework at AmirKabir University of Technology
A light-weight, flexible, and expressive statistical data testing library
The Forbes Billionaires Analysis project provides a comprehensive exploration of the world's billionaires using data from Forbes. The accompanying Jupyter Notebook (forbes-Billionaires-Analysis.ipynb) contains detailed analysis, visualizations, and insights derived from the Forbes billionaires dataset.