gagolews/datawranglingpy

Minimalist Data Wrangling with Python is envisaged as a student's first introduction to data science, providing a high-level overview as well as discussing key concepts in detail. We explore methods for cleaning data gathered from different sources, transforming, selecting, and extracting features, performing exploratory data analysis and dimensionality reduction, identifying naturally occurring data clusters, modelling patterns in data, comparing data between groups, and reporting the results.

For many students around the world, educational resources are hardly affordable. Therefore, I have decided that this book should remain an independent, non-profit, open-access project. You can read it at https://datawranglingpy.gagolewski.com/.

You can also order a paper copy.

Whilst, for some people, the presence of a "designer tag" from a major publisher might still be a proxy for quality, it is my hope that this publication will prove useful to those who seek knowledge for knowledge's sake.

Please spread the news about this project.

Consider citing this book as: Gagolewski M. (2024), Minimalist Data Wrangling with Python, Melbourne, DOI: 10.5281/zenodo.6451068, ISBN: 978-0-6455719-1-2, URL: https://datawranglingpy.gagolewski.com/.

Any remarks and bug fixes are appreciated. Please submit them via this repository's Issues tracker. Thank you.

About the Author

Dr habil. Marek Gagolewski is currently an Associate Professor at the Systems Research Institute of the Polish Academy of Sciences.

His research interests are related to data science, in particular: modelling complex phenomena, developing usable, general-purpose algorithms, studying their analytical properties, and finding out how people use, misuse, understand, and misunderstand methods of data analysis in research, commercial, and decision-making settings.

He's an author of 90+ publications, including journal papers in outlets such as Proceedings of the National Academy of Sciences (PNAS), Journal of Statistical Software, The R Journal, Information Fusion, International Journal of Forecasting, Statistical Modelling, Physica A: Statistical Mechanics and its Applications, Information Sciences, Knowledge-Based Systems, IEEE Transactions on Fuzzy Systems, and Journal of Informetrics.

In his "spare" time, he writes books for his students (check out Deep R Programming) and develops open-source software for data analysis, such as stringi (one of the most often downloaded R packages) and genieclust (a fast and robust hierarchical clustering algorithm in both Python and R).


Copyright (C) 2022–2024, Marek Gagolewski. Some rights reserved.

This material is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).