adding tons of referrernces
debnolan committed May 27, 2023
1 parent 752f611 commit bb5001e
Showing 1 changed file with 42 additions and 43 deletions.
85 changes: 42 additions & 43 deletions content/additional_resources.md
@@ -1,97 +1,96 @@
(extra_reading)=
# Additional Material

We cover a lot of topics in this book, and if you want to learn more about any of them, we have collected a list of additional resources, including textbooks and online tutorials.
We cover a lot of topics in this book, and if you want to learn more about them, we have collected a list of additional resources that include textbooks and online tutorials.

For an overview of the larger themes in this book see:
More in-depth treatments of the larger themes in this book can be found in the following resources.

+ [*Sampling: Design and Analysis*](https://doi.org/10.1201/9780429298899) by Lohr for topics in scientific sampling;
+ The sampling topics introduced in this book, and several more, can be found in [*Sampling: Design and Analysis*](https://doi.org/10.1201/9780429298899) by Lohr. Lohr also covers the population, access frame, sampling methods, and sources of bias.

+ [*Statistics*](https://wwnorton.com/books/Statistics/) by Freedman, Pisani, and Purves is useful for introductory statistics related to the urn model;
+ For an introductory treatment of the urn model, confidence intervals, and hypothesis tests, we recommend [*Statistics*](https://wwnorton.com/books/Statistics/) by Freedman, Pisani, and Purves.

+ [*Probability*](https://doi.org/10.1007/978-1-4612-4374-8) by Pitman and [*Introduction to Probaqbility*](https://doi.org/10.1201/b17221) by Hwang and Blitzstein for a more mathematical treatment of probability;
+ For a more mathematical treatment of probability that is still introductory, we suggest [*Probability*](https://doi.org/10.1007/978-1-4612-4374-8) by Pitman and [*Introduction to Probability*](https://doi.org/10.1201/b17221) by Blitzstein and Hwang.

+ [*Principles of Data Wrangling: Practical Techniques for Data Preparation*](https://www.oreilly.com/library/view/principles-of-data/9781491938911/) by Rattenbury, Hellerstein, Heer, Kandel, and Carreras for more on data wrangling;
+ A resource for data wrangling is [*Principles of Data Wrangling: Practical Techniques for Data Preparation*](https://www.oreilly.com/library/view/principles-of-data/9781491938911/) by Rattenbury, Hellerstein, Heer, Kandel, and Carreras. Many of the organizational topics of wrangling stem from this resource.

+ [* *]() by for Pandas

+ [* *]() by for SQL
+ For SQL, see [*The Essence of Databases*](https://dl.acm.org/doi/book/10.5555/274800) by Rolland and the W3Schools tutorial [Introduction to SQL](https://www.w3schools.com/sql/sql_intro.asp).

+ [*Exploratory Data Analysis*](https://archive.org/details/exploratorydataa00tuke_0) by Tukey for EDA;
+ The original text by Tukey, [*Exploratory Data Analysis*](https://archive.org/details/exploratorydataa00tuke_0), offers an introduction to the topic. A more modern treatment can be found in XXXX.

+ [*Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures*](https://clauswilke.com/dataviz/) by Wilke for more on visualization;
+ See [*Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures*](https://clauswilke.com/dataviz/) by Wilke for more on visualization. Our guidelines do not entirely match Wilke's, but they come close, and it's helpful to see a variety of opinions on the topic.

+ [*Linear Models with Python*](https://julianfaraway.github.io/LMP/) by Faraway, [*Applied Regression Analysis and Generalized Linear Models*](https://us.sagepub.com/en-us/nam/applied-regression-analysis-and-generalized-linear-models/book237254) by Fox, [*An Introduction to Statistical Learning: With Applications in Python*](https://www.statlearning.com/) by James, Witten, Hastie, Tibshirani, and Taylor, and [*Applied Linear Regression*](https://doi.org/10.1002/0471704091) by Weisberg for more on modeling, transformations, bootstrap, and regularization.
+ The many topics on modeling, including transformations, one-hot encoding, model selection, cross-validation, and regularization, can be found in several sources. We recommend [*Linear Models with Python*](https://julianfaraway.github.io/LMP/) by Faraway, [*Applied Regression Analysis and Generalized Linear Models*](https://us.sagepub.com/en-us/nam/applied-regression-analysis-and-generalized-linear-models/book237254) by Fox, [*An Introduction to Statistical Learning: With Applications in Python*](https://www.statlearning.com/) by James, Witten, Hastie, Tibshirani, and Taylor, and [*Applied Linear Regression*](https://doi.org/10.1002/0471704091) by Weisberg.

+ [*Mathematical Statistics and Data Analysis*](https://www.cengage.com/c/mathematical-statistics-and-data-analysis-3e-rice/9780534399429/) by Rice for more on confidence intervals and testing.
+ A more formal treatment of confidence intervals, prediction intervals, testing, and the bootstrap can be found in [*Mathematical Statistics and Data Analysis*](https://www.cengage.com/c/mathematical-statistics-and-data-analysis-3e-rice/9780534399429/) by Rice.

+ [*Monte Carlo theory, methods and examples*](https://artowen.su.domains/mc/) by Owen to learn more about simulation;
+ Owen's online text, [*Monte Carlo theory, methods and examples*](https://artowen.su.domains/mc/), provides a solid introduction to simulation.

+ [*Programming Collective Intelligence*](https://www.oreilly.com/library/view/programming-collective-intelligence/9780596529321/) by Segaran for more on optimization.
+ [*Programming Collective Intelligence*](https://www.oreilly.com/library/view/programming-collective-intelligence/9780596529321/) by Segaran covers the topic of optimization.

+ [*Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning*](https://www.oreilly.com/library/view/applied-text-analysis/9781491963036/) by Bengfort, Bilbro, and Ojeda for more on text analysis.
+ See [*Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning*](https://www.oreilly.com/library/view/applied-text-analysis/9781491963036/) by Bengfort, Bilbro, and Ojeda for more on text analysis.

In addition, we provide a list of references for several smaller topics and topics that were lightly touched upon.
In addition, we provide a list of resources for many smaller topics and for topics that were lightly touched upon.

+ To learn more about the interplay between questions and data, we recommend [Questions, Answers, and Statistics](https://iase-web.org/documents/papers/icots2/Speed.pdf) by Speed.
+ To learn more about the interplay between questions and data, we recommend [Questions, Answers, and Statistics](https://iase-web.org/documents/papers/icots2/Speed.pdf) by Speed. In addition, Leek and Peng connect questions with the type of analysis in [What is the question? Mistaking the type of question being considered is the most common error in data analysis](https://doi.org/10.1126/science.aaa6146).

+ To learn more about how to analyze data with a time domain, we refer you to [*Time Series Analysis and Its Applications*](https://doi.org/10.1007/978-3-319-52452-8) by Shumway and Stoffer.
+ For the broad topic of how to analyze time-series data, we refer you to [*Time Series Analysis and Its Applications*](https://doi.org/10.1007/978-3-319-52452-8) by Shumway and Stoffer.

+ Ethics
+ To learn more about the human contexts and ethics of data, see the [HCE Toolkit](https://data.berkeley.edu/hce-toolkit) and Tuskegee University's [National Center for Bioethics in Research and Health Care](https://www.tuskegee.edu/about-us/centers-of-excellence/bioethics-center).

+ Data privacy - SAM?

+ A proof that the median minimizes absolute error can be found in [*Mathematical Statistics: Basic Ideas and Selected Topics Volume I*](https://www.routledge.com/Mathematical-Statistics-Basic-Ideas-and-Selected-Topics-Volume-I-Second/Bickel-Doksum/p/book/9781498723800) by Bickel and Doksum.

+ For more information about how to handle missing data, see [*Statistical Analysis with Missing Data*](https://www.wiley.com/en-us/Statistical+Analysis+with+Missing+Data,+3rd+Edition-p-9780470526798) by Little and Rubin.

+ The smooth density curve is covered in greater detail in [*Density Estimation for Statistics and Data Analysis*](https://www.routledge.com/Density-Estimation-for-Statistics-and-Data-Analysis/Silverman/p/book/9780412246203) by Silverman.
+ The smooth density curve is covered in detail in [*Density Estimation for Statistics and Data Analysis*](https://www.routledge.com/Density-Estimation-for-Statistics-and-Data-Analysis/Silverman/p/book/9780412246203) by Silverman.

+ For more information on color palettes see Brewer's [ColorBrewer2.0](https://colorbrewer2.org/).
+ For more information on color palettes see Brewer's online [ColorBrewer2.0](https://colorbrewer2.org/).

+ An in-depth treatment of loss functions can be found in Chapter 12 of [*All of Statistics: A Concise Course in Statistical Inference*](https://doi.org/10.1007/978-0-387-21736-9) by Wasserman.
+ An in-depth treatment of loss functions and risk can be found in Chapter 12 of [*All of Statistics: A Concise Course in Statistical Inference*](https://doi.org/10.1007/978-0-387-21736-9) by Wasserman.

+ See [Statistical Calibration: A Review](https://doi.org/10.2307/1403690) by Osborne for more on calibration.

+ Chapter 10 in Fox gives an informative treatment of vector geometry of least squares.

+ Chapter 13 in Fox and Chapter 10 in James et al cover Principal Components.

+ Chapter 14 in Fox covers the maximum likelihood approach to logistic regression.

+ Chapter 4 in James et al covers sensitivity and specificity in more detail.

+ For practice with regular expressions, there are many online resources, such as the W3Schools tutorial [Python RegEx](https://www.w3schools.com/python/python_regex.asp), regular expression checkers like [Regular Expressions 101](https://regex101.com/), introductions to the topic such as [An introduction to regular expressions](https://www.oreilly.com/content/an-introduction-to-regular-expressions/) by Nield, and texts like [*Mastering Regular Expressions*](https://dl.acm.org/doi/10.5555/1209014) by Friedl.

Regular expressions

PCAA

netCDF

Parquet

http REST
+ For an online tutorial on how to work with netCDF climate data see [The Beauty of NetCDF](https://www.youtube.com/watch?v=UvNBnjiTXa0)
by Tompkins.

Risk
+ There are many resources on web services, such as HTTP and REST. Some accessible introductory material can be found at [*RESTful Web Services*](https://dl.acm.org/doi/10.5555/1406352)
by Richardson and Ruby.

CV
+ For more on broken-stick regression see [Bent-Cable Regression Theory and Applications](https://doi.org/10.1198/016214505000001177) by Chiu, Lockhart and Routledge.

Broken stick regression
+ For an interesting read, see Andrew Ng's [interview](https://spectrum.ieee.org/andrew-ng-xrays-the-ai-hype) on the gap between test sets and real world use.

Polynomial regression
+ Chapter 7 of James et al introduces polynomial regression using orthogonal polynomials.

Bias-variance decomposition
+ Information about rank tests and other nonparametric statistics can be found in [*Nonparametric Rank Tests*](https://doi.org/10.1007/978-3-642-04898-2_417_) by Hettmansperger.

Rank tests
+ [The ASA Statement on p-Values: Context, Process, and Purpose](https://doi.org/10.1080/00031305.2016.1154108) by Wasserstein and Lazar provides valuable insights into how to interpret $p$-values. Additionally, the topic of $p$-hacking is addressed in [The Statistical Crisis in Science](https://doi.org/10.1511/2014.111.460) by Gelman and Loken.

Faraway cautions
+ For a fun explanation of confounding variables see the [xkcd cartoon](https://www.explainxkcd.com/wiki/index.php/2560:_Confounding_Variables) and its explanation.

P-value ASA
+ For more on XML, we recommend [*XML and Web Technologies for Data Sciences with R*](https://doi.org/10.1007/978-1-4614-7900-0) by Nolan and Temple Lang.

P-hacking
+ For more on the technique for simple models to use in the field, see [The lost art of nomography](https://deadreckonings.files.wordpress.com/2008/01/nomography.pdf) by Doerfler.

Prediction intervals
+ Simpson's paradox

AB testing
+ Weighted Regression

Donkey field
+ Reproducible research

Data privacy
+ For an informative talk by Ramdas on bias, Simpson's paradox, $p$-hacking, and other topics, see the [screencast](https://www.youtube.com/watch?v=wGcjGH-zIL4) and [slides](https://drive.google.com/file/d/0B7gkaDYGT5X5c245RV93MVRRSjQ/view?resourcekey=0-8nQDM50Tta2SuLkFqAXEqQ).

