adding tons of references

DS-100 · May 27, 2023 · bb5001e
1 parent 752f611
Showing 1 changed file with 42 additions and 43 deletions.
diff --git a/content/additional_resources.md b/content/additional_resources.md
@@ -1,97 +1,96 @@
 (extra_reading)=
 # Additional Material 
 
-We cover a lot of topics in this book, and if you want to learn more about any of them, we have collected a list of additional resources, including textbooks and online tutorials. 
+We cover a lot of topics in this book, and if you want to learn more about them, we have collected a list of additional resources that include textbooks and online tutorials. 
 
-For an overview of the larger themes in this book see:
+More in-depth treatments of the larger themes in this book can be found in the following resources.
 
-+ [*Sampling: Design and Analysis*](https://doi.org/10.1201/9780429298899) by Lohr for topics in scientific sampling;
++ The sampling topics introduced in this book, and several more, can be found in [*Sampling: Design and Analysis*](https://doi.org/10.1201/9780429298899) by Lohr, which also treats the population, access frame, sampling methods, and sources of bias.
 
-+ [*Statistics*](https://wwnorton.com/books/Statistics/) by Freedman, Pisani, and Purves is useful for introductory statistics related to the urn model;
++ For an introductory treatment of the urn model, confidence intervals, and hypothesis tests, we recommend [*Statistics*](https://wwnorton.com/books/Statistics/) by Freedman, Pisani, and Purves.
 
-+ [*Probability*](https://doi.org/10.1007/978-1-4612-4374-8) by Pitman and [*Introduction to Probaqbility*](https://doi.org/10.1201/b17221) by Hwang and Blitzstein for a more mathematical treatment of probability;
++ For a more mathematical treatment of probability that is still introductory, we suggest [*Probability*](https://doi.org/10.1007/978-1-4612-4374-8) by Pitman and [*Introduction to Probability*](https://doi.org/10.1201/b17221) by Hwang and Blitzstein.
 
-+ [*Principles of Data Wrangling: Practical Techniques for Data Preparation*](https://www.oreilly.com/library/view/principles-of-data/9781491938911/) by Rattenbury, Hellerstein, Heer, Kandel, and Carreras for more on data wrangling; 
++ A resource for data wrangling is [*Principles of Data Wrangling: Practical Techniques for Data Preparation*](https://www.oreilly.com/library/view/principles-of-data/9781491938911/) by Rattenbury, Hellerstein, Heer, Kandel, and Carreras. Many of the organizational topics of wrangling stem from this resource. 
 
 + [* *]() by for Pandas
 
-+ [* *]() by for SQL
++ For SQL, see [*The Essence of Databases*](https://dl.acm.org/doi/book/10.5555/274800) by Rolland and the W3Schools tutorial [Introduction to SQL](https://www.w3schools.com/sql/sql_intro.asp).
 
-+ [*Exploratory Data Analysis*](https://archive.org/details/exploratorydataa00tuke_0) by Tukey for EDA;
++ The original text by Tukey, [*Exploratory Data Analysis*](https://archive.org/details/exploratorydataa00tuke_0), offers an introduction to the topic. A more modern treatment can be found in XXXX.
 
-+ [*Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures*](https://clauswilke.com/dataviz/) by Wilke for more on visualization;
++ See [*Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures*](https://clauswilke.com/dataviz/) by Wilke for more on visualization. Our guidelines do not entirely match Wilke's, but they come close, and it's helpful to see a variety of opinions on the topic.
 
-+ [*Linear Models with Python*](https://julianfaraway.github.io/LMP/) by Faraway, [*Applied Regression Analysis and Generalized Linear Models*](https://us.sagepub.com/en-us/nam/applied-regression-analysis-and-generalized-linear-models/book237254) by Fox, [*An Introduction to Statistical Learning: With Applications in Python*](https://www.statlearning.com/) by James, Witten, Hastie, Tibshirani, and Taylor, and [*Applied Linear Regression*](https://doi.org/10.1002/0471704091) by Weisberg for more on modeling, transformations, bootstrap, and regularization. 
++ The many topics on modeling, including transformations, one-hot encoding, model selection, cross-validation, and regularization, can be found in several sources. We recommend [*Linear Models with Python*](https://julianfaraway.github.io/LMP/) by Faraway, [*Applied Regression Analysis and Generalized Linear Models*](https://us.sagepub.com/en-us/nam/applied-regression-analysis-and-generalized-linear-models/book237254) by Fox, [*An Introduction to Statistical Learning: With Applications in Python*](https://www.statlearning.com/) by James, Witten, Hastie, Tibshirani, and Taylor, and [*Applied Linear Regression*](https://doi.org/10.1002/0471704091) by Weisberg.
 
-+ [*Mathematical Statistics and Data Analysis*](https://www.cengage.com/c/mathematical-statistics-and-data-analysis-3e-rice/9780534399429/) by Rice for more on confidence intervals and testing.
++ A more formal treatment of confidence intervals, prediction intervals, testing, and the bootstrap can be found in [*Mathematical Statistics and Data Analysis*](https://www.cengage.com/c/mathematical-statistics-and-data-analysis-3e-rice/9780534399429/) by Rice.
 
-+ [*Monte Carlo theory, methods and examples*](https://artowen.su.domains/mc/) by Owen to learn more about simulation;
++ Owen's online text, [*Monte Carlo theory, methods and examples*](https://artowen.su.domains/mc/), provides a solid introduction to simulation.
 
-+ [*Programming Collective Intelligence*](https://www.oreilly.com/library/view/programming-collective-intelligence/9780596529321/) by Segaran for more on optimization.
++ [*Programming Collective Intelligence*](https://www.oreilly.com/library/view/programming-collective-intelligence/9780596529321/) by Segaran covers the topic of optimization.
 
-+ [*Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning*](https://www.oreilly.com/library/view/applied-text-analysis/9781491963036/) by Bengfort, Bilbro, and Ojeda for more on text analysis.
++ See [*Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning*](https://www.oreilly.com/library/view/applied-text-analysis/9781491963036/) by Bengfort, Bilbro, and Ojeda for more on text analysis.
 
-In addition, we provide a list of references for several smaller topics and topics that were lightly touched upon. 
+In addition, we provide a list of resources for many smaller topics and for topics that were lightly touched upon. 
 
-+ To learn more about the interplay between questions and data, we recommend [Questions, Answers, and Statistics](https://iase-web.org/documents/papers/icots2/Speed.pdf) by Speed. 
++ To learn more about the interplay between questions and data, we recommend [Questions, Answers, and Statistics](https://iase-web.org/documents/papers/icots2/Speed.pdf) by Speed. In addition, Leek and Peng connect questions with the type of analysis in [What is the question? Mistaking the type of question being considered is the most common error in data analysis](https://doi.org/10.1126/science.aaa6146).
 
-+ To learn more about how to analyze data with a time domain, we refer you to [*Time Series Analysis and Its Applications*](https://doi.org/10.1007/978-3-319-52452-8) by Shumway and Stoffer.
++ For the broad topic of how to analyze time-series data, we refer you to [*Time Series Analysis and Its Applications*](https://doi.org/10.1007/978-3-319-52452-8) by Shumway and Stoffer.
 
-+ Ethics
++ To learn more about the human contexts and ethics of data, see the [HCE Toolkit](https://data.berkeley.edu/hce-toolkit) and Tuskegee University's [National Center for Bioethics in Research and Health Care](https://www.tuskegee.edu/about-us/centers-of-excellence/bioethics-center).
+
++ Data privacy - SAM?
 
 + A proof that the median minimizes absolute error can be found in [*Mathematical Statistics: Basic Ideas and Selected Topics Volume I*](https://www.routledge.com/Mathematical-Statistics-Basic-Ideas-and-Selected-Topics-Volume-I-Second/Bickel-Doksum/p/book/9781498723800) by  Bickel and Doksum. 
 
 + For more information about how to handle missing data, see [*Statistical Analysis with Missing Data*](https://www.wiley.com/en-us/Statistical+Analysis+with+Missing+Data,+3rd+Edition-p-9780470526798) by Little and Rubin.
 
-+ The smooth density curve is covered in greater detail in [*Density Estimation for Statistics and Data Analysis*](https://www.routledge.com/Density-Estimation-for-Statistics-and-Data-Analysis/Silverman/p/book/9780412246203) by Silverman. 
++ The smooth density curve is covered in detail in [*Density Estimation for Statistics and Data Analysis*](https://www.routledge.com/Density-Estimation-for-Statistics-and-Data-Analysis/Silverman/p/book/9780412246203) by Silverman. 
 
-+ For more information on color palettes see Brewer's [ColorBrewer2.0](https://colorbrewer2.org/).
++ For more information on color palettes see Brewer's online [ColorBrewer2.0](https://colorbrewer2.org/).
 
-+ An in-depth treatment of loss functions can be found in Chapter 12 of [*All of Statistics: A Concise Course in Statistical Inference*](https://doi.org/10.1007/978-0-387-21736-9) by Wasserman.
++ An in-depth treatment of loss functions and risk can be found in Chapter 12 of [*All of Statistics: A Concise Course in Statistical Inference*](https://doi.org/10.1007/978-0-387-21736-9) by Wasserman.
 
 + See [Statistical Calibration: A Review](https://doi.org/10.2307/1403690) by Osborne for more on calibration. 
 
 + Chapter 10 in Fox gives an informative treatment of vector geometry of least squares.
 
++ Chapter 13 in Fox and Chapter 10 in James et al cover Principal Components.
+
 + Chapter 14 in Fox covers the maximum likelihood approach to logistic regression. 
 
 + Chapter 4 in James et al covers sensitivity and specificity in more detail. 
 
++ For practice with regular expressions, there are many online resources, such as the W3Schools tutorial [Python RegEx](https://www.w3schools.com/python/python_regex.asp), regular expression checkers like [Regular Expressions 101](https://regex101.com/), introductions to the topic like [An introduction to regular expressions](https://www.oreilly.com/content/an-introduction-to-regular-expressions/) by Nield, and texts like [*Mastering Regular Expressions*](https://dl.acm.org/doi/10.5555/1209014) by Friedl.
 
-Regular expressions
-
-PCAA
-
-netCDF 
-
-Parquet
-
-http REST
++ For an online tutorial on how to work with netCDF climate data see [The Beauty of NetCDF](https://www.youtube.com/watch?v=UvNBnjiTXa0)
+by Tompkins.  
 
-Risk
++ There are many resources on web services, such as HTTP and REST. Some accessible introductory material can be found in [*RESTful Web Services*](https://dl.acm.org/doi/10.5555/1406352)
+by Richardson and Ruby.  
 
-CV
++ For more on broken-stick regression see [Bent-Cable Regression Theory and Applications](https://doi.org/10.1198/016214505000001177) by Chiu, Lockhart and Routledge. 
 
-Broken stick regression
++ For an interesting read, see Andrew Ng's [interview](https://spectrum.ieee.org/andrew-ng-xrays-the-ai-hype) on the gap between test sets and real-world use.
 
-Polynomial regression
++ Chapter 7 of James et al introduces polynomial regression using orthogonal polynomials. 
 
-Bias-variance decomposition
++ Information about rank tests and other nonparametric statistics can be found in [*Nonparametric Rank Tests*](https://doi.org/10.1007/978-3-642-04898-2_417_) by Hettmansperger.  
 
-Rank tests
++ [The ASA Statement on p-Values: Context, Process, and Purpose](https://doi.org/10.1080/00031305.2016.1154108) by Wasserstein and Lazar provides valuable insights into how to interpret $p$-values. Additionally, the topic of p-hacking is addressed in [The Statistical Crisis in Science](https://doi.org/10.1511/2014.111.460) by Gelman and Loken.
 
-Faraway cautions
++ For a fun take on confounding variables, see the [xkcd cartoon](https://www.explainxkcd.com/wiki/index.php/2560:_Confounding_Variables) and its explanation.
 
-P-value ASA
++ For more on XML, we recommend [*XML and Web Technologies for Data Sciences with R*](https://doi.org/10.1007/978-1-4614-7900-0) by Nolan and Temple Lang.
 
-P-hacking
++ For more on a technique for making simple models usable in the field, see [The lost art of nomography](https://deadreckonings.files.wordpress.com/2008/01/nomography.pdf) by Doerfler.
 
-Prediction intervals
++ Simpson's paradox
 
-AB testing
++ Weighted Regression
 
-Donkey field
++ Reproducible research
 
-Data privacy
++ For an informative talk by Ramdas on bias, Simpson's paradox, p-hacking, and other topics, see the [screencast](https://www.youtube.com/watch?v=wGcjGH-zIL4) and [slides](https://drive.google.com/file/d/0B7gkaDYGT5X5c245RV93MVRRSjQ/view?resourcekey=0-8nQDM50Tta2SuLkFqAXEqQ).