Users and Citations

Marco De Nadai edited this page Aug 4, 2016 · 37 revisions

References using statsmodels

The following is a collection of references that mention statsmodels; in most cases, statsmodels was used for part of the analysis.

Journal articles

2014

  • Kubilius, Jonas. 2014. “A Framework for Streamlining Research Workflow in Neuroscience and Psychology.” Frontiers in Neuroinformatics 7. doi:10.3389/fninf.2013.00052. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3894454

    Quote: "In recent years, Python and its scientific packages emerged as a promising platform for researchers in neuroscience and psychology, including PsychoPy for running experiments (Peirce, 2007, 2009), pandas 1 and statsmodels 2 for data analysis, PyMVPA (Hanke et al., 2009) and scikit-learn (Pedregosa et al., 2011) for machine learning data analyses, and NeuroDebian (Halchenko and Hanke, 2012) as an overarching platform providing an easy deployment of these tools."

    Comment: Nothing specific to statsmodels, as far as I can see, but it points out the platform packages for Python in science, specifically in neuroscience and psychology.

2013

  • Amsterdamer, Yael, Yael Grossman, Tova Milo, and Pierre Senellart. 2013. “Crowd Mining.” In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, 241–52. SIGMOD ’13. New York, NY, USA: ACM. doi:10.1145/2463676.2465318. http://doi.acm.org/10.1145/2463676.2465318 .

    Quote: "External libraries used were StatsModels4, containing an implementation of multivariate Gaussian integration through MC sampling [14], and the Orange data mining library [9]."

    Comment: Reference [14] in the quote is for Genz, who wrote the underlying implementation used through scipy.

  • Beltrame, Luca, Luca Bianco, Paolo Fontana, and Duccio Cavalieri. 2013. “Pathway Processor 2.0: A Web Resource for Pathway-Based Analysis of High-Throughput Data.” Bioinformatics 29 (14): 1825–26. doi:10.1093/bioinformatics/btt292.

    Comment: They mention that they use FDR from statsmodels in the supplementary material. The main article says that they use Python and R as backend.

  • Bucci, L., R. Ostan, E. Giampieri, E. Cevenini, E. Pini, M. Scurti, R. Vescovini, et al. 2014. "Immune Parameters Identify Italian Centenarians with a Longer Five-Year Survival Independent of Their Health and Functional Status." Experimental Gerontology. doi:10.1016/j.exger.2014.01.023. http://www.sciencedirect.com/science/article/pii/S0531556514000357 .

    Quote: "All analyses were executed using R software, Python statistical libraries (statsmodels, pandas and scipy packages) and SPSS 19.0 for windows (SPSS Inc., Chicago, IL, USA)."

  • Dabdoub, S. M., A. A. Tsigarida, and P. S. Kumar. 2013. "Patient-Specific Analysis of Periodontal and Peri-Implant Microbiomes." Journal of Dental Research 92 (12 suppl): 168S-175S. doi:10.1177/0022034513504950.

    Quote: "Single and multiple comparisons of distributions were carried out with the statistical facilities provided by JMP (SAS Institute Inc.), as well as the Python libraries SciPy, pandas, and statsmodels."

  • Doba, Karyn, Laurent Pezard, Guillaume Berna, Jean Vignau, and Jean-Louis Nandrino. 2013. “Affiliative Behaviour and Conflictual Communication during Brief Family Therapy of Patients with Anorexia Nervosa.” PLoS ONE 8 (8): e70389. doi:10.1371/journal.pone.0070389.

    Quote (under "Acknowledgments"): "Most of the computation of this article was done using free software and we are indebted to the developers and Debian maintainers of the following packages: TeX Live, vim, python, pgf/tikz, python-numpy, pandas, scikits.statsmodels, joblib to mention only a few."

  • Fritsch, Virgile, Gaël Varoquaux, Benjamin Thyreau, Jean-Baptiste Poline, and Bertrand Thirion. 2012. “Detecting Outliers in High-Dimensional Neuroimaging Datasets with Robust Covariance Estimators.” Medical Image Analysis 16 (7): 1359–70. doi:10.1016/j.media.2012.05.002.

    Quote: "We removed the effect of gender, handedness and acquisition center by using a robust regression based on M-estimators (Huber, 2005), using the scikit.statsmodels Python package (Seabold and Perktold, 2010) implementation."

  • Gonzalez-Perez, Abel, Christian Perez-Llamas, Jordi Deu-Pons, David Tamborero, Michael P. Schroeder, Alba Jene-Sanz, Alberto Santos, and Nuria Lopez-Bigas. 2013. "IntOGen-Mutations Identifies Cancer Drivers across Tumor Types." Nature Methods 10 (11): 1081-82. doi:10.1038/nmeth.2642. http://www.nature.com/nmeth/journal/v10/n11/full/nmeth.2642.html

    Quote: "In addition to third-party (and in-house) software and data, IntOGen-mutations pipeline installation requires some Python libraries. The most important of these are the numpy and scipy scientific computing libraries and the statsmodels Python statistical library."

  • Picelli, Simone, Åsa K. Björklund, Omid R. Faridani, Sven Sagasser, Gösta Winberg, and Rickard Sandberg. 2013. “Smart-seq2 for Sensitive Full-Length Transcriptome Profiling in Single Cells.” Nature Methods 10 (11): 1096–98. doi:10.1038/nmeth.2639.

    Quote: "We estimated the Huber robust mean and s.d. (Huber’s proposal 2, implemented in python package statsmodels) and transformed each observation into s.d. from the mean ([observation-mean]/stdev)."

  • Sturgill, David, John H. Malone, Xia Sun, Harold E. Smith, Leonard Rabinow, Marie-Laure Samson, and Brian Oliver. 2013. “Design of RNA Splicing Analysis Null Models for Post Hoc Filtering of Drosophila Head RNA-Seq Data with the Splicing Analysis Kit (Spanki).” BMC Bioinformatics 14 (1): 320. doi:10.1186/1471-2105-14-320.

    Quote: "FDR correction is performed by the Benjamini-Hochberg method implemented in the StatsModels package (Skipper Seabold, Josef Perktold, http://statsmodels.sourceforge.net/)."

2012

  • Bilina, Roseline, and Steve Lawford. 2012. “Python for Unified Research in Econometrics and Statistics.” Econometric Reviews 31 (5): 558–91. doi:10.1080/07474938.2011.553573.

    Quote: "Also useful are the ‘Sage’ mathematics system (www.sagemath.org), the statsmodels Python statistics package (statsmodels.sourceforge.net), and the ‘SciPy Stats Project,’ a blog that developed out of the ‘Google Summer of Code 2009’ (www.scipystats.blogspot.com)."

    Comment: Overview of Python for econometrics, written mostly before 2010 (when statsmodels was still young)

Conference Articles

  • De Nadai, M., R. L. Vieriu, G. Zen, S. Dragicevic, N. Naik, M. Caraviello, C. A. Hidalgo, N. Sebe, and B. Lepri. 2016. "Are Safer Looking Neighborhoods More Lively? A Multimodal Investigation into Urban Life." ACM Multimedia 2016. Forthcoming. http://arxiv.org/abs/1608.00462.

    Quote: "Most of the computation of this article was done using free software and we are indebted to the developers and maintainers of the following packages: python, pandas, scikits.statsmodels, pysal to mention only a few."

  • Buschmeier, Hendrik, and Marcin Wlodarczak. 2013. “TextGridTools: A TextGrid Processing and Analysis Toolkit for Python.” In Tagungsband Der 24. Konferenz Zur Elektronischen Sprachsignalverarbeitung (ESSV 2013). http://pub.uni-bielefeld.de/publication/2561620.

    Quote: "In recent years a wide range of data analysis libraries have evolved and matured in the Python ecosystem [4]. They include, among others, tools for data analysis (pandas), numerical (NumPy) and scientific (SciPy) computing, data visualisation (Matplotlib), statistical modelling (Statsmodels) and natural language processing (NLTK)."

    Comment: This mentions the Python ecosystem but does not use statsmodels. They have their own implementation of agreement measures, such as Cohen's and Fleiss' kappa.

  • Fritsch, V., B. Da Mota, G. Varoquaux, V. Frouin, E. Loth, J.-B. Poline, and B. Thirion. 2013. “Robust Group-Level Inference in Neuroimaging Genetic Studies.” In 2013 International Workshop on Pattern Recognition in Neuroimaging (PRNI), 21–24. doi:10.1109/PRNI.2013.15.

    Quote: "We use a Python implementation of robust regression available in the statsmodels 1 library, which we optimized for our application."

  • Maggio, Martina, and Henry Hoffmann. "ARPE: A Tool To Build Equation Models of Computing Systems." Presented as part of the 8th International Workshop on Feedback Computing. USENIX.

    Quote: "ARPE also contains a python script that uses the statsmodels python library to provide multivariate linear regression model OLS and statistics for the data."

Thesis

Working Papers

The following is a selective list of working papers that are interesting because of the field where statsmodels is used, or to illustrate which parts of statsmodels are used in different areas.

  • Agar, J. R. R., and P. Barmby. 2013. “M31 Globular Cluster Structures and the Presence of X-Ray Binaries.” arXiv:1308.6748 [astro-Ph]. http://arxiv.org/abs/1308.6748.

    Quote: "We used maximum likelihood estimation as implemented in the Python statsmodels package to model the probability of a cluster containing an LMXB; the results of the logistic regression analysis are given in Table 5."

  • Scheffel, Eric Michael. 2012. “Political Uncertainty in a Data-Rich Environment”. MPRA Paper. March 13. http://mpra.ub.uni-muenchen.de/37353/.

    Quote: "To check that we have not made any obvious mistakes, wherever possible we have always made sure that our results are exactly identical to those obtained from the mature Python library Statsmodels which contains a VAR procedure."

    Comment: using statsmodels as reference implementation. statsmodels is still missing several of their methods, e.g. FAVAR.

Other

  • Guerrero, Gina B., and Andrew Duchowski. 2013. “Animating Eyes” Digital Production Arts, Clemson University, CPSC 412/612 Eye Tracking Methodology and Applications http://andrewd.ces.clemson.edu/courses/cpsc412/fall13/teams/reports/group1.pdf

    Quote: "To determine if the two sets of data per eye were statistically equivalent, two one-sided tests (TOST) were done on the velocity data. <...> We used Python’s statsmodels module, which already has a function available for TOST: ttost_ind()."

    Comment: This is a project report by a student and professor. It is interesting because it uses TOST in a field and application that I did not expect.

Books

Books with some examples using statsmodels

  • Hauck, Trent. 2013. Instant Data Intensive Apps with Pandas How-To. Packt Publishing Ltd.
  • Idris, Ivan. 2012. NumPy Cookbook. Packt Publishing Ltd.
  • McKinney, Wes. 2012. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O’Reilly Media, Inc.