Gary Hollis edited this page Jun 5, 2021 · 18 revisions

Welcome to cl-ana!

cl-ana is a Free (GPL) Common Lisp library for data analysis, designed to be easy to understand, extend, and use in full or in part, all in a Lispy way. Its primary use so far has been analyzing particle accelerator data, so there is a strong bias towards supporting larger datasets, i.e. ones that do not fit in memory (I work with ~10 TB on a regular basis).

This wiki serves as the tutorial/user's manual & guide.

Features

  • Tabular data analysis: Datasets are represented by tables, and can be written to/read from a variety of formats including HDF5, ntuples (like GSL and PAW), and Comma Separated Value (CSV) files. Foreign types are converted automatically into Lisp types and vice versa for tables using foreign storage formats, e.g. HDF5, ntuples. Analysis can be done in a Lispy way using table-reduce or do-table with familiar syntax.
  • Histogramming/Binned data analysis: Binned data analysis is supported via contiguous and sparse histograms, complete with integration/projection, arithmetic operations, and support for functional techniques (map, reduce, filter).
  • Visualization/Plotting: gnuplot is used as the backbone for plotting. Generic functions are used for almost everything to do with plotting, so it is easy to extend the existing plotting functions to allow for custom objects/types to be plotted however the user wants. Methods already exist for plotting ordinary Lisp functions, dataset samples as alists, histograms (1-D and 2-D), and strings as formulae (e.g. "sin(x)").
  • Fitting: Nonlinear least-squares fitting is provided by a nicer front end to GSLL's interface to the GNU Scientific Library's fitting abilities. Fit functions are ordinary Lisp functions, and an object to be fitted against needs only a single method defined for it to be immediately usable with the fitting capabilities. Histograms and data samples as alists already have methods defined for them.
  • Generic Mathematics: Common Lisp's math functions are not extendable, so cl-ana provides extendable versions. Built on top of these are error propagation and quantities (values with units, e.g. 5 meters), which can be used together (e.g. (* (+- 1 0.3) :meter) represents 1 +- 0.3 meters). In addition, sequences have methods defined which treat sequences of arbitrary depth (e.g. lists of lists of ...) as tensors, with element-wise versions of all the generic math functions already defined for them (this acts much like MATLAB/GNU Octave). Basic linear algebra and Lorentz transformations are also supported (since I need them for physics), but these are not core components of cl-ana.
  • Dependency-Oriented Programming: The higher level of cl-ana supports a programming paradigm I call Dependency-Oriented Programming (DOP), in which the programmer writes recipes for result targets rather than explicit function calls or an imperative procedure. (GNU Make is an example of this paradigm.) DOP excels whenever you frequently need to recalculate a large number of quantities in response to changes in source data or in definitions, as well as whenever your project tends to develop in a non-linear fashion or its dependencies are not yet well understood at the beginning of the project. The combination of cl-ana's DOP with Emacs+SLIME provides a powerful and extensible data analysis software suite.
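
To make the recipe-and-target idea behind DOP concrete, here is a minimal sketch in plain Common Lisp. It does not use cl-ana's own DOP operators and is not how cl-ana implements them: the names define-target, target, and invalidate, along with the special variables, are hypothetical and invented for this illustration. The sketch assumes the dependency graph has no cycles.

```lisp
;;; Hypothetical sketch of dependency-oriented programming.
;;; None of these names come from cl-ana; they exist only for this example.

(defvar *recipes* (make-hash-table :test #'eq)
  "Maps a target name to a function of no arguments that computes it.")

(defvar *cache* (make-hash-table :test #'eq)
  "Maps a target name to its most recently computed value.")

(defvar *dependents* (make-hash-table :test #'eq)
  "Maps a target name to the list of targets whose recipes read it.")

(defvar *current-target* nil
  "The target whose recipe is currently being evaluated, if any.")

(defun target (name)
  "Return the value of NAME, running its recipe only if no value is cached."
  ;; Record that the recipe being evaluated depends on NAME.
  (when *current-target*
    (pushnew *current-target* (gethash name *dependents*)))
  (multiple-value-bind (value cached-p) (gethash name *cache*)
    (if cached-p
        value
        (let ((recipe (or (gethash name *recipes*)
                          (error "No recipe for target ~S" name))))
          (setf (gethash name *cache*)
                (let ((*current-target* name))
                  (funcall recipe)))))))

(defun invalidate (name)
  "Forget the cached value of NAME and of every target depending on it.
Assumes the dependency graph is acyclic."
  (remhash name *cache*)
  (mapc #'invalidate (gethash name *dependents*)))

(defmacro define-target (name &body body)
  "Register BODY as the recipe for NAME and mark NAME's dependents stale."
  `(progn
     (setf (gethash ',name *recipes*) (lambda () ,@body))
     (invalidate ',name)
     ',name))

;; Recipes describe what each result is, not when to compute it.
(define-target source-data (list 1 2 3 4))
(define-target total (reduce #'+ (target 'source-data)))
(define-target mean (/ (target 'total) (length (target 'source-data))))

(target 'mean)                               ; => 5/2

;; Changing a definition makes everything downstream stale, so the
;; next request recomputes only what depends on the change.
(define-target source-data (list 10 20 30))
(target 'mean)                               ; => 20
```

Dependencies are discovered while a recipe runs instead of being declared up front, which suits projects whose dependency structure is still changing. A dependency that an old recipe used but a new one no longer reads stays recorded, so this sketch may recompute more than strictly necessary, but it never returns a stale value.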

Installation and tutorial/user's guide

To continue on to installation and the tutorial/user's guide, click tutorial.