Skip to content

Chapter 3: Functional data structures

jimshowalter edited this page Mar 29, 2021 · 27 revisions

Contents

  • Notes: Chapter notes and links to further reading related to the content in this chapter
  • Tests: Basic assert-based tests are available for functions sum, product and append, as well as for the functions developed in exercises 3.2 through to 3.10
  • FAQ: Questions related to the chapter content. Feel free to add questions and/or answers here related to the chapter.

Notes

The Wikipedia article on algebraic data types has further discussion about the theory behind ADTs.

Linked lists

The singly-linked list (also called a cons list) we cover in this chapter is one of the simplest purely functional data structures. It has good performance for linear traversal, but it's not very good for random access or list concatenation.

Random access vectors and finger trees

A better structure for random access is Vector in the standard library. It provides constant time (or nearly enough to constant time) access to arbitrary elements in the structure. Vectors can be seen as a specialization of the idea of a Finger Tree.

Difference lists

The Difference List can provide efficient (constant time) concatenation of lists. The idea is that instead of having a list, we simply compose functions that operate on lists. We can compose functions in constant time, and pass an actual list to the composite function as late as possible.

Cost amortization

Reasoning about complexity of algorithms works a bit differently in a persistent (immutable) setting. We often make use of the fact that the cost of an expensive operation can be amortized over a vastly larger number of very inexpensive operations. An example of this kind of amortization is the cost of the concatenation operation on difference lists (see above). Operating on an actual list takes O(n) time, but we can spread this cost over a number of operations that we compose using the DList. This is an example of cost amortization.

Purely Functional Data Structures

Chris Okasaki's book Purely Functional Data Structures (Cambridge University Press, 1999; ISBN: 0521663504) gives a thorough treatment of amortization. It is also the canonical text on efficient data structures, both classic and new, from the perspective of functional programming. We highly recommend picking up this book if you're interested in data structures. The dissertation that the book is based on is also available from Carnegie Mellon University's website.

Rose trees

The tree structure that we introduce at the end of the chapter is called a Rose Tree. It is a nonempty multi-way tree that contains data at the nodes rather than at the leaves.

The algebra of data types

The "algebraic" in algebraic data types means something specific. This is a reference to the fact that such data types are composed of sums and products of other types. More specifically, data types form a seminearring.

See the following links:

Zippers

Since an algebraic data type is a type-level function involving sums and products, we can take the derivative of such a function, yielding a data structure called a zipper. The zipper for a given structure is like the original structure, but with a movable "focus" or pointer into the structure. This can be used to insert, remove, and modify elements under the focus.

For example, a list zipper consists of one element under focus together with two lists: one enumerating all the elements to the left of the focus, and another enumerating all the elements to the right of the focus. The focus can be moved left or right (like a zipper on a piece of clothing) and elements will move through the focus. The element under focus can then be removed or modified, and we can insert a new element by consing onto the lists to its left or right.

Type inference

Scala's type system is complicated by the presence of path-dependent types and subtyping. As a result Scala has only very limited, local type inference.

Whereas other programming languages like Haskell or ML may have some species of Hindley-Milner type inference, Scala has what's called "flow-based" inference. That is, type information "flows" from arguments to results. Given the types of the arguments, Scala is able to infer the result type of a function, unless that function is recursive. This also goes for values that are not functions, which can be considered 0-argument functions for the purpose of this discussion.

We gain type inference benefits from grouping arguments into two argument lists, as in xs.foldRight(0)(_ + _). Type information flows from the first argument list to the second when inferring type arguments to a function call. Note that no inference benefit can be gained from adding more than two argument lists to a function. When inferring the type arguments to a function call, Scala's typer does not consult any argument lists beyond the first.

See the Scala Language Specification for more information on Scala's type inference. Specifically sections 4.6.4 (Method Return Type Inference), 6.26.4 (Local Type Inference), and 8.3 (Type Parameter Inference In Patterns).

Links

Tests

Basic assert-based tests are available for sum, product and append, as well as for functions the reader is invited to write in exercises 3.2 through to 3.10.

The tests can be found here: https://github.com/philipschwarz/fpinscala/blob/master/exercises/src/main/scala/fpinscala/datastructures/List.scala#L91)

There is a test for each function written in the exercises. The test for function x is test_x. E.g. the test for function foldRight is test_foldRight.

To run a test, e.g. test_foldRight, just run test method fpinscala.datastructures.List.test_foldRight (e.g. in the IntelliJ IDEA Scala Console, evaluate fpinscala.datastructures.List.test_foldRight).

If a tests is successful, then no output is produced.

If any one of the test's assertions fails, then the message associated with the first such failing assertion is output. e.g.

scala> fpinscala.datastructures.List.test_length java.lang.AssertionError: assertion failed: length of single-element list is one ...

If you run a test method before implementing the function that it tests, e.g. foldLeft, then you'll get output of the following form:

scala> fpinscala.datastructures.List.test_foldLeft java.lang.RuntimeException: todo ...

Also provided is a main test method, called test, that calls all the subordinate test methods.

FAQ

Why do you declare Nil as a case object instead of a case class within the definition of our functional sealed trait List?

case object is more appropriate because there only ever needs to exist one empty list: Nil. case class Nil will actually cause an error because case classes require an explicit parameter list:

<scala> case class Nil
<console>:1: error: case classes without a parameter list are not allowed;
use either case objects or case classes with an explicit `()' as a parameter list.
case class Nil
              ^

Below is an alternate, invariant formation of List. case class is actually required here because Nil takes a type parameter (so there is now one empty list Nil[A] for each type A):

sealed trait List[A]
case class Nil[A]() extends List[A]
case class Cons[A](head: A, tail: List[A]) extends List[A]