Skip to content

Chapter 15: Stream processing and incremental IO

raygit edited this page Sep 24, 2014 · 12 revisions

Contents

  • Notes: Chapter notes and links to further reading related to the content in this chapter
  • FAQ: Questions related to the chapter content. Feel free to add questions and/or answers here related to the chapter.

Notes

As mentioned in the text, an I/O monad is a kind of lowest common denominator for embedding externally-interpreted effects in a pure language--the model when programming within IO is much the same as ordinary imperative programming. Hence, ever since the IO monad was first introduced in the early 1990s, functional programmers have been interested in finding more compositional ways of assembling programs that talk to the external world.

For a while, lazy I/O was still quite commonly used, despite its problems. Oleg Kiselyov, a prominent Haskell programmer and researcher, was very vocal in pointing out the problems with lazy I/O, and popularized the concept of iteratees, which is an early "ancestor" of the library we developed in this chapter. For more background, see Oleg's page on stream processing. For a gentler exposition to iteratees, also see "Iteratee: Teaching an old fold new tricks" in Issue 16 of The Monad Reader. Oleg's site is a treasure trove of resources covering various aspects of FP, we highly recommend spending some time there.

In the past 5 years or so, there have been a number of variations developed on the basic idea of iteratees, generally aimed at making simpler to use and/or more expressive libraries. In Haskell, see the conduit and pipes packages, as well as Edward Kmett's machines package, which is the Haskell library which most closely related to the one in this chapter.

There's been some pushback in the Haskell community that the conduit and pipes packages are overly complicated, and this has led to the development of the io-streams package. The library is simpler in the sense that it is specialized to IO and bakes certain features directly into the basic stream types used, but in our opinion, this library isn't really a full replacement for a library like conduit, pipes, or machines. An important goal of a stream processing library is to allow for pure stream processing logic (often a majority of processing logic) to be defined separately from any I/O. Although having a more convenient abstraction for I/O streams is useful, more general purpose, abstract libraries are still important.

In the Scala world, the scalaz-stream library developed out of work on machines and the library developed here. Prior to that, there were ports of iteratees in the core scalaz library.

Functional reactive programming

Streaming processing and incremental I/O might not seem to have much to do with UI programming, but the problems have similarities. Functional Reactive Programming (FRP) originated in the 1990s with work done by Conal Elliott, Paul Hudak, and others (see Elliott's list of publications and the 1997 paper Functional Reactive Animation by Elliott and Hudak). FRP is often put forward as a solution to the problem of describing interactive UIs in a functional way.

The FRP research has developed somewhat in parallel to the various approaches to streaming I/O mentioned above. According to Elliott, what uniquely identifies FRP is the use of continuous time, and on providing a simple, precise denotation for the data types and various combinators. An FRP library is based around two types of signals:

  • Behavior[A]: a time-varying A value, having a denotation Time => A.
  • Event[A]: a discrete, time-varying sequence of A values, having a denotation of List (Time, A).

These types (and their denotations) are deliberately abstract, which means that implementations are free to expose different sets of primitive combinators, and implementations are free to use very different implementation strategies, depending on what primitives are exposed to users. See the chapter notes for chapter 9 for more about this style of design. For instance, the algebra might include the following functions:

  • def foldP[B,A](e: Event[A], z: B)(f: (B,A) => B): Event[B] for left-folding the sequence of values represented by an Event.
  • def sample[A,B](e: Event[A], b: Behavior[B]): Event[B] for sampling from a continuous behavior whenever an Event emits a value

Each of these operations can be given a clear interpretation in terms of the denotations of Event and Behavior, but implementations may use some more interesting representation of these types to facilitate efficient interpretation.

The FRP use of continuous time is important for much the same reasons that non-strictness is important. We saw in chapter 5 how non-strictness let us write more modular code, by decoupling the description of a computation (which may be infinite) from its evaluation. In the same way, continuous time lets us decouple the description of a time-varying program from any sampling or discretization that occurs at the end of the day when running our program.

In some ways, the denotations for Event and Behavior are a little too flexible--there is no enforcement of causality, for instance--a Behavior[A] => Behavior[B] could in priciple let the value of the output behavior at time t depend on the value of the input behavior at time t + k, essentially "looking into the future" to determine the value at the present. Elliott discusses some of these problems in this post. The flexibility of the semantic model also means it is not immediately clear what actual algebra should be exposed to the programmer--we clearly want something expressive, but also limited "in the right ways" such that it is possible to implement efficiently. This is a challenging design problem in and of itself.

FRP is a deep area within functional programming. Here are just a few links to learn more:

  • Push-pull FRP by Elliott provides a nice, modern introduction to the ideas of FRP and discusses issues in crafting an efficient implementation. We also like Heinrich Apfelmus' blog, which has a number of posts talking about FRP and various implementation issues.
  • Arrowized FRP (AFRP) arose in part because of the difficulties in formulating efficient implementations of the FRP model. In AFRP, behaviors and events are not first class, instead we have first-class signal transformers, generally based on some variation of the Arrow algebra. See The Reactive Arcade, which has an introduction to Yampa, an AFRP system, and also Causal Commutative Arrows and their Optimization, which discusses ways of optimizing AFRP for efficient implementation.
  • With both arrowized and traditional FRP, there are questions about the best way to allow for various forms of context-sensitivity and dynamic switching between signals (sometimes called "dynamic event switching"). The need for this can arise in many situations, but, for instance, it can be used for modeling UIs in which the user can add and remove new UI elements by interacting with existing UI elements on the page. In traditional FRP, use of dynamic event switching can make it easy to introduce time leaks, due to accidentally retaining the full history of an Event--Heinrich Apfelmus discusses this issue here. And in AFRP, since the signals themselves are not first class, it isn't immediately obvious what the best way is to represent and interact with a dynamic, changing set of signals (though this issue has been also addressed in the AFRP line of research, see Antony Courtney's thesis pg 123, also the Reactive Arcade and FRP, continued papers).

Over time, the term "FRP" has been somewhat diluted, and the term is sometimes incorrectly used to refer to systems with discrete time, and even decidedly non-functional libraries making use of explicit callbacks and side effects!

FAQ

Feel free to add questions (and answers) here relevant to the chapter.

Considering that the book is now in final print, i'm not sure where the errata should be held but this page seems like a good place to start so, here goes ...

Q1) At Chapter 15, Section 15.3.3 which is page 285 of the final print of the ebook, i am following the book diligently and noticed a typographical error and the code reads:

case class Await[A,O](
req: Is[I]#f[A], recv: Either[Throwable,A] => Process[Is[I]#f,O]
) extends Process[Is[I]#f,R]

which should read

case class Await[A,O](
req: Is[I]#f[A], recv: Either[Throwable,A] => Process[Is[I]#f,O]
) extends Process[Is[I]#f,O] <--- the type parameter should be O instead of R