Skip to content

Universal Functions

rrjohnson edited this page Aug 1, 2016 · 20 revisions

An UFunc (Universal Function) is conceptually a function that can operate on scalars, vectors, matrices, counters and, with a little effort, your own collection types. It's much like Numpy's Universal Functions.

UFuncs makes Breeze very extensible. With little effort, you can add a new collection type (e.g. Vector) and reuse all builtin UFuncs. Or if you want to add a new numeric type that supports operations like log, you can define UFunc implementations for those operations and then use UFuncs on collections of your data type.

All code snippets at this page assume that you have imported necessary packages:

import breeze.linalg._
import breeze.math._
import breeze.numerics._

There are three UFunc types: element-wise UFuncs (like log), operators UFuncs (e.g. a+b) and reduction UFuncs (like sum).

Elementwise UFuncs

The most common UFuncs, usually inheriting from MappingUFunc. They usually take one scalar or collection and apply the underlying function to the scalar or elementwise to all elements in the collection:

scala> log(1.0)
Double = 0.0

scala> log(Complex(1.0, 1.0))
breeze.math.Complex = 0.3465735902799727 + 0.7853981633974483i

scala> log(Array(1.0, 2.0))
Array[Double] = Array(0.0, 0.6931471805599453)

scala> log(DenseMatrix((1.0, 2.0), (3.0, 4.0)))
breeze.linalg.DenseMatrix[Double] =
0.0                 0.6931471805599453
1.0986122886681098  1.3862943611198906

By default, these UFunc return result as a new object. If you want the UFunc to alter the argument, you can do it:

scala> log.inPlace(someMatrix)

Operator UFuncs

These are UFuncs that are usually not used directly. For example, OpAdd is an operator UFunc, but it's usually used as a + b, rather than OpAdd(a, b).

scala> val a = DenseVector(1.0, 2.0)

scala> val b = DenseVector(3.0, 4.0)

scala> a + b // binary operator
breeze.linalg.DenseVector[Double] = DenseVector(4.0, 6.0)

scala> operators.OpAdd(a, b)
breeze.linalg.DenseVector[Double] = DenseVector(4.0, 6.0)

scala> -a // unary operator
breeze.linalg.DenseVector[Double] = DenseVector(-1.0, -2.0)

scala> operators.OpNeg(a)
breeze.linalg.DenseVector[Double] = DenseVector(-1.0, -2.0)

Reduction UFuncs

These are UFuncs that conceptually "accumulate" collections into a smaller result. For instance, the sum UFunc predictably sums all elements together. Reduction UFuncs also work with broadcasting.

scala> val mat = DenseMatrix((1.0, 2.0, 3.0),
                             (4.0, 5.0, 6.0))

scala> sum(mat)
Double = 21.0

scala> sum(mat(::, *)) // sum of each column
breeze.linalg.DenseMatrix[Double] = 5.0 7.0 9.0

scala> mat(*, ::) + DenseVector(10.0, 20.0, 30.0)
breeze.linalg.DenseMatrix[Double] =
11.0  22.0  33.0
14.0  25.0  36.0

scala> val onesMat = DenseMatrix.ones[Double](3, 3)
breeze.linalg.DenseMatrix[Double] =
1.0  1.0  1.0
1.0  1.0  1.0
1.0  1.0  1.0

scala> mat(::, *) := DenseVector(10.0, 20.0)

scala> mat
breeze.linalg.DenseMatrix[Double] =
10.0  10.0  10.0
20.0  20.0  20.0

Provided UFuncs

Elementwise UFuncs

All of these live in the breeze.numerics package object.

  • Functions from scala.math:

    • exp: Returns e^x.
    • log: Returns the natural logarithm of x. log(b, x) is the log of x in base b!
    • log1p: Returns log(1 + x). This is more accurate than explicit log(1 + x) for x < 1.
    • sqrt: Square root.
    • sin, cos, tan, asin, acos, atan: Trigonometric functions using radians.
    • sinh, cosh, tanh, asinh, acosh, atanh: Hyperbolic functions using radians.
    • toDegrees: Converts radians to degrees.
    • toRadians: Converts degrees to radians.
    • floor: Rounds numbers to the next lowest integer.
    • ceil: Rounds numbers to the next highest integer.
    • round: Rounds numbers.
    • rint: Rounds numbers and returns a Double.
    • signum: Returns the sign of the value(s) as ±1, ±0 or NaN.
    • abs: Returns the absolute value of the argument.
    • isOdd: Returns true if the argument is equal to 1 modulo 2.
    • isEven: Returns true if the number is divisible by two.
  • Gamma function:

  • Error function and variants, commonly used as the CDF of the Gaussian distribution.

    • erf: the error function.
    • erfi: The imaginary error function with real arguments. erfi(x) = -i erf(i * x).
    • erfc: Equals 1 - erf.
    • erfinv: Returns the inverse erf.
    • erfcinv: Returns the inverse erfc.
  • Bessel function:

  • Special and utility functions:

    • sigmoid: the sigmoid function from logistic regression/neural nets. 1/(1+exp(-x))
    • I: the indicator function (1.0 if true, 0.0 if false)
    • logI: the log of the indicator function logI(true) == 0.0, logI(false) == -inf
    • lbeta: The log of the Beta function
    • clip(a, lower, upper): returns an array such that all elements are "clipped" at the range (lower, upper)

Operator UFuncs

These are UFuncs that are usually called using infix syntax (in the case of a binary operators) or prefix syntax if the case is an unary operator. Some of these functions support in-place version, i.e. you can write a :+= b instead of a := a + b.

A := B is a special operator named OpSet. It alters A in such way that it's the same as B. Note, that this is not an usual assignment (=) so A still points to the same object. Only the content of A is mutually altered.

: at the beginning of an operator means that the operation is elementwise. For example, :* is elementwise multiplication, like Matlab's .*.

Not all operators are defined for all Tensors, or for all value types. (Some reasonable things are missing. We are adding them as necessary.) Undefined operators is a compile time, not runtime, error.

All operators are defined in breeze.linalg.NumericOps trait and in breeze.linalg.operators package.

Basic Arithmetic

Name Example In-place usage
OpAdd a + b, a :+ b a += b, a :+= b
OpSub a - b, a :- b a -= b, a :-= b
OpMulMatrix a * b a *= b
OpMulScalar a :* b a :*= b
OpDiv a / b, a :/ b a /= b, a :/= b
OpSolveMatrixBy a \ b -
OpMod a % b, a :% b a %= b, a :%= b
OpPow a :^ b a :^= b
OpMulInner a dot b -
OpNeg -a -

Element-wise Equality and Comparison

OpEq OpNe OpLT OpLTE OpGT OpGTE isClose
a :== b a :!= b a :< b a :<= b a :> b a :>= b isClose(a, b)

Boolean Operations

name example in-place usage
OpAnd a & b, a :& b a &= b, a :&= b
OpOr a | b, a :| b a |= b, a :|= b
OpXor a ^^ b, a :^^ b a ^^= b, a :^^= b
OpNot !a

Reduction UFuncs

  • Basic UFuncs:

    • sum: Returns a(1) + ... + a(n).
    • product: Returns a(1) * ... * a(n).
    • softmax: Return the softmax of the collection. log(exp(a(1)) + ... exp(a(n))). Also known as logsumexp when used to sum up small probabilities in the log domain
    • any: Returns true if any element is non-zero.
    • all: Returns true if all elements are non-zero.
    • min, max: Fast implementation for performance critical loops where arr.max/arr.min is too slow.
    • argmin, argmax: Returns the index (key) of the minimum/maximum element.
    • norm: Computes the norm of an object.
    • normalize: Normalizes the argument such that its norm is 1.0. Returns value if value's norm is 0.
    • argsort: Returns a sequence of indexes (keys) sorted by value.
    • argtopk: Returns the top k indices with maximum value.
  • Linear algebra UFuncs:

    • det: Returns the determinant of the given real matrix.
    • eig: Returns triple: vector of real and imaginary parts of the eigenvalues as well as matrix of the corresponding eigenvalues.
    • inv: Returns the inverse of a given real matrix. Raises MatrixSingularException when the matrix is not invertible.
    • logdet: Computes the log of the determinant of the given real matrix and returns a pair consisting of determinant sign (1.0 or -1.0) and the computed logarithm.
    • LU: Computes the LU factorization of the given real M-by-N matrix X such that X = P * L * U where P is a permutation matrix (row exchanges). Returns a tuple consisting of a matrix A and an integer array P. The upper triangular portion of A resembles U whereas the lower one resembles L. The diagonal elements are all 1. For 0 <= i < M, each element P(i) denotes whether row i of the matrix X was exchanged with row P(i-1) during computation
    • svd: Computes singular value decomposition of given M-by-N matrix and returns an M-by-M matrix U, a vector of singular values, and a N-by-N matrix V'.
    • trace: Returns sum of diagonal elements.
  • Statistical UFuncs (you have to import breeze.stats._!):

    • mean: Returns arithmetic mean (sum divided by number of elements)
    • variance: Returns variance
    • stddev: Returns standard deviation, i.e. square root of variance
    • meanAndVariance: Returns an object with mean, variance and count fields. The last field is the number of elements.

Implementing UFuncs

Elementwise UFuncs

Let's start with the canonical way to make a UFunc. I'm going to use log as an example, for now (In Breeze, log is actually implemented as a MappingUFunc - see below).

import breeze.generic.UFunc
object log extends UFunc

The basic idea behind a UFunc is that it's a marker for others to provide implementation via implicit parameters. Almost no functionality needs to be defined inside the object itself. UFuncs define several methods, most of which are named apply and those differ just in how many arguments they take. There's also inPlace, which I'll describe later. The simplest apply method takes 1 argument and an implicit implementation argument, and forwards the argument to the implicit.

trait UFunc {
  // ...
  def apply[Arg1, Result](arg1: Arg1)(implicit impl: Impl[Arg1, Result]):Result = impl(arg1)
  def apply[Arg1, Arg2, Result](arg1: Arg1, arg2: Arg2)(implicit impl: Impl2[Arg1, Arg2, Result]):Result = {
    impl(arg1, arg2)
  }
}

The Impl type looks a little scary, but it's actually not too bad. It's an alias for another trait, named UFunc.UImpl, defined inside the companion object to UFunc.

trait UFunc {
  type Impl[Arg1, Result] = UFunc.UImpl[this.type, Arg1, Result]
  // ...
}

object UFunc {
  trait UImpl[Tag, Arg1, Result] {
    def apply(arg1: Arg1):Result
  }
}

There are several different implementation traits, one for each arity (currently up to 3). To define an implementation, you just need to put an implicit in scope. Anywhere that Scala will look for an implicit is fine (companion objects, etc.). One particularly convenient place to define it is inside the UFunc itself:

object log extends UFunc {
   implicit object implDouble extends Impl[Double, Double] {
     def apply(a: Double) = scala.math.log(a)
   }
}

Now, we can use it:

log(1.0) // == 0.0

One thing that's nice about this approach is that if we get a new type, we can just define a new Impl for it. For instance, in Complex's companion object, we define an implementation for the log of a Complex number.

object Complex {

    // ...

    implicit object implComplex extends log.Impl[Complex, Complex] {
     def apply(a: Complex) = // ...
   }
}

Now, log(Complex(1.0, 1.0)) works as well.

Making Element Wise Functions Mappable Over Collections including Vectors

We can use implicit functions to automatically add operations for collections of things. There is a special class of UFuncs called "MappingUFuncs" that work elementwise over their input (e.g. sin, cos), rather than doing some kind of reduction or aggregation (e.g. sum, mean).

The actual definition of log is a member of this class so that it can also be applied to vectors. MappingUFuncs automatically allow implementations for all Vector types, Arrays, Matrices, and Counters if the underlying value type has an implementation.

object log extends UFunc with MappingUFunc    {
   implicit object implDouble extends Impl[Double, Double] {
     def apply(a: Double) = scala.math.log(a)
}

If you would like to add a new collection type, you will need to add a CanMapValues implicit to extend the UFunc to this collection.

There's also InPlaceImpl[Arg] and InPlaceImpl2[Arg1AndSink, Arg2] for mutating operations and they work analogously.

Operators

The next step is thinking more about operators. For instance, to enable addition between two types A and B, you define an implicit of type OpAdd.Impl2:

implicit object addAandB extends OpAdd.Impl2[A, B, Result] {
   def apply(a: A, b: B) = ...
}

Now, if A extends NumericOps[A]---or if A has a conversion to NumericOps[A]---then if we say A + B, this implicit will be used to do the addition. To do mutating operators, you create a OpAdd.InPlaceImpl2[A, B] implicit instead:

implicit object addAandB extends OpAdd.InPlaceImpl2[A, B] {
   def apply(a: A, b: B) { ... /* alter a */ }
}

Note that if you want to support unary operators, you need to implement Impl1 instead of Impl2.

Reduction UFuncs

Creating reducing UFuncs like sum or mean is a little trickier than creating MappingUFuncs. In particular, you should write an implementation that uses CanTraverseValues, which is a trait for collections that can be traversed or iterated over. It's basically like a marker that supports foreach, but it has extra methods that allow some implementations to be faster than they would be otherwise.

trait CanTraverseValues[From, A] {
  /**Traverses all values from the given collection. */
  def traverse(from: From, fn: ValuesVisitor[A]): Unit
  def isTraversableAgain(from: From):Boolean
}

The traverse method takes a CanTraverseValues.ValuesVisitor[A], which is like a Function1[A, Unit], but has a few extra (optional) methods:

  trait ValuesVisitor[@specialized A] {
    def visit(a: A)
    def visitArray(arr: Array[A]):Unit = visitArray(arr, 0, arr.length, 1)

    def visitArray(arr: Array[A], offset: Int, length: Int, stride: Int):Unit = {
      var i = 0
      while(i < length) {
        visit(arr(i * stride + offset))
        i += 1
      }
    }

    // visit numZero values that all have value zeroValue
    def zeros(numZero: Int, zeroValue: A)
  }

All you have to define is visit and zero. However, if you want your UFunc to be as fast as possible, you can override visitArray.

OK, let's say we're implementing sum. That might look like this:

object sum extends UFunc {
  implicit def sumFromTraverseDoubles[T](implicit traverse: CanTraverseValues[T, Double]): Impl[T, Double] = {
    new Impl[T, Double] {
      def apply(t: T): Double = {
        var sum = 0.0
        traverse.traverse(t, new ValuesVisitor[Double] {
          def visit(a: Double) = {sum += a}
          def zeros(count: Int, zeroValue: A) {}
        })

        sum
      }
    }
  }
}

The actual sum uses the Expand Macro to define sum for all primitives, and it overrides visitArray for efficiency.

Using UFuncs in generic functions

Finally, there's how to use UFunc Impls to create generic operations. This is the whole reason for why UFuncs work the way they do. Basically, we want to be able to generically define methods in terms of their pieces. So, for instance, it would be nice to be able to be able to define a method like normalize in terms of norm (and also division). UFuncs let us do this:

object normalize extends UFunc {
  implicit def fromNormAndDivide[Arg](implicit normImpl: norm.Impl[Arg, Double], divImpl: OpDiv.Impl[Arg, Double, Arg]):Impl[Arg, Arg] = new Impl[Arg, Arg] {
    def apply(a: Arg) = {
       val normalizer = norm(a)
       divImpl(a, normalizer) // we could instead require that Arg extend NumericOps and use /, but eh.
    }
}

With those few lines of code, we can now provide implementations of normalize for everything that can provide us with norm and division by Doubles. Pretty slick, eh?

Enabling UFuncs for your collection type

If you want a new collection type to work with UFuncs, you need to provide a few implicits:

  • CanMapValues[From, FromElement, ToElement, To] (for mapping UFuncs)
  • CanZipMapValues[From, FromType, ToType, ToType] (for supporting binary UFuncs)
  • CanTransformValues[From, FromElement, FromElement] (for inPlace, if your collection is mutable)
  • CanTraverseValues[From, FromElement] (for sum and its kin.)

Matrix-y collections will also want to implement:

  • CanCollapseAxis[From, Axis._0.type, ColumnType]
  • CanCollapseAxis[From, Axis._1.type, RowType]

These provide broadcasting and all their related goodness.

TODO Abstracting over UFuncs