Skip to content

Latest commit

 

History

History
567 lines (444 loc) · 21.3 KB

04_dialyzer.asciidoc

File metadata and controls

567 lines (444 loc) · 21.3 KB

Dialyzer and Static Types

Unlike Haskell and other members of the ML family, Erlang is not a statically typed language. There are a number of historical reasons for this, mostly that when Erlang was being created the people who created it did not really know how to build a type system. While Erlang is not a statically typed language it is however a strongly typed language, in that it will mostly not automatically convert types for you and if you pass the wrong type it will not silently handle it like some languages would.

That being said, over the past few years an optional system for static types has been added to the Erlang language in the form of type annotations. These are strictly optional in that if you don’t specify them the code will still compile and they do not affect the way that the code runs. What you can do with them is use tools like dialyzer to check that the program is consistent. If you do not include types dialyzer will dynamicly derive the types from the code. But it will often assign a more broad type than might be called for.

Using type annotations and dialyzer it is possible to express many kinds of invariants as types which can then be tested, as part of a compile cycle.

However there are some very nice tools for providing type annotations and doing static type testing. These give some of the benefit that a type system like Haskell’s, without having to invest in the full mechanism of Haskell’s types.

Note
Thanks Kostis for all of this
Almost all of the tools relating to types are the product of the work of Kostis Sagonas and his teams at Uppsala University (Sweden) and the National Technical University of Athens (Greece).

The main tool for testing types is Dialyzer, which is a utility that can check compiled code for type inconsistencies. This can be a powerful way to check your code for all sorts of problems, in terms of consistency of how functions are called. For example, if you have some parts of your application that treat string data as lists and other parts that use binaries, dialyzer will find that for you.

Or to show an example from an application that I have worked on, we had a function that was changed from having four arguments to five. When we put the code in production there was one place where we had not converted it to the new signature. Had we run dialyzer on this we would have caught it before deploying our code to production. When we finally did run Dialyzer on it we found that the same error was lurking somewhere else in a part of the code base that was not currently in use, but would have appeared when that was moved to production.

One of the goals of Dialyzer was that it should not require any changes to your source code to be useful. You can specify types in your function and record definitions, but it is not required. If you do nothing but take your existing code and run Dialyzer on it, then Dialyzer should be able to show many possible patterns in your code base.

Dialyzer Internals

TODO: Expand

The way dialyzer works is by treating the parameters and return values of your project as a large constraint problem. It tries to figure out what types each function accepts and returns, then sees if everything matches up. Once it has done that it tries to tighten up the constraints and repeats over several passes.

How Erlang views types

Before we dive into how to use types to test our code, we have to understand how Erlang types work. When learning Erlang a programmer will be exposed to pattern matching and the various primitive types that Erlang uses, including atoms, integers, binaries, pids and so on, as well as the compound types which generally are tuples, arrays and records.

The Erlang type system can be very expressive, but it does not always work the same way type systems in Object Oriented languages such as Java work. There are some ideas that can be easy to express in an Erlang type but would be very hard to express in Java and vice versa.

The major difference is that the Erlang type system creates new types by algebraic composition and not by inheratence. This enables you to have useful optional types such as {ok, something()}|nothing which can represent an optional value, and therefore avoid the Null pointer type errors that can often result in many object oriented languages.

When combined with Erlang’s pattern matching this can result in very clean code. TODO Expand

The basic types that are seen in Erlang will be familiar to anyone who has used the language for a while, they they include such things as pid(), integer(), atom(), float(), binary(), reference(),
and so on.

Types can be constructed by unions of basic types, So the number() type is equivalent to integer()|float().

An atom can be its own type, so the boolean() type is the same as 'true'|'false', and you could define other unions of atoms, so HTTP verbs would be 'get'|'post'|'put'... (the quotes are optional but are good style). Integers can also be a good type, so you could have a type http_success_codes() can be defined as 200|201|202|203|204 or just 200..204.

An array can e indicated as one would expect with square brackets, the type [] indicates an empty array, while an array of integers would be shown as [integer()], and a string is defined as [char()] a non empty string is defined as [char(),...].

A tuple can be defined with braces as one would expect, so {ok, string()} would be a common return value of a function.

A record is its own type and is represented as +#record{}. Each field in the record of course can also have type information attached to it, so

TODO: Types of funs

One of the major streignths of the Erlang type system is that types can be joined together. For example the function lookup_item_by_id/1 might have the return value {ok, #item{}}|nothing which will express all possible successful results of this call, it will either find the item (presumably in a database) or return the atom nothing to indicate that there is no such item. The truly exceptional case of unrecoverable failure (for example if someone dropped the stored procedure from the database) is not mentioned and would have to be handled by the process supervisor.

If you want to have a data type that should be only used in one module you can create it as an opaque type. To do this create the type with -opaque([typename/0]). Instead of the normal export mechanism. An opaque type can be passed around, but any attempt to access the internal structure or pattern match on it will result in an error from Dialyzer.

Setting up Dialyzer to run the first time

In order to run dialyzer you need to create a Persistent Lookup Table (PLT). This is a file that normally lives in your home directory (but you can put it anywhere). This file contains basic information about the type signatures of system applications. This file should contain all of the basic erlang applications that are in use. This can include basic system things like mnesia and inets as well as application specific dependencies such as cowboy or webmachine. This way dialyzer can check that all of your program’s calls to the interfaces of those modules are consistent, without having to rescan them every time.

To create a basic PLT file, use a command line like this one. It will take a few minutes to run and then create a PLT file in the current directory. If you omit the --output_plt command then it will create a PLT file under a default name in your home directory. It is probably a good idea to have a separate PLT file for each project, as different projects may use different versions of dependencies.

Basic PLT File
dialyzer  --build_plt --apps kernel stdlib mnesia inets ssl crypto \
	--output_plt project.plt

If you find that you have forgotten to add some applications to your PLT file, then you can always add more applications to a PLT file with a command like this. You can also add more than one application at a time.

Add to PLT file
dialyzer  --add_to_plt httpc  --plt project.plt

If your project, like mine, has many dependences that you have pulled of off github with rebar you can also add those to your PLT file. I have a custom script that I use to run dialyzer that will check if the PLT file exists, and if it does not, will create it for me. This script will pull all of the rebar dependencies, compile them and build the PLT file.

Note
On my macbook air this takes 5 minutes or so to run. Its a good time to get a cup of tea. But you only have to build the PLT file once. You may have to redo it after massive changes to the dependencies of a code base.
build Project PLT
PLT=api.plt
if [ ! -f $PLT ]; then
   rm -rf deps
   rebar clean
   rebar get-deps
   rebar compile

   dialyzer  --build_plt --apps kernel stdlib mnesia inets ssl crypto \
       --output_plt $PLT
   apps=(deps/*/ebin)
   for app in "${apps[@]}"; do
       echo "Adding $app to PLT\n"
       dialyzer  --add_to_plt $app  --output_plt $PLT
   done
   echo ""
fi

Dialyzer Options

Running Dialyzer

Warning
Dialyzer will hurt your feelings

Once the PLT file has been created then we can run dialyzer on the code base. Normally dialyzer is run on the compiled beam file but it is also possible to run it on Erlang source files.

To run the basic dialyzer on a source projects from the Unix command line use a command like this. This requires that ebin contain the beam files to be examined that have been compiled with the debug_info flag. This can be done with rebar by specifying {erl_opts, [debug_info]}. in the rebar.config file.

dialyzer
dialyzer ebin --plt $PLT

If you have more than one subdirectory of Erlang applications then you can run dialyzer on all of them at once, by giving dialyzer all of the ebin directories.

The first time you run dialyzer it will find a lot of issues in your code base. The creators of dialyzer have worked very hard to make sure that there are no false positives, so if dialyzer says there is an error you can be confident that it is an actual code problem.

Some errors represent a major problem, such as a call to a function that does not exist or a mismatched return value, which in theory should be found via other forms of testing eventually. However dialyzer will also find errors like guard clauses that can never match.

Understanding Errors

TODO: Expand this a lot

Dialyzer, when run, can emit several kinds of errors. The most simple kind of error is that of an unknown function. In this case what it really meant is a function in an unknown module (if there is a known module but no function of a given name or arity that will be reported elsewhere). In this example, the system is showing three missing functions. The first one is an actual problem as it represents a real bug. In this case I had meant to call the function emysql:execute/3 but had left out the "y" in emysql so of course this function does not exist.

Unknown Functions
Unknown functions:
  emsql:execute/3
  s_template:render/1
  mochiweb_cookies:cookie/3

The second function is a render function from a erlydtl (django) template that is created at run time, as such it is not included in the set of beam files that are being checked by dialyzer. (TODO How to fix this)

Using Dialyzer with Erlydtl

if your project contains some ErlyDTL templates that are used to render HTML (or other text, you will want to compile them with debug info so that Dialyzer can work with them. To do this with rebar, place the templates in the "template" directory, and add this to your rebar.config file.

rebar.config
{erlydtl_opts, [
{compiler_options, [report, return, debug_info]}
  ]}.

The final error, mochiweb_cookies:cookie/3 is a function in a module that was not included in the plt file, so it should just be added to the PLT file.

A more subtle case that Dialyzer can catch is when patterns do not match. Which is to say that a function can return several things and your code does not match all of them.

For example if a function could return {ok,Value} or {error, REASON} and your code only matches the {ok, Value} variation then Dialyzer will point it out to you by giving you a set of errors like in this example.

A more subtle bug is that in this case the function get_component:get_component/1 will never return {error, not_found} so there is a whole chain of error checks that are catching errors that may never show up. .never_match.erl

link:examples/src/never_match.erl[role=include]

From this code dialyzer will produce an error like this one

src/never_match.erl:15: The pattern {'error', 'not_found'} can never match the type {'ok',[{<<_:32>>,<<_:32>>},...]}

This one takes some understanding to figure out what you are supposed to do with it. What it is saying is that in never_match.erl:15 the pattern {'error', 'not_found'} will never match and that this block is in effect dead code. This is because the get_component/1 function will always return {'ok', jsx:json_term()} and can never return {'error', 'not_found'}. Note in this case the -spec for that function says that it can return {'error', 'not_found'}, but the code says it can not, so Dialyzer will figure out the correct success typing.

Sometimes the error message is a bit complex, and takes some parsing to understand. This error comes from a call to upload an object to S3.

Reformatted for readability
src/cmp_s_actions.erl:53: The call
erlcloud_s3:put_object(
	Bucket::any(),
	Key::nonempty_string(),
	Value::binary(),
	[{'acl', 'public_read'}],
	[{[45 | 99 | 101 | 110 | 111 | 112 | 116 | 121,...],[47 | 97 |
	101 | 103 | 105 | 109 | 110 | 112,...]},...])
will never return since the success typing is
     (string(),string(),maybe_improper_list(binary() |
     maybe_improper_list(any(),binary() | []) | byte(),binary() | []),
     [atom() | tuple()],[{string(),string()}] |
     {'aws_config',string(),string(),string(),non_neg_integer(),string(),string(),string(),string(),string(),'undefined'
     | non_neg_integer(),'undefined' |
     string(),string(),string(),non_neg_integer(),fun((pos_integer(),_)
     ->
      {'attempt',pos_integer()} | {'error',_}),'false' | 'undefined' |
      string(),'false' | 'undefined' | string(),'undefined' |
      string()}) -> [{'version_id',_},...]
 and the contract is (string(),string(),iolist(),proplist(),[{string(),string()}] | aws_config()) -> proplist()

Adding type annotations to functions

By default dialyzer will try to infer what types each function takes as arguments and returns, however this often results in an overly broad type signature that will accept things that it should not.

As a developer Erlang gives us a way to fix this problem; we can tell the compiler what types we think a function should be working with. This also provides someone reading the code with some extra information about the intent of the developer, which will be checked by dialyzer.

To add a type specification to a function use the -spec() declaration as in this example. This tells dialyzer that the function known_content_type/2 takes 2 arguments, a wm_reqdata record and a payload record, and it returns a tuple consisting of a boolean, and the same two record types. However unlike a comment, this declaration can be checked with dialyzer, so if the function changes, or a record definition changes, it will be caught.

Type Annotation
-spec(known_content_type(#wm_reqdata{}, #payload{}) -> {boolean(), #wm_reqdata{}, #payload{}}).
known_content_type(ReqData, Context) ->
    ContentType = wrq:get_req_header("content-type", ReqData),
    Values = [{undefined,				true},
	      {"application/json",			true},
	      {"application/json; charset=UTF-8",	true}],
    Allowed = proplists:get_value(ContentType, Values, false),
    {Allowed, ReqData, Context}.

Most erlang abstract types are written in a way that they look like functions. So, for example, the type integer is integer() and a boolean is boolean().

In addition to abstract types, there are also literal types; these are atoms or integers that represent themselves. So, for example, the atom 'ok' is a literal type that will match itself, or the number 42 is also a literal type.

Types can also be combined with a vertical bar to make more complex type. So the type number() is defined as integer() | float(). You could also define a new type as 'ok'|'error' which might be the value returned by a function.

Erlang has a number of predefined types. They can be found in the Erlang reference manual (http://www.erlang.org/doc/reference_manual/typespec.html). These cover all of the basic types you would expect, such as integer(), binary(), iolist(), atom() and so on.

A record can also be used as its own type with the syntax #RecordName{}. When defining a record it is often a good idea to define types for each field in the record as shown in this example. Here we have a user record with three fields, all of which have a type of binary() as they are string fields, however the avatar field could also be blank.

-record(user, {
	  user_id			:: binary(),
	  email                         :: binary(),
	  avatar = none                 :: binary() | none
}).

We could make that declaration more expressive if we replace the binary() type with specific types. In this example we replace the first two fields generic binary() with a more specific type of user_id() and email() that are, for now, just aliases of binary but we could also make them more specific if we choose.

-type user_id() ::binary().
-type email()   ::binary().

-record(user, {
	  user_id                       :: user_id(),
	  email                         :: email(),
	  avatar = none                 :: binary() | none
}).

Adding Type annotations to records

Adding type annotations to behaviors

Creating your own types

TODO: Some examples here

To create a new type in a source file .erl or .hrl you can use the -type directive or the -opaque directive. A type can be exported from a module with the -export_type directive. When you export a type from a module it can be referred to in other modules by the syntax MODULE:TYPE() just like a function could be.

If you have a type that should be treated as opaque outside the module where it is defined, then instead of defining it with -type use the -opaque directive. The syntax is the same, but the difference is that dialyzer will issue a warning if any of the internal structure of that type is used anywhere else. This includes pattern matching on the type or any effort to reference the type inside of the object.

Warning
The Erlang compiler and runtime will still allow you to reference the internal structure of an opaque type, Only Dialyzer will warn you about doing this.

Parameterized types

You may have noticed that Erlang type declarations look a lot like functions with an arity of 0. Which of course will lead the observant reader to ask if it is possible to have a type with an arity of more than 0, of course the answer to that is yes.

types
-spec(find_user(_,_)     -> {ok, User}     | not_found).
-spec(find_customer(_,_) -> {ok, Customer} | not_found).
-spec(find_item(_,_)     -> {ok, Item}     | not_found).

It is possible that you might have some type specifications like those above in your code. This violates the basic principle of Don’t repeat yourself, and is more wordy than might be needed.

What would be nice is if we could encapsulate the whole envelope structure into a type that could take the return type as a parameter.

That would look like this: .Maybe type spec

-type maybe(A)           -> {ok, A} | not_found.
-spec(find_user(_,_)     -> maybe(User).
-spec(find_customer(_,_) -> maybe(Customer).
-spec(find_item(_,_)     -> maybe(Item).

Now the structure is much cleaner as the declaration moves the ok and the | not_found into the type itself and we just have to focus on the actual desired return values.

Note
if this looks familiar it is based on the Maybe Monad in Haskell.

Make Dialyzer go faster

You may have noticed that dialyzer can take a while to run and wished it would run faster. (Or at least I have). In this case there are a few options you can take to make it run faster.

The first and most obvious option is to run it on a faster computer. Dialyzer will take advantage of multiple cores, so putting more cores on the project will help if you have them.

Of course your spouse or boss may not let you go buy a top of the line computer, and even if you do it may still be worth it to optimize your build.

You can dialyzer a part of your project instead of the whole thing. So make a custom PLT that includes everything else and only run dialyzer on the parts you are working on, just make sure to run it on everything at some point.

Using Dialyzer with a CI System

Note
We may wish to move this to the CI Chapter

Of course a tool like Dialyzer is only useful if you run it on a regular basis. Ideally it should be run over every commit that is pushed into the repository. If your project is already using a CI system like Jenkins or Travis then it makes sense to have that system also run dialyzer over the code base after running the unit tests. (Or it could be done in parallel, it doesn’t really matter).