Skip to content

Metabase Clojure Style Guide

Dennis Schridde edited this page Jun 26, 2023 · 73 revisions

Introduction

Metabase has very strict code quality standards for the backend codebase; open-source code has a long life so a bit of extra polish is worth it. The guidelines below are additional guidelines on top of the Clojure Style Guide. Be sure to follow all of the guidelines in that style guide as well unless otherwise noted in this document.

Table of Contents

Dependencies

  1. Dependencies in deps.edn should have a comment explaining their purpose. [link]

    Why? This makes maintaining the dependency list easy and helps us avoid introducing duplicate dependencies. It also helps prevent shady kittens from introducing potentially dangerous dependencies that could cause security vulnerabilities.

Namespace Declarations

  1. Keep namespace :requires ordered alphabetically. Don't use prefix list notation: [link]

    ; bad
    (ns metabase.example
     (:require
      (clojure.data xml json)))
    
    ; good
    (ns metabase.example
     (:require
      [clojure.data.json :as json]
      [clojure.data.xml :as xml]))

    You can M-x cljr-clean-ns to have clj-refactor put everything in the right place. Don't worry, if you get it wrong, we have a linter that will let you know about it.

    Why? This makes reading and editing the :require declaration easier and faster for everyone involved, and prevents duplicate :requires.

    See Stuart Sierra's How to ns for more about prefix lists.

  2. Follow namespace aliasing/refer patterns used elsewhere in the codebase. [link]

    For example, metabase.util should always be required like

    (:require [metabase.util :as u])

    Why? If we are consistent with our namespace aliases, people won't need to refer to the namespace declaration to determine what namespace an alias is for.

  3. Make sure to :require any namespace whose functions you refer to, e.g. don't use clojure.pprint/pprint without requiring its namespace. [link]

    Avoid using fully-qualified namespaces rather than an alias so it is clear the namespace has been required without needing to refer to the ns declaration.

    Why? Using an external function without requiring its namespace means it becomes implicitly dependent on some other namespace requiring (i.e., loading) it. If nobody requires clojure.pprint, calls like clojure.pprint/pprint will cause an Exception.

  4. Keep namespace :imports ordered alphabetically. Use prefix list notation: [link]

    ; bad
    (ns metabase.example
     (:import
      java.time.ZonedDateTime
      java.net.URI))
    
    ; good
    (ns metabase.example
     (:import
      (java.net URI)
      (java.time ZonedDateTime)))

    Actually the same advice for keeping namespace :requires applies here: let clj-refactor handle it for you.

    Why? This makes reading and editing the :import declaration easier and faster for everyone involved, and prevents duplicate imports.

    See Stuart Sierra's How to ns for more about prefix lists.

Variable Names & Metadata

  1. Prefer longer, more verbose names for functions and variables; avoid abbreviations unless they are well-known and conventional in the Clojure world. acc, i, pred, coll, n, s, k, and f are examples of well-known conventions; any Clojure developer has seen them before and can tell you what they mean. Avoid unconventional abbreviations like tbl and unclear variable names like xs'. A good function or variable name should make its purpose immediately clear. [link]

    ;; too cryptic
    (defn mayb+1 [n]
      (when n
        (inc n)))
    
    ;; too verbose
    (defn add-one-to-a-number-if-non-nil [n]
      (when n
        (inc n)))
    
    ;; just right
    (defn maybe-inc [n]
      (when n
        (inc n)))

    Why? Code is read many more times than it is written, and clearer variable names make using and tweaking your code easier for others. Apple's Objective-C style guidelines, while not directly applicable to Clojure code, are a good further discussion. There's also a good discussion around names in Clojure here.

  2. Avoid misleading variable and function names. The names of a variable or function should clearly and unambiguously describe its purpose. [link]

    ;; bad
    (defn nil-or-maplist? [v]     ; coll would be a better variable name because it's more specific
      (or (nil? v)
          (and (sequential? v)    ; v can actually be an array, vector, list, or lazy seq
               (every? map? v)))) 
    
    ;; good
    (defn nil-or-sequence-of-maps? [coll]
      (or (nil? coll)
          (and (sequential? coll)
               (every? map? coll))))

    Why? Poorly-named functions are prone to being used in cases where they're inappropriate or avoided in cases when they would be suitable.

  3. Pure function names should be nouns describing the value they return. [link]

    For example, a function to compute a user's age based on their birthdate should be called age, not calculate-age or get-age.

    Why? A pure function is one which can be replaced with its value without affecting the result, so the name should reflect that.

    (Recommended reading: https://stuartsierra.com/2016/01/09/how-to-name-clojure-functions)

  4. Don't repeat the usual alias of the namespace a function belongs to in the name of a function itself. [link]

    (ns metabase.config)
    
    ;; bad
    (defn config-is-dev? [] ...)
    
    ;; good
    (defn is-dev? [] ...)

    Why? It's obvious that is-dev? in the example above is referring to dev, because it's in the config namespace. It's also needlessly noisy when using the function in another namespace:

    ;; bad
    (when (config/config-is-dev?)
      ...)
    
    ;; good
    (when (config/is-dev?)
      ...)

    In some cases, following this rule will require you to use a (:refer-clojure :exclude [...]) form in your namespace declaration. This is acceptable, and should be taken as a sign that you're following this rule correctly.

  5. Make everything ^:private unless it is used elsewhere. [link]

    Don't make things public just for the sake of tests. Use the var form (e.g. #'redshift/execute!) instead in your tests.

    Why? It's much easier to read and refactor code when you know its scope is limited to the current namespace.

  6. Tag variables with :arglists metadata if they are functions but wouldn't otherwise have it, such as when using def to define partial functions or function compositions. [link] e.g.

    (def ^{:arglists '([n])} plus-one (partial + 1))

    Why? Good editors use this metadata show the expected arguments to a function as you're writing code.

  7. Try to organize namespaces in such a way that you don't need to use declare. This usually means putting the public portion of a namespace near the end of a file. [link]

    Why? Avoiding declare when unnecessary forces us to read and write code in a consistent manner, that is, from top to bottom. When code is written in this consistent order we can safely assume referenced functions sit somewhere above their reference in the namespace; this makes the code easier to navigate.

  8. Only mark constants ^:const if it needs to be evaluated at compile-time or in situations where performance requires it.[link]

    What ^:const does

    When something is ^:const, it will be evaluated at compile time as opposed to launch time. This is handy in some places if we want to say look at directory structure of the source code or do some sort of expensive calculation. When running locally with clojure/clj, the value will be computed on every launch; when building an uberjar, since everything is AOT-compiled, the ^:const values will be computed just once during the build process, and people running the JAR will only see the resulting value.

    Downsides of ^:const:

    • ^:const variables can't be mocked with with-redefs in tests.
    • Changes to ^:const variables won't be picked up by things referencing the variables unless their definitions are reloaded as well. This negatively affects the REPL-based development flow.
    • A one-off copy of the value will get spliced into every place referencing it. If it's being referenced in multiple places it can result in larger code.

    When to use ^:const

    Use ^:const only when it's needed for critical performance reasons (for example unboxed numbers in tight loops where var lookup and unboxing would have a significant performance impact) or because something has to be calculated during compile time. Here's an example good usage from metabase.util/metabase-namespace-symbols:

    ;; This is made `^:const` so it will get calculated when the uberjar is compiled. `find-namespaces` won't work if
    ;; source is excluded; either way this takes a few seconds, so doing it at compile time speeds up launch as well.
    (defonce ^:const ^{:doc "Vector of symbols of all Metabase namespaces, excluding test namespaces. This is intended for
      use by various routines that load related namespaces, such as task and events initialization."}
      metabase-namespace-symbols
      (vec (sort (for [ns-symb (ns-find/find-namespaces (classpath/system-classpath))
                       :when   (and (.startsWith (name ns-symb) "metabase.")
                                    (not (.contains (name ns-symb) "test")))]
                   ns-symb))))

    What about existing ^:const stuff?

    We have a lot of stuff incorrectly marked ^:const in our codebase. Please do not hesitate to fix them if you come across them.

Documentation & Comments

  1. Every public var in Metabase must have a useful docstring. A useful docstring should clearly explain the purpose of the function, its inputs and outputs, and anything else that is otherwise not immediately clear. If there are other functions that have similar purposes, explain how the use-cases for this function differ. [link]

    Why? Code is read many more times than it is written, and a potential contributor (or even yourself in the future) won't necessarily understand the purpose of it, even if it seems clear to you when you wrote it. Aim to keep the barrier to entry for a potential contributor as low as possible. It saves time to have a good docstring describing the behavior of a function so someone doesn't need to jump to its implementation to work out how it behaves; it will make debugging or tweaking it easier in the future.

  2. Format docstrings according to Markdown conventions. [link]

    Why? Our documentation is meant to be used with marginalia or the Instant Clojure Cheatsheet.

  3. Judiciously use comments to explain sections of code that would not immediately be clear to someone else. Avoid comments that do little more than repeat what the code already says. [link]

    Why? It's easier to tweak and understand code that has clear explanations about its purpose.

  4. Make sure to update comments and docstrings when you change the code they describe. [link]

    Why? Comments and docstrings are only useful if they're up-to-date.

  5. Comments that are on a line by themselves should start with two semicolons, unless they denote a new section of code, in which case they should start with three; comments on the same line as code should start with a single semicolon. [link]

    ;;; UTIL FUNCTIONS
    
    ;; TODO - this is a preposterous function
    (defn call-twice [f x]
      (f (f x))) ; should we make this configurable somehow?

    Why? It's Lisp convention.

    EDIT: It appears this is now covered in the Clojure Style Guide.

    For more discussion, see:

Functions

  1. Break up larger functions (> 10 lines) whenever possible. [link]

    Why? Small functions are much easier to test, understand, and tweak. Extra credit: there's a good discussion about keeping Clojure functions simple and small here.

  2. Prefer higher-level looping constructs like for over lower-level functions like map and filter. [link]

    ;; acceptable
    (->> [:a :b nil :c]
         (filter #(or (keyword? x) (string? x))
         (map name)
         (map s/upper-case))
    
    ;; better
    (for [k     [:a :b nil :c]
          :when (or (keyword? k)
                    (string? k))]
      (s/upper-case (name k)))

    Why? for is usually clearer than a long chain of lower-level functions grouped together with the ->> macro; this is especially true when filter would require a one-off fn (with for, you can name variables, making things clearer).

  3. Functions in metabase.util should be written to a higher standard. They should be as general as possible and be well-tested, and be especially well-documented. In general, they should be ones that wouldn't be out-of-place in a library like medley. [link]

    Why? metabase.util is used almost everywhere so it's prudent to avoid cluttering it with functions that aren't generally useful. Since so much other code depends on it, it's also prudent to make sure the functions are well-documented and work exactly as described.

Indentation

  1. Align the values in consecutive single-line var declarations when they're of ther same type or for the same purpose. [link]

    ;; good
    (def ^:private outer-clause            #"\[\[.*?\]\]")
    (def ^:private outer-clause-prefix     #"^\[\[(.*?)\s.*\]\]$")
    (def ^:private incomplete-outer-clause #"\[\[.*?\{\{.*?\}\}.*?\]\]")
    
    ;; bad
    (def ^:private outer-clause #"\[\[.*?\]\]")
    (def ^:private outer-clause-prefix #"^\[\[(.*?)\s.*\]\]$")
    (def ^:private incomplete-outer-clause #"\[\[.*?\{\{.*?\}\}.*?\]\]")

    Why? This enhances readability and is nicer to look at.

  2. Try to keep lines 118 characters wide or less.[link]

    You can achieve this in Emacs with the following:

    (add-hook 
     'clojure-mode-hook 
     (lambda ()
       (setq-local fill-column 118)
       (setq-local clojure-docstring-fill-column 118)))

    Why? Doing so makes code easier to read in PRs and in the browser in GitHub, and makes editing it easy even on computers with very small screens. GitHub seems to cut off all lines that are over 119 characters, so keeping lines to 118 or less gives us one extra character just to be safe.

Record Types & Protocols

  1. Don't write any code causes reflection warnings. [link]

    In your REPL, check for them by calling

    (set! *warn-on-reflection* true)

    Why? Reflection causes performance issues, and fixing it is an easy win.

    NOTE Putting this line in the code while you're working on it is fine, but be careful not to check it in. It will cause Clojure to complain about libraries we depend on, logging dozens of warnings for code outside our project upon launch. We do not want to needlessly alarm users who are not doing Metabase development.

Clone this wiki locally