Oliver Kennedy edited this page Jul 14, 2019 · 1 revision

Ugh. Case sensitivity is a nightmare.

TL;DR

  • NEVER use String for identifiers.
  • Use mimir.algebra.ID (case-sensitive) everywhere you can.
  • NEVER change the case on an ID (i.e., DO NOT use _.id.toUpperCase or _.id.toLowerCase or Oliver will be very upset).
  • Use sparsity.Name to talk to case-insensitive or variably-cased external interfaces, but normalize it to an ID before using it internally.

The problem.

SQL identifiers are normally case-insensitive, except when they are quoted (e.g., "foo" or `foo`), in which case they are case-sensitive.

Data sources vary in their case sensitivity. For example, Spark follows the SQL quoting model, while filesystems and URLs are typically case-sensitive (unless you're on a Mac). Data sources also vary in how they standardize names. For example, Spark downcases variable names, while many SQL implementations up-case them.

In short, in prior iterations of the Mimir code, we've had a mountain of bugs, hacks, and ugly workarounds dealing with case-sensitivity issues. These issues are now largely gone thanks to three rules.

  1. Never never never ever use String to store an identifier.
  2. All identifiers in Mimir's internals are case-sensitive. You acknowledge this contract by wrapping all identifiers in mimir.algebra.ID.
  3. The interfaces between Mimir and the outside world may need case-insensitive identifiers. sparsity.Name is used for this purpose. If its quoted field is set, the name is case-sensitive. If not set, the name is case-insensitive.

Names should only be used at the interfaces between Mimir and the outside world and resolved into their case-sensitive form before use. An unquoted name should be matched against every candidate ID using equalsIgnoreCase (or equivalent), and the first matching ID should replace it.
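The resolution step described above can be sketched roughly as follows. This is a minimal illustration, not the actual Mimir code: the ID and Name classes here are simplified stand-ins for mimir.algebra.ID and sparsity.Name (only the id and quoted fields come from this page; the name field and everything else is assumed), and NameResolution.resolve is a hypothetical helper. The point is that the case-insensitive comparison lives in exactly one place at the boundary, and everything downstream only ever sees an ID.

```scala
// Simplified stand-ins for mimir.algebra.ID and sparsity.Name.
// The real classes may have different fields and methods.
final case class ID(id: String)
final case class Name(name: String, quoted: Boolean = false)

// Hypothetical boundary helper: turns an external Name into an internal ID.
object NameResolution {
  // Quoted names must match a candidate exactly; unquoted names match
  // case-insensitively, and the first matching candidate wins.
  def resolve(name: Name, candidates: Seq[ID]): Option[ID] =
    if (name.quoted) candidates.find(_.id == name.name)
    else candidates.find(_.id.equalsIgnoreCase(name.name))
}
```

With candidates Seq(ID("Foo"), ID("foo"), ID("BAR")), the unquoted Name("FOO") resolves to ID("Foo") (the first case-insensitive match), the quoted Name("foo", quoted = true) resolves to ID("foo"), and the quoted Name("bar", quoted = true) resolves to nothing, since no candidate is spelled exactly that way.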

********** BEGIN Message from supreme high leader Oliver ***********
 * I don't want to see *anywhere* in the code ANY of the following
 * - [var].id.toUpperCase
 * - [var].id.toLowerCase
 * - [var].id.equalsIgnoreCase
 * or anything along these lines.  In fact, unless you have a 
 * particularly good reason to do so (several acceptable reasons
 * listed below), you should NEVER access [var].id.  Acceptable
 * reasons include:
 * - You're talking to a backend that is case sensitive (e.g. Spark)
 * - You're printing debug information.
 * - The id gets immediately wrapped in a StringPrimitive and shoved into the MetadataBackend.
 * If you're talking to a mixed-case backend (e.g., GProM), ID
 * values MUST be treated as quoted.
 ********** END Message from supreme high leader Oliver ***********
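
For the mixed-case backend rule in the message above, here is one way "treat ID values as quoted" might look in practice. Again, this is only a sketch under assumptions: ID and Name are simplified stand-ins for mimir.algebra.ID and sparsity.Name, BackendNames is a hypothetical helper, and the double-quote rendering follows standard SQL identifier quoting (some backends use backticks instead, so adjust for the backend you are actually talking to).

```scala
// Simplified stand-ins for mimir.algebra.ID and sparsity.Name.
// The real classes may have different fields and methods.
final case class ID(id: String)
final case class Name(name: String, quoted: Boolean = false)

// Hypothetical helper for handing identifiers to a mixed-case backend.
object BackendNames {
  // An internal ID is already case-sensitive, so it leaves Mimir as a
  // quoted Name with its case untouched.
  def toName(id: ID): Name = Name(id.id, quoted = true)

  // Render a Name as a SQL identifier. Quoted names are wrapped in double
  // quotes with embedded double quotes doubled (standard SQL quoting,
  // assumed here); unquoted names are emitted as-is.
  def toSql(name: Name): String =
    if (name.quoted) "\"" + name.name.replace("\"", "\"\"") + "\""
    else name.name
}
```

Note that nothing here changes the case of the ID. The id field is read once, at the boundary, and immediately wrapped in a quoted Name.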