Support for automatic unit conversions #588

Open
SanderMertens opened this issue Feb 23, 2017 · 2 comments

Comments

@SanderMertens
Member

SanderMertens commented Feb 23, 2017

To facilitate easier integration between applications that use different sets of units for measurement data (Fahrenheit vs. Celsius, miles vs. meters), an architecture is required that can automatically detect and translate measurements from one unit to another.

The automatic unit conversion system should be flexible enough to describe SI units while at the same time allowing users to extend the system with custom units and conversions.

To ensure that such an architecture can be deployed in distributed systems, it should rely on minimal exchange of information (if any at all) before applications can start communicating with each other.

Corto has been designed with a "be generous in what you accept, strict in what you send out" philosophy, which in practice means that data that enters an application can take many forms before it is normalized to the local representation (type) that is defined by the application. There can be as many local representations (types) in a system as there are applications.

The same philosophy should apply to the system that will support unit conversions.

In this system, an application shall define the semantics of a measurement, whereas the (serialized) data contains meta information about the units of the measurement. For example, a system can have two applications which both publish temperature (semantics), where one application publishes Fahrenheit, and the other publishes Celsius (units).

The types defined by these applications will specify that they contain temperature data, whereas the serialized data specifies with which unit the measurement was published. Upon receiving this data, the application can look up the unit that it received within the semantic "group" defined by its type, and perform the conversion (if necessary).
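A minimal C sketch of that receive path, under stated assumptions: each unit is modeled as a hypothetical `unit_t` with a linear mapping to its group's base unit (`base = (value + offset) * factor`), and `quantity_t`, `unit_lookup` and `unit_normalize` are illustrative names, not Corto API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical unit: converts to its group's base unit as
 * base = (value + offset) * factor. */
typedef struct {
    const char *id;
    double offset;
    double factor;
} unit_t;

/* Hypothetical semantic group (quantity) and the units it contains. */
typedef struct {
    const char *name;
    const unit_t *units;
    size_t count;
} quantity_t;

/* Looks up a unit id within one group; returns NULL if it is unknown. */
static const unit_t *unit_lookup(const quantity_t *q, const char *id) {
    for (size_t i = 0; i < q->count; i++) {
        if (strcmp(q->units[i].id, id) == 0) {
            return &q->units[i];
        }
    }
    return NULL;
}

/* Converts a received value tagged with unit id `id` into the application's
 * local unit `local`. Returns 0 on success, -1 if `id` is not in the group. */
static int unit_normalize(const quantity_t *q, const char *id,
                          const unit_t *local, double value, double *out) {
    const unit_t *from = unit_lookup(q, id);
    if (from == NULL) {
        return -1;
    }
    if (from == local) { /* already in the local unit: no conversion needed */
        *out = value;
        return 0;
    }
    double base = (value + from->offset) * from->factor;
    *out = base / local->factor - local->offset;
    return 0;
}
```

For example, with a temperature group containing K, C and F, a subscriber whose local unit is C would turn a received `212` tagged `"F"` into approximately `100`.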

Unit group (quantity)

Units should be grouped by semantic meaning. For example, meter, kilometer and centimeter would all belong to the length group. If units are in the same group, they are different representations of the same thing, and conversions between them are possible.
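One way to make group membership explicit is to record the quantity on each unit and refuse conversions across groups. The sketch below assumes a hypothetical `unit_t` with the same linear `(value + offset) * factor` mapping to the group's base unit; the names are illustrative only.

```c
#include <string.h>

/* Hypothetical unit: belongs to one quantity (semantic group) and maps to
 * that group's base unit as base = (value + offset) * factor. */
typedef struct {
    const char *quantity;
    const char *id;
    double offset;
    double factor;
} unit_t;

/* Converts v from `from` to `to`. Returns 0 and writes *out on success, or
 * -1 if the units belong to different groups (e.g. meters to Celsius). */
static int unit_convert(const unit_t *from, const unit_t *to, double v, double *out) {
    if (strcmp(from->quantity, to->quantity) != 0) {
        return -1;
    }
    double base = (v + from->offset) * from->factor; /* into the base unit */
    *out = base / to->factor - to->offset;           /* out of the base unit */
    return 0;
}
```

Routing every conversion through the base unit means each unit only needs one definition, rather than one per pair of units in the group.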

Unit system

Units can be optionally annotated by a "unit system", which is an overlay concept that can span units from multiple semantic groups. A unit system allows a subscriber to receive all data in units annotated by that system without having to manually specify units for each value. This would for example allow a web application to switch from imperial to metric simply by changing the unit system of the subscriber.
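A unit system can be modeled as a tag on each unit; resolving a subscriber's system then becomes a lookup per quantity. This is a hypothetical sketch (the `system` field and `unit_for_system` are assumptions, not an existing API):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical unit with an optional unit-system annotation. */
typedef struct {
    const char *quantity; /* semantic group, e.g. "length" */
    const char *id;       /* unit id, e.g. "mi" */
    const char *system;   /* e.g. "metric", "imperial"; NULL if none */
} unit_t;

/* Returns the unit of `quantity` that belongs to `system`, or NULL if the
 * system defines no unit for that quantity. */
static const unit_t *unit_for_system(const unit_t *units, size_t n,
                                     const char *quantity, const char *system) {
    for (size_t i = 0; i < n; i++) {
        if (units[i].system != NULL &&
            strcmp(units[i].quantity, quantity) == 0 &&
            strcmp(units[i].system, system) == 0) {
            return &units[i];
        }
    }
    return NULL;
}
```

Switching a web front end from imperial to metric then only changes the `system` string passed in; the unit for each value follows from it.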

Default unit

A type shall be able to specify a default unit for that type. The default unit specifies the unit of the data that is stored in instances of that type (in that application). Data will be converted from and to the default unit. Once set, the default unit shall not be changed. This simplifies mapping of datatypes and makes writing applications easier, as code otherwise would always have to check/convert units before doing anything with the data.

To prevent an explosion of types, where a different type is required for each unit, it shall also be possible to specify default units for members and collection elements. This allows a member, for example, to specify that its type is int32 and that its unit is Celsius (from the temperature group).
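A sketch of the per-member approach, assuming hypothetical `member_t` descriptors that carry a byte offset and a default unit, so one `int32` member can hold Celsius without a dedicated Celsius type. Incoming values are converted to the member's default unit before they are stored; rounding to the nearest integer is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical linear unit: base = (value + offset) * factor */
typedef struct {
    const char *id;
    double offset;
    double factor;
} unit_t;

/* Hypothetical member descriptor: the default unit lives on the member,
 * not on a unit-specific type. */
typedef struct {
    const char *name;
    size_t offset;      /* byte offset of the member within its struct */
    const unit_t *unit; /* default unit of the stored value */
} member_t;

typedef struct {
    int32_t temperature; /* stored in the member's default unit */
} weather_t;

/* Stores `value` (expressed in unit `from`) into an int32 member after
 * converting it to the member's default unit, rounded to nearest. */
static void member_set_int32(void *obj, const member_t *m,
                             const unit_t *from, double value) {
    double base = (value + from->offset) * from->factor;
    double local = base / m->unit->factor - m->unit->offset;
    int32_t v = (int32_t)(local < 0 ? local - 0.5 : local + 0.5);
    memcpy((char *)obj + m->offset, &v, sizeof v);
}
```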

An alternative to the latter would be to make a unit extend from corto/lang/type, in which case the unit itself could be used as type for a member, thus specifying the default unit.

In the same sense, a semantic group such as length could also inherit from corto/lang/type, and map to a datatype that can dynamically change unit at runtime. A use case for such a feature could be to dynamically adjust the unit to the one that is most used in the system, which would reduce the number of unit conversions.

A difficulty for setting units dynamically is that different units might have different in-memory representations. For example, time can be represented as a single integer, but also as a struct with separate seconds and nanoseconds fields (like struct timespec). To address this issue, the corto/lang/any type could be used for values of which the unit is unknown at compile time.
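To illustrate the problem, the hypothetical tagged value below stores time either as one integer (nanoseconds) or as a seconds/nanoseconds struct, with the representation chosen at runtime. It stands in for what an `any` value would have to carry (a tag plus the data); it is not Corto's actual `corto/lang/any` layout.

```c
#include <stdint.h>

/* How a time value is laid out in memory; only known at runtime. */
typedef enum { TIME_NANOSECONDS, TIME_SEC_NANOSEC } time_repr_t;

typedef struct {
    int64_t sec;
    int32_t nanosec;
} time_struct_t;

/* Hypothetical tagged value: a tag describing the representation plus
 * storage for either layout. */
typedef struct {
    time_repr_t repr;
    union {
        int64_t ns;       /* single integer: nanoseconds */
        time_struct_t ts; /* seconds + nanoseconds */
    } value;
} any_time_t;

/* Normalizes either representation to nanoseconds. */
static int64_t any_time_to_ns(const any_time_t *t) {
    if (t->repr == TIME_SEC_NANOSEC) {
        return t->value.ts.sec * 1000000000LL + t->value.ts.nanosec;
    }
    return t->value.ns;
}
```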

Automatic unit scaling

When measurements can span multiple orders of magnitude, it would be convenient if the framework supported dynamic scaling of units based on the measurement value. For example, if the default unit is bytes but the measurement is 150MB, it would be inconvenient to represent it as 150000000B. To facilitate this, it should be possible to annotate units with a range that allows the framework to select the unit that best fits the measurement value.

When automatically selecting a unit, only units from the same unit system should be selected. This would prevent the system from automatically switching from Kilometer to Mile.
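A sketch of range-based selection, assuming hypothetical `min`/`max` annotations and a `system` tag per unit. Lengths are used instead of bytes so that the same-system rule is visible; a bytes group could follow the same pattern. Only non-negative measurements are handled.

```c
#include <float.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical unit annotated with a unit system and a preferred range.
 * `factor` maps to the base unit (meters): base = value * factor. */
typedef struct {
    const char *id;
    const char *system;
    double factor;
    double min; /* preferred range is [min, max) in this unit */
    double max;
} scaled_unit_t;

static const scaled_unit_t lengths[] = {
    {"cm", "metric",   0.01,     0.0, 100.0},
    {"m",  "metric",   1.0,      1.0, 1000.0},
    {"km", "metric",   1000.0,   1.0, DBL_MAX},
    {"ft", "imperial", 0.3048,   0.0, 5280.0},
    {"mi", "imperial", 1609.344, 1.0, DBL_MAX},
};

/* Picks the first unit of `system` whose range contains the measurement
 * (given in meters) and writes the rescaled value to *scaled. Units of other
 * systems are skipped, so km is never replaced by mi. */
static const scaled_unit_t *autoscale(double meters, const char *system, double *scaled) {
    for (size_t i = 0; i < sizeof lengths / sizeof lengths[0]; i++) {
        if (strcmp(lengths[i].system, system) != 0) {
            continue;
        }
        double v = meters / lengths[i].factor;
        if (v >= lengths[i].min && v < lengths[i].max) {
            *scaled = v;
            return &lengths[i];
        }
    }
    return NULL;
}
```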

Dynamic conversions

Some units, like currency, do not have fixed conversion ratios. The framework shall support an API with which a user can retrieve the most up-to-date conversion ratio.
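One possible shape for such an API is a callback that is consulted at conversion time instead of a fixed ratio. Everything below is hypothetical: `ratio_provider_t`, `dynamic_convert`, and the stand-in `fixed_ratio_provider` (which just reads a ratio from its context) are illustrative, and no real rate source is implied.

```c
#include <stddef.h>

/* Hypothetical callback that reports the current ratio from one unit to
 * another (e.g. by querying an exchange-rate service). Returns 0 on success. */
typedef int (*ratio_provider_t)(const char *from, const char *to,
                                double *ratio, void *ctx);

typedef struct {
    ratio_provider_t provider;
    void *ctx;
} dynamic_converter_t;

/* Converts `value` using the ratio the provider reports at call time,
 * so updated ratios apply without redefining the unit. */
static int dynamic_convert(const dynamic_converter_t *c, const char *from,
                           const char *to, double value, double *out) {
    double ratio;
    if (c->provider(from, to, &ratio, c->ctx) != 0) {
        return -1; /* no ratio available */
    }
    *out = value * ratio;
    return 0;
}

/* Stand-in provider for illustration: reads the ratio from `ctx` (a double*)
 * and ignores the unit ids. A real provider would look them up. */
static int fixed_ratio_provider(const char *from, const char *to,
                                double *ratio, void *ctx) {
    (void)from;
    (void)to;
    if (ctx == NULL) {
        return -1;
    }
    *ratio = *(const double *)ctx;
    return 0;
}
```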

Unit notation

Units shall be uniquely identified within a semantic group with a unit id. It shall be possible to use this id in serialized formats to indicate the unit of a measurement. For example, the following (cx) definition shows how a measurement could potentially be created and populated:

length/meter myMeasurement: 15mi

Here, length is the semantic group, meter is the unit (we'll assume for now that units can be used as types) and mi is the unit id for length/mile. After this operation, the value of myMeasurement should be 24140.16 (15 × 1609.344 meters).
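The postfix notation could be handled by splitting the number from the trailing unit id and looking the id up in the target group. A minimal C sketch, assuming a hypothetical length table keyed by unit id (the names are illustrative, and only three units are listed):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical unit table for the length group: meters per unit. */
typedef struct {
    const char *id;
    double factor;
} length_unit_t;

static const length_unit_t length_units[] = {
    {"m", 1.0},
    {"km", 1000.0},
    {"mi", 1609.344},
};

/* Parses "<number><unit id>", e.g. "15mi", into meters (the default unit of
 * length/meter). Returns 0 on success, -1 if the number is missing or the
 * unit id is not part of the length group. */
static int parse_length(const char *text, double *meters) {
    char *end;
    double value = strtod(text, &end);
    if (end == text) {
        return -1;
    }
    for (size_t i = 0; i < sizeof length_units / sizeof length_units[0]; i++) {
        if (strcmp(end, length_units[i].id) == 0) {
            *meters = value * length_units[i].factor;
            return 0;
        }
    }
    return -1;
}
```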

Using a unit symbol as a postfix may not be suitable for all serialization formats. For example, JSON data is typically converted to JavaScript objects, and having to parse a string like "15km" is less convenient than storing the unit and measurement separately ({"value":15,"unit":"km"}).
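Emitting the separate form is straightforward. A hypothetical C sketch (`measurement_to_json` is an illustrative name); it only rejects quotes and backslashes in the unit id and does not implement full JSON string escaping:

```c
#include <stdio.h>
#include <string.h>

/* Writes {"value":<number>,"unit":"<id>"} into buf. Returns the snprintf
 * result, or -1 if the unit id contains a quote or backslash (escaping is
 * not handled in this sketch). */
static int measurement_to_json(char *buf, size_t size, double value, const char *unit_id) {
    if (strpbrk(unit_id, "\"\\") != NULL) {
        return -1;
    }
    /* %.15g prints up to 15 significant digits, e.g. 15 -> "15" */
    return snprintf(buf, size, "{\"value\":%.15g,\"unit\":\"%s\"}", value, unit_id);
}
```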

Unit symbols

Units shall be able to optionally specify a symbol, which is used when visualizing a value of a unit. For example, a value representing dollars shall use the symbol $. Units shall be able to specify whether the symbol should be placed before or after the value. If no symbol is provided, the unit id shall be used.
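A sketch of symbol placement with the fallback to the unit id. The `unit_display_t` struct, the two-decimal format, and the space before a postfix symbol are assumptions made for illustration:

```c
#include <stdio.h>
#include <string.h>

typedef enum { SYMBOL_BEFORE, SYMBOL_AFTER } symbol_position_t;

/* Hypothetical display metadata for a unit. */
typedef struct {
    const char *id;
    const char *symbol;          /* optional: NULL or "" means "use id" */
    symbol_position_t position;
} unit_display_t;

/* Formats value with two decimals and the unit's symbol (or id) placed
 * before or after it. Returns the snprintf result. */
static int unit_format(char *buf, size_t size, const unit_display_t *u, double value) {
    const char *label = (u->symbol != NULL && strlen(u->symbol) > 0) ? u->symbol : u->id;
    if (u->position == SYMBOL_BEFORE) {
        return snprintf(buf, size, "%s%.2f", label, value);
    }
    return snprintf(buf, size, "%.2f %s", value, label);
}
```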

Automatic conversion of expressions

Certain unit relationships are defined by expressions. For example, ohm = volt / ampere. The framework should be able to take this into account, so that when expressions with units are evaluated, the appropriate unit is assigned to the result of the expression:

voltage v = 120v
ampere a = 5a
ohm o = v / a // type of expression will be ohm
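One way to infer the result unit is dimensional analysis: represent each unit by the exponents of SI base units, combine exponents when quantities are divided (or multiplied), then look the resulting vector up among known units. This technique is swapped in for illustration (the issue does not say how such relationships are stored), and all type and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Exponents of SI base units (kg, m, s, A); a derived unit such as the
 * ohm is identified by its exponent vector. */
typedef struct {
    int kg, m, s, a;
} dim_t;

typedef struct {
    double value;
    dim_t dim;
} quantity_value_t;

typedef struct {
    const char *id;
    dim_t dim;
} named_unit_t;

/* V = kg*m^2*s^-3*A^-1 and ohm = V/A = kg*m^2*s^-3*A^-2 */
static const named_unit_t named_units[] = {
    {"ampere", {0, 0, 0, 1}},
    {"volt",   {1, 2, -3, -1}},
    {"ohm",    {1, 2, -3, -2}},
};

/* Division divides the values and subtracts the exponents. */
static quantity_value_t q_div(quantity_value_t x, quantity_value_t y) {
    quantity_value_t r = {x.value / y.value,
                          {x.dim.kg - y.dim.kg, x.dim.m - y.dim.m,
                           x.dim.s - y.dim.s, x.dim.a - y.dim.a}};
    return r;
}

/* Returns the id of the named unit matching `d`, or NULL if none matches. */
static const char *dim_name(dim_t d) {
    for (size_t i = 0; i < sizeof named_units / sizeof named_units[0]; i++) {
        dim_t k = named_units[i].dim;
        if (k.kg == d.kg && k.m == d.m && k.s == d.s && k.a == d.a) {
            return named_units[i].id;
        }
    }
    return NULL;
}
```

With `v = 120` volt and `a = 5` ampere, `q_div(v, a)` yields the value 24 with the exponent vector of the ohm.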
@jleeothon
Member

About automatic unit scaling: does float64 not solve this? Isn't it able to naturally scale up numbers? What about solving this with some Big Integer implementation? Big Integers are probably necessary at some point, and they could be the default way to represent values with units.

@SanderMertens
Member Author

SanderMertens commented Feb 26, 2017

Example definition of units using a quantity and a unit class:

quantity length{}
unit meter: length, "m"
unit mile: length, "mi", "% * 1609.344" ::
    unit geographical: length, "mi", "% * 1853.7936"

quantity temperature{}
unit kelvin: temperature, "K"
unit celsius: temperature, "C", "% + 273.15"
unit fahrenheit: temperature, "F", "(% + 459.67) * 5 / 9"
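Reading `%` as the input value and each expression as producing the quantity's base unit (meters for length, kelvin for temperature), which is consistent with the mile and temperature definitions, a framework could interpret these strings with a small recursive-descent evaluator. A hypothetical C sketch supporting only numbers, `%`, parentheses, unary minus and `+ - * /`; deriving the inverse direction (from the base unit back to, say, Fahrenheit) is not addressed here.

```c
#include <stdlib.h>

/* Grammar: expr   := term (('+'|'-') term)*
 *          term   := factor (('*'|'/') factor)*
 *          factor := number | '%' | '(' expr ')' | '-' factor */
typedef struct {
    const char *p; /* current position in the expression */
    double x;      /* value substituted for % */
    int error;     /* set to 1 on a syntax error */
} parser_t;

static double parse_expr(parser_t *ps);

static void skip_spaces(parser_t *ps) {
    while (*ps->p == ' ') ps->p++;
}

static double parse_factor(parser_t *ps) {
    skip_spaces(ps);
    if (*ps->p == '%') { ps->p++; return ps->x; }
    if (*ps->p == '-') { ps->p++; return -parse_factor(ps); }
    if (*ps->p == '(') {
        ps->p++;
        double v = parse_expr(ps);
        skip_spaces(ps);
        if (*ps->p == ')') ps->p++; else ps->error = 1;
        return v;
    }
    char *end;
    double v = strtod(ps->p, &end);
    if (end == ps->p) ps->error = 1; /* not a number */
    ps->p = end;
    return v;
}

static double parse_term(parser_t *ps) {
    double v = parse_factor(ps);
    for (;;) {
        skip_spaces(ps);
        if (*ps->p == '*') { ps->p++; v *= parse_factor(ps); }
        else if (*ps->p == '/') { ps->p++; v /= parse_factor(ps); }
        else return v;
    }
}

static double parse_expr(parser_t *ps) {
    double v = parse_term(ps);
    for (;;) {
        skip_spaces(ps);
        if (*ps->p == '+') { ps->p++; v += parse_term(ps); }
        else if (*ps->p == '-') { ps->p++; v -= parse_term(ps); }
        else return v;
    }
}

/* Evaluates `expr` with % = x. Returns 0 on success, -1 on a syntax error. */
static int conversion_eval(const char *expr, double x, double *out) {
    parser_t ps = {expr, x, 0};
    double v = parse_expr(&ps);
    skip_spaces(&ps);
    if (ps.error || *ps.p != '\0') return -1;
    *out = v;
    return 0;
}
```

For example, evaluating the Fahrenheit expression with `% = 212` gives approximately 373.15 (kelvin), and the mile expression with `% = 15` gives approximately 24140.16 (meters).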
