Optimal frac_bits calculation for fixed point arithmetics #308

Open
julianhoever opened this issue Aug 19, 2023 · 2 comments
@julianhoever
Contributor

  • Fixed bit width (total_bits) for all values in a model
  • Fractional part should be calculated depending on the minimum and maximum values
@julianhoever julianhoever self-assigned this Aug 19, 2023
@julianhoever
Contributor Author

Optimal frac_bits calculation

What is the goal?

  • Fixed bit width (total_bits) for all values in a model
  • Fractional part should be calculated depending on the minimum and maximum values

Mathematical relationship

Let $b_{total}$ be the fixed total bit width and $b_{fractional}$ the variable bit width of the fractional part of a signed fixed-point number. Let $Q$ be the set of all quantized values representable with the given $b_{total}$ and $b_{fractional}$.

Then the following holds:

$$\min(Q) = -\frac{2^{b_{total} - 1}}{2^{b_{fractional}}} = -2^{b_{total} - b_{fractional} - 1}$$

$$\max(Q) = \frac{2^{b_{total} - 1} - 1}{2^{b_{fractional}}}$$

This leads to the following optimal frac_bits calculation for a given input tensor $T$ and a fixed $b_{total}$, where $\lfloor \cdot \rceil$ denotes rounding to the nearest integer.

If $|\min(T)| \geq |\max(T)|$:
$$b_{fractional} = \text{clamp}(b_{total} - \lfloor \log_2(|\min(T)|) \rceil - 1, 0, b_{total} - 1)$$
Otherwise, if $|\min(T)| < |\max(T)|$:
$$b_{fractional} = \text{clamp}(\lfloor \log_2(\frac{2^{b_{total} - 1} - 1}{\max(T)}) \rceil, 0, b_{total} - 1)$$

This calculation yields a pair ($b_{total}$, $b_{fractional}$) that can be used to update the parameters of the arithmetics, automatically optimizing the fixed-point representation.
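To make the calculation easier to check, here is a minimal Python sketch of it. It assumes a signed two's-complement format, so the largest stored integer is $2^{b_{total} - 1} - 1$, and it takes a plain list of floats in place of a tensor. The names `fixed_point_range`, `round_nearest`, `clamp` and `optimal_frac_bits` are illustrative and not part of the project, and the handling of an all-zero input is an assumption, since $\log_2(0)$ is undefined.

```python
import math


def fixed_point_range(total_bits: int, frac_bits: int) -> tuple[float, float]:
    """Return (min(Q), max(Q)) of a signed fixed-point format."""
    scale = 2 ** frac_bits
    smallest = -(2 ** (total_bits - 1)) / scale
    largest = (2 ** (total_bits - 1) - 1) / scale
    return smallest, largest


def round_nearest(x: float) -> int:
    """Round to the nearest integer, with halves rounded up."""
    return math.floor(x + 0.5)


def clamp(value: int, low: int, high: int) -> int:
    """Limit value to the closed interval [low, high]."""
    return max(low, min(value, high))


def optimal_frac_bits(values: list[float], total_bits: int) -> int:
    """Choose frac_bits from the minimum and maximum of `values`."""
    lo, hi = min(values), max(values)
    if lo == 0 and hi == 0:
        # Assumption: log2(0) is undefined, so fall back to maximum precision.
        return total_bits - 1
    if abs(lo) >= abs(hi):
        # The negative side dominates: derive frac_bits from |min(T)|.
        raw = total_bits - round_nearest(math.log2(abs(lo))) - 1
    else:
        # The positive side dominates: derive frac_bits from max(T), which is > 0 here.
        raw = round_nearest(math.log2((2 ** (total_bits - 1) - 1) / hi))
    return clamp(raw, 0, total_bits - 1)


frac_bits = optimal_frac_bits([-3.0, 1.5], total_bits=8)
print(frac_bits, fixed_point_range(8, frac_bits))
```

Because the logarithm is rounded to the nearest integer rather than up, the chosen format can still clip the extreme value: for `[-2.5, 1.0]` with 8 bits the sketch returns 6 fractional bits, and the smallest representable value of that format is -2.0.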

Required code structure

A general code structure for Arithmetics whose parameters adapt to the input values of the quantize function (not limited to fixed point):

classDiagram

class Sequential
class CreatorLayer {
    +register_arithmetics(arithmetics : Arithmetics)
}
class Arithmetics {
    +quantize(inputs : Tensor) Tensor
}
class ConcreteArithmetics {
    -quantization_params
}

Sequential "1" o-- "*" CreatorLayer
Sequential "1" *-- "1" Arithmetics

Arithmetics <|.. ConcreteArithmetics
CreatorLayer "1" *-- "1" Arithmetics

note for Sequential "for layer in layers:\nlayer.register_arithmetics(global_arithmetics)"
note for ConcreteArithmetics "updates the parameters according to the inputs of the quantize function"
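The diagram could translate into Python roughly as follows. This is only a sketch under several assumptions: plain lists of floats stand in for tensors, `AdaptiveFixedPoint` is a hypothetical concrete arithmetics, and its `_choose_frac_bits` rule (the largest frac_bits whose range still covers all inputs) is a placeholder rather than the formula from the mathematical section.

```python
from typing import Optional, Protocol


class Arithmetics(Protocol):
    def quantize(self, inputs: list[float]) -> list[float]: ...


class AdaptiveFixedPoint:
    """Hypothetical concrete arithmetics: updates frac_bits on every quantize call."""

    def __init__(self, total_bits: int) -> None:
        self.total_bits = total_bits
        self.frac_bits = 0  # quantization parameter, derived from the inputs

    def _int_range(self) -> tuple[int, int]:
        # Smallest and largest stored integer of a signed format.
        return -(2 ** (self.total_bits - 1)), 2 ** (self.total_bits - 1) - 1

    def _choose_frac_bits(self, inputs: list[float]) -> int:
        # Placeholder rule (assumption): largest frac_bits that covers all inputs.
        low, high = self._int_range()
        for frac_bits in range(self.total_bits - 1, -1, -1):
            if all(low <= x * 2 ** frac_bits <= high for x in inputs):
                return frac_bits
        return 0

    def quantize(self, inputs: list[float]) -> list[float]:
        self.frac_bits = self._choose_frac_bits(inputs)
        low, high = self._int_range()
        scale = 2 ** self.frac_bits
        # Scale, round to an integer, saturate, and scale back.
        return [max(low, min(high, round(x * scale))) / scale for x in inputs]


class CreatorLayer:
    def __init__(self) -> None:
        self.arithmetics: Optional[Arithmetics] = None

    def register_arithmetics(self, arithmetics: Arithmetics) -> None:
        self.arithmetics = arithmetics

    def forward(self, inputs: list[float]) -> list[float]:
        if self.arithmetics is None:
            raise RuntimeError("no arithmetics registered for this layer")
        return self.arithmetics.quantize(inputs)


class Sequential:
    def __init__(self, layers: list[CreatorLayer], arithmetics: Arithmetics) -> None:
        self.layers = list(layers)
        self.arithmetics = arithmetics
        # Every layer shares the single global arithmetics instance.
        for layer in self.layers:
            layer.register_arithmetics(arithmetics)

    def forward(self, inputs: list[float]) -> list[float]:
        for layer in self.layers:
            inputs = layer.forward(inputs)
        return inputs


arithmetics = AdaptiveFixedPoint(total_bits=8)
model = Sequential([CreatorLayer(), CreatorLayer()], arithmetics)
print(model.forward([-3.0, 1.5]), arithmetics.frac_bits)
```

In this sketch a layer used outside a `Sequential` has no arithmetics until `register_arithmetics` is called explicitly, so `forward` raises an error instead of silently skipping quantization.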

Problems

  • How to initialize the arithmetics of a CreatorLayer?
    • If not part of a Sequential?

@julianhoever
Contributor Author

These are my first thoughts on how to approach this issue. The formulas in the mathematical section are just a first shot and may be wrong; I have to verify them next week. The proposed architecture is also a bit problematic. Maybe @glencoe you can have a look at it?

@julianhoever julianhoever removed their assignment Sep 7, 2023