Do you know how NVDLA implements the non-linear activation functions? #2

Open
bandjshen opened this issue Mar 7, 2018 · 6 comments

@bandjshen

From the RTL source code, it uses 9 bits as the address for the LUT, and it can process 4*16-bit data in parallel.
But it divides its table into LO and LE. What does this mean? How does it decide whether to use LO or LE?
Why does it fetch both LE[addr] and LE[addr+1]?

There are also scale/shift/offset/bias stages after the LUT. What do these mean?

@JunningWu
Owner

I will check that code in the next few days.

@JunningWu
Owner

JunningWu/AIChip#11

@JunningWu
Owner

The Single Data Point Processor (SDP) allows for the application of both linear and non-linear functions onto individual data points. This is commonly used immediately after convolution in CNN systems. The SDP provides native support for linear functions (e.g., simple bias and scaling) and uses lookup tables (LUTs) to implement non-linear functions. This combination supports most common activation functions as well as other element-wise operations including: ReLU, PReLU, precision scaling, batch normalization, bias addition, or other complex non-linear functions, such as a sigmoid or a hyperbolic tangent.
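
To make the linear part of that description concrete, here is a minimal Python sketch of a fixed-point element-wise path: bias addition, multiply-and-shift precision scaling, then an optional ReLU. The function names, the parameter names, and the order of operations are assumptions chosen for illustration; they are not taken from the NVDLA RTL.

```python
def sdp_linear_path(x, bias, scale, shift, relu=True):
    """Apply bias add, integer scaling, and optional ReLU to one data point.

    All names and the operation order are illustrative assumptions,
    not the actual NVDLA datapath.
    """
    y = x + bias                # bias addition
    y = (y * scale) >> shift    # precision scaling: multiply, then arithmetic right shift
    if relu:
        y = max(y, 0)           # ReLU clamps negative results to zero
    return y


def sdp_linear_path_vec(values, bias, scale, shift, relu=True):
    """Apply the same element-wise path to every value in a list."""
    return [sdp_linear_path(v, bias, scale, shift, relu) for v in values]
```

Each data point is handled independently of its neighbours, which is why this kind of stage can be replicated across parallel lanes.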

I suggest you understand the SDP's functions first; then the RTL code will be much easier to follow.

@bandjshen
Author

I know which functions are inside the SDP, but I do not know the detailed solution, that is, how to design the LUT to implement ELU/Sigmoid/Tanh.
I'm a hardware engineer; the code that calculates the table may exist in the driver code, for example where a layer adds Sigmoid support.
Another question: the SDP has 16 pipelines in parallel, but EW (the non-linear activation) seems to have only 4 pipelines, which will hurt performance.
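
For illustration, a table of this kind could be generated in software roughly as follows. This is a minimal sketch, not the NVDLA driver code: `build_uniform_lut`, the input range of -8 to 8, the 257-entry size, and the Q15 output format are all assumptions picked for the example, and it samples the function at evenly spaced points only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def build_uniform_lut(func, x_min, x_max, num_entries, frac_bits=15):
    """Sample func at evenly spaced points and quantize each sample to int16.

    Hypothetical helper: the range, entry count, and fixed-point format
    are illustrative choices, not values read from the hardware.
    """
    if num_entries < 2:
        raise ValueError("need at least two entries to interpolate between")
    step = (x_max - x_min) / (num_entries - 1)
    scale = 1 << frac_bits
    table = []
    for i in range(num_entries):
        y = func(x_min + i * step)
        q = int(round(y * scale))
        q = max(-32768, min(32767, q))  # saturate to the signed 16-bit range
        table.append(q)
    return table


# Example: a sigmoid table over [-8, 8] (assumed range and size).
lut = build_uniform_lut(sigmoid, -8.0, 8.0, 257)
```

The middle entry corresponds to an input of 0, so it holds 0.5 in Q15. Inputs outside the sampled range would need separate handling, which this sketch leaves out.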

@bandjshen
Author

Hi Junning,
In the BS/BN instances there is a register config port cfg_alu_algo_rsc_z[1:0]. Do you know what these bits mean?
And what about cfg_alu_shift_value_rsc_z[5:0]?

Thanks

@weikun750

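The index consists of a 9-bit INT part and a 35-bit FRAC part. If the index falls between two entries, the output is the weighted sum Y0*((1<<35) - FRAC) + Y1*FRAC, scaled back down by 1<<35, where Y0 is the entry at addr and Y1 is the entry at addr+1. That is why the hardware always implements both the addr and the addr+1 read logic.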
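
The interpolation between the two fetched entries can be sketched in Python as below. The sketch assumes FRAC measures the distance from entry addr toward entry addr+1, so Y1 is weighted by FRAC and Y0 by the remainder. The name `lut_interpolate` and the clamping at the last entry are assumptions for the example, not behaviour confirmed from the RTL, and the index is assumed to be non-negative.

```python
FRAC_BITS = 35
ONE = 1 << FRAC_BITS


def lut_interpolate(table, index_fixed):
    """Linearly interpolate a table at a fixed-point index.

    index_fixed holds the integer address in its upper bits and a 35-bit
    fraction in its lower bits. Clamping past the last entry is an
    assumed policy for this sketch.
    """
    addr = index_fixed >> FRAC_BITS      # INT part selects the lower entry
    frac = index_fixed & (ONE - 1)       # FRAC part is the position between entries
    if addr >= len(table) - 1:
        return table[-1]                 # no addr+1 entry available, so clamp
    y0 = table[addr]
    y1 = table[addr + 1]
    # Weighted sum of both entries, then drop the 35 fraction bits.
    return (y0 * (ONE - frac) + y1 * frac) >> FRAC_BITS


table = [0, 100, 200]
halfway = lut_interpolate(table, (1 << FRAC_BITS) + (1 << (FRAC_BITS - 1)))
```

With FRAC equal to 0 the result is exactly Y0, and as FRAC approaches 1<<35 it approaches Y1, so both reads are needed for every lookup that lands between entries.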
