
Replace timer based RNG seed with ADC reference based one #116

Open · wants to merge 4 commits into master

Conversation

@DeeFuse (Contributor) commented Aug 27, 2021

Remove "Arduino.h" dependency
Fixes #115

@lonjil commented Aug 27, 2021

I don't think that function actually generates 8 bits of entropy per call, so it should probably do more than 32 calls. I tried 256 calls and it was still much, much faster than the clock-based code, so a large number of calls definitely wouldn't be problematic. It might also be a good idea to skip the ad-hoc shifting and masking code used in that function and feed everything directly into the hashing function.

@DeeFuse (Contributor, Author) commented Aug 27, 2021

Unfortunately I have no AVR board to test the entropy with.
I can imagine that the result of the AD conversion will fluctuate in a range 0+X < n < 1023-Y, so the higher bits might not contribute much entropy. I think the masking and shifting should compensate for that.
Instead of calling hydro_random_rbyte 256 times (and masking the result down to one bit), I'd rather increase the number of nConversions inside this function.
@jedisct1 What's your opinion on the entropy quality?
Some insight into the entropy quality can be seen at the bottom of this page

@lonjil commented Aug 27, 2021

When I tried it, I didn't mask hydro_random_rbyte down. I just fed the output directly to hydro_hash_update 256 times, changing ebits += 8 to ebits += 1.
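The accounting change described above can be sketched as follows. This is only a model of the entropy bookkeeping: hydro_random_rbyte and hydro_hash_update are replaced with hypothetical stand-ins, not the real ADC or Gimli code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for hydro_random_rbyte() and hydro_hash_update();
 * only the entropy accounting is modeled here. */
static uint8_t fake_rbyte(void) { return (uint8_t) rand(); }
static void    fake_absorb(uint8_t b) { (void) b; }

/* Credit only 1 bit of entropy per raw ADC byte fed into the hash, so
 * reaching the 256-bit target takes 256 calls instead of 32. */
static unsigned seed_with_conservative_credit(void)
{
    unsigned ebits = 0, calls = 0;
    while (ebits < 256) {
        fake_absorb(fake_rbyte());
        ebits += 1; /* was ebits += 8 in the timer-based code */
        calls++;
    }
    return calls;
}
```

The hash absorbs all the raw bytes either way; only the number of iterations needed to reach the entropy target changes.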

@DeeFuse (Contributor, Author) commented Aug 27, 2021

I had a look at the hydro_hash_update function; it seems that the input is XORed into the state, which is then fed through gimli_core_u8 every gimli_RATE (16) bytes.
I could see a benefit in a configuration macro like LIBHYDROGEN_RNG_ENTROPY_BITS instead of the hardcoded 256 bits in the hydro_random_init functions.
That would allow selecting the "quality" of the RNG initialisation depending on your needs (better quality vs. faster initialisation).

@jedisct1 (Owner)

I agree that we should feed everything to the hash function instead of shifting & masking.

How many unbiased bits can we safely assume here?

With ebits += 1, the initialization procedure completes in 400ms on the Uno, which should be acceptable for most applications.

If you feel like it's not, ebits += 2 would remain safe with a single unbiased bit.

@jedisct1 (Owner)

That would allow selecting the "quality" of the RNG initialisation depending on your needs (better quality vs. faster initialisation)

Having an unsafe option is not a good idea.

@DeeFuse (Contributor, Author) commented Aug 27, 2021

I can't give a definite answer on the number of unbiased bits from the code alone.
I'll see if I can find an AVR Arduino in my stash to generate a larger number of random numbers, to have a dataset to work with.

@lonjil commented Aug 27, 2021

The Uno's serial USB connection is unfortunately very slow, but I have a few hundred KiB of output data so far. I'm testing with nConversions=20, nConversions=100, and nConversions=1, and in a bit I'll change the code to remove the shifting and looping completely and try that.

@DeeFuse (Contributor, Author) commented Aug 27, 2021

A basic test could be to simply take the average of all random uint8_t outputs of the generator function and observe whether it converges towards 127.5.
Another option would be to count the occurrences of each random value.
The conversion loop (nConversions) has to be >= 2, otherwise the ADC trickery with the high/low pulling of the reference and the resulting random sample interval would not work.
IMHO the shift / mask can be omitted, but not the conversion loops.
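The two sanity checks suggested above can be sketched in C; function names and buffer contents are illustrative, not part of the PR:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mean of a byte sample; for uniformly distributed bytes this should
 * converge towards 127.5 as the sample grows. */
static double byte_mean(const uint8_t *buf, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += buf[i];
    }
    return sum / (double) n;
}

/* Occurrence count of each byte value (the "count the amount" test). */
static void byte_histogram(const uint8_t *buf, size_t n, size_t counts[256])
{
    for (int v = 0; v < 256; v++) {
        counts[v] = 0;
    }
    for (size_t i = 0; i < n; i++) {
        counts[buf[i]]++;
    }
}
```

Both checks only catch gross bias; a source can pass them and still have strongly correlated samples.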

@lonjil commented Aug 27, 2021

That is true. Right now I'm testing it with a global instead, so that it still pulls up and down for each call without looping internally. Also, it seems the main slowdown was actually random_rbyte, not the serial interface, so I have a very large amount of data for N=1 and for what I described in the previous sentence.

@lonjil commented Aug 27, 2021

Doing some more testing, any prescaler option other than 64 seems to produce fewer unique readings from the ADC. Also, regardless of settings, at least half of all readings are 0.

@DeeFuse (Contributor, Author) commented Aug 27, 2021

I'd say that the 0 readings are expected; every 2nd iteration the ADC input is grounded so the capacitor can recharge.
We could optimize the code to ground and recharge in the same iteration.

@lonjil commented Aug 27, 2021

I generated 17MiB with this code

uint8_t
random_rbyte()
{
    uint8_t rnd = 0;

    ADMUX = 0b01001111;  // select GND (0V)
    ADMUX = 0b01001110;  // select the 1.1V bandgap reference
    ADCSRA |= 1 << ADSC; // start a conversion

    // ADSC is cleared when the conversion finishes
    while ((ADCSRA >> ADSC) & 1) {
        // wait for the conversion to complete
    }

    rnd = ADCL;  // do not swap this sequence; ADCL has to be read first
    rnd ^= ADCH; // ADCH only holds the top 2 bits (0-3) of the 10-bit result

    return rnd;
}

and got the following byte value distribution

      2 01000101
      4 01001001
    428 01001010
    295 01001011
      8 01010100
     40 01010101
      5 01010110
    328 01010111
    239 01011000
      1 01011101
      2 01011110
     11 01011111
     58 01100000
      4 01100001
     15 01100011
   1684 01100100
   2217 01100101
      9 01100110
      7 01101001
      9 01101010
      5 01101100
     68 01101101
    317 01101110
     68 01101111
    129 01110000
   3413 01110001
   2605 01110010
     10 01110011
      4 10010011
      5 10010100
     13 10011001
      9 10011010
     11 10011011
     10 10011100
      2 10100111
      5 10101000
     87 10101001
    647 10101010
  28555 10101011
 358204 10101100
1229554 10101101
1625192 10101110
 634598 10101111
 959201 10110000
1987364 10110001
1220261 10110010
 933451 10110011
1656613 10110100
2056008 10110101
1719886 10110110
1588002 10110111
 986552 10111000
 669690 10111001
 120899 10111010
   9067 10111011
  12170 10111100
   7023 10111101
  13495 10111110
  10785 10111111
  16736 11000000
   7771 11000001
   8332 11000010
   1009 11000011
    282 11000100
    198 11000101
     64 11000110
     59 11000111
     52 11001000
     44 11001001
      7 11001010
      2 11001011

@jedisct1 (Owner)

How frequently do you get the same value twice in a row? What distribution do you get with only the 2 or 4 last bits? And with a Von Neumann extractor?
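For reference, a Von Neumann extractor over the lowest bit of each sample might look like the sketch below. It consumes bits in pairs: 01 emits 0, 10 emits 1, and 00/11 pairs are discarded, which removes bias as long as the pairs are independent. This is an illustration, not the code used in the tests that follow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Von Neumann extraction on bit 0 of each input byte. Returns the number
 * of unbiased BITS written to `out` (packed MSB-first); `out` must be at
 * least n/16 + 1 bytes. */
static size_t vn_extract_bit0(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t bits = 0;
    for (size_t i = 0; i + 1 < n; i += 2) {
        int a = in[i] & 1;
        int b = in[i + 1] & 1;
        if (a == b) {
            continue; /* 00 or 11: discard the pair */
        }
        if (a) { /* 10 -> 1 */
            out[bits / 8] |= (uint8_t) (0x80u >> (bits % 8));
        } else { /* 01 -> 0 */
            out[bits / 8] &= (uint8_t) ~(0x80u >> (bits % 8));
        }
        bits++;
    }
    return bits;
}
```

Note the throughput cost: a heavily biased source discards most pairs, so the output shrinks sharply, which is itself a useful bias indicator.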

@lonjil commented Aug 27, 2021

What kind of tools are good for this sort of analysis? I've just been stringing commands together in bash, but that's starting to feel quite inadequate. Looking at every individual pair of bytes, 158261 have the same byte twice in a row. Here's the actual file I generated https://0x0.st/-tyR.txt (it's raw binary data, not sure why 0x0 added the .txt) if you want to do your own analysis on it.

@lonjil commented Aug 27, 2021

Last 4 bits:

 976124 0000
1998552 0001
1231198 0010
 934489 0011
1658592 0100
2058465 0101
1719964 0110
1588391 0111
 986848 1000
 669845 1001
 121999 1010
  37930 1011
 370389 1100
1236646 1101
1639006 1110
 645462 1111

Last 2 bits:

3991953 00
5963508 01
4712167 10
3206272 11

@DeeFuse (Contributor, Author) commented Aug 27, 2021

I doubt that one instruction is enough to discharge the internal capacitor, i.e. to allow a sufficiently long charge time and range for a good distribution:

ADMUX = 0b01001111; // Low (0V)
ADMUX = 0b01001110; // High (1.1V)

@lonjil commented Aug 28, 2021

I changed it so that it sets it to low right at the end of the function, and sets it to high right before sampling. This seems to increase the number of unique readings.

@lonjil commented Aug 29, 2021

I generated 1.6GiB. It has 100 distinct byte values. I'm currently writing a tool in Zig to extract and analyze it.

@lonjil commented Aug 29, 2021

I ran a Von Neumann extractor on just the lowest bit of each byte, and the output was pretty even according to ent:

Entropy = 7.984414 bits per byte.

Optimum compression would reduce the size
of this 31211520 byte file by 0 percent.

Chi square distribution for 31211520 samples is 673722.13, and randomly
would exceed this value less than 0.01 percent of the times.

Arithmetic mean value of data bytes is 127.5149 (127.5 = random).
Monte Carlo value for Pi is 3.187330063 (error 1.46 percent).
Serial correlation coefficient is 0.001175 (totally uncorrelated = 0.0).

It also passes a few dieharder tests, but the size (30M) is not big enough for most of them.
Doing an extraction on every bit of every byte gave quite a crappy result. Next I'm going to test each bit position separately in the same manner.

I'm also running another Arduino sketch that generates 4096 samples and then resets, to see if there are any right-after-boot correlations between runs.

@lonjil commented Aug 29, 2021

I have now done Von Neumann extraction on each bit position.
Here are the sizes of each output file:

Bytes    MB   File             Output size / Input size
31211520  30M von_neumann_bit0 0.14922
23101440  23M von_neumann_bit1 0.11045
18657280  18M von_neumann_bit2 0.08920
18440192  18M von_neumann_bit3 0.08816
 1937408 1.9M von_neumann_bit4 0.00926
  962560 940K von_neumann_bit5 0.00460
  372736 364K von_neumann_bit6 0.00178
  323584 316K von_neumann_bit7 0.00155

According to ent [1] each of these files is high entropy. Combined with the decrease in file sizes, I interpret this as meaning that while the high bits are very biased, they are still quite decorrelated. This doesn't mean that they're independent, but I think it's a good sign. I think an iterated Von Neumann extractor might be useful to get more examinable data out of the raw data, but I'm not sure.

[1] https://paste.sr.ht/~lonjil/b6814c2139c2f55c9d37abece27aab44bc568823

@jedisct1 (Owner)

This is great!

How long does it take to extract 256 bits from the lower bit?

@lonjil commented Aug 30, 2021 via email

@lonjil commented Aug 30, 2021

As a point of reference regarding performance, this hydro_random_init [1] implementation takes ~517 milliseconds to run hydro_init() with 2048 samples, which I believe means around 131 invocations of the Gimli function. With 1024 samples it takes ~268 milliseconds; 4096 samples ends up at ~1014 milliseconds.

[1] https://paste.sr.ht/~lonjil/4a2cd787141af107fb5ac3b7eb3914b9d2034eef

@DeeFuse (Contributor, Author) commented Aug 30, 2021

I see that you removed the ADC conversion-time "randomisation".
With that in place, the randomness of the readings could be increased.

@lonjil commented Aug 30, 2021 via email

@lonjil commented Aug 31, 2021

I ran an implementation of the NIST SP 800-90B non-IID min-entropy estimator on 2 million samples, and it gave 0.315568 bits per sample.

@DeeFuse (Contributor, Author) commented Aug 31, 2021

I've tested the code with an ATmega2561 in a commercial appliance I had at hand.
The following problems emerged:

  1. The V_BG ADMUX settings are different (compared to the ATmega328)
  2. External (strong) AREF pull-ups result in identical readings on every call (with @lonjil's code as well as with the StackOverflow suggestion)

@lonjil commented Aug 31, 2021

Since different AVR micros differ so strongly on so many points, it may be necessary to have bespoke routines for every model :/

@jedisct1 (Owner)

This is sad :(

And we won't be able to test on every AVR model.

@lonjil commented Aug 31, 2021

I guess the code will have to have preprocessor directives for the target chip, and an error instructing people to open an issue for unimplemented models.

Also, I wonder if there is a published in-depth description of the ADC hardware anywhere, so that someone with the appropriate physics knowledge could figure out a lower bound on the min-entropy, rather than the upper bound we get from sample testing.

@DeeFuse (Contributor, Author) commented Sep 1, 2021

I think we won't be able to determine a definite min-entropy bound for this type of source.
After running the tests on my boards, I think this approach does not fit the library's goal of being "...hard-to-misuse...", since it is possible to accidentally "disable" the random data source with a common hardware design.
TL;DR: I think we should not rely on external circumstances / hardware designs as a source of random data.

I'm currently testing another (WDT + timer based) RNG, which would be independent of external / user-controllable circumstances. The randomness here comes from the low accuracy of the internal 128 kHz RC oscillator driving the WDT.

Entropy = 7.957624 bits per byte.

Optimum compression would reduce the size
of this 4096 byte file by 0 percent.

Chi square distribution for 4096 samples is 239.62, and randomly
would exceed this value 74.71 percent of the times.

Arithmetic mean value of data bytes is 127.8352 (127.5 = random).
Monte Carlo value for Pi is 3.131964809 (error 0.31 percent).
Serial correlation coefficient is 0.031658 (totally uncorrelated = 0.0).

@lonjil commented Sep 7, 2021

@DeeFuse would you mind posting the code for that?

@DeeFuse (Contributor, Author) commented Sep 11, 2021

@lonjil You can find the code I've used here (cleaned up).
My test hardware was an ATmega2561 with F_CPU = 14745600UL.

@jedisct1 (Owner)

Old PR and discussion, but still relevant :)

What should we do?

@DeeFuse (Contributor, Author) commented Apr 1, 2022

I could clean up my gist with the updated code and push it to my PR.
I'm no security / crypto expert. How do you feel about the implementation of randomness used in my code?

@jedisct1 (Owner) commented Apr 1, 2022

The implementation looks fine, but I don't have the hardware to test it on.

@DeeFuse (Contributor, Author) commented Apr 1, 2022

If you want to have a look into the randomness of this implementation, I can send you a bin file for analysis.

@justind000 (Contributor)

The implementation looks fine, but I don't have the hardware to test it on.

I have a lot of different boards available and can help out if needed.

@DeeFuse (Contributor, Author) commented Apr 6, 2022

I created a 10k number sequence as well as 10 shorter dumps taken right after booting the MCU: random.zip
I've done some reading about the AVR hardware differences that need to be accounted for to make it foolproof to use.
After implementing some #ifdefs to catch those hardware differences, I'll feel comfortable pushing / merging the code into libhydrogen.
@justind000 I've updated my gist with the latest code. Feel free to run it on your AVRs. The random numbers are printed to USART1 @ 115200 baud.

@DeeFuse (Contributor, Author) commented Apr 6, 2022

Shall we back up the states of the used peripherals (WDT / TIM1) and restore them after the random number generation?

@jedisct1 (Owner) commented Apr 6, 2022

Shall we back up the states of the used peripherals (WDT / TIM1) and restore them after the random number generation?

That sounds like a good idea. Applications may not expect them to be changed.

@DeeFuse (Contributor, Author) commented Apr 7, 2022

I have a question about coding style:
I can calculate the required timer parameters at compile time or at runtime.
The main difference is that the generated code is identical when compiled with -O1 or higher, but at -O0 there is a huge difference.
The drawback of the preprocessor calculation is readability.
Which solution shall I include? test1 - preprocessor / test2 - runtime (or optimized): https://godbolt.org/z/z37E6ox6s
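A runtime variant in the spirit of test2 might look like the sketch below: pick the smallest Timer1 prescaler whose tick count for the requested period still fits the 16-bit compare register. The function and struct names are illustrative, not the actual PR or godbolt code.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative runtime calculation of AVR Timer1 CTC parameters. */
typedef struct {
    uint16_t prescaler; /* selected clock divider */
    uint16_t ocr;       /* compare value for OCR1A */
} timer1_cfg;

/* Returns 0 on success, -1 if the period is not representable. */
static int timer1_params(uint32_t f_cpu, uint32_t period_us, timer1_cfg *cfg)
{
    static const uint16_t prescalers[] = {1, 8, 64, 256, 1024};
    for (int i = 0; i < 5; i++) {
        /* ticks = f_cpu * period / 1e6 / prescaler (64-bit to avoid overflow) */
        uint64_t ticks =
            (uint64_t) f_cpu * period_us / 1000000u / prescalers[i];
        if (ticks >= 1 && ticks <= 65536) {
            cfg->prescaler = prescalers[i];
            cfg->ocr       = (uint16_t) (ticks - 1); /* CTC: OCR = ticks - 1 */
            return 0;
        }
    }
    return -1;
}
```

With -O1 and constant arguments, a compiler can fold this whole loop to the same constants the preprocessor version would produce, which matches the observation that the two only differ at -O0.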

@jedisct1 (Owner) commented Apr 7, 2022

Runtime.

@DeeFuse (Contributor, Author) commented Apr 8, 2022

I'm thinking about disabling the hydro_random_reseed function when building for AVR unless the user explicitly enables it.
If the user calls this function while running their own code, it could cause a lot of problems due to the reconfiguration of TIM1 and the ~4-second execution time.

Successfully merging this pull request may close these issues.

Init RNG seeding seemingly very predictable on Arduino Uno
4 participants