Maintaining high capture rates for long captures #344

vogelpi · 2021-06-16T13:02:18Z

Hi,
using the CW-Lite and CW-Husky capture boards in batch mode, we get pretty good capture rates of several hundred AES power traces per second. However, when capturing millions of traces, I noticed that the capture rate starts to drop significantly over time.

I had a close look at the ChipWhisperer API and identified a couple of things that seem to be non-ideal. For some of them, I could create workarounds using the functionality the API already provides. For others, I have not yet found a solution. Anyway, I would be curious to hear your opinion. I imagine you have already faced some of these problems and that there may be better ways to prevent them than what I did.

1. Frequent array resizing of trace segments. New trace storage segments start with a size of 1 trace (traces.cur_seg.tracehint) and are then grown by 25 traces on demand, which results in frequent array resizing. I don't know how bad this really is for performance, but it should be possible to start new trace segments with the configured number of traces per segment (traces.seg_len) instead and avoid the resizing completely. In our capture setup, I was able to avoid this problem by checking tracehint and setting it to seg_len directly using setTraceHint(). This improves performance.
   For more details, see addWave() in chipwhisperer/common/traces/_base.py.
2. After a trace is appended to the storage segment, the _updateRanges() function in chipwhisperer/common/api/TraceManager.py is called, which loops over all previously captured traces. As a result, the cost of each append() call grows linearly with the number of captured traces (so the total cost of a capture grows quadratically). This is a major performance bottleneck for long captures. To avoid it, I keep only the latest two trace segments enabled using setTraceSegmentStatus(), so _updateRanges() only compares the ranges of the traces in the newest two segments. Using this trick, I can maintain the same capture rate for nearly 10 Mio traces.
3. All previously captured traces are kept in memory at all times. This is bad for two reasons: (a) at some point, previous segments are moved to swap, which hurts performance, and (b) if the machine actually runs out of memory, nothing is saved to disk at all (this just happened to me when capturing 10 Mio traces on a machine with 32 GiB of memory + 32 GiB of NVMe swap). The API already splits traces into storage segments (by default 10000 traces per segment), but I couldn't figure out how to store previous segments to disk and remove them from memory afterwards. My understanding is that removing traces from memory also removes them from the project. It would be great if I could get some guidance here.
4. Related to 3: one important reason for the high memory consumption is that, by default, traces are stored as double-precision floating-point numbers. Especially when using CW-Lite/CW-Husky, 16-bit integers would actually suffice. However, it seems that the conversion already happens in OpenADC.py. Is there a way to change the data format used for storage inside the ChipWhisperer API?
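The ideas behind workarounds 1 and 2 can be illustrated outside ChipWhisperer with a minimal, hypothetical sketch using only the Python standard library. PreallocatedSegment and SegmentedTraces are made-up names, not part of the ChipWhisperer API, and the sketch does not reproduce what _updateRanges() actually computes. It only shows the two principles: allocate a whole segment of seg_len traces up front instead of growing it 25 traces at a time, and keep bookkeeping (here just the total trace count) as a running value updated in O(1) per append instead of rescanning every earlier trace. Samples are assumed to be unsigned 16-bit values.

```python
from array import array


class PreallocatedSegment:
    """Hypothetical fixed-capacity trace segment (not the ChipWhisperer API)."""

    def __init__(self, seg_len, num_samples, typecode="H"):
        self.seg_len = seg_len
        self.num_samples = num_samples
        # Room for all seg_len traces is allocated once, up front.
        self.buf = array(typecode, [0]) * (seg_len * num_samples)
        self.count = 0

    def is_full(self):
        return self.count == self.seg_len

    def append(self, wave):
        if self.is_full():
            raise IndexError("segment is full")
        if len(wave) != self.num_samples:
            raise ValueError("unexpected number of samples")
        start = self.count * self.num_samples
        # Equal-length slice assignment overwrites in place; the buffer never grows.
        self.buf[start:start + self.num_samples] = array(self.buf.typecode, wave)
        self.count += 1

    def trace(self, i):
        if not 0 <= i < self.count:
            raise IndexError("trace index out of range")
        start = i * self.num_samples
        return self.buf[start:start + self.num_samples]


class SegmentedTraces:
    """Hypothetical container whose bookkeeping costs O(1) per append."""

    def __init__(self, seg_len, num_samples):
        self.seg_len = seg_len
        self.num_samples = num_samples
        self.segments = []
        # Running total, updated incrementally instead of rescanning old traces.
        self.num_traces = 0

    def append(self, wave):
        # Open a new, fully preallocated segment only when the current one is full.
        if not self.segments or self.segments[-1].is_full():
            self.segments.append(PreallocatedSegment(self.seg_len, self.num_samples))
        self.segments[-1].append(wave)
        self.num_traces += 1

    def trace(self, i):
        if not 0 <= i < self.num_traces:
            raise IndexError("trace index out of range")
        # Every segment except the last is full, so the index maps directly.
        seg, offset = divmod(i, self.seg_len)
        return self.segments[seg].trace(offset)
```

The per-append work here is constant regardless of how many traces were captured before, which is the property the segment-disabling trick recovers inside the real API.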


colinoflynn · 2021-06-21T15:24:13Z

I see you've already merged in changes on your side; we were running slower here this past week.

As a side note, the "project format" in ChipWhisperer is one of those very "funky" things; I don't know if it's truly worth fixing as-is or if it just needs a total overhaul. Internally, we often ended up just writing traces to arrays using Zarr or other tools rather than trying to reinvent the storage part.

Depending on how the analysis is done, this might make more sense: the ChipWhisperer system was originally tuned much more towards training and education. For higher-performance CPA and similar attacks, we normally use external libraries (LASCAR or SCARED right now), which don't work directly with the CW format.

Depending on what your end goal is, we can look at what makes the most sense there (fixing CW vs. using another format).
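The approach mentioned above, writing traces to plain arrays outside the project format, also addresses the question of getting old segments out of memory. Below is a minimal sketch under stated assumptions; it is a stdlib-only stand-in, not Zarr. SegmentFileWriter, load_segment, and the seg_NNNNNN.bin file naming are hypothetical. Each full segment of uint16 samples is written to its own raw binary file and then dropped from memory, so RAM use stays bounded by roughly one segment however long the capture runs, and completed segments survive if the process later dies. The files use the machine's native byte order and carry no metadata (trace length, plaintexts, keys), which a real chunked format such as Zarr would handle for you.

```python
import os
import tempfile
from array import array


class SegmentFileWriter:
    """Hypothetical writer that keeps at most one segment of traces in RAM."""

    def __init__(self, directory, seg_len, typecode="H"):
        self.directory = directory
        self.seg_len = seg_len
        self.typecode = typecode
        self.pending = array(typecode)  # samples of the segment not yet on disk
        self.pending_traces = 0
        self.files = []                 # paths of segments already written

    def append(self, wave):
        self.pending.extend(wave)
        self.pending_traces += 1
        if self.pending_traces == self.seg_len:
            self.flush()

    def flush(self):
        """Write the pending (possibly partial) segment to disk and free it."""
        if self.pending_traces == 0:
            return
        path = os.path.join(self.directory, f"seg_{len(self.files):06d}.bin")
        with open(path, "wb") as f:
            self.pending.tofile(f)
        self.files.append(path)
        # Drop the reference so the written samples can be garbage collected.
        self.pending = array(self.typecode)
        self.pending_traces = 0


def load_segment(path, typecode="H"):
    """Read one segment file back as a flat array of samples."""
    data = array(typecode)
    with open(path, "rb") as f:
        data.frombytes(f.read())
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        writer = SegmentFileWriter(d, seg_len=10_000)
        for _ in range(25_000):
            writer.append([512] * 100)  # stand-in for a captured uint16 wave
        writer.flush()                  # write the final, partial segment
        print(len(writer.files))        # 3
```

Analysis code can then load one segment at a time with load_segment() instead of holding the whole capture in memory.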

> 16-bit integers would actually suffice

I think this is already a planned change for Husky; we could do this (or offer it as an option) with the CW-Lite too. The backstory there is mostly that, from the beginning, we translated to floating point because that is what people were used to seeing (that is the serious answer).

Academia doesn't like change, and I found people liked seeing their plots with "smallish" numbers for power, so I scaled everything. It also seemed easier to carry over into MATLAB (which a lot of people were already using and which already had the algorithms implemented); the raw unsigned int values sometimes seemed to blow up certain algorithms. So this is basically the classic "the width of a horse's ass defined the road width" situation; there was no real reason to keep it going.

alex-dewar · 2021-08-16T21:56:37Z

On the latest develop branch, you can now pass as_int=True to scope.get_last_trace() to get an integer representation of the trace.

vogelpi · 2021-08-17T12:29:45Z

Sorry @colinoflynn, I completely forgot about this issue. I fully understand your points, and as always it's very hard to fit everyone's needs. Anyway, the existing project format and all the infrastructure around it were completely fine for us to get started. I guess we are now at the point where we need to think a bit and define the best way forward (and the best trace format) for us.

Thanks @alex-dewar for pointing out the addition of this new argument. I've tried it out but ran into some other issues. I will open a new issue for this.

The OpenADC scope and the Trace class can already output and store, respectively, integers instead of doubles for the waves. However, the corresponding arguments were previously not exposed to the API. This commit exposes them so that users can, e.g., capture waves as uint16 instead of doubles, which reduces the memory and storage requirements of long captures by roughly 4x. This is related to newaetech#344.
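The roughly 4x figure follows from the sample sizes: a double takes 8 bytes and a uint16 takes 2 (it is only "roughly" 4x overall because per-trace metadata such as plaintexts and keys does not shrink). The hypothetical stdlib-only sketch below checks that ratio and shows how stored integer codes could be mapped back to floats at analysis time. The scaling code / 2**bits - 0.5, which maps codes onto [-0.5, 0.5), is an assumption about what the float representation looks like and is not copied from OpenADC.py; bits=10 assumes a 10-bit ADC such as the CW-Lite's, and Husky's higher-resolution ADC would need a different bits value. SAMPLES_PER_TRACE is an arbitrary example value.

```python
from array import array

SAMPLES_PER_TRACE = 5_000     # hypothetical trace length
NUM_TRACES = 10_000_000       # "10 Mio" traces, as in the original report

BYTES_DOUBLE = array("d").itemsize  # 8 bytes per sample on common platforms
BYTES_UINT16 = array("H").itemsize  # 2 bytes per sample on common platforms


def capture_gib(bytes_per_sample):
    """Raw sample storage for the whole capture, in GiB (no metadata)."""
    return SAMPLES_PER_TRACE * NUM_TRACES * bytes_per_sample / 2**30


def codes_to_float(codes, bits=10):
    """Map raw ADC codes onto [-0.5, 0.5) (assumed scaling, see lead-in)."""
    full_scale = 1 << bits
    return array("d", (c / full_scale - 0.5 for c in codes))


def float_to_codes(values, bits=10):
    """Inverse of codes_to_float, rounding to the nearest code."""
    full_scale = 1 << bits
    return array("H", (round((v + 0.5) * full_scale) for v in values))


if __name__ == "__main__":
    print(f"double: {capture_gib(BYTES_DOUBLE):.1f} GiB")
    print(f"uint16: {capture_gib(BYTES_UINT16):.1f} GiB")
    print(BYTES_DOUBLE // BYTES_UINT16)  # 4
```

Storing the integer codes and converting only the slices being analyzed keeps the on-disk and in-memory footprint at 2 bytes per sample.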

vogelpi · 2022-05-10T11:04:40Z

After a long time, I've finally been able to test the as_int=True argument proposed by @alex-dewar. To actually reduce the memory footprint, one also has to change the data type used in the trace.append() function. I've filed PR #401 to enable this.

vogelpi mentioned this issue Jun 16, 2021

Maintain high capture rate for long batch captures lowRISC/ot-sca#46

Merged

colinoflynn assigned colinoflynn and alex-dewar Jun 21, 2021

vogelpi mentioned this issue May 10, 2022

Expose data type arguments for capturing non-float waves #401

Merged
