Maintaining high capture rates for long captures #344

Open
vogelpi opened this issue Jun 16, 2021 · 4 comments
@vogelpi
Contributor

vogelpi commented Jun 16, 2021

Hi,
Using the CW-Lite and CW-Husky capture boards in batch mode, we get pretty good capture rates of several hundred AES power traces per second. However, when capturing millions of traces I noticed that the capture rate starts to drop significantly over time.

I took a close look at the ChipWhisperer API and identified a couple of things that seem to be non-ideal. For some of them, I could successfully create workarounds using the functionality the API already provides. For others, I have not found a solution yet. Either way, I would be curious to get your opinion. I imagine you have already faced some of these problems and that there may be better ways to prevent them than what I did.

  1. Frequent array resizing of trace segments. New trace storage segments start with a size of 1 trace (traces.cur_seg.tracehint) and are then grown by 25 traces at a time on demand. This results in frequent array resizing. I don't know how bad this really is for performance, but it should be possible to start new trace segments with the configured number of traces per segment, i.e. traces.seg_len, and avoid the resizing completely. In our capture setup I was able to avoid this problem by checking tracehint and setting it to seg_len directly using setTraceHint(). This improves performance.
    For more details see addWave() in chipwhisperer/common/traces/_base.py.
  2. After appending a trace to the storage segment, the _updateRanges() function inside chipwhisperer/common/api/TraceManager.py is called, which loops over all previously captured traces. As a result, the cost of the append() function increases linearly with the number of captured traces. This is a major performance bottleneck for long captures. What I did to avoid it is to keep only the latest two trace segments enabled using setTraceSegmentStatus(). _updateRanges() then only compares the ranges of the traces in the newest two segments. Using this trick, I can maintain the same capture rate for nearly 10 million traces.
  3. All previously captured traces are kept in memory at all times. Keeping everything in memory is bad for two reasons: 1) at some point previous segments are moved to swap, which is bad for performance, 2) if the machine really runs out of memory, nothing is saved to disk at all (this happened to me just now when capturing 10 million traces on a machine with 32 GiB of memory + 32 GiB of NVMe swap). The API already splits traces into storage segments (by default 10000 traces per segment), but I couldn't figure out how to store previous segments to disk and remove them from memory afterwards. My understanding is that removing traces from memory also removes them from the project. It would be great if I could get some guidance here.
  4. Related to 3: One important reason for the high memory consumption is that by default traces are stored as double-precision floating-point numbers. Especially when using CW-Lite/CW-Husky, 16-bit integers would actually suffice. However, it seems that the conversion is already happening as part of OpenADC.py. Is there a way to change the data format used for storage inside the ChipWhisperer API?
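
To illustrate the cost difference described in point 2, here is a minimal sketch in plain Python. It is not ChipWhisperer's _updateRanges(); the class RunningRange and the helper rescan_range are hypothetical names, and the sketch assumes, purely for illustration, that the summary being maintained is a min/max over the captured samples. The point is only that a running summary can be updated from the new trace alone, while a rescan touches every stored trace on each append.

```python
class RunningRange:
    """Hypothetical helper: keeps a running min/max over appended traces.

    Each append only inspects the new trace, so its cost does not grow
    with the number of traces captured so far.
    """

    def __init__(self):
        self.lo = None
        self.hi = None
        self.count = 0

    def append(self, trace):
        t_lo, t_hi = min(trace), max(trace)
        if self.count == 0:
            self.lo, self.hi = t_lo, t_hi
        else:
            self.lo = min(self.lo, t_lo)
            self.hi = max(self.hi, t_hi)
        self.count += 1


def rescan_range(traces):
    """Reference version: recomputes the range from all stored traces.

    Calling this after every append makes the per-append cost grow
    linearly with the number of traces, as described in point 2.
    """
    return min(min(t) for t in traces), max(max(t) for t in traces)
```

Both approaches give the same result; only the per-append work differs.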
@colinoflynn
Contributor

I see you've already got some changes merged in - we were running slower this past week here.

As a side-note - the "project format" in ChipWhisperer is one of those very "funky" things, I don't know if it's truly worth fixing as-is or just needs a total overhaul. Internally we ended up often just writing traces to arrays using Zarr or other tools, rather than trying to reinvent the storage part.
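
The idea of writing traces straight to on-disk arrays, outside the project format, can be sketched without any external library. The following is a stand-in that uses only Python's standard array module, not Zarr and not anything from ChipWhisperer; SegmentWriter, read_segment, the file naming and the segment length are all hypothetical. It writes each full segment to its own file and then drops it from memory, so memory use stays bounded by one segment.

```python
import os
import tempfile
from array import array


class SegmentWriter:
    """Hypothetical helper: buffers traces and flushes full segments to disk."""

    def __init__(self, directory, seg_len, typecode="H"):
        self.directory = directory
        self.seg_len = seg_len      # traces per segment (assumed value)
        self.typecode = typecode    # "H" = unsigned 16-bit in the array module
        self.buffer = []
        self.paths = []

    def append(self, trace):
        self.buffer.append(array(self.typecode, trace))
        if len(self.buffer) == self.seg_len:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        name = "segment_%05d.bin" % len(self.paths)
        path = os.path.join(self.directory, name)
        with open(path, "wb") as f:
            for trace in self.buffer:
                trace.tofile(f)     # raw samples in native byte order
        self.paths.append(path)
        self.buffer = []            # release the segment from memory


def read_segment(path, samples_per_trace, typecode="H"):
    """Reads one segment file back as a list of traces."""
    data = array(typecode)
    n_items = os.path.getsize(path) // data.itemsize
    with open(path, "rb") as f:
        data.fromfile(f, n_items)
    return [data[i:i + samples_per_trace].tolist()
            for i in range(0, n_items, samples_per_trace)]


def demo():
    """Writes three short traces in segments of two and reads them back."""
    with tempfile.TemporaryDirectory() as tmp:
        writer = SegmentWriter(tmp, seg_len=2)
        for trace in ([1, 2, 3], [4, 5, 6], [7, 8, 9]):
            writer.append(trace)
        writer.flush()              # write the final, partially filled segment
        return [read_segment(p, 3) for p in writer.paths]
```

A real setup would also need to record metadata such as the sample count per trace and the associated plaintexts and keys, which this sketch leaves out.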

Depending on how analysis is done this might make more sense - the ChipWhisperer system originally started more heavily tuned towards training & education. For higher performance CPA and similar attacks we are normally using external libraries (LASCAR or SCARED right now), which don't directly work with CW format.

Depending what your end goal is we can look at what makes most sense there (fixing CW vs. doing another format).

"16-bit integers would actually suffice"

I think this is a planned change for Husky already - we could do this (or offer it as an option) w/ CW-Lite. The backstory there is mostly that "from the beginning" we translated to floating point because that is what people were used to seeing (that is the serious answer).

Academia doesn't like change & I found people liked seeing their plots with "smallish" numbers for power, so I scaled everything. It also seemed to be easier to translate into MATLAB (which is what a lot of people were using before, and which already had the algorithms); the raw unsigned int values sometimes seemed to make certain algorithms explode. So basically this is like the classic "width of a horse's ass defined the road width" situation: there was no reason to keep it going.

@alex-dewar
Contributor

On the latest develop, you can now pass as_int=True to scope.get_last_trace() to get an integer representation of the trace.

@vogelpi
Contributor Author

vogelpi commented Aug 17, 2021

Sorry @colinoflynn, I completely forgot about this issue. I fully understand your points, and as always it's very hard to meet everyone's needs. Anyway, the existing project format and all the infrastructure around it were completely fine for us to get started. I guess we are now at the point where we need to think a little and define the best way forward and the best trace format for us.

Thanks @alex-dewar for pointing out the addition of this new argument. I've tried it out but ran into some other issues. I will open a new issue for this.

vogelpi added a commit to vogelpi/chipwhisperer that referenced this issue May 10, 2022
Both the OpenADC scope and the Trace class can already output and store
integers instead of doubles for the waves, respectively. But the
corresponding arguments were not exposed to the API previously. This
commit exposes these arguments to the API to allow users to capture e.g.
waves as uint16 instead of doubles. This reduces the memory
and storage requirements of long captures by roughly 4x.

This is related to newaetech#344.
@vogelpi
Contributor Author

vogelpi commented May 10, 2022

After a long time, I've finally been able to successfully test the as_int=True argument proposed by @alex-dewar. To actually reduce the memory footprint, one also has to change the data type in the trace.append() function. I've filed PR #401 to enable this.
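
For a rough sense of the numbers, the storage arithmetic behind the "roughly 4x" figure can be sketched with Python's standard array module. The helper trace_bytes is hypothetical, and the 1000 samples per trace used below is an assumed value chosen for illustration, not a figure from our setup. The typecodes "d" and "H" stand for double-precision floats and unsigned 16-bit integers.

```python
from array import array


def trace_bytes(n_traces, samples_per_trace, typecode):
    """Raw sample storage in bytes, ignoring any container overhead."""
    return n_traces * samples_per_trace * array(typecode).itemsize


N_TRACES = 10_000_000   # capture size mentioned earlier in this thread
SAMPLES = 1000          # assumed samples per trace, for illustration only

as_double = trace_bytes(N_TRACES, SAMPLES, "d")   # 8 bytes per sample
as_uint16 = trace_bytes(N_TRACES, SAMPLES, "H")   # 2 bytes per sample

print("double: %.1f GB" % (as_double / 1e9))
print("uint16: %.1f GB" % (as_uint16 / 1e9))
print("ratio:  %.0fx" % (as_double / as_uint16))
```

Under these assumptions the samples alone take 80 GB as doubles versus 20 GB as uint16, which is why the data type used in trace.append() matters as much as the one returned by the scope.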
