Do not use `ZYAN_ASSERT` for user passed arguments #297

flobernd · 2021-12-21T11:02:06Z

No description provided.

flobernd · 2021-12-21T15:08:50Z

We probably should add a new fuzzer project to test the ZydisDecoderDecodeOperands() function with randomly crafted inputs (context and instruction).

flobernd · 2021-12-21T15:41:07Z

include/Zydis/DecoderTypes.h

     */
-    const void* definition;
+    ZyanU16 definition_id;


This change allows us to actually fuzz the code (open todo) and as well guarantees we never read invalid memory, even if the user passes nonsense data.

And all while requiring less space in the struct! Nice.

athre0z · 2022-01-03T06:50:11Z

While I see why this is necessary under our current API guarantees, I really can't say that I like it too much. It adds a pretty significant amount of branches to our already way too branchy code. I ran a benchmark and the performance regression is pretty noticeable. :/

Performance impact

Full decode + Formatting

Clang

Before

Benchmark 1: ./bench-zydis-full-fmt 0x14 0x400 0x2460400 /home/ath/devel/disas-bench/input/xul.dll
  Time (mean ± σ):     13.963 s ±  0.090 s    [User: 13.941 s, System: 0.007 s]
  Range (min … max):   13.851 s … 14.122 s    10 runs

After

Benchmark 1: ./bench-zydis-full-fmt 0x14 0x400 0x2460400 /home/ath/devel/disas-bench/input/xul.dll
  Time (mean ± σ):     15.572 s ±  0.034 s    [User: 15.549 s, System: 0.007 s]
  Range (min … max):   15.528 s … 15.614 s    10 runs

Regression

~12%

GCC

Before

Benchmark 1: ./bench-zydis-full-fmt 0x14 0x400 0x2460400 /home/ath/devel/disas-bench/input/xul.dll
  Time (mean ± σ):     14.245 s ±  0.086 s    [User: 14.221 s, System: 0.007 s]
  Range (min … max):   14.094 s … 14.362 s    10 runs

After

Benchmark 1: ./bench-zydis-full-fmt 0x14 0x400 0x2460400 /home/ath/devel/disas-bench/input/xul.dll
  Time (mean ± σ):     15.503 s ±  0.089 s    [User: 15.478 s, System: 0.007 s]
  Range (min … max):   15.301 s … 15.602 s    10 runs

Regression

~9%

athre0z · 2021-12-22T11:45:43Z

include/Zydis/DecoderTypes.h

     */
-    const void* definition;
+    ZyanU16 definition_id;


And all while requiring less space in the struct! Nice.

athre0z · 2022-01-03T06:11:05Z

src/Decoder.c

-                ZYAN_ASSERT(definition->size[context->eosz_index] ||
-                    (instruction->meta.category == ZYDIS_CATEGORY_AMX_TILE));
+                // ZYAN_ASSERT(definition->size[context->eosz_index] || (instruction->meta.category == ZYDIS_CATEGORY_AMX_TILE));


Is this intentionally commented out? I feel like commented out asserts will confuse us in the future. I'd either reinstate it as a return condition or get rid of it entirely.

It's still a valid assertion, but as instruction is now passed by the user, it might contain invalid data, so we can't rely on this one anymore. On the other hand I don't want to remove it entirely, because it caused an issue in the past. Might convert it to a plain comment.

flobernd · 2022-01-03T08:16:45Z

@athre0z

I ran a benchmark and the performance regression is pretty noticeable. :/

That's not nice. Have you been able to profile it? Probably the new branches in ZydisGetInstructionDefinition might be the issue. I will check on Windows as well.

While I see why this is necessary under our current API guarantees...

This was actually my initial reason to do everything in one single pass when we started. Sadly, each approach has some drawbacks. Let's try to find out what's causing the regression and optimize it as good as we can.

athre0z · 2022-01-03T08:40:14Z

Here's a perf diff. Note that -4% doesn't mean that ZydisDecoderDecodeInstruction got faster -- it's percentages that are being compared. In other words: ZydisDecoderDecodeOperands etc got heavier, so the share of cycles spent in ZydisDecoderDecodeInstruction decreased in comparison.

$ perf diff build-master/perf.data build-flo/perf.data

# Event 'cpu_core/cycles:u/'
#
# Baseline  Delta Abs  Shared Object         Symbol
# ........  .........  ....................  .....................................
#
    52.25%     -4.17%  bench-zydis-full-fmt  [.] ZydisDecoderDecodeInstruction
     1.90%     +2.76%  bench-zydis-full-fmt  [.] ZydisDecoderDecodeOperands
     1.15%     +1.25%  bench-zydis-full-fmt  [.] ZydisGetInstructionDefinition
     4.17%     +1.13%  libc-2.33.so          [.] __memset_avx2_unaligned_erms
    21.56%     -0.56%  bench-zydis-full-fmt  [.] ZydisDecodeOperands
     0.96%     +0.37%  bench-zydis-full-fmt  [.] ZydisDecodeOperandMemory
     1.77%     -0.27%  bench-zydis-full-fmt  [.] ZydisRegisterEncode
     4.27%     -0.19%  bench-zydis-full-fmt  [.] main
     0.84%     -0.11%  bench-zydis-full-fmt  [.] ZydisGetAccessedFlags
     0.65%     -0.10%  bench-zydis-full-fmt  [.] ZydisReadDisplacement
     0.60%     -0.09%  bench-zydis-full-fmt  [.] ZydisGetElementInfo
               +0.09%  bench-zydis-full-fmt  [.] ZydisCalcRegisterId
     0.68%     -0.08%  bench-zydis-full-fmt  [.] ZydisGetInstructionEncodingInfo
     0.43%     +0.07%  bench-zydis-full-fmt  [.] memset@plt
     2.05%     -0.05%  bench-zydis-full-fmt  [.] ZydisRegisterGetWidth
     0.89%     -0.04%  bench-zydis-full-fmt  [.] ZydisFormatterFormatInstruction
     1.69%     -0.02%  bench-zydis-full-fmt  [.] ZydisFormatterFormatInstructionEx
     2.76%     -0.01%  bench-zydis-full-fmt  [.] ZydisDecoderTreeGetChildNode
     0.76%     -0.01%  bench-zydis-full-fmt  [.] ZydisReadImmediate
     0.61%     -0.00%  bench-zydis-full-fmt  [.] ZydisGetOperandDefinitions
               +0.00%  ld-2.33.so            [.] update_usable.constprop.0
     0.00%     -0.00%  ld-2.33.so            [.] _dl_start
     0.00%     -0.00%  ld-2.33.so            [.] _start
     0.00%             ld-2.33.so            [.] init_cpu_features.constprop.0

flobernd · 2022-07-02T21:24:20Z

@athre0z How do we proceed with this PR?

athre0z · 2022-07-03T12:59:03Z

Well, I'm not really excited about a 10% perf regression for everyone for what is essentially "make debugging easier". The single commit in this PR does a bunch of different things, some of which are entirely uncontroversial and will likely improve rather then regress performance, e.g. using lookup tables instead of chains of ternary expressions. We should probably start by splitting this one commit into multiple smaller ones and then do partial merges.

As for the argument validation, we'll have to make a decision as to whether we simply define this as undefined behavior and keep performance intact or burn a ton of CPU on validation.

flobernd · 2022-07-03T13:41:07Z

Just go with undefined behavior is a bad idea, because it will often lead to invalid memory access or other critical side effects. We don't see these effects on master yet, because our fuzzer does not test this. Imho we have to continue using sanity checks for public APIs.

There must be a way to mitigate the performance issues.

flobernd force-pushed the improve-user-input-validation branch 4 times, most recently from e5fa624 to 615a8f5 Compare December 21, 2021 14:28

flobernd requested a review from athre0z December 21, 2021 15:06

flobernd marked this pull request as ready for review December 21, 2021 15:08

flobernd commented Dec 21, 2021

View reviewed changes

Do not use ZYAN_ASSERT for user passed arguments

906f0f1

flobernd force-pushed the improve-user-input-validation branch from f05434e to 906f0f1 Compare December 22, 2021 09:11

athre0z reviewed Jan 3, 2022

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Do not use `ZYAN_ASSERT` for user passed arguments #297

Do not use `ZYAN_ASSERT` for user passed arguments #297

flobernd commented Dec 21, 2021

flobernd commented Dec 21, 2021

flobernd Dec 21, 2021

athre0z Dec 22, 2021

athre0z commented Jan 3, 2022 •

edited

athre0z Dec 22, 2021

athre0z Jan 3, 2022

flobernd Jan 3, 2022

flobernd commented Jan 3, 2022

athre0z commented Jan 3, 2022 •

edited

flobernd commented Jul 2, 2022

athre0z commented Jul 3, 2022 •

edited

flobernd commented Jul 3, 2022

Do not use ZYAN_ASSERT for user passed arguments #297

Are you sure you want to change the base?

Do not use ZYAN_ASSERT for user passed arguments #297

Conversation

flobernd commented Dec 21, 2021

flobernd commented Dec 21, 2021

flobernd Dec 21, 2021

Choose a reason for hiding this comment

athre0z Dec 22, 2021

Choose a reason for hiding this comment

athre0z commented Jan 3, 2022 • edited

Performance impact

Full decode + Formatting

Clang

Before

After

Regression

GCC

Before

After

Regression

athre0z Dec 22, 2021

Choose a reason for hiding this comment

athre0z Jan 3, 2022

Choose a reason for hiding this comment

flobernd Jan 3, 2022

Choose a reason for hiding this comment

flobernd commented Jan 3, 2022

athre0z commented Jan 3, 2022 • edited

flobernd commented Jul 2, 2022

athre0z commented Jul 3, 2022 • edited

flobernd commented Jul 3, 2022

Do not use `ZYAN_ASSERT` for user passed arguments #297

Do not use `ZYAN_ASSERT` for user passed arguments #297

athre0z commented Jan 3, 2022 •

edited

athre0z commented Jan 3, 2022 •

edited

athre0z commented Jul 3, 2022 •

edited