
Ignore page_collection_lock #1838

Open · wants to merge 2 commits into base: dev

Conversation

@tunz tunz commented May 24, 2023

Unicorn already ignores page locks, so we also don't need to manage the page collection and its tree objects. This avoids redundant object allocations and makes execution about 3x faster in my case.

@@ -677,9 +676,7 @@ page_collection_lock(struct uc_struct *uc, tb_page_addr_t start, tb_page_addr_t
continue;
}
if (page_trylock_add(uc, set, index << TARGET_PAGE_BITS)) {
Member
But you also removed these function calls, which obviously have side effects.

Author

@tunz tunz May 25, 2023

I think page_trylock_add does nothing other than allocate memory, since it ignores locks. Or did I miss anything?

Author

Another side effect is the page_entry_destroy call when destroying the objects, but it's the same story: page_lock is a no-op in Unicorn, so I think it's a meaningless call.

Member

page_collection_lock is used by tb_invalidate_phys_range, which is not a no-op in my experience.

Member

page_trylock_add also inserts new pages, no?

Author

Hmm, interesting. I'm running it without page_collection_lock/unlock and have seen no problems yet. tb_invalidate_phys_range uses the page collection only for the page_collection_unlock call.

page_trylock_add does not insert new pages. It creates a page entry and inserts it into the tree, but that tree is not used anywhere else. I guess it exists only to manage lock ordering.

@wtdcode
Member

wtdcode commented May 25, 2023

For the speedup, I suggest sharing the backtrace and pprof reports if possible.

@tunz
Author

tunz commented May 26, 2023

Here is a snippet of profiling. Profiled by WPR and tested on Windows.

Dev branch

- helper_le_stq_mmu_x86_64 (68.38%)
  - store_helper (67.94%)
    - notdirty_write (33.63%)
      - page_collection_lock_x86_64 (30.94%)
        - _malloc_base (15.25%)
        - g_tree_new_full (13.45%)
        - page_find_alloc (1.35%)
        - ...
      - tb_invalidate_phys_page_fast_x86_64 (1.57%)
      - ...
    - page_collection_unlock_x86_64 (15.25%)
    - _free_base (9.64%)
    - helper_ret_stb_mmu_x86_64 (3.81%)
    - find_memory_region (1.35%)
    - ...
  - ...
- helper_ret_stb_mmu_x86_64 (7.17%)
  - store_helper (7.17%)
  - ...
- helper_le_stl_mmu_x86_64 (4.04%)
- helper_lookup_tb_ptr_x86_64 (2.47%)
- helper_le_ldq_mmu_x86_64 (1.79%)
- ...

With this fix:

- helper_le_stq_mmu_x86_64 (32.70%)
  - store_helper (31.35%)
    - store_helper <itself> (15.13%)
    - notdirty_write (10.81%)
      - tb_invalidate_phys_page_fast_x86_64 (7.57%)
      - ...
    - ...
- helper_lookup_tb_ptr_x86_64 (7.30%)
- helper_le_ldq_mmu_x86_64 (5.14%)
- helper_uc_tracecode (3.78%)
- helper_ret_stb_mmu_x86_64 (3.24%)
  - store_helper (2.97%)
  - ...
- ...

Percentages are the proportion of CPU usage. Execution time is reduced from 370s to 116s (about 3.2x faster).

@wtdcode
Member

wtdcode commented May 26, 2023

Here is a snippet of profiling. Profiled by WPR and tested on Windows.

Dev branch

- helper_le_stq_mmu_x86_64 (68.38%)
  - store_helper (67.94%)
    - notdirty_write (33.63%)

That's it, and it matches my guess: this is slow because Unicorn (QEMU) has to ensure that writes invalidate any previously translated dirty code pages. This is handled specially to support self-modifying code (refer to the QEMU whitepaper if you wish).


I think a better solution is to check how QEMU solves this, because I feel it shouldn't have this much overhead. Maybe we missed some core mechanism when adapting the QEMU code.

@tunz
Author

tunz commented May 26, 2023

Oh, thanks for the hint. Now I see the problem: Unicorn always returns false from cpu_physical_memory_get_dirty and true from cpu_physical_memory_is_clean. So tlb_set_dirty is never called in notdirty_write, and Unicorn always takes the slow path.

@wtdcode
Member

wtdcode commented May 26, 2023

Oh, thanks for the hint. Now I see this problem: Unicorn always returns false for cpu_physical_memory_get_dirty and true for cpu_physical_memory_is_clean. So, tlb_set_dirty is never called in notdirty_write, and Unicorn always takes the slow path.

Thanks for your hint. This is an old known issue I noticed when developing Unicorn 2. The root cause is a long story, but to keep it short: UC1 compatibility. However, we now have a ctl for flushing the code cache, so this could be improved in some way, but I need to figure it out.

@tunz tunz mentioned this pull request May 30, 2023
@wtdcode wtdcode added this to the Unicorn 2.1.0 milestone Aug 8, 2023
2 participants