Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very slow on domain joined Windows machine #71

Open
lzybkr opened this issue Aug 10, 2020 · 3 comments
Open

Very slow on domain joined Windows machine #71

lzybkr opened this issue Aug 10, 2020 · 3 comments
Assignees

Comments

@lzybkr
Copy link

lzybkr commented Aug 10, 2020

It took ~30 minutes to run procs on my domain joined machine with about ~225 processes running.

I profiled about 30 seconds and the hotspot is this stack:

OTHER <<advapi32!LookupAccountSidW>>
+ procs!procs::process::windows::get_name
+ procs!procs::process::windows::get_groups
+ procs!procs::process::windows::collect_proc
+ procs!procs::view::View::new
+ procs!procs::run_default
+ procs!procs::run
+ procs!procs::main
+ procs!std::rt::lang_start::
+ procs!std::rt::lang_start_internal
+ procs!std::rt::lang_start<()>
+ procs!main
+ procs!__scrt_common_main_seh
+ OTHER <<ntdll!RtlUserThreadStart>>
@dalance dalance self-assigned this Aug 10, 2020
dalance added a commit that referenced this issue Aug 11, 2020
@dalance
Copy link
Owner

dalance commented Aug 11, 2020

I found too many LookupAccountSidW call, and tried to add cache for SID at the latest master.
Could you try it?
(If you don't have Rust compiler, I can provide the latest binary.)

@lzybkr
Copy link
Author

lzybkr commented Aug 11, 2020

That's a huge improvement, but it's still too slow to be usable - about 40 seconds for 300 processes, the hot stack is basically the same.

Changing the cache to use the result of ConvertSidToStringSidW helps some more, getting closer to usable - about 9 seconds.

Many of the processes have 380 groups, so reducing the number of calls to LookupAccountSidW would help. Avoiding calling LookupAccountSidW twice (first to get the buffer sizes) is an obvious opportunity to optimize.

Not calling LookupAccountSidW at all (Gid and Groups are not default columns) also seems like a potential optimization.

I wonder if a different api might help. I noticed LsaLookupSids2 accepts an array of SIDs - I wonder if that results in fewer round-trips to the domain controller.

@supernova-III
Copy link

supernova-III commented Apr 1, 2023

I found too many LookupAccountSidW call, and tried to add cache for SID at the latest master. Could you try it? (If you don't have Rust compiler, I can provide the latest binary.)

@dalance
I have found one more thing which can lead to additional overhead. This is where LookupAccountSidW is called twice (here): the first time to get the required size for memory, and the second time to actually do something.
The best option is to pre-allocate a large enough but static piece of memory on the stack and pass it to LookupAccountSidW. It may fail because of too small buffer (extremely low probability case), and this is where dynamic allocation can be introduced. This technique eliminates extra call to dynamic allocator, but most importantly, the LookupAccountSid function is faster when it is called once with a sufficient buffer - I saw this in profiler in one of my projects

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants