Sudden Segmentation Fault #1455

Closed
seidler opened this issue Apr 16, 2024 · 16 comments · Fixed by #1481
Comments

@seidler

seidler commented Apr 16, 2024

After a few minutes of runtime, htop aborts with a segmentation fault.

uname -a :
Linux asus 5.14.21-150500.55.52-default #1 SMP PREEMPT_DYNAMIC Tue Mar 5 16:53:41 UTC 2024 (a62851f) x86_64 x86_64 x86_64 GNU/Linux

lsb_release -a:

LSB Version:    n/a
Distributor ID: openSUSE
Description:    openSUSE Leap 15.5
Release:        15.5
Codename:       n/a
FATAL PROGRAM ERROR DETECTED
============================
Please check at https://htop.dev/issues whether this issue has already been reported.
If no similar issue has been reported before, please create a new issue with the following information:
  - Your htop version: '3.4.0-dev'
  - Your OS and kernel version (uname -a)
  - Your distribution and release (lsb_release -a)
  - Likely steps to reproduce (How did it happen?)
  - Backtrace of the issue (see below)

Error information:
------------------
A signal 11 (Segmentation fault) was received.

Setting information:
--------------------
htop_version=3.4.0-dev;config_reader_min_version=3;fields=0 48 17 18 39 40 2 46 47 119 49 1;hide_kernel_threads=1;hide_userland_threads=1;hide_running_in_container=1;shadow_other_users=0;show_thread_names=0;show_program_path=0;highlight_base_name=0;highlight_deleted_exe=1;shadow_distribution_path_prefix=0;highlight_megabytes=1;highlight_threads=1;highlight_changes=0;highlight_changes_delay_secs=5;find_comm_in_cmdline=1;strip_exe_from_cmdline=1;show_merged_command=0;header_margin=1;screen_tabs=1;detailed_cpu_time=0;cpu_count_from_one=0;show_cpu_usage=1;show_cpu_frequency=0;update_process_names=0;account_guest_in_cpu_meter=0;color_scheme=0;enable_mouse=1;delay=15;hide_function_bar=0;header_layout=two_50_50;column_meters_0=LeftCPUs2 Memory Swap;column_meter_modes_0=1 1 1;column_meters_1=RightCPUs2 Tasks LoadAverage Uptime;column_meter_modes_1=1 2 2 2;tree_view=0;sort_key=46;tree_sort_key=46;sort_direction=-1;tree_sort_direction=-1;tree_view_always_by_pid=0;all_branches_collapsed=0;screen:Main=PID USER PRIORITY NICE M_RESIDENT M_SHARE STATE PERCENT_CPU PERCENT_MEM M_SWAP TIME Command;.sort_key=PERCENT_CPU;.tree_sort_key=PERCENT_CPU;.tree_view_always_by_pid=0;.tree_view=0;.sort_direction=-1;.tree_sort_direction=-1;.all_branches_collapsed=0;screen:I/O=PID USER IO_PRIORITY IO_RATE IO_READ_RATE IO_WRITE_RATE Command;.sort_key=IO_RATE;.tree_sort_key=PID;.tree_view_always_by_pid=0;.tree_view=0;.sort_direction=-1;.tree_sort_direction=1;.all_branches_collapsed=0;

Backtrace information:
----------------------

 0:       0x411250  htop  (CRT_handleSIGSEGV+0xe0)  [0x411250]
 1: 0x7f8909f07dc0  /lib64/libc.so.6  (killpg+0x42)  [0x7f8909f07e01]  {signal frame}
 2: 0x7f890a0a8a40  /lib64/libc.so.6  (free_mem+0)  [0x7f890a0a8a40]  {signal frame}

htop.objdump.gz

@BenBE BenBE added bug 🐛 Something isn't working question ❔ Further information is requested Linux 🐧 Linux related issues labels Apr 16, 2024
@BenBE
Member

BenBE commented Apr 16, 2024

Unfortunately this only shows the actual signal handling, not what caused it.

Can you run htop under a debugger? Start htop, then attach to it with gdb from a second console. Once the signal is caught, enter bt full.

Looking at the crash report: which commit from main exactly were you running?

@ArnuldOnData

ArnuldOnData commented Apr 22, 2024

I got the same crash on Arch Linux. I can try running the debugger on the weekend. For now, I have attached the objdump:

FATAL PROGRAM ERROR DETECTED
------------------

Please check at https://htop.dev/issues whether this issue has already been reported.
If no similar issue has been reported before, please create a new issue with the following information:
  - Your htop version: '3.3.0'
  - Your OS and kernel version (uname -a)
  - Your distribution and release (lsb_release -a)
  - Likely steps to reproduce (How did it happen?)
  - Backtrace of the issue (see below)

Error information:
------------------
A signal 11 (Segmentation fault) was received.

Setting information:
--------------------
htop_version=3.3.0;config_reader_min_version=3;fields=0 48 17 18 38 39 40 2 46 47 49 1;hide_kernel_threads=1;hide_userland_threads=0;hide_running_in_container=0;shadow_other_users=0;show_thread_names=0;show_program_path=1;highlight_base_name=0;highlight_deleted_exe=1;shadow_distribution_path_prefix=0;highlight_megabytes=1;highlight_threads=1;highlight_changes=0;highlight_changes_delay_secs=5;find_comm_in_cmdline=1;strip_exe_from_cmdline=1;show_merged_command=0;header_margin=1;screen_tabs=1;detailed_cpu_time=0;cpu_count_from_one=0;show_cpu_usage=1;show_cpu_frequency=0;show_cpu_temperature=0;degree_fahrenheit=0;update_process_names=0;account_guest_in_cpu_meter=0;color_scheme=0;enable_mouse=1;delay=15;hide_function_bar=0;header_layout=two_50_50;column_meters_0=LeftCPUs2 Memory Swap;column_meter_modes_0=1 1 1;column_meters_1=RightCPUs2 Tasks LoadAverage Uptime;column_meter_modes_1=1 2 2 2;tree_view=0;sort_key=49;tree_sort_key=0;sort_direction=-1;tree_sort_direction=1;tree_view_always_by_pid=0;all_branches_collapsed=0;screen:Main=PID USER PRIORITY NICE M_VIRT M_RESIDENT M_SHARE STATE PERCENT_CPU PERCENT_MEM TIME Command;.sort_key=TIME;.tree_sort_key=PID;.tree_view_always_by_pid=0;.tree_view=0;.sort_direction=-1;.tree_sort_direction=1;.all_branches_collapsed=0;screen:I/O=PID USER IO_PRIORITY IO_RATE IO_READ_RATE IO_WRITE_RATE PERCENT_SWAP_DELAY PERCENT_IO_DELAY Command;.sort_key=IO_RATE;.tree_sort_key=PID;.tree_view_always_by_pid=0;.tree_view=0;.sort_direction=-1;.tree_sort_direction=1;.all_branches_collapsed=0;

Backtrace information:
----------------------
htop(+0x140ec)[0x5ef035dc70ec]
htop(CRT_handleSIGSEGV+0xf2)[0x5ef035dcebc2]
/usr/lib/libc.so.6(+0x3c770)[0x782ee796c770]
htop(+0x2b776)[0x5ef035dde776]
htop(Machine_scanTables+0xaf)[0x5ef035dd28af]
htop(ScreenManager_run+0x671)[0x5ef035de6731]
htop(CommandLine_run+0x87d)[0x5ef035dcf50d]
/usr/lib/libc.so.6(+0x25cd0)[0x782ee7955cd0]
/usr/lib/libc.so.6(__libc_start_main+0x8a)[0x782ee7955d8a]
htop(_start+0x25)[0x5ef035dc5065]

htop.objdump.gz

@BenBE
Member

BenBE commented Apr 22, 2024

Can you attach a debugger at line 71 of ProcessTable.c, the line that calls Process_makeCommandStr(p, settings)?

Something seems to be off with that call. Can you test with an explicit assert(p); right before that call?

Also, does this still reproduce with the HEAD commit from main?

@seidler
Author

seidler commented Apr 23, 2024 via email

@BenBE
Member

BenBE commented Apr 23, 2024

@ArnuldOnData Your version is different from the one that the OP is using. Are you sure these are the same crash?

@seidler
Author

seidler commented Apr 23, 2024 via email

@ArnuldOnData

@ArnuldOnData Your version is different from the one that the OP is using. Are you sure these are the same crash?

He might be using a different OS. I posted in this issue because both crashes look quite similar.

@seidler
Author

seidler commented May 8, 2024

The segmentation fault occurred again.
The objdump, coredump, and error message are attached.
htop.coredump.zip
htop.objdump.zip
htop.error.txt

@BenBE
Member

BenBE commented May 8, 2024

@seidler Thank you for this update. This crash looks a bit strange: it is not a classic NULL dereference on some data (or a double free). Instead, the vtable of the object that should be freed has no delete function set in its klass member. As the klass member is set while initializing the object, this means either some code path forgets to perform this initial setup or there is memory corruption going on somewhere.
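
For readers unfamiliar with this pattern, here is a minimal C sketch of an object that carries a class pointer ("vtable") with a delete function. It is an illustration only: `ObjectClass`, `Object`, `Item`, `deleteFn`, and `Object_deleteChecked` are hypothetical names chosen for this example and are not htop's actual definitions. The sketch assumes that the delete function is reached through the klass member, so an object whose klass (or whose delete pointer) was never set, or was overwritten, cannot be freed safely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical vtable layout: one function pointer per virtual operation. */
typedef struct Object_ Object;

typedef struct ObjectClass_ {
   void (*deleteFn)(Object* obj);
} ObjectClass;

struct Object_ {
   const ObjectClass* klass;   /* expected to be set when the object is initialized */
};

/* A concrete type; the base object is the first member so that
 * a pointer to Item can also be used as a pointer to Object. */
typedef struct Item_ {
   Object super;
   int value;
} Item;

static void Item_delete(Object* obj) {
   free(obj);
}

static const ObjectClass Item_class = { .deleteFn = Item_delete };

/* Constructor: this is the "initial setup" that must not be skipped. */
Item* Item_new(int value) {
   Item* it = calloc(1, sizeof(Item));
   if (!it)
      return NULL;
   it->super.klass = &Item_class;
   it->value = value;
   return it;
}

/* Defensive delete: returns false instead of calling through a NULL
 * pointer when the class or its delete function is missing. */
bool Object_deleteChecked(Object* obj) {
   if (!obj || !obj->klass || !obj->klass->deleteFn)
      return false;
   obj->klass->deleteFn(obj);
   return true;
}
```

An unchecked `obj->klass->deleteFn(obj)` on a half-initialized or corrupted object would jump through an invalid pointer, which typically shows up as a segmentation fault. The check above can only detect NULL pointers; it cannot detect a klass pointer that was overwritten with garbage.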

@BenBE BenBE removed the question ❔ Further information is requested label May 8, 2024
@ScoreUnder
Contributor

It didn't crash for me under valgrind, but I got some suspicious messages. Git hash 314d693

valgrind-out.txt

I can't know for sure that it's related, but I feel that it is likely.

@BenBE
Member

BenBE commented May 16, 2024

At first glance this looks like a double free (DF) or use-after-free (UAF). Judging by the valgrind stack traces, it also looks a lot like it happens when a process has just terminated.

@BenBE BenBE added this to the 3.4.0 milestone May 16, 2024
@ScoreUnder
Contributor

Just in case it helps, I have re-triggered the issue with -O0 -ggdb3 and got a slightly more detailed stack trace: valgrind-out.txt

cgzones added a commit to cgzones/htop that referenced this issue May 16, 2024
In case reading the status file of a process fails do not bail out and
treat the process as a short lived one, since the process has already
been added to the global process table.  Free'ing it will lead to
use-after-free issues.

Fixes: 22d25db ("Linux: detect container process by different PID namespace")
Closes: htop-dev#1455
cgzones added a commit to cgzones/htop that referenced this issue May 16, 2024
In case parsing an essential pid entry file like 'status' fails, we treat the
process as a short-lived one and ignore it.  In the relevant goto
label the process structure is free'd, thus it must not have been
inserted into the global process table.

Reorder parsing the status file after potentially inserting the process
into the process table.

Fixes: 22d25db ("Linux: detect container process by different PID namespace")
Closes: htop-dev#1455
@cgzones
Member

cgzones commented May 16, 2024

@ScoreUnder thanks, these stack traces were very helpful.
Please test the fix in #1481.

@BenBE
Member

BenBE commented May 16, 2024

@seidler @ArnuldOnData : Can you please check if the provided patch in #1481 by @cgzones resolves the issue for you? TIA.

@ScoreUnder
Contributor

@ScoreUnder thanks, these stack traces were very helpful. Please test the fix in #1481.

I have run a build from that branch for 4.5 hours and it has not hit any issues, so this is likely fixed on my end after those changes.

@seidler
Author

seidler commented May 17, 2024

With the patch applied, htop has been running for hours without any problem.

fasterit pushed a commit to fasterit/htop that referenced this issue May 17, 2024
When parsing an essential pid entry file like 'status' fails, we treat
the process as a short-lived one and skip adding it into the process
table.

This should be done before the process is added, as the goto label used
for error handling can free the process structure, thus causing a
use-after-free scenario.

Fixes: 22d25db ("Linux: detect container process by different PID namespace")
Closes: htop-dev#1455
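
The ordering problem described in the commit message above can be sketched in a few lines of C. This is a simplified illustration and not the actual htop patch: `Proc`, `Table`, `readStatus`, the `statusReadable` flag, and both `scanProcess_*` functions are hypothetical stand-ins. The sketch assumes a shared error label that frees the process structure, as the commit message describes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Proc_ { int pid; } Proc;

#define TABLE_CAP 16
typedef struct Table_ {
   Proc* items[TABLE_CAP];   /* the table keeps pointers to live processes */
   size_t count;
} Table;

static bool Table_add(Table* t, Proc* p) {
   if (t->count >= TABLE_CAP)
      return false;
   t->items[t->count++] = p;
   return true;
}

/* Stand-in for parsing an essential per-process file such as 'status'.
 * statusReadable simulates a process that vanished before the read. */
static bool readStatus(Proc* p, bool statusReadable) {
   (void)p;
   return statusReadable;
}

/* Problematic order: the process is inserted first, so a later parse
 * failure frees memory that the table still points to. */
bool scanProcess_buggyOrder(Table* t, int pid, bool statusReadable) {
   Proc* p = calloc(1, sizeof(Proc));
   if (!p)
      return false;
   p->pid = pid;
   if (!Table_add(t, p)) {
      free(p);
      return false;
   }
   if (!readStatus(p, statusReadable))
      goto errorReadingProcess;
   return true;
errorReadingProcess:
   free(p);   /* BUG: t->items still holds this pointer (dangling) */
   return false;
}

/* Safer order: parse everything essential before inserting, so the
 * error path only frees memory that nothing else references. */
bool scanProcess_fixedOrder(Table* t, int pid, bool statusReadable) {
   Proc* p = calloc(1, sizeof(Proc));
   if (!p)
      return false;
   p->pid = pid;
   if (!readStatus(p, statusReadable))
      goto errorReadingProcess;
   if (!Table_add(t, p))
      goto errorReadingProcess;
   return true;
errorReadingProcess:
   free(p);   /* safe: the table never saw p */
   return false;
}
```

With the problematic order, a failed read leaves a dangling entry in the table, and any later pass that touches or frees that entry is a use-after-free or double free. With the safer order, a failed read leaves the table unchanged.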