Segmentation fault when using Nvidia drivers #1

Open
GOKOP opened this issue Mar 4, 2020 · 12 comments

@GOKOP

GOKOP commented Mar 4, 2020

I've cloned the repo and built it with make, but when I try to run the executable, it segfaults:

$ ./ecosim
[1]    28758 segmentation fault  ./ecosim

The Python logging script appears to be broken as well: it can't find the module logger_data. That doesn't sound like an actual Python module that can be installed (and a DuckDuckGo search agrees), so I assume it's supposed to be part of this software, but it's not here.

$ ./ecosim_with_log.sh 
Starting Ecosim...
./ecosim_with_log.sh: line 4: 30315 Segmentation fault   ./ecosim
Starting logger plot
Traceback (most recent call last):
  File "./logger_plot.py", line 7, in <module>
    import logger_data
ModuleNotFoundError: No module named 'logger_data'

As a side note, I had to change the Python script's shebang to #!/usr/bin/env python3 because python3.5 is not a thing on my system. I imagine I'm not the only one.

Should I make this into two issues? Now that I think about it, these are two separate problems, but it feels kinda weird.

@connor-brooks
Owner

The Python logging script appears to be broken as well: it can't find the module logger_data. That doesn't sound like an actual Python module that can be installed (and a DuckDuckGo search agrees), so I assume it's supposed to be part of this software, but it's not here.

The logger_data module is generated by the main simulation whilst running, but because the simulation is segfaulting this file doesn't exist yet.

As a side note, I had to change the Python script's shebang to #!/usr/bin/env python3 because python3.5 is not a thing on my system. I imagine I'm not the only one.

Oops, my bad. I'll make the change right now.

I've cloned the repo and built it with make, but when I try to run the executable, it segfaults:

This is interesting. A few people on HN were having similar issues, mainly people with Nvidia graphics, it seems. May I ask what distribution and graphics hardware you have?

Cheers

@GOKOP
Author

GOKOP commented Mar 4, 2020

Artix Linux (basically Arch without systemd) and yes, Nvidia

@GOKOP
Author

GOKOP commented Mar 4, 2020

When I run it on my thinkpad (so no nvidia) the program runs and even displays some output (lines "food added" and "proned") but the window is all green and logger_data is still missing

@connor-brooks
Owner

connor-brooks commented Mar 4, 2020

Artix Linux (basically Arch without systemd) and yes, Nvidia

It seems there is some issue with glDrawElements() and Nvidia. I'll need to investigate further before I fully understand why. Thanks for letting me know about this.

When I run it on my thinkpad (so no nvidia) the program runs and even displays some output (lines "food added" and "proned") but the window is all green and logger_data is still missing

Which ThinkPad was it? The green screen indicates that the FBO isn't being rendered correctly. The wobbly-jelly kinda graphics work by rendering the whole simulation offscreen to a framebuffer object, then distorting this using a shader. This distorted image is then used to texture a rectangle which spans the whole screen. If this shader fails, a plain green fullscreen quad is displayed instead. I believe FBOs were only included as of OpenGL 3.0, so this makes sense for any older ThinkPad. Ecosim was developed on a ThinkPad T420 running Devuan (Debian without systemd).
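As a rough illustration of the OpenGL 3.0 point above, here is a minimal sketch of a version check that could decide whether the FBO path is worth attempting. The function name gl_version_has_core_fbo is hypothetical and not part of Ecosim. It assumes the caller passes in the string returned by glGetString(GL_VERSION) after a context exists, and it only looks at the core version number; older drivers may still expose FBOs through the GL_EXT_framebuffer_object extension, which this sketch does not check.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not taken from the Ecosim source.
 * Returns 1 if a desktop GL_VERSION string reports OpenGL 3.0 or newer
 * (where framebuffer objects are part of core), 0 otherwise.
 * In a real program the string would come from glGetString(GL_VERSION).
 */
int gl_version_has_core_fbo(const char *version)
{
    int major = 0, minor = 0;

    if (version == NULL)
        return 0;

    /* Desktop GL version strings begin with "<major>.<minor>". */
    if (sscanf(version, "%d.%d", &major, &minor) != 2)
        return 0;

    return major >= 3;
}
```

A caller could use the result to fall back to drawing directly to the window instead of showing the green quad.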

My apologies for these issues. In the near future the whole simulation is going to be ported from GLFW to SDL2, which should make it easier to ensure it works on various machines.

@sethalves

sethalves commented Mar 4, 2020

me too
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff60a4c53 in ?? () from /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.435.21
(gdb) bt
#0 0x00007ffff60a4c53 in ?? () from /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.435.21
#1 0x00007ffff617d766 in ?? () from /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.435.21
#2 0x00007ffff5cf666d in ?? () from /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.435.21
#3 0x0000555555558108 in gfx_agents_draw_cell (av=0x5555558c9810, shader=6, scale=1.66666663, zoom=1) at graphics.c:371
#4 0x000055555555b4fc in main (argc=1, argv=0x7fffffffdd18) at main.c:238

ubuntu 19, GeForce GTX 1070

@GOKOP
Author

GOKOP commented Mar 4, 2020

I believe FBOs were only included as of OpenGL 3.0

That would explain it. I've already run into issues on that ThinkPad that made me discover it doesn't support OpenGL 3.0 (it's an X200).

@harleypig

#metoo
Arch Linux (updated last Saturday)
01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1)

I tried to attach an strace output log, but kept getting 'something went really wrong' ...

@connor-brooks
Owner

connor-brooks commented Mar 4, 2020

That would explain it. I've already run into issues on that ThinkPad that made me discover it doesn't support OpenGL 3.0 (it's an X200).

In the near future I will add an option in config.h which disables the FBO. You wouldn't get the jelly-like graphics, and it will feel very mechanical as opposed to organic, but it should run okay.
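To make the idea concrete, here is a minimal sketch of what such a switch might look like. Everything in it is an assumption for illustration: the macro ECOSIM_USE_FBO, the render_path enum, and choose_render_path() are hypothetical names, not the actual config.h contents.

```c
/* Hypothetical config.h switch: define as 0 to skip the offscreen FBO pass. */
#ifndef ECOSIM_USE_FBO
#define ECOSIM_USE_FBO 1
#endif

typedef enum {
    RENDER_PATH_FBO,    /* draw offscreen, then distort with the shader */
    RENDER_PATH_DIRECT  /* draw straight to the window, no distortion */
} render_path;

/*
 * Picks the render path from the compile-time switch and a runtime
 * flag saying whether the driver can do FBOs at all.
 */
render_path choose_render_path(int fbo_supported)
{
#if ECOSIM_USE_FBO
    return fbo_supported ? RENDER_PATH_FBO : RENDER_PATH_DIRECT;
#else
    (void)fbo_supported; /* unused when the FBO pass is compiled out */
    return RENDER_PATH_DIRECT;
#endif
}
```

Combining a compile-time switch with a runtime flag means a machine without FBO support would fall back to the direct path even when the option is left enabled.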

#metoo
Arch Linux (updated last Saturday)
01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1)

I'm guessing the problem is in gfx_agents_draw_cell(), which calls glDrawElements(); that call is where the segfault happens. It seems to be a common issue for people using Nvidia graphics. At the moment I can't work out exactly why (I have no access to an Nvidia machine), but I'll investigate.
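The thread does not establish the root cause, so the following is only a debugging aid that could run without Nvidia hardware. An index that points past the end of the vertex data is one well-known way for a glDrawElements() call to read outside its buffers, and the result of that is undefined, so drivers may react differently. The sketch assumes the indices are unsigned int (GL_UNSIGNED_INT) and are available on the CPU side; first_bad_index() is a hypothetical helper, not an Ecosim function.

```c
#include <stddef.h>

/*
 * Hypothetical debugging helper, not taken from the Ecosim source.
 * Returns the position of the first index that is >= vertex_count,
 * or -1 if every index refers to an existing vertex.
 */
long first_bad_index(const unsigned int *indices, size_t index_count,
                     size_t vertex_count)
{
    size_t i;

    for (i = 0; i < index_count; i++) {
        if (indices[i] >= vertex_count)
            return (long)i;
    }
    return -1;
}
```

Calling this just before the draw and logging any non-negative result would either rule out bad indices or point at the exact element to inspect.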

Thanks for the feedback :)

@connor-brooks added the "bug" label on Mar 4, 2020
@connor-brooks self-assigned this on Mar 4, 2020
@connor-brooks changed the title from "Segmentation fault and missing module in python script" to "Segmentation fault" on Mar 4, 2020
@Muffindrake

I segfault with Nvidia's OpenGL implementation. Mesa with Intel Graphics is fine (and I suspect nouveau users would be fine). Easily testable for users with hybrid graphics thanks to NVIDIA's actual Optimus support.

(gdb) bt
#0  0x00007ffff64d9143 in ?? () from /usr/lib64/libnvidia-glcore.so.440.59
#1  0x00007ffff65bd8c6 in ?? () from /usr/lib64/libnvidia-glcore.so.440.59
#2  0x00007ffff61355bd in ?? () from /usr/lib64/libnvidia-glcore.so.440.59
#3  0x00005555555576c7 in gfx_agents_draw_cell ()
#4  0x000055555555a9ef in main ()

Intel(R) HD Graphics 530 (SKL GT2) + NVIDIA Corporation GM107M [GeForce GTX 950M]

@connor-brooks
Owner

I segfault with Nvidia's OpenGL implementation. Mesa with Intel Graphics is fine (and I suspect nouveau users would be fine).

Thanks for helping clarify that @Muffindrake

@connor-brooks changed the title from "Segmentation fault" to "Segmentation fault when using Nvidia drivers" on Mar 5, 2020
@GOKOP
Author

GOKOP commented May 25, 2020

So is this program abandoned?

@connor-brooks
Owner

So is this program abandoned?

Not abandoned.

I've tried to get to the root cause of the bug but haven't managed to. I will be porting the simulation over to SDL2 at some point soon; the segfault will be fixed then.
