Parallel simulations #507

Open
avelure opened this issue Aug 11, 2022 · 8 comments

@avelure

avelure commented Aug 11, 2022

I run tests of the same testcase with different generics in parallel from the same library. This seems to create some file and linking issues, but maybe this is not supposed to be possible.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity test is
  generic (val : natural);
end entity test;
architecture beh of test is
begin
  
  process
  begin
    report "Test " & integer'image(val);
    wait;
  end process;

end architecture beh;

nvc -a test.vhd
nvc -e test-beh -gval=1 -r &
nvc -e test-beh -gval=2 -r &
nvc -e test-beh -gval=3 -r &
nvc -e test-beh -gval=4 -r &

# sh ./test.sh
** Fatal: unlink: /tmp/work/_WORK.TEST-BEH.elab.0.o: No such file or directory
/usr/bin/ld: cannot find /tmp/work/_WORK.TEST-BEH.elab.0.o: No such file or directory
/usr/bin/ld: cannot find /tmp/work/_WORK.TEST-BEH.elab.0.o: No such file or directory
** Fatal: /usr/bin/ld --eh-frame-hdr -shared -o
          /tmp/work/_WORK.TEST-BEH.elab.so /tmp/work/_WORK.TEST-BEH.elab.0.o failed with status 1
** Fatal: /usr/bin/ld --eh-frame-hdr -shared -o
          /tmp/work/_WORK.TEST-BEH.elab.so /tmp/work/_WORK.TEST-BEH.elab.0.o failed with status 1
** Note: 0ms+0: Report Note: Test 1
         test.vhd:13
@nickg
Owner

nickg commented Aug 11, 2022

The problem above, where the object file names collide, is easy enough to fix by adding e.g. the PID to the file names. But then there's another problem: elaboration generates a shared library work/_WORK.<name>.elab.so that can be re-used later by a subsequent nvc -r, so the file name has to be known ahead of time.

Would it work for you if there was a --name option to -e like:

nvc -e test-beh --name test1 -gval=1 -r &
nvc -e test-beh --name test2 -gval=2 -r &
nvc -e test-beh --name test3 -gval=3 -r &
nvc -e test-beh --name test4 -gval=4 -r &

Which would produce separate shared libs for each elaboration. Then you could go back later and execute, say -gval=2, by running nvc -r test2.

@nickg nickg self-assigned this Aug 11, 2022
@nickg
Owner

nickg commented Aug 11, 2022

Another alternative is to add an elaboration option --temporary or --no-save that doesn't save the shared library to disk (and perhaps also the elaborated design). That would work well for VUnit too which always re-elaborates before running.

@avelure
Author

avelure commented Aug 11, 2022

In my case I always re-elaborate and do elab+run each time, since this will be used in a CI setup, so the name option is not as useful for me.

An option for the separate elab and run crowd could be to somehow hash the generics provided and add the hash to the name, and then require the generics to also be present for the run command (unless you do elab+run). Run would then check whether the unit with those specific generics has already been elaborated before running.
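
To make the hashing idea concrete, here is a minimal sketch of how a name could be derived from the generics. This is not an nvc feature: `elab_name` is a hypothetical helper, and using `cksum` as the hash is an assumption chosen only because it is available in any POSIX shell environment.

```shell
# elab_name TOP GENERIC=VALUE...
# Hypothetical helper (not part of nvc): prints TOP followed by a
# checksum of the generics, so equal generics map to the same name.
elab_name() {
    top=$1; shift
    # Sort first so the order of generics on the command line is irrelevant.
    sum=$(printf '%s\n' "$@" | sort | cksum | cut -d ' ' -f 1)
    printf '%s.%s\n' "$top" "$sum"
}
```

Calling `elab_name test-beh val=1` and `elab_name test-beh val=2` yields two different names, while repeating the same generics in any order yields the same one, which is the property the run step would need in order to find a previous elaboration.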

@nickg
Owner

nickg commented Aug 11, 2022

I've added the --no-save option for now as it's relatively straightforward to implement. You should be able to do:

nvc -e test-beh --no-save -gval=1 -r &
nvc -e test-beh --no-save -gval=2 -r &
nvc -e test-beh --no-save -gval=3 -r &
nvc -e test-beh --no-save -gval=4 -r &

As a bonus elaboration will be slightly quicker as it's not saving the design hierarchy to disk any more.
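
For a CI setup like the one described earlier in the thread, the background jobs also need to be waited on so that a failing simulation fails the build. The following is a sketch under assumptions: `run_parallel` is a hypothetical wrapper, `NVC` is a hypothetical override variable for the simulator command, and the generic is assumed to be called `val` as in the test case above. Only `-e`, `--no-save`, `-g` and `-r` come from the commands shown in this thread.

```shell
# run_parallel TOP VALUE...
# Hypothetical wrapper: starts one elaborate+run per generic value in
# the background, then waits for each job and counts the failures.
run_parallel() {
    top=$1; shift
    pids=""
    for val in "$@"; do
        # NVC is an assumed override variable; it defaults to plain "nvc".
        ${NVC:-nvc} -e "$top" --no-save "-gval=$val" -r &
        pids="$pids $!"
    done

    failed=0
    for pid in $pids; do
        # "wait PID" returns the exit status of that background job.
        wait "$pid" || failed=$((failed + 1))
    done

    echo "$failed of $# runs failed"
    [ "$failed" -eq 0 ]
}
```

A CI script could then call `run_parallel test-beh 1 2 3 4` and rely on its exit status, which is non-zero if any of the runs failed.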

@nickg
Owner

nickg commented Aug 11, 2022

That was in commit c684f67; I didn't tag it properly.

@avelure
Author

avelure commented Aug 16, 2022

Possibly related: when I now kill NVC with Ctrl-C on Windows, I get:

fatal: interrupted in process :p_correlate at 7773062700ps+0
fatal: jit_abort called when not executing
[00007FF687E16340]
[00007FF687EAA936] nvc_current_delta+0x15b6
[00007FF687EAA99A] nvc_current_delta+0x161a
[00007FF687E9CEA6]
[00007FFB8B7CB943] CtrlRoutine+0xc3
[00007FFB8D5D7034] BaseThreadInitThunk+0x14
[00007FFB8DDA2651] RtlUserThreadStart+0x21
warning: cannot remove C:\proj\work\_WORK.MY_TB-TB.elab.23636.dll: Access is denied.

Apart from that, the parallel simulation works fine now.

@nickg
Owner

nickg commented Aug 16, 2022

The regression with ctrl-c should be fixed now, but I'm not able to test on Windows at the moment. The "cannot remove" warning is unfortunate but I'm not sure what to do about that: Windows cannot remove files that are opened by any process, and it's a bit tricky to unload the DLL from the ctrl-c handler.

@avelure
Author

avelure commented Aug 16, 2022

Ctrl-C is fixed now. I do not get a "cannot remove" warning when using Ctrl-C while running without the --no-save parameter.
