Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Travis CI builds #144

Open
czender opened this issue Aug 6, 2019 · 5 comments
Open

Fix Travis CI builds #144

czender opened this issue Aug 6, 2019 · 5 comments
Assignees

Comments

@czender
Copy link
Member

czender commented Aug 6, 2019

I am back from vacation. Have you noticed that all Travis builds since 3106 have failed?
https://travis-ci.org/nco/nco/jobs/565403229
Apparently this is due to your OpenMP changes, which include options not supported by all compilers so you probably need to add #ifdefs similar to those in nco_rgr.c.
Please make your #1 priority to fix Travis builds.
And then we need to move to a PR-only system for your modifications to master so that I can verify they do not contain breakage.
It is not OK to allow broken builds in master for > 24 hours, and success on AppVeyor only means that Windows builds fine, AFAICT, but Windows does not exercise the high-performance aspects of the code like OpenMP which must remain working on all the supercomputers we support with a variety of compilers and versions.
Please close this issue when Travis is working again and add a written acknowledgement that you will always submit PRs to change master in the future.

@hmb1
Copy link
Contributor

hmb1 commented Aug 7, 2019

@czender
I didnt put my omp mods into master they are in a the branch hmb-gen-msh4-omp.
The appveyor builds were failing for me because a the following ncks test command was failing

ncks -M "http://tds.marine.rutgers.edu/thredds/dodsC/roms/espresso/2013_da/his/ESPRESSO_Real-Time_v2_History_Best"

but you seem to have fixed that
...Henry

@czender
Copy link
Member Author

czender commented Aug 7, 2019

You are correct that your OpenMP code causes TravisCI to fail on your branches not on master. We still need you to try to get OpenMP passing TravisCI on your branch before you merge it to master. It turns out that at about the same time that TravisCI started failing on your branch (because of OpenMP as shown in my link above)

nco_ply_lst.c: In function ‘nco_poly_lst_mk_vrl_sph’:
nco_ply_lst.c:615:57: error: invalid schedule kind before ‘nonmonotonic’

, TravisCI also started failing on master because the default Travis build distribution was upgraded (I think) to Ubuntu Xenial and in any case the libnetcdf package name changed from libnetcdfc7 to libnetcdf11. This must be set in the file nco/.travis.yml.This is now fixed in master. Therefore your OpenMP branch still needs compiler-dependent #ifdefs for the OpenMP code. Please get on it and follow the requests in my post above.

@hmb1
Copy link
Contributor

hmb1 commented Aug 8, 2019

@czender
I've merged my omp code into the latest and greatest weight generation code (in hmb-gen-msh5 ). It seems that I can get threading to work in nco_poly_lst_mk_vrl_sph() ONLY when I use ncremap and not from ncks. I get the same behaviour when I run this branch on e3sm.
I've taken a good look at ncremap but nothing obvious jumps out.

#pragma omp parallel for private(idx, thr_idx) schedule(nonmonotonic: guided,20) shared(rtree, grd_lon_typ, bDirtyRats, bSort, max_nbr_vrl, pl_cnt_dbg, tot_nan_cnt, tot_wrp_cnt, pl_typ)

The reason why I added the shedule parameter to the pragma is to stop the situation whereby each thread is assigned the same number of work chunks from the for loop. This results in the weird situation whereby all of the threads except one run out of work and then wait for the last one. to finish. By reducing the chunk size this situation is avoided. You can see situation evolving by looking at the thread number for each for loop iteration.

...Henry

@czender
Copy link
Member Author

czender commented Aug 8, 2019

Suggestion: Use the ncremap -D 1 or -D 2 option to print the ncks commands that it executes. This should help you understand why your ncks commands do not work. Also, you MUST FIX TravisCI builds and now AppVeyor builds on your branch before the PR to master will be accepted.

@czender
Copy link
Member Author

czender commented Aug 28, 2019

It is fine to keep schedule(nonmonotonic: guided,20) as long as it is ifdef'd properly so that an older GCC that does not support it does not see it. Please add it back to your code if it improves performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants