Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Summarizing future work with Tapenade after PR #685 #735

Open
7 of 18 tasks
Shreyas911 opened this issue May 23, 2023 · 16 comments
Open
7 of 18 tasks

Summarizing future work with Tapenade after PR #685 #735

Shreyas911 opened this issue May 23, 2023 · 16 comments

Comments

@Shreyas911
Copy link
Collaborator

Shreyas911 commented May 23, 2023

I am creating an issue to explicitly document some future work that we already know needs to be done in association with Tapenade and a couple of non-Tapenade points too, based on discussions with @jm-c, @mjlosch, and @heimbach. This work is after we merge PR #685.

Tapenade-related issues -

  • Update outputs for verification experiments as per a Fedora-based VM (@jm-c will handle this)
  • Keep documentation current as per the issues below
  • Integrate better with the diagnostics package to manage AD output (High priority)
  • Changes needed to fix/polish the use tapenade through docker (check out the docker images for AMD and ARM).
  • Dealing with CI (Future PR)
  • Handle the out-of-bound issues with the -devel flag and NaNs with the –ieee flag (Laurent Hascoet and I will take a look, @jm-c also has some understanding of this and probably mentioned at some point that these are mostly benign)
  • TLM without devel flag not working for some experiments for older versions of JAVA, need some benchmark for the minimum working version
  • Getting Tapenade to work more efficiently with some packages, for example, streamice
  • Deal with the 'touch' hack in genmake2 if needed
  • Link adStack.f, adBinomial.f, f95_test_mods.f90
  • exch2_*.template for all the duplicate files for subroutines such as exch2_rl1_cube_b
  • Untangle pkg/autodiff and pkg/tapenade if needed (@heimbach and @jm-c agree this is not a priority)
  • Evaluate if verification/lab_sea/code_tap.noecco and verification/lab_sea/code_tap.adx.noecco are needed, especially in the context of nightly tests (See PR Clean input tap #746 which deletes both of these).
  • Issue Perhaps add file initializations for TLM for profiles package #717 is also relevant here, in case the use of TLM is needed with the profiles package.
  • Checkpointing investigation (as suggested by @dngoldberg)

Other issues -

We can keep adding more issues as we find them to keep this list current.

@dngoldberg
Copy link
Collaborator

hi @Shreyas911 is this the right place to mention getting tapenade to work efficiently with other packages (e.g. streamice)? Once this P/R is merged, I need to add the necessary files to avoid differentiating through the linear system solve, plus a few other things, and also I need to test tapenade's Christianson fixed-point functionality (Laurent has recently done some work with this to make it easier to use for this purpose).

@Shreyas911
Copy link
Collaborator Author

hi @Shreyas911 is this the right place to mention getting tapenade to work efficiently with other packages (e.g. streamice)? Once this P/R is merged, I need to add the necessary files to avoid differentiating through the linear system solve, plus a few other things, and also I need to test tapenade's Christianson fixed-point functionality (Laurent has recently done some work with this to make it easier to use for this purpose).

Indeed @dngoldberg, going to add this as a bullet point. :)

@jm-c
Copy link
Member

jm-c commented Jun 5, 2023

I marked the second task as "done", since the reference output have been updated in PR #685.

The tapenade tests are not yet fully integrated in the set of daily regression tests but I've started to run both the Adjoint and Tang-Lin at night at least once a month (supposed to be on the second but in June it was on the fourth) and every time there is an update to the GitHub master branch:
http://mitgcm.org/testing/results/2023_06/tr_jaures_20230604_0/summary.txt
http://mitgcm.org/testing/results/2023_06/tr_jaures_20230604_1/summary.txt

@Shreyas911 one comment about lab_sea experiment: I did not realized that you added 3 "code_tap*" dirs there, assuming we would use the corresponding "input_tap*" to run with corresponding executable. But testreport does not work like this, and use the same executable (compiled using code_tap\) to run the primary test (input_tap/) and the secondary test (input_tap.noecco/). So the 2 code_tap.noecco/ and code_tap.adx.noecco/ are never used and could be removed. You may want to check if running the input_tap.noecco test with the executable that has been compiled using code_tap/ make sense (I thought it does but not 100% sure).

@Shreyas911
Copy link
Collaborator Author

Shreyas911 commented Jun 12, 2023

Hi @jm-c,

The tapenade tests are not yet fully integrated in the set of daily regression tests but I've started to run both the Adjoint and Tang-Lin at night at least once a month (supposed to be on the second but in June it was on the fourth) and every time there is an update to the GitHub master branch:
http://mitgcm.org/testing/results/2023_06/tr_jaures_20230604_0/summary.txt
http://mitgcm.org/testing/results/2023_06/tr_jaures_20230604_1/summary.txt

This is great to hear!

@Shreyas911 one comment about lab_sea experiment: I did not realized that you added 3 "code_tap*" dirs there, assuming we would use the corresponding "input_tap*" to run with corresponding executable. But testreport does not work like this, and use the same executable (compiled using code_tap\) to run the primary test (input_tap/) and the secondary test (input_tap.noecco/). So the 2 code_tap.noecco/ and code_tap.adx.noecco/ are never used and could be removed. You may want to check if running the input_tap.noecco test with the executable that has been compiled using code_tap/ make sense (I thought it does but not 100% sure).

I am not sure about this, since the cost functions are set up to be different depending on if ecco is used or not used. If you run diff -qr code_tap code_tap.noecco, the two directories differ in SEAICE_OPTIONS.h and COST_OPTIONS.h

Just to document everything, I am still going to add this to the list of issues to deal with.

@Shreyas911
Copy link
Collaborator Author

@jm-c, is it possible to introduce a new checkpoint so that people can clone get Tapenade capabilities in the latest tagged release?

@jm-c
Copy link
Member

jm-c commented Jun 28, 2023

@Shreyas911 I agree, it's a good idea to make a new checkpoint. Will take care of this later this week.

@jm-c
Copy link
Member

jm-c commented Jun 29, 2023

@Shreyas911 I 've just made a new tag "checkpoint68q"

@Shreyas911
Copy link
Collaborator Author

Thanks a lot, @jm-c!

@dngoldberg
Copy link
Collaborator

hi @Shreyas911 i have been doing some work on my fork to ensure tapenade doesn't differentiate through the STREAMICE linear solve and am happy with the accuracies for the verification. I will try to tackle the (more difficult) Christianson fixed point treatment with the tapenade FP-LOOP directive next. However, I am a bit confused about recomputation, I think Im seeing more than i would have expected. Can I ask for detail on how checkpointing is implemented with the tapenade adjoint? Is it revolve, or more similar to the "level" scheme with TAF, or something else? @heimbach tagging you as well, in case you know..

@Shreyas911
Copy link
Collaborator Author

Hi @dngoldberg ,

It might be best to check with Laurent Hascoet. My use case of checkpointing has typically just involved using the Tapenade binomial checkpointing pragmas and I have always treated it as a black box. Happy to get on a Zoom call if needed with you and Laurent. :)

Best,
Shreyas

@jm-c
Copy link
Member

jm-c commented Jul 20, 2023

@Shreyas911 and @mjlosch and @dngoldberg: I would like to add a comment (number 3) that Martin wrote about PR #746 :

halfpipe_streamice/code_tap/ctrl_map_gentim2d.F and
halfpipe_streamice/code_tap/ctrl_map_gentim2d.F are copies of code_oad, but they have been put there because OpenAD could not handle the default
code. I think Tapenade can and the experiments could be modeled after taf-versions, requiring fewer customised files.

I think it could be listed here, but would appreciate your opinion.

@dngoldberg
Copy link
Collaborator

@jm-c @mjlosch @Shreyas911 I have a branch in my fork where i have been addressing a few issues, one of which relates to this. The branch attempts to bring the cost and control framework of streamice (with openad) somewhat in line with other packages, so that bespoke files are not needed for cost or controls.

There are only a few reasons that ctrl_map_gentim2d.F, ctrl_map_ini_gentim2d.F and ctrl_map_ini_genarr.F are in the code_oad folder. One is that the activity analysis cannot seem to follow the assignments by parameter in ctrl_map_ini_genarr.F, but @mjlosch is ok with the fix for this. The other is that the OpenAD compiler cannot deal with referencing of characters of strings withing arrays, i.e. ctrl_name(iarr)(1:IL). This is trivial to handle with intermediate string variables, if you would be OK with this.

I was going to work on a few other changes for my PR but with this in mind I am going to submit soon, and hope that you and/or Martin will be able to review it. If it is approved, this will remove the offending files.

@jm-c
Copy link
Member

jm-c commented Mar 26, 2024

Some updates:

  1. I was assuming that, by downloading a tapenade tar file with a given name (e.g.tapenade_3.16.tar) , I would always get the same version as long as the tar file name was the same, but this is not the case. So it's likely that some of my comments in PR Introducing adjoint and TLM capabilities with Tapenade #685 were not right regarding how tapenade works on different platforms (as I might have mixed different versions).
  2. The regression tests that I run (the day following an update to MITgcm git repos) are without "-devel" and using:

Tapenade 3.16 (develop) - 21 Mar 2023 19:38 - Java 17.0.8 Linux

  1. ussing a more recent version: Tapenade 3.16 (develop) - 20 Mar 2024 18:20, I am getting the same results, and works as well on a different computer (Ubuntu instead of Fedora 37). I checked that when using "-devel", the same set of exp. stop to run because of "out of bound" indices, but this is not a serious issue (just that Tapenade abuse a little bit F77 argument arrays).
  2. I am going to try to run the testreport on an other different machine (it requires some memory) to see where we could more the regression test.

@Shreyas911
Copy link
Collaborator Author

@jm-c, thanks for all the work! I have added links to the docker images in the list above. I have tested the AMD image on TACC using apptainer. It seems to work fine for testing. Can you check it out?

@Shreyas911
Copy link
Collaborator Author

Shreyas911 commented Apr 12, 2024

@jm-c @mjlosch , quick question - When we run testreport and it reports There were 16 decimal places of similarity for "ADM CostFct", it means this when compared against the reference value of ADM CostFct in the reference output files in the subdirectory results, correct?

@mjlosch
Copy link
Member

mjlosch commented Apr 15, 2024

@jm-c @mjlosch , quick question - When we run testreport and it reports There were 16 decimal places of similarity for "ADM CostFct", it means this when compared against the reference value of ADM CostFct in the reference output files in the subdirectory results, correct?

Yes, all relative to results/output_adm*.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants