Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Jitter issues with rmarkdown #901

Open
1 task done
MOshima-PIFSC opened this issue Jan 11, 2024 · 27 comments
Open
1 task done

[Bug]: Jitter issues with rmarkdown #901

MOshima-PIFSC opened this issue Jan 11, 2024 · 27 comments
Assignees
Labels

Comments

@MOshima-PIFSC
Copy link
Contributor

Describe the bug

I'm trying to create a vignette for ss3diags showing a jitter analysis but I am getting an error when I try to render it using the github action pkgdown.yml. I believe the issue is with running jitter() in parallel, because the backtrace from the failed action shows:
` 1. ├─r4ss::jitter(...)

  1. │ └─furrr::future_map_dbl(...)

  2. │ └─furrr:::furrr_map_template(...)

  3. │ └─furrr:::furrr_template(...)

  4. │ └─future::future(...)
    ......`
    I was wondering if there is a way to still run jitter without the parallelization? Or if you have a suggestion for another workaround? Thanks!

To Reproduce

  1. Go to https://github.com/PIFSCstockassessments/ss3diags/actions/runs/7481127357
  2. Select re-run jobs/re-run failed jobs
  3. View action output

Expected behavior

Action will fail at Deploy Package step when rendering jitter.rmd.

Screenshots

No response

Which OS are you seeing the problem on?

Linux

Which versions of r4ss and SS3 are you seeing the problem on?

r4ss: 1.48.1, ss3: v3.30.21

Additional Context

No response

Code of Conduct

@e-perl-NOAA
Copy link
Contributor

@okenk was involved with adding the parallel part to the jitter function so maybe she has a suggestion.

@okenk
Copy link
Contributor

okenk commented Jan 11, 2024

Sorry it isn’t working, I had tried to ensure back compatibility.

It will not run in parallel unless you have set up future to do so on your machine previously in your R session. Otherwise it will use the future functions, but run everything in serial, as if you have used purrr::map.

To debug future functions, I have found it helpful to run future::plan(future::sequential, split =TRUE) which will allow you to use debug() and browser()

@MOshima-PIFSC
Copy link
Contributor Author

Thanks @okenk. Since I am running this in a github action I am not sure if/how I can use the debug and browser functions to help debug the issue. I tried including the future package in the r dependencies in the workflow but that doesn't seem to help. Do you or anyone else have another suggestion for me to try?
Thanks!

@iantaylor-NOAA
Copy link
Contributor

Sorry @MOshima-PIFSC, I was away over a long weekend, but I can look at this now.

@e-perl-NOAA
Copy link
Contributor

@MOshima-PIFSC - is it giving you the error when running locally if you just try to run the lines in the vignette or are you only getting the error when using the github action?

@MOshima-PIFSC
Copy link
Contributor Author

@e-perl-NOAA I only get the error when using github action. I can run the few lines and also knit the rmd locally no problem.

@e-perl-NOAA
Copy link
Contributor

e-perl-NOAA commented Jan 17, 2024

Hmmm, then I wonder if it's not specifically an r4ss issue, at least in that function. I have a couple of suggestions that may or may not help but worth a shot.

  1. Try adding the following to your yaml file for pkgdown. Sometimes updating things on the machine can solve problems for some reason.
- name: updates
        run: sudo apt-get update
  1. Try adding rmarkdown to the packages you are installing in the pkgdown yaml. Might be worth adding purrr and furrr in there as well just in case.
  2. Last suggestion for now, try using install_github("r4ss/r4ss") in your vignette instead of using pak, some people had issues with pak but didn't with install_github().

If none of these work, then I can go back to the drawing board (google) and poke around some more to see what other things might be causing this issue.

@MOshima-PIFSC
Copy link
Contributor Author

Thanks @e-perl-NOAA. I gave all three of those suggestions a try and am still getting the same error. I thought installing purrr and furrr might have been the fix but doesn't seem to be the case. I wonder if it has to do with the type of machine I'm using for github actions?

@e-perl-NOAA
Copy link
Contributor

I'm going to fork the repo and add a codespace to test it on (theoretically) the same type of machine that it would be running on using github actions

@e-perl-NOAA
Copy link
Contributor

Okay so I'm walking through the vignette on a codespace that's a linux machine and the first problem that I run into is that the model fails to run with r4ss::run() - the output is

 reading from starter.ss
 reading from data file
Data read sucessful 999

 reading forecast file 
 reading from control file
End of control file successful! 999
Begin setting up parameters ...  read parm file
Error trying to open parameter input file ss.par
[1] "model run failed"

I would need to test more locally to see what's going on but it's the end of my day today. I will try to work on testing some more tomorrow or Friday.

@iantaylor-NOAA
Copy link
Contributor

I'm sorry I haven't had any luck with my limited exploration of this issue either.

@MOshima-PIFSC
Copy link
Contributor Author

I also set up a codespace to debug. I was able to get through r4ss::run(), however, I am still getting the same error for r4ss::jitter(). I also tried changing the type of machine to use a windows machine for the action but that did not work either.

@e-perl-NOAA
Copy link
Contributor

I have tried running it locally on my windows computer and I'm running into the same error. I think it has to do with the file names. The error I get when running the vignette is with the jitter function where it stops after showing this error

Warning: No Report.sso file found from run 1Error in (function (.x, .f, ..., .progress = FALSE)  :

When I look at the files in dir_jitter, the report file is named Report0.sso, so I wonder if that is what is causing the problem.

@MOshima-PIFSC
Copy link
Contributor Author

@e-perl-NOAA and @iantaylor-NOAA I still have not made much progress figuring out what is causing this error and the inconsistencies in me and @e-perl-NOAA's errors is confusing. I opened a codespace and was running the rmarkdown line by line. I got the same error when I got the jitter function

Error in (function (.x, .f, ..., .progress = FALSE)  : ℹ In index: 1.
Caused by error:
! Result must be length 1, not 0.

so I copied the jitter code and ran through the function line-by-line. When I got to

likesaved <- furrr::future_map_dbl(Njitter, function(.x) {
    iterate_jitter(
      i = .x,
      dir = dir,
      printlikes = printlikes,
      exe = exe,
      verbose = verbose,
      init_values_src = starter[["init_values_src"]],
      ...
    )
  })

I got an error but when I removed the ... the function ran and produced likelihoods from 5 runs. I'm not sure if this is helpful but its some progress maybe. I will be on leave all week next week but keep me posted if you figure out a solution.
A potential workaround for this could be to keep the code but set eval=F in the run-jitter chunk and then have the outputs saved as an rdata object that I could read in later in the rmarkdown for the plots. However, I would prefer to run ~50 jitters so that object could get pretty large and would put us back to the issue we had before of the package being too large.
Again, really appreciate your help with this!

@iantaylor-NOAA
Copy link
Contributor

I think I now see what's going on. The jitter function renames the original files to add 0 at the end so they don't get overwritten by each new run. However, if the model uses the .par file, the ss.par file got added back.
In the changes to add parallelization that @okenk implemented in October, that adding back step seems to have gotten lost: 68b7b86#diff-63b0e69443d432a2baee832a6ada2edaf390a6ed7c923cb4080ae3bb1ec7ba19L148-L152.

@okenk is on leave this week but is likely the best person to figure out how best to keep or restore the .par file in the context of the new parallelization option.

@MOshima-PIFSC, if you change from init_values_src = 1 to init_values_src = 0 on this line, that should avoid the issue: https://github.com/PIFSCstockassessments/ss3diags/blob/FEAT-cookbook-recipe-articles/vignettes/articles/Jitter.Rmd#L57.

Here's some background on my debugging in case anyone is interested. Thanks to some tips from @e-perl-NOAA, I was able to work on the vignette locally and I got this error:

jit.likes <- r4ss::jitter(dir = dir_jitter, 
                              Njitter = Njitter, 
                              init_values_src = 1, 
                              jitter_fraction = 0.1,
                              exe = "ss3.exe",
                              printlikes = FALSE,
                              verbose = FALSE)

Error in (function (.x, .f, ..., .progress = FALSE)  : 
  i In index: 1.
Caused by error:
! Result must be length 1, not 0.
In addition: Warning message:
In system2(command = command, args = extras, stdout = ifelse(show_in_console,  :
  running command '"ss3.exe" ' had status 1

Then running the executable via command line or via run() (and making sure to see the console output via show_in_console = TRUE shows that missing ss.par is the problem.

> run(dir_jitter, show_in_console = TRUE)
Executable found in directory C:\Users\IAN~1.TAY\AppData\Local\Temp\RtmpILKXAD/jitter
Changing working directory to C:\Users\IAN~1.TAY\AppData\Local\Temp\RtmpILKXAD/jitter and running model using the command: ss3.exe 
 reading from starter.ss
 reading from data file
Data read sucessful 999

 reading forecast file
 reading from control file
End of control file successful! 999
Begin setting up parameters ...  read parm file
Error trying to open parameter input file ss.par
[1] "model run failed"

@okenk
Copy link
Contributor

okenk commented Jan 22, 2024

I am happy to work on this, but am not sure the problem has been diagnosed. It looks like @iantaylor-NOAA ran SS3 within his own base jitter directory, dir_jitter. jitter() executes run() within a subdirectory of dir_jitter.

To start, @iantaylor-NOAA, could you explain what you mean by:

However, if the model uses the .par file, the ss.par file got added back

The par file from the original model does get copied into the subdirectory where SS3 is run if the starter file has init_values_src = 1 . But maybe I am misunderstanding what you said.

Further, it sounds like it is working locally on @MOshima-PIFSC's computer, so I am a bit confused why it is not working locally for @iantaylor-NOAA and @e-perl-NOAA. It sounds like @MOshima-PIFSC diagnosed that the issue is the ... in the anonymous function. I find this totally believable because we had issues with that earlier, though I thought we had fixed it.

@okenk
Copy link
Contributor

okenk commented Jan 22, 2024

Ah, I think that the error that @iantaylor-NOAA identified is independent and caused by a different bug where jitter() was not working if your working directory when you called the function was different than the base model directory. I can fix that, but it will not solve @MOshima-PIFSC's problem.

okenk added a commit to okenk/r4ss that referenced this issue Jan 22, 2024
@okenk
Copy link
Contributor

okenk commented Jan 22, 2024

@MOshima-PIFSC, I forked {ss3diags} and changed the {r4ss} dependency to a jitter-fix branch of my {r4ss} fork. I then changed the way the ... is implemented, per this recommendation by @kellijohnson-NOAA -- should have listened to you from the start instead of being lazy and keeping it as it was. 😄 I think the pkgdown action now successfully runs! But do check yourself to be sure.

@iantaylor-NOAA, the jitter-fix branch should also fix the bug you were getting, which is totally separate. I will start a pull request once the checks are done.

@okenk okenk mentioned this issue Jan 22, 2024
@MOshima-PIFSC
Copy link
Contributor Author

Thanks @okenk for working on this, sorry for the delayed response. I tried building the pkgdown site from your fork but I am still getting the same error. I should mention that I'm working off of the FEAT-cookbook-recipe-articles branch currently and the jitter vignette isn't on the main branch. So that may be why your action passed.

@okenk
Copy link
Contributor

okenk commented Jan 31, 2024

Hmm, yes, I was probably using the wrong branch to test the GitHub action. Bummer! I am at a workshop this week but will try to get to this on Friday. (Someday maybe we will be able to work on this concurrently. Haha.)

@iantaylor-NOAA, maybe you did correctly diagnose the issue but I didn’t understand what you were saying?

@iantaylor-NOAA
Copy link
Contributor

I got wrapped up in meetings last week and dropped the ball on testing this, so tahnk you @MOshima-PIFSC.
I'll take a look as soon as I can.

@e-perl-NOAA
Copy link
Contributor

e-perl-NOAA commented Feb 1, 2024

I think part of the problem is the file_increment() function. I don't think it's needed anymore and is renaming files but then the other r4ss functions aren't reading the new file names correctly so it's not copying them. When I comment that out and start running the jitter function line by line with the inputs from the vignette, it seems to be working.

@e-perl-NOAA
Copy link
Contributor

e-perl-NOAA commented Feb 2, 2024

@MOshima-PIFSC can you try using the try-fix-jitter branch with your package? Would probably require using install_github("r4ss/r4ss", ref = "try-fix-jitter") in the vignette temporarily. (this branch isn't passing r cmd check right now because of some par file issues that have arisen after an ss3 change that I'm working on fixing now in r4ss but the jitter function should be fine)

@MOshima-PIFSC
Copy link
Contributor Author

@e-perl-NOAA that seems to be working now! One thing I did notice is the original Report file isn't saved as Report0.sso anymore so for the SSgetoutput(keyvec) I needed to switch from 0:Njitter to c("", 1:Njitter). It's not a big deal but may give some people some issues at first if they have older code they are working from.
Thanks again!

@e-perl-NOAA
Copy link
Contributor

@MOshima-PIFSC is it just the Report file that needs to be saved with a 0 after? If so that would be easy enough to add in.

@MOshima-PIFSC
Copy link
Contributor Author

@e-perl-NOAA I think it would need to be the files that get read in with SSgetoutput, so that looks like it include Report.sso, CompReport.sso, covar.sso, Forecast-report.sso, and warning.ss. It doesn't look like the jitter function runs the reference model before starting the jitters, which is maybe when the names were changed (from Report.sso to Report0.sso) in the previous code? I think that's ok, as long as the user is aware that the reference model won't be in the directory, unless they've run it separately.

@iantaylor-NOAA
Copy link
Contributor

@e-perl-NOAA, thanks for the proposed solution and @MOshima-PIFSC, for the feedback on this.
Let me try restoring the renaming of ss.par_0.sso instead of removing the file_increment() command, which should make it more similar to how it worked in the past.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants