Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ie #57

Draft
wants to merge 39 commits into
base: main
Choose a base branch
from
Draft

Ie #57

wants to merge 39 commits into from

Conversation

jmsull
Copy link
Collaborator

@jmsull jmsull commented Feb 5, 2022

Toward a hierarchy-less version - eventually want to compare speed against hierarchy.

@codecov-commenter
Copy link

codecov-commenter commented Feb 5, 2022

Codecov Report

Merging #57 (edbaa86) into main (de08e4d) will decrease coverage by 21.71%.
The diff coverage is 0.28%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@             Coverage Diff             @@
##             main      #57       +/-   ##
===========================================
- Coverage   46.09%   24.39%   -21.71%     
===========================================
  Files           7        8        +1     
  Lines         692     1279      +587     
===========================================
- Hits          319      312        -7     
- Misses        373      967      +594     
Impacted Files Coverage Δ
src/Bolt.jl 100.00% <ø> (ø)
src/ie.jl 0.00% <0.00%> (ø)
src/ionization/ionization.jl 22.03% <ø> (-2.56%) ⬇️
src/ionization/recfast.jl 91.78% <ø> (-0.08%) ⬇️
src/perturbations.jl 0.00% <0.00%> (ø)
src/spectra.jl 0.00% <0.00%> (ø)
src/background.jl 80.00% <28.57%> (-6.37%) ⬇️

... and 1 file with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Collaborator Author

@jmsull jmsull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what I have so far for IE solver

@jmsull
Copy link
Collaborator Author

jmsull commented Jun 2, 2022

About to add a cleaned up first scipt version (ie_single_k.jl) that (finally!) works. It is slow and is only tested for a single relatively large k (~0.03 h/Mpc), but it works!!

In action with a starting guess of zero (for quadrupole parts) on the photon monopole:
ie_temp_mono

Upcoming improvements:

  • clean up the iteration code
  • try different (namely, higher) k
  • set some scheme for the time integration points that is more efficient than equispaced in x (to go faster)
  • speed up the iteration (+ better initial guess for Pi)

@jmsull
Copy link
Collaborator Author

jmsull commented Jun 2, 2022

The code I pushed has the file I described above, which uses the equations here (but in conformal newton gauge as I wrote out in the notes).

I also turned the RSA off in ie.jl (which is a file that only evolves the low-order moments of the photons and uses the splines for quadrupoles).

Some notes about this last point:

RSA serves two purposes:

  1. Give the solver less computational work to do by reducing the number of ODEs that need to be solved at late times.
  2. Prevent boundary condition reflection blow up of photon perturbations.

The latter is entirely a numerical artefact of the hierarchy - with the ie solution we can properly resolve photon oscillations without the boundary issue. While aesthetically pleasing to say we don't need to make the RSA assumption (for a finite number of multipoles), it is probably desirable from a computational standpoint to continue using RSA. But we no longer absolutely have to to get a correct answer (and I wonder if there are any science cases for late-time radiation-like perturbations...).

@xzackli
Copy link
Owner

xzackli commented Jun 3, 2022

This PR is at a very exciting step! I'm finding your code to be very clear, especially for a first draft.

For my own understanding, I still need to do a bit of derivation to go from the hierarchy to the IE, as you've done in your notes!

@jmsull
Copy link
Collaborator Author

jmsull commented Feb 9, 2023

Going to start posting some numerical experiment results (plots/timings/norms) here for the massless neutrino FFT dependence on numerical parameters. (Everything here is in conformal time integration, though I don't find much of a difference between stepping in $\eta$ and $x$ in overall accuracy).

Varying:

  • Full-to-truncated switch $\eta_{\mathrm{switch}}$
  • Number of FFT points $M$
  • Number of iterations $N_{I}$

Also will look at truncating the number of multipoles in the initial solve and using a better initial ansatz, as by default here I will use $\ell_{\nu} = 50$ in the initial full hierarchy and the dumbest ansatz, which is an array of all zeros.

Timings should be taken with a grain of salt because the FFT code is 1. Unoptimized and 2. Still carrying the weight of the full hierarchy for the photons and massive neutrinos, but I think are still useful for a rough indication at least in the tradeoff of $M$ vs $N_{I}$.

@jmsull
Copy link
Collaborator Author

jmsull commented Feb 10, 2023

Fiducial values of the main parameters (fixed to this unless otherwise stated): $\eta_{\mathrm{switch}} = 1 \mathrm{Mpc}$, $M=8192$, $N_{I}=5$. And right now this is all for $k=0.03 h/\mathrm{Mpc}$.

Vary $M$ and $N_{I}$ at fixed switch

mslss_k0 03_switch1 1_elmax50_zeroini

Black is the hierarchy. Clearly more M helps. Here is some summary output:

3.914403 seconds (621.16 k allocations: 219.230 MiB, 15.97% gc time)
(M = 4096, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 0.024327736624258293,
ℓ=2: 0.004968625571585811

3.337983 seconds (975.75 k allocations: 391.330 MiB, 1.96% gc time)
(M = 8192, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 5.628201856013467e-5,
ℓ=2: 9.69840450718242e-6

3.989557 seconds (1.71 M allocations: 736.494 MiB, 5.39% gc time)
(M = 16384, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 9.016324731957222e-6,
ℓ=2: 1.6088676485945645e-6

Without any look at the flamegraph it seems like most of the time is not going into the FFT, since increasing M doesn't affect the walltime (which, note, is @time and not @Btime), even though it affects the accuracy.

Note here that the error is sum( (iter-hierarchy)^2 /M )

@jmsull
Copy link
Collaborator Author

jmsull commented Feb 10, 2023

Here is the same test at higher $k = 0.3 ~h/\mathrm{Mpc}$:

mslss_k0 30_switch1 1_elmax50_zeroini

14.687309 seconds (3.66 M allocations: 420.126 MiB, 0.61% gc time, 14.60% compilation time)
(M = 4096, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 0.0027027526587004207,
ℓ=2: 0.0005436380175114952

12.744275 seconds (1.48 M allocations: 456.152 MiB, 0.79% gc time)
(M = 8192, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 1.7713887051272318e-5,
ℓ=2: 2.6232322357963993e-6

12.747244 seconds (2.20 M allocations: 799.949 MiB, 1.47% gc time)
(M = 16384, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 6.691494413715124e-6,
ℓ=2: 9.531342334122629e-7

Obviously this takes much longer (but we might explore relaxing the tolerance?). Either way convergence results are qualitatively similar (error is actually lower than for lower k).

Again most of the time is probably spent in the truncated ODE solve, and I would imagine dropping the photon and massive neutrino hierarchies will help down the line.

To follow up on this, the time for the solutions of the "late" full hierarchy is
3.097427 seconds (91.47 k allocations: 16.091 MiB)
while the "early" time is less than half a second.

Each truncated solve costs around 2.4 seconds
2.357683 seconds (115.19 k allocations: 17.830 MiB)

  • which checks out with the 5 iter time of 12.7ish above.

So we are saving only around 30% of the runtime by truncating the neutrinos. Will revisit these timings again later, but this helps (at least me) get oriented.

@jmsull
Copy link
Collaborator Author

jmsull commented Feb 10, 2023

Vary the switch and $M$ at fixed $N_{I}$

Here $N_{I}$ is fixed to 5 and I vary the switch from $1 \mathrm{Mpc}$ to [0.5,10.,100.] $\mathrm{Mpc}$. $M$ is varied as above.

For reference, for this k-mode, the "horizon scale" (for which there is no obvious definition) is where $\frac{k}{\mathcal{H}}=1$ is $\eta=37 \mathrm{Mpc}$.
Note that the place where the variation in the monopole transitions from exponential to linear happens after this...

mslss_k0 03_switch0 5_elmax50_zeroini

Output:

2.477748 seconds (640.01 k allocations: 220.874 MiB, 1.51% gc time)
(M = 4096, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 44.97438381582731,
ℓ=2: 9.615684393834469

2.272631 seconds (996.48 k allocations: 391.946 MiB, 3.80% gc time)
(M = 8192, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 0.05185301616318226,
ℓ=2: 0.010879956079524676

2.579722 seconds (1.72 M allocations: 737.093 MiB, 6.44% gc time)
(M = 16384, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 9.873425689145687e-5,
ℓ=2: 1.799027061434055e-5

mslss_k0 03_switch1 0_elmax50_zeroini

Output:

2.211860 seconds (618.90 k allocations: 218.756 MiB, 2.60% gc time)
(M = 4096, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 0.05024428535457524,
ℓ=2: 0.010362831120741907

2.343475 seconds (979.42 k allocations: 391.376 MiB, 5.47% gc time)
(M = 8192, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 9.89168854707918e-5,
ℓ=2: 1.764483646486915e-5

2.749802 seconds (1.72 M allocations: 736.774 MiB, 5.79% gc time)
(M = 16384, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 9.303373602457604e-6,
ℓ=2: 1.617072960746139e-6

mslss_k0 03_switch10 0_elmax50_zeroini

Output:

2.489212 seconds (591.94 k allocations: 217.901 MiB, 1.63% gc time)
(M = 4096, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 1.0948808134474916e-5,
ℓ=2: 2.0282281959209817e-6

3.096040 seconds (963.15 k allocations: 390.454 MiB, 3.00% gc time)
(M = 8192, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 9.158093237790724e-6,
ℓ=2: 1.7164577184716518e-6

2.870788 seconds (1.70 M allocations: 735.816 MiB, 6.01% gc time)
(M = 16384, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 8.752670467463331e-6,
ℓ=2: 1.6362669057618164e-6

mslss_k0 03_switch100 0_elmax50_zeroini

Output:

1.982466 seconds (567.07 k allocations: 216.046 MiB, 2.08% gc time)
(M = 4096, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 6.783373104924727e-5,
ℓ=2: 3.2583090867381304e-5

2.086790 seconds (931.32 k allocations: 388.667 MiB, 3.72% gc time)
(M = 8192, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 6.71768974129022e-5,
ℓ=2: 3.0038297397818197e-5

2.421192 seconds (1.67 M allocations: 733.976 MiB, 6.74% gc time)
(M = 16384, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 6.763186632413128e-5,
ℓ=2: 2.8919112145994086e-5

#--------

Again, here is the same thing for the higher-$k$ mode ($k=0.03~h/\mathrm{Mpc}, "horizon scale" 3.4 Mpc):
mslss_k0 30_switch0 5_elmax50_zeroini

Output:

9.759966 seconds (1.10 M allocations: 281.695 MiB)
(M = 4096, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 0.0021914827906542155,
ℓ=2: 0.0004347068032540287

8.746645 seconds (1.49 M allocations: 455.735 MiB, 5.16% gc time)
(M = 8192, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 2.4660373189609712e-5,
ℓ=2: 2.686193755159691e-6

7.930377 seconds (2.20 M allocations: 799.368 MiB, 1.31% gc time)
(M = 16384, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 8.324277287684964e-6,
ℓ=2: 1.0597585399984886e-6

mslss_k0 30_switch1 0_elmax50_zeroini

Output:

8.164170 seconds (1.11 M allocations: 282.128 MiB, 0.80% gc time)
(M = 4096, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 5.9543352472409525e-5,
ℓ=2: 7.192616175038779e-6

8.267998 seconds (1.45 M allocations: 453.342 MiB, 1.47% gc time)
(M = 8192, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 1.5593479481400864e-5,
ℓ=2: 1.9421464632261884e-6

8.791901 seconds (2.20 M allocations: 798.750 MiB, 1.27% gc time)
(M = 16384, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 6.781303864436087e-6,
ℓ=2: 9.661082440228582e-7

mslss_k0 30_switch10 0_elmax50_zeroini

Output:

8.845261 seconds (1.06 M allocations: 279.297 MiB, 0.77% gc time)
(M = 4096, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 3.380758869776408e-5,
ℓ=2: 9.897320474102225e-6

8.691568 seconds (1.44 M allocations: 452.829 MiB, 0.70% gc time)
(M = 8192, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 1.416734005629872e-5,
ℓ=2: 5.76125185991097e-6

9.181634 seconds (2.18 M allocations: 798.308 MiB, 1.96% gc time)
(M = 16384, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 9.540303946926921e-6,
ℓ=2: 4.245814075267681e-6

mslss_k0 30_switch100 0_elmax50_zeroini

Output:

8.330425 seconds (1.05 M allocations: 278.212 MiB, 0.73% gc time)
(M = 4096, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 6.854331049445173e-6,
ℓ=2: 2.56038594633846e-6

8.464808 seconds (1.42 M allocations: 451.609 MiB, 0.65% gc time)
(M = 8192, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 6.4147161848718985e-6,
ℓ=2: 2.4778285049901822e-6

8.088516 seconds (2.16 M allocations: 796.813 MiB, 2.21% gc time)
(M = 16384, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 6.309153736510417e-6,
ℓ=2: 2.4354245288514206e-6

Walltime decreases as we start later (makes sense, less ODE to solve).
The error metric also decreases as we start later. At face value this makes no sense, but looking at the plots for later switches (if you squint) you can see that this is because the later starts only have to do the FFT in the easy-to-FFT nearly flat wiggle zone, and I am only measuring the error where we actually do the FFT.

@jmsull
Copy link
Collaborator Author

jmsull commented Feb 10, 2023

Hmm I may need to fix that first plot for $k=0.03$ but it should agree quite well for higher $M$

@jmsull
Copy link
Collaborator Author

jmsull commented Feb 12, 2023

Here is that first $k=0.03$ plot at higher $M$ where the range of the plot is smaller (though only fair comparison with others is the lowest $M$ here vs highest $M$ in other plots)
mslss_k0 03_switch0 5_elmax50_zeroini

@jmsull
Copy link
Collaborator Author

jmsull commented Feb 13, 2023

Now that these experiments are done I will move on to massive neutrinos. I won't repeat all these tests but will post a few quick checks that nothing is broken. In principle, massless neutrinos should be the "worst" for the purposes of the iterative convolution integral since they are the least smooth (have the most high-frequency content) of all the neutrinos, but I could be wrong...

Once that's done, I will:

  • Move all the FFT iteration tools into another file
  • Build the infrastructure for mixing photon iteration with neutrino iteration
  • Make the "sanity check" into a testsuite test with some tolerance

We might want to pick a tolerance rather than a fixed number of iterations in the future, since it will be a huge waste of time for the user if we take 5 iterations when using a good ansatz that would give convergence in 1 step.

@jmsull
Copy link
Collaborator Author

jmsull commented Mar 28, 2023

These last few commits remove the monopole and quadrupole entirely from the neutrino solution in the truncated hierarchies (per @marius311's correction of my oversight there).
This gets rid of these weird singular matrix and dtmin errors (woo!).
The massive neutrino iteration experiments are not working from constant/zero ansatz unless I use an insane amount of FFT points, so will need to spend a little time debugging...

@jmsull
Copy link
Collaborator Author

jmsull commented Mar 28, 2023

So I'm going to ignore it for now, but the conformal time version of the iteration is not working after the first iteration for some reason.
When I use the x-evolution for the truncated hierarchy solve there appears to be no problem at all and the results look quite nice. I didn't see anything like this for the massless neutrinos alone, but when massive neutrinos are included something strange is happening.
The iteration error does not get smaller in L2 sense (while it does for x iteration).

Here are some results using the x iteration (like the old massless experiments). Here I use a switch of 5.0 Mpc and $M=8192$ with a constant ansatz. Results are not perfect at early times (this gets better either with a later $\eta$ switch or with more $M$).

mssv_k0 03_q0_allconst_xiter_switch5 0

Here is q1 for massive neutrinos:
mssv_k0 03_q1log_allconst_xiter_switch5 0

Here is q15 as well:
mssv_k0 03_qendlog_allconst_xiter_switch5 0

These last couple are a bit challenging to see in log unfortunately

@jmsull
Copy link
Collaborator Author

jmsull commented Mar 28, 2023

Here also are the same things but with a Planck cosmology (full) hierarchy, but no neutrino mass, initialization:

massless:
mssv_k0 03_q0_plancknonu_xiter_switch5 0

massive q1:
mssv_k0 03_q1log_plancknonu_xiter_switch5 0

massive qend:
mssv_k0 03_qendlog_plancknonu_xiter_switch5 0

@jmsull
Copy link
Collaborator Author

jmsull commented Mar 28, 2023

Now (finally) moving on to photon iterative unification....

@jmsull
Copy link
Collaborator Author

jmsull commented Apr 4, 2023

With these changes, unification is complete! But there are some residual numerical issues with the k=1.0 h/Mpc mode I am looking at before moving on to performance (the sanity check passes for the other two modes we have been looking at k=0.03 and 0.3)

@jmsull
Copy link
Collaborator Author

jmsull commented Apr 6, 2023

The issue I mentioned with high-k mode probably needs to be investigated more, but there is nothing wrong with the iteration procedure - after sufficiently many iterations the result still converges.

Here also is a somewhat clean "experiment" file for the combined iteration where I started to look at performance.

@jmsull
Copy link
Collaborator Author

jmsull commented Apr 11, 2023

Added a first look at comparing solvers and the full vs truncated hierarchy times in in the "/scripts/combined_experiment_iter.jl" file

Getting an AD error for Rodas5P on the truncated hierarchy solve though, so will need to check that out....

Also AD fails due to the conformal time interpolator in the conformal solver

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants