Ie #57

jmsull · 2022-02-05T07:09:04Z

Toward a hierarchy-less version - eventually want to compare speed against hierarchy.

codecov-commenter · 2022-02-05T07:09:46Z

Codecov Report

Merging #57 (edbaa86) into main (de08e4d) will decrease coverage by 21.71%.
The diff coverage is 0.28%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@             Coverage Diff             @@
##             main      #57       +/-   ##
===========================================
- Coverage   46.09%   24.39%   -21.71%     
===========================================
  Files           7        8        +1     
  Lines         692     1279      +587     
===========================================
- Hits          319      312        -7     
- Misses        373      967      +594

Impacted Files	Coverage Δ
src/Bolt.jl	`100.00% <ø> (ø)`
src/ie.jl	`0.00% <0.00%> (ø)`
src/ionization/ionization.jl	`22.03% <ø> (-2.56%)`	⬇️
src/ionization/recfast.jl	`91.78% <ø> (-0.08%)`	⬇️
src/perturbations.jl	`0.00% <0.00%> (ø)`
src/spectra.jl	`0.00% <0.00%> (ø)`
src/background.jl	`80.00% <28.57%> (-6.37%)`	⬇️

... and 1 file with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

jmsull

what I have so far for IE solver

jmsull · 2022-06-02T20:37:51Z

About to add a cleaned up first scipt version (ie_single_k.jl) that (finally!) works. It is slow and is only tested for a single relatively large k (~0.03 h/Mpc), but it works!!

In action with a starting guess of zero (for quadrupole parts) on the photon monopole:

Upcoming improvements:

clean up the iteration code
try different (namely, higher) k
set some scheme for the time integration points that is more efficient than equispaced in x (to go faster)
speed up the iteration (+ better initial guess for Pi)

jmsull · 2022-06-02T21:39:30Z

The code I pushed has the file I described above, which uses the equations here (but in conformal newton gauge as I wrote out in the notes).

I also turned the RSA off in ie.jl (which is a file that only evolves the low-order moments of the photons and uses the splines for quadrupoles).

Some notes about this last point:

RSA serves two purposes:

Give the solver less computational work to do by reducing the number of ODEs that need to be solved at late times.
Prevent boundary condition reflection blow up of photon perturbations.

The latter is entirely a numerical artefact of the hierarchy - with the ie solution we can properly resolve photon oscillations without the boundary issue. While aesthetically pleasing to say we don't need to make the RSA assumption (for a finite number of multipoles), it is probably desirable from a computational standpoint to continue using RSA. But we no longer absolutely have to to get a correct answer (and I wonder if there are any science cases for late-time radiation-like perturbations...).

xzackli · 2022-06-03T02:26:35Z

This PR is at a very exciting step! I'm finding your code to be very clear, especially for a first draft.

For my own understanding, I still need to do a bit of derivation to go from the hierarchy to the IE, as you've done in your notes!

jmsull · 2023-02-09T17:42:24Z

Going to start posting some numerical experiment results (plots/timings/norms) here for the massless neutrino FFT dependence on numerical parameters. (Everything here is in conformal time integration, though I don't find much of a difference between stepping in $\eta$ and $x$ in overall accuracy).

Varying:

Full-to-truncated switch $\eta_{\mathrm{switch}}$
Number of FFT points $M$
Number of iterations $N_{I}$

Also will look at truncating the number of multipoles in the initial solve and using a better initial ansatz, as by default here I will use $\ell_{\nu} = 50$ in the initial full hierarchy and the dumbest ansatz, which is an array of all zeros.

Timings should be taken with a grain of salt because the FFT code is 1. Unoptimized and 2. Still carrying the weight of the full hierarchy for the photons and massive neutrinos, but I think are still useful for a rough indication at least in the tradeoff of $M$ vs $N_{I}$.

jmsull · 2023-02-10T08:09:12Z

Fiducial values of the main parameters (fixed to this unless otherwise stated): $\eta_{\mathrm{switch}} = 1 \mathrm{Mpc}$, $M=8192$, $N_{I}=5$. And right now this is all for $k=0.03 h/\mathrm{Mpc}$.

Vary $M$ and $N_{I}$ at fixed switch

Black is the hierarchy. Clearly more M helps. Here is some summary output:

3.914403 seconds (621.16 k allocations: 219.230 MiB, 15.97% gc time)
(M = 4096, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 0.024327736624258293,
ℓ=2: 0.004968625571585811

3.337983 seconds (975.75 k allocations: 391.330 MiB, 1.96% gc time)
(M = 8192, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 5.628201856013467e-5,
ℓ=2: 9.69840450718242e-6

3.989557 seconds (1.71 M allocations: 736.494 MiB, 5.39% gc time)
(M = 16384, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 9.016324731957222e-6,
ℓ=2: 1.6088676485945645e-6

Without any look at the flamegraph it seems like most of the time is not going into the FFT, since increasing M doesn't affect the walltime (which, note, is @time and not @Btime), even though it affects the accuracy.

Note here that the error is sum( (iter-hierarchy)^2 /M )

jmsull · 2023-02-10T08:39:38Z

Here is the same test at higher $k = 0.3 ~h/\mathrm{Mpc}$:

14.687309 seconds (3.66 M allocations: 420.126 MiB, 0.61% gc time, 14.60% compilation time)
(M = 4096, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 0.0027027526587004207,
ℓ=2: 0.0005436380175114952

12.744275 seconds (1.48 M allocations: 456.152 MiB, 0.79% gc time)
(M = 8192, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 1.7713887051272318e-5,
ℓ=2: 2.6232322357963993e-6

12.747244 seconds (2.20 M allocations: 799.949 MiB, 1.47% gc time)
(M = 16384, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 6.691494413715124e-6,
ℓ=2: 9.531342334122629e-7

Obviously this takes much longer (but we might explore relaxing the tolerance?). Either way convergence results are qualitatively similar (error is actually lower than for lower k).

Again most of the time is probably spent in the truncated ODE solve, and I would imagine dropping the photon and massive neutrino hierarchies will help down the line.

To follow up on this, the time for the solutions of the "late" full hierarchy is
3.097427 seconds (91.47 k allocations: 16.091 MiB)
while the "early" time is less than half a second.

Each truncated solve costs around 2.4 seconds
2.357683 seconds (115.19 k allocations: 17.830 MiB)

which checks out with the 5 iter time of 12.7ish above.

So we are saving only around 30% of the runtime by truncating the neutrinos. Will revisit these timings again later, but this helps (at least me) get oriented.

jmsull · 2023-02-10T09:54:06Z

Vary the switch and $M$ at fixed $N_{I}$

Here $N_{I}$ is fixed to 5 and I vary the switch from $1 \mathrm{Mpc}$ to [0.5,10.,100.] $\mathrm{Mpc}$. $M$ is varied as above.

For reference, for this k-mode, the "horizon scale" (for which there is no obvious definition) is where $\frac{k}{\mathcal{H}}=1$ is $\eta=37 \mathrm{Mpc}$.
Note that the place where the variation in the monopole transitions from exponential to linear happens after this...

Output:

2.477748 seconds (640.01 k allocations: 220.874 MiB, 1.51% gc time)
(M = 4096, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 44.97438381582731,
ℓ=2: 9.615684393834469

2.272631 seconds (996.48 k allocations: 391.946 MiB, 3.80% gc time)
(M = 8192, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 0.05185301616318226,
ℓ=2: 0.010879956079524676

2.579722 seconds (1.72 M allocations: 737.093 MiB, 6.44% gc time)
(M = 16384, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 9.873425689145687e-5,
ℓ=2: 1.799027061434055e-5

Output:

2.211860 seconds (618.90 k allocations: 218.756 MiB, 2.60% gc time)
(M = 4096, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 0.05024428535457524,
ℓ=2: 0.010362831120741907

2.343475 seconds (979.42 k allocations: 391.376 MiB, 5.47% gc time)
(M = 8192, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 9.89168854707918e-5,
ℓ=2: 1.764483646486915e-5

2.749802 seconds (1.72 M allocations: 736.774 MiB, 5.79% gc time)
(M = 16384, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 9.303373602457604e-6,
ℓ=2: 1.617072960746139e-6

Output:

2.489212 seconds (591.94 k allocations: 217.901 MiB, 1.63% gc time)
(M = 4096, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 1.0948808134474916e-5,
ℓ=2: 2.0282281959209817e-6

3.096040 seconds (963.15 k allocations: 390.454 MiB, 3.00% gc time)
(M = 8192, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 9.158093237790724e-6,
ℓ=2: 1.7164577184716518e-6

2.870788 seconds (1.70 M allocations: 735.816 MiB, 6.01% gc time)
(M = 16384, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 8.752670467463331e-6,
ℓ=2: 1.6362669057618164e-6

Output:

1.982466 seconds (567.07 k allocations: 216.046 MiB, 2.08% gc time)
(M = 4096, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 6.783373104924727e-5,
ℓ=2: 3.2583090867381304e-5

2.086790 seconds (931.32 k allocations: 388.667 MiB, 3.72% gc time)
(M = 8192, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 6.71768974129022e-5,
ℓ=2: 3.0038297397818197e-5

2.421192 seconds (1.67 M allocations: 733.976 MiB, 6.74% gc time)
(M = 16384, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 6.763186632413128e-5,
ℓ=2: 2.8919112145994086e-5

#--------

Again, here is the same thing for the higher-$k$ mode ($k=0.03~h/\mathrm{Mpc}, "horizon scale" 3.4 Mpc):
mslss_k0 30_switch0 5_elmax50_zeroini

Output:

9.759966 seconds (1.10 M allocations: 281.695 MiB)
(M = 4096, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 0.0021914827906542155,
ℓ=2: 0.0004347068032540287

8.746645 seconds (1.49 M allocations: 455.735 MiB, 5.16% gc time)
(M = 8192, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 2.4660373189609712e-5,
ℓ=2: 2.686193755159691e-6

7.930377 seconds (2.20 M allocations: 799.368 MiB, 1.31% gc time)
(M = 16384, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 8.324277287684964e-6,
ℓ=2: 1.0597585399984886e-6

Output:

8.164170 seconds (1.11 M allocations: 282.128 MiB, 0.80% gc time)
(M = 4096, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 5.9543352472409525e-5,
ℓ=2: 7.192616175038779e-6

8.267998 seconds (1.45 M allocations: 453.342 MiB, 1.47% gc time)
(M = 8192, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 1.5593479481400864e-5,
ℓ=2: 1.9421464632261884e-6

8.791901 seconds (2.20 M allocations: 798.750 MiB, 1.27% gc time)
(M = 16384, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 6.781303864436087e-6,
ℓ=2: 9.661082440228582e-7

Output:

8.845261 seconds (1.06 M allocations: 279.297 MiB, 0.77% gc time)
(M = 4096, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 3.380758869776408e-5,
ℓ=2: 9.897320474102225e-6

8.691568 seconds (1.44 M allocations: 452.829 MiB, 0.70% gc time)
(M = 8192, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 1.416734005629872e-5,
ℓ=2: 5.76125185991097e-6

9.181634 seconds (2.18 M allocations: 798.308 MiB, 1.96% gc time)
(M = 16384, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 9.540303946926921e-6,
ℓ=2: 4.245814075267681e-6

Output:

8.330425 seconds (1.05 M allocations: 278.212 MiB, 0.73% gc time)
(M = 4096, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 6.854331049445173e-6,
ℓ=2: 2.56038594633846e-6

8.464808 seconds (1.42 M allocations: 451.609 MiB, 0.65% gc time)
(M = 8192, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 6.4147161848718985e-6,
ℓ=2: 2.4778285049901822e-6

8.088516 seconds (2.16 M allocations: 796.813 MiB, 2.21% gc time)
(M = 16384, Nᵢ = 5),
error against full hierarchy is
ℓ=0: 6.309153736510417e-6,
ℓ=2: 2.4354245288514206e-6

Walltime decreases as we start later (makes sense, less ODE to solve).
The error metric also decreases as we start later. At face value this makes no sense, but looking at the plots for later switches (if you squint) you can see that this is because the later starts only have to do the FFT in the easy-to-FFT nearly flat wiggle zone, and I am only measuring the error where we actually do the FFT.

jmsull · 2023-02-10T09:55:06Z

Hmm I may need to fix that first plot for $k=0.03$ but it should agree quite well for higher $M$

jmsull · 2023-02-12T19:55:35Z

Here is that first $k=0.03$ plot at higher $M$ where the range of the plot is smaller (though only fair comparison with others is the lowest $M$ here vs highest $M$ in other plots)
mslss_k0 03_switch0 5_elmax50_zeroini

jmsull · 2023-02-13T00:34:05Z

Now that these experiments are done I will move on to massive neutrinos. I won't repeat all these tests but will post a few quick checks that nothing is broken. In principle, massless neutrinos should be the "worst" for the purposes of the iterative convolution integral since they are the least smooth (have the most high-frequency content) of all the neutrinos, but I could be wrong...

Once that's done, I will:

Move all the FFT iteration tools into another file
Build the infrastructure for mixing photon iteration with neutrino iteration
Make the "sanity check" into a testsuite test with some tolerance

We might want to pick a tolerance rather than a fixed number of iterations in the future, since it will be a huge waste of time for the user if we take 5 iterations when using a good ansatz that would give convergence in 1 step.

jmsull · 2023-03-28T05:51:11Z

These last few commits remove the monopole and quadrupole entirely from the neutrino solution in the truncated hierarchies (per @marius311's correction of my oversight there).
This gets rid of these weird singular matrix and dtmin errors (woo!).
The massive neutrino iteration experiments are not working from constant/zero ansatz unless I use an insane amount of FFT points, so will need to spend a little time debugging...

jmsull · 2023-03-28T19:34:35Z

So I'm going to ignore it for now, but the conformal time version of the iteration is not working after the first iteration for some reason.
When I use the x-evolution for the truncated hierarchy solve there appears to be no problem at all and the results look quite nice. I didn't see anything like this for the massless neutrinos alone, but when massive neutrinos are included something strange is happening.
The iteration error does not get smaller in L2 sense (while it does for x iteration).

Here are some results using the x iteration (like the old massless experiments). Here I use a switch of 5.0 Mpc and $M=8192$ with a constant ansatz. Results are not perfect at early times (this gets better either with a later $\eta$ switch or with more $M$).

Here is q1 for massive neutrinos:

Here is q15 as well:

These last couple are a bit challenging to see in log unfortunately

jmsull · 2023-03-28T20:49:09Z

Here also are the same things but with a Planck cosmology (full) hierarchy, but no neutrino mass, initialization:

massless:

massive q1:

massive qend:

jmsull · 2023-03-28T20:54:52Z

Now (finally) moving on to photon iterative unification....

jmsull · 2023-04-04T17:15:12Z

With these changes, unification is complete! But there are some residual numerical issues with the k=1.0 h/Mpc mode I am looking at before moving on to performance (the sanity check passes for the other two modes we have been looking at k=0.03 and 0.3)

jmsull · 2023-04-06T20:36:49Z

The issue I mentioned with high-k mode probably needs to be investigated more, but there is nothing wrong with the iteration procedure - after sufficiently many iterations the result still converges.

Here also is a somewhat clean "experiment" file for the combined iteration where I started to look at performance.

jmsull · 2023-04-11T19:53:34Z

Added a first look at comparing solvers and the full vs truncated hierarchy times in in the "/scripts/combined_experiment_iter.jl" file

Getting an AD error for Rodas5P on the truncated hierarchy solve though, so will need to check that out....

Also AD fails due to the conformal time interpolator in the conformal solver

jmsull added 3 commits February 4, 2022 18:08

Polarization integrand sort of works. Zero rsa for polzn hierarchy.

fd7dacb

Add temperature integrand, works in the easy test.

672517c

Clean up a little

d891052

This was referenced Feb 5, 2022

zero polarization in RSA #56

Closed

Cosmological perturbations without the Boltzmann hierarchy #49

Closed

xzackli mentioned this pull request Feb 12, 2022

Enzyme isn't ready for use with Bolt #59

Open

jmsull commented May 26, 2022

View reviewed changes

update ie

fe3af65

first working ie code

a6ff1f3

Delete single_ie_step.jl

9f3a538

jmsull added 10 commits June 3, 2022 14:15

rsa bugfix

35ea056

initial fft nu test, a few un-broadcasts

197c13d

sparser ie integration points

d44a5ed

Wrapper struct for crude conformal evolution

d7fb813

Clean a bit

9cb705a

IE clean some

bb39d07

correct clight

ef843f2

refactor ie and opt

8893b0a

bufgix in conformal_hierarchy

275963a

conformal ie

36999d8

jmsull added 11 commits February 17, 2023 15:02

convolution script

99c6378

dirty ie source

9ce2274

fix conv script

c8907bb

latexstrings

55d2800

Before swapping fully to no interp fft

36182b6

add ie array

8484967

fix massive nu indexing error bc I am dumb

0026323

fix insane interpolator typing bug..., fix eta2x

99d775d

drop monopoles and quadrupoles from IEnus

fe109ec

indexing bugfixes for x fftsolve

3bc9576

Get conformal to work as well, massless

736d6ac

jmsull added 2 commits March 28, 2023 13:51

dirty working massive file, dipole only msslss

247002c

clean massive experiments file

edbaa86

jmsull added 5 commits March 28, 2023 16:30

combined ienu funcs (untested), drop unused, rsa

75b822e

Combined hierarchy that runs

8874e7d

reduce Mpcfacs by bg.eta spline change

23e824d

Mpcfac roundoff error fixed in script

a1b5ef0

clean away old IE structs/functions

a13b7af

jmsull added 2 commits April 4, 2023 10:30

put back confhier

ddc5dd3

put final iterative functions in ie, experiments

8ac1e02

jmsull added 2 commits April 10, 2023 11:21

use planck ansatz

2479b12

solvers and ode function profiling v1

1e8f959

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ie #57

Ie #57

jmsull commented Feb 5, 2022 •

edited

codecov-commenter commented Feb 5, 2022 •

edited

jmsull left a comment •

edited

jmsull commented Jun 2, 2022 •

edited

jmsull commented Jun 2, 2022

xzackli commented Jun 3, 2022

jmsull commented Feb 9, 2023 •

edited

jmsull commented Feb 10, 2023 •

edited

jmsull commented Feb 10, 2023 •

edited

jmsull commented Feb 10, 2023

jmsull commented Feb 10, 2023

jmsull commented Feb 12, 2023

jmsull commented Feb 13, 2023 •

edited

jmsull commented Mar 28, 2023

jmsull commented Mar 28, 2023

jmsull commented Mar 28, 2023

jmsull commented Mar 28, 2023

jmsull commented Apr 4, 2023

jmsull commented Apr 6, 2023

jmsull commented Apr 11, 2023 •

edited

Ie #57

Are you sure you want to change the base?

Ie #57

Conversation

jmsull commented Feb 5, 2022 • edited

codecov-commenter commented Feb 5, 2022 • edited

Codecov Report

jmsull left a comment • edited

Choose a reason for hiding this comment

jmsull commented Jun 2, 2022 • edited

jmsull commented Jun 2, 2022

xzackli commented Jun 3, 2022

jmsull commented Feb 9, 2023 • edited

jmsull commented Feb 10, 2023 • edited

Vary $M$ and $N_{I}$ at fixed switch

jmsull commented Feb 10, 2023 • edited

jmsull commented Feb 10, 2023

Vary the switch and $M$ at fixed $N_{I}$

jmsull commented Feb 10, 2023

jmsull commented Feb 12, 2023

jmsull commented Feb 13, 2023 • edited

jmsull commented Mar 28, 2023

jmsull commented Mar 28, 2023

jmsull commented Mar 28, 2023

jmsull commented Mar 28, 2023

jmsull commented Apr 4, 2023

jmsull commented Apr 6, 2023

jmsull commented Apr 11, 2023 • edited

jmsull commented Feb 5, 2022 •

edited

codecov-commenter commented Feb 5, 2022 •

edited

jmsull left a comment •

edited

jmsull commented Jun 2, 2022 •

edited

jmsull commented Feb 9, 2023 •

edited

jmsull commented Feb 10, 2023 •

edited

jmsull commented Feb 10, 2023 •

edited

jmsull commented Feb 13, 2023 •

edited

jmsull commented Apr 11, 2023 •

edited