
Add polynomial regression example #782

Closed

Conversation

@tjburch (Contributor) commented Feb 21, 2024

This PR creates a new example notebook on polynomial regression using Bambi that I've been working on for the last few weeks. There are still some tweaks to the last example that need to be done; I'm hoping to get some feedback here.

Immediate questions:

  1. Is there any appetite for including this example in the first place? It's a bit atypical compared to the others in that it doesn't feature a unique functionality of Bambi itself, more the formula component. However, I think there are some nuances that are nice to explain. This example also shows the impact of setting priors and using them with an experiment in mind, so I think that's interesting too. However, if it's better suited for a personal blog, that's fine, happy to do that.
  2. If yes, there is a long aside in the middle, where I derive how orthogonal polynomials are created. Is that of interest, or beyond the scope? We could definitely find ways to trim it down, or cut it.
  3. I am pretty certain the last model suffers from some type of mis-specification; however, I've struggled to pin it down. I was hoping to be able to include a hierarchical effect on the linear term, but it doesn't seem to be fitting quite right. If anyone has a second to take a look at the DGP and the associated model and suggest other approaches, that'd be super helpful.

Thanks in advance for feedback; hopefully it's an interesting notebook regardless of whether it's suited for the example gallery or not.


@codecov-commenter commented Feb 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.03%. Comparing base (b5b9f09) to head (1b8a0b1).
Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #782      +/-   ##
==========================================
+ Coverage   89.86%   90.03%   +0.16%     
==========================================
  Files          46       47       +1     
  Lines        3810     3934     +124     
==========================================
+ Hits         3424     3542     +118     
- Misses        386      392       +6     

☔ View full report in Codecov by Sentry.

@ColCarroll (Collaborator)

As a bit of a side note, I thought this was really nice.

I found the bit on orthogonal polynomials really interesting; perhaps it deserves a separate notebook? In particular, I was disappointed when you teed up "these help with inference", then followed with "...but we won't use them". Show us the poly!

@tjburch (Contributor, Author) commented Feb 21, 2024

I wanted to add another example with poly but I was worried it was getting long as-is!

A separate notebook seems like a good solution: one more focused on the orthogonal polynomials with more of a statistical-modeling lean, and then the current notebook with the I(x**2) approach.
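
For concreteness, here's a minimal sketch of the two formula styles in question; the data here is made up for illustration, not the notebook's falling-object dataset:

import numpy as np
import pandas as pd
import bambi as bmb

# Made-up data purely for illustration
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": np.linspace(0, 1, 50)})
df["y"] = 1 + 2 * df["x"] - 3 * df["x"] ** 2 + rng.normal(0, 0.1, size=50)

# Raw polynomial term via I(), the current notebook's approach
raw_model = bmb.Model("y ~ x + I(x**2)", data=df)

# Orthogonal polynomial basis via poly(), for the proposed second notebook
ortho_model = bmb.Model("y ~ poly(x, 2)", data=df)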

@ColCarroll (Collaborator)

Also, my only productive feedback is that

df_falling.assign(tsquared=t**2)

is a neat trick to create a copy of df_falling with an extra column. I had trouble spotting it, but I'll try to remember to use that pattern more! I might be alone in missing it, but if there were a way to point out that you're doing that, it might be helpful.
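
A minimal sketch of the pattern with made-up data, for anyone who also missed it; assign() returns a copy with the extra column and leaves the original frame untouched:

import pandas as pd

# Made-up data purely for illustration
df_falling = pd.DataFrame({"t": [0.0, 0.5, 1.0], "height": [100.0, 98.8, 95.1]})

# Returns a new DataFrame with the extra column; df_falling is unchanged
df_squared = df_falling.assign(tsquared=df_falling["t"] ** 2)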

@tomicapretto (Collaborator)

@tjburch thanks a lot! I quickly skimmed through it and I like it! I will have a more in-depth read and come back with feedback :)

@zwelitunyiswa

@tjburch This is great! I love how you move between the science, the data, and the model: a reader can see how the science is captured in the model, and how the model's posterior artifacts link back to the data and science. We also see the choices one has to make. The most valuable insights I have gleaned in Bayesianism come from seeing people like Solomon Kurz and Andrew Heiss work through real examples and the internal dialogue they have.

@tomicapretto (Collaborator) commented Feb 22, 2024

@tjburch I just checked the example. It is fantastic, very well done. A couple of notes:

  1. poly in Bambi is implemented in formulae (https://github.com/bambinos/formulae/blob/b00f53da4b092ea13eeeabe92866736e97d56db0/formulae/transforms.py#L400-L426). I'm not sure if it matches what R does (I just don't know)
  2. About the hierarchical model...

Currently, it's specified as

[image: current model specification]

But I think it's clearer and more explicit if we write it as

$$
\begin{aligned}
\text{Height}_{ijk} &= \beta_{\text{planet}, j} \, \text{Time}_i^2 + \beta_{\text{astronaut}, k} \, \text{Time}_i + \varepsilon_{ijk} \\
\beta_{\text{planet}, j} &\sim \text{TruncatedNormal}(\mu_j, \sigma_j) \quad \text{for all } j \\
\beta_{\text{astronaut}, k} &\sim \text{Normal}(\mu_{\text{astronaut}}, \sigma_{\text{astronaut}}) \quad \text{for all } k \\
\mu_{\text{astronaut}} &\sim \text{Normal}(4, 1) \\
\sigma_{\text{astronaut}} &\sim \text{HalfNormal}(1)
\end{aligned}
$$

Changes

  • $i$ indexes time, $j$ indexes planet, and $k$ indexes astronaut
  • Made it explicit there's a prior for $\beta_{\text{planet}, j}$ (for every planet it's a 1d distribution)
  • Made the hierarchical prior for $\beta_{\text{astronaut}, k}$ explicit too.
  • The "for all" are not that needed as it could be understood, but it reinforces the fact that you have multiple 1d distributions

Then in Bambi, you can specify the prior for the group-specific effect as

priors = {
    "I(Time ** 2):Planet": bmb.Prior(
        "TruncatedNormal",
        mu=[
            -9.81 / 2,  # Earth
            -3.72 / 2,  # Mars
            -6.0 / 2,   # PlanetX
        ],
        sigma=[
            0.025 / 2,  # Earth
            0.02 / 2,   # Mars
            4,          # PlanetX
        ],
        upper=[0, 0, 0],
    ),
    # Here it starts
    "Time|Astronaut": bmb.Prior(
        "Normal",
        mu=bmb.Prior("Normal", mu=4, sigma=1),   # Population mean
        sigma=bmb.Prior("HalfNormal", sigma=1),  # Population standard deviation
    ),
    # Here it ends
}

measurement_mod_hier_prior = bmb.Model(
    formula="Height ~ I(Time**2):Planet + (0 + Time|Astronaut) + 0",
    data=df,
    priors=priors,
    noncentered=False,
)

Important note: this is not the most common pattern in Bambi, but it's allowed. In Bambi, group-specific effects are usually deflections around some common value (that's why we have common and group-specific effects). For example, when there are "random" slopes, we write ~ x + (x | group): x is the common slope, and (x | group) gives the deflections around that common slope. However, it's also possible to do what you wanted and just write ~ (x | group). One has to make sure the prior is not centered around zero, as it is by default, but around the population mean (as in the code I shared).
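
A minimal sketch of the two patterns side by side (y, x, group, and the data are placeholders, not names from the notebook):

import numpy as np
import pandas as pd
import bambi as bmb

# Placeholder data purely for illustration
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "x": np.tile(np.linspace(0, 1, 10), 3),
    "group": np.repeat(["a", "b", "c"], 10),
})
df["y"] = 4 * df["x"] + rng.normal(0, 0.5, size=len(df))

# Usual pattern: a common slope plus zero-centered group deflections
usual = bmb.Model("y ~ x + (x | group)", data=df)

# Alternative pattern: the slope lives entirely in the group term, so the
# prior must be centered on the population mean rather than on zero
priors = {
    "x|group": bmb.Prior(
        "Normal",
        mu=bmb.Prior("Normal", mu=4, sigma=1),
        sigma=bmb.Prior("HalfNormal", sigma=1),
    )
}
alternative = bmb.Model("y ~ (x | group)", data=df, priors=priors, noncentered=False)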


As a sidenote, my physics knowledge is 0.1 on a scale from 0 to 10, so I cannot check whether all those things are correct; I simply trust you here 😄


Just let me know if you have other questions and/or you need help with something

@tomicapretto (Collaborator)

@tjburch regarding what you shared as "immediate questions"

  1. Yes, this is an excellent example.
  2. I really like having it, but it's also your content, so feel free to use it as your own blog post too.
  3. I think I already helped with this in the previous comment, but I'm happy to answer if you have questions

@tjburch (Contributor, Author) commented Feb 22, 2024

Thanks @tomicapretto. I'll work on updating with the correct poly reference and give this a test! I appreciate you taking a look at it.

Would a separate example with poly in a model be desired? If so, should it go in this notebook, or do you think it's worth splitting into two (e.g., one "polynomial regression" and one "polynomial regression using orthogonal polynomials", or something along those lines)?

@tomicapretto (Collaborator)

> Thanks @tomicapretto. I'll work on updating with the correct poly reference and give this a test! I appreciate you taking a look at it.
>
> Would a separate example with poly in a model be desired? If so, should it go in this notebook, or do you think it's worth splitting into two (e.g., one "polynomial regression" and one "polynomial regression using orthogonal polynomials", or something along those lines)?

If you feel like you want to do it, yeah, I think two separate notebooks would be better. But it's not a problem if you want to leave it as it is (we can split things in the future).

@tjburch (Contributor, Author) commented Feb 29, 2024

I'm still working to clean up the original notebook, but I've just added a second notebook dedicated purely to the orthogonal polynomials and using poly. It's ready for feedback if/when anyone gets time; in the meantime, I'll be working to update the original notebook too.

review-notebook-app bot commented Mar 4, 2024

View / edit / reply to this conversation on ReviewNB

tomicapretto commented on 2024-03-04T15:06:47Z
----------------------------------------------------------------

Line #1.    diagonal = np.diag(np.diag(r))

I think there's a small mistake here (two calls to np.diag())


tjburch commented on 2024-03-04T15:20:34Z
----------------------------------------------------------------

This is actually something I learned while putting this together. The first call of np.diag extracts the diagonal elements. The second call constructs a diagonal matrix from those elements:

In [1]: import numpy as np

In [2]: mat = np.array([[1,2,3],[4,5,6],[7,8,9]])

In [3]: mat
Out[3]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [4]: np.diag(mat)
Out[4]: array([1, 5, 9])

In [5]: np.diag(np.diag(mat))
Out[5]:
array([[1, 0, 0],
       [0, 5, 0],
       [0, 0, 9]])

I will make a note of that though.
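
For context on where that pattern shows up: a sketch of constructing orthogonal polynomials via QR in the spirit of R's poly() (illustrative code, not the notebook's exact implementation):

import numpy as np

def ortho_poly(x, degree):
    x = np.asarray(x, dtype=float)
    # Centered raw basis with columns 1, x, x^2, ..., x^degree
    raw = np.vander(x - x.mean(), degree + 1, increasing=True)
    q, r = np.linalg.qr(raw)
    # First np.diag() extracts r's diagonal; the second builds a diagonal
    # matrix from it, which undoes QR's arbitrary column scaling and signs
    diagonal = np.diag(np.diag(r))
    z = q @ diagonal
    # Normalize each column and drop the intercept column
    z /= np.sqrt((z ** 2).sum(axis=0))
    return z[:, 1:]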

@tomicapretto (Collaborator)

@tjburch the orthogonal polynomial notebook is just AMAZING! Thanks!!

@tjburch (Contributor, Author) commented Mar 4, 2024

Thanks! Hoping to have the original cleaned up shortly. You may have seen that I deleted a previous comment; I'm running into errors when truncating the response variable, which I think is contributing to the underestimates. I was able to reproduce the toy example in #752 this morning, though, so I just need to compare them side by side to understand what's different.

@tomicapretto (Collaborator)

@tjburch I could swear I saw a comment and when I came back today, I couldn't find it, haha! If a truncated normal does not work, you could try another distribution with a positive domain (the problem is that you'll likely need a non-identity link function, and then the meaning of the parameter estimates changes, which is another problem).
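
For example, a minimal sketch of swapping in a positive-domain likelihood (the Gamma family ships with Bambi; note the log link puts the coefficients on the log scale, so their physical interpretation changes):

import bambi as bmb

# df as in the notebook; the family/link choice is the illustrative part
model = bmb.Model(
    "Height ~ I(Time**2):Planet + 0",
    data=df,
    family="gamma",
    link="log",
)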



@tjburch (Contributor, Author) commented Apr 16, 2024

Alright, I think this is probably ready to no longer be a WIP and is good for final review.

I took out the last example with multiple astronauts in the original notebook; to me it didn't seem to help readers understand anything more, and injecting priors didn't improve the estimates the way I'd hoped, so it wasn't a huge value add. If anyone feels strongly otherwise, I can put it back in, but I think the notebooks as-is do a pretty good job conveying the ins and outs of polynomial regression.

@tjburch tjburch changed the title WIP: Add polynomial regression example Add polynomial regression example Apr 16, 2024
@tomicapretto (Collaborator)

Amazing @tjburch! I'll review it soon

@tomicapretto (Collaborator)

@tjburch I didn't mean to close this. I tried to add some changes to this branch but I broke something :S. I'm trying to fix it.

@tjburch (Contributor, Author) commented Apr 27, 2024

Whoops! If you're unable to recover it, let me know and I can open a new PR or something, but all things being equal it'd be nice to keep some of this prior discussion.
