Change default for window size in EquivalentSourcesGB #487

indiauppal · 2024-04-02T18:36:42Z

Change the default for the window size in the EquivalentSourcesGB constructor. By default, approximately 5000 data points are in each window.

Relevant issues/PRs:
Fixes #425

indiauppal · 2024-04-16T17:49:28Z

Add a test to check the default window size, using numpy.testing.assert_allclose.

indiauppal · 2024-04-23T16:59:34Z

Raise a warning when the number of data points is less than 5e3.

Completed with @santisoler

Update test to check for data points less than 5e3 and the associated warning.
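For readers following along, here is a minimal sketch of the behaviour discussed so far: pick a window size so that each window holds roughly 5000 data points, and warn when the dataset is too small for that. It is not the code in this PR. The function name `estimate_window_size`, the uniform-density assumption, the `<=` comparison and the fallback to the largest extent of the data region are all assumptions made for illustration.

```python
import warnings

import numpy as np


def estimate_window_size(coordinates, points_per_window=5e3):
    """Hypothetical helper: window size holding ~points_per_window data points."""
    easting, northing = coordinates[:2]
    east_extent = easting.max() - easting.min()
    north_extent = northing.max() - northing.min()
    n_data = easting.size
    if n_data <= points_per_window:
        warnings.warn(
            f"Found {n_data} data points (no more than {points_per_window:.0f}): "
            "using a single window that covers the whole dataset.",
            stacklevel=2,
        )
        # Assumption: fall back to the largest extent of the data region
        return max(east_extent, north_extent)
    # Assume a roughly uniform density of points over the bounding region
    density = n_data / (east_extent * north_extent)
    # Side of a square window whose area holds ~points_per_window points
    return np.sqrt(points_per_window / density)


# 10000 points spread over a 100 x 100 region (density of about 1 point per unit area)
easting, northing = np.meshgrid(np.linspace(0, 100, 100), np.linspace(0, 100, 100))
window_size = estimate_window_size((easting, northing))
print(window_size)  # sqrt(5000), about 70.71
```

With fewer points than the target, the same call emits a warning and returns a size large enough for one window to cover everything.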

santisoler · 2024-05-07T16:46:03Z

This is starting to look great @indiauppal!

I'm leaving a few ideas after the meeting we had today:

We should probably extend the test function for the case with fewer than 5000 points and check that the outputs match the ones for a single window. Something like:

data_windows, source_windows = eqs._create_windows(grid_coords)
# The output lists should have a single element each (corresponding to the single window)
assert len(data_windows) == 1
assert len(source_windows) == 1
# Check if all sources and data points are inside the window
for coord in grid_coords:
    npt.assert_allclose(coord, coord[data_windows[0]])
for coord in eqs.points_:
    npt.assert_allclose(coord, coord[source_windows[0]])

We could explore replacing those np.aranges with slices. We only need those indices to slice the sources and coordinates arrays in the self._gradient_boosting method, so anything more memory efficient that can slice them should work. This is just a minor optimization, so it's not critical for this PR.
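A quick sketch of the idea, assuming the np.arange calls in question index every element of the array (the single-window case). The array below is made up for the comparison and is unrelated to the real sources or coordinates.

```python
import sys

import numpy as np

# Stand-in for an array of sources or coordinates
sources = np.random.default_rng(42).uniform(size=1_000_000)

# Explicit index array covering every element
index_array = np.arange(sources.size)
# A slice object that selects the same elements
index_slice = slice(0, sources.size)

# Both select exactly the same values
np.testing.assert_array_equal(sources[index_array], sources[index_slice])

# The index array stores one integer per element; the slice is a tiny object
print(index_array.nbytes, sys.getsizeof(index_slice))

# Basic slicing returns a view, while indexing with an array makes a copy
print(np.shares_memory(sources, sources[index_slice]))
print(np.shares_memory(sources, sources[index_array]))
```

Besides the smaller index object, the slice avoids copying the selected values, since NumPy returns a view for basic slicing.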

Since the fit method now assigns another attribute (window_size_), we should add it to the list of Attributes in the class docstring. Something like the thing I wrote in Change default value of depth in equivalent sources #491 for the depth_ attribute:

harmonica/harmonica/_equivalent_sources/cartesian.py

Lines 142 to 145 in 55309c7

    
    depth_ : float or None
        Estimated depth of the sources calculated as 4.5 times the median
        distance between first neighboring sources. This attribute is set to
        None if ``points`` is passed.

It would be nice to add a check for the window_size argument in the __init__ method. Basically, we should raise an error if the passed value is a str and is not "default". That's the only string value that should be valid. Check this out for inspiration:

harmonica/harmonica/_equivalent_sources/cartesian.py

Lines 176 to 180 in 55309c7

    
    if isinstance(depth, str) and depth != "default":
        raise ValueError(
            f"Found invalid 'depth' value equal to '{depth}'."
            "It should be 'default' or a numeric value."
        )
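Adapted to window_size, the check could look roughly like the sketch below. The standalone function `check_window_size` is hypothetical and only used here for illustration; in the PR the check would live inside the __init__ method, and the error message wording is an assumption modelled on the depth one.

```python
def check_window_size(window_size):
    """Hypothetical validator mirroring the ``depth`` check quoted above."""
    # "default" is the only string that should be accepted
    if isinstance(window_size, str) and window_size != "default":
        raise ValueError(
            f"Found invalid 'window_size' value equal to '{window_size}'. "
            "It should be 'default' or a numeric value."
        )
    return window_size


check_window_size("default")  # passes
check_window_size(5e3)  # passes
try:
    check_window_size("auto")
except ValueError as error:
    print(error)
```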

This is somewhat my personal wishlist for this PR, so feel free to assign me a few of these tasks if you want. As always, feel free to ask for help if you need it 🙂

Looking forward to seeing this merged!

santisoler · 2024-05-07T21:48:09Z

@indiauppal, I'm updating this branch after the fix I made for the failing Mac tests. Remember to run a git pull to sync your local repo with the latest changes in this branch.

Change window size in EquivalentSourcesGB to calculate a window size …

7bd1a40

…such that approximately 5000 data points are in each window.

indiauppal and others added 3 commits April 16, 2024 19:52

Merge branch 'main' into window_size

4dc0eb0

Merge branch 'main' into window_size

6646e7f

Add a test for the window size. Test for window size less than 5000 r…

b97064a

…aising an error. Need to raise an error for data points less than and equal to 5e3.

indiauppal added 2 commits May 7, 2024 16:54

Raise warning for data points less than or equal to 5e3.

3f0beea

Update test to check for data points less than 5e3 and the associated warning.

Run black.

0474691

santisoler and others added 4 commits May 7, 2024 14:48

Merge branch 'main' into window_size

39233a0

Add India Uppal to AUTHORS.md

14b1f65

Add India Uppal to the author list.

Revert "Add India Uppal to AUTHORS.md"

e33afa3

This reverts commit 14b1f65.

Merge branch 'main' into window_size

3e63e00
