Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change default for window size in EquivalentSourcesGB #487

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

indiauppal
Copy link
Member

Change the default for the window size in the EquivalentSourcesGB constructor. By default, approximately 5000 data points are in each window.

Relevant issues/PRs:
Fixes #425

…such that approximately 5000 data points are in each window.
@indiauppal
Copy link
Member Author

indiauppal commented Apr 16, 2024

@indiauppal
Copy link
Member Author

indiauppal commented Apr 23, 2024

  • Raise an warning for no. of points < 5e3

Completed with @santisoler

Update test to check for data points less than 5e3 and the associated warning.
@santisoler
Copy link
Member

This is starting to look great @indiauppal!

I'm leaving a few ideas after the meeting we had today:

  1. We should probably extend the test function for less than 5000 points and test if the outputs match the ones for a single window. Something like:
    data_windows, source_windows = eqs._create_windows(grid_coords)
    # The output lists should have a single element each (corresponding to the single window)
    assert len(data_windows) == 1
    assert len(source_windows) == 1
    # Check if all sources and data points are inside the window
    for coord in grid_coords:
        npt.assert_allclose(coord, coord[data_windows[0]])
    for coord in eqs.points_:
        npt.assert_allclose(coord, coord[source_windows[0]])
  2. We could explore replacing those np.aranges with slices. We only need those indices to slice the sources and coordinates arrays in the self._gradient_boosting method, so anything we could use to slice them that are more memory efficient should work. This is just a minor optimization, so it's not that critical for this PR.
  3. Since now we have another attribute the fit method assigns (the window_size_), we should add it to the list of Attributes in the class docstring. Something like the think I wrote in Change default value of depth in equivalent sources #491 for the depth_ argument:
    depth_ : float or None
    Estimated depth of the sources calculated as 4.5 times the median
    distance between first neighboring sources. This attribute is set to
    None if ``points`` is passed.
  4. It would be nice to add a check for the window_size argument in the __init__ method. Basically, we should raise an error if the passed value is a str and is not "default". That's the only string value that should be valid. Check this out for inspiration:
    if isinstance(depth, str) and depth != "default":
    raise ValueError(
    f"Found invalid 'depth' value equal to '{depth}'."
    "It should be 'default' or a numeric value."
    )

This is somewhat my personal wishlist for this PR, so feel free to assign me a few of these tasks if you want. As always, feel free to ask for help if you need it 🙂

Looking forward to see this merged!

@santisoler
Copy link
Member

@indiauppal, I'm updating this branch after the fix I made for the failing Mac testing. Remember to run a git pull to sync your local repo with the latest change in this branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Better default for window_size in EquivalentSourcesGB
2 participants