Validate that custom datasets can interpolate #1684

Open · wants to merge 5 commits into base: pre/2.7

Conversation

@marc-flex (Author)

This PR addresses issue #1668

@marc-flex marc-flex added the 2.7 will go into version 2.7.* label May 7, 2024
@marc-flex marc-flex self-assigned this May 7, 2024
@marc-flex (Author)

All I've done so far is to introduce a validator that checks that data can be interpolated in each of the dimensions.

@tylerflex (Collaborator) left a comment

Thanks @marc-flex I think this is looking pretty good, just a few minor tweaks and also we should see if we want to apply this validator to all DataArray objects (as written currently) or rather as a Simulation post-init validator that loops over sources and does the interp check. I'm curious to see if the front end tests pass with these changes or if there are instances where we explicitly allow duplicate coords in some data array objects.
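For context, the Simulation-level alternative could look something like the rough sketch below. The validator hook, the field_dataset access, and the method name are assumptions for illustration, not actual tidy3d code.

import pydantic.v1 as pd  # assumed tidy3d-style pydantic import

# hypothetical post-init validator living on Simulation
@pd.validator("sources", always=True)
def _sources_can_interpolate(cls, val):
    """Run the interp check only on sources that carry custom field data."""
    for source in val:
        dataset = getattr(source, "field_dataset", None)  # e.g. CustomFieldSource
        if dataset is not None:
            for _, data in dataset.field_components.items():
                for dim in data.coords.dims:
                    data.interp({dim: 0})  # raises on duplicated coords
    return val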

        for dim in dims:
            try:
                val.interp({dim: 0})
            except:
@tylerflex (Collaborator)

you'll want to except a specific Exception here (whatever xarray raises when interp() fails). One convenient way to do this while still retaining the original exception traceback (which can be useful for debugging) is the following pattern

try:
    ...
except ExceptionType as e:
    raise MyException(...) from e
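Applied to the snippet above, that could look like the following sketch. As the thread below confirms, the relevant exception in practice is pandas.errors.InvalidIndexError, though it is worth verifying in a test.

import pandas

from tidy3d.exceptions import DataError

for dim in dims:
    try:
        val.interp({dim: 0})
    except pandas.errors.InvalidIndexError as e:
        # re-raise as a tidy3d error while keeping the original traceback
        raise DataError(f"'DataArray.interp()' fails to interpolate along '{dim}'.") from e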

@@ -93,6 +94,23 @@ def assign_data_attrs(cls, val):
            val.attrs[attr_name] = attr
        return val

    @classmethod
    def interp_validator(cls, val):
@tylerflex (Collaborator)

I suppose this checks all DataArray objects, which maybe isn't so bad? But I guess this is mainly an issue for ModeSource and CustomFieldSource objects. I can't think of an instance where we want to allow a DataArray with repeating coords, but it would be interesting to see if the tests pass.

        dims = val.coords.dims
        for dim in dims:
            try:
                val.interp({dim: 0})
@tylerflex (Collaborator)

I think this might fail if the data is not numeric. For example, the ModeSource.direction can be ("-", "+"). It might be more robust to try an approach where we sel the first coordinate and try to interp into it.

val.interp_like(val.isel({val.dim: 0}))
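To illustrate with made-up data why plain interp fails on a string-valued coord while the isel-based round trip does not:

import numpy as np
import xarray as xr

da = xr.DataArray(
    np.ones((2, 3)),
    coords={"direction": ["-", "+"], "f": [1.0, 2.0, 3.0]},
    dims=("direction", "f"),
)

da.interp(f=1.5)                      # fine: "f" is a numeric coordinate
# da.interp(direction=0)              # fails: string coords can't be interpolated
da.interp_like(da.isel(direction=0))  # ok: isel picks by position, and the
                                      # interp then only runs along numeric dims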

                val.interp({dim: 0})
            except:
                raise DataError(
                    f"DataArray.interp() fails to interpolate to {dim}=0. This may be caused by duplicated data "
@tylerflex (Collaborator)

please put any code stuff in single (regular) quotes ('). It will render nicer in the terminal output, thanks!

            except:
                raise DataError(
                    f"DataArray.interp() fails to interpolate to {dim}=0. This may be caused by duplicated data "
                    f'in this dimension (you can verify this with DataArray.drop_duplicates(dim="{dim}")). '
@tylerflex (Collaborator)

how specifically can they verify with this command?

@tylerflex (Collaborator)

Also, could you please add a few tests to tests/test_data/test_data_array.py? Basically try to replicate a few scenarios and edge cases for this behavior and ensure that the proper thing happens. Thanks
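A sketch of one such test, assuming the validator raises tidy3d's DataError on duplicated coordinates. The method name _interp_validator follows the diffs quoted later in this thread, and details may differ in the final code.

import numpy as np
import pytest
import tidy3d as td
from tidy3d.exceptions import DataError


def test_duplicate_coords_rejected():
    """A SpatialDataArray with a duplicated coordinate should fail the check."""
    data = td.SpatialDataArray(
        np.ones((3, 1, 1)),
        coords={"x": [0.0, 1.0, 1.0], "y": [0.0], "z": [0.0]},  # duplicated x
    )
    with pytest.raises(DataError):
        data._interp_validator()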

@marc-flex (Author)

> Thanks @marc-flex I think this is looking pretty good, just a few minor tweaks and also we should see if we want to apply this validator to all DataArray objects (as written currently) or rather as a Simulation post-init validator that loops over sources and does the interp check. I'm curious to see if the front end tests pass with these changes or if there are instances where we explicitly allow duplicate coords in some data array objects.

You're right. This doesn't pass the front-end tests. I'll give it another go with all your comments, restricting the check to custom fields and sources.

@tylerflex (Collaborator)

It might just be failing because of direction coords, but the fix I suggested about isel-ing the first coordinate might fix them, so perhaps try that first.

@tylerflex (Collaborator)

Or it could just be that some of the front-end test data arrays have extra coordinates (by accident).

@momchil-flex (Collaborator)

> Thanks @marc-flex I think this is looking pretty good, just a few minor tweaks and also we should see if we want to apply this validator to all DataArray objects (as written currently) or rather as a Simulation post-init validator that loops over sources and does the interp check. I'm curious to see if the front end tests pass with these changes or if there are instances where we explicitly allow duplicate coords in some data array objects.
>
> You're right. This doesn't pass the front-end tests. I'll give it another go with all your comments, restricting the check to custom fields and sources.

I guess this also applies to CustomMedium types with interp_method == "linear" (as opposed to the default "nearest"). Actually, I don't know if "nearest" also fails in some cases, i.e. maybe the data array sel method also fails if there are repeated coordinates?
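A minimal reproduction, independent of tidy3d, suggests that sel with an exact label tolerates duplicates (it returns all matches) while method="nearest" does not:

import pandas
import xarray as xr

da = xr.DataArray([1.0, 2.0, 3.0], coords={"x": [0.0, 1.0, 1.0]}, dims="x")

da.sel(x=1.0)  # works: returns both entries carrying the duplicated label
try:
    da.sel(x=0.9, method="nearest")
except pandas.errors.InvalidIndexError:
    # nearest-neighbor lookup requires a unique index, so duplicates raise
    pass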

@marc-flex (Author)

marc-flex commented May 8, 2024

@tylerflex I have tried several things. With the solution you mentioned (val.interp_like(val.isel({val.dim: 0}))) tests kept failing. So I have modified it to use an array of values from within the coordinates for each dimension (not sure that's generic enough).

Also, since this made checks fail, I'm now only checking custom sources and mediums. Basically, I call a check function (defined at the DataArray level) from within the validators.

I have created tests for custom medium and custom field source.
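For reference, the validator at this stage looks roughly like the sketch below, reconstructed from the diff hunks quoted later in this review (not the verbatim final code).

import numpy as np
import pandas

from tidy3d.exceptions import DataError


def _interp_validator(self) -> None:
    """Check that 'sel()' works along each dimension, catching duplicated coords."""
    for dim in self.coords.dims:
        try:
            # select the first value of this coordinate; a non-unique index raises
            x0 = np.array(self.coords[dim][0])
            self.sel({dim: x0}, method="nearest")
        except pandas.errors.InvalidIndexError as e:
            raise DataError(
                f"'DataArray.sel()' fails to select along '{dim}'. This may be "
                "caused, for instance, by duplicated data in this dimension."
            ) from e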

@tylerflex (Collaborator)

Thanks @marc-flex just FYI, this PR #1681 changed some of the tests in pre/2.7 so it looks like you'll need to rebase against that branch and fix any conflicts that might come up.

I'm not sure which tests are failing but we can look into it more. One that probably fails is related to log_capture but you can see my comment above.

Commits: Adding test for custom source · Adding test for custom medium
@marc-flex (Author)

marc-flex commented May 8, 2024

> Thanks @marc-flex just FYI, this PR #1681 changed some of the tests in pre/2.7 so it looks like you'll need to rebase against that branch and fix any conflicts that might come up.

I have just rebased against pre/2.7 and force-pushed.

@tylerflex (Collaborator)

@marc-flex @momchil-flex is this something we want to get into 2.7.0rc2? and is it ready or still needing changes?

@tylerflex (Collaborator) left a comment

Sorry @marc-flex, I realized that this review was left as "pending" for the past week. Here are some additional comments.

When you think it's getting ready, feel free to un-mark as a draft.

f"'DataArray.sel()' fails to select along {dim} which is used in the back-end. "
"This may be caused, for instance, by duplicated data "
f"in this dimension (you can verify this by running 'DataArray.drop_duplicates(dim=\"{dim}\")' "
"and run sel() with the new 'DataArray')."
@tylerflex (Collaborator)

please put sel in single quotes as well, thanks

@tylerflex (Collaborator)

and double-check the other raise messages, e.g. above

                x0 = np.array(self.coords[dim][0])
                self.sel({dim: x0}, method="nearest")
            except pandas.errors.InvalidIndexError:
                raise DataError(
@tylerflex (Collaborator)

please ensure there is a space between the period at the end of one sentence and the next sentence. If you don't add it manually, e.g.

"end of sentence. "
"Blah"

the strings will get combined without the space.

@@ -93,6 +94,43 @@ def assign_data_attrs(cls, val):
            val.attrs[attr_name] = attr
        return val

    def _interp_validator(self):
@tylerflex (Collaborator)

let's add a return type annotation: `def _interp_validator(self) -> None:`

    def _check_fields_interpolate(cls, val: FieldDataset) -> FieldDataset:
        """Checks whether the fields in 'field_dataset' can be interpolated."""
        if isinstance(val, FieldDataset):
            fields = ["Ex", "Ey", "Ez", "Hx", "Hy", "Hz"]
@tylerflex (Collaborator)

the FieldDataset should have a @property called field_components: dict, so you can simply iterate through that more easily:

for field, data in val.field_components.items():
    if isinstance(data, SpatialDataArray):
        # do the check on data

@@ -5,7 +5,7 @@
import matplotlib.pyplot as plt
import tidy3d as td
from tidy3d.exceptions import ValidationError, SetupError
from ..utils import assert_log_level, log_capture, AssertLogLevel
@tylerflex (Collaborator)

I think the code formatter (ruff) is removing log_capture since it's not used in the test directly.

However, we do add it as a pytest fixture, passing it into e.g. def test_custom_medium(log_capture):, so this is a mistake from ruff that might be causing test failures.

You can resolve it by adding log_capture back into the imports and then ignoring this linting rule with a comment, for example:

from ..utils import log_capture  # noqa: F401

@marc-flex marc-flex marked this pull request as ready for review May 14, 2024 11:10
@@ -115,7 +115,7 @@ def _interp_validator(self):
                f"'DataArray.interp()' fails to interpolate along {dim} which is used in the back-end. "
                "This may be caused, for instance, by duplicated data "
                f"in this dimension (you can verify this by running 'DataArray.drop_duplicates(dim=\"{dim}\")' "
-               "and interpolate with the new DataArray)."
+               "and interpolate with the new DataArray). "
@tylerflex (Collaborator)

put 'DataArray' in single quotes here as well

-        if isinstance(val.__getattribute__(field), ScalarFieldDataArray):
-            val.__getattribute__(field)._interp_validator()
+        for _, data in val.field_components.items():
+            if isinstance(data, ScalarFieldDataArray):
@tylerflex (Collaborator)

Per original issue, I think we want to perform this for CustomMedium as well.

Will let @momchil-flex confirm what he wants when he's back online.

Otherwise looks good! Thanks @marc-flex

@marc-flex (Author)

Isn't this doing it for CustomMedium?

@tylerflex (Collaborator)

Huh, for some reason I didn't see that at all. Yeah, that should do it!

@tylerflex (Collaborator)

The only thing I would consider adding to make it just a bit more user-friendly would be an optional keyword argument, interp_validator(self, field_name: str = None),

and then we can pass permittivity or Ex, etc., and the error message can include it.

This might help users identify which field is causing issues.

If this overly complicates things, feel free to ignore.
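Concretely, that suggestion amounts to something like the following sketch. The signature follows the comment above, and the caller loop mirrors the field_components iteration already in the diff.

# validator side: accept an optional name to report in error messages
def _interp_validator(self, field_name: str = None) -> None:
    ...

# caller side: pass each field's name so errors identify the offending array
for name, data in val.field_components.items():
    if isinstance(data, ScalarFieldDataArray):
        data._interp_validator(field_name=name)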

@marc-flex (Author)

> If this overly complicates things, feel free to ignore.

Not at all! I think it'll be much nicer

@marc-flex (Author) May 15, 2024

@tylerflex I have added the field name. Let me know what you think. Thanks!

I was also noticing that eps_dataset within CustomMedium has fields that derive from DataArray. Should we include this in the validator for CustomMedium? I guess this is along the lines of what @weiliangjin2021 is commenting below.

@tylerflex (Collaborator)

Looks good! Minus a missing f in the f-string ;). Thanks!

@weiliangjin2021 (Collaborator) left a comment

Just one comment: custom datasets are validated in CustomMedium. Shall we validate all other types of custom materials as well, e.g. CustomPoleResidue, CustomLorentz, etc.?

@marc-flex (Author)

marc-flex commented May 15, 2024

> Just one comment: custom datasets are validated in CustomMedium. Shall we validate all other types of custom materials as well, e.g. CustomPoleResidue, CustomLorentz, etc.?

You're opening Pandora's box here. Maybe we should do this for all custom-defined DataArray fields? Or should we limit this PR to the fields defined in issue #1668? What do you think @momchil-flex @tylerflex? I'm guessing this wouldn't just be the validator; it'd also mean implementing a corresponding test.

@momchil-flex (Collaborator)

> > Just one comment: custom datasets are validated in CustomMedium. Shall we validate all other types of custom materials as well, e.g. CustomPoleResidue, CustomLorentz, etc.?
>
> You're opening Pandora's box here. Maybe we should do this for all custom-defined DataArray fields? Or should we limit this PR to the fields defined in issue #1668? What do you think @momchil-flex @tylerflex? I'm guessing this wouldn't just be the validator; it'd also mean implementing a corresponding test.

To me it kinda makes sense to test all custom-defined data array fields, and I can't think of currently existing objects where we wouldn't want that. However, a) I don't know if I'm forgetting something already and b) I'm not sure if in the future something will come up. But maybe let's leave b) for future devs to worry about if validating all custom data arrays passes all tests (so hopefully a) is ok)?

@marc-flex (Author)

> > > Just one comment: custom datasets are validated in CustomMedium. Shall we validate all other types of custom materials as well, e.g. CustomPoleResidue, CustomLorentz, etc.?
> >
> > You're opening Pandora's box here. Maybe we should do this for all custom-defined DataArray fields? Or should we limit this PR to the fields defined in issue #1668? What do you think @momchil-flex @tylerflex? I'm guessing this wouldn't just be the validator; it'd also mean implementing a corresponding test.
>
> To me it kinda makes sense to test all custom-defined data array fields, and I can't think of currently existing objects where we wouldn't want that. However, a) I don't know if I'm forgetting something already and b) I'm not sure if in the future something will come up. But maybe let's leave b) for future devs to worry about if validating all custom data arrays passes all tests (so hopefully a) is ok)?

I can remove the specific tests for custom medium/source and have the validator at the DataArray level. The only issue is that it wouldn't then say which property made the validator fail, which might be frustrating for users. So I'd suggest keeping the scope of this PR as is, and opening another issue where we collect any other fields we'd like to validate/test?

@tylerflex (Collaborator)

I agree with @marc-flex. Maybe it's best to just limit the scope of these validators to the components that are causing issues as per #1684, because I additionally worry that too strict a validation on DataArray may lead to some unintended consequences.

@momchil-flex (Collaborator)

Ok yeah sounds good to me.

"This may be caused, for instance, by duplicated data "
f"in this dimension (you can verify this by running "
f"'{field_name}={field_name}.drop_duplicates(dim=\"{dim}\")' "
"and run 'sel()' with the new '{field_name}'). "
@tylerflex (Collaborator)

add f for the {field_name}


f"'{field_name}={field_name}.drop_duplicates(dim=\"{dim}\")' "
"and run 'sel()' with the new '{field_name}'). "
"Plase make sure you can use 'sel()' on the data."
)
@tylerflex (Collaborator) May 22, 2024

for the pandas.errors.InvalidIndexError handling, consider doing the following

except pandas.errors.InvalidIndexError as e:
    raise DataError(...) from e

What this will do is raise your DataError and also display the original pandas InvalidIndexError stack trace. It could be helpful to see what exactly is causing it, and for users to report the stack trace to us so we can debug.

I might recommend first verifying in the test output that this doesn't look absolutely terrible though :D


@marc-flex (Author)

I just tried it and the error doesn't seem to change. Not sure if I should be seeing any change, though.

@tylerflex (Collaborator)

@marc-flex I think this is basically ready to go. Last thing you'll need to do is update the docs/notebooks submodule to the latest version (pre/2.7 branch).

After you make that commit, we'll want to rebase this against tidy3d pre/2.7 and squash all the commits into one.
