Validate that custom datasets can interpolate #1684
base: pre/2.7
Conversation
All I've done so far is to introduce a validator that checks that data can be interpolated in each of the dimensions.
Thanks @marc-flex, I think this is looking pretty good, just a few minor tweaks, and also we should see if we want to apply this validator to all `DataArray` objects (as written currently) or rather as a `Simulation` post-init validator that loops over sources and does the interp check. I'm curious to see if the front end tests pass with these changes or if there are instances where we explicitly allow duplicate coords in some data array objects.
tidy3d/components/data/data_array.py
Outdated
for dim in dims:
    try:
        val.interp({dim: 0})
    except:
you'll want to `except` a specific Exception here (whatever xarray raises when `interp()` fails). One convenient way to do this while still retaining the original exception traceback (which can be useful for debugging) is the following pattern:
try:
...
except ExceptionType as e:
raise MyException(...) from e
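As a self-contained illustration of this pattern (with a stand-in `DataError` class and a hypothetical `parse_int` helper, not the library's actual code), chaining with `from e` keeps the original exception attached as `__cause__`, so its traceback is printed below the new error:

```python
class DataError(Exception):
    """Stand-in for the library's DataError, for illustration only."""


def parse_int(value: str) -> int:
    try:
        return int(value)
    except ValueError as e:
        # 'from e' chains the original ValueError as __cause__, preserving
        # its traceback in the printed error report
        raise DataError(f"could not parse {value!r} as an integer") from e


try:
    parse_int("not-a-number")
except DataError as err:
    original = err.__cause__  # the underlying ValueError
```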
tidy3d/components/data/data_array.py
Outdated
@@ -93,6 +94,23 @@ def assign_data_attrs(cls, val):
            val.attrs[attr_name] = attr
        return val

    @classmethod
    def interp_validator(cls, val):
I suppose this checks all `DataArray` objects, which maybe isn't so bad? But I guess this is mainly an issue for `ModeSource` and `CustomFieldSource` objects. I can't think of an instance where we want to allow a `DataArray` with repeating coords, but it would be interesting to see if the tests pass.
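The repeating-coords failure mode can be reproduced without tidy3d at all: xarray dimension coordinates are backed by pandas indexes, and indexer lookups on a non-unique index raise `pandas.errors.InvalidIndexError`. A minimal sketch:

```python
import pandas as pd

# a coordinate axis with a duplicated value, as a plain pandas Index
coords = pd.Index([0.0, 1.0, 1.0, 3.0])

try:
    # nearest-neighbor lookup requires a unique index
    coords.get_indexer([0.5], method="nearest")
    raised = False
except pd.errors.InvalidIndexError:
    raised = True
```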
tidy3d/components/data/data_array.py
Outdated
dims = val.coords.dims
for dim in dims:
    try:
        val.interp({dim: 0})
I think this might fail if the data is not numeric. For example, `ModeSource.direction` can be `("-", "+")`. It might be more robust to try an approach where we `sel` the first coordinate and try to interp into it:

val.interp_like(val.isel({val.dim: 0}))
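To illustrate the concern with a hypothetical array (not the tidy3d code): interpolating to a numeric target fails on a string-valued coordinate, while positional selection of an existing entry works for any dtype:

```python
import numpy as np
import xarray as xr

# a non-numeric coordinate, similar to ModeSource.direction
arr = xr.DataArray([1.0, 2.0], coords={"direction": ["-", "+"]}, dims="direction")

# interp to a numeric value cannot work on a string-valued index
try:
    arr.interp({"direction": 0})
    interp_ok = True
except Exception:
    interp_ok = False

# selecting the first existing coordinate works regardless of dtype
first = arr.isel({"direction": 0})
```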
tidy3d/components/data/data_array.py
Outdated
        val.interp({dim: 0})
    except:
        raise DataError(
            f"DataArray.interp() fails to interpolate to {dim}=0. This may be caused by duplicated data "
please put any code stuff in single (regular) quotes `'`. It will render nicer in the terminal output, thanks!
tidy3d/components/data/data_array.py
Outdated
    except:
        raise DataError(
            f"DataArray.interp() fails to interpolate to {dim}=0. This may be caused by duplicated data "
            f'in this dimension (you can verify this with DataArray.drop_duplicates(dim="{dim}")). '
how specifically can they verify with this command?

Also, could you please add a few tests to …
You're right. This doesn't pass the front-end tests. I'll give it another go with all your comments, restricting the check to custom fields and sources.
It might just be failing because of …

Or it could just be that some of the front end test data arrays have extra coordinates (by accident).

I guess also CustomMedium types with …
@tylerflex I have tried several things. With the solution you mentioned (…

Also, since this made checks fail, I'm only checking for custom sources and mediums. Basically, I call a check function (defined at the …

I have created tests for custom medium and custom field source.
Thanks @marc-flex, just FYI, this PR #1681 changed some of the tests in pre/2.7, so it looks like you'll need to rebase against that branch and fix any conflicts that might come up. I'm not sure which tests are failing but we can look into it more. One that probably fails is related to …
Force-pushed from d42bb24 to a12718e: "Adding test for custom source", "Adding test for custom medium"
I have just rebased against pre/2.7 and force-pushed it.

@marc-flex @momchil-flex is this something we want to get into 2.7.0rc2? And is it ready or still needing changes?
Sorry @marc-flex, I realized that this review was left as "pending" for the past week. Here are some additional comments.
When you think it's getting ready, feel free to un-mark as a draft.
tidy3d/components/data/data_array.py
Outdated
    f"'DataArray.sel()' fails to select along {dim} which is used in the back-end. "
    "This may be caused, for instance, by duplicated data "
    f"in this dimension (you can verify this by running 'DataArray.drop_duplicates(dim=\"{dim}\")' "
    "and run sel() with the new 'DataArray')."
please put `sel` in single quotes as well, thanks
and double check the other `raise` messages, eg above
        x0 = np.array(self.coords[dim][0])
        self.sel({dim: x0}, method="nearest")
    except pandas.errors.InvalidIndexError:
        raise DataError(
please ensure there are spaces between the period at the end of one sentence and the start of the next, eg

"end of sentence. "
"Blah"

If you don't add the space manually, the strings get combined without it.
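A quick illustration of why the trailing space matters: Python concatenates adjacent string literals verbatim, inserting nothing between them:

```python
# no space at the end of the first literal: the words run together
combined = (
    "end of sentence."
    "Blah"
)

# with the trailing space written explicitly, the message reads correctly
spaced = (
    "end of sentence. "
    "Blah"
)
```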
tidy3d/components/data/data_array.py
Outdated
@@ -93,6 +94,43 @@ def assign_data_attrs(cls, val):
            val.attrs[attr_name] = attr
        return val

    def _interp_validator(self):
let's add a return type annotation: `def _interp_validator(self) -> None:`
tidy3d/components/source.py
Outdated
def _check_fields_interpolate(cls, val: FieldDataset) -> FieldDataset:
    """Checks whether the fields in 'field_dataset' can be interpolated."""
    if isinstance(val, FieldDataset):
        fields = ["Ex", "Ey", "Ez", "Hx", "Hy", "Hz"]
the `FieldDataset` should have a `@property` called `field_components : dict`, so you can simply iterate through that more easily:
for field, data in val.field_components.items():
if isinstance(data, SpatialDataArray):
# do the check on data
tests/test_components/test_medium.py
Outdated
@@ -5,7 +5,7 @@
import matplotlib.pyplot as plt
import tidy3d as td
from tidy3d.exceptions import ValidationError, SetupError
from ..utils import assert_log_level, log_capture, AssertLogLevel
I think the code formatter (ruff) is removing `log_capture` since it's not used in the test directly. However, we do add it as a `pytest.fixture`, passing it into eg. `def test_custom_medium(log_capture):`, so this is a mistake from ruff that might be causing test failures. You can resolve it by adding `log_capture` back into the imports and then ignoring this linting rule with a comment, for example:

from ..utils import log_capture  # noqa: F401
tidy3d/components/data/data_array.py
Outdated
@@ -115,7 +115,7 @@ def _interp_validator(self):
            f"'DataArray.interp()' fails to interpolate along {dim} which is used in the back-end. "
            "This may be caused, for instance, by duplicated data "
            f"in this dimension (you can verify this by running 'DataArray.drop_duplicates(dim=\"{dim}\")' "
-           "and interpolate with the new DataArray)."
+           "and interpolate with the new DataArray). "
put `DataArray` in single quotes here
-        if isinstance(val.__getattribute__(field), ScalarFieldDataArray):
-            val.__getattribute__(field)._interp_validator()
+        for _, data in val.field_components.items():
+            if isinstance(data, ScalarFieldDataArray):
Per original issue, I think we want to perform this for `CustomMedium` as well. Will let @momchil-flex confirm what he wants when he's back online.

Otherwise looks good! Thanks @marc-flex
Isn't this doing it for `CustomMedium`?
huh, for some reason I didn't see that at all. Yeah, that should do it!
only thing I would consider adding to make it just a bit more user friendly would be an optional keyword argument, `interp_validator(self, field_name: str = None)`, and then we can pass `permittivity` or `Ex`, etc., and the error message can include it. This might help users identify, eg, which field is causing issues. If this overly complicates things, feel free to ignore.
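A minimal sketch of what such an optional `field_name` argument could look like (a hypothetical standalone helper, not the PR's actual code, assuming the `sel(..., method="nearest")` check discussed above):

```python
import numpy as np
import pandas as pd
import xarray as xr


def interp_validator(arr: xr.DataArray, field_name: str = None) -> None:
    """Hypothetical validator that names the offending field in its error."""
    name = field_name or "DataArray"
    for dim in arr.dims:
        x0 = arr.coords[dim].values[0]
        try:
            # nearest-neighbor selection fails on duplicated coordinates
            arr.sel({dim: x0}, method="nearest")
        except pd.errors.InvalidIndexError as e:
            raise ValueError(
                f"'{name}' cannot be selected along '{dim}'; "
                "this may be caused by duplicated coordinates."
            ) from e


bad = xr.DataArray(np.zeros(3), coords={"x": [0.0, 1.0, 1.0]}, dims="x")
try:
    interp_validator(bad, field_name="permittivity")
    message = ""
except ValueError as e:
    message = str(e)
```

With the field name threaded through, the error message points users directly at the problematic field (here `permittivity`) rather than an anonymous data array.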
> If this overly complicates things, feel free to ignore.

Not at all! I think it'll be much nicer
@tylerflex I have added the field name. Let me know what you think. Thanks!

I also noticed that `eps_dataset` within `CustomMedium` has fields that derive from `DataArray`. Should we include this in the validator for `CustomMedium`? I guess this is along the lines of what @weiliangjin2021 is commenting below.
Looks good! minus a missing `f` in the f-string ;). Thanks!
Just one comment: custom datasets are validated in `CustomMedium`. Shall we validate all other types of custom materials as well, e.g. `CustomPoleResidue`, `CustomLorentz`, etc.?
You're opening Pandora's box here. Maybe we should do this for all custom-defined …
To me it kinda makes sense to test all custom-defined data array fields, and I can't think of currently existing objects where we wouldn't want that. However, a) I don't know if I'm forgetting something already and b) I'm not sure if in the future something will come up. But maybe let's leave b) for future devs to worry about if validating all custom data arrays passes all tests (so hopefully a) is ok)?
I can remove the specific tests for custom medium/source and have the validator at the …
I agree with @marc-flex. Maybe it's best to just limit the scope of these validators to the components that are causing issues as per #1684, because I additionally worry that adding too-strict validation to `DataArray` may lead to some unintended consequences.
Ok yeah sounds good to me.
tidy3d/components/data/data_array.py
Outdated
    "This may be caused, for instance, by duplicated data "
    f"in this dimension (you can verify this by running "
    f"'{field_name}={field_name}.drop_duplicates(dim=\"{dim}\")' "
    "and run 'sel()' with the new '{field_name}'). "
add `f` for the `{field_name}`
-        if isinstance(val.__getattribute__(field), ScalarFieldDataArray):
-            val.__getattribute__(field)._interp_validator()
+        for _, data in val.field_components.items():
+            if isinstance(data, ScalarFieldDataArray):
Looks good! minus a missing `f` in the f-string ;). Thanks!
tidy3d/components/data/data_array.py
Outdated
    f"'{field_name}={field_name}.drop_duplicates(dim=\"{dim}\")' "
    "and run 'sel()' with the new '{field_name}'). "
    "Please make sure you can use 'sel()' on the data."
)
for the `pandas.errors.InvalidIndexError` handling, consider doing the following:
except pandas.errors.InvalidIndexError as e:
raise DataError(...) from e
What this will do is raise your DataError and also display the original pandas `InvalidIndexError` stack trace. It could be helpful to see what exactly is causing it, and for users to report the stack trace to us so we can debug.
I might recommend first verifying in the test output that this doesn't look absolutely terrible though :D
I just tried and the error doesn't seem to change. Not sure if I should be seeing any change, though
@marc-flex I think this is basically ready to go. Last thing you'll need to do is update the docs/notebooks submodule to the latest version (pre/2.7 branch). After you make that commit, we'll want to rebase this against tidy3d …
This PR addresses issue #1668