Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for "ignoring a list of specific fields" #32

Open
hdw868 opened this issue Aug 6, 2020 · 11 comments
Open

Support for "ignoring a list of specific fields" #32

hdw868 opened this issue Aug 6, 2020 · 11 comments
Labels
enhancement New feature or request

Comments

@hdw868
Copy link

hdw868 commented Aug 6, 2020

I was using this tool to generate our regression baselines, the result is a JSON string like:

{
"id": "4163B50C4DD9911804894F827078B218",
"instanceId": "020B24C54D891A89B44AAABDBCFCE057",
 "status": 1
}

The problem is that there are some fields such as the instanceId field that is using UUID that will always change, and I don't want this false alert.
One workaround would be:

  • We provide the ignores=["instanceId",] to the data_regression function, and the value of this field will be ignored when comparing the result.

  • Or, we can replace the value of specific field using special string, such as "{% ignore %}" or something else;

@nicoddemus
Copy link
Member

Hi @hdw868,

Couldn't you preprocess the data yourself before passing to the fixture? Btw, are you using data_regression or file_regression?

@hdw868
Copy link
Author

hdw868 commented Aug 6, 2020

Hi @nicoddemus,
Yes, I could process the data before passing to the fixture, but usually, those are very common fileds such as "id", ''creationDate" and etc, I think it would much more convenient to provide a list of keys to be ignored instead of calling those reprocessing funtion.

@nicoddemus
Copy link
Member

Hi @hdw868,

While it might seem convenient to ignore just some top level keys, this doesn't cover nested structures:

{
  "orders": [
     {
       "id": "aksjsnj-10291",
     }
  ],
  "product" : {
    "id": "215",
  },
}

If a user wants to remove the id key from orders but not from product, they will need to do it themselves anyway.

For the simple use case, the proposal to ignore just a top-level key, the code:

del data["id"]
data_regression(data)

Becomes:

data_regression(data, ignore_key=["id"])

Which is not a big win IMHO.

So particularly I'm 👎 on adding this, but let's hear what others think.

@igortg
Copy link
Member

igortg commented Aug 6, 2020

We had the same issue using data_regression for SQLAlchemy models.

Below is a recipe for a new fixture based on data_regression that would ignore some fields of the data given to the check method. The recipe uses regex patterns, but I think you can simplify it for your own purpose. You can write this code into your project conftest.py, update the class member fields_to_ignore and use the db_model_regression.check instead.

@pytest.fixture()
def db_model_regression(data_regression):
    return DbModelRegression(data_regression)


class DbModelRegression:

    fields_to_ignore = ['updated_on', 'created_on', r'\w*_id', '^id$']

    def __init__(self, data_regression):
        self.data_regression = data_regression

    def check(self, data, **kwargs):
        return self.data_regression.check(self._prepare_for_regression(data), **kwargs)

    def _should_record(self, field_name):
        for pattern in self.fields_to_ignore:
            if re.match(pattern, field_name):
                return False
        else:
            return True

    def _prepare_for_regression(self, data):
        if isinstance(data, list):
            prepared_data = []
            for item in data:
                prepared_data.append(self._prepare_dict(item))
            return prepared_data
        elif isinstance(data, dict):
            return self._prepare_dict(data)
        else:
            return data

    def _prepare_dict(self, data):
        prepared_data = {}
        for key in filter(self._should_record, data):
            value = data[key]
            if isinstance(value, list):
                prepared_list = []
                for item in value:
                    prepared_list.append(self._prepare_for_regression(item))
                prepared_data[key] = prepared_list
            elif isinstance(value, dict):
                prepared_data[key] = self._prepare_for_regression(value)
            else:
                prepared_data[key] = value
        return prepared_data

I agree with you that we must find a more elegant way of address that. But I don't think that adding an ignores option is the answer. Maybe adding some highlevel fixtures that knows how to deal with objects instead of dicts. I'll try to put some thought on it.

@hdw868
Copy link
Author

hdw868 commented Aug 7, 2020

Hi @hdw868,

While it might seem convenient to ignore just some top level keys, this doesn't cover nested structures:

{
  "orders": [
     {
       "id": "aksjsnj-10291",
     }
  ],
  "product" : {
    "id": "215",
  },
}

If a user wants to remove the id key from orders but not from product, they will need to do it themselves anyway.

For the simple use case, the proposal to ignore just a top-level key, the code:

del data["id"]
data_regression(data)

Becomes:

data_regression(data, ignore_key=["id"])

Which is not a big win IMHO.

So particularly I'm 👎 on adding this, but let's hear what others think.

Yes, provide a ignore option only may make things complicated anyway, however, the workaround provided by @igortg seems to have the same problem. Just wondering if there is a more elegant way to handle this situation.

@igortg
Copy link
Member

igortg commented Aug 7, 2020

A more elegant way would be building your own serializer to convert objects to dicts without the unwanted fields. Pydantic has something like that, but you have to specify the attribute you want to ignore also in the nested objects.

@igortg igortg added the enhancement New feature or request label Aug 7, 2020
@hdw868
Copy link
Author

hdw868 commented Aug 10, 2020

Hmm, would adding exclude options like pydantic be useful? I may try to create a PR if it's worth trying.

@qweeze
Copy link

qweeze commented Aug 12, 2020

As an idea, I think it would be nice to have a more generic solution that can handle not just ignoring specific fields, but also custom comparison behaviour.

For example, one may want to be able to

  • ignore the value of id but assert that id field is present
  • ignore the value but assert that it is a positive integer
  • assert that a float equals expected value with some tolerance
  • assert that a list contains expected items ignoring order
  • assert that a string matches a regex

One way to achive this could be by providing a way to override fields' values with any object that defines an __eq__ method. Something like

from unittest.mock import ANY
from pytest import approx

data = {
    "orders": [{
        "id": "aksjsnj-10291",
    }],
    "product" : {
        "id": "215",
    },
    "gravity": 9.81,
    # ...100500 other fields
}

data_regression.check(
    data,
    {'product': {'id': ANY}, 'gravity': approx(10, 0.1)}
)   

@nicoddemus
Copy link
Member

One way to achive this could be by providing a way to override fields' values with any object that defines an eq method.

At first glance I like this idea, seems to include a nice general way to customize how things are compared.

@adamzev
Copy link

adamzev commented Nov 11, 2021

I forked the library (https://github.com/adamzev/pytest-regressions) and implemented a solution that works for my use case. I focused just on data regression yml files. For lines you want to ignore you can add a # ignore comment on them in the yml file and it will ignore differences in those lines. So you don't have to manually enter what lines to ignore, I added a --force-ignore flag which will tag every line which is currently showing up as different (for me, running the tests in a different order produces different model ids which can be deeply nested keys or values) to ignore in the future.

If this would work for other people I'd be happy to look at what needs to be done to make this able to be brought in.

@danlamanna
Copy link

FWIW, here is a relevant discussion in a similar library syrusakbary/snapshottest#21. I think pointing out that jest does this with their "property matchers" is nice https://jestjs.io/docs/en/snapshot-testing#property-matchers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants