Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add group validation & filtering #1167

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

rpkilby
Copy link
Collaborator

@rpkilby rpkilby commented Jan 28, 2020

This PR introduces a Meta.groups option to the FilterSet, which allows users to define groups of Filters that override the validation and filtering behavior for the filters in that group. Example:

class UserFilter(django_filters.FilterSet):
    class Meta:
        model = User
        fields = ['username', 'first_name', 'last_name']
        groups = [
            RequiredGroup(['first_name', 'last_name']),
        ]

In the above, both first_name and last_name are mutually required, so if one of the filter params is provided, then both must be provided.

This PR should resolve #1020 and is an alternative to #1164 (cc @jonathan-golorry, any thoughts on this PR as an alternative to yours?).

Context:

Users periodically request how to validate and filter against multiple parameters. The goto advice has been to implement something based on MultiWidget, however there are three main deficiencies in my mind:

  • It's verbose, users will typically need to subclass MultiWidget, MultiValueField, and Filter.
  • It's difficult to implement. For new Django users, the MultiWidget/MultiValueField API is somewhat confusing (maybe I just didn't grok it for some reason, but I remember struggling with it in the past)
  • As mentioned elsewhere, validation errors don't account for sub-widgets and are raised under the unsuffixed parameter name. This is acceptable for forms usage, where all sub widgets and errors are rendered together, but this mismatch doesn't work in an API context. You might have query params foo_min and foo_max, but the errors are reported as foo.

I think this ultimately stems from a mismatch in expectations. Filters are generally designed to operate on a single query parameter, while MultiWidget is not. Aside from the difficulties in implementing, there are other limitations elsewhere (e.g., lack of schema support).

Proposal:

In contrast to a MultiWidget-based approach, Meta.groups are specially handled by the filterset and provided its entire set of filter params. Groups must implement both validation and filtering, however they are slightly different.

  • Group validation is only intended to validate the qualities of the group. For example, an ExclusiveGroup would validate that only one of its mutually exclusive parameters is provided. Validation of the individual filter parameters is still performed by the filter's underlying form field.
  • In contrast, group filtering completely overrides how filtering is performed for that set of filters. Internally, a group may still delegate to the underlying Filter.filter methods, however this is up to the group's implementation. For example, RequiredGroup calls the filter methods of its individual filters, while CombinedRequiredGroup constructs Q objects that are combined via & or | (or some other function).

In total, this PR implements four groups:

Name Description
ExclusiveGroup(filters) A group of mutually exclusive filters.
RequiredGroup(filters) A group of mutually required filters.
CombinedGroup(filters[, combine]) A group of filters that result in a combined query (a Q object).
CombinedRequiredGroup(filters[, combine]) A group of mutually required filters that result in a combined query.

Notes:

  • RequiredGroup effectively provides an alternative for the old Meta.together option.
  • Rendered docs can be found at https://rpkilby.github.io/django-filter/ref/groups.html
  • One objective of the PR was to merge docstrings into the actual site docs. The idea being that this reduces potential duplication. The docs page provides some structure/prose, and class/method docstrings are inserted where appropriate. If this style is 👍, then I'd want to eventually want to look into overhauling the rest of the reference docs in a similar manner.
  • Code changes/additions are actually fairly minimal. Most of the lines are docs/docstrings/tests.

TODO:

  • How is the clarity of the docs? What parts don't make sense?
  • Are better code examples needed?

@rpkilby
Copy link
Collaborator Author

rpkilby commented Jan 28, 2020

Hi @carltongibson. What do you want to do about the Django 1.11 failures? Q objects were made comparable in django/django#7517, however this change wasn't added until Django 2.0. I could either rework the failing tests, or we could drop support for Django 1.11 per the official recommendations.

@jonathan-golorry
Copy link

I think this is a much better approach.

It looks like this would allow a single filter to be in multiple groups. Is that right?

One concern I have is how you handle excludes in the CombinedGroup. Consider:

m = Model.objects.create()
Related.objects.create(reverse=m, foo=True)
Related.objects.create(reverse=m, foo=False)
# 1
Model.objects.exclude(related__id__lt=0).exclude(related__foo=True)  # zero results
# 2
Model.objects.exclude(related__id__lt=0, related__foo=True)  # one result
# 3
Model.objects.filter(~Q(related__id__lt=0) & ~Q(related__foo=True))  # zero results

Excludes on related objects are weird.

@rpkilby
Copy link
Collaborator Author

rpkilby commented Jan 29, 2020

It looks like this would allow a single filter to be in multiple groups. Is that right?

Yes, but it's complicated. Validation should work, but filtering depends on the setup.

First, note the changes to filter_queryset. A group's params are 'extracted' from the rest of the cleaned_data. This is necessary so that the group and regular filters aren't both ran. e.g., with a combined group, you might expect a query like:

MyModel.objects.filter(Q(field_a=1) | Q(field_b=2))

Without extraction though, you would end up with something like:

MyModel.objects \
    .filter(Q(field_a=1) | Q(field_b=2)) \
    .filter(field_a=1) \
    .filter(field_b=2)

With a filter being shared across multiple groups, what would happen is that the filter would be extracted for the first group, but not the second. Some groups (like RequiredGroup) would fail, since they expect all their parameters to be provided, but this would work for ExclusiveGroups. e.g., you could have a setup where a filter is present in multiple mutually exclusive filter groups. Since that filter param would only be called once, there's no issue.

Long story short, filtering is going to be dependent on the implementations of the various groups. I don't think there is any rule that can be applied generally. Groups will just have to assert their expectations (e.g., as a mutually required group, I expect to filter on all my parameters) and provide as much helpful information as possible (maybe one of my params was preempted by another group).

One concern I have is how you handle excludes in the CombinedGroup.

I don't know if there's anything we can do here. That ORM call is not necessarily invalid, and exclusion across relationships generally requires consideration on the part of the developer. The CombinedGroup reference does include:

This is useful for enabling OR filtering, as well as combining filters that span multi-valued relationships (more info).

But maybe we could more specifically call out exclusion across relationships.

Excludes on related objects are weird.

Oh yes. I also ran into this with DRF filters. See philipn/django-rest-framework-filters#217

@jonathan-golorry
Copy link

Hmm. I don't like having the order of the group definitions matter for duplicate field use. Is there a downside to allowing a single field to be in multiple groups? The data extraction be rewritten as:

individual_data = cleaned_data.copy()
for group in self.groups:
    group_data = group.extract_data(cleaned_data)
    for name in group_data:
        individual_data.pop(name, None)
    queryset = group.filter(queryset, **group_data)

for name, value in individual_data.items():

Also, do you know if there's an in depth explanation of expected exclude behaviour? Now I'm wondering if my second example would be considered a bug in django.

@rpkilby
Copy link
Collaborator Author

rpkilby commented Jan 30, 2020

Is there a downside to allowing a single field to be in multiple groups?

I don't think the queries will generally make sense. For example, what would it mean for a filter to be a member of an ExclusiveGroup and an OR-based CombinedGroup? e.g.,

class MyFilter(FilterSet):
    class Meta:
        model = MyModel
        fields = ['field_a', 'field_b', 'field_c']
        groups = [
            ExclusiveGroup(filters=['field_a', 'field_b']),
            CombinedGroup(filters=['field_b', 'field_c'], combine=operator.or_),
        ]

MyFilter({'field_b': 'b', 'field_c': 'c'}).qs would have a call like

MyModel.objects.filter(field_b='b').filter(Q(field_b='b') | Q(field_c='c'))

The above negates the combined query, and is somewhat pointless. Another example, take two exclusive groups (e.g., two mutually sets of arguments, of which some are shared).

        groups = [
            ExclusiveGroup(filters=['field_a', 'field_b']),
            ExclusiveGroup(filters=['field_b', 'field_c']),
        ]

While the validation is sensible, filtering on field_b would generate a call like

MyModel.objects.filter(field_b='b').filter(field_b='b')

The duplicate filter condition doesn't hurt the query, but it's not helping either.

I'd like to see some concrete examples for overlapping groups. This can always be improved later, and in the meantime, maybe we could make a note in the docs?


Also, do you know if there's an in depth explanation of expected exclude behaviour? Now I'm wondering if my second example would be considered a bug in django.

It's described in a note about exclusion here. However, I'm not sure if your case is a bug or not.

@jonathan-golorry
Copy link

Wouldn't your first example currently result in:

MyModel.objects.filter(field_b='b').filter(Q(field_c='c'))

That would be significantly different from:

MyModel.objects.filter(field_b='b').filter(Q(field_b='b') | Q(field_c='c'))
MyModel.objects.filter(field_b='b')  # simplified

@codecov-io
Copy link

Codecov Report

Merging #1167 into master will decrease coverage by 1.07%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1167      +/-   ##
==========================================
- Coverage   99.43%   98.35%   -1.08%     
==========================================
  Files          15       16       +1     
  Lines        1235     1340     +105     
  Branches        0      226     +226     
==========================================
+ Hits         1228     1318      +90     
- Misses          7        8       +1     
- Partials        0       14      +14
Impacted Files Coverage Δ
django_filters/filters.py 98.84% <100%> (-1.16%) ⬇️
django_filters/filterset.py 98.98% <100%> (-1.02%) ⬇️
django_filters/groups.py 100% <100%> (ø)
django_filters/__init__.py 81.25% <0%> (-12.5%) ⬇️
django_filters/utils.py 97.2% <0%> (-2.8%) ⬇️
django_filters/fields.py 98.83% <0%> (-1.17%) ⬇️
django_filters/widgets.py 99.32% <0%> (-0.68%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e6d5f2d...fb54361. Read the comment docs.

@rpkilby
Copy link
Collaborator Author

rpkilby commented Jan 31, 2020

Wouldn't your first example currently result in:

MyModel.objects.filter(field_b='b').filter(Q(field_c='c'))

Sorry, I wasn't clear. I was assuming your proposed code changes, where a filter can be applied by overlapping groups. In this case, field_b is used by both the exclusive and combined OR groups, generating:

MyModel.objects.filter(field_b='b').filter(Q(field_b='b') | Q(field_c='c'))

@jonathan-golorry
Copy link

jonathan-golorry commented Feb 2, 2020

Yeah, either implementation could trip people up. Looking at your example again, my interpretation of the intent is that MyFilter({'field_b': 'b', 'field_c': 'c'}).qs would produce:

MyModel.objects.filter(Q(field_b='b') | Q(field_c='c'))

But neither implementation does that. Mine results in:

MyModel.objects.filter(field_b='b').filter(Q(field_b='b') | Q(field_c='c'))

And I think yours results in:

MyModel.objects.filter(field_b='b').filter(Q(field_c='c'))

Maybe we could split validation groups and filtering groups into two different concepts. I don't really see why ExclusiveGroup and RequiredGroup need to extract data and run their own filter logic. If the group data extraction were non-destructive, then you could have as many combinations of ExclusiveGroup and RequiredGroup as you wanted.

    def get_data(self, cleaned_data):
        return {
            name: cleaned_data[name]
            for name in self.filters
            if name in cleaned_data
        }

Then only do group filtering for CombinedGroup. The actual filtering on ExclusiveGroup and RequiredGroup can be done by the individual filters.

        individual_data = cleaned_data.copy()
        for group in self.groups:
            if isinstance(group, CombinedGroup):
                group_data = group.get_data(cleaned_data)
                for name in group_data:
                    individual_data.pop(name, None)
                queryset = group.filter(queryset, **group_data)

        for name, value in individual_data.items():

(edited to fix a mistake in the filtering)

@rpkilby
Copy link
Collaborator Author

rpkilby commented Feb 4, 2020

Maybe we could split validation groups and filtering groups into two different concepts. I don't really see why ExclusiveGroup and RequiredGroup need to extract data and run their own filter logic.

The idea was ease of mental model. In general, django-filter does two things:

  • validate query params
  • construct a query from those params

And I didn't want groups to deviate from this, where some groups do one thing, others something else. That said, this "ease of mental model" kind of falls apart when users need to actually know what's going on under the hood.

What I'm leaning towards now is that a group may validate and it may filter, if it implements the respective methods. So, the ExclusiveGroup and RequiredGroup would just implement validation, and a CombinedGroup would just implement filtering. CombinedRequiredGroup would implement both. This should alleviate most issues with overlapping groups. Although, in cases where filtering does overlap, users would need to take care that the queries still make sense.

@jonathan-golorry
Copy link

jonathan-golorry commented Feb 4, 2020

Sounds good to me. Two important points to note in the docs:

  1. If a field is in one or more groups that implement filter, the field's individual filter or method will not run.
  2. If a field is in multiple groups that implement filter, it will get passed to all of them.

More complex behaviour than that should use a custom FilterGroup.

@carltongibson
Copy link
Owner

Hey @rpkilby.

What do you want to do about the Django 1.11 failures?

Can we just ditch 1.11 I think. Let's go 2.2+ only. Simplifies everything. (You up to do that? 😀)

@rpkilby
Copy link
Collaborator Author

rpkilby commented Mar 4, 2020

Definitely. What is your timeline on the release? Aside from the Django failures, I'd still like to implement changes discussed above.

@carltongibson
Copy link
Owner

What is your timeline on the release?

No rush. Bang it into shape. Pull it together. Go. End of the month say?

@rpkilby
Copy link
Collaborator Author

rpkilby commented Mar 4, 2020

Oh yeah, I can manage that. If I'm being optimistic, end of week... but I've also got a lot of other stuff on my plate.

@carltongibson
Copy link
Owner

No rush! 🙂

@rpkilby
Copy link
Collaborator Author

rpkilby commented Mar 4, 2020

Sounds good. Did you have any thoughts on the documentation? If you could review the overall structure/idea, that would be useful.

  • Rendered docs can be found at https://rpkilby.github.io/django-filter/ref/groups.html
  • One objective of the PR was to merge docstrings into the actual site docs. The idea being that this reduces potential duplication. The docs page provides some structure/prose, and class/method docstrings are inserted where appropriate. If this style is 👍, then I'd want to eventually want to look into overhauling the rest of the reference docs in a similar manner.

@guzzijones
Copy link

guzzijones commented Mar 21, 2020

I literally just ran into this use case. I am filtering through a many to many relationship and I do NOT want it to chain. I will give this a whirl. Sounds like it is basically ready.

@guzzijones
Copy link

I am your first satisfied customer. THANK YOU! This is exactly what I needed. For now I added this branch as a requirement in my pipfile.

@kense20xx
Copy link

Desperately seeking this functionality... Is there any update about whether / when this might be merged into the master?

@guzzijones
Copy link

guzzijones commented Apr 10, 2020 via email

@AsheKR
Copy link

AsheKR commented Apr 20, 2020

any plan to merge this feature?

@Aaxom
Copy link

Aaxom commented May 24, 2020

Same demand here, hope to merge this feature into new master branch.

@TaipanRex
Copy link

+1, I am hoping this would solve multiple to-many filters.

@Detavern
Copy link

Same demand, any plan?

@alex-pobeditel-2004
Copy link

Does anybody know why this PR was forgotten? :)

Base automatically changed from master to main March 3, 2021 08:48
@MuriloScarpaSitonio
Copy link

Any updates on this?

@russmatney
Copy link

For anyone who still needs many-to-many filters to AND instead of OR, we were able to get similar functionality using this PR as a guide - I dropped a rough implementation here: https://gist.github.com/russmatney/7c757989ea3d6b1343df841ce5f33bc4

@ppolewicz
Copy link

@carltongibson it's been 9 months since the docs were provided. Do you need help reviewing https://rpkilby.github.io/django-filter/ref/groups.html ? I need CombinedGroup too, but I understand you may be busy with other stuff, maybe there is something we could do to help you?

@carltongibson
Copy link
Owner

@ppolewicz The issue is that I'm (still) not sure whether this is the way I'd like to go here, not a lack of time to hit the merge button.

@carltongibson
Copy link
Owner

To expand, it's a lot of extra API for something that can be done with a Filter and a multi-field already...

@0xE111
Copy link

0xE111 commented Dec 28, 2021

@carltongibson What is multi-field?

I had this problem when I wanted to filter by different fields of related objects. I have House -> Flat relations, and I wanted to filter houses by flat parameters, but

class HouseFilterSet(FilterSet):
    num_rooms = Filter(field_name='flats__num_rooms')
    sq_ft = Filter(field_name='flats__sq_ft')

results in

House.objects.filter(flats__num_rooms=...).filter(flats__sq_ft=...)

I believe that most users of django-filter want this to be resolved into

House.objects.filter(flats__num_rooms=..., flats__sq_ft=...)

Currently I solved it rather ugly by accumulating Q objects. Horrible...

class HomeFilter(...):
    ...
    def filter_queryset(self, request, queryset, view):
        qs = []
        for field in ('sq_ft', 'num_rooms'):
            if (value := request.query_params.get(field)) is not None:
                qs.append(Q(**{f'flats__{field}': value}))
            if (value := request.query_params.get(f'{field}_min')) is not None:
                qs.append(Q(**{f'flats__{field}__gte': value}))
            if (value := request.query_params.get(f'{field}_max')) is not None:
                qs.append(Q(**{f'flats__{field}__lte': value}))

        if qs:
            queryset = queryset.filter(reduce(and_, qs))

        return queryset

This would be solved by the groups PR, but let us know if there's any better solution, please. Honestly,
I suppose that many django rest APIs on the web are filtering incorrectly because of this :)

@carltongibson
Copy link
Owner

@c0ntribut0r I'm referring to MultiValueField https://docs.djangoproject.com/en/4.0/ref/forms/fields/#s-multivaluefield. A Filter using one to grab the related values as one from the query string and apply them in a single step is how this should be handled — so it's contained within the form and the Filter's filter method.

I believe that most users ...

... are filtering incorrectly because of this.

No. There are two ways of cutting the cake here. Both are valid, depending on your context. See the Spanning multi-valued relationships docs for discussion.

I appreciate folks want an out-of-box solution here, which is why I haven't closed this already, but every time I look at the proposed API here I think that it's too much. I would much rather stick with a narrower solution.

@kjpc-tech
Copy link

I've been following this for awhile since I had an ugly version working that just pulled all the values I needed in the single filter method.

After seeing the last comment about using Django's MultiValueField I have cleaned up my version and am posting here for anybody who finds it helpful: https://gitlab.com/-/snippets/2237049

@kviktor kviktor mentioned this pull request Aug 18, 2022
2 tasks
@danmash
Copy link

danmash commented Nov 27, 2023

Imagine spending two weeks building a huge endpoint with djnago_filters and after the test finding out that it is unable to filter properly and duplicate INNER JOIN

         INNER JOIN "grow_userrulenotificationgroup" T4 ON ("core_company_info_ukcompany"."id" = T4."uk_company_id")
         INNER JOIN "grow_userrulenotificationgroup" T5 ON ("core_company_info_ukcompany"."id" = T5."uk_company_id")
         INNER JOIN "grow_userrulenotification" T6 ON (T5."id" = T6."notification_group_id")
WHERE ("grow_userrulenotificationgroup"."id" IS NOT NULL AND "grow_userrulenotificationgroup"."latest_timestamp" >=
                                                             '2023-08-29 13:35:44.242749+00:00' AND T4."event_source_type" = 'AccountsFilingEventSourceType' AND T6."rule_set_id" IN (140))

Already been discussed many times
#537
#745
#753
and this stale PR.

4k stars on GitHub have never been so disappointing.

@carltongibson
Copy link
Owner

Filtering is done as per the Spanning multi-valued relationships Django docs.

If you need to filter in a single step see the snippet Kyle posted:

After seeing the last comment about using Django's MultiValueField I have cleaned up my version and am posting here for anybody who finds it helpful: https://gitlab.com/-/snippets/2237049

This PR is open as an aide memoir for implementing that solution in django-filter itself.

Repository owner locked as too heated and limited conversation to collaborators Nov 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Document multiparameter patern.