-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proof of concept: Refactoring UserListView an NodeListView to use django-filter [OSF-7962] [OSF-8090] #7258
Proof of concept: Refactoring UserListView an NodeListView to use django-filter [OSF-7962] [OSF-8090] #7258
Conversation
api/base/filters.py
Outdated
|
||
def __init__(self, data=None, *args, **kwargs): | ||
if data: | ||
new_qd = QueryDict('', mutable=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why a QueryDict
? Based on this, looks like data
can be regular dict.
api/base/filters.py
Outdated
@@ -61,6 +66,83 @@ def filter_queryset(self, request, queryset, view): | |||
return queryset | |||
|
|||
|
|||
class NewDjangoFilterMixin(django_filters.FilterSet): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a base class--not a mixin--so this should be renamed. Maybe JSONAPIFilterSet
?
api/base/filters.py
Outdated
super(NewDjangoFilterMixin, self).__init__(data=data, *args, **kwargs) | ||
|
||
@property | ||
def qs(self, *args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid having all the duplication with the code from FilterSet
? Ideally, this method would just be:
qs = super(....).qs
return self.filter_groups(qs)
I would think this is possible as long as you ensure self.filters
is correct in __init__
.
api/base/filters.py
Outdated
field_names = re.findall(self.FILTER_FIELDS, fields.strip()) | ||
|
||
self.or_fields = [] | ||
multiple_values = value.split(',') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This is a strange name--"values" is already plural, so "multiple_values" is redundant.
api/base/filters.py
Outdated
|
||
self.or_fields = [] | ||
multiple_values = value.split(',') | ||
if len(multiple_values) > 1 or len(field_names) > 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the length of multiple_values
checked? Don't we only need to strip if there are more than one field_names
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the len of multiple_values
is checked here cause if there are more than one, those also need to be sent to the or_fields list for a built OR
query, independent if there are multiple field names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think I see. But don't filters with multiple values need to be handled separately from filters with multiple field names anyway? Also, doesn't django-filter have a built in way to handle multiple values? Intuitively, ?full_name=Erin,Steve
should be translated to .filter(fullname__in=['Erin', 'Steve'])
; I wonder if that's supported out-of-the-box.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a whole lot of trouble getting this to work out of the box -- theoretically, from reading through a few issues and their resolutions, the only mention I can find on how to handle those natively was through the CSV Widget -- though applying that in my testing of things didn't get me anywhere for a long while so I decided to just deal with it while dealing with other OR
cases at the same time. I can dive back in tho and really try to figure out what's happening....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool cool, looks like it might not be supported OOB. MultipleChoiceFilter does something similar to what we want, except without a constrained set of "choices". So maybe we can at least steal some code from that class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also came across this issue, which might be worth looking into: carltongibson/django-filter#727
api/base/filters.py
Outdated
fields = match_dict['fields'] | ||
field_names = re.findall(self.FILTER_FIELDS, fields.strip()) | ||
|
||
self.or_fields = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks suspicious--or_fields
is getting reset on each iteration, so ?filter[given_name,family_name]=Erin&filter[something_else,another_thing]=foo
would break.
api/base/filters.py
Outdated
if len(multiple_values) > 1 or len(field_names) > 1: | ||
for field_name in field_names: | ||
for value in multiple_values: | ||
self.or_fields.append({'field': field_name, 'value': value}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need to handle multiple groups. Maybe it makes sense to use a dictionary, where the keys are sets of field names, and the value is the value.
{
set('given_name', 'family_name'): 'Erin',
...
}
api/users/views.py
Outdated
class Meta: | ||
model = OSFUser | ||
fields = ['id', 'full_name', 'given_name', 'middle_names', 'family_name'] | ||
strict = django_filters.constants.STRICTNESS.RAISE_VALIDATION_ERROR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to define strict
on the base class (currently NewDjangoFilterMixin)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, is there a way for us to raise the same error that we're doing now if the client passes an invalid field name?
Line 116 in 7e41c06
raise InvalidFilterFieldError(parameter='filter', value=field_name) |
api/users/views.py
Outdated
|
||
queryset = OSFUser.objects.filter(is_registered=True, date_disabled=None).order_by('-date_registered') | ||
if self.request.version >= '2.3': | ||
queryset = queryset.filter(merged_by__isnull=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added for this test here that checks for version 2.3. Since django-filter is based on the view's get_queryset
method (rather than our filter's relying on get_default_odm_query
) -- I added the same checks on get_queryset
as already existed on the odm query method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok. Thanks for the clarification.
@@ -261,6 +261,27 @@ def test_users_list_filter_multiple_fields_with_bad_filter(self): | |||
res = self.app.get(url, expect_errors=True) | |||
assert_equal(res.status_code, 400) | |||
|
|||
def test_users_list_or_filter(self): | |||
self.john_doe = UserFactory(fullname='John Doe') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assigning to self
is unnecessary.
f21d1d5
to
c1d836b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass finished
api/base/filters.py
Outdated
QUERY_PATTERN = re.compile(r'^filter\[(?P<fields>((?:,*\s*\w+)*))\](\[(?P<op>\w+)\])?$') | ||
FILTER_FIELDS = re.compile(r'(?:,*\s*(\w+)+)') | ||
|
||
or_fields = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this here?
api/base/filters.py
Outdated
|
||
class JSONAPIFilterSet(django_filters.FilterSet): | ||
|
||
QUERY_PATTERN = re.compile(r'^filter\[(?P<fields>((?:,*\s*\w+)*))\](\[(?P<op>\w+)\])?$') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is [op]
respected? I don't see it being used anywhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet! This was used when parsing out the other operators used in other filters, such as eq
, ne
, etc, which aren't used in this endpoint that I picked. Perhaps this proof of concept should include porting a second endpoint that uses operators?
api/base/filters.py
Outdated
if match: | ||
match_dict = match.groupdict() | ||
fields = match_dict['fields'] | ||
field_names = re.findall(self.FILTER_FIELDS, fields.strip()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since self.FILTER_FIELDS
is a RegexObject
, you should use self.FILTER_FIELDS.findall(...)
api/base/filters.py
Outdated
data = new_data | ||
super(JSONAPIFilterSet, self).__init__(data=data, *args, **kwargs) | ||
|
||
for field in field_names: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks suspicious: field_names
will be whatever happens to be set in the last iteration of the above loop. This validation should happen within the loop itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, is this code necessary, since this check is happening in filter_groups
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I at first tried to do this check inside of the loop itself, but the only way I could access the fields that were supposed to be there was to access the form fields -- which is inaccessible until super(JSONAPIFilterSet, self).__init__(...)
is called -- since self.form.fields
isn't set until self.filters
is set, which happens in the super
's init.
The check is necessary here (or somewhere, at least), as the check in filter_groups
only checks for invalid fields that are included in a group that will be part of an OR query (eg ?filter[wrong,field]
) and not filters that will go through normal filtering later (eg ?filter[wrong_field]=whateves
)
There might be a way to instead deal with this using django-filter's strict validation -- I tried for quite a long time to get this to work, and concluded that it was checking for an invalid filter value rather than an invalid filter itself -- which gets overlooked as far as I can tell by django-filter by default.
api/base/filters.py
Outdated
self.form.is_valid() | ||
or_keys = reduce(operator.or_, self.or_fields.keys(), set()) | ||
for key in or_keys: | ||
self.form.cleaned_data.pop(key, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than removing these keys from form.cleaned_data
, would it make sense to prevent these fields from going through the form validation in the first place, i.e. preventing all the or_fields from going into self.form.fields
?
ae1c69c
to
0718b91
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few minor suggested changes. Feel free to make those then move on to the more advanced use cases described in https://openscience.atlassian.net/browse/OSF-8090
api/base/filters.py
Outdated
super(JSONAPIFilterSet, self).__init__(data=data, *args, **kwargs) | ||
|
||
for field in field_names: | ||
if field not in self.form.fields.keys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: No need to call keys()
. value in some_dict
is an O(1) operation.
api/users/views.py
Outdated
if check_permissions: | ||
# May raise a permission denied | ||
self.check_object_permissions(self.request, obj) | ||
return obj | ||
|
||
|
||
class UserList(JSONAPIBaseView, generics.ListAPIView, ODMFilterMixin): | ||
class UserFilterSet(JSONAPIFilterSet): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FilterSets
and filter field classes should probably be separated into a separate file. Let's move this to api/users/filters.py
.
api/users/views.py
Outdated
@@ -29,7 +30,7 @@ | |||
from rest_framework import permissions as drf_permissions | |||
from rest_framework import generics | |||
from rest_framework.exceptions import NotAuthenticated, NotFound | |||
from osf.models import Contributor, ExternalAccount, AbstractNode as Node, PreprintService, OSFUser as User | |||
from osf.models import Contributor, ExternalAccount, AbstractNode as Node, PreprintService, OSFUser |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
0718b91
to
3ebedca
Compare
3ebedca
to
7652434
Compare
4c1eebd
to
5f3802b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple minor comments, no blockers other than conflicts. LGTM 🐳
api/base/filters.py
Outdated
|
||
return qs | ||
|
||
class NullModelMultipleChoiceCaseInsensitiveField(forms.ModelMultipleChoiceField): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: I think the Django convention is to define custom fields in a fields.py
file, but we haven't yet done that anywhere else in the API ¯\_(ツ)_/¯
QUERY_PATTERN = re.compile(r'^filter\[(?P<fields>((?:,*\s*\w+)*))\](\[(?P<op>\w+)\])?$') | ||
FILTER_FIELDS = re.compile(r'(?:,*\s*(\w+)+)') | ||
|
||
# TODO - there must be a better way to recognize these |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do something like if <class>._meta.get_field('<name>').many_to_many
to check for M2M.
For date...
from django.db import connection
if 'timestamp' in <class>._meta.get_field('<name>').db_type(connection)
Not sure how hard it is to get the class in scope here, though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that's good to know! I tried to get the class in scope here, but it's really tough, as far as I can tell my only args here are the field name and value being passed in by the filter....
53fa6a7
to
a2d0ef3
Compare
d3f782b
to
47e0276
Compare
- Put base class for filter into filters.py for use elsewhere that will deal with filter syntax - use that base class in the UserList view - Modify UserList get_queryset to also account for version - filter_groups method to deal with adding OR to filters in the same set
47e0276
to
a2fb4b4
Compare
a31a1ad
to
677739c
Compare
Closing. See https://openscience.atlassian.net/browse/PLAT-492 for our analysis |
Note:
This is no longer an actual PR that would be merged, rather a proof of concept. Main focus is document linked on below JIRA ticket!
Purpose
Investigate using the
DjangoFilterBackend
for APIv2. This is just a proof-of-concept, not necessarily for merging, but just for reviewing and seeing what sorts of modifications using the 3rd partyDjangoFilterBackend
could entail.Changes
DjangoFilterBackend
for theUserList
view andNodeList
viewsJSONAPIFilterSet
using some of the same filter param logic to break down filter syntax/?filter[given_name,family_name]=Doe
/?filter[family_name]=Jane,Dog
UserFilterSet
andNodeFilterSet
that facilitate filtering on node and user fields, consistent with what we have in the API currentlySide effects
No real ones at the moment, since it's only a proof of concept!
Other thoughts
A lot of messing around with standard functionality seemed to be needed to be done for our filter syntax to work properly with variable OR filters (see changes above). I'm not sure how well the built in solutions will work given the modifications we've done to get these things to work, but it could be alright! Just not sure if we'd need to alter the basics so much that we lose the benefits of using a maintained 3rd party library.
Ticket
https://openscience.atlassian.net/browse/OSF-7962
https://openscience.atlassian.net/browse/OSF-8090