Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proof of concept: Refactoring UserListView an NodeListView to use django-filter [OSF-7962] [OSF-8090] #7258

Closed

Conversation

erinspace
Copy link
Member

@erinspace erinspace commented May 24, 2017

Note:

This is no longer an actual PR that would be merged, rather a proof of concept. Main focus is document linked on below JIRA ticket!

Purpose

Investigate using the DjangoFilterBackend for APIv2. This is just a proof-of-concept, not necessarily for merging, but just for reviewing and seeing what sorts of modifications using the 3rd party DjangoFilterBackend could entail.

Changes

  • Use DjangoFilterBackend for the UserList view and NodeList views
  • Create JSONAPIFilterSet using some of the same filter param logic to break down filter syntax
  • Custom filter properties that will filter with OR queries for two different fields when using filter syntax such as /?filter[given_name,family_name]=Doe
  • Also add OR functionality with one field, such as /?filter[family_name]=Jane,Dog
  • UserFilterSet and NodeFilterSet that facilitate filtering on node and user fields, consistent with what we have in the API currently

Side effects

No real ones at the moment, since it's only a proof of concept!

Other thoughts

A lot of messing around with standard functionality seemed to be needed to be done for our filter syntax to work properly with variable OR filters (see changes above). I'm not sure how well the built in solutions will work given the modifications we've done to get these things to work, but it could be alright! Just not sure if we'd need to alter the basics so much that we lose the benefits of using a maintained 3rd party library.

Ticket

https://openscience.atlassian.net/browse/OSF-7962
https://openscience.atlassian.net/browse/OSF-8090


def __init__(self, data=None, *args, **kwargs):
if data:
new_qd = QueryDict('', mutable=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a QueryDict? Based on this, looks like data can be regular dict.

@@ -61,6 +66,83 @@ def filter_queryset(self, request, queryset, view):
return queryset


class NewDjangoFilterMixin(django_filters.FilterSet):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a base class--not a mixin--so this should be renamed. Maybe JSONAPIFilterSet?

super(NewDjangoFilterMixin, self).__init__(data=data, *args, **kwargs)

@property
def qs(self, *args, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid having all the duplication with the code from FilterSet? Ideally, this method would just be:

qs = super(....).qs
return self.filter_groups(qs)

I would think this is possible as long as you ensure self.filters is correct in __init__.

field_names = re.findall(self.FILTER_FIELDS, fields.strip())

self.or_fields = []
multiple_values = value.split(',')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This is a strange name--"values" is already plural, so "multiple_values" is redundant.


self.or_fields = []
multiple_values = value.split(',')
if len(multiple_values) > 1 or len(field_names) > 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the length of multiple_values checked? Don't we only need to strip if there are more than one field_names?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the len of multiple_values is checked here cause if there are more than one, those also need to be sent to the or_fields list for a built OR query, independent if there are multiple field names

Copy link
Contributor

@sloria sloria May 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think I see. But don't filters with multiple values need to be handled separately from filters with multiple field names anyway? Also, doesn't django-filter have a built in way to handle multiple values? Intuitively, ?full_name=Erin,Steve should be translated to .filter(fullname__in=['Erin', 'Steve']); I wonder if that's supported out-of-the-box.

Copy link
Member Author

@erinspace erinspace May 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a whole lot of trouble getting this to work out of the box -- theoretically, from reading through a few issues and their resolutions, the only mention I can find on how to handle those natively was through the CSV Widget -- though applying that in my testing of things didn't get me anywhere for a long while so I decided to just deal with it while dealing with other OR cases at the same time. I can dive back in tho and really try to figure out what's happening....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool cool, looks like it might not be supported OOB. MultipleChoiceFilter does something similar to what we want, except without a constrained set of "choices". So maybe we can at least steal some code from that class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also came across this issue, which might be worth looking into: carltongibson/django-filter#727

fields = match_dict['fields']
field_names = re.findall(self.FILTER_FIELDS, fields.strip())

self.or_fields = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks suspicious--or_fields is getting reset on each iteration, so ?filter[given_name,family_name]=Erin&filter[something_else,another_thing]=foo would break.

if len(multiple_values) > 1 or len(field_names) > 1:
for field_name in field_names:
for value in multiple_values:
self.or_fields.append({'field': field_name, 'value': value})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to handle multiple groups. Maybe it makes sense to use a dictionary, where the keys are sets of field names, and the value is the value.

{
    set('given_name', 'family_name'): 'Erin',
    ...
}

class Meta:
model = OSFUser
fields = ['id', 'full_name', 'given_name', 'middle_names', 'family_name']
strict = django_filters.constants.STRICTNESS.RAISE_VALIDATION_ERROR
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to define strict on the base class (currently NewDjangoFilterMixin)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is there a way for us to raise the same error that we're doing now if the client passes an invalid field name?

raise InvalidFilterFieldError(parameter='filter', value=field_name)


queryset = OSFUser.objects.filter(is_registered=True, date_disabled=None).order_by('-date_registered')
if self.request.version >= '2.3':
queryset = queryset.filter(merged_by__isnull=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this added?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added for this test here that checks for version 2.3. Since django-filter is based on the view's get_queryset method (rather than our filter's relying on get_default_odm_query) -- I added the same checks on get_queryset as already existed on the odm query method

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok. Thanks for the clarification.

@@ -261,6 +261,27 @@ def test_users_list_filter_multiple_fields_with_bad_filter(self):
res = self.app.get(url, expect_errors=True)
assert_equal(res.status_code, 400)

def test_users_list_or_filter(self):
self.john_doe = UserFactory(fullname='John Doe')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assigning to self is unnecessary.

@erinspace erinspace force-pushed the feature/filter_backend branch 2 times, most recently from f21d1d5 to c1d836b Compare May 31, 2017 16:42
Copy link
Contributor

@sloria sloria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass finished

QUERY_PATTERN = re.compile(r'^filter\[(?P<fields>((?:,*\s*\w+)*))\](\[(?P<op>\w+)\])?$')
FILTER_FIELDS = re.compile(r'(?:,*\s*(\w+)+)')

or_fields = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this here?


class JSONAPIFilterSet(django_filters.FilterSet):

QUERY_PATTERN = re.compile(r'^filter\[(?P<fields>((?:,*\s*\w+)*))\](\[(?P<op>\w+)\])?$')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is [op] respected? I don't see it being used anywhere

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet! This was used when parsing out the other operators used in other filters, such as eq, ne, etc, which aren't used in this endpoint that I picked. Perhaps this proof of concept should include porting a second endpoint that uses operators?

if match:
match_dict = match.groupdict()
fields = match_dict['fields']
field_names = re.findall(self.FILTER_FIELDS, fields.strip())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since self.FILTER_FIELDS is a RegexObject, you should use self.FILTER_FIELDS.findall(...)

data = new_data
super(JSONAPIFilterSet, self).__init__(data=data, *args, **kwargs)

for field in field_names:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks suspicious: field_names will be whatever happens to be set in the last iteration of the above loop. This validation should happen within the loop itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, is this code necessary, since this check is happening in filter_groups?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I at first tried to do this check inside of the loop itself, but the only way I could access the fields that were supposed to be there was to access the form fields -- which is inaccessible until super(JSONAPIFilterSet, self).__init__(...) is called -- since self.form.fields isn't set until self.filters is set, which happens in the super's init.

The check is necessary here (or somewhere, at least), as the check in filter_groups only checks for invalid fields that are included in a group that will be part of an OR query (eg ?filter[wrong,field]) and not filters that will go through normal filtering later (eg ?filter[wrong_field]=whateves)

There might be a way to instead deal with this using django-filter's strict validation -- I tried for quite a long time to get this to work, and concluded that it was checking for an invalid filter value rather than an invalid filter itself -- which gets overlooked as far as I can tell by django-filter by default.

self.form.is_valid()
or_keys = reduce(operator.or_, self.or_fields.keys(), set())
for key in or_keys:
self.form.cleaned_data.pop(key, None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than removing these keys from form.cleaned_data, would it make sense to prevent these fields from going through the form validation in the first place, i.e. preventing all the or_fields from going into self.form.fields?

@erinspace erinspace force-pushed the feature/filter_backend branch 2 times, most recently from ae1c69c to 0718b91 Compare June 2, 2017 14:10
Copy link
Contributor

@sloria sloria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor suggested changes. Feel free to make those then move on to the more advanced use cases described in https://openscience.atlassian.net/browse/OSF-8090

super(JSONAPIFilterSet, self).__init__(data=data, *args, **kwargs)

for field in field_names:
if field not in self.form.fields.keys():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: No need to call keys(). value in some_dict is an O(1) operation.

if check_permissions:
# May raise a permission denied
self.check_object_permissions(self.request, obj)
return obj


class UserList(JSONAPIBaseView, generics.ListAPIView, ODMFilterMixin):
class UserFilterSet(JSONAPIFilterSet):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FilterSets and filter field classes should probably be separated into a separate file. Let's move this to api/users/filters.py.

@@ -29,7 +30,7 @@
from rest_framework import permissions as drf_permissions
from rest_framework import generics
from rest_framework.exceptions import NotAuthenticated, NotFound
from osf.models import Contributor, ExternalAccount, AbstractNode as Node, PreprintService, OSFUser as User
from osf.models import Contributor, ExternalAccount, AbstractNode as Node, PreprintService, OSFUser
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@erinspace erinspace changed the title Proof of concept: Refactoring UserList View to use django-filter [OSF-7962] Proof of concept: Refactoring UserListView an NodeListView to use django-filter [OSF-7962] Jul 11, 2017
@erinspace erinspace changed the title Proof of concept: Refactoring UserListView an NodeListView to use django-filter [OSF-7962] Proof of concept: Refactoring UserListView an NodeListView to use django-filter [OSF-7962] [OSF-8090] Jul 11, 2017
@erinspace erinspace changed the title Proof of concept: Refactoring UserListView an NodeListView to use django-filter [OSF-7962] [OSF-8090] [WIP] Proof of concept: Refactoring UserListView an NodeListView to use django-filter [OSF-7962] [OSF-8090] Jul 11, 2017
@erinspace erinspace changed the title [WIP] Proof of concept: Refactoring UserListView an NodeListView to use django-filter [OSF-7962] [OSF-8090] Proof of concept: Refactoring UserListView an NodeListView to use django-filter [OSF-7962] [OSF-8090] Jul 13, 2017
@erinspace erinspace force-pushed the feature/filter_backend branch 2 times, most recently from 4c1eebd to 5f3802b Compare July 14, 2017 18:16
Copy link
Member

@mfraezz mfraezz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple minor comments, no blockers other than conflicts. LGTM 🐳


return qs

class NullModelMultipleChoiceCaseInsensitiveField(forms.ModelMultipleChoiceField):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I think the Django convention is to define custom fields in a fields.py file, but we haven't yet done that anywhere else in the API ¯\_(ツ)_/¯

QUERY_PATTERN = re.compile(r'^filter\[(?P<fields>((?:,*\s*\w+)*))\](\[(?P<op>\w+)\])?$')
FILTER_FIELDS = re.compile(r'(?:,*\s*(\w+)+)')

# TODO - there must be a better way to recognize these
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do something like if <class>._meta.get_field('<name>').many_to_many to check for M2M.
For date...

from django.db import connection

if 'timestamp' in <class>._meta.get_field('<name>').db_type(connection)

Not sure how hard it is to get the class in scope here, though

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's good to know! I tried to get the class in scope here, but it's really tough, as far as I can tell my only args here are the field name and value being passed in by the filter....

@erinspace erinspace force-pushed the feature/filter_backend branch 6 times, most recently from 53fa6a7 to a2d0ef3 Compare September 13, 2017 17:42
@erinspace erinspace force-pushed the feature/filter_backend branch 2 times, most recently from d3f782b to 47e0276 Compare January 30, 2018 17:14
- Put base class for filter into filters.py for use elsewhere that will
deal with filter syntax
- use that base class in the UserList view
- Modify UserList get_queryset to also account for version
- filter_groups method to deal with adding OR to filters in the same set
@sloria
Copy link
Contributor

sloria commented Apr 13, 2018

Closing. See https://openscience.atlassian.net/browse/PLAT-492 for our analysis

@sloria sloria closed this Apr 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants