Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for query facets #62

Open
clintonb opened this issue Jun 30, 2016 · 8 comments
Open

Add support for query facets #62

clintonb opened this issue Jun 30, 2016 · 8 comments

Comments

@clintonb
Copy link
Contributor

Add support for customizable query facets.

@rhblind
Copy link
Owner

rhblind commented Jun 30, 2016

Yes, as you've discovered I've kind of just ignored this. But it should definitely be supported.

I have not quite decided on how to do the implementation, but it should be implemented in a way that feels natural concerning the other faceting options (by utilising the field_options dictionary on the HaystackFacetSerializer).

It would probably also be useful to be able to override the declared query_facets in the query string, as we can with the existing options.

Since we already declare the field_options as {<field_name>: {"option": value}} one solution could be to look for a nested list/tuple construction using ie. query as key in order to indicate that this is a field query which should be passed on to the search backend.

Ex:

field_options = {
    "firstname": {"query": ["robin*", "joh*"]}
}

Next, when constructing the query in the FacetQueryBuilder class, we would check for the query identifier and iterate all values from the list applying them something like this: SearchQuerySet().query_facet("firstname", "robin*").query_facet("firstname", "joh*")

This way we can support multiple query facets per field, as well as keeping the API simple and familiar.
The main drawback is that we're building a little "weird" convention.

Any inputs are appreciated!

@rhblind
Copy link
Owner

rhblind commented Jun 30, 2016

Another solution could be to introduce a new Meta attribute field_queries which basically does the same thing.
It might be more intuitive perhaps?

@rhblind
Copy link
Owner

rhblind commented Jun 30, 2016

When thinking about it I prefer the latter solution. Even if it requires the introduction of another Meta attribute, it's more declarative and better describes the action which will be taken.

@clintonb
Copy link
Contributor Author

I've started an implementation of the field_queries solution at openedx/course-discovery#144. I'm going to flesh it out some more in an effort to identify potential issues.

For this example, I need a method of generating custom date facets. They are rendered as follows:

"queries": {
        "availability_archived": {
            "count": 2,
            "query": "end:<=now",
            "narrow_url": "http://cd.local:8008/api/v1/search/all/facets/?content_type=courserun&selected_query_facets=availability_archived&selected_query_facets=availability_starting_soon&q=python"
        },
        "availability_starting_soon": {
            "count": 4,
            "query": "start:[now TO now+60d]",
            "narrow_url": "http://cd.local:8008/api/v1/search/all/facets/?content_type=courserun&selected_query_facets=availability_starting_soon&q=python"
        }
    },

I do not like the fact that the filtering is dependent on the view, as this is a violation of separation of concerns. Initially, I declared the facets in the view; but, opted to change this after reading your message, @rhblind. I'd be curious to hear the history behind defining facets on the serializer rather than the view.

@rhblind
Copy link
Owner

rhblind commented Jul 1, 2016

Great, I'll have a look at what you've come up with!

I do not like the fact that the filtering is dependent on the view, as this is a violation of separation of concerns.

I'm not quite sure I get what you mean. The filter_queryset() method at the HaystackGenericAPIView is just overriding the filter_queryset() method in GenericAPIView from django-rest-framework. The FacetMixin is just modelled after the GenericAPIView, so that is basically the whole reason for the filtering stuff to be on the view. Also, the serializer does not have access to the queryset, only what the view asks the serializer to render.

As for "defining facets on the serializer" I take it that you mean the field_options dict?
It seemed like a natural place to declare options to facetted fields, because it's where you define all the other field stuff.... I don't really know if it was such a thought through decision at that moment, but I think it works =)

If you have strong feelings for putting them on the view rather on the serializer, I'm willing to discuss it. But I think we should avoid breaking the API unless we have to...

@rhblind
Copy link
Owner

rhblind commented Jul 1, 2016

Another thing I came to think about is that declaring query facets in a list/tuple is probably ambiguous, since each search backend has its own syntax for doing OR and AND in the query itself. There is no reason to do multiple calls to SearchQuerySet().query_facet() on the same field, as you would just write the initial query differently. So I think we should keep it simple and simply provide a single string with the query for each field....

@clintonb
Copy link
Contributor Author

clintonb commented Jul 1, 2016

My comment is about the fact that filters depend on the views' serializers to build the queryset. I'm not recommending that change now; but, I do find it rather strange that the serializer affects the queryset. I generally prefer my serializers to be rather dumb, focusing solely on converting Python data to some other data type (e.g. JSON).

Regarding declaration of query facets, I am using a dict similar to the existing facets:

class CourseRunFacetSerializer(BaseHaystackFacetSerializer):
    serialize_objects = True

    class Meta:
        field_queries = {
            'availability_current': {'query': 'start:<now AND end:>now'},
        }

In this example 'availability_current' is simply the name by which Elasticsearch refers to the query, not an actual field. I opted to continue using the existing code that applies filters/facets, hence the value being a dict that will be read as kwargs for SearchQuerySet.query_facet().

@clintonb clintonb mentioned this issue Jul 1, 2016
3 tasks
@rhblind
Copy link
Owner

rhblind commented Jul 1, 2016

Ah, of course. Sorry was a bit tired this morning =)

Well, yes I agree, it's not optimal to put them on the serializer. But I'm also not too keen on doing a breaking API change when I'm in an rc cycle, so I think we should wait for a next major release.

It seems like you're in the process of making something work here and I really appreciate it, so thanks!
I'll take a closer look at the PR when the build succeeds and tests are in place =)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants