Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reducing number of db queries by using select_related data #216

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

Andrey-Skvortsov
Copy link

@Andrey-Skvortsov Andrey-Skvortsov commented Feb 23, 2018

Overview

Currently to get translated data django-parler do query to db for each record in main, default and fallback languages. It can be cached, but it needs the cache to be filled. It can be reduced by using prefetch_related. Disadvantages using prefetch related:

  • There are the additional query(s) when you join models with translations. It will be as much additional queries as much translated models.
  • Prefetch_related does not help when you use iterators as by design related data prefetches only after all data(records) is queried.

What is new

The PR introduces automatic addition virtual composite foreign keys from shared model to translated model to access to translations in default and active languages and automatic addition such relations to select_related for models' queryset. This optimisation covers main 99% use cases when you just need data in current active language.
The addition happens if you work under Django ver 1.8+
For Django 1.7 it works as early.

What is used

It used the django-composite-foreignkey module
https://pypi.python.org/pypi/django-composite-foreignkey
This module works under Django 1.8+

How it works

You use as usual Translated models, it will join query (OUTER JOIN) model twice with translations models in 2 languages.

JOIN not happening if it queries only non-translatable fields

If you use .only() and all the fields are not translatable there are no additional JOINs with translations models. In all other cases translated fields are expected and do JOIN to prevent having additional queries.

Do not want have additional joins

You could disable this by adding:

  • force_select_related_translations = False to queryset or use
  • LightTranslatableManager as a base manager for model. (but you still can join with translations manually call .select_related('translations'))

More complex usage translatable models

If you have a normal(not translatable) model which is joining with translatable models eg.: you have mode1 joining with model2 and model3 like .select_related('model2__model3') and for not having additional queries for translations for model2 and model3 you have to also add
.select_related( 'model2__model3', 'model2__translations_active', 'model2__translations_default', 'model2__model3__translations_active', 'model2__model3__translations_default', )
So if you do not want worry about which additional translation models you need add to select_related you can:

  • add class SelectRelatedTranslationsQuerySetMixin to base classes for your models' queryset or use:
  • AutoAddTranslationsManager as a base manager for mode1.
    Then it adds
    .select_related( ... 'model2__translations_active', 'model2__translations_default', 'model2__model3__translations_active', 'model2__model3__translations_default', )
    automatically. So you will have one SQL-query for everything do not carry about which models is translatable ones.

..also it works with translatable model

In the last example if model1 is translatable you can use:

  • DeepTranslatableManager instead default TranslatableManager for model1 to have the same behaviour automatically adding active and default translations.

Simplify usage translatable fields in .only()

In translatable model you can still use .only('field1') even if the field1 is translatable and do not belong to the model. In that case it will be automatically replaced with translations_active and translations_default fields in .only fields list and by that in will be automatically queried select related as base functionality of TranslatableQuerySet.

@codecov
Copy link

codecov bot commented Feb 23, 2018

Codecov Report

Merging #216 into master will increase coverage by 1.52%.
The diff coverage is 97.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #216      +/-   ##
==========================================
+ Coverage   86.19%   87.72%   +1.52%     
==========================================
  Files          30       31       +1     
  Lines        1797     2037     +240     
==========================================
+ Hits         1549     1787     +238     
- Misses        248      250       +2
Impacted Files Coverage Δ
example/article/tests.py 98.8% <ø> (ø) ⬆️
parler/tests/test_query_count.py 100% <100%> (ø) ⬆️
parler/managers.py 93.37% <100%> (+8.37%) ⬆️
parler/tests/test_model_construction.py 93.42% <100%> (+1.01%) ⬆️
parler/tests/test_model_inheritance.py 100% <100%> (ø) ⬆️
parler/tests/testapp/models.py 92.15% <84.84%> (-2.01%) ⬇️
parler/utils/fields.py 92.3% <92.3%> (ø)
parler/tests/test_querysets.py 99.24% <99.24%> (ø)
parler/cache.py 65.62% <0%> (-3.13%) ⬇️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 875f64a...ff90ddd. Read the comment docs.

@vdboor
Copy link
Contributor

vdboor commented Feb 25, 2018

Wow, thanks for this work! I'll take some time to evaluate this. A few questions:

  • Django 1.7 support is not required, I'm perfectly fine with dropping that for a new release.
  • Since we already support Django 2.0, I'm curious what it would take to get Django 2.0 support for this.
  • How do you avoid returning multiple rows for the same object when there are multiple languages? Is using "prefetch_related()" better perhaps, or am I mistaken here?
  • Please don't bump the version number yet...

Copy link
Contributor

@vdboor vdboor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done a quick review, still have to test how it works in practice. You could already look into my comments.

parler/fields.py Outdated
from parler.utils.i18n import get_language

if (1, 8) <= django.VERSION < (2, 0):
from compositefk.fields import RawFieldValue, CompositeOneToOneField
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with dropping Django 1.7 support, but would love to have Django 2.0 support in.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added support Django 2 to django-composite-foreignkey recently. So can remove the 'if' here.

parler/fields.py Outdated
if 'to_fields' in kwargs:
kwargs['to_fields'] = {'master_id': None, 'language_code': None} # hack: Need always the same dict
if "on_delete" in kwargs:
kwargs['on_delete'] = DONOTHING # hack: Need always the same global object with __module__ attr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this hack?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hack is need because repitative migrations problem, here
https://github.com/Andrey-Skvortsov/django-composite-foreignkey/blob/91c093c8c81b38a8267557597c6bc0870e4315c3/compositefk/fields.py#L48
they are doing overriding on_delete and Django migrations produced a new migration then do makemigrations with the same code even there are no any changes. It's do such because it compare the models's field and found diff in on_delete as it different instances here.
It's already fixed here:
https://github.com/Andrey-Skvortsov/django-composite-foreignkey/blob/91c093c8c81b38a8267557597c6bc0870e4315c3/compositefk/fields.py#L196
but not in PyPi yet, so it needs bumped version in django-composite-foreignkey.

return new_model


class TranslatableModel(six.with_metaclass(TranslatableModelBase, TranslatableModelMixin, models.Model)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: It would be good to point out in the CHANGES that TranslatableModel uses a meta class now. Projects that define a meta class for any inherited class need to take note of this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, there are no way to escape to do meta-class for TranslatableModel if we want to support django model inheritance. And we want - we have test_inherited_model in tests.

@@ -1085,6 +1187,7 @@ def __init__(self, base, shared_model, translations_model, related_name):

self.base = base
self.inherited = False
self._local_extensions = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is "local extensions" doing? I find the name a bit vague

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs to know which extensions are belong to model or come from parent model through model inheritance.
We create additional attributes only for new extensions. Agree, the name is not clear. I was following the logic we have in django's model._meta: we have fields, concrete_fields, many_to_many attributes to access to all fields of the model and we have local_fields, local_concrete_fields, local_many_to_many to access only to current model fields without inherited fields.

from unittest import skipIf
except ImportError:
# python<2.7
from django.utils.unittest import skipIf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python 2.6 compatibility is not required. It's EOL and obsolete and parler doesn't support it anymore

raise NotRelationField


def get_extra_related_translalation_paths(model, path):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typoo in the function name!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also needs test coverage for this function (TODO)

pass


def get_model_from_relation(field):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can keep this function private, and use a standard ValueError exception

"""

force_select_related_translations = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming this select_related_translations seems clear to me too

@@ -119,6 +131,47 @@ class TranslationDoesNotExist(AttributeError, ObjectDoesNotExist):
_lazy_verbose_name = lazy(lambda x: ugettext("{0} Translation").format(x._meta.verbose_name), six.text_type)


def create_translations_composite_fk(shared_model, related_name, translated_model):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this function doing and how is it related to the "select related" logic?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function adds 2 attributes to the main model with virtual composite foreign key to model with translations. It let us be able to join main model with translations as one-to-one twice with default and current languages. The QuerySet class adds automatically these attributes to select_related so it let have an access to translations in 2 languages reducing DB hitting.

@@ -1057,6 +1157,8 @@ def __init__(self, shared_model, translations_model, related_name):
self.shared_model = shared_model
self.model = translations_model
self.rel_name = related_name
self.rel_name_active = None
self.rel_name_default = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are rel_name_default and rel_name_active doing?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This the names for new 2 attributes to the main model with virtual composite foreign key. Refer to translations in default and active languages.

@Andrey-Skvortsov
Copy link
Author

Answering to the questions:

  • Dropping support for Django 1.7: Purpose of this PR just add optimisation to parler. It already works with 1.7, do not need to deprecate it because the PR, but It's up to you.
  • Django 2.0 support: django-composite-foreignkey did not have support for it. Today I did PR to them to add support for 2.0, and it is approved and merged already!
    Add Django 2 support, bump to v1.0.1 onysos/django-composite-foreignkey#13
  • Avoiding returning multiple rows: for that I use django-composite-foreignkey package and add 2 attributes - 'virtual' composite foreignkeys to the shared_model in addition to 'translations' attribute. Using composite fks let to join to translations table with 2 fields: original fk 'master_id' = 'id' and 'language_code' = the_constant, so it is unique one-to-one join here.
  • prefetch_related() is good, but it is an addition query to db with huge list of IDs in expression
    ...master_id IN (1,2,3,4....), so if you have 100k records it not such good then... Also If you have e.g. 3 models refer one by one C->B->A, all of them with translations ,you will have 4 queries already even you use prefetch related. An additional problem with prefetch_related in Django (I think it is a bug) is, then you do query to A with prefetch_related('A__B__C') it propagated IDs to restrict data in table C it generate weird IN clause like ...master_id IN (1,1,1,1,..,2,2,2,2....) so many IDs how it's many in A-table-queryset, even here in C-table may by only a few records. Having select_related let us have only one sql for all.
  • Not to bump: you are right of course, I increased the version because struggled to force the applications in I am working on to use new parler version. We use not easy deployment procedure with few apps. It is not a problem to add
    git+https://github.com/Andrey-Skvortsov/django-parler.git@e39979ace39454a9f020efea6ebcec0a52de01ae#egg=django-parler-1.10.0
    in dependency_links for setuptools.setup() with old version number, but the second app is inherited dependences from the main by e- github...#egg=app1[deps] and still using the main PyPi version if it set as old version 1.9.x. I will remove the bump.

current_path = ''
extra_paths = []
for piece in pieces:
field = parent._meta.get_field(piece)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, we are iterating through the chain of fields by fetching the Field instance is from parent._meta.get_field, and then update parent from get_model_from_relation. Why can't we directly iterate the options like:

option = model._meta
for piece in pieces:
    field = option.get_field(piece)
    option = field.get_path_info()[-1].to_opts
    ...

...and then no need to get the model of each field in the chain.

parent = get_model_from_relation(field)
current_path += LOOKUP_SEP + piece if current_path else piece
if issubclass(parent, TranslatableModel):
for extension in parent._parler_meta:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand, get_extra_related_translation_paths is to facilitate SelectRelatedTranslationsQuerySetMixin.

And SelectRelatedTranslationsQuerySetMixin is designed for the case that, if we have the following models:

class SimpleModel(TranslatableModel):
    shared = models.CharField()

    translations = TranslatedFields(
        tr_title=models.CharField()
    )


class SimpleRelatedModelManager(SelectRelatedTranslationsQuerySetMixin, models.Manager):
    pass


class SimpleRelatedModel(models.Model):
    some_attribute = models.CharField()
    some_reference = models.ForeignKey(SimpleModel, on_delete=models.CASCADE)

    objects = SimpleRelatedModelManager()

...then we we do

SimpleRelatedModel.objects.select_related('some_reference')

the queryset will actually do

SimpleRelatedModel.objects.select_related(
    'some_reference',
    'some_reference__translations_active',
    'some_reference__translations_default',
)

However there is a problem in this approach. Say now we have these models instead:

class SimpleModel(TranslatableModel):
    shared = models.CharField()

    translations = TranslatedFields(
        tr_title=models.CharField()
    )


class SimpleRelatedModelManager(SelectRelatedTranslationsQuerySetMixin, models.Manager):
    pass


class TranslatedSimpleRelatedModel(TranslatableModel):
    translations = TranslatedFields(
        tr_attribute = models.CharField()
    )
    some_reference = models.ForeignKey(SimpleModel, on_delete=models.CASCADE)


class AnotherRelatedModel(models.Model):
    another_attribute = models.CharField()
    another_reference = models.OneToOneField(TranslatedSimpleRelatedModel, on_delete=models.CASCADE)

    objects = SimpleRelatedModelManager()

...then when we do

AnotherRelatedModel.objects.select_related('another_reference__some_reference')

...the logic in SelectRelatedTranslationsQuerySetMixin will actually do

AnotherRelatedModel.objects.select_related(
    'another_reference__some_reference, '
    `another_reference__translations_active`,
    `another_reference__translations_default`,
    `another_reference__some_reference__translations_active`,
    `another_reference__some_reference__translations_default`,
)

As we only want to select_related the active and default translations of SimpleModel, another_reference__translations_active and another_reference__translations_default are unnecessary extra paths and we will end up select more related translation paths than we need.

I am working on a branch to address some problems in this PR, and it contains a different approach as a fix for the problem I describe above.
https://github.com/Andrey-Skvortsov/django-parler/pull/1/files#diff-f2bbe1670b2ebcf7b29f969e89fb3540R25

@mahmedk91
Copy link

Hey guy,
Just a quick bug I want to report with this branch.

The dumpdata fails for a TranslatableModel

python manage.py dumpdata appname.modelname --format json --output data.json --indent 2

However dumpdata for Translations succeed

python manage.py dumpdata appname.modelnametranslation --format json --output data_translation.json --indent 2

@vdboor vdboor force-pushed the master branch 4 times, most recently from 2bf75d2 to 577ca2f Compare November 18, 2021 09:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants