Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema improvements for V3? #560

Open
aleh-rymasheuski opened this issue Sep 1, 2023 · 1 comment
Open

Schema improvements for V3? #560

aleh-rymasheuski opened this issue Sep 1, 2023 · 1 comment

Comments

@aleh-rymasheuski
Copy link
Contributor

@hramezani, @aqeelat, I reviewed the LogEntry model to check where we can speed up the auditlog and I've got a few suggestions:

  1. Drop object_id field. Rationale: object_pk is a superset of this field, it holds the string representation of the int primary key or the primary key if it's not int (object_id is blank in the latter case). Both are indexed, so we have twice the bloat from having two columns.
  2. Remove db_index from action field. Rationale: this field only has three distinct values, so the selectivity of this index is really low. I don't expect the performance of seq scan to be much worse than the performance of an index scan for a three-value field (measurement needed).
  3. Replace db_index on timestamp column to a composite index by timestamp and id fields. Rationale: Django admin interprets order by timestamp as order by timestamp, id because timestamp is not a unique field and we need complete sort order on the admin. Composite index on the two fields will be bulkier, but will enable both searches by timestamp and by timestamp then id fields.

These changes (particularly, the removal of the object_id field) will make the migration to V3 harder for the users but will improve the performance to a certain degree.

@hramezani
Copy link
Member

Thanks @aleh-rymasheuski for investigating and preparing this issue.

  1. Drop object_id field. Rationale: object_pk is a superset of this field, it holds the string representation of the int primary key or the primary key if it's not int (object_id is blank in the latter case). Both are indexed, so we have twice the bloat from having two columns.

I think we already introduced a lot of breaking changes for V3. One other option would be to deprecate object_id diring V3 and remove it in V4.

  1. Remove db_index from action field. Rationale: this field only has three distinct values, so the selectivity of this index is really low. I don't expect the performance of seq scan to be much worse than the performance of an index scan for a three-value field (measurement needed).

I am ok to keep this index. but as you mentioned we need to measure it. then we can make a better decision.

  1. Replace db_index on timestamp column to a composite index by timestamp and id fields. Rationale: Django admin interprets order by timestamp as order by timestamp, id because timestamp is not a unique field and we need complete sort order on the admin. Composite index on the two fields will be bulkier, but will enable both searches by timestamp and by timestamp then id fields.

Seems a reasonable change. We need to know how long the migration takes for this.

@hramezani hramezani removed this from the 3.0 milestone Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants