Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CEP] Table partitioning for remote API logs #30189

Open
kaapstorm opened this issue Aug 3, 2021 · 1 comment
Open

[CEP] Table partitioning for remote API logs #30189

kaapstorm opened this issue Aug 3, 2021 · 1 comment
Labels
CEP CommCare Enhancement Proposal

Comments

@kaapstorm
Copy link
Contributor

Abstract

Use PostgreSQL table partitioning to improve the performance of remote API logs.

Motivation

Logs are deleted after 90 days. This operation is slow, and is the ideal use case for table partitioning. Because of the size of the table, other operations are also negatively impacted. Everyday uses seldom involve querying records older than a week. This date-based filtering also suits date-based table partitioning well.

Specification

Implementing partitioning on an existing table can be done as follows:

  1. Create parent table based on a dump of the definition of the existing
    table.

     CREATE TABLE public.motech_requestlog (
         id integer NOT NULL,
         domain character varying(126) NOT NULL,
         "timestamp" timestamp with time zone NOT NULL,
         request_method character varying(12) NOT NULL,
         request_url character varying(255) NOT NULL,
         request_headers text NOT NULL,
         request_params text NOT NULL,
         request_body text,
         request_error text,
         response_status integer,
         response_body text,
         log_level integer,
         payload_id character varying(126),
         response_headers text
     );
    
     CREATE SEQUENCE public.motech_requestlog_id_seq
         START WITH 1
         INCREMENT BY 1
         NO MINVALUE
         NO MAXVALUE
         CACHE 1;
    
     ALTER SEQUENCE motech_requestlog_id_seq
         OWNED BY motech_requestlog.id;
    
  2. Set current table to inherit from parent:

     ALTER TABLE dhis2_jsonapilog
         ADD CONSTRAINT before_2021_09
         CHECK ("timestamp" < TIMESTAMP '2021-09-01 00:00:00 Z');
     -- Start of the next month ^^^
    
     ALTER TABLE dhis2_jsonapilog 
         INHERIT motech_requestlog;
    
  3. Point model table to parent table:

     class RequestLog(models.Model):
         ...
         class Meta:
             db_table = 'motech_requestlog'
    
  4. Add architect decorator

     @architect.install(
         'partition',
         type='range',
         subtype='date',
         constraint='month',
         column='timestamp',
     )
     class RequestLog(models.Model):
         ...
    
  5. Implement all of the above as a migration.

    Django will use migrations.AlterModelTable() to rename the table.
    Edit the generated migration, and create a state operation that only
    implements the state change of the AlterModelTable migration:

     def state_forwards(self, app_label, state):
         state.models[app_label, self.name_lower].options["db_table"] = self.table
         state.reload_model(app_label, self.name_lower, delay=True)
    

    Use migrations.runSQL() to execute the SQL above (remember to
    calculate the start of the next month). Call the state_forwards()
    method using runSQL()'s state_operations parameter.

  6. Define reverse_sql:

     ALTER TABLE dhis2_jsonapilog
         NO INHERIT motech_requestlog;
    
     ALTER TABLE dhis2_jsonapilog
         DROP CONSTRAINT before_next_month;
    
     DROP SEQUENCE motech_requestlog_id_seq;
    
     DROP TABLE motech_requestlog;
    

Impact on users
Remote API logs will be more responsive for administrators. No negative impact.

Impact on hosting
Remote API logs will continue to use the default database. Total table sizes will remain the same. No impact on hosting.

Backwards compatibility
This migration can be rolled back. It does not change data, just database structure.

Release Timeline
No specific timeline is being pursued, but roll-out during August seems reasonable.

Open questions and issues
I am keen for feedback on whether this approach is optimal, or whether there are better ways to achieve the same objectives.

@kaapstorm kaapstorm added the CEP CommCare Enhancement Proposal label Aug 3, 2021
@millerdev
Copy link
Contributor

Seems like a good idea to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CEP CommCare Enhancement Proposal
Projects
None yet
Development

No branches or pull requests

2 participants