Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up basket and basketitem table? #2827

Open
5 tasks
abeglova opened this issue Nov 6, 2023 · 7 comments
Open
5 tasks

Clean up basket and basketitem table? #2827

abeglova opened this issue Nov 6, 2023 · 7 comments
Labels
Bug Something isn't working product:xpro

Comments

@abeglova
Copy link
Contributor

abeglova commented Nov 6, 2023

As part of the ol-data-platform work we noticed that xpro has 74,354 record in ecommerce_basket and 56,655 records in ecommerce_basketitem. Should baskets and basket items expire and get deleted from the database after some time?

Acceptance criteria

  • Delete/Expire the Basket and Basket Items upon completion of a checkout (i.e. Order fulfilled).
  • Create a new celery task that runs daily(MID NIGHT - UTC) to check for expired Baskets.
  • Add a configurable timedelta to mark the life span of a basket. This is a threshold value for a basket to live. Let's do a 15 days delta as a default value but we will be able to change it anytime since it will be configurable.
  • Conditions for basket cleanup operation
    • Delete a Basket if there is no associated Basket Item
    • Delete a Basket & associated Basket Items if the Basket has expired. (e.g. Delete a Basket that was created 15 days ago, as per default delta)
  • Check for any Basket, Basket Item associations with other models while implementing this.

Out of scope:

  • No need for initial data migration. When the task runs the first time it would automatically clean the existing data. Although it would take a bit longer to complete its processing.
@abeglova abeglova added the Bug Something isn't working label Nov 6, 2023
@arslanashraf7
Copy link
Contributor

I was looking at the data and here are the details.

Basket/Basket Item creation:

A Basket and associated Basket Item objects are created as soon as a user hits the checkout page. The ideal way to reach that checkout page is by clicking the Enroll Now button on any course page. There is only one Basket per user and we use that every time a user hits the checkout process.

In the current implementation, There is only one basket and one basket item for each order but the system supports multiple basket items per basket.

Data Stats:

Environment Baskets Basket Items
Production 74,354 56,655
RC 158 68

Why are there empty baskets? (Without Basket Items):

  1. When an order is executed successfully, We delete associated basket items but the basket stays there.
  2. When a user directly opens /checkout page URL or when the product id.
  3. When a user opens the checkout/?product=<product_id> but the product id is invalid.

Proposed Solutions:

1. A celery Task

A Celery task that runs after a certain period of time and deletes the basket and basket item objects within a time range based on the threshold.

2. Management command

A management command that would take start/end time and delete all the basket and basket items objects that were created in that duration.

3. A one-time data migration

We write a one-time data migration to clean the existing basket and basket items in a specific range. This is not an ideal solution because it will fix the data but not the root cause. And it will be a one-time operation so we'll end up having baskets and basket items in the future.


Questions on cleanup criteria

  1. How do we decide that a basket has expired? e.g. A basket is empty or a basket is not empty but was created sometime in the past.
  2. If we go for a Celery task, How do we decide the time interval?
  3. Should we delete the basket object as well upon a successful checkout?

FYI, @Ferdi @rachellougee @abeglova

@rachellougee
Copy link
Contributor

Thanks for the detail, @arslanashraf7.

When an order is executed successfully, We delete associated basket items but the basket stays there.

Just curious why we don't delete basket if order is executed successfully. I know you mentioned that there are 2 other ways to create empty basket, but from this use case, it makes sense to delete both unless I am missing something.

I think celery task makes sense to me to eliminate any manual work.

@mbertrand
Copy link
Member

In terms of hubspot integration, I don't think there will be any repercussions from deleting old baskets/items. Hubspot syncs the Order model, and does not interact with the Basket or BasketItem models directly (though the Order is created from the Basket in ecommerce.api.create_unfulfilled_order).

@arslanashraf7
Copy link
Contributor

arslanashraf7 commented Nov 8, 2023

Update

Implementation details:

Based on the opinions and discussions done in a Slack Thread, we can go ahead with below-proposed implementation:

  1. Delete/Expire the Basket and Basket Items upon completion of a checkout (i.e. Order fulfilled).
  2. Create a new celery task that runs daily(MID NIGHT - UTC) to check for expired Baskets.
  3. Add a configurable timedelta to mark the life span of a basket. This is a threshold value for a basket to live. Let's do a 15 days delta as a default value but we will be able to change it anytime since it will be configurable.
  4. Conditions for basket cleanup operation
    • Delete a Basket if there is no associated Basket Item
    • Delete a Basket & associated Basket Items if the Basket has expired. (e.g. Delete a Basket that was created 15 days ago, as per default delta)

NOTE: Check for any Basket, Basket Item associations with other models while implementing this.

Out of scope:

  • No need for initial data migration. When the task runs the first time it would automatically clean the existing data. Although it would take a bit longer to complete its processing.

*EDIT: Updated ticket description with acceptance criteria accordingly.

FYI. @Ferdi @rhysyngsun @jkachel (Please share your thoughts if you have any questions on proposed implementation)

@rhysyngsun
Copy link
Collaborator

One thing we need to watch out for is deleting a basket that, while stale, the application could still could attempt to use.

As part of this, we need to audit our current code that modifies Baskets and BasketItems and most likely update it to be in line with this:

  • Whenever a Basket's items are modified, take a select_for_update() lock on the basket.
  • Once the application has applied the relevant changes to the Basket's items, update Basket.updated_at to the current time.

This will ensure that the deleting task can't delete a Basket out from under the application. This does create a new problem, though, because at any given point some number of Basket records are going to be locked and if those are included in the set of records that need to be deleted it could make the delete query really slow or just fail altogether.

To address that, we'd want to just skip the locked rows - if they're locked they're being written to and so they're no longer stale so we don't want them anyway. To do this in django you can do a delete with a subquery and utlize select_for_update(), passing the skip_locked arg so the query doesn't wait for row locks to be released:

Basket.objects.filter(
  id__in=(
    Basket.objects.select_for_update(skip_locked=True)
      .filter(updated_at__lte=cutoff_date)
  )
).delete()

This will briefly lock those records, but that's probably an ok tradeoff because this should hopefully be pretty quick. That said, definitely add some descriptive logging around the task describing when it's started, how long it took, and how many effected rows.

@pdpinch
Copy link
Member

pdpinch commented Nov 14, 2023

@jkachel can we get your input on this?

How/can we standardize this behavior across applications?

Does this have any implications for a unified ecommerce application?

@jkachel
Copy link
Contributor

jkachel commented Nov 14, 2023

I think this plan is fine. I did some looking at the BasketSerializer and it does do a select_for_update when it's updating the basket itself. (See

basket = Basket.objects.select_for_update().get(id=basket.id)
which gets called in the update method a couple times.) Given this, it should be pretty easy to have it bump the updated_at on the Basket record so it'd be easier to see if a basket is in actual use or not. (I tested this on RC - my account there has a basket from 2022 in it and adding a new item to the cart uses the same basket, just with my new item in it.)

Clearing the basket at checkout is how MITx Online works, so making this change would bring xPRO in line with that.

There's no implication for a unified ecommerce application - the app itself wouldn't maintain a basket at all, it'd just send a message to it to add whatever product to it. The ecommerce app would treat baskets as ephemeral. Once the checkout is complete, we'd clear the basket object as well. (In other words, it'd work like MITx Online does now.) We may add a setting for persistence time for baskets to support the use case of learners adding multiple things to the cart.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working product:xpro
Projects
None yet
Development

No branches or pull requests

7 participants