Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optionally return PKs? #168

Open
hottwaj opened this issue Jun 15, 2023 · 11 comments · May be fixed by #169
Open

optionally return PKs? #168

hottwaj opened this issue Jun 15, 2023 · 11 comments · May be fixed by #169

Comments

@hottwaj
Copy link

hottwaj commented Jun 15, 2023

Would it be possible to provide an option to return PKs of inserted objects?

Given that INSERT is used to put data into the destination table after COPY to the temporary table, it looks like it should be possible via a "RETURNING id" (or whatever the PK field name is)?

Thanks! :)

@geoffrey-eisenbarth
Copy link

Here's a bit of code that will do that. I had considered adding this into my PR #156 but that PR seems to have stalled.

If there's an interest in adding this functionality, I don't mind submitting a separate PR, @palewire

First, need to subclass CopyMapping:

from postgres_copy import CopyMapping

class ResultsCopyMapping(CopyMapping):
  def insert_suffix(self) -> str:
    """Add `RETURNING` clause to get newly created/updated ids."""
    suffix = super().insert_suffix()
    suffix = suffix.split(';')[0] + ' RETURNING id;'
    return suffix
    
  def post_insert(self, cursor) -> None:
    """Extend to retrieve results from `RETURNING` clause."""
    self.obj_ids = [r[0] for r in cursor.fetchall()]

Unfortunately, I don't believe it's possible to implement this "ResultsCopyMapping" the "normal way" (e.g., by putting objects = CopyManager() on your model), so it has to be called explicitly by subclassing CopyQuerySet:

from postgres_copy import CopyQuerySet

class ResultsCopyQuerySet(CopyQuerySet):
  def from_csv(self, csv_path_or_obj, mapping=None, **kwargs):
    mapping = ResultsCopyMapping(self.model, csv_path_or_obj, mapping=None, **kwargs)
    count = mapping.save(silent=True)
    objs = self.model.objects.filter(id__in=mapping.obj_ids)
    return objs, count

Then, finally, on your model you do

class MyModel(models.Model):
  [...]
  objects = ResultsCopyQuerySet.as_manager()

This code is slightly different than the implementation I have that's working for me, so let me know if you have any issues with it!

@hottwaj
Copy link
Author

hottwaj commented Jun 15, 2023

thanks! That's really helpful :)

If an API is added for this, my vote would be for that to be able to return either i) raw IDs (i.e. avoid the SELECT in your code above) or ii) full ORM instances (as in your example)

Also I think option ii) could be made more efficient by selecting the last N rows in the table, where N is the number of rows inserted, rather than doing the selection by id__in as in your code? would need the SELECT to be within same transaction and ordering to be by PK for that to work...

@geoffrey-eisenbarth
Copy link

Definitely! In my case, just grabbing the last N rows in the table wouldn't work because I'm using code that allows "upsert," so it might not always be the case that the rows imported via CSV match the most recent rows in the db.

If you just want to get the ids, just change from_csv to

  def from_csv(self, csv_path_or_obj, mapping=None, **kwargs):
    mapping = ResultsCopyMapping(self.model, csv_path_or_obj, mapping=None, **kwargs)
    count = mapping.save(silent=True)
    return mapping.obj_ids

@hottwaj
Copy link
Author

hottwaj commented Jun 15, 2023

aha thanks I did not consider how the "last N" would interact with "upsert" :)

hope some progress on that PR can made too

@palewire
Copy link
Owner

palewire commented Jun 15, 2023

What would the PR be? Adding a post_insert hook?

@geoffrey-eisenbarth
Copy link

What would the PR be? Adding a post_insert hook?

I'm not 100% on the best implementation API-wise. The necessary hooks are all already there (insert_suffix and post_insert), but I suppose we could add a kwarg to from_csv, something like returning , that would allow the user to specify count (default), ids, or objs (although I'm not sure if it's frowned upon to add kwargs that accept a limited number of expected string values).

@palewire
Copy link
Owner

Gotcha. If this is simply a common extra bit to slap into the hook, I don't think it needs to be integrated into the repo. Sharing the snippet seems like enough. I'd be happy to include the example in the docs as a cookbook style snippet.

@geoffrey-eisenbarth
Copy link

Yeah, I agree that's probably the best approach. I don't mind starting a PR on it.

What do you think about appending it to the "Extending with hooks" section?

geoffrey-eisenbarth added a commit to geoffrey-eisenbarth/django-postgres-copy that referenced this issue Jun 15, 2023
@geoffrey-eisenbarth geoffrey-eisenbarth linked a pull request Jun 15, 2023 that will close this issue
@hottwaj
Copy link
Author

hottwaj commented Jun 15, 2023

I understand if you would prefer not to maintain this in the repo (e.g. due to time constraints, not being core functionality etc).

my opinion (which you are welcome to ignore 😉 ) is that as this functionality is provided by django out of the box it would make this repo stronger by doing the same instead of providing a harder-to-find code snippet containing SQL manipulation...

as I said just my opinion - obviously just trying to change your mind - and understand if you decide against. Thanks!

@palewire
Copy link
Owner

palewire commented Jun 15, 2023

Can you point me to where Django implements it? If we can include something that 100% matches a public API in Django I would reconsider.

@hottwaj
Copy link
Author

hottwaj commented Jun 15, 2023

django's QuerySet.bulk_create does something similar to the snippet @geoffrey-eisenbarth first provided: it returns the inserted objects with their PK set to the value in the database. See https://docs.djangoproject.com/en/4.2/ref/models/querysets/#bulk-create

Actually the django docs are not very clear that the PK is set on the returned objects, but it is :)

Relevant django ticket for this is here: https://code.djangoproject.com/ticket/19527

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants