
interest in support for in-memory data? #25

Open
jamesturk opened this issue Jul 8, 2016 · 8 comments

Comments

@jamesturk

I wanted to check whether a patch to support something like:

CopyMapping(model=Entry, data=huge_list_of_unsaved_entries)

would be welcome. I have some code, which I was considering turning into a library, that uses Postgres COPY to replace Django's bulk_create. It takes the form of a function with the signature:

def postgres_bulk_copy(objects, ModelCls=None,
                       table_name=None,
                       fields=None, not_null=None, null=None,
                       ):

I was considering abstracting this into a generic library to use Postgres' COPY TO/FROM when I found this library & figured I'd check if you'd be open to expanding the scope for this use case.

@palewire
Owner

palewire commented Jul 9, 2016

Maybe! Just to make sure I understand, the idea is to insert data into an existing table managed by a Django model?

@jamesturk
Author

jamesturk commented Jul 9, 2016

Yep, the way I have it working is that you instantiate a bunch of objects of the model type. It'd look something like:

entries = []
for data in data_from_some_source():
    # instantiating Entry here gets us things like proper field defaults
    entries.append(Entry(**data))

# slow way
Entry.objects.bulk_create(entries)

# fast way
CopyMapping(model=Entry, data=entries).save()

Internally it'd essentially write all of the data to a temporary CSV file and then load it via the same COPY mechanism.

@palewire
Owner

palewire commented Jul 9, 2016

Let's do it!


@virusdefender

Is there any progress?

@denhartog
Contributor

I thought about this when I first came across this project a few weeks ago. If I remember correctly, the in-memory object would need read() and readline() methods, based on psycopg2's copy_expert() method (see: https://github.com/psycopg/psycopg2/blob/6da3e7ee69971cd6cb692765a4d66a5ce405f104/psycopg/cursor_type.c#L1366)
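For what it's worth, a minimal sketch of such an object (names hypothetical; it implements only the read() and readline() methods that psycopg2's copy_from()/copy_expert() call, so rows could be streamed from a generator without materializing the whole CSV in memory):

```python
class IterableToFile:
    """A minimal file-like wrapper over an iterable of CSV lines.

    psycopg2's copy_from()/copy_expert() only call read() and
    readline() on the object they are given, so implementing those
    two methods is enough to feed COPY from any iterable.
    """

    def __init__(self, lines):
        self._iter = iter(lines)
        self._buffer = ''

    def readline(self, size=-1):
        # Hand back buffered leftovers first, then pull the next line.
        if self._buffer:
            line, self._buffer = self._buffer, ''
            return line
        return next(self._iter, '')

    def read(self, size=-1):
        if size is None or size < 0:
            # Drain everything that's left.
            rest, self._buffer = self._buffer + ''.join(self._iter), ''
            return rest
        # Accumulate until we can satisfy the requested size.
        while len(self._buffer) < size:
            chunk = next(self._iter, None)
            if chunk is None:
                break
            self._buffer += chunk
        result, self._buffer = self._buffer[:size], self._buffer[size:]
        return result
```

Something like cursor.copy_expert(sql, IterableToFile(csv_lines)) could then stream rows straight from a generator rather than a temporary file.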

@jamesturk
Author

jamesturk commented Jan 12, 2018

I never got a chance to come back & contribute what I had, but here it is as a POC, in case someone else wants to pick this up.

import csv
from io import StringIO
from django.db import connection


def postgres_bulk_copy(objects, ModelCls=None,
                       table_name=None,
                       fields=None, not_null=None, null=None,
                       include_id=False,
                       ):
    _fields = []
    _not_null = []
    _null = []

    # if a model was passed in, use it to get defaults
    if ModelCls:
        for f in ModelCls._meta.get_fields():
            if (not f.auto_created and
                    not (f.is_relation and f.many_to_many) and
                    f.column):
                _fields.append(f.column)
                if f.null:
                    _null.append(f.column)
                else:
                    _not_null.append(f.column)

        # prefer passed in overrides if they exist
        table_name = table_name or ModelCls._meta.db_table
        fields = fields or _fields
        not_null = not_null or _not_null
        null = null or _null
        if include_id:
            # copy rather than append, to avoid mutating a caller's list
            fields = fields + ['id']

        objects = [
            {k: getattr(o, k) for k in fields} for o in objects
        ]

    tmp = StringIO()
    w = csv.DictWriter(tmp, fieldnames=fields)
    w.writeheader()
    w.writerows(objects)

    # rewind so copy_expert reads from the start
    tmp.seek(0)

    with connection.cursor() as cursor:
        # FORCE_NULL and FORCE_NOT_NULL are only valid inside the
        # parenthesized option list (PostgreSQL 9.4+), not appended
        # after "WITH CSV HEADER"
        options = ['FORMAT csv', 'HEADER']
        if null:
            options.append('FORCE_NULL ({})'.format(', '.join(null)))
        if not_null:
            options.append('FORCE_NOT_NULL ({})'.format(', '.join(not_null)))
        sql = 'COPY {} ({}) FROM STDIN WITH ({})'.format(
            table_name, ', '.join(fields), ', '.join(options))

        cursor.copy_expert(sql, tmp)
        # commit here so lookups immediately afterward see the new rows
        cursor.connection.commit()
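For anyone picking this up: the statement construction could also be pulled out into a standalone, unit-testable helper. A sketch (name hypothetical), using the parenthesized COPY option list from PostgreSQL 9.4+, which is the only form that accepts FORCE_NULL:

```python
def build_copy_sql(table_name, fields, null=(), not_null=()):
    """Build a COPY ... FROM STDIN statement using the option-list
    syntax, with optional FORCE_NULL / FORCE_NOT_NULL column lists."""
    options = ['FORMAT csv', 'HEADER']
    if null:
        options.append('FORCE_NULL ({})'.format(', '.join(null)))
    if not_null:
        options.append('FORCE_NOT_NULL ({})'.format(', '.join(not_null)))
    return 'COPY {} ({}) FROM STDIN WITH ({})'.format(
        table_name, ', '.join(fields), ', '.join(options))
```

For example, build_copy_sql('app_entry', ['name', 'url'], null=['url']) produces COPY app_entry (name, url) FROM STDIN WITH (FORMAT csv, HEADER, FORCE_NULL (url)).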

@palewire
Owner

Thanks for sharing your work, @jamesturk. If anybody wants to take a run at a pull request it would be greatly appreciated.

@palewire
Owner

palewire commented Feb 6, 2018

I think it's only partial progress towards the goal here, but I just merged #68, which adds support for submitting file objects in addition to file paths.
