Adding upsert type capability #54

tgriesser · 2013-08-30T19:39:58Z

Brought up by @adamscybot in bookshelf/bookshelf#55 - this could be a nice feature to add.

ericclemmons · 2014-04-15T04:17:47Z

I agree, this would be a nice feature!

olalonde · 2014-05-15T10:12:40Z

👍

diversario · 2014-05-15T23:22:42Z

👍

coolaj86 · 2014-06-19T02:12:23Z

I'm importing some data from a CSV and there's a good chance that a few of the records overlap from the last import (i.e. last time was imported from Jan 1st to May 31st, this time importing from May 31st to Jun 18th).

Fortunately the third party system assigns reliably unique ids.

What's the best way to go about inserting the new records and updating the old?

I haven't tried it yet, but I was thinking that it would be something like this:

var ids = records.map(function (json) { return json.id })
  ;

Records.forge(ids).fetchAll().then(function () {
  records.forEach(function (record) {
    // now the existing records are loaded in the collection ?
    Object.keys(record).forEach(function (key) {
      Records.forge(record.id).set(key, record[key]);
    });
  });
  Records.invokeThen('save').then(function () {
    console.log('Records have been either inserted or updated');
  });
});

coolaj86 · 2014-06-19T17:19:30Z

Also, sometimes the thing I'm storing is stored by a determinate id value, such as a hash. In those cases I just want to add or replace the data.

I don't always use SQL as traditional SQL. Often I use it as hybrid NoSQL with the benefit of clear relationship mapping and indexes.

clark800 · 2015-03-31T00:18:37Z

👍

diegoitaliait · 2015-04-09T14:09:24Z

Hi,
there are news about this new feature?

Or can someone recommend examples, that show how to simulate this functionality for mysql?

thx

tgriesser · 2015-04-09T21:27:35Z

Right now I'm doing it with raw, but I'm working hard on getting this available in here soon.

olalonde · 2015-05-08T04:30:42Z

Postgres just implemented upsert support by the way 👍

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=168d5805e4c08bed7b95d351bf097cff7c07dd65

https://news.ycombinator.com/item?id=9509870

Syntax is INSERT ... ON CONFLICT DO UPDATE

hotaru355 · 2015-07-01T16:35:48Z

I was looking for a way to do a REPLACE INTO in MySql and found this feature request. Since REPLACE and INSERT have exactly the same syntax in MySql I would imagine it to be easier to implement than a ON DUPLICATE KEY UPDATE. Are there any plans to implement a REPLACE ? Would a PR be something of value?

bradsimantel · 2016-01-08T17:37:03Z

Any updates on this, especially with PostreSQL 9.5?

RubenCordeiro · 2016-01-24T20:30:45Z

I think one important question is whether or not to expose the same upsert method signature for different dialects, such as PostgreSQL and MySQL. In Sequelize, a issue has been raised regarding the return value of upsert: sequelize/sequelize#3354.

I realize that some of KnexJS library methods have distinctions regarding the return values in the context of different dialects (such as insert, where an array of the first inserted id is returned for Sqlite and MySQL, while an array of all the inserted id's is returned with PostgreSQL).

According to the documentation, the INSERT ... ON DUPLICATE KEY UPDATE syntax in MySQL has the following behaviour (http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html):

With ON DUPLICATE KEY UPDATE, the affected-rows value per row is 1 if the row is inserted as a new row, 2 if an existing row is updated, and 0 if an existing row is set to its current values.

While in PostgreSQL (http://www.postgresql.org/docs/9.5/static/sql-insert.html):

On successful completion, an INSERT command returns a command tag of the form

INSERT oid count

The count is the number of rows inserted or updated. If count is exactly one, and the target table has OIDs, then oid is the OID assigned to the inserted row. The single row must have been inserted rather than updated. Otherwise oid is zero.

If the INSERT command contains a RETURNING clause, the result will be similar to that of a SELECT statement containing the columns and values defined in the RETURNING list, computed over the row(s) inserted or updated by the command.

In this case, the return values can be changed with RETURNING clause.

Thoughts?

hayeah · 2016-02-14T06:56:47Z

I monkey patched Client_PG to add the "onConflict" method for insert. Suppose we want to upsert github oauth credentials, we can write the query like this:

const profile = {
    access_token: "blah blah",
    username: "foobar",
    // ... etc
  }

  const oauth = {
    uid: "13344398",
    provider: "github",
    created_at: new Date(),
    updated_at: new Date(),
    info: profile,
  };

  // todo: add a "timestamp" method

const insert = knex("oauths").insert(oauth).onConflict(["provider", "uid"],{
  info: profile,
  updated_at: new Date(),
});

console.log(insert.toString())

The array of column names specifies the uniquessness constraint.

insert into "authentications" ("created_at", "info", "provider", "uid", "updated_at") values ('2016-02-14T14:42:18.342+08:00', '{\"access_token\":\"blah blah\",\"username\":\"foobar\"}', 'github', '13344398', '2016-02-14T14:42:18.342+08:00') on conflict ("provider", "uid")  do update set "info" = '{\"access_token\":\"blah blah\",\"username\":\"foobar\"}', "updated_at" = '2016-02-14T14:42:18.343+08:00'

See gist: https://gist.github.com/hayeah/1c8d642df5cfeabc2a5b for the monkey patch.

This is a super hacky experiment... so don't exactly copy & paste the monkey patch into your production code : p

Known problems:

The monkey patch is on QueryBuilder affects all dialects, because Client_PG doesn't specialize the builder.
Doesn't support raw update like count = count + 1
onConflict should probably throw if query method is not insert.

Feedback?

RubenCordeiro · 2016-03-09T10:41:00Z

@hayeah I like your approach and it suits Postgres. I am going to try your monkey patch in a project to see if I can empirically detect any issues other than the ones you pointed out.

willfarrell · 2016-03-22T17:18:58Z

Syntax Suggestion: knex('table').upsert(['col1','col2']).insert({...}).update({...}); where upsert would take in the condition statement. This way it's not db specific.

Summary of the different implementations of upserts can be found at https://en.wikipedia.org/wiki/Merge_(SQL)

tthyer · 2016-03-23T19:25:48Z

I'm interested in having this capability too. Use case: building a system that is reliant on lots of outside data from an outside service; I periodically poll it for data that I save to a local MySQL db. Will probably be using knex.raw for now.

haywirez · 2016-03-24T11:40:30Z

Also interested, but in my use case would need to have it work in a way that isn't based on conflicts, as the columns don't always have 'unique' constraints - simply update entries matching the query if they exist, otherwise insert new rows.

hayeah · 2016-03-29T01:59:12Z

@haywirez I am curious as to why there are no unique constraints? Wouldn't you be exposed to race conditions?

haywirez · 2016-03-29T09:07:33Z

@hayeah I have a specific use case with time-windowed data, storing entries that have a value tied to a given day. Therefore I'm inserting and updating entries that have a "combined key" of a matching (day) timestamp, and two other IDs corresponding to PK's in other tables. Within a 24-hour window, I have to either insert them, or update them with the latest counts.

slavafomin · 2016-04-27T17:43:15Z

This would be a great feature to have!

rhys-vdw · 2016-05-10T03:57:24Z

Hi everyone who's ever commented here. I'm adding a PR Please label.

Happy to take a PR adding this functionality, but I'd like to see a discussion of the desired API here first.

PS.

^ Agreed.

I'm going to delete comments like this, if you want to add a +1 do so with the little emoji reaction thing.

reggi · 2016-05-14T16:41:49Z

I have a bit of an issue with the array of column restraints as in @willfarrell and @hayeah's examples. Not sure if these examples can support json properties. Is there a reason none of these proposals don't include where statements / proper "queries" to match the record?

proposal 1

knex('table')
  .where('id', '=', data.id)
  .upsert(data)

proposal 2

knex('table')
  .upsertQuery(knex => {
    return knex('table')
      .where('id', '=', data.id)
  })
  .upsertUpdate(knex => {
    return knex('table')
      .insert(data)
  })

proposal 3

knex('table')
  .where('id', '=', data.id)
  .insert(data)
  .upsert() // or .onConflictDoUpdate()

I'm leaning most toward something like 3.

Just to add in here's how mongodb does it.

db.collection.update(
   <query>,
   <update>,
   {
     upsert: <boolean>,
     multi: <boolean>,
     writeConcern: <document>
   }
)

hayeah · 2016-05-24T19:22:48Z

@reggi I believe my monkey patch is compatible with where...

catamphetamine · 2016-08-24T13:46:47Z

@reggi I don't see your point.
Can you elaborate more on which functionality is missing from the approach proposed in @willfarrell and @hayeah's examples.
Why do you need where at all?
It's just an insert operation.

catamphetamine · 2016-08-24T13:49:37Z

@reggi The MongoDB example you provided reads "First try to UPDATE WHERE ... then do an INSERT if no document matches the query" whereas SQL UPSERT reads "INSERT INTO ... UPDATING in case a row with this primary key already exists".
So, I guess, you're talking about a whole different "upsert" than it's implemented in SQL databases.

catamphetamine · 2016-08-24T13:59:32Z

I would propose this API:

knex.createTable('test')
   .bigserial('id')
   .varchar('unique').notNull().unique()
   .varchar('whatever')

knex.table('test').insert(object, { upsert: ['unique'] })

.insert() function would analyse the second parameter.
If it's a string, then it's the old returning parameter.
If it's an object, then it's an options parameter having options.returning and options.upsert, where options.upsert is a list of the unique keys (can be > 1 in case of a compound unique key constraint).
Then a SQL query is generated which simply excludes the primary key and all options.upsert keys from the object (via clone(object) && delete cloned_object.id && delete cloned_object.unique) and then uses that cloned_object stripped of the primary (and unique) keys to construct the SET clause in the second part of the SQL query: ... ON CONFLICT DO UPDATE SET [iterate cloned_object].

I guess that would be the most simple and unambiguous solution homogeneous with the present API.

timhuff · 2018-02-03T19:33:35Z

As an aside, the fact that it doesn't run immediately is often useful. Sometimes you wanna pass the thing around, configuring it, before executing. There's also situations where you can do stuff like...

const medicalBuildings = knex.select('building_id').from('buildings').where({type: 'medical'})
const medicalWorkers = knex.select().from('workers').whereIn('building', medicalBuildings)

(super contrived example but let's run with it)

I don't actually want to run that first statement - it's just part of my 2nd one.

elhigu · 2018-02-04T02:15:53Z

Not to mention that if all query builders would execute on creation the builder pattern queries would trigger before building is done. It would not work at all without having some terminator method (that executes the query).

timhuff · 2018-02-04T11:05:53Z

@elhigu I mean... I guess you could just always run it on the next tick, right? I'm not suggesting that would by any means be a good idea but how many queries are actually created and executed on different ticks?

elhigu · 2018-02-04T11:38:41Z

@timhuff I hadn't thought about that. Yeah I think that would be possible too. I find case quite common where one starts building query, then fetch some async data and keep on building more. I don't do that very often though.

elhigu · 2018-02-06T14:46:04Z

@lukewlms that ’execute()’ -like method is called ’.then()’ you can call always it when you like to execute query and get promise. It is just how ’thenable’ works and it is explained in promise spec. It is one important and widely used concept in javascript when dealing with promises and async/await (which are pretty much just glorified shortcuts for Promise.resolve and .then). Also if you are executing queries without handling results you are looking for problems like app crashing.

elhigu · 2018-02-06T19:11:22Z

Actually its better to just follow this PR about the upsert feature implementation #2197 it has already API designed how it should work. In this thread is not really any useful information that is not already mentioned in comments of that PR. If needed (PR is closed and never completed) lets open new issue for this one with additional API description.

timhuff · 2018-02-06T19:20:59Z

@elhigu Thanks for the heads up! I was unaware of that thread. Good to hear we're making progress on an upsert coming to the API. Looks like 6 months ago it failed 1 of the 802 tests and so it never passed travis-ci. Is that 1 failing test case the only thing keeping this from becoming a part of the knex API?

elhigu · 2018-02-06T19:24:54Z

@timhuff there was only initial implementation done, it must be completely rewritten. Most important part of that PR is the common API design, which can be supported by most of the dialects. So the feature comes when someone just decides to implement that API. If no-one else does that and some day I have some extra time or need it badly, I'll do it myself. That is one of the most important feature I would like knex to get (in addition to joins in updates).

timhuff · 2018-02-06T19:27:10Z

@elhigu Thanks for filling me in. I'll have to read up on the progress here when I get a little more time.

ashconnell · 2019-10-09T04:04:41Z

I'm not sure if this helps anyone or if i'm just a noob, but for the solution from @timhuff I had to wrap my constraint in quotes because i was getting a query syntax error.

const contraint = '("a", "b")'

I've come to this issue several times in search of a knex-based Postgres upsert. If anyone else needs this, here's how to do it. I've tested this against both single and composite unique keys.

The Setup

Create a unique key constraint on the table using the below. I needed a composite key constraint:
table.unique(['a', 'b'])
The Function

(edit: updated to use raw parameter bindings)
const upsert = (params)=> {
  const {table, object, constraint} = params;
  const insert = knex(table).insert(object);
  const update = knex.queryBuilder().update(object);
  return knex.raw(`? ON CONFLICT ${constraint} DO ? returning *`, [insert, update]).get('rows').get(0);
};
Usage
const objToUpsert = {a:1, b:2, c:3}

upsert({
	table: 'test',
	object: objToUpsert,
	constraint: '(a, b)',
})
If your constraint isn't composite then, naturally, that one line would just be constraint: '(a)'.

This will return either the updated object or the inserted object.

A note about composite nullable indices

If you have a composite index (a,b) and b is nullable, then values (1, NULL) and (1, NULL) are considered mutually unique by Postgres (I don't get it either). If this is your use case, you'll need make a partial unique index and then test for null before upsert to determine which constraint to use. Here's how to make the partial unique index: CREATE UNIQUE INDEX unique_index_name ON table (a) WHERE b IS NULL. If your test determines that b is null, then you'll need to use this constraint in your upsert: constraint: '(a) WHERE b IS NULL'. If a is also nullable, I would guess you'd need 3 unique indices and 4 if/else branches (though this is not my use case, so I'm not sure).

Here's the compiled javascript.

Hope someone finds this useful. @elhigu ~~Any comment on the usage of knex().update(object)?~~ (edit: nevermind - saw the warning - using knex.queryBuilder() now)

elhigu · 2019-10-09T08:53:17Z

Removed some unrelated discussions (about how promises/thenables work).

luke-robertson · 2019-12-09T03:17:12Z

Did this get added ?

elhigu · 2019-12-09T08:24:57Z

No. There is feature request and spec in #3186

bendrucker mentioned this issue Jun 9, 2014

Upsert method bookshelf/bookshelf#381

Closed

RubenCordeiro mentioned this issue Jan 18, 2016

Suggestion on PostgreSQL INSERT ON CONFLICT (Upsert) #1121

Closed

haywirez mentioned this issue Mar 24, 2016

Better way to upsert (if querying isn't done on id's)? Vincit/objection.js#101

Closed

rhys-vdw mentioned this issue May 10, 2016

Query Builder MERGE aka UPSERT #1351

Closed

elhigu closed this as completed Feb 6, 2018

edorivai mentioned this issue Apr 18, 2018

[knex] Add QueryBuilder to knex.raw bindings DefinitelyTyped/DefinitelyTyped#25100

Merged

9 tasks

Kouzukii mentioned this issue Mar 13, 2019

Add import & export by file, admin api and methods exported by start() Unleash/unleash#395

Merged

1 task

Adding upsert type capability #54

Adding upsert type capability #54

Comments

tgriesser commented Aug 30, 2013

ericclemmons commented Apr 15, 2014

olalonde commented May 15, 2014

diversario commented May 15, 2014

coolaj86 commented Jun 19, 2014

coolaj86 commented Jun 19, 2014

clark800 commented Mar 31, 2015

diegoitaliait commented Apr 9, 2015

tgriesser commented Apr 9, 2015

olalonde commented May 8, 2015

hotaru355 commented Jul 1, 2015

bradsimantel commented Jan 8, 2016

RubenCordeiro commented Jan 24, 2016

hayeah commented Feb 14, 2016

RubenCordeiro commented Mar 9, 2016

willfarrell commented Mar 22, 2016

tthyer commented Mar 23, 2016

haywirez commented Mar 24, 2016

hayeah commented Mar 29, 2016

haywirez commented Mar 29, 2016

slavafomin commented Apr 27, 2016 • edited

rhys-vdw commented May 10, 2016

reggi commented May 14, 2016 • edited

hayeah commented May 24, 2016

catamphetamine commented Aug 24, 2016 • edited

catamphetamine commented Aug 24, 2016

catamphetamine commented Aug 24, 2016 • edited

timhuff commented Feb 3, 2018

elhigu commented Feb 4, 2018 • edited

timhuff commented Feb 4, 2018

elhigu commented Feb 4, 2018

elhigu commented Feb 6, 2018

elhigu commented Feb 6, 2018

timhuff commented Feb 6, 2018

elhigu commented Feb 6, 2018 • edited

timhuff commented Feb 6, 2018

ashconnell commented Oct 9, 2019 • edited

The Setup

The Function

Usage

A note about composite nullable indices

elhigu commented Oct 9, 2019

luke-robertson commented Dec 9, 2019

elhigu commented Dec 9, 2019 • edited

slavafomin commented Apr 27, 2016 •

edited

reggi commented May 14, 2016 •

edited

catamphetamine commented Aug 24, 2016 •

edited

catamphetamine commented Aug 24, 2016 •

edited

elhigu commented Feb 4, 2018 •

edited

elhigu commented Feb 6, 2018 •

edited

ashconnell commented Oct 9, 2019 •

edited

elhigu commented Dec 9, 2019 •

edited