Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding upsert type capability #54

Closed
tgriesser opened this issue Aug 30, 2013 · 54 comments
Closed

Adding upsert type capability #54

tgriesser opened this issue Aug 30, 2013 · 54 comments

Comments

@tgriesser
Copy link
Member

Brought up by @adamscybot in bookshelf/bookshelf#55 - this could be a nice feature to add.

@ericclemmons
Copy link

I agree, this would be a nice feature!

@olalonde
Copy link

👍

1 similar comment
@diversario
Copy link

👍

@coolaj86
Copy link
Contributor

I'm importing some data from a CSV and there's a good chance that a few of the records overlap from the last import (i.e. last time was imported from Jan 1st to May 31st, this time importing from May 31st to Jun 18th).

Fortunately the third party system assigns reliably unique ids.

What's the best way to go about inserting the new records and updating the old?

I haven't tried it yet, but I was thinking that it would be something like this:

var ids = records.map(function (json) { return json.id })
  ;

Records.forge(ids).fetchAll().then(function () {
  records.forEach(function (record) {
    // now the existing records are loaded in the collection ?
    Object.keys(record).forEach(function (key) {
      Records.forge(record.id).set(key, record[key]);
    });
  });
  Records.invokeThen('save').then(function () {
    console.log('Records have been either inserted or updated');
  });
});

@coolaj86
Copy link
Contributor

Also, sometimes the thing I'm storing is stored by a determinate id value, such as a hash. In those cases I just want to add or replace the data.

I don't always use SQL as traditional SQL. Often I use it as hybrid NoSQL with the benefit of clear relationship mapping and indexes.

@clark800
Copy link

👍

@diegoitaliait
Copy link

Hi,
there are news about this new feature?

Or can someone recommend examples, that show how to simulate this functionality for mysql?

thx

@tgriesser
Copy link
Member Author

Right now I'm doing it with raw, but I'm working hard on getting this available in here soon.

@olalonde
Copy link

olalonde commented May 8, 2015

Postgres just implemented upsert support by the way 👍

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=168d5805e4c08bed7b95d351bf097cff7c07dd65

https://news.ycombinator.com/item?id=9509870

Syntax is INSERT ... ON CONFLICT DO UPDATE

@hotaru355
Copy link

I was looking for a way to do a REPLACE INTO in MySql and found this feature request. Since REPLACE and INSERT have exactly the same syntax in MySql I would imagine it to be easier to implement than a ON DUPLICATE KEY UPDATE. Are there any plans to implement a REPLACE ? Would a PR be something of value?

@bradsimantel
Copy link

Any updates on this, especially with PostreSQL 9.5?

@RubenCordeiro
Copy link

I think one important question is whether or not to expose the same upsert method signature for different dialects, such as PostgreSQL and MySQL. In Sequelize, a issue has been raised regarding the return value of upsert: sequelize/sequelize#3354.

I realize that some of KnexJS library methods have distinctions regarding the return values in the context of different dialects (such as insert, where an array of the first inserted id is returned for Sqlite and MySQL, while an array of all the inserted id's is returned with PostgreSQL).

According to the documentation, the INSERT ... ON DUPLICATE KEY UPDATE syntax in MySQL has the following behaviour (http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html):

With ON DUPLICATE KEY UPDATE, the affected-rows value per row is 1 if the row is inserted as a new row, 2 if an existing row is updated, and 0 if an existing row is set to its current values.

While in PostgreSQL (http://www.postgresql.org/docs/9.5/static/sql-insert.html):

On successful completion, an INSERT command returns a command tag of the form

INSERT oid count

The count is the number of rows inserted or updated. If count is exactly one, and the target table has OIDs, then oid is the OID assigned to the inserted row. The single row must have been inserted rather than updated. Otherwise oid is zero.

If the INSERT command contains a RETURNING clause, the result will be similar to that of a SELECT statement containing the columns and values defined in the RETURNING list, computed over the row(s) inserted or updated by the command.

In this case, the return values can be changed with RETURNING clause.

Thoughts?

@hayeah
Copy link

hayeah commented Feb 14, 2016

I monkey patched Client_PG to add the "onConflict" method for insert. Suppose we want to upsert github oauth credentials, we can write the query like this:

const profile = {
    access_token: "blah blah",
    username: "foobar",
    // ... etc
  }

  const oauth = {
    uid: "13344398",
    provider: "github",
    created_at: new Date(),
    updated_at: new Date(),
    info: profile,
  };

  // todo: add a "timestamp" method

const insert = knex("oauths").insert(oauth).onConflict(["provider", "uid"],{
  info: profile,
  updated_at: new Date(),
});

console.log(insert.toString())

The array of column names specifies the uniquessness constraint.

insert into "authentications" ("created_at", "info", "provider", "uid", "updated_at") values ('2016-02-14T14:42:18.342+08:00', '{\"access_token\":\"blah blah\",\"username\":\"foobar\"}', 'github', '13344398', '2016-02-14T14:42:18.342+08:00') on conflict ("provider", "uid")  do update set "info" = '{\"access_token\":\"blah blah\",\"username\":\"foobar\"}', "updated_at" = '2016-02-14T14:42:18.343+08:00'

See gist: https://gist.github.com/hayeah/1c8d642df5cfeabc2a5b for the monkey patch.

This is a super hacky experiment... so don't exactly copy & paste the monkey patch into your production code : p

Known problems:

  • The monkey patch is on QueryBuilder affects all dialects, because Client_PG doesn't specialize the builder.
  • Doesn't support raw update like count = count + 1
  • onConflict should probably throw if query method is not insert.

Feedback?

@RubenCordeiro
Copy link

@hayeah I like your approach and it suits Postgres. I am going to try your monkey patch in a project to see if I can empirically detect any issues other than the ones you pointed out.

@willfarrell
Copy link

Syntax Suggestion: knex('table').upsert(['col1','col2']).insert({...}).update({...}); where upsert would take in the condition statement. This way it's not db specific.

Summary of the different implementations of upserts can be found at https://en.wikipedia.org/wiki/Merge_(SQL)

@tthyer
Copy link

tthyer commented Mar 23, 2016

I'm interested in having this capability too. Use case: building a system that is reliant on lots of outside data from an outside service; I periodically poll it for data that I save to a local MySQL db. Will probably be using knex.raw for now.

@haywirez
Copy link

Also interested, but in my use case would need to have it work in a way that isn't based on conflicts, as the columns don't always have 'unique' constraints - simply update entries matching the query if they exist, otherwise insert new rows.

@hayeah
Copy link

hayeah commented Mar 29, 2016

@haywirez I am curious as to why there are no unique constraints? Wouldn't you be exposed to race conditions?

@haywirez
Copy link

@hayeah I have a specific use case with time-windowed data, storing entries that have a value tied to a given day. Therefore I'm inserting and updating entries that have a "combined key" of a matching (day) timestamp, and two other IDs corresponding to PK's in other tables. Within a 24-hour window, I have to either insert them, or update them with the latest counts.

@slavafomin
Copy link

slavafomin commented Apr 27, 2016

This would be a great feature to have!

@rhys-vdw
Copy link
Member

Hi everyone who's ever commented here. I'm adding a PR Please label.

Happy to take a PR adding this functionality, but I'd like to see a discussion of the desired API here first.

PS.

^ Agreed.

I'm going to delete comments like this, if you want to add a +1 do so with the little emoji reaction thing.

@reggi
Copy link

reggi commented May 14, 2016

I have a bit of an issue with the array of column restraints as in @willfarrell and @hayeah's examples. Not sure if these examples can support json properties. Is there a reason none of these proposals don't include where statements / proper "queries" to match the record?

proposal 1

knex('table')
  .where('id', '=', data.id)
  .upsert(data)

proposal 2

knex('table')
  .upsertQuery(knex => {
    return knex('table')
      .where('id', '=', data.id)
  })
  .upsertUpdate(knex => {
    return knex('table')
      .insert(data)
  })

proposal 3

knex('table')
  .where('id', '=', data.id)
  .insert(data)
  .upsert() // or .onConflictDoUpdate()

I'm leaning most toward something like 3.

Just to add in here's how mongodb does it.

db.collection.update(
   <query>,
   <update>,
   {
     upsert: <boolean>,
     multi: <boolean>,
     writeConcern: <document>
   }
)

@hayeah
Copy link

hayeah commented May 24, 2016

@reggi I believe my monkey patch is compatible with where...

@catamphetamine
Copy link

catamphetamine commented Aug 24, 2016

@reggi I don't see your point.
Can you elaborate more on which functionality is missing from the approach proposed in @willfarrell and @hayeah's examples.
Why do you need where at all?
It's just an insert operation.

@catamphetamine
Copy link

@reggi The MongoDB example you provided reads "First try to UPDATE WHERE ... then do an INSERT if no document matches the query" whereas SQL UPSERT reads "INSERT INTO ... UPDATING in case a row with this primary key already exists".
So, I guess, you're talking about a whole different "upsert" than it's implemented in SQL databases.

@catamphetamine
Copy link

catamphetamine commented Aug 24, 2016

I would propose this API:

knex.createTable('test')
   .bigserial('id')
   .varchar('unique').notNull().unique()
   .varchar('whatever')

knex.table('test').insert(object, { upsert: ['unique'] })

.insert() function would analyse the second parameter.
If it's a string, then it's the old returning parameter.
If it's an object, then it's an options parameter having options.returning and options.upsert, where options.upsert is a list of the unique keys (can be > 1 in case of a compound unique key constraint).
Then a SQL query is generated which simply excludes the primary key and all options.upsert keys from the object (via clone(object) && delete cloned_object.id && delete cloned_object.unique) and then uses that cloned_object stripped of the primary (and unique) keys to construct the SET clause in the second part of the SQL query: ... ON CONFLICT DO UPDATE SET [iterate cloned_object].

I guess that would be the most simple and unambiguous solution homogeneous with the present API.

@timhuff
Copy link
Contributor

timhuff commented Feb 3, 2018

As an aside, the fact that it doesn't run immediately is often useful. Sometimes you wanna pass the thing around, configuring it, before executing. There's also situations where you can do stuff like...

const medicalBuildings = knex.select('building_id').from('buildings').where({type: 'medical'})
const medicalWorkers = knex.select().from('workers').whereIn('building', medicalBuildings)

(super contrived example but let's run with it)

I don't actually want to run that first statement - it's just part of my 2nd one.

@elhigu
Copy link
Member

elhigu commented Feb 4, 2018

Not to mention that if all query builders would execute on creation the builder pattern queries would trigger before building is done. It would not work at all without having some terminator method (that executes the query).

@timhuff
Copy link
Contributor

timhuff commented Feb 4, 2018

@elhigu I mean... I guess you could just always run it on the next tick, right? I'm not suggesting that would by any means be a good idea but how many queries are actually created and executed on different ticks?

@elhigu
Copy link
Member

elhigu commented Feb 4, 2018

@timhuff I hadn't thought about that. Yeah I think that would be possible too. I find case quite common where one starts building query, then fetch some async data and keep on building more. I don't do that very often though.

@elhigu
Copy link
Member

elhigu commented Feb 6, 2018

@lukewlms that ’execute()’ -like method is called ’.then()’ you can call always it when you like to execute query and get promise. It is just how ’thenable’ works and it is explained in promise spec. It is one important and widely used concept in javascript when dealing with promises and async/await (which are pretty much just glorified shortcuts for Promise.resolve and .then). Also if you are executing queries without handling results you are looking for problems like app crashing.

@elhigu
Copy link
Member

elhigu commented Feb 6, 2018

Actually its better to just follow this PR about the upsert feature implementation #2197 it has already API designed how it should work. In this thread is not really any useful information that is not already mentioned in comments of that PR. If needed (PR is closed and never completed) lets open new issue for this one with additional API description.

@elhigu elhigu closed this as completed Feb 6, 2018
@timhuff
Copy link
Contributor

timhuff commented Feb 6, 2018

@elhigu Thanks for the heads up! I was unaware of that thread. Good to hear we're making progress on an upsert coming to the API. Looks like 6 months ago it failed 1 of the 802 tests and so it never passed travis-ci. Is that 1 failing test case the only thing keeping this from becoming a part of the knex API?

@elhigu
Copy link
Member

elhigu commented Feb 6, 2018

@timhuff there was only initial implementation done, it must be completely rewritten. Most important part of that PR is the common API design, which can be supported by most of the dialects. So the feature comes when someone just decides to implement that API. If no-one else does that and some day I have some extra time or need it badly, I'll do it myself. That is one of the most important feature I would like knex to get (in addition to joins in updates).

@timhuff
Copy link
Contributor

timhuff commented Feb 6, 2018

@elhigu Thanks for filling me in. I'll have to read up on the progress here when I get a little more time.

@ashconnell
Copy link

ashconnell commented Oct 9, 2019

I'm not sure if this helps anyone or if i'm just a noob, but for the solution from @timhuff I had to wrap my constraint in quotes because i was getting a query syntax error.

const contraint = '("a", "b")'

I've come to this issue several times in search of a knex-based Postgres upsert. If anyone else needs this, here's how to do it. I've tested this against both single and composite unique keys.

The Setup

Create a unique key constraint on the table using the below. I needed a composite key constraint:

table.unique(['a', 'b'])

The Function

(edit: updated to use raw parameter bindings)

const upsert = (params)=> {
  const {table, object, constraint} = params;
  const insert = knex(table).insert(object);
  const update = knex.queryBuilder().update(object);
  return knex.raw(`? ON CONFLICT ${constraint} DO ? returning *`, [insert, update]).get('rows').get(0);
};

Usage

const objToUpsert = {a:1, b:2, c:3}

upsert({
	table: 'test',
	object: objToUpsert,
	constraint: '(a, b)',
})

If your constraint isn't composite then, naturally, that one line would just be constraint: '(a)'.

This will return either the updated object or the inserted object.

A note about composite nullable indices

If you have a composite index (a,b) and b is nullable, then values (1, NULL) and (1, NULL) are considered mutually unique by Postgres (I don't get it either). If this is your use case, you'll need make a partial unique index and then test for null before upsert to determine which constraint to use. Here's how to make the partial unique index: CREATE UNIQUE INDEX unique_index_name ON table (a) WHERE b IS NULL. If your test determines that b is null, then you'll need to use this constraint in your upsert: constraint: '(a) WHERE b IS NULL'. If a is also nullable, I would guess you'd need 3 unique indices and 4 if/else branches (though this is not my use case, so I'm not sure).

Here's the compiled javascript.

Hope someone finds this useful. @elhigu Any comment on the usage of knex().update(object)? (edit: nevermind - saw the warning - using knex.queryBuilder() now)

@knex knex deleted a comment from lukewlms Oct 9, 2019
@knex knex deleted a comment from timhuff Oct 9, 2019
@knex knex deleted a comment from timhuff Oct 9, 2019
@knex knex deleted a comment from timhuff Oct 9, 2019
@knex knex deleted a comment from lukewlms Oct 9, 2019
@knex knex deleted a comment from lukewlms Oct 9, 2019
@knex knex deleted a comment from timhuff Oct 9, 2019
@knex knex deleted a comment from BadgerBadgerBadgerBadger Oct 9, 2019
@knex knex deleted a comment from alexfedin Oct 9, 2019
@knex knex deleted a comment from amir-s Oct 9, 2019
@knex knex deleted a comment from absolux Oct 9, 2019
@knex knex deleted a comment from amir-s Oct 9, 2019
@knex knex deleted a comment from lukewlms Oct 9, 2019
@knex knex deleted a comment from timhuff Oct 9, 2019
@elhigu
Copy link
Member

elhigu commented Oct 9, 2019

Removed some unrelated discussions (about how promises/thenables work).

@luke-robertson
Copy link

Did this get added ?

@elhigu
Copy link
Member

elhigu commented Dec 9, 2019

No. There is feature request and spec in #3186

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests