Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How do I use Knex with AWS Lambda? #1875

Closed
austingayler opened this issue Jan 20, 2017 · 39 comments
Closed

How do I use Knex with AWS Lambda? #1875

austingayler opened this issue Jan 20, 2017 · 39 comments

Comments

@austingayler
Copy link

I'm running into issues with connection pooling while testing some code. I'm expecting my lambda function to be called maybe several thousand times over a couple of seconds, and having trouble figuring out the best way to connect to my db. Here's a very similar issue for node/postgres: essentially the problem is that I need to be able to get a connection from the pool if one is available, however I can't depend on the pool existing because of how AWS (unreliably) reuses lambda containers.

Basically what I'm looking for is a way to reliably get or create a connection to my db. I have not been able to find any examples (like while(!availableConnections) { tryToGetConnection() }. Do I need to interact with node-pool? How can I do this with Knex?

@elhigu
Copy link
Member

elhigu commented Jan 22, 2017

Knex pooling works only if connections are made from the same node process.

If AWS lambda instances are not running shared node process you just have to create new pool with min / max connections 1 in each lambda instance and hope that your database has good enough settings to allow hundreds of simultaneous connections (in RDS it depends on instance size).

After reading this https://forums.aws.amazon.com/thread.jspa?threadID=216000

Looks like lambda really is sharing some processes, so if you can figure out which is maximum count of lambda containers, that are created you should be able to calculate optimal pool size (each container has their separate pool so total connections to DB is pool.max * max container count).

Anyways there is no need to do manual connection requesting from pool, knex waits for connection automatically and if everything is over in couple of seconds, none of the timeouts will not trigger in that time.

If you are getting error from DB which says that maximum connection count has been reached, then you need to make max pool size smaller.

@elhigu elhigu closed this as completed Jan 22, 2017
@elhigu elhigu reopened this Jan 22, 2017
@elhigu
Copy link
Member

elhigu commented Jan 22, 2017

Sorry I committed in a middle of sentence, my kid just threw like 3 liters of water to floor 🥇 I'll update the comment above in a moment...

@austingayler
Copy link
Author

Thanks for the reply. It sounds like estimating the maximum pool size will be my best bet, even though the number of connections will vary drastically over time.

To be clear, there is no way to close a connection in Knex, correct? Only the ability to destroy a connection pool? The fact that Lambda will sometimes reuse containers kind of throws everything off.

@elhigu
Copy link
Member

elhigu commented Jan 24, 2017

Destroying connection pool destroys also all the connections (gracefully waiting that they complete first) and I suppose that when lambda destroys container, all its open TCP sockets will close implicitly when process dies.

I don't see why one should try to close connections explicitly after each request since it would destroy the benefit of pooling. You would get the same effect by creating pool with size 1 and destroying it afterwards.

You can also configure idle timeout for pool which will automatically close connection if it is not used and is just waiting for action in pool.

@ghost
Copy link

ghost commented Feb 17, 2017

Can I use Knex to send a COPY query to the RedShift cluster, and not wait for the results?

Doing this with pg Pool terminates the query as soon as the end of the Lambda function is reached.

@elhigu
Copy link
Member

elhigu commented Feb 20, 2017

@BardiaAfshin If lambda container is destroyed and all its sockets are freed when lambda function end is reached, in that case also db query will die and might not be finished.

I'm not sure either how postgresql react to client-side connection ending if COPY query will be rolled back due to not finished implicit transaction before reading the result values...

Anyways if the query can be sent or not that way is not up to knex, but it depends on how aws lambda and postgresql work.

@ghost
Copy link

ghost commented Feb 20, 2017

My observation is that the query is killed on RedShift and it is rolled back.

@austingayler
Copy link
Author

Anyways there is no need to do manual connection requesting from pool, knex waits for connection automatically and if everything is over in couple of seconds, none of the timeouts will not trigger in that time.

I'm running a db load testing script and this doesn't seem to be true. Anything higher than like 30 simultaneous connections immediately times out, rather than waiting for an open connection.

@Whoaa512
Copy link

Is there a way to manually release the currently used connection once a query is done?

@kurtzilla
Copy link

kurtzilla commented Jun 3, 2017

Does anyone have example code they could share for those of us just getting into AWS Lambda? I am hoping that someone could share patterns and/or anti-patterns of knex/postgres/lambda.

Here is what I am using now - I am certain it can be improved, but hoping for some little bit of vindication as to whether or not I am on the correct path...

'use strict';
var pg = require('pg');

function initKnex(){
  return require('knex')({
      client: 'pg',
      connection: { ...details... }
  });
}

module.exports.hello = (event, context) =>
{
  var knex = initKnex();
  
  // Should I be returning knex here or in the final catch?
  knex
  .select('*')
  .from('my_table')
  .then(function (rows) {
    context.succeed('Succeeded: ' + JSON.stringify(rows || []));
  })
  .catch(function (error) {
    context.fail(error);
  })
  .then(function(){
    // is destroy overkill? - is there an option for knex.client.release, etc?
    knex.destroy();
  })
}

@austingayler
Copy link
Author

I'm in the same boat--still have not figured out what the best way is despite lots of googling around and testing. What I'm doing right now is:

const dbConfig = require('./db');
const knex = require('knex')(dbConfig);

exports.handler = function (event, context, callback) {
...
connection = {..., pool: { min: 1, max: 1 },

This way the connection (max 1 per container) will stay alive so the container can be reused easily. I don't destroy my connection at the end.

http://blog.rowanudell.com/database-connections-in-lambda/

Not sure if this is the best way but it's worked for me so far.

/shrug

@kurtzilla
Copy link

@austingayler

I am not exactly sure what happens when const knex is declared - is this where the initial connection is setup? Can someone clarify? (I am assuming you have connection info in your dbConfig)

In your code, isn't the connection itself being overwritten and re-created everytime the handler is called?

@hassankhan
Copy link

hassankhan commented Jun 20, 2017

Just chiming in a someone who's in the same boat, I've not had much luck trying to figure out containers vs pool size (as per @elhigu's suggestion). My solution has been to destroy the pool after every connection (I know it's not optimal 😒):

const knex = require('knex');

const client = knex(dbConfig);

client(tableName).select('*')
  .then((result) => { 
    
    return Promise.all([
      result,
      client.destroy(),
    ])  
  })
  .then(([ result ]) => {
    
    return result;
  });

@mooyoul
Copy link

mooyoul commented Aug 8, 2017

TL;DR: Simply set context.callbackWaitsForEmptyEventLoop = false before calling callback.

AWS Lambda waits for empty event loop (by default). so function could throw Timeout error even callback executed.

Please see below links for details:
apex/apex@1fe6e91

To @elhigu @tgriesser : This is not an knex issue. This is Definitely issue of Lambda environment. I think tag this issue to question and should be good to close :)

@elhigu
Copy link
Member

elhigu commented Aug 15, 2017

@mooyoul yep, definately not knex issue, but maybe documentation issue... though I think I dont want any aws lambda specific stuff to knex docs, so closing.

@jlancelot2007
Copy link

Please look at this link
https://stackoverflow.com/questions/49347210/why-aws-lambda-keeps-timing-out-when-using-knex-js
You need to close the db connection otherwise Lambda runs until it times out.

@jamesdixon
Copy link

Has anyone found a pattern that works flawlessly for using Knex with Lambda?

@kibertoad
Copy link
Collaborator

Do you have any other issues when you close connection properly?

@jamesdixon
Copy link

The thing I'm trying to avoid is closing the connection and making it available across Lambda calls.

@kibertoad
Copy link
Collaborator

Are you sure Lambda allows maintaining state between calls?

@kibertoad
Copy link
Collaborator

Check out the node.js part of https://scalegrid.io/blog/how-to-use-mongodb-connection-pooling-on-aws-lambda/ it suggests a solution to hanging up on non-empty event loop.

@mmarvick
Copy link

I don't like bumping the thread like this, but since I think it's getting lost in the comments, @mooyoul's answer above worked great for us.

If you don't turn that flag to false, Lambda waits for the event loop to be empty. If you do, then the function finishes execution as soon as you call the callback.

@hongbo-miao
Copy link

hongbo-miao commented May 29, 2019

For me, it worked on my local machine but not after deploying. I was kind of be mislead.

It turns out the RDS inbound source is not open to my Lambda function. Found solution at Stack Overflow: either changing RDS inbound source to 0.0.0.0/0 or use VPC.

After updating RDS inbound source, I can Lambda with Knex successfully.

The Lambda runtime I am using is Node.js 8.10 with packages:

knex: 0.17.0
pg: 7.11.0

The code below using async also just works

const Knex = require('knex');

const pg = Knex({ ... });

module.exports. submitForm = async (event) => {
  const {
    fields,
  } = event['body-json'] || {};

  return pg('surveys')
    .insert(fields)
    .then(() => {
      return {
        status: 200
      };
    })
    .catch(err => {
      return {
        status: 500
      };
    });
};

Hopefully it will help people who might meet same issue in future.

@disbelief
Copy link

Something I'd like to draw people's attention to is the serverless-mysql package, which wraps the standard mysql driver but handles a lot of the lambda-specific pain points around connection pool management that are described in this thread.

However I don't think Knex will work with serverless-mysql at this time (according to this issue) since there's no way to swap in a different driver. There could also be incompatibilities since serverless-mysql uses promises instead of callbacks.

The best approach is probably to add a new client implementation in Knex. I'd be happy to give this a shot, but would love someone more familiar with Knex to tell me if they think it's reasonable/doable?

@disbelief
Copy link

disbelief commented May 31, 2019

Also, was thinking in the meantime, I could use Knex to build but not execute MySQL queries. So just call toSQL() and pass the output to serverless-mysql to execute.

What I'm wondering though is if Knex can be configured without any db connections? There's no sense opening a connection that's never used.

Would the following work?

connection = {..., pool: { min: 0, max: 0 ) },

@elhigu
Copy link
Member

elhigu commented May 31, 2019

@disbelief You can initialize knex without connection. There is an example how to do it in the end of this secion in docs https://knexjs.org/#Installation-client

const knex = require('knex')({client: 'mysql'});

const generatedQuery = knex('table').where('id',1).toSQL().toNative();

@disbelief
Copy link

@elhigu ah cool, thanks. Will give that a shot in the interim.

@disbelief
Copy link

Update: the above doesn't appear to work. Knex throws an error if it can't establish a db connection, even if there's never any call to execute a query.

@fdecampredon
Copy link

@disbelief did you find a solution for that ?

@disbelief
Copy link

@fdecampredon nope. At the moment I'm simply building queries with squel, calling toString() on them, and passing them to the serverless-mysql client.

@Whoaa512
Copy link

Whoaa512 commented Jul 9, 2019

@disbelief It's surprising to me that building the queries failed without a db connection.

Would you mind posting the code & config you're using to build the knex client? Also the knex version.

The following works fine for me with
Node v8.11.1
mysql: 2.13.0
knex: 0.18.3

const k = require('knex')

const client = k({ client: 'mysql' })

console.log('Knex version:', require('knex/package.json').version)
// => 0.18.3
console.log('sql:', client('table').where('id',1).toSQL().toNative())
// => { sql: 'select * from `table` where `id` = ?', bindings: [ 1 ] }

@39otrebla
Copy link

39otrebla commented Sep 4, 2019

As per how Lambda's resources recycling works, Knex instance should always be kept outside any function or class. Also, it's important to have max 1 in connection in the pool.

Something like:

const Knex = require('knex');
let instance = null;

module.exports = class DatabaseManager {
  constructor({ host, user, password, database, port = 3306, client = 'mysql', pool = { min: 1, max: 1 }}) {
    this._client = client;
    this._poolOptions = pool;
    this._connectionOptions = {
      host: DB_HOST || host,
      port: DB_PORT || port,
      user: DB_USER || user,
      password: DB_PASSWORD || password,
      database: DB_NAME || database,
    };
  }

  init() {
    if (instance !== null) {
      return;
    }

    instance = Knex({
      client: this._client,
      pool: this._poolOptions,
      connection: this._connectionOptions,
      debug: process.env.DEBUG_DB == true,
      asyncStackTraces: process.env.DEBUG_DB == true,
    });
  }

  get instance() {
    return instance;
  }
}

In this way, you'll take the most out of Lambda's recycling, i.e. each activated (freezed) container will hold only the same connection.

As a side note, keep in mind that Lambda scales out by default if you don't limit the number of concurrent containers.

@jamesdixon
Copy link

I've had Knex running in Lambda for almost a year now with no problems. I'm declaring my Knex instance outside of my Lambda function and utilizing context.callbackWaitsForEmptyEventLoop = false as mentioned in other posts.

That said, over the past day, something seems to have changed on the Lambda side as I'm now seeing a huge connection spike in Postgres; connections don't seem to be closed.

Has anyone else using the aforementioned approach seen any chances over the past day or so?

@ImreC
Copy link

ImreC commented Jan 6, 2021

@jamesdixon just reading this now while refactoring some of our knex implementations on lambda. Any updates on this? Has context.callbackWaitsForEmptyEventLoop = false stopped working?

@MatissJanis
Copy link
Contributor

serverless-mysql is a great package for managing MySQL connections in the AWS Lambda context. However, it was not playing nicely together with knex, hence I built a simple dialect to solve the problem: https://github.com/MatissJanis/knex-serverless-mysql

This is another way to solve the same problem.

const Knex = require('knex');
const knexServerlessMysql = require('knex-serverless-mysql');

const mysql = require('serverless-mysql')({
  config: {
    host     : process.env.DB_HOST,
    database : process.env.DB_DATABASE,
    user     : process.env.DB_USERNAME,
    password : process.env.DB_PASSWORD,
  },
});

const knex = Knex({
  client: knexServerlessMysql,
  mysql,
});

exports.run = function () {
  return knex('table_name').where('id', 1);
}

@kibertoad
Copy link
Collaborator

@MatissJanis Would you be open to submitting a PR adding your dialect to https://github.com/knex/knex/blob/master/ECOSYSTEM.md?

@MatissJanis
Copy link
Contributor

@kibertoad sure, here you go: #4454

@awm086
Copy link

awm086 commented Oct 6, 2022

does anyone know how to run knex migration on serverless setup? In my case I am using aurora postgres but the database tables don't exsist because I am not sure how to run knex migrate. Can the script be called from the lambda handler?

@nimish-kumar
Copy link

@awm086 yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests