Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The prod 'verta-backend-0' becomes failed after a few hours #1539

Open
tatianguiqu opened this issue Sep 23, 2020 · 6 comments
Open

The prod 'verta-backend-0' becomes failed after a few hours #1539

tatianguiqu opened this issue Sep 23, 2020 · 6 comments
Assignees
Labels

Comments

@tatianguiqu
Copy link

The prod works well at first, but it always breaks down after a few hours. Then it t starts printing logs indefinitely:
{"thread":"pool-3-thread-1","level":"WARN","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil checkDBConnection() got error ","thrown":{"commonElementCount":0 │ {"thread":"pool-4-thread-1","level":"DEBUG","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil getSessionFactory() retrying for DB connection after 2560 milli │ {"thread":"pool-4-thread-1","level":"WARN","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil checkDBConnection() got error ","thrown":{"commonElementCount":0 │ {"thread":"pool-3-thread-1","level":"DEBUG","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil getSessionFactory() retrying for DB connection after 2560 milli │ {"thread":"pool-4-thread-1","level":"DEBUG","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil getSessionFactory() retrying for DB connection after 2560 milli │ {"thread":"pool-3-thread-1","level":"WARN","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil checkDBConnection() got error ","thrown":{"commonElementCount":0 │ {"thread":"pool-4-thread-1","level":"WARN","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil checkDBConnection() got error ","thrown":{"commonElementCount":0 │ {"thread":"pool-4-thread-1","level":"DEBUG","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil getSessionFactory() retrying for DB connection after 2560 milli │ {"thread":"pool-3-thread-1","level":"DEBUG","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil getSessionFactory() retrying for DB connection after 2560 milli

This error seems to be caused by this call path: ai.verta.modeldb.utils.ModelDBHibernateUtil.checkDBConnectionInLoop()-> checkDBConnection()-> getDBConnection()->Class.forName(rDBDriver)
Because enableTrace is not enabled, RdbDriver is configured as "org.postgresql.Driver".
It's confusing to make mistakes there after a few hours.

@tatianguiqu
Copy link
Author

Our configuration follows this repository. Now, most of the time it's working well, but this error will occur in a few hours or in a day.

@ravishetye
Copy link

@tatianguiqu : it is trying to establish connection with postgres but can not establish one.

  1. Could you double check that the DB is working fine when this happens.
  2. Could you check postgres logs if there are any errors reported there.

@tatianguiqu
Copy link
Author

@ravishetye When I wanted to observe the log and database when bugs happened, it didn't make any mistakes for more than four days. I'm still waiting, but there was always an exception in one day before. It's confusing.

@tatianguiqu
Copy link
Author

The prod 'verta-backend-0' kept running for 17 days without becoming failed. It is so confused. But there were 500 server errors in deploying the service today. After I restarted the prod, the problem was solved. The error log of prod 'verta-backend-0' is following:
{"thread":"pool-4-thread-1","level":"WARN","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil checkDBConnection() got error ","thrown":{"commonElementCount":0,"localizedMessage":"FATAL: sorry, too many clients already","message":"FATAL: sorry, too many clients already","name":"org.postgresql.util.PSQLException","extendedStackTrace":"org.postgresql.util.PSQLException: FATAL: sorry, too many clients already\n\tat org.postgresql.core.v3.ConnectionFactoryImpl.doAuthentication(ConnectionFactoryImpl.java:520) ~[postgresql-42.2.6.jar!/:42.2.6]\n\tat org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:141) ~[postgresql-42.2.6.jar!/:42.2.6]\n\tat org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:192) ~[postgresql-42.2.6.jar!/:42.2.6]\n\tat org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49) ~[postgresql-42.2.6.jar!/:42.2.6]\n\tat org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:195) ~[postgresql-42.2.6.jar!/:42.2.6]\n\tat org.postgresql.Driver.makeConnection(Driver.java:458) ~[postgresql-42.2.6.jar!/:42.2.6]\n\tat org.postgresql.Driver.connect(Driver.java:260) ~[postgresql-42.2.6.jar!/:42.2.6]\n\tat java.sql.DriverManager.getConnection(DriverManager.java:677) ~[java.sql:?]\n\tat java.sql.DriverManager.getConnection(DriverManager.java:228) ~[java.sql:?]\n\tat ai.verta.modeldb.utils.ModelDBHibernateUtil.checkDBConnection(ModelDBHibernateUtil.java:483) [classes!/:1.0-SNAPSHOT]\n\tat ai.verta.modeldb.utils.ModelDBHibernateUtil.checkDBConnectionInLoop(ModelDBHibernateUtil.java:325) [classes!/:1.0-SNAPSHOT]\n\tat ai.verta.modeldb.utils.ModelDBHibernateUtil.loopBack(ModelDBHibernateUtil.java:294) [classes!/:1.0-SNAPSHOT]\n\tat ai.verta.modeldb.utils.ModelDBHibernateUtil.createOrGetSessionFactory(ModelDBHibernateUtil.java:271) [classes!/:1.0-SNAPSHOT]\n\tat ai.verta.modeldb.utils.ModelDBHibernateUtil.getSessionFactory(ModelDBHibernateUtil.java:277) [classes!/:1.0-SNAPSHOT]\n\tat ai.verta.modeldb.cron_jobs.ParentTimestampUpdateCron.run(ParentTimestampUpdateCron.java:25) [classes!/:1.0-SNAPSHOT]\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]\n\tat java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]\n\tat java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]\n\tat java.lang.Thread.run(Thread.java:832) [?:?]\n"},"endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","instant":{"epochSecond":1602301066,"nanoOfSecond":947000000},"threadId":40,"threadPriority":5,"hostName":"verta-backend-0","kubernetes.podIP":""} {"thread":"pool-4-thread-1","level":"DEBUG","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil getSessionFactory() retrying for DB connection after 40 millisecond ","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","instant":{"epochSecond":1602301066,"nanoOfSecond":987000000},"threadId":40,"threadPriority":5,"hostName":"verta-backend-0","kubernetes.podIP":""}
@ravishetye

@tatianguiqu
Copy link
Author

The log says 'sorry, too many clients already'. Is it caused by too many open hibernate sessions?

@TaylorHere
Copy link

I face the same case where my deploy version is 1.0.8.1
in my case, the backend restarts in days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants