Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extremely slow test on ci runner #98

Closed
nulian opened this issue Jan 17, 2020 · 21 comments
Closed

Extremely slow test on ci runner #98

nulian opened this issue Jan 17, 2020 · 21 comments

Comments

@nulian
Copy link

nulian commented Jan 17, 2020

I'm trying to upgrade ecto and to myxql but for some reason on the ci runner the tests are extremely slow like 10x slower then normal.

Running test locally they perform fine with the ecto upgrade.

All the queries have around 40 ms delay like below. So originally I thought it's maybe because of the CI server not having the unix socket and it takes myxql some time to fallback. But even when setting protocol in test config for ecto on :tcp it still has the around 40 ms delay on everything.
QUERY OK source="applications" db=39.9ms
QUERY OK db=0.4ms queue=40.3ms

It's a kubernetes gitlab ci runner with mysql running a connected service.
With like the following ecto config

config :synapse, Synapse.Repo,
  username: "root",
  password: "",
  database: "synapse_test",
  hostname: "127.0.0.1",
  pool: Ecto.Adapters.SQL.Sandbox,
  charset: "utf8",
  collation: "utf8_general_ci",
  protocol: :tcp,
  prepare: :unnamed
@wojtekmach
Copy link
Member

Could you paste the exact version numbers of ecto_sql and myxql you tried?

@nulian
Copy link
Author

nulian commented Jan 17, 2020

ecto 3.3.1
ecto_sql 3.3.2
db_connection 2.2.0

@wojtekmach
Copy link
Member

what's your myxql version?

@nulian
Copy link
Author

nulian commented Jan 17, 2020

0.3.1

@deepfryed
Copy link

@nulian do they run faster if you use named prepared statements ? We're investigating a similar issue with 0.3.2 where tests are 4-5x slower with unnamed prepare.

@nulian
Copy link
Author

nulian commented Feb 24, 2020

It does seem to fix test performance removing the unnamed statement flag.
So I could do that to improve test performance. A bit weird that on my local machine I didn't have the big penalty but the test server did.
Running it again with the unnamed flag enabled to be sure nothing else changed in the month.

@nulian
Copy link
Author

nulian commented Feb 24, 2020

Retested them. The myxql without unnamed option performance I would be overall satisfied with.

Mariaex test run: 31.5s
Myxql without unnamed option: 42.7s
Myxql with unnamed option: 398.2s

@melpon
Copy link

melpon commented Jul 19, 2020

I encountered the same problem, but it was solved by setting nodelay: true.

config :synapse, Synapse.Repo,
  ...
  prepare: :unnamed,
  socket_options: [nodelay: true]

However, I didn't know why this option isn't needed in mariaex.

@lucas-nelson
Copy link

lucas-nelson commented Jul 21, 2020

Thank you @melpon! That change got our test execution time back down to where it should be.

@0xAX
Copy link
Contributor

0xAX commented Oct 8, 2021

Hello @josevalim @wojtekmach. Maybe not related to the original problem (let me know if this should go to the separate issue) but I've noticed performance slow-down after 7a5363a. That commit leads to the case when:

Repo.insert(%Model{field1: "value1", field2: nil})
Repo.insert(%Model{field1: "value1", field2: "value2"})
Repo.insert(%Model{field1: "value1", field2: nil})

Will create 3 prepared statements. After first query - first prepared statement will be created and executed - as expected. After the second query the first prepared statement will be closed and second will be created. And after the third query - second prepared statement will be closed and new one created again.

Obviously that may significantly hurt perfomance. Especially if an application often executes insert/update for the same model. Tests of my application are executed almost two times slower after update of myxql. That commit was introduced in context of insert_all and it is kind of clear why to remember only last one insert_all, but why it was done in this way for insert/update (and seems even for select + limit for example)?

I am not really familar with postgex code, but it seems it does not do such things https://github.com/elixir-ecto/postgrex/blob/master/lib/postgrex/protocol.ex#L3363

@josevalim
Copy link
Member

@0xAX the commit above should only reprepare the query if they are different. I would expect the three Repo.insert above to be the exact same statement. Can you check if that's the case or not for you?

I am not really familar with postgex code, but it seems it does not do such things https://github.com/elixir-ecto/postgrex/blob/master/lib/postgrex/protocol.ex#L3363

MySQL unfortunately has a limit on the number of prepared statements, which means we need to be more aggressive to purge them.

@0xAX
Copy link
Contributor

0xAX commented Oct 8, 2021

@josevalim Yes, I checked and I see (in wireshark) that prepared statement is closed everytime after new such insert

@0xAX
Copy link
Contributor

0xAX commented Oct 8, 2021

re-checked just for the case:

I've set pool_size of my Repo configuration to 1 to be sure that everything is related to one connection and executed:

App.Repo.insert(%MyModel{name: "test_1", field_id: nil})
App.Repo.insert(%MyModel{name: "test_2", field_id: "test"})
App.Repo.insert(%MyModel{name: "test_3", field_id: nil})

and I see following queries were sent to my db:


myxql-prepared-statements

is it expected behaviour?

MySQL unfortunately has a limit on the number of prepared statements, which means we need to be more aggressive to purge them.

Thanks for explanation. But isn't it too agressive?

@josevalim
Copy link
Member

Thanks for explanation. But isn't it too agressive?

Oh to be clear, it shouldn't be closing if the query (i.e. the SQL) is the same. So my question is if the statements that are logged are the same or if they have small differences between them.

@0xAX
Copy link
Contributor

0xAX commented Oct 8, 2021

That is exactly the problem. They are different. First and third do not contain placeholder for nil/null value, so they look like:

INSERT INTO `my_model` (`name`,`inserted_at`,`updated_at`) VALUES (?,?,?)

and the second one:

INSERT INTO `my_model` (`name`,`field_id`,`inserted_at`,`updated_at`) VALUES (?,?,?,?)

I've found a ticket that describes that nil values will be skipped in insert since ecto 2.0 and it is kind of clear why. But combination of these things makes myxql very slow for such things.

@josevalim
Copy link
Member

I see. So it is nothing to change here. The issue is that we are using the same name for different queries. If you change this line:

https://github.com/elixir-ecto/ecto_sql/blob/master/lib/ecto/adapters/myxql.ex#L259

To be:

opts = [{:cache_statement, "ecto_insert_#{source}_#{length(fields)}"} | opts]

It should introduce another degree of caching. Another option is to emit all fields and have the value be DEFAULT, this would keep the query the same.

@josevalim
Copy link
Member

Another option is to emit all fields and have the value be DEFAULT, this would keep the query the same.

Nevermind, I believe we can't pass DEFAULT as parameter. :(

@0xAX
Copy link
Contributor

0xAX commented Oct 8, 2021

@josevalim Yes, thank you for this, I saw cache_statement in ecto_sql but didn't think about touch it.

So I will prepare PR for ecto_sql if you don't have any objections.

But before I have some additional question/suggestions about cache_statement:

  • I would suggest to do the same for update and other operations in https://github.com/elixir-ecto/ecto_sql/blob/master/lib/ecto/adapters/sql.ex#L792? Something like you've suggested - cache_statement = "ecto_#{operation}_#{source}_#{length(params)}" (and for insert_all as well) to improve performance of update requests that are similar to the insert case described above. What do you think?
  • Second possible variant - what do you think about adding ability to pass own cache_statement (even for insert_all case)? Actually we can do it now but it will be ignored right now. Passing own cache_statement in some cases will provide us control on prepared statement cache. That is pretty useful in context of performance, for example for insert_all if you usually have batches of the same size no need to go though the full create-close-create cycle everytime

@josevalim
Copy link
Member

Agreed on both notes. Thanks for the convo and I am looking forward to your PR!

@nulian
Copy link
Author

nulian commented Aug 29, 2022

Weird I had that same delay issue trigger on my linux dev environment after installing updates. Which I could solve with the nodelay: true.
No clue why my dev environment suddenly needed that option because before it was fast.

@josevalim
Copy link
Member

Closing this. It is stale and it may even have been fixed in Ecto (it has been a while, so I don’t fully remember).

@josevalim josevalim closed this as not planned Won't fix, can't repro, duplicate, stale May 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants