
Possible Memory Leak in exec #191

Closed
haffla opened this issue Oct 17, 2019 · 5 comments

haffla commented Oct 17, 2019

I am currently writing a Crystal script that performs an ETL process. I read data from one table in a MySQL database, perform some transformations, and write the result into a Postgres database. The workload is quite big (9 million rows). I stream results from the source DB, collect them in an array, and after every 500 rows insert them into the target DB.

# simplified code...

BATCH_SIZE = 500
results = Array(Array(T)).new(BATCH_SIZE) { Array(T).new }
# T is some union type
index = 0

DB.open "mysql://root:root@localhost:3306/some_db" do |db|
  DB.open "postgres://postgres:postgres@localhost:5432/some_db" do |pg_db|
    db.query "select one, two from some_table" do |rs|
      the_values = .... # read values from rs using rs.read(T)
      # perform some transformations
      results[index] = the_values
      if index == (BATCH_SIZE - 1)
        # something's leaking here
        pg_db.exec "insert into some_table (one, two) values #{generate_sql_from(results)}"
        index = 0
      else
        index += 1
      end
    end
  end
end

The problem is that the above works, but memory consumption grows steadily over time, finishing at around 600 MB on my machine. Initially I thought it might be my code, or just Crystal (I am new to Crystal). I realise I am creating a lot of huge strings, but when I skip the insertion into the Postgres DB (while still generating all that SQL), the script runs with a very low and stable memory profile.

I am opening the issue in crystal-pg and not in crystal-db because I also tested the same script with two MySQL connections, transferring data from one MySQL DB into another, and there memory consumption stays stable at around 180 MB.

@asterite
Contributor

Interesting!

Could you provide reproducible code? A snippet that we can copy, paste, and run on our machines that shows the problem. That will make it much easier to debug and solve. Otherwise we have no idea where to start.

@straight-shoota
Contributor

straight-shoota commented Oct 17, 2019

Reproducible code:

require "pg"

def generate_values(i)
  String.build do |io|
    500.times do |j|
      io << ',' unless j == 0
      io << "(#{i}, #{j})"
    end
  end
end

DB.open ENV["DATABASE_URL"] do |db|
  db.exec "CREATE TABLE foo (a int, b int)"
  1000.times do |i|
    puts "Iteration #{i}"
    db.exec "INSERT INTO foo (a, b) VALUES #{generate_values(i)}"
    p! GC.stats
  end
  db.exec "DROP TABLE foo"
end

Heap seems to grow by about 140K per iteration.

@asterite
Contributor

This is a problem in crystal-db: the string passed to exec is never released. Apparently crystal-db caches query strings but never releases them. So if you keep passing different strings you'll get more and more memory consumed.

This problem is solved by using parameter placeholders, like "$1". But I still think this is an issue in crystal-db: there should be a pool of prepared statements, not an unbounded cache.
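To illustrate the workaround: passing the values as query parameters instead of interpolating them into the SQL string keeps the query text constant, so crystal-db prepares the statement once and reuses it rather than caching a new string per call. A minimal sketch against the reproduction above (same hypothetical `foo` table; assumes a reachable Postgres in `DATABASE_URL`):

```crystal
require "pg"

DB.open ENV["DATABASE_URL"] do |db|
  db.exec "CREATE TABLE foo (a int, b int)"
  1000.times do |i|
    # Placeholders instead of string interpolation: the query string
    # never changes, only the bound arguments do, so the statement
    # cache holds a single entry.
    db.exec "INSERT INTO foo (a, b) VALUES ($1, $2)", i, i
  end
  db.exec "DROP TABLE foo"
end
```

Note this trades the 500-row batches for single-row statements; a multi-row VALUES list built from placeholders would keep the query text constant too, as long as the batch size is fixed.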

Please someone open this issue in crystal-db (I don't have more time right now, sorry).

@asterite
Contributor

Actually: crystal-lang/crystal-db#114

@bcardiff
Collaborator

Answered in crystal-db; we can keep the discussion there. I agree it's something that belongs in crystal-db.
