Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize #add_referenced #238

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

pkmiec
Copy link

@pkmiec pkmiec commented Feb 2, 2024

Summary

The #add_referenced kept track of existing object with a hash when the keys were the objects. This seemed to have been ok with Ruby 2.7, but became significantly slower in Ruby 3.2 (and possibly earlier).

Profiling showed that many of those objects are instances of Hash and Ruby uses #eql? method to compare Hash keys. It is not clear what actually changed in Ruby, but we can work around the issue by using #hash so that the key is an integer. We just need to recompute that integer if the object changes.

Performance

The benchmark was done with the following script,

require 'benchmark'
require 'combine_pdf'

puts "Ruby: #{RUBY_VERSION}"
puts "CombinePDF: #{CombinePDF::VERSION}"

files = []
68.times { |i| files << "/tmp/pdfs/#{i}.pdf" }
files = files * 10 # to exacerbate the effect

result_pdf = CombinePDF.new
files.each { |file| result_pdf << CombinePDF.parse(IO.read(file)) }
puts(Benchmark.measure do
    result_pdf.save('/tmp/combined.pdf')
end)

FYI ... we end up with 32053 pdf objects in @objects array.

Before

Ruby: 2.7.7
CombinePDF: 1.0.26
2.598427 0.011980 2.610407 ( 2.617881)

Ruby: 3.2.2
CombinePDF: 1.0.26
15.067833 0.026986 15.094819 ( 15.139298)

After

Ruby: 2.7.7
CombinePDF: 1.0.26 (with this PR)
2.768545 0.006937 2.775482 ( 2.786386)

Ruby: 3.2.2
CombinePDF: 1.0.26 (with this PR)
1.997242 0.016295 2.013537 ( 2.021782)

The #add_referenced kept track of existing object with a hash when the keys
were the objects. This seemed to have been ok with Ruby 2.7, but became
significantly slower in Ruby 3.2.

Profiling showed that many of those objects are instances of Hash and Ruby
uses #eql? method to compare Hash keys. It is not clear what acutally changed
in Ruby, but we can work around the issue by using #hash so that the key
is an integer. We just need to recompute that integer if the object changes.

Co-authored-by: Jeremy Kirchhoff <Jeremy.Kirchhoff@appfolio.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant