Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update cache checksums to decrease string allocations #7637

Merged
merged 1 commit into from May 24, 2024

Conversation

jacklynhma
Copy link
Contributor

@jacklynhma jacklynhma commented May 8, 2024

What was the end-user or developer problem that led to this PR?

Reduce the memory allocation to the creation of new strings with split method

What is your fix for the problem, implemented in this PR?

Rather than creating a new string, indicate the index of the first and last string.
Below are the updated results from a small sample side:

Screenshot 2024-05-08 at 2 12 02 PM Screenshot 2024-05-08 at 2 12 19 PM

Make sure the following tasks are checked

Copy link

welcome bot commented May 8, 2024

Thanks for opening a pull request and helping make RubyGems and Bundler better! Someone from the RubyGems team will take a look at your pull request shortly and leave any feedback. Please make sure that your pull request has tests for any changes or added functionality.

We use GitHub Actions to test and make sure your change works functionally and uses acceptable conventions, you can review the current progress of GitHub Actions in the PR status window below.

If you have any questions or concerns that you wish to ask, feel free to leave a comment in this PR or join our #rubygems or #bundler channel on Slack.

For more information about contributing to the RubyGems project feel free to review our CONTRIBUTING guide

checksum_start = line.index(" ", name_end + 1) + 1

name = line[0, name_end]
checksums[name] = line[checksum_start..-1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
checksums[name] = line[checksum_start..-1]
checksums[name] = line[checksum_start, line.size - checksum_start]

will avoid allocating a Range object

@jacklynhma jacklynhma marked this pull request as ready for review May 8, 2024 18:31
@deivid-rodriguez
Copy link
Member

Is the memory gain worth the new code? It seems much simpler before. What's in newmemtest.rb? Should we check the improvements on some standard bundle install runs?

@jacklynhma
Copy link
Contributor Author

Is the memory gain worth the new code? It seems much simpler before. What's in newmemtest.rb? Should we check the improvements on some standard bundle install runs?

@deivid-rodriguez I looked into the specs folder and I could try to see the results of the time bundle install before and after my code and post screenshots. Is there a performance example test that I could follow?

@jacklynhma
Copy link
Contributor Author

jacklynhma commented May 9, 2024

newmemtest.rb is

# frozen_string_literal: true

require "rubygems"
require "bundler/inline"

gemfile do
  source "https://rubygems.org/"
  gem "memory_profiler"
end

# setup code that should not be measured
versions = <<'END'
AXTyper 0.7.0,0.7.1,0.7.2,0.7.3,0.7.4,0.8.0 851cf0fe1259a64bc726d8c1325b6206
A_123 1.2.0 9e32f1fdf2e04c3e212ac1658021b901
AbsoluteRenamer 0.9.0,0.9.0.1,0.9.0.2,0.9.1,0.9.2,0.10.1,0.10.0,1.0.0,1.0.1,1.0.2,1.0.3,1.0.4,1.1.0,1.1.1,1.1.2 63fc20615022163547b539fd0063d5dd
AbsoluteRenamer-date 0.1.0 dabbd4aded502e9b93e39b77c2ac7e64
AbsoluteRenamer-system 0.1.0 58ab9b966c60ee568ad9abdc21f0053c
END

versions_array = versions.split("\n")


def new_version(versions)
    versions.each_with_object({}) do |line, checksums|
        # allows slicing into the string to not allocate a copy
        # of the line
        line.freeze
        name_end = line.index(" ")
        checksum_start = line.index(" ", name_end + 1) + 1
        checksum_end = line.size - checksum_start

        # freeze name since it is used as a hash key, pre-freezing
        # means a frozen copy isn't created
        name = line[0, name_end].freeze
        checksum = line[checksum_start, checksum_end]
        checksums[name] = checksum
    end
end


# end...


puts "new code"
MemoryProfiler.report do
    new_version(versions_array)
end.pretty_print(scale_bytes: true)

puts new_version(versions_array).inspect

@deivid-rodriguez
Copy link
Member

Ah I see, it's just the new and old piece of code with small versions file fragment 👍.

We don't have any example performance tests at the moment, I was thinking of using ruby-memory-profiler bundle install before and after (not sure if that specific command works, but you get the idea, measure the total memory used by running Bundler).

@jacklynhma
Copy link
Contributor Author

jacklynhma commented May 14, 2024

@deivid-rodriguez

Summary: The new code reduces allocations and is faster. 🎆

Thanks @segiddins for sharing your performance script! I made some modifications to have it work on my machine.

Below is the test code:
jacklynhma/rubygems@update-cache...jacklynhma:rubygems:test-code-for-update-cache

Below are screenshots of the third run:

Screenshot 2024-05-14 at 4 30 07 PM Screenshot 2024-05-14 at 4 30 23 PM

@jacklynhma
Copy link
Contributor Author

During the first run the speed was the same but the new code still reduces the allocations

Screenshot 2024-05-14 at 6 08 20 PM Screenshot 2024-05-14 at 6 09 49 PM

The second run showed the same results as the third. The new code reduces allocations and is faster. 🎆

Screenshot 2024-05-14 at 6 12 26 PM Screenshot 2024-05-14 at 6 13 03 PM

@simi
Copy link
Member

simi commented May 15, 2024

Is there any real benefit like on common sized Gemfile (you can check on https://github.com/rubygems/rubygems.org/blob/master/Gemfile)? The code seems hard to read and is more like "obfuscated". Wouldn't be possible to submit this kind of performance optimisation to upstream or at least hide in some kind of method like "optimized_string_split"?

@simi simi self-requested a review May 15, 2024 23:36
@deivid-rodriguez
Copy link
Member

deivid-rodriguez commented May 16, 2024

Thank you! Those performance and memory improvements seem nice :)

I totally agree with @simi though that it makes the code harder to read and understand. I fully agree on evaluating whether some of the benefits can be upstreamed in some way to String#split, and regardless of that, trying to at least isolate them to a utility method that keeps the main code similar to what it is today and explains why we're not using the built in String method.

Also, could we make sure the performance script before this change uses Bundler 2.5.9, or master? The presence of "Using..." output makes me think it's a quite out of date version of Bundler, so improvements already merged and released could be mistakenly getting tagged as improvements provided by this PR.

@jacklynhma
Copy link
Contributor Author

@deivid-rodriguez you were right!! I was comparing it against an older bundler. 😄 opps

With the updated gemfile (https://github.com/rubygems/rubygems.org/blob/master/Gemfile), I reran the scripts on master (4814e41ea93) and my branch and verified they were using the latest bundlers
image

The results showed that the build isn't faster, but it does reduce objects.

Below are the results:
Screenshot 2024-05-16 at 5 21 53 PM

Given these results, if the team still finds this code change helpful, I can move it to a utility folder

@deivid-rodriguez
Copy link
Member

Great, thanks!

The gain seems very marginal so I'm a bit reluctant to do this now. What are the results if you completely wipe out the ~/.bundle/cache directory first?

@martinemde
Copy link
Member

martinemde commented May 17, 2024

Hey, I think we're testing this wrong. It surprised me how hard this code path was to trigger. We don't run this on bundle install, only on bundle update. Here are my results running identical versions of bundler with only an ENV to switch between the old checksums and new.

rubygems.org(master) ✗: ./script.sh update
++ pwd
+ AFTER=true
+ ruby -I/Users/martinemde/p/github.com/rubygems/rubygems/lib:/Users/martinemde/p/github.com/rubygems/rubygems/bundler/lib /Users/martinemde/p/github.com/rubygems/rubygems.org/bundle.rb update
--- new checksums ---
Fetching gem metadata from https://rubygems.org/.........
Resolving dependencies...
Bundle updated!
1 installed gem you directly depend on is looking for funding.
  Run `bundle fund` for details
++ pwd
+ ruby -I/Users/martinemde/p/github.com/rubygems/rubygems/lib:/Users/martinemde/p/github.com/rubygems/rubygems/bundler/lib /Users/martinemde/p/github.com/rubygems/rubygems.org/bundle.rb update
--- old checksums ---
Fetching gem metadata from https://rubygems.org/.........
Resolving dependencies...
Bundle updated!
1 installed gem you directly depend on is looking for funding.
  Run `bundle fund` for details
+ fd 'memprof*'
+ xargs -n1 head -n2 ''
==> memprof.after.txt <==
Total allocated: 689.06 MB (9638226 objects)
Total retained:  237.01 MB (2979180 objects)
==> memprof.before.txt <==
Total allocated: 755.64 MB (10379242 objects)
Total retained:  236.94 MB (2977745 objects)

I added puts "--- new checksums ---" so you can see that it's actually running (which helped me find that bundle install does not run this code.

I think 66.58MB is worth a little ugly code. I also don't think this code is necessarily ugly.

Copy link
Member

@martinemde martinemde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. It was great working with you at RailsConf. Thanks for persevering with all the questions and performance testing.

@simi
Copy link
Member

simi commented May 18, 2024

I think 66.58MB is worth a little ugly code. I also don't think this code is necessarily ugly.

That's the beauty of Ruby, even "little obfuscated" looks briliant 😄. I'm just afraid of making this a little cryptic since it is just String#split in optimized way. Would be good to find a way to make it clear for code hackers as well.

@simi
Copy link
Member

simi commented May 18, 2024

@martinemde would be YJIT also related to this? Worth to test difference with enabled?

@segiddins
Copy link
Member

YJIT shows a relatively similar improvement.

no yjit:

Calculating -------------------------------------
                  PR    60.732M memsize (     0.000  retained)
                       559.265k objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
                 old   117.692M memsize (     0.000  retained)
                         1.119M objects (     0.000  retained)
                        50.000  strings (     0.000  retained)

Comparison:
                  PR:   60731524 allocated
                 old:  117692108 allocated - 1.94x more
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]
Warming up --------------------------------------
                  PR     1.000 i/100ms
                 old     1.000 i/100ms
Calculating -------------------------------------
                  PR     11.717 (± 8.5%) i/s -     58.000 in   5.014633s
                 old      7.299 (± 0.0%) i/s -     37.000 in   5.075101s

Comparison:
                  PR:       11.7 i/s
                 old:        7.3 i/s - 1.61x  slower

yjit

Calculating -------------------------------------
                  PR    60.732M memsize (     0.000  retained)
                       559.265k objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
                 old   117.692M memsize (     0.000  retained)
                         1.119M objects (     0.000  retained)
                        50.000  strings (     0.000  retained)

Comparison:
                  PR:   60731524 allocated
                 old:  117692108 allocated - 1.94x more
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) +YJIT [arm64-darwin23]
Warming up --------------------------------------
                  PR     1.000 i/100ms
                 old     1.000 i/100ms
Calculating -------------------------------------
                  PR     13.074 (± 7.6%) i/s -     65.000 in   5.020382s
                 old      8.369 (± 0.0%) i/s -     42.000 in   5.034536s

Comparison:
                  PR:       13.1 i/s
                 old:        8.4 i/s - 1.56x  slower

even better tho, especially with yjit:

ret = {}
    s = SOURCE_P.read.freeze
    idx = s.index("\n---\n")&.+(4) || -1
    len = s.size
    while idx < len
      n = idx + 1
      nl = s.index("\n", n)
      sp = s.index(" ", n)
      break if !nl && !sp
      if nl && sp > nl
        raise "space after newline: #{{n:, nl:, sp:, l: s[n..nl]}.inspect}"
      end
      unless sp
        raise "No space on line: #{{n:, nl:, sp:, l: s[n..]}.inspect}"
      end
      
      k = s[n, sp-n].freeze
      n = sp + 1
      
      sp = s.index(" ", n)
      unless sp
        raise "space after newline: #{{n:, nl:, sp:, l: s[n..sp]}.inspect}"
      end
      
      n = sp + 1
      
      v = s[n, nl-sp-1]
      n = nl + 1
      
      ret[k] = v
      
      break unless idx = nl
    end
    
    ret

@martinemde
Copy link
Member

martinemde commented May 19, 2024

I'm inclined to merge this as is, because every other version I can come up with is more messy and more extreme in its approach to optimizing. I also want to give @jacklynhma credit for her work.

That said, #7672 is my more extreme approach that eschews parsing the file entirely, opting to simply search it in memory for each gem name. It saves about 125MB compared to master and retains 35MB less than either version.

I say we merge this (as in this current PR, here, where we're conversing) and then continue the comparison in follow-ups.

@simi
Copy link
Member

simi commented May 19, 2024

@martinemde I'm all in, it is just cryptic code and there is no guard against moving it back to split. Wrapping it in private method in same class with proper documentation comment explaining this is optimized for given use-case only would be enough for me.

And there are still open questions. Is this something upstream String#split can benefit from? Are there any other places to use this in RubyGems/Bundler codebase?

@martinemde
Copy link
Member

martinemde commented May 19, 2024

Part of the reason this is so much better is that we're selecting only the first and third result of split. Upstreaming would require something complicated.

🦄 🌈 Let me run with the idea for a moment of what I'd like to see, a daydream if you will: an enumerator like object, something that represents the result of a string manipulation, but it does not immediately perform it. You operate on what you want the output to be and once you've committed the operation, it pulls exactly and only the strings you requested. Dreamcode:

File.parse(path).lines.split(" ", 3).values_at(0,2).to_h

Imagine parse creates an object that accumulates the instructions, to_h finally evaluates them into the direct creation of a hash from 0th and 2nd space delimited strings with no intermediary arrays or strings that weren't needed.

@martinemde
Copy link
Member

martinemde commented May 19, 2024

@simi, I think you're suggesting something like this:

      def checksums
        lines(versions_path).each_with_object({}) do |line, checksums|
          parse_version_checksum(line, checksums)
        end
      end

@deivid-rodriguez
Copy link
Member

deivid-rodriguez commented May 20, 2024

Thanks for the clarification on the improvements. I'm definitely in!

@martinemde Yes, I think that's basically what @simi and I are suggesting, with an explanation on the rationale of the new logic, so that it prevents some future hacker from thinking: "Mmmmmm, is this the same as line.split(" ", 3)? I'll switch to that since it's much simpler!".

@jacklynhma
Copy link
Contributor Author

@martinemde It was a pleasure working with you at RailsConf!

Are there any modifications you would like me to make to finalize this? Would you prefer that I extract this into a method, as demonstrated here? #7637 (comment)

@martinemde martinemde changed the base branch from master to martinemde/compact-index-cache-refactor May 23, 2024 23:14
@martinemde martinemde changed the base branch from martinemde/compact-index-cache-refactor to master May 23, 2024 23:16
@martinemde
Copy link
Member

martinemde commented May 23, 2024

@jacklynhma Can you make the code look like this:

      def checksums
        lines(versions_path).each_with_object({}) do |line, checksums|
          parse_version_checksum(line, checksums)
        end
      end

Then put everything that was inside the block into the method parse_version_checksum and put that at the bottom of the file?

        # This is mostly the same as `split(" ", 3)`, but splitting is much less efficient in this case.
        # This method gets called around 190,000 times when parsing the versions file.
        def parse_version_checksum(line, checksums)
          # allows slicing into the string to not allocate a copy
          # of the line
          line.freeze
          name_end = line.index(" ")
          checksum_start = line.index(" ", name_end + 1) + 1
          checksum_end = line.size - checksum_start
          # freeze name since it is used as a hash key, pre-freezing
          # means a frozen copy isn't created
          name = line[0, name_end].freeze
          checksum = line[checksum_start, checksum_end]
          checksums[name] = checksum
        end

Oh, and can you squash the commits into one so we can merge a single commit?

@martinemde martinemde enabled auto-merge May 24, 2024 19:56
@martinemde martinemde merged commit 48eb917 into rubygems:master May 24, 2024
83 checks passed
@simi
Copy link
Member

simi commented May 25, 2024

Good job everyone! I think we have found the best of all worlds in here. I really appreciate the comment on the top of the method keeping it super simple to follow. 💪 🥳

@jacklynhma
Copy link
Contributor Author

jacklynhma commented May 25, 2024

Thank you, everyone, for your feedback and guidance :)! 🥳

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants