Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sass-embedded to REUSE_AS_BINARY_ON_TRUFFLERUBY #3565

Merged
merged 1 commit into from May 14, 2024

Conversation

ntkme
Copy link
Contributor

@ntkme ntkme commented May 6, 2024

I'm the maintainer of sass-embedded gem. This gem has "native extension" that is a Dart VM binary executable, which is completely independent and not linked to libruby. On the platform "ruby", there is a Rakefile based extension that downloads the correct binary for the current platform, and it is extremely slow on TruffleRuby comparing to MRI (likely due to Rakefile extension launching a child rake process etc). Therefore, it's better to just use the pre-compiled gems that bundles the binary to speed up the installation.

Copy link

Thank you for your pull request and welcome to our community! To contribute, please sign the Oracle Contributor Agreement (OCA).
The following contributors of this PR have not signed the OCA:

To sign the OCA, please create an Oracle account and sign the OCA in Oracle's Contributor Agreement Application.

When signing the OCA, please provide your GitHub username. After signing the OCA and getting an OCA approval from Oracle, this PR will be automatically updated.

If you are an Oracle employee, please make sure that you are a member of the main Oracle GitHub organization, and your membership in this organization is public.

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Required At least one contributor does not have an approved Oracle Contributor Agreement. label May 6, 2024
@eregon
Copy link
Member

eregon commented May 7, 2024

Thank you for the PR!
Did you submit the OCA? If so I'll ping people to process it.

On the platform "ruby", there is a Rakefile based extension that downloads the correct binary for the current platform, and it is extremely slow on TruffleRuby comparing to MRI (likely due to Rakefile extension launching a child rake process etc).

Do you have a log of that or some timings?
I found https://github.com/sass-contrib/sass-embedded-host-ruby/actions/runs/8978997934/job/24660309742, 27s doesn't seem too bad for the install step (jruby takes approximately the same time), but that's on Native and on JVM it's 85s which is a lot.
In general I think it's fine to only test on Native (truffleruby) and not JVM (truffleruby+graalvm) in CI, because the latter is typically much slower in CI and there should be no difference anyway.

Locally gem install sass-embedded takes 29s (first run, also building google-protobuf) & 22s (second run) which is not that fast (for comparison json takes 5s).
With your PR, it takes 4s for gem i -V sass-embedded locally for me, a nice gain.

I wonder what takes so long, https://github.com/sass-contrib/sass-embedded-host-ruby/blob/main/ext/sass/Rakefile looks like a couple downloads which shouldn't be that slow. And a single rake subprocess from what I can see. That seems worth investigating & profiling, cc @andrykonchin but either way this PR seems good to merge.

Copy link

Thank you for signing the OCA.

@oracle-contributor-agreement oracle-contributor-agreement bot added OCA Verified All contributors have signed the Oracle Contributor Agreement. and removed OCA Required At least one contributor does not have an approved Oracle Contributor Agreement. labels May 7, 2024
@eregon
Copy link
Member

eregon commented May 7, 2024

And a single rake subprocess from what I can see. That seems worth investigating & profiling, cc @andrykonchin but either way this PR seems good to merge.

I tried cloning https://github.com/sass-contrib/sass-embedded-host-ruby and cd ext/sass and then time rake.
That takes 3s on CRuby but 20s on truffleruby-dev (Native).
git clean -Xf . between cleans the state.
I tried time ruby --cpusampler -S rake but that gave:

Zlib::BufError: buffer error
exception.c:29:in `rb_exc_raise'
exception.c:107:in `rb_ensure'
mutex.c:39:in `rb_mutex_synchronize'
/home/eregon/code/sass-embedded-host-ruby/ext/sass/Rakefile:191:in `fetch'
/home/eregon/code/sass-embedded-host-ruby/ext/sass/Rakefile:89:in `block in <top (required)>'
/home/eregon/code/sass-embedded-host-ruby/ext/sass/Rakefile:8:in `block in <top (required)>'

which looks like an issue of interrupting C ext code to get a backtrace.

On a second try it worked though, and took about 20s:

----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Thread[#1,main,5,main]
 Name                                                                               ||             Total Time    ||              Self Time    || Location             
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash#[]=                                                                           ||             2250ms   5.8% ||             2140ms   5.5% || (core)~1:0
 Marshal::StringState#consume_byte                                                  ||             1930ms   5.0% ||             1930ms   5.0% || resource:/truffleruby/core/marshal.rb~1423-1428:38993-39169
 String#<=>                                                                         ||             1630ms   4.2% ||             1630ms   4.2% || resource:/truffleruby/core/string.rb~1354-1376:39630-40399
 Array#map                                                                          ||             3050ms   7.8% ||             1380ms   3.5% || (core)~1:0
 NameTuple#initialize                                                               ||             1150ms   3.0% ||             1140ms   2.9% || ../../../../.rubies/truffleruby-dev/lib/mri/rubygems/name_tuple.rb~8-17:170-395
 Array#<<                                                                           ||              700ms   1.8% ||              700ms   1.8% || (core)~1:0
 Truffle::RegexpOperations.match_in_region                                          ||              720ms   1.8% ||              680ms   1.7% || resource:/truffleruby/core/truffle/regexp_operations.rb~76-84:2324-2787
 BasicObject#__id__                                                                 ||              630ms   1.6% ||              630ms   1.6% || (core)~1:0
 Marshal::State#construct                                                           ||            10130ms  26.0% ||              580ms   1.5% || resource:/truffleruby/core/marshal.rb~640-741:18778-21700
 Marshal::State#construct_integer                                                   ||             1460ms   3.7% ||              470ms   1.2% || resource:/truffleruby/core/marshal.rb~884-931:24658-26179
 Thread#value                                                                       ||              460ms   1.2% ||              460ms   1.2% || (core)~1:0
 Module#const_defined?                                                              ||              440ms   1.1% ||              440ms   1.1% || resource:/truffleruby/core/module.rb~111-113:4154-4262
 Integer#times                                                                      ||            10060ms  25.8% ||              420ms   1.1% || resource:/truffleruby/core/integer.rb~144-153:4421-4565
 Array#bottom_up_merge                                                              ||             1720ms   4.4% ||              400ms   1.0% || resource:/truffleruby/core/array.rb~1462-1481:33779-34262
 Truffle::Interop.execute                                                           ||              290ms   0.7% ||              290ms   0.7% || (core)~1:0
 Marshal::State#add_object                                                          ||             2910ms   7.5% ||              210ms   0.5% || resource:/truffleruby/core/marshal.rb~617-621:18348-18454
 block in SpecFetcher#tuples_for                                                    ||              170ms   0.4% ||              170ms   0.4% || ../../../../.rubies/truffleruby-dev/lib/mri/rubygems/spec_fetcher.rb~256:6404-6420
 Truffle::Type.check_funcall                                                        ||              390ms   1.0% ||              170ms   0.4% || resource:/truffleruby/core/type.rb~289-306:9734-10246
 String#byteslice                                                                   ||              160ms   0.4% ||              160ms   0.4% || resource:/truffleruby/core/string.rb~44-71:2090-3073
 Array#map!                                                                         ||              170ms   0.4% ||              130ms   0.3% || (core)~1:0
 Truffle::Splitter.add_empty                                                        ||              130ms   0.3% ||              120ms   0.3% || resource:/truffleruby/core/splitter.rb~212-214:7206-7314
 Module#const_get                                                                   ||              120ms   0.3% ||              120ms   0.3% || resource:/truffleruby/core/module.rb~115-151:4267-5369
 Enumerable::SortedElement#initialize                                               ||              120ms   0.3% ||              120ms   0.3% || resource:/truffleruby/core/enumerable.rb~482-484:12795-12868
 block in Enumerable#sort_by                                                        ||              420ms   1.1% ||              120ms   0.3% || resource:/truffleruby/core/enumerable.rb~500-502:13200-13254

Looks like Marshal is taking a bunch of time here. But I'm not sure how much we can trust that profiling output (notably it will likely not account well for time spent in native code, and guest safepoint bias).

@andrykonchin andrykonchin added the in-ci The PR is being tested in CI. Do not push new commits. label May 7, 2024
@eregon
Copy link
Member

eregon commented May 7, 2024

Here is a profile with async-profiler on JVM on 24.0.1:
profile.zip
(using ruby '--vm.agentpath:/home/eregon/Downloads/async-profiler-3.0-linux-x64/lib/libasyncProfiler.so=start,event=cpu,file=profile.html,include=*org/truffleruby*' --vm.XX:+UnlockDiagnosticVMOptions --vm.XX:+DebugNonSafepoints -S rake)
Interestingly it also took about 20s seconds. I wonder if we have some weird network timeout or so.
But from looking at the profile very quickly it looks like we spend a bunch of time in Ruby code and not much in native code, so maybe Marshal is indeed a bottleneck here, it'd be good to confirm with some manual timings.

@eregon eregon added this to the 24.1.0 Release (Sep 17, 2024) milestone May 7, 2024
@ntkme
Copy link
Contributor Author

ntkme commented May 7, 2024

I took some time and found this line is where most of the time was spent, and why Marshal shows up in CPU profiling:

    specs_and_sources, _errors = Gem::SpecFetcher.fetcher.spec_for_dependency(dependency, false)

https://github.com/sass-contrib/sass-embedded-host-ruby/blob/6cd3c23933e2c76e07683c19e7157f5cc4e3c64a/ext/sass/Rakefile#L215

The context here is that, let's say user is installing platform ruby sass-embedded 1.77.0 on aarch64-linux-gnu, the Rakefile will download the rubygems index from rubygems.org and find 1.77.0-aarch64-linux-gnu precompiled gem from the index, then download the gem and extract the pre-built extension from it - this is to avoid rely on external command like tar, unzip, etc used in local development environment, and to make it work with whatever gem sources of users' choice (if we do third party download it may be blocked by some enterprise firewall).

@eregon
Copy link
Member

eregon commented May 7, 2024

Ah BTW I have been thinking it might worth adding an option to append custom gems to that list. And that could then be used via TRUFFLERUBYOPT.
@andrykonchin Could you do that in a separate PR?

@eregon
Copy link
Member

eregon commented May 7, 2024

@ntkme Thank you for looking into it. Do you know why gem install sass-embedded together with your change doesn't occur this overhead? It still downloads sass-embedded-1.77.0-aarch64-linux-gnu.gem in the end.
So I guess it's the part about figuring out the gem for the current platform which is really slow. Surprising as I'd expect there is little data to unmarshal. But OTOH marshaled gemspecs contain the list of files which can be big.

@ntkme
Copy link
Contributor Author

ntkme commented May 7, 2024

Do you know why gem install sass-embedded together with your change doesn't occur this overhead?
Surprising as I'd expect there is little data to unmarshal.

I checked this and it's due to gem install sass-embedded fetches index differently than Gem::SpecFetcher.fetcher.spec_for_dependency:

This is how gem install sass-embedded fetches the index:

HEAD https://index.rubygems.org/
200 OK
GET https://index.rubygems.org/info/sass-embedded
200 OK
GET https://index.rubygems.org/quick/Marshal.4.8/sass-embedded-1.77.0-arm64-darwin.gemspec.rz
200 OK
GET https://index.rubygems.org/info/google-protobuf
200 OK
GET https://index.rubygems.org/info/rake
200 OK
Downloading gem sass-embedded-1.77.0-arm64-darwin.gem
GET https://index.rubygems.org/gems/sass-embedded-1.77.0-arm64-darwin.gem
Fetching sass-embedded-1.77.0-arm64-darwin.gem
200 OK

This is how Gem::SpecFetcher.fetcher.spec_for_dependency fetches the index:

GET https://rubygems.org/specs.4.8.gz
304 Not Modified

It turns out spec_for_dependency is fetching the complete index, rather than a partial index, which explains why it's way slower. Let me see if I can figure out gem install works internally.

@ntkme
Copy link
Contributor Author

ntkme commented May 7, 2024

I was able to bring it down to roughly 4s with sass-contrib/sass-embedded-host-ruby@7f10921

Still, this means installing native gem directly will take 4s, and install platform ruby will take roughly 2 x 4 = 8s.

@eregon
Copy link
Member

eregon commented May 7, 2024

Good catch and that explains it clearly, fetching all specs is quite expensive and something that IIRC shouldn't happen anymore with recent RubyGems & Bundler.
It might be a RubyGems bug that Gem::SpecFetcher.fetcher.spec_for_dependency fetches the full index (but maybe that API makes it difficult to use index.rubygems.org?).

@eregon
Copy link
Member

eregon commented May 7, 2024

In hindsight I should probably have tried --cpusampler=flamegraph, that might have shown the full index is used (but I had little time to investigate this today).

In any case, we'll merge this PR, thank you for the investigation and PR!

@ntkme
Copy link
Contributor Author

ntkme commented May 7, 2024

maybe that API makes it difficult to use index.rubygems.org?

Yes, the two APIs work differently:

  • Gem::SpecFetcher returns Gem::Specification
  • Gem::Resolver::BestSet returns Gem::Resolver::Specification

As the documentation says:

# A Resolver::Specification contains a subset of the information
# contained in a Gem::Specification.  Only the information necessary for
# dependency resolution in the resolver is included.

@graalvmbot graalvmbot merged commit 2124e4a into oracle:master May 14, 2024
14 checks passed
@ntkme ntkme deleted the sass-embedded-reuse-as-binary branch May 14, 2024 08:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in-ci The PR is being tested in CI. Do not push new commits. OCA Verified All contributors have signed the Oracle Contributor Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants