New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Capture subprocess output using IO.pipe instead of tempfiles. #664
base: master
Are you sure you want to change the base?
Capture subprocess output using IO.pipe instead of tempfiles. #664
Conversation
PR looks good to me (@byroot). However, for your benchmark, instead of |
Yeah. This benchmark is meaningless as written. I can convert it to benchmark-ips tho... My real concern is whether this code works on Windows. I suspect it doesn't, but I'm not able to confirm myself. |
I found it hard to find any useful information about |
The documentation of However the method seem to have special code for win32 added in 2011. So it's really not clear. The best would be to find a windows machine. |
You can use AppVeyor to run CI builds on a windows machine. It's free for OSS. |
Seems like IO.pipe does in fact now work on Windows. Using Windows 10 with Ruby 2.3.3 from the ruby installer program. Here's what I did: I cloned wvanbergen's copy of the minitest repo, and checked out his branch in I made a file:
and ran it:
|
Further more, I used the basic example from minitest's readme, and made this:
|
@zenspider Anything I can do for you to consider merging this PR?
If not, I am also happy to close this PR. |
$: << "lib"
require "benchmark/ips"
require 'minitest/assertions'
class B
extend Minitest::Assertions
end
Benchmark.ips do |x|
x.report("io") do |max|
max.times do
B.capture_io { $stdout.puts; $stderr.puts }
end
end
x.report("subprocess_io") do |max|
max.times do
B.capture_subprocess_io { $stdout.puts; $stderr.puts }
end
end
x.report("pipe_io") do |max|
max.times do
B.capture_pipe_io { $stdout.puts; $stderr.puts }
end
end
x.compare!
end
# Comparison:
# io: 805744.8 i/s
# pipe_io: 28359.2 i/s - 28.41x slower
# subprocess_io: 3023.1 i/s - 266.53x slower |
My work setup has a windows slice I can test on. Hopefully. |
lib/minitest/assertions.rb
Outdated
|
||
return captured_stdout.read, captured_stderr.read | ||
return stdout_reader.read, stderr_reader.read |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a bug: a pipe only has a limited capacity, if the output to stdout or stderr exceeds it, this will block indefinitely because we only start reading after all the writing is done.
The solution is to start reading in a thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe specify a length to read and do it in a loop until empty? There's probably 100 ways to do this and most are bad. :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the fix I used in another project.
stdout_reader, stdout_writer = IO.pipe
stderr_reader, stderr_writer = IO.pipe
orig_stdout, orig_stderr = $stdout.dup, $stderr.dup
begin
$stdout.reopen stdout_writer
$stderr.reopen stderr_writer
stdout_reader_thread = Thread.new { stdout_reader.read }
stderr_reader_thread = Thread.new { stderr_reader.read }
yield
ensure
$stdout.reopen orig_stdout
$stderr.reopen orig_stderr
stdout_writer.close
stderr_writer.close
end
return stdout_reader_thread.join.value, stderr_reader_thread.join.value
I pushed a commit to address this issue. Happy to take another approach if you prefer!
…lock when the IO.pipe is full.
yield | ||
begin | ||
$stdout.reopen stdout_writer | ||
$stderr.reopen stderr_writer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more of a theoretical issue, but $stderr
and $stdout
are "normal" Ruby variables that the user could have overwritten to something else. It might be better to use the constants instead, STDERR
and STDOUT
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly the opposite. STDERR
and STDOUT
are constants are really only there so you can restore $stderr
and $stdout
. puts
and warn
and friends use the globals to output and so should we. If the user overwrote them to something else, it was probably for a reason and we still want to be able to test that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the user overwrote them to something else, it was probably for a reason and we still want to be able to test that
But a subprocess will not pick up on that. So if the user assigned something else to these variables in the parent process, this will have no effect on the subprocess, and we end up not capturing the output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But... You have that problem anyways. Using the constants won't help if they're not being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote this test script
require 'minitest'
class Foo
include Minitest::Assertions
def call
stdout, _ = capture_subprocess_io do
system("echo From subprocess")
$stdout.puts "From parent using $stdout"
STDOUT.puts "From parent using STDOUT"
end
end
end
output, _ = Foo.new.call
STDERR.puts "> Captured string before redirecting $stdout:"
STDERR.print output
STDERR.print
$stdout = Tempfile.new
begin
output, _ = Foo.new.call
STDERR.puts "> Captured string after redirect:"
STDERR.print output
ensure
$stdout.unlink
$stdout.close
end
When I run this, using the current implementation (which uses $stdout/$stderr):
> Captured string before redirecting $stdout:
From subprocess
From parent using $stdout
From parent using STDOUT
From subprocess # not captured, printed directly
From parent using STDOUT # not captured, printed directly
> Captured string after redirect:
From parent using $stdout
When I switch the implementation to use STDOUT and STDERR instead:
> Captured string before redirecting $stdout:
From subprocess
From parent using $stdout
From parent using STDOUT
> Captured string after redirect:
From subprocess
From parent using STDOUT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no perfect solution to this problem. I prefer the second option, because it will at least capture the IO from the subprocess, which is what the name of the method implies.
Note that this is also an issue with the current implementation using tempfiles.
…ke sure we don’t leave anything behind in case an exception occurs during yield.
This is looking increasingly problematic... Would you mind publishing it as a separate gem, at least for a while? |
I am not sure if this makes sense.
If you're uncertain about this, that's fine. It's just a performance improvement. We can close this PR. But I don't feel like maintaining a separate gem for this is going to give us much. |
This changes
Minitest::Assertions#capture_subprocess_io
to useIO.pipe
instead of temporary files to capture output. This is much faster and doesn't have a dependency on the filesystem.Benchmark results
I wrote a basic benchmark script to measure the performance:
Result for current master branch:
With this patch applied.
Still a lot slower than using
StringIO
, but an order of magnitude faster than the old implementation.cc @rafaelfranca @byroot