Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Capture subprocess output using IO.pipe instead of tempfiles. #664

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

wvanbergen
Copy link

@wvanbergen wvanbergen commented Dec 9, 2016

This changes Minitest::Assertions#capture_subprocess_io to use IO.pipe instead of temporary files to capture output. This is much faster and doesn't have a dependency on the filesystem.

Benchmark results

I wrote a basic benchmark script to measure the performance:

require 'benchmark'
require_relative '../../lib/minitest/assertions'

include Minitest::Assertions

Benchmark.bmbm do |x|
  x.report("capture_subprocess_io") do
    10000.times { capture_subprocess_io { $stdout.puts; $stderr.puts } }
  end
  x.report("capture_io") do
    10000.times { capture_io { $stdout.puts; $stderr.puts } }
  end
end

Result for current master branch:

Rehearsal ---------------------------------------------------------
capture_subprocess_io   1.370000   3.020000   4.390000 (  6.547303)
capture_io              0.030000   0.010000   0.040000 (  0.027946)
------------------------------------------------ total: 4.430000sec

                            user     system      total        real
capture_subprocess_io   1.390000   3.100000   4.490000 (  6.642254)
capture_io              0.020000   0.000000   0.020000 (  0.019464)

With this patch applied.

Rehearsal ---------------------------------------------------------
capture_subprocess_io   0.260000   0.230000   0.490000 (  0.488526)
capture_io              0.020000   0.000000   0.020000 (  0.021939)
------------------------------------------------ total: 0.510000sec

                            user     system      total        real
capture_subprocess_io   0.260000   0.220000   0.480000 (  0.476628)
capture_io              0.020000   0.000000   0.020000 (  0.019058)

Still a lot slower than using StringIO, but an order of magnitude faster than the old implementation.

cc @rafaelfranca @byroot

@casperisfine
Copy link

PR looks good to me (@byroot). However, for your benchmark, instead of 10_000.times, it's best to use Benchmark.ips

@zenspider
Copy link
Collaborator

Yeah. This benchmark is meaningless as written. I can convert it to benchmark-ips tho...

My real concern is whether this code works on Windows. I suspect it doesn't, but I'm not able to confirm myself.

@zenspider zenspider self-assigned this Dec 9, 2016
@wvanbergen
Copy link
Author

I found it hard to find any useful information about IO.pipe on Windows, but this seems to indicate it does work: http://stackoverflow.com/questions/18127095/ruby-io-popen-not-working-lame-stdin-and-stdout-encoding

@byroot
Copy link

byroot commented Dec 10, 2016

The documentation of IO.pipe says " Not available on all platforms" from 2003

However the method seem to have special code for win32 added in 2011.

So it's really not clear. The best would be to find a windows machine.

@blowmage
Copy link

You can use AppVeyor to run CI builds on a windows machine. It's free for OSS.

@snarfmason
Copy link

Seems like IO.pipe does in fact now work on Windows. Using Windows 10 with Ruby 2.3.3 from the ruby installer program.

Here's what I did:

I cloned wvanbergen's copy of the minitest repo, and checked out his branch in D:\ruby\minitest-repos\wvanbergen\minitest

I made a file: D:\ruby\capture_io_example.rb with contents:

require_relative "minitest-repos/wvanbergen/minitest/lib/minitest/assertions"

include Minitest::Assertions

out, err = capture_subprocess_io do
  system "echo Some info"
  system "echo You did a bad thing 1>&2"
end

puts "***#{out}***"
puts "***#{err}***

and ran it:

D:\ruby>ruby capture_io_example.rb
***Some info
***
***You did a bad thing
***

@snarfmason
Copy link

Further more, I used the basic example from minitest's readme, and made this:
https://gist.github.com/snarfmason/a32e2c882054ce16b1b8977634435557

D:\ruby>ruby basic_mini_test.rb
Run options: --seed 8701

# Running:

.S.

Finished in 0.003289s, 912.1613 runs/s, 912.1613 assertions/s.

3 runs, 3 assertions, 0 failures, 0 errors, 1 skips

You have skipped tests. Run with --verbose for details.

@wvanbergen
Copy link
Author

@zenspider Anything I can do for you to consider merging this PR?

  • Test on windows
  • Redo benchmark using ips
  • Anything else?

If not, I am also happy to close this PR.

@zenspider
Copy link
Collaborator

$: << "lib"

require "benchmark/ips"
require 'minitest/assertions'

class B
  extend Minitest::Assertions
end

Benchmark.ips do |x|
  x.report("io") do |max|
    max.times do
      B.capture_io { $stdout.puts; $stderr.puts }
    end
  end

  x.report("subprocess_io") do |max|
    max.times do
      B.capture_subprocess_io { $stdout.puts; $stderr.puts }
    end
  end

  x.report("pipe_io") do |max|
    max.times do
      B.capture_pipe_io { $stdout.puts; $stderr.puts }
    end
  end

  x.compare!
end

# Comparison:
#                   io:   805744.8 i/s
#              pipe_io:    28359.2 i/s - 28.41x  slower
#        subprocess_io:     3023.1 i/s - 266.53x  slower

@zenspider
Copy link
Collaborator

My work setup has a windows slice I can test on. Hopefully.


return captured_stdout.read, captured_stderr.read
return stdout_reader.read, stderr_reader.read
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a bug: a pipe only has a limited capacity, if the output to stdout or stderr exceeds it, this will block indefinitely because we only start reading after all the writing is done.

The solution is to start reading in a thread.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe specify a length to read and do it in a loop until empty? There's probably 100 ways to do this and most are bad. :P

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the fix I used in another project.

stdout_reader, stdout_writer = IO.pipe
stderr_reader, stderr_writer = IO.pipe

orig_stdout, orig_stderr = $stdout.dup, $stderr.dup

begin
  $stdout.reopen stdout_writer
  $stderr.reopen stderr_writer

  stdout_reader_thread = Thread.new { stdout_reader.read }
  stderr_reader_thread = Thread.new { stderr_reader.read }

  yield
ensure
  $stdout.reopen orig_stdout
  $stderr.reopen orig_stderr

  stdout_writer.close
  stderr_writer.close
end

return stdout_reader_thread.join.value, stderr_reader_thread.join.value

I pushed a commit to address this issue. Happy to take another approach if you prefer!

yield
begin
$stdout.reopen stdout_writer
$stderr.reopen stderr_writer
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more of a theoretical issue, but $stderr and $stdout are "normal" Ruby variables that the user could have overwritten to something else. It might be better to use the constants instead, STDERR and STDOUT.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly the opposite. STDERR and STDOUT are constants are really only there so you can restore $stderr and $stdout. puts and warn and friends use the globals to output and so should we. If the user overwrote them to something else, it was probably for a reason and we still want to be able to test that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user overwrote them to something else, it was probably for a reason and we still want to be able to test that

But a subprocess will not pick up on that. So if the user assigned something else to these variables in the parent process, this will have no effect on the subprocess, and we end up not capturing the output.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But... You have that problem anyways. Using the constants won't help if they're not being used.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this test script

require 'minitest'

class Foo
  include Minitest::Assertions

  def call
    stdout, _ = capture_subprocess_io do
      system("echo From subprocess")
      $stdout.puts "From parent using $stdout"
      STDOUT.puts "From parent using STDOUT"
    end
  end
end

output, _ = Foo.new.call

STDERR.puts "> Captured string before redirecting $stdout:"
STDERR.print output

STDERR.print

$stdout = Tempfile.new
begin
  output, _ = Foo.new.call

  STDERR.puts "> Captured string after redirect:"
  STDERR.print output
ensure
  $stdout.unlink
  $stdout.close
end

When I run this, using the current implementation (which uses $stdout/$stderr):

> Captured string before redirecting $stdout:
From subprocess
From parent using $stdout
From parent using STDOUT

From subprocess            # not captured, printed directly
From parent using STDOUT   # not captured, printed directly
> Captured string after redirect:
From parent using $stdout

When I switch the implementation to use STDOUT and STDERR instead:

> Captured string before redirecting $stdout:
From subprocess
From parent using $stdout
From parent using STDOUT

> Captured string after redirect:
From subprocess
From parent using STDOUT

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no perfect solution to this problem. I prefer the second option, because it will at least capture the IO from the subprocess, which is what the name of the method implies.

Note that this is also an issue with the current implementation using tempfiles.

…ke sure we don’t leave anything behind in case an exception occurs during yield.
@zenspider
Copy link
Collaborator

This is looking increasingly problematic... Would you mind publishing it as a separate gem, at least for a while?

@wvanbergen
Copy link
Author

This is looking increasingly problematic... Would you mind publishing it as a separate gem, at least for a while?

I am not sure if this makes sense.

  • I pushed a fixed for the blocking pipe issue. That code has been in production on our large CI suite (which is why I knew about it), and gets exercised hundreds of thousands of times a day.
  • The issue with STDOUT vs. $stdout also exists in the current implementation using Tempfiles. That doesn't really have anything to do with this PR.
  • Only outstanding thing I cannot test is Windows compatibility.

If you're uncertain about this, that's fine. It's just a performance improvement. We can close this PR. But I don't feel like maintaining a separate gem for this is going to give us much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants