Capture subprocess output using IO.pipe instead of tempfiles. #664

wvanbergen · 2016-12-09T11:25:24Z

This changes Minitest::Assertions#capture_subprocess_io to use IO.pipe instead of temporary files to capture output. This is much faster and doesn't have a dependency on the filesystem.

Benchmark results

I wrote a basic benchmark script to measure the performance:

require 'benchmark'
require_relative '../../lib/minitest/assertions'

include Minitest::Assertions

Benchmark.bmbm do |x|
  x.report("capture_subprocess_io") do
    10000.times { capture_subprocess_io { $stdout.puts; $stderr.puts } }
  end
  x.report("capture_io") do
    10000.times { capture_io { $stdout.puts; $stderr.puts } }
  end
end

Result for current master branch:

Rehearsal ---------------------------------------------------------
capture_subprocess_io   1.370000   3.020000   4.390000 (  6.547303)
capture_io              0.030000   0.010000   0.040000 (  0.027946)
------------------------------------------------ total: 4.430000sec

                            user     system      total        real
capture_subprocess_io   1.390000   3.100000   4.490000 (  6.642254)
capture_io              0.020000   0.000000   0.020000 (  0.019464)

With this patch applied.

Rehearsal ---------------------------------------------------------
capture_subprocess_io   0.260000   0.230000   0.490000 (  0.488526)
capture_io              0.020000   0.000000   0.020000 (  0.021939)
------------------------------------------------ total: 0.510000sec

                            user     system      total        real
capture_subprocess_io   0.260000   0.220000   0.480000 (  0.476628)
capture_io              0.020000   0.000000   0.020000 (  0.019058)

Still a lot slower than using StringIO, but an order of magnitude faster than the old implementation.

cc @rafaelfranca @byroot

casperisfine · 2016-12-09T11:39:11Z

PR looks good to me (@byroot). However, for your benchmark, instead of 10_000.times, it's best to use Benchmark.ips

zenspider · 2016-12-09T21:43:35Z

Yeah. This benchmark is meaningless as written. I can convert it to benchmark-ips tho...

My real concern is whether this code works on Windows. I suspect it doesn't, but I'm not able to confirm myself.

wvanbergen · 2016-12-09T21:53:24Z

I found it hard to find any useful information about IO.pipe on Windows, but this seems to indicate it does work: http://stackoverflow.com/questions/18127095/ruby-io-popen-not-working-lame-stdin-and-stdout-encoding

byroot · 2016-12-10T12:31:39Z

The documentation of IO.pipe says " Not available on all platforms" from 2003

However the method seem to have special code for win32 added in 2011.

So it's really not clear. The best would be to find a windows machine.

blowmage · 2016-12-10T15:19:51Z

You can use AppVeyor to run CI builds on a windows machine. It's free for OSS.

snarfmason · 2016-12-10T19:46:34Z

Seems like IO.pipe does in fact now work on Windows. Using Windows 10 with Ruby 2.3.3 from the ruby installer program.

Here's what I did:

I cloned wvanbergen's copy of the minitest repo, and checked out his branch in D:\ruby\minitest-repos\wvanbergen\minitest

I made a file: D:\ruby\capture_io_example.rb with contents:

require_relative "minitest-repos/wvanbergen/minitest/lib/minitest/assertions"

include Minitest::Assertions

out, err = capture_subprocess_io do
  system "echo Some info"
  system "echo You did a bad thing 1>&2"
end

puts "***#{out}***"
puts "***#{err}***

and ran it:

D:\ruby>ruby capture_io_example.rb
***Some info
***
***You did a bad thing
***

snarfmason · 2016-12-10T19:49:23Z

Further more, I used the basic example from minitest's readme, and made this:
https://gist.github.com/snarfmason/a32e2c882054ce16b1b8977634435557

D:\ruby>ruby basic_mini_test.rb
Run options: --seed 8701

# Running:

.S.

Finished in 0.003289s, 912.1613 runs/s, 912.1613 assertions/s.

3 runs, 3 assertions, 0 failures, 0 errors, 1 skips

You have skipped tests. Run with --verbose for details.

wvanbergen · 2017-10-21T11:01:54Z

@zenspider Anything I can do for you to consider merging this PR?

Test on windows
Redo benchmark using ips
Anything else?

If not, I am also happy to close this PR.

zenspider · 2019-06-11T09:04:57Z

$: << "lib"

require "benchmark/ips"
require 'minitest/assertions'

class B
  extend Minitest::Assertions
end

Benchmark.ips do |x|
  x.report("io") do |max|
    max.times do
      B.capture_io { $stdout.puts; $stderr.puts }
    end
  end

  x.report("subprocess_io") do |max|
    max.times do
      B.capture_subprocess_io { $stdout.puts; $stderr.puts }
    end
  end

  x.report("pipe_io") do |max|
    max.times do
      B.capture_pipe_io { $stdout.puts; $stderr.puts }
    end
  end

  x.compare!
end

# Comparison:
#                   io:   805744.8 i/s
#              pipe_io:    28359.2 i/s - 28.41x  slower
#        subprocess_io:     3023.1 i/s - 266.53x  slower

zenspider · 2019-06-11T09:05:37Z

My work setup has a windows slice I can test on. Hopefully.

wvanbergen · 2019-06-11T09:09:13Z

lib/minitest/assertions.rb


-          return captured_stdout.read, captured_stderr.read
+          return stdout_reader.read, stderr_reader.read


There is a bug: a pipe only has a limited capacity, if the output to stdout or stderr exceeds it, this will block indefinitely because we only start reading after all the writing is done.

The solution is to start reading in a thread.

Maybe specify a length to read and do it in a loop until empty? There's probably 100 ways to do this and most are bad. :P

This is the fix I used in another project.

stdout_reader, stdout_writer = IO.pipe stderr_reader, stderr_writer = IO.pipe orig_stdout, orig_stderr = $stdout.dup, $stderr.dup begin $stdout.reopen stdout_writer $stderr.reopen stderr_writer stdout_reader_thread = Thread.new { stdout_reader.read } stderr_reader_thread = Thread.new { stderr_reader.read } yield ensure $stdout.reopen orig_stdout $stderr.reopen orig_stderr stdout_writer.close stderr_writer.close end return stdout_reader_thread.join.value, stderr_reader_thread.join.value

I pushed a commit to address this issue. Happy to take another approach if you prefer!

…lock when the IO.pipe is full.

wvanbergen · 2019-06-11T09:20:03Z

lib/minitest/assertions.rb

-          yield
+          begin
+            $stdout.reopen stdout_writer
+            $stderr.reopen stderr_writer


This is more of a theoretical issue, but $stderr and $stdout are "normal" Ruby variables that the user could have overwritten to something else. It might be better to use the constants instead, STDERR and STDOUT.

Exactly the opposite. STDERR and STDOUT are constants are really only there so you can restore $stderr and $stdout. puts and warn and friends use the globals to output and so should we. If the user overwrote them to something else, it was probably for a reason and we still want to be able to test that.

If the user overwrote them to something else, it was probably for a reason and we still want to be able to test that

But a subprocess will not pick up on that. So if the user assigned something else to these variables in the parent process, this will have no effect on the subprocess, and we end up not capturing the output.

But... You have that problem anyways. Using the constants won't help if they're not being used.

I wrote this test script

require 'minitest' class Foo include Minitest::Assertions def call stdout, _ = capture_subprocess_io do system("echo From subprocess") $stdout.puts "From parent using $stdout" STDOUT.puts "From parent using STDOUT" end end end output, _ = Foo.new.call STDERR.puts "> Captured string before redirecting $stdout:" STDERR.print output STDERR.print $stdout = Tempfile.new begin output, _ = Foo.new.call STDERR.puts "> Captured string after redirect:" STDERR.print output ensure $stdout.unlink $stdout.close end

When I run this, using the current implementation (which uses $stdout/$stderr):

> Captured string before redirecting $stdout: From subprocess From parent using $stdout From parent using STDOUT From subprocess # not captured, printed directly From parent using STDOUT # not captured, printed directly > Captured string after redirect: From parent using $stdout

When I switch the implementation to use STDOUT and STDERR instead:

> Captured string before redirecting $stdout: From subprocess From parent using $stdout From parent using STDOUT > Captured string after redirect: From subprocess From parent using STDOUT

There is no perfect solution to this problem. I prefer the second option, because it will at least capture the IO from the subprocess, which is what the name of the method implies.

Note that this is also an issue with the current implementation using tempfiles.

…ke sure we don’t leave anything behind in case an exception occurs during yield.

zenspider · 2019-06-11T10:51:01Z

This is looking increasingly problematic... Would you mind publishing it as a separate gem, at least for a while?

wvanbergen · 2019-06-11T11:10:12Z

This is looking increasingly problematic... Would you mind publishing it as a separate gem, at least for a while?

I am not sure if this makes sense.

I pushed a fixed for the blocking pipe issue. That code has been in production on our large CI suite (which is why I knew about it), and gets exercised hundreds of thousands of times a day.
The issue with STDOUT vs. $stdout also exists in the current implementation using Tempfiles. That doesn't really have anything to do with this PR.
Only outstanding thing I cannot test is Windows compatibility.

If you're uncertain about this, that's fine. It's just a performance improvement. We can close this PR. But I don't feel like maintaining a separate gem for this is going to give us much.

Capture subprocess output using IO.pipe instead of tempfiles.

dcfda74

zenspider self-assigned this Dec 9, 2016

wvanbergen commented Jun 11, 2019

View reviewed changes

Read redirected stderr and stdout in a thread, so the process won’t b…

057bc0c

…lock when the IO.pipe is full.

wvanbergen commented Jun 11, 2019

View reviewed changes

Move the closing if the IO pipes writers into the ensure block, to ma…

a5b5050

…ke sure we don’t leave anything behind in case an exception occurs during yield.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Capture subprocess output using IO.pipe instead of tempfiles. #664

Capture subprocess output using IO.pipe instead of tempfiles. #664

wvanbergen commented Dec 9, 2016 •

edited

casperisfine commented Dec 9, 2016

zenspider commented Dec 9, 2016

wvanbergen commented Dec 9, 2016

byroot commented Dec 10, 2016

blowmage commented Dec 10, 2016

snarfmason commented Dec 10, 2016

snarfmason commented Dec 10, 2016

wvanbergen commented Oct 21, 2017

zenspider commented Jun 11, 2019

zenspider commented Jun 11, 2019

wvanbergen Jun 11, 2019

zenspider Jun 11, 2019

wvanbergen Jun 11, 2019 •

edited

wvanbergen Jun 11, 2019

zenspider Jun 11, 2019

wvanbergen Jun 11, 2019 •

edited

zenspider Jun 11, 2019

wvanbergen Jun 11, 2019 •

edited

wvanbergen Jun 11, 2019

zenspider commented Jun 11, 2019

wvanbergen commented Jun 11, 2019


		return captured_stdout.read, captured_stderr.read
		return stdout_reader.read, stderr_reader.read

Capture subprocess output using IO.pipe instead of tempfiles. #664

Are you sure you want to change the base?

Capture subprocess output using IO.pipe instead of tempfiles. #664

Conversation

wvanbergen commented Dec 9, 2016 • edited

Benchmark results

casperisfine commented Dec 9, 2016

zenspider commented Dec 9, 2016

wvanbergen commented Dec 9, 2016

byroot commented Dec 10, 2016

blowmage commented Dec 10, 2016

snarfmason commented Dec 10, 2016

snarfmason commented Dec 10, 2016

wvanbergen commented Oct 21, 2017

zenspider commented Jun 11, 2019

zenspider commented Jun 11, 2019

wvanbergen Jun 11, 2019

Choose a reason for hiding this comment

zenspider Jun 11, 2019

Choose a reason for hiding this comment

wvanbergen Jun 11, 2019 • edited

Choose a reason for hiding this comment

wvanbergen Jun 11, 2019

Choose a reason for hiding this comment

zenspider Jun 11, 2019

Choose a reason for hiding this comment

wvanbergen Jun 11, 2019 • edited

Choose a reason for hiding this comment

zenspider Jun 11, 2019

Choose a reason for hiding this comment

wvanbergen Jun 11, 2019 • edited

Choose a reason for hiding this comment

wvanbergen Jun 11, 2019

Choose a reason for hiding this comment

zenspider commented Jun 11, 2019

wvanbergen commented Jun 11, 2019

wvanbergen commented Dec 9, 2016 •

edited

wvanbergen Jun 11, 2019 •

edited

wvanbergen Jun 11, 2019 •

edited

wvanbergen Jun 11, 2019 •

edited