Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow loading html from file uri #209

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

jkowens
Copy link

@jkowens jkowens commented Dec 2, 2023

Puppeteer can load HTML from the local file system.

url = "file:///some/absolute/path/content.html"

pdf = Grover.new(url,
                 launch_args: [
                    '--no-sandbox',
                    '--disable-web-security',
                    '--disable-dev-shm-usage',
                    '--disable-setuid-sandbox',
                    '--allow-file-access-from-files',
                    '--enable-local-file-accesses'
                  ],
                  wait_until: ['load', 'domcontentloaded'],
                  allow_file_url: true
                 ).to_pdf

@jkowens
Copy link
Author

jkowens commented Dec 5, 2023

I added some specs for this change.

@jkowens
Copy link
Author

jkowens commented Dec 11, 2023

@abrom I would love to get your thoughts on this PR 😊

@abrom
Copy link
Contributor

abrom commented Dec 21, 2023

hey @jkowens, sorry I have been busy.

At first look this appears ok, but I do have some pretty serious concerns about what this might mean for system security. ie if someone was able to upload a file, then using the middleware convert the content of that file to a pdf, they could just upload a file with file:///etc/your/system/configs in it. Grover would subsequently have the contents of the system files returned as a PDF allowing for exfiltration of potentially sensitive files.

I get the use-case you're going for here, but it likely needs extra steps to be involved to isolate it from mechanisms that would allow exfil. i.e an opt-in setting that can't be set via any of the dynamic means, and at the same time makes it painfully clear that absolutely control over the file URI being rendered is required

Thoughts?

@jkowens
Copy link
Author

jkowens commented Dec 22, 2023

@abrom ah yes, I didn't consider the middleware usage. Would a config option to allow_file_url that is disabled by default be acceptable?

@jkowens
Copy link
Author

jkowens commented Dec 24, 2023

@abrom I tweaked the PR based on your feedback 👍

@abrom
Copy link
Contributor

abrom commented Mar 9, 2024

hey @jkowens I'm really sorry for the delay in getting back to this.. It has been a hectic few months for me. 😢

I think the changes you've made are headed in the right direction, but the risk of system exfiltration is still high as the middleware and this option will happily (but unsafely) coexist. I've created a commit to address this, but you appear to have opted to disable upstream developers from pushing to your fork. I'll attach them as a diff instead.

diff --git a/lib/grover.rb b/lib/grover.rb
index 13b71da..bce4f92 100644
--- a/lib/grover.rb
+++ b/lib/grover.rb
@@ -33,9 +33,9 @@ class Grover
   #   see https://github.com/puppeteer/puppeteer/blob/main/docs/api/puppeteer.pdfoptions.md
   #   and https://github.com/puppeteer/puppeteer/blob/main/docs/api/puppeteer.screenshotoptions.md
   #
-  def initialize(url, **options)
+  def initialize(url, middleware: false, **options)
     @url = url.to_s
-    @options = OptionsBuilder.new(options, @url)
+    @options = OptionsBuilder.new(options, @url, middleware: middleware)
     @root_path = @options.delete 'root_path'
     @front_cover_path = @options.delete 'front_cover_path'
     @back_cover_path = @options.delete 'back_cover_path'
diff --git a/lib/grover/errors.rb b/lib/grover/errors.rb
index 367c347..fe999a3 100644
--- a/lib/grover/errors.rb
+++ b/lib/grover/errors.rb
@@ -15,4 +15,5 @@ class Grover
       const_set name, Class.new(Error)
     end
   end
+  UnsafeConfigurationError = Class.new(Error)
 end
diff --git a/lib/grover/middleware.rb b/lib/grover/middleware.rb
index ec5782a..cbd7e99 100644
--- a/lib/grover/middleware.rb
+++ b/lib/grover/middleware.rb
@@ -111,7 +111,7 @@ class Grover
       body = body.join if body.is_a?(Array)
       body = HTMLPreprocessor.process body, root_url, protocol
 
-      options = { display_url: request_url }
+      options = { display_url: request_url, middleware: true }
       cookies = Rack::Utils.parse_cookies(env).map do |name, value|
         { name: name, value: Rack::Utils.escape(value), domain: env['HTTP_HOST'] }
       end
diff --git a/lib/grover/options_builder.rb b/lib/grover/options_builder.rb
index 3f84e7c..afdd0af 100644
--- a/lib/grover/options_builder.rb
+++ b/lib/grover/options_builder.rb
@@ -8,7 +8,7 @@ class Grover
   # Build options from Grover.configuration, meta_options, and passed-in options
   #
   class OptionsBuilder < Hash
-    def initialize(options, url)
+    def initialize(options, url, middleware:)
       super()
       @url = url
       combined = grover_configuration
@@ -16,6 +16,12 @@ class Grover
       Utils.deep_merge! combined, meta_options unless url_source?
 
       update OptionsFixer.new(combined).run
+
+      # The combination of middleware and allowing file URLs is exceptionally
+      # unsafe as it can lead to data exfiltration from the host system.
+      return unless middleware && (self['allow_file_url'] || self['allowFileUrl'])
+
+      raise UnsafeConfigurationError, 'using `allow_file_url` option with middleware is exceptionally unsafe'
     end
 
     private
diff --git a/spec/grover/middleware_spec.rb b/spec/grover/middleware_spec.rb
index 74b18f3..b6f360e 100644
--- a/spec/grover/middleware_spec.rb
+++ b/spec/grover/middleware_spec.rb
@@ -509,11 +509,14 @@ describe Grover::Middleware do
       it 'passes through the request URL (sans extension) to Grover' do
         allow(Grover).to(
           receive(:new).
-            with('Grover McGroveryface', display_url: 'http://www.example.org/test').
+            with('Grover McGroveryface', display_url: 'http://www.example.org/test', middleware: true).
             and_return(grover)
         )
         allow(grover).to receive(:to_pdf).with(no_args).and_return 'A converted PDF'
-        expect(Grover).to receive(:new).with('Grover McGroveryface', display_url: 'http://www.example.org/test')
+        expect(Grover).to(
+          receive(:new).
+            with('Grover McGroveryface', display_url: 'http://www.example.org/test', middleware: true)
+        )
         expect(grover).to receive(:to_pdf).with(no_args)
         get 'http://www.example.org/test.pdf'
         expect(last_response.body).to eq 'A converted PDF'
@@ -524,12 +527,14 @@ describe Grover::Middleware do
           allow(Grover).to receive(:new).with(
             'Grover McGroveryface',
             display_url: 'http://www.example.org/test',
+            middleware: true,
             cookies: [{ domain: 'www.example.org', name: k, value: v }]
           ).and_return(grover)
           allow(grover).to receive(:to_pdf).with(no_args).and_return 'A converted PDF'
           expect(Grover).to receive(:new).with(
             'Grover McGroveryface',
             display_url: 'http://www.example.org/test',
+            middleware: true,
             cookies: [{ domain: 'www.example.org', name: k, value: v }]
           )
           expect(grover).to receive(:to_pdf).with(no_args)
@@ -564,11 +569,14 @@ describe Grover::Middleware do
         it 'passes through the request URL (sans extension) to Grover' do
           allow(Grover).to(
             receive(:new).
-              with('Grover McGroveryface', display_url: 'http://www.example.org/test').
+              with('Grover McGroveryface', display_url: 'http://www.example.org/test', middleware: true).
               and_return(grover)
           )
           allow(grover).to receive(:to_png).with(no_args).and_return 'A converted PNG'
-          expect(Grover).to receive(:new).with('Grover McGroveryface', display_url: 'http://www.example.org/test')
+          expect(Grover).to(
+            receive(:new).
+              with('Grover McGroveryface', display_url: 'http://www.example.org/test', middleware: true)
+          )
           expect(grover).to receive(:to_png).with(no_args)
           get 'http://www.example.org/test.png'
           expect(last_response.body).to eq 'A converted PNG'
@@ -591,11 +599,14 @@ describe Grover::Middleware do
         it 'passes through the request URL (sans extension) to Grover' do
           allow(Grover).to(
             receive(:new).
-              with('Grover McGroveryface', display_url: 'http://www.example.org/test').
+              with('Grover McGroveryface', display_url: 'http://www.example.org/test', middleware: true).
               and_return(grover)
           )
           allow(grover).to receive(:to_jpeg).with(no_args).and_return 'A converted JPEG'
-          expect(Grover).to receive(:new).with('Grover McGroveryface', display_url: 'http://www.example.org/test')
+          expect(Grover).to(
+            receive(:new).
+              with('Grover McGroveryface', display_url: 'http://www.example.org/test', middleware: true)
+          )
           expect(grover).to receive(:to_jpeg).with(no_args)
           get 'http://www.example.org/test.jpeg'
           expect(last_response.body).to eq 'A converted JPEG'
diff --git a/spec/grover/options_builder_spec.rb b/spec/grover/options_builder_spec.rb
index 821fb46..40fcaba 100644
--- a/spec/grover/options_builder_spec.rb
+++ b/spec/grover/options_builder_spec.rb
@@ -4,11 +4,12 @@ require 'spec_helper'
 require 'grover/options_builder'
 
 describe Grover::OptionsBuilder do
-  subject(:built_options) { described_class.new(options, url_or_html) }
+  subject(:built_options) { described_class.new(options, url_or_html, middleware: middleware) }
 
   let(:url_or_html) { 'https://google.com' }
   let(:options) { {} }
   let(:global_config) { { cache: false, quality: 95 } }
+  let(:middleware) { false }
 
   before { allow(Grover.configuration).to receive(:options).and_return(global_config) }
 
@@ -157,4 +158,30 @@ describe Grover::OptionsBuilder do
 
     it { is_expected.to include('viewport' => { 'height' => 100, 'width' => 200, 'device_scale_factor' => 2.5 }) }
   end
+
+  context 'when the middleware flag is enabled' do
+    let(:middleware) { true }
+
+    it 'returns combined passed-in/global options' do
+      expect(built_options).to eq('cache' => false, 'quality' => 95)
+    end
+
+    context 'when providing `allow_file_url` option inline' do
+      let(:options) { { allow_file_url: true } }
+
+      it 'raises `Grover::UnsafeConfigurationError`' do
+        expect { built_options }.to raise_error Grover::UnsafeConfigurationError,
+                                                'using `allow_file_url` option with middleware is exceptionally unsafe'
+      end
+    end
+
+    context 'when providing `allow_file_url` option through global options' do
+      let(:global_config) { { allow_file_url: true } }
+
+      it 'raises `Grover::UnsafeConfigurationError`' do
+        expect { built_options }.to raise_error Grover::UnsafeConfigurationError,
+                                                'using `allow_file_url` option with middleware is exceptionally unsafe'
+      end
+    end
+  end
 end

@jkowens
Copy link
Author

jkowens commented Mar 18, 2024

Thanks for the patch @abrom! I've applied it and pushed up the commit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants