Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't use PHP stream wrappers #763

Open
NuclearDog opened this issue May 4, 2021 · 0 comments
Open

Can't use PHP stream wrappers #763

NuclearDog opened this issue May 4, 2021 · 0 comments

Comments

@NuclearDog
Copy link

(Before I get started--apologies for not using the issue template. I think most of the information is irrelevant to what I'm about to discuss and I didn't want to make this any longer than it already is, but absolutely feel free to tell me if you feel it's needed and I'll provide it.)

Per the issue title, I was trying to make some modifications to another project to allow storing/loading files in AWS S3 via the stream wrapper the AWS SDK provides. Everything more or less went well, but the project uses Imagine for thumbnailing images and it ended up being the bulk of the work.

I understand the core of the issue is that the external libraries (imagick, gd, etc) don't support stream wrappers, but I think there are some relatively simple workarounds in most cases and most of the functionality already exists in the remote file handling, we just need to glue it together in a bit of a different way. I did also run into one specific issue with Imagine itself that makes working with stream wrappers impossible that needed to be addressed as well.

The stream wrappers are kind of a tricky situation for a project like Imagine since they end up straddling the line between local and remote. In any context where we're using PHP's file I/O we need to treat them as local, but in any context where we're passing them through to external libraries we need to treat them as remote.

Below I've written out the quick fix I implemented just as a proof-of-concept to see if using this stream wrapper was even feasible in the project I was looking at (not clean or generalized, just to provide some examples and context) as well as tried to outline best I can how I could see this being implemented in a proper way with a goal in mind of making Imagine able to transparently work with stream wrappers without requiring changes in how the relying application uses Imagine. My hope is someone can provide some guidance and an indication that you'd be amenable to merging changes like this, and then I could start working on the changes and submit a PR.

Proof of Concept

There were really only two major points I had to address to get this working:

  1. File\Loader: It currently treats any URL-like path as something to be passed to cURL. This excludes using any stream wrapper.
  2. Each driver's Imagine::open: We need to treat stream-wrapped paths as remote.
  3. Each driver's Image::save: We need to add handling for stream-wrapped paths, basically just moving from using library I/O to PHP's I/O functions.

I only implemented this in imagick for my testing. I basically just tried to do this by writing the minimum code possible to make it work, so this is far from optimal. A better solution is discussed below.

Loader

Right now, in __construct, it's detecting anything that parses as a valid URL as "remote". Anything remote is being passed through to cURL if it's available.

What I did was added an additional check on anything that appeared to be a URL to check whether the protocol matched any registered stream wrappers (stream_get_wrappers()) and was not http/https. If so, it unsets isUrl and sets the file as a "stream wrapper" (with an associated protected property + getter).

This makes the loader basically work as expected, at least internally, using PHP's I/O by way of the readLocalFile function.

if ($this->isUrl)
{
	$protocol = substr($this->path, 0, strpos($this->path, ':'));
	if (in_array($protocol, stream_get_wrappers()) && !in_array($protocol, ['http', 'https']))
	{
		$this->isUrl = false;
		$this->isStreamWrapper = true;
	}
}

Driver: Open

In a few places, Imagine decides whether to pass the file path directly to imagick or to rely on the loader to fetch the data and pass it through as a blob based on whether it's local. I updated the open() method to check whether the file was local and not a stream wrapper path such that it would treat stream wrapper paths as "remote" in this context and use the loader to fetch the data which it passed through to imagick.

Driver: Save

This is obviously going to be a little trickier as the image lacks the loader context at save time, but as a non-general solution, a similar tactic works for saving. Instead of asking imagick to write the image directly to the path, I just grabbed the image blob and used file_put_contents to write it out. i.e.,

try {
	$this->prepareOutput($options, $path);
	file_put_contents($path, $this->imagick->getImagesBlob());
	//$this->imagick->writeImages($path, true);
} catch (\ImagickException $e) {
	throw new RuntimeException('Save operation failed', $e->getCode(), $e);
}

This is obviously non-optimal in cases where we don't need the extra step as anyone working with large images is going to see their memory requirements go way up. This could also be implemented as a save out to a temporary file and then a copy(), requiring a little more complexity but allowing handling large files via stream wrappers as well.

Proposal

At a high level, I think what would make sense is to:

  1. Update loader to treat any protocol which cURL doesn't support and is a registered stream wrapper as a "stream wrapper" path.
  2. Update each driver's Imagine::open method to handle these files correctly depending on context.
  3. Update the Image class to hold a reference to the loader used to create it where appropriate
  4. Update the Image::save method to use the Loader class to parse destination file paths to handle stream wrapper paths appropriately

Detailed below...

Loader

When do we consider a path a stream wrapper path

The first issue to tackle is not having Loader treat any URL-like path as something to be passed to cURL. That means we need to define what paths are going to be handled as "stream wrapper" paths.

I think something along the lines of the quick-fix I implemented above is the right path. Basically:

  1. Check if a path resembles a URL
  2. Check if the protocol is not supported by cURL (which, holy cow does cURL support a lot of protocols)
  3. Check if the protocol is in the list of registered stream wrappers
  4. If all of the above, it's a stream wrapper path

From a backward-compatibility standpoint, the only situation where anyone would see a behaviour change is if they were (1) passing a URL into Imagine that (2) was not supported by cURL and (3) is supported by one of their stream wrappers and (4) expecting to get an error back. I think this is acceptable.

By explicitly directing all cURL-supported protocols to cURL we would not allow users to use stream wrappers for any protocol that cURL supports, however there's no regression from the current behaviour as right now all stream wrappers are unavailable. I don't see any way this would negatively impact adding this support in the future.

There are some security considerations around this because PHP does offer some stream wrappers like expect which, for instance, allow running arbitrary programs. That said, in order for this to become an actual vulnerability it would require that the user had that extension installed (not in the default distribution, installed via PECL) and was passing untrusted paths directly into Imagine. I'd argue that anyone passing untrusted paths (including protocol) directly into imagine or relying on a blacklist of disallowed protocols already had a security issue. Certainly PHP itself makes no effort to protect users from this with any of the file I/O functions, and I think it's fair for Imagine to follow that lead. I'm open to further discussion on this if anyone thinks there are steps that should be taken to mitigate it. The obvious solution I could see would be a configurable whitelist, but my goal is to add support transparently with no changes in the relying application.

What do we do with stream wrapper paths

The smallest possible change we could make here is to track whether a path uses a stream wrapper internally to the loader and expose it as a "remote" path in the context of isLocalFile. For all current cases, this creates the desired behaviour: Loader can treat the file as "local" for the purposes of getData, reading it with file_get_contents while all the drivers and EXIF reader will treat it as remote and fall back to using Loader::getData for reading.

My preference here just from an API design standpoint would be to not rely on any magic and instead expose and implement this case specifically, though I don't see that it's strictly necessary with the current drivers. This does allow for doing the right thing later if any driver does support stream wrappers natively. In that case, we'd add a new protected property for stream wrapper paths (isUrl, isStreamWrapper), a public getter for the isStreamWrapper property, and update isLocalFile to not treat stream wrapper paths as local.

The downside to this would be we'd be changing the LoaderInterface in a backward-incompatible way. Not sure if you guys see that as a big issue. Alternatively, we could:

  • Add an additional interface to specify the new method (StreamWrapperAwareInterface?), implement that on the default loader, and check for implementation before calling isStreamWrapper().
  • Just not expose the information ("smallest possible change"). Not exposing this information definitely complicates the save() logic, as all we know is that the file is not local, but not whether it's via stream wrapper or cURL. Given there's currently only 3 drivers I don't think it's a huge issue to have a bit of this logic leak out there though obviously not my favourite solution.

Imagine::open

I would review every driver's open() method to ensure it handled stream wrapper paths correctly and explicitly. In the case of gd this is already the case (it always uses Loader::getData), for imagick/gmagick, this is just adding a check for stream wrapper paths to use the "remote" code path in that case.

Image::save

This is where we get into some slightly bigger changes and I'd definitely appreciate some input on how you guys would want this done.

The Image class has no context from the loader, which creates two issues:

  1. Determining which paths are stream wrapper paths
  2. Calling save() without passing a path, expecting it to write back to wherever it was loaded from

Obviously we don't want to reimplement the "is this a stream wrapper path?" logic in every driver.

And as far as #2 it looks like the drivers largely attempt to determine the file that was originally opened by interrogating the underlying graphics library. For files loaded from stream wrappers where we've instead used PHP's I/O and passed in a blob, that data won't be available.

I think the most natural solution to both of these problems would be for each driver's Imagine to pass a reference to the Loader (where available) through to the Image class on instantiation and have the image class keep that reference. At that point, each Image::save could be updated to rely on the Loader for path handling and interrogation. That is:

  1. If an explicit path is passed in, create a Loader instance and check whether it's a stream wrapper or local file and handle appropriately.
  2. If no path is passed in, see if there is an instance of the Loader available and use that path
  3. Otherwise, throw exception

I'm not sure if this could introduce any change in behaviour by using the path passed in rather than the one the underlying library returns. Logically, these two should be the same thing. If we wanted to ensure no potential for backward compatibility issues here we could continue to check the underlying library/metadata between steps #1 and #2 and only fall back to using the loader instance when there's no path otherwise available.

Test Cases

This is definitely the area I put the least investigation into so far. I'll look at implementing a sample stream wrapper that simply provides a way to read/write arbitrary file paths a la file:// and adding tests to validate load, save with explicit path, and save without explicit path via that stream wrapper.

Fin

Sorry for the literal novel. I'm interested in making these changes, I just wanted to get some validation on the changes I'm about to make first and make sure I'm not going down a path that's obviously wrong or that is contrary to how you guys see the architecture/progression of the library.

Happy to answer any questions and clarify anything necessary. Let me know if this sounds good and I'll get hacking away. :)

Thanks for your time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

1 participant