Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHP: Support http:// https:// and ssl:// stream wrappers #1093

Open
wants to merge 3 commits into
base: trunk
Choose a base branch
from

Conversation

adamziel
Copy link
Collaborator

@adamziel adamziel commented Mar 7, 2024

What is this PR doing?

Intercepts all network traffic coming from PHP and handles http:// and https:// requests using fetch(). This enables using the native networking features of PHP without implementing custom transport classes.

🚧 Work in progress – this PR needs discussion and cleaning up before it can be shipped 🚧

How is it implemented?

Emscripten can be configured to stream all network traffic through a WebSocket. @php-wasm/node and wp-now use that to access the internet via a local WebSocket->TCP proxy, but the in-browser version of WordPress Playground exposes no such proxy.

This PR ships a "fake" WebSocket implementation that doesn't initiate any ws:// connection. Instead, it mocks the WebSocket interface and analyzes the connection address and transmitted bytes to infer a corresponding fetch() call.

In case of HTTP, it parses the request text, extracts the method, path, and headers (body TBD), and feeds that information to a fetch() call. Then, as the response status, headers, and the data stream comes in, it rewrites it as raw bytes and pretends to emit them as incoming WebSocket data.

In case of HTTPS, it uses the node-forge package to start a HTTPS server with a self-signed certificate that is also added to PHP CA store via the openssl.cafile PHP.ini setting. The outbound traffic is piped to node-forge which handles the SSL handshake, the encryption, and yields unencrypted HTTP request bytes. They are treated the same as in the previous paragraph, and then the response is piped back to node-forge for encryption and then, finally, piped back to PHP.

The current implementation is naive and optimistically assumes the data is intended for either HTTP or HTTPS use.

Upsides

  • file_get_contents(), fopen(), fsockopen() and other functions using stream wrappers will now work with http://, https://, and ssl:// URLs!
  • libcurl can be now be supported without any customizations as the network traffic handler doesn't care about the library that initiated the connection.

Downsides

  • node-forge doesn't seem to work well with the latest TLS so we'd have to force-downgrade it in PHP. It doesn't seem like a big deal in the browser because all SSL security is provided by fetch(), but it's something to note. This approach is , of course, unsuitable and unnecessary outside of web browsers.
  • Raw sockets and CORS-less URLs still can't be supported this way.
  • I'm not sure whether we can reliably distinguish between HTTP / HTTPS traffic and a byte transmission via fsockopen('ssl://somesite.com'). Perhaps not. In this case we could ship a naive heuristic that would cover the majority of cases, and then patch PHP to provide an explicit flag "I'm about to request something via HTTPS".
  • node-forge is slow and the certificates are generated synchronously. It would be far better to use the browser-native asynchronous API like crypto.subtle.generateKey().
  • HTTPS adds noticeable overhead. Sticking to the FetchTransport in WordPress sounds like a good idea since it triggers fetch() directly and without an encryption layer.
  • I think each domain requires a separate cert, which means a bit of slowness added to the first request to every domain. That sounds like a fair trade-off for providing the much-requested networking support.

Remaining work

  • Discuss the approach
  • Clean up the code and the import structure
  • Remove as much synchronous processing as possible
  • Generate SSL certificates lazily

Testing Instructions

That Blueprint above runs the following PHP code:

<?php
echo 'Hello-dolly.zip downloaded from https://downloads.wordpress.org/plugin/hello-dolly.1.7.3.zip has this many bytes: ';
var_dump(strlen(file_get_contents('https://downloads.wordpress.org/plugin/hello-dolly.1.7.3.zip')));

Related to #724

cc @dmsnell @bgrgicak @brandonpayton @ThomasTheDane

Copy link
Collaborator

@bgrgicak bgrgicak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great to see progress on this 🚀

Should this eventually become a separate package?

A naive question. Is HTTPS necessary or could we just use HTTP internally?

@bgrgicak
Copy link
Collaborator

bgrgicak commented Mar 8, 2024

@adamziel When I opened the test link (on Linux) I got back an error Hello-dolly.zip downloaded from https://downloads.wordpress.org/plugin/hello-dolly.1.7.3.zip has this many bytes: int(1887).

@adamziel
Copy link
Collaborator Author

adamziel commented Mar 8, 2024

That's the expected output and not an error :-)

@adamziel
Copy link
Collaborator Author

adamziel commented Mar 8, 2024

A naive question. Is HTTPS necessary or could we just use HTTP internally?

How would that work?

@bgrgicak
Copy link
Collaborator

bgrgicak commented Mar 8, 2024

That's the expected output and not an error :-)

I should read all the testing instructions before I comment 😕

@bgrgicak
Copy link
Collaborator

bgrgicak commented Mar 8, 2024

How would that work?

I imagined that we could just send these requests without having an SSL server.
WordPress and CURL requests can be configured to ignore SSL certificates.
But that's probably a bad idea.

Copy link
Collaborator

@dmsnell dmsnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this is still early on in its development?

were there any specific things you'd like review on?

@@ -44,6 +48,560 @@ const fakeWebsocket = () => {
};
};

async function generateKeys() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this and createCertificate() would both be very good functions to document, specifically why they exist, what kind of keys and certificates they generate, and why

},
{
name: 'localityName',
value: 'Blacksburg',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably don't need all of these attributes do we? if not, maybe we should have a good explanation for why we're using the very specific values we are. we could probably make up a virtual/imaginary Playground land in WordPressWorld so that if someone were to examine the certificates it would be clearer that these are for a virtual system and not supposed to be actual legitimate signed certificates

sslCA: true,
emailCA: true,
objCA: true,
},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we surely don't need all these capabilities in the certificate do we? it shouldn't hurt anything, but do we expect, for example, the Playground to sign email signatures?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, most of these should be fine to remove

10
);

const ws = this;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy/paste or AI code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick&dirty prototype, ws is used in a function in line 316 below where this refers to a different object.

@adamziel
Copy link
Collaborator Author

were there any specific things you'd like review on?

Mostly the general idea and approach in case there was something off with it.

@dmsnell
Copy link
Collaborator

dmsnell commented Mar 25, 2024

Summarizing a conversation @adamziel and I had: this seems like a fine approach, though I think it's a long way away from where it needs to be with naming and documentation.

This is essentially a TLS proxy standing as a middleman between the PHP code and JS code. What it isn't is writing our own security layer, and this is because ultimately all TLS connections on the JS side will be enforced with the security mechanisms that the browser provides.

Proxying through fetch() is challenging because it requires that we grab the data out of the encrypted stream and present it to PHP in a new encrypted stream, acting like it was never translated, thus the middleman.

It looks like we can provide a ReadableStream to fetch as the body of the network request so I think it's even possible to go beyond what's here and provide a generalized socket interface into PHP. I was on my way to testing that out when I ran out of time. There's still the challenge of detecting when a TLS handshake is beginning, but maybe that's not too hard to do.

@adamziel adamziel added this to the PHP Feature Parity milestone Apr 4, 2024
adamziel added a commit that referenced this pull request Apr 29, 2024
Ships the Node.js version of PHP built with `--with-libcurl` option to support the curl extension.

It also changes two nuances in the overall PHP build process:

* It replaces the `select(2)` function using `-Wl,--wrap=select` emcc
option instead of patching PHP source code – this enables supporting
asynchronous `select(2)` in curl without additional patches.
* Brings the `__wrap_select` implementation more in line with
`select(2)`, add support for `POLLERR`.
* Adds support for polling file descriptors that represent neither child
processes nor streams in `poll(2)` – that's because `libcurl` polls
`/dev/urandom`.

Builds on top of and supersedes
#1133

## Debugging Asyncify problems

The [typical way of resolving Asyncify
crashes](https://wordpress.github.io/wordpress-playground/architecture/wasm-asyncify/)
didn't work during the work on this PR. Functions didn't come up in the
error messages and even raw stack traces. The reasons are unclear.

[The JSPI build of
PHP](#1339) was
more helpful as it enabled logging the current stack trace in all the
asynchronous calls, which quickly revealed all the missing
`ASYNCIFY_ONLY` functions. This is the way to debug any future issues
until we fully migrate to JSPI.

## Testing Instructions

Confirm the CI checks pass. This PR ships a few new tests specifically
targeting networking with curl.


## Related resources

* #85
* #1093

---------

Co-authored-by: Adam Zieliński <adam@adamziel.com>
Co-authored-by: MHO <yannick@chillpills.io>
adamziel added a commit that referenced this pull request May 15, 2024
…1389)

Aligns the boot process between the in-browser Playground Remote and
Node-oriented Playground CLI.

With this PR, both apps use a similar `createPHP()` function that:

* Sets up the SAPI name
* Sets up PHP ini entries
* Sets up the `/phpinfo.php` route
* Sets up platform-level mu0plugins
* Proxies filesystem directories from secondary PHP instances to primary
* Sets up PHP runtime rotation to avoid OOM errors in long-running
primary processes

There are still the following discrepancies:

* The in-browser PHP sets up a SAPI name conditionally, Node.js one
always uses `cli` (it probably shouldn't)
* The in-browser PHP uses a custom spawn handler
* The in-browser PHP uses a different set of php.ini directives
* The in-browser PHP loads more mu-plugins
* The Node.js PHP sets up CA certificates for HTTPS connections (the
in-browser PHP [will fake the CA chain
eventually](#1093))

This is the first step towards a consistent Boot Protocol, see
#1379 for
more details.

## Testing Instructions

* Confirm the CI checks work
* Run `bun packages/playground/cli/src/cli.ts server --login`, confirm
the server starts without issues, test wp-admin and HTTPS-reliant
features like the plugin directories.

We'll need a set of unit tests for these new boot-related features,
let's create them sooner than later.
adamziel added a commit that referenced this pull request May 15, 2024
…1389)

Aligns the boot process between the in-browser Playground Remote and
Node-oriented Playground CLI.

With this PR, both apps use a similar `createPHP()` function that:

* Sets up the SAPI name
* Sets up PHP ini entries
* Sets up the `/phpinfo.php` route
* Sets up platform-level mu0plugins
* Proxies filesystem directories from secondary PHP instances to primary
* Sets up PHP runtime rotation to avoid OOM errors in long-running
primary processes

There are still the following discrepancies:

* The in-browser PHP sets up a SAPI name conditionally, Node.js one
always uses `cli` (it probably shouldn't)
* The in-browser PHP uses a custom spawn handler
* The in-browser PHP uses a different set of php.ini directives
* The in-browser PHP loads more mu-plugins
* The Node.js PHP sets up CA certificates for HTTPS connections (the
in-browser PHP [will fake the CA chain
eventually](#1093))

This is the first step towards a consistent Boot Protocol, see
#1379 for
more details.

## Testing Instructions

* Confirm the CI checks work
* Run `bun packages/playground/cli/src/cli.ts server --login`, confirm
the server starts without issues, test wp-admin and HTTPS-reliant
features like the plugin directories.

We'll need a set of unit tests for these new boot-related features,
let's create them sooner than later.
@adamziel adamziel mentioned this pull request May 17, 2024
28 tasks
@adamziel
Copy link
Collaborator Author

node-forge is quite slow when it comes to certificate generation, perhaps https://pkijs.org/docs/examples/certificates-and-revocation/create-and-validate-certificate would be faster

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

None yet

3 participants