Crawling from WP CLI - Mangled URL - Uncaught InvalidArgumentException: Unable to parse URI #908

Open
3 tasks done
stellarpower opened this issue Nov 4, 2023 · 3 comments

@stellarpower

Before creating an issue / filing a support request

  • try to troubleshoot the issue yourself (see Troubleshooting guide)
  • prepare as much information as possible to help the developer
  • Identify the issue as likely: Theme / Plugin / Environment or WP2Static bug

Determining if it's an issue with Theme, Plugin, Environment or a bug in WP2Static

This is a difference in behaviour between the CLI and web interfaces to WP2Static. So, even if there are issues elsewhere, I believe this is most appropriate as a bug against WP2Static itself for the time being.

Describe the bug
So far, WP2Static has worked without a hitch. I installed it from a ZIP file and have only ever run exports from the web UI.

This site is hosted on a local machine in a container; after exporting, the static site is sent up into the cloud for live hosting. In the settings, I set the "Deployment URL" to be simply /; this has allowed maximal flexibility with hosting downstream, where the static site can be viewed under multiple subdomains without any problems. The export process from the web UI works fine with this.

I took some time today to try kicking off the process programmatically using the wp CLI tool. If I begin the export this way (wp wp2static crawl), the logs show it fetching all the pages okay, but at some point afterwards an invalid URL replacement seems to be performed, or in some other manner a totally mangled URL comes out. This then throws an exception and I get a backtrace in the logs.

If I then generate the export again from the web UI, I get a 500 response back in the browser, the same as this one.

If I delete the plugin and re-upload it from a zip to nuke my settings (can I do this a faster way, BTW?), then go back and change my settings, we are back to normal. Given the documentation seems to be a little out of date, it's possible I am not using the CLI tool properly. Ideally I'd like it to kick off a job with the exact same settings as currently configured in the UI, but perhaps I need to give it some more options. Otherwise, this suggests that the CLI tool is missing or adding a step that mutates the state in the settings, so that web-based calls then fail too.

To Reproduce
Steps to reproduce the behavior:

  • Remove and re-install the WP2Static plugin (7.2 zip upload)
  • Export okay from the web UI.
  • wp wp2static crawl

Environment (please complete the following information):

  • Hosting OS: Linux
  • Web server setup: container (image)
  • Hosting company: local installation.

The website is behind a reverse proxy using a self-signed certificate. The reverse proxy only serves TLS; it talks to WordPress over unencrypted HTTP only. The instance is externally visible on a non-standard port.

Log files (please complete the following information):

[04-Nov-2023 02:55:56 UTC] PHP Fatal error:  Uncaught InvalidArgumentException: Unable to parse URI: https://machine.domain:888http/machine.domain:888/wp-content/et-cache/1010/et-core-unified-1010.min.css in /var/www/html/sitename/wp-content/plugins/wp2static/vendor/leonstafford/wp2staticpsr7/src/Uri.php:72
Stack trace:
#0 /var/www/html/sitename/wp-content/plugins/wp2static/vendor/leonstafford/wp2staticpsr7/src/Request.php(42): WP2StaticGuzzleHttp\Psr7\Uri->__construct()
#1 /var/www/html/sitename/wp-content/plugins/wp2static/src/Crawler.php(136): WP2StaticGuzzleHttp\Psr7\Request->__construct()
#2 /var/www/html/sitename/wp-content/plugins/wp2static/vendor/leonstafford/wp2staticguzzle/src/Pool.php(56): WP2Static\Crawler->WP2Static\{closure}()
#3 [internal function]: WP2StaticGuzzleHttp\Pool::WP2StaticGuzzleHttp\{closure}()
#4 /var/www/html/sitename/wp-content/plugins/wp2static/vendor/leonstafford/wp2staticpromises/src/EachPromise.php(212): Generator->next()
#5 / in /var/www/html/sitename/wp-content/plugins/wp2static/vendor/leonstafford/wp2staticpsr7/src/Uri.php on line 72
[2023-11-04T02:12:43+00:00] Starting crawling
[2023-11-04T02:12:43+00:00] Using basic auth credentials to crawl
[2023-11-04T02:12:43+00:00] Starting to crawl detected URLs.
[2023-11-04T02:12:43+00:00] Using CrawlCache.
[2023-11-04T02:13:21+00:00] Crawling progress: 300 crawled, 300 skipped (cached).
[2023-11-04T02:13:25+00:00] Crawling progress: 600 crawled, 600 skipped (cached).
[2023-11-04T02:13:29+00:00] Crawling progress: 900 crawled, 900 skipped (cached).
[2023-11-04T02:13:32+00:00] Crawling progress: 1200 crawled, 1200 skipped (cached).
[2023-11-04T02:13:45+00:00] Crawling progress: 1500 crawled, 1500 skipped (cached).
[2023-11-04T02:13:51+00:00] Crawling progress: 1800 crawled, 1800 skipped (cached).
[2023-11-04T02:13:54+00:00] Crawling progress: 2100 crawled, 2100 skipped (cached).
[2023-11-04T02:13:58+00:00] Crawling progress: 2400 crawled, 2400 skipped (cached).
[2023-11-04T02:14:01+00:00] Crawling progress: 2700 crawled, 2700 skipped (cached).
[2023-11-04T02:14:05+00:00] Crawling progress: 3000 crawled, 3000 skipped (cached).
[2023-11-04T02:14:09+00:00] Crawling progress: 3300 crawled, 3300 skipped (cached).
[2023-11-04T02:14:12+00:00] Crawling progress: 3600 crawled, 3600 skipped (cached).
[2023-11-04T02:14:17+00:00] Crawling progress: 3900 crawled, 3900 skipped (cached).
[2023-11-04T02:14:22+00:00] Crawling progress: 4200 crawled, 4200 skipped (cached).
[2023-11-04T02:14:25+00:00] Crawling progress: 4500 crawled, 4500 skipped (cached).
[2023-11-04T02:14:28+00:00] Crawling progress: 4800 crawled, 4800 skipped (cached).
[2023-11-04T02:14:32+00:00] Crawling progress: 5100 crawled, 5100 skipped (cached).
@stellarpower
Author

Logging this as an error rather than failing here

@patrickdk77
Contributor

The issue is that your site is configured for https, but some things are returning http URLs. WP2Static doesn't treat http and https as the same site, so http://example.com != https://example.com, and that causes this error.
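
To illustrate the mismatch described above, here is a minimal Python sketch of comparing URLs by host and port while ignoring the scheme. It is not WP2Static's code (the plugin is PHP); the function name `same_site` is hypothetical, and the http asset URL is an assumption inferred from the logged error, not a value taken from the crawl queue.

```python
from urllib.parse import urlsplit


def same_site(url_a: str, url_b: str) -> bool:
    """Return True when both URLs share host and explicit port, ignoring scheme."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (a.hostname, a.port) == (b.hostname, b.port)


# Site URL as configured (https), and an assumed asset URL emitted with http.
site_url = "https://machine.domain:888"
asset_url = "http://machine.domain:888/wp-content/et-cache/1010/et-core-unified-1010.min.css"

# A plain prefix check treats the http asset as foreign to the https site...
print(asset_url.startswith(site_url))  # False
# ...while a scheme-insensitive comparison sees the same host and port.
print(same_site(asset_url, site_url))  # True
```

Comparing only the explicit port keeps the sketch short; a fuller version would also need to decide how to treat default ports (80 vs 443) when none is written in the URL.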

@stellarpower
Author

That makes sense, I guess. I don't know where I would find the plugin that is the culprit; the thing is behind a reverse proxy, so everything should be pointing to the ingress and never using unencrypted HTTP.

How come it's bunged on the end of the URL, though? From memory, I think I added some logging, and that literal URL was what Guzzle was being asked to parse.
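
As a rough illustration of why that literal string cannot be parsed: it reads like the site URL followed directly by a value that is not root-relative, so the text between the colon and the next slash, `888http`, ends up in the port position. The Python sketch below only inspects the string from the log; it does not reproduce WP2Static's PHP code path, and it makes no claim about where `://` was reduced to `/`.

```python
from urllib.parse import urlsplit

# Literal URI from the fatal error in the log above.
logged = (
    "https://machine.domain:888http/machine.domain:888"
    "/wp-content/et-cache/1010/et-core-unified-1010.min.css"
)

parts = urlsplit(logged)
print(parts.netloc)  # machine.domain:888http

try:
    parts.port  # accessing .port validates that it is numeric
    port_is_valid = True
except ValueError:
    # "888http" is not a number, so the port cannot be parsed.
    port_is_valid = False

print(port_is_valid)  # False
```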
