Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Authorization Failing // Truncate Referrer URLs #418

Closed
superbuggy opened this issue Jun 24, 2019 · 7 comments · Fixed by #431
Closed

Authorization Failing // Truncate Referrer URLs #418

superbuggy opened this issue Jun 24, 2019 · 7 comments · Fixed by #431
Assignees
Labels
type:bug Something isn't working.
Milestone

Comments

@superbuggy
Copy link

Describe the bug

Distributor generates some large URLs that include potentially long lists of posts in the URL's query string. WP includes the entire URL as Referrer in the header for outbound requests. With long enough URLs (~8k+ UTF-8 chars), this will invalidate headers with servers that are configured with an 8kb size limit for headers, as it did in our case (wamu.org). The headers get garbled, and since they authorization data, this ultimately causes authorization to fail.

Steps to Reproduce

  1. Log into a WP site with Application Passwords and Distributor installed.
  2. Click Pull Content.
  3. Choose an external connection configured with a user with creds generated by Application Password from the Pull Content from dropdown.
  4. Select a post type from the dropdown menu next to the filter button, where the post type is one that has been pulled from frequently (this is important because I think it affects the URL size, specifically a list of posts to exclude in the query string).
  5. Click filter.
  6. The error message Could not pull content from connection due to error.

Expected behavior
Headers should not have the full URL so they don't get garbled.

Screenshots

Environment information

  • Device: MacBook Pro (13-inch, 2018, Four Thunderbolt 3 Ports)
  • OS: MacOS 10.14.3
  • Browser and version: Chrome 74.0.3729.169
  • Distributor version: 1.4.1
  • Theme and version: 3.0.28
  • Other installed plugin(s) and version(s):

Additional context

Proposed Solution: Truncate the referrer URL...

add_action('http_api_curl', function($handle, $request_args, $url) {
    if (strpos($url, "wp-json/wp/v2/wamu_story") !== false || strpos($url, "wp-json/wp/v2/posts") !== false) {
        curl_setopt($handle, CURLOPT_REFERER, site_url());
    }
}, 10, 3);
@superbuggy superbuggy added the type:bug Something isn't working. label Jun 24, 2019
@jeffpaul
Copy link
Member

@superbuggy welcome to Distributor and thanks for the feedback! I'm tagging this for review during our upcoming 1.5.0 release, so stay tuned for an update then.

@dkotter any chance your improvements to the Pull Content page resolve this issue?

@jeffpaul jeffpaul added this to the 1.5.0 milestone Jun 24, 2019
@helen
Copy link
Contributor

helen commented Jul 11, 2019

Thanks @superbuggy - I think it definitely makes sense to truncate the referer/referrer URL for now, am looking into whether it's better to cut all the way back to site URL or use the current admin page URL. I also would love your opinion on whether long-term it would be better to stop shoving excluded posts into the the query string (leaving the actual functionality intact) and refer to a stored option instead.

@helen
Copy link
Contributor

helen commented Jul 16, 2019

After digging into this some more with @dkotter's help, I can trigger this easily by setting $this->sync_log = range( 0, 400 ) in PullListTable::prepare_items(). We do need to fix this, as it's really easy to enter into the several hundreds of posts pulled, and if we keep going this way we are going to trigger a fatal with post max vars. I don't want to drop this variable entirely, as then already-pulled posts won't be excluded, but I don't really see a perfect fix. There are two things we could try doing here that I think are as good as it gets:

  1. For 1.5, truncate $sync_log in the above method to some arbitrary number (maybe 200?) so that hopefully at least the first several pages of content to pull are still accurate, and then anything past that gets grayed out in the list table (comparing against the local sync log) so that pagination is still correct. Otherwise you run the risk of having a page of noticeably fewer items than the posts per page, or even none at all, which would surely lead to user confusion. I think it's a reasonable UX given that we are talking about going pretty far back in pagination in the default state - search results would bring up grayed out rows earlier, but so long as we're not making them actually available for pull I think it's okay.

  2. For a future release (1.6 or 2.0 maybe), see about storing the sync log on the remote site (if we're not already?) for authenticated connections and using that for returning results instead. I guess we'd have to store it in options with a hash of the remote site URL or something. For unauthenticated connections, we'd still fall back to the original fix.

Any thoughts/objections? Working on a PR now.

@helen
Copy link
Contributor

helen commented Jul 17, 2019

@superbuggy I've opened #431 to address this, if you're able to take a look and give an opinion on the approach and/or the UI results and how that impacts you.

@helen
Copy link
Contributor

helen commented Jul 18, 2019

Coming back with more due diligence before I merge #431 in - with @cmmarslender on the assist, it looks to be a 414 Request-URI Too Large issue, not a header size issue. Neither of us could reproduce any headers being sent besides the auth one. I am still happy to help chase down referer issues if you want to connect directly (Slack, or what have you) about specific errors you've seen on your end, but for now I think my PR is indeed the correct first phase fix.

@superbuggy
Copy link
Author

superbuggy commented Jul 22, 2019

Sounds good! In our case, I just added in the code block above to functions.php for our main theme as a hotfix. I'm primarily a JS dev and am newer to WP, so I don't yet have a detailed, low-level grasp on how WP uses referrer URLs but that 414 issue makes total sense given my comprehension of the issue we encountered. As an additional measure, we ended up just changing our NGINX config to allow for headers to be up 32k.

Story the post exclusion list on the remote makes sense to me as well. Feels an intuitive that a site should keep track of which posts it has already pulled in. This could be stored in the options table, maybe?

@superbuggy
Copy link
Author

Thank you for your responses!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:bug Something isn't working.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants