Urlencode in parse? #63

Open
DanAlexson90 opened this issue Feb 20, 2021 · 1 comment
Comments

@DanAlexson90

Something not mentioned in the official documentation:

https://sabre.io/uri/usage/

but only in the source code:

https://github.com/sabre-io/uri/blob/master/lib/functions.php

is this piece:

// Normally a URI must be ASCII, however. However, often it's not and
// parse_url might corrupt these strings.
//
// For that reason we take any non-ascii characters from the uri and
// uriencode them first.
$uri = preg_replace_callback(
    '/[^[:ascii:]]/u',
    function ($matches) {
        return rawurlencode($matches[0]);
    },
    $uri
);
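To make the effect easy to reproduce, here is a minimal sketch that runs the same pre-encoding step on its own and then hands the result to `parse_url`. The input IRI is just an example I picked for illustration; everything else is standard PHP.

```php
<?php
// Illustration only: the pre-encoding step quoted above, applied to an
// example IRI (the input value is an assumption chosen for this demo).
$uri = 'https://пример.рф/';

$encoded = preg_replace_callback(
    '/[^[:ascii:]]/u',
    function ($matches) {
        // Each non-ASCII code point becomes the percent-escapes of its UTF-8 bytes.
        return rawurlencode($matches[0]);
    },
    $uri
);

echo $encoded, "\n";
// The host component now carries percent-escapes instead of a domain name.
echo parse_url($encoded, PHP_URL_HOST), "\n";
```

The step runs over the whole string before any component is identified, so the host gets the same treatment as the path and query.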

URL-encoding is NOT appropriate for a domain name (FQDN)!

For example, these IRIs:

https://пример.рф/
https://приклад.укр/
https://παράδειγμα.ελ/

should NOT be parsed to:

array (size=7)
  'scheme' => string 'https' (length=5)
  'host' => string '%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80.%D1%80%D1%84' (length=49)
  'path' => string '/' (length=1)
  ...

array (size=7)
  'scheme' => string 'https' (length=5)
  'host' => string '%D0%BF%D1%80%D0%B8%D0%BA%D0%BB%D0%B0%D0%B4.%D1%83%D0%BA%D1%80' (length=61)
  'path' => string '/' (length=1)
  ...

array (size=7)
  'scheme' => string 'https' (length=5)
  'host' => string '%CF%80%CE%B1%CF%81%CE%AC%CE%B4%CE%B5%CE%B9%CE%B3%CE%BC%CE%B1.%CE%B5%CE%BB' (length=73)
  'path' => string '/' (length=1)
  ...

Instead they should be parsed to:

array (size=7)
  'scheme' => string 'https' (length=5)
  'host' => string 'пример.рф' (length=17)
  'path' => string '/' (length=1)
  ...

array (size=7)
  'scheme' => string 'https' (length=5)
  'host' => string 'приклад.укр' (length=21)
  'path' => string '/' (length=1)
  ...

array (size=7)
  'scheme' => string 'https' (length=5)
  'host' => string 'παράδειγμα.ελ' (length=25)
  'path' => string '/' (length=1)
  ...
@evert
Member

evert commented Feb 20, 2021

This is a valid concern. I would say that IRI domains are not supported at all at the moment. I never really dug into this enough to truly understand what's needed. The Punycode representation will of course work.

So the parser is really only for RFC 3986 URIs.

I would guess you would want to automatically turn international domains into Punycode on parse, rather than just retaining them (just like the JavaScript APIs do).

Unfortunately I'm no longer really active on this project, so this feature would have to be contributed by someone else.
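For anyone who wants to pick this up, a rough sketch of the "convert to Punycode on parse" idea might look like the following. The function name `parseIriWithPunycodeHost` and the splitting regex are hypothetical and not part of sabre/uri; the sketch assumes the `intl` extension is available for `idn_to_ascii()`, and it deliberately ignores IPv6 literals and scheme-less references.

```php
<?php
// Hypothetical helper for illustration; not part of sabre/uri.
// Requires the intl extension for idn_to_ascii().
function parseIriWithPunycodeHost(string $iri): array
{
    // Deliberately simple split: scheme://, optional userinfo@, host, rest.
    // IPv6 literals and scheme-less references fall through unchanged.
    $pattern = '~^([a-z][a-z0-9+.-]*://)((?:[^/?#@]*@)?)([^/?#:\[\]]+)(.*)$~isu';
    if (preg_match($pattern, $iri, $m) && preg_match('/[^[:ascii:]]/u', $m[3])) {
        $asciiHost = idn_to_ascii($m[3], IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
        if ($asciiHost === false) {
            throw new InvalidArgumentException('Host cannot be converted to Punycode: ' . $m[3]);
        }
        $iri = $m[1] . $m[2] . $asciiHost . $m[4];
    }

    // Percent-encode whatever non-ASCII remains (path, query, fragment, userinfo).
    $uri = preg_replace_callback(
        '/[^[:ascii:]]/u',
        function ($matches) {
            return rawurlencode($matches[0]);
        },
        $iri
    );

    $parts = parse_url($uri);
    if ($parts === false) {
        throw new InvalidArgumentException('Unparsable URI: ' . $uri);
    }
    return $parts;
}

$parts = parseIriWithPunycodeHost('https://пример.рф/путь');
echo $parts['host'], "\n"; // "xn--" labels instead of percent-escapes
echo $parts['path'], "\n"; // the path is still percent-encoded
```

Converting the host before the blanket percent-encoding keeps the existing behaviour for every other component, so only the host handling changes. A real patch would probably want to locate the authority with the library's own parsing logic rather than a standalone regex.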
