a_b_c.domain.com — Neither domain, nor publicSuffix? (but valid) #73

ikari-pl · 2015-12-07T15:06:30Z

The URL http://wsc4_1.webspectator.com/ returns null for both getDomain and getPublicSuffix. I can't find webspectator.com on the Public Suffix List, so I assume the correct result would be webspectator.com for the domain and com for the public suffix.

Demo:

var tld = require('tldjs');
tld.getDomain('http://wsc4_1.webspectator.com/'); // null
tld.getDomain('wsc4_1.webspectator.com'); // null
tld.getPublicSuffix('http://wsc4_1.webspectator.com/'); // null
tld.isValid('http://wsc4_1.webspectator.com/'); // true

but:

> tld.getDomain('wsc41.webspectator.com')
'webspectator.com'

So it seems it's all about the _ character.
See:

> tld.getDomain('a_b.google.com')
null
> tld.getDomain('a-b.google.com')
'google.com'


ZLightning · 2016-05-06T04:08:05Z

Technically, host names containing an underscore are not RFC compliant (only A-Z, a-z, 0-9, -, and . are allowed). However, a newer RFC notes that a DNS server can be used to serve arbitrary data, and that no DNS server should refuse to load a zone that contains invalid characters in host names.

thom4parisot · 2016-09-13T17:29:38Z

Yes, indeed, it is tied to the _ character.

@ZLightning do you have a link to the newer RFC?

A possibility could be to have a strict mode or not (I guess, disabled by default) in order to properly extract domains and such. For cookie creation, we might want to stick to the RFC compliant mode but that's something to discuss later on.

What do you think folks?

ZLightning · 2016-09-14T14:40:49Z

RFC2181 is only a proposed standard, but I have confirmed subdomains with an _ in them still resolve. I think a strict and sloppy mode would be a great feature. The default being strict is a good idea for backwards compatibility.
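
For illustration, a strict/lenient switch could look roughly like the sketch below. The isValidHostname helper and its strict option are hypothetical names, not the tldjs API. The strict rule follows the letters, digits and hyphen convention mentioned above, and the lenient rule only adds the underscore. Strict is the default here, as suggested for backwards compatibility.

```javascript
// Hypothetical sketch, not part of tldjs.
// Strict: letters, digits and hyphens; a label may not start or end with a hyphen.
const STRICT_LABEL = /^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/i;
// Lenient: same shape, but underscores are accepted anywhere in the label.
const LENIENT_LABEL = /^[a-z0-9_]([a-z0-9_-]{0,61}[a-z0-9_])?$/i;

function isValidHostname(hostname, options = {}) {
  // Assumption: strict unless the caller explicitly passes { strict: false }.
  const strict = options.strict !== false;

  if (typeof hostname !== 'string' || hostname.length === 0 || hostname.length > 253) {
    return false;
  }

  const pattern = strict ? STRICT_LABEL : LENIENT_LABEL;
  // Every dot-separated label must be 1 to 63 characters and match the pattern.
  // An empty label (as in "a..b" or a trailing dot) is rejected.
  return hostname.split('.').every((label) => pattern.test(label));
}

console.log(isValidHostname('a_b.google.com'));                    // strict mode
console.log(isValidHostname('a_b.google.com', { strict: false })); // lenient mode
```

With this shape, existing callers keep the current behaviour and only callers who opt in get underscore support.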

nebulade · 2017-11-01T22:07:57Z

Is there any update on this? I just hit it as well, unfortunately.

LesBarstow · 2018-04-26T21:41:38Z

Note if anyone's still following this: HOSTNAMES cannot contain underscores, but other DNS entries can. e.g. _spf.google.com is a valid DNS name.

$ dig +short TXT _spf.google.com
"v=spf1 include:_netblocks.google.com include:_netblocks2.google.com include:_netblocks3.google.com ~all"

The DNS itself places only one restriction on the particular labels that can be used to identify resource records. That one restriction relates to the length of the label and the full name. [...] Implementations of the DNS protocols must not place any restrictions on the labels that can be used. In particular, DNS servers must not refuse to serve a zone because it contains labels that might not be acceptable to some DNS client programs.

AFAIK, no registrar allows you to register a domain under a TLD with an underscore, but technically that too is allowed.

thom4parisot · 2018-04-27T15:58:19Z

@LesBarstow I find your comment valuable but I did not have in mind the context of hostnames in regard of DNS entries.

There is a proposal in issue #122 to be either strict or lenient on hostnames with underscores.

Do you think it will address what you mention?

LesBarstow · 2018-04-27T16:33:54Z

My personal opinion: the only calls that should care about character restrictions (aside from length) are isValidHostname() and the isValid property returned by parse(). We use both tldExists() and getDomain(), and those shouldn't care, ever.

For isValidHostname() and parse().isValid: FWIW, the defaults in PHP filtering and Perl Net regex patterns are both lenient, with options for strict. This matches the DNS RFC itself - no restrictions except for proper hostnames, which are limited by RFCs 952 and 1123.

Just my two cents.

LesBarstow · 2018-04-27T19:48:38Z

Alternately, the code could care about the validity of the publicSuffix in a strict form while the rest of the domain name would be lenient. (No registrar registers domains with an underscore as they can't be used for hostnames at all...) This is more annoying, though, because if someone does want to be lenient on the publicSuffix, now you have to have two flag options: reallyStrict, default, and reallyLenient.
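
A rough sketch of that split, under stated assumptions: KNOWN_SUFFIXES is a tiny hand-written stand-in for the real Public Suffix List, getDomainParts is a hypothetical name, and wildcard or exception rules are ignored. The suffix labels are checked strictly, while every label to the left of the suffix is checked leniently.

```javascript
// Hypothetical sketch, not part of tldjs.
// Assumption: a two-entry set stands in for the real Public Suffix List.
const KNOWN_SUFFIXES = new Set(['com', 'co.uk']);

const STRICT_LABEL = /^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/i;
const LENIENT_LABEL = /^[a-z0-9_]([a-z0-9_-]{0,61}[a-z0-9_])?$/i;

function getDomainParts(hostname) {
  const labels = hostname.toLowerCase().split('.');

  // Default rule: if nothing matches, the last label is the suffix.
  let suffixStart = labels.length - 1;
  // Try the longest candidate first so that "co.uk" wins over "uk".
  for (let i = 0; i < labels.length; i += 1) {
    if (KNOWN_SUFFIXES.has(labels.slice(i).join('.'))) {
      suffixStart = i;
      break;
    }
  }

  const suffixLabels = labels.slice(suffixStart);
  const restLabels = labels.slice(0, suffixStart);

  // Strict on the public suffix, lenient on everything to its left.
  const valid =
    suffixLabels.every((label) => STRICT_LABEL.test(label)) &&
    restLabels.every((label) => LENIENT_LABEL.test(label));
  if (!valid) {
    return { publicSuffix: null, domain: null };
  }

  const publicSuffix = suffixLabels.join('.');
  const domain =
    restLabels.length > 0
      ? `${restLabels[restLabels.length - 1]}.${publicSuffix}`
      : null;
  return { publicSuffix, domain };
}

console.log(getDomainParts('wsc4_1.webspectator.com'));
```

Whether the registrable label itself should be strict or lenient is a separate choice; this sketch treats it leniently.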

remusao · 2018-05-02T11:37:57Z

Hi @LesBarstow and thanks for the great feedback! It's really interesting to get another perspective. I would like to add the following, which is just my opinion on the matter. Currently isValid is used for two different purposes internally:

1. It is used to quickly check whether the input to any of the functions is already a valid hostname, in which case we can skip the expensive parsing step. In this case, we could probably use the lenient version of isValid.
2. It is used to indicate whether the input is a valid url/hostname through the two functions parse and isValid, which are exposed as part of the public API.

So what we could do perhaps is to use the lenient mode for 1. (as an internal optimization), and for 2. allow an extra parameter to provide options about the behavior of isValid.
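
To make the internal fast path concrete, here is a minimal sketch. It does not show the actual tldjs internals: extractHostname and LENIENT_HOSTNAME are hypothetical names, and the standard URL class stands in for whatever parser the library really uses.

```javascript
// Hypothetical sketch of the "skip parsing when possible" idea; not tldjs code.
// Lenient check: dot-separated labels of letters, digits, hyphens and underscores.
const LENIENT_HOSTNAME = /^[a-z0-9_-]{1,63}(\.[a-z0-9_-]{1,63})*$/i;

function extractHostname(input) {
  // Fast path: the input already looks like a bare hostname, so no URL parsing.
  if (LENIENT_HOSTNAME.test(input)) {
    return input.toLowerCase();
  }

  // Slow path: fall back to a full URL parse (assumption: the WHATWG URL class).
  try {
    return new URL(input).hostname || null;
  } catch (err) {
    return null; // not parseable as an absolute URL
  }
}

console.log(extractHostname('wsc4_1.webspectator.com'));
console.log(extractHostname('http://wsc4_1.webspectator.com/'));
```

Because the fast path is lenient, a hostname with an underscore takes the same cheap route as any other hostname, and the public isValid can still apply whatever rules its options ask for.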

Last but not least, we had similar discussions in the past regarding hostname parsing (which is hard and different libraries have different behaviors). In the end, we made the opinionated choice of using a specific module but gave the flexibility for a user of the library to provide their own parsing logic. In a way, tldjs is not about validating urls/hostnames. So maybe it is ok to pick one option (let's say we always validate hostnames in a lenient way), and let users who need it use some more complex ways to validate the hostnames depending on their usecase.

As was pointed out, tldjs could only care about validating the public suffix part, since it's what the library is about.

We can of course recommend/suggest other libraries which can be used along-side tld.js to do this validation.

7c · 2018-09-10T19:50:58Z

Hi, I am using the parse() function with real-world URLs from squid logs to determine domain names. I understand that this repo is all about the public suffix, but look at this real-world example:

console.log(parse('http://spons_700.spns.nrb-apps.com/ajax/footpanel_process.php'));
{ hostname: 'spons_700.spns.nrb-apps.com',
  isValid: false,
  isIp: false,
  tldExists: false,
  publicSuffix: null,
  domain: null,
  subdomain: null }

console.log(parse('http://spons700.spns.nrb-apps.com/ajax/footpanel_process.php'));
{ hostname: 'spons700.spns.nrb-apps.com',
  isValid: true,
  isIp: false,
  tldExists: true,
  publicSuffix: 'com',
  domain: 'nrb-apps.com',
  subdomain: 'spons700.spns' }

Many bigger providers do have _ in their hostnames, and if the purpose of parse() is to determine the publicSuffix, then this function fails with real-world URLs.

remusao · 2018-09-23T19:23:46Z

Hi @taskinosman, thank you for your input. A few weeks ago I proposed a solution in the form of an option to enable a "lenient mode" for hostname validation in PR #122, but unfortunately the PR has not been reviewed or merged yet. In the meantime I forked and published tldts, which is based on tld.js (but rewritten in TypeScript, plus a few other modifications) and provides a different set of defaults, among which the more permissive hostname validation is enabled by default. Maybe this would solve your problem? Don't hesitate to give me any feedback on it.

7c · 2018-09-24T11:36:45Z

Thanks, and sorry, I should have seen #122. I have commented on that one.

ZLightning mentioned this issue May 6, 2016

tldExists(ipv4_1-lagg0-c026.1.lax004.ix.nflxvideo.net) returns False #75

Closed

thom4parisot added the bug label Sep 13, 2016

remusao mentioned this issue Sep 2, 2017

Implement a more strict hostname validation (isValid) #100

Closed

remusao mentioned this issue Mar 16, 2018

Introduce 'lenient' mode for hostname validation. #122

Closed

thom4parisot mentioned this issue Mar 19, 2018

API design considerations for tld.js@3.0 #124

Open
