Skip to content

A very simple domain parser for PHP version 5.6.2+. Splits a URL into subdomain(s), registrable domain, and public suffix(es).

License

Notifications You must be signed in to change notification settings

flacle/simplePHPDomainParser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

simplePHPDomainParser

A very simple domain parser for PHP version 5.6.2+. It splits a URL into subdomain(s), registrable domain, and public suffix(es).

Why simple and why custom?

I am working on a big data processor and needed a domain parsing utility that is lightweight and fast. While I haven't benchmarked the performance of the app, I opted to use basic string fuctions such as strpos instead of more intensive regex functions for string pattern matching. While this utility uses an externally maintained reference list, there are no external requests being made as the reference list is pre-processed into a PHP array that can be loaded once per runtime. Because it's a small utility I also made it entirely procedural instead of object oriented.

As experienced by other parser developers, domain parsing is tricky business. For instance, think about the number of segments (such as http://a.b.c.d.e). This complexity comes at a cost where it becomes difficult to accurately parse a domain from an input URL into sub, registrable, and suffixes. One way to quite accurately parse a domain is to compare the input URL with a maintained list of the ICANN database, which is what this utility does.

There are also minor issues that I've encountered with PHP's own parse_url() function, and so this utility does not make use of PHP’s own built in URL parser, nor any regex functions for that matter. Please have a have a look at demo.php to see some tests with several URLs.

Installation

This utility is procedural and does not require classes to be auto loaded. It has a namespace simplePHPDomainParser for encapsulation, but that's also it. To incorporate the utility into your own project, paste the folder and include it by adding a statement such as require_once '../util/simplePHPDomainParser/index.php'; at the top of your script.

Usage

Parsing URLs for domains

The below snippet of code:

require_once './index.php';
$url = 'http://shop.retail.mystore.co.uk';
var_dump(\simplePHPDomainParser\getDomain($url));

Would output:

array(3) {
  [0]=>
  string(11) "shop.retail"
  [1]=>
  string(7) "mystore"
  [2]=>
  string(5) "co.uk"
}

By including index.php into your project you automatically include the file parser.php that contains the utilities' logic. The main function is getDomain($url). For convenience you can also ask for specific components. Calling getSubDomain($url) would return just shop.retail. Have a look demo.php that contains an array of test URLs.

Maintaining the ICANN reference list

The ICANN public suffix list comes from https://github.com/publicsuffix/list (thanks Mozilla!). This list is maintained from time to time and if you decide to use it you should also update public_suffix_list.dat from time to time stored in folder /publicsuffixlists/. The util parses only public ICANN domains, and not private ones, however feel free to fork and adapt the code as as you see fit. Every time you update the .dat file, you should also run /src/serializeToPHP.php to update the PHP array as well.

Contributor

About

A very simple domain parser for PHP version 5.6.2+. Splits a URL into subdomain(s), registrable domain, and public suffix(es).

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages