- Updated CHANGELOG
- Fixed small issue in ArticleExtractor where $html variable was not defined.
- Reinstated paquettg/php-html-parser as the preferred DOM parser.
- Note: this updates many Composer dependencies, so it is published as a separate release as a precaution.
- Added ability to pass in HTML and process it via the `processHTML` method
- Revised documentation with updates and fixes
- Updated test cases
- Added ability to override/force the reading method.
- Added handling of common Google referral URLs
- Added 'result_url' to the return structure to inform the caller what the resultant URL was after redirects
- Turned off debugging left on by mistake
- Added ability to manually set User-Agent, fixing many readability issues
- Updated redirect detection logic to more accurately read HTTP headers.
- Updated dependencies
- Updated PHPUnit to ^8.0
- Updated andreskrey/readability.php to ^2.1.0
- Updated PHP dependency to ^7.2
- Fixed minor issue with `parse_url` check.
- Changed the approach to cleaning HTML tags and handling newlines.
- Updated README.md to outline the new text format.
- Closes issue #25
- Added cleanup of article text.
- Updated redirect checking logic to include ports
- Resolved 301 redirects to incomplete URL
- Closes issue #23 related to 301 redirects when scheme and host are not present.
- Added andreskrey/readability.php library as the default method of article parsing, using prior methods as a backup.
- This closes multiple issues related to article reading including #6, #7, #8, #17, #18
- Changed the call to parse a URL from `getArticleText` to `processURL`
- Added README.md
- Upgraded to PHPUnit 6.x for testing
- Started CHANGELOG
- Moved PHP DOM Parser dependency to thesoftwarefanatics/php-html-parser for support of PHP 7.2+