Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle edge case: "invalid url in base tag" #179

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

delamotte-pierrick
Copy link

Add test and bugfix in case the href content of 'base' tag is a relative url.

The bug have been encounter on the url : https://www.gla.ac.uk/myglasgow/digitalaccessibility/

the current base tag contain : <base href="/myglasgow/digitalaccessibility/" />

image

Stack Trace and error message:

string(193) "The URL of the element is relative, so you must define its base URI passing an absolute URL to the constructor of the "Symfony\Component\DomCrawler\AbstractUriElement" class ("://" was passed)."

#0 /srv/app/vendor/spekulatius/phpscraper/src/UsesContent.php(492): Symfony\Component\DomCrawler\AbstractUriElement->__construct(Object(DOMElement), '://')
#1 /srv/app/vendor/spekulatius/phpscraper/src/PHPScraper.php(138): Spekulatius\PHPScraper\Core->linksWithDetails()
#2 /srv/app/src/Services/ScraperService.php(107): Spekulatius\PHPScraper\PHPScraper->__call('linksWithDetail...', Array)
#3 /srv/app/src/Services/AnalyzerService.php(56): App\Services\ScraperService->extractData(Object(Spekulatius\PHPScraper\PHPScraper))
#4 /srv/app/src/Services/AnalyzerService.php(28): App\Services\AnalyzerService->extractDataFromUrl('https://www.gla...', 'en', 'uk')
#5 /srv/app/src/Controller/HomeController.php(80): App\Services\AnalyzerService->analyzeUrl('https://www.gla...', Object(App\Entity\Language), 'uk')
#6 /srv/app/vendor/symfony/http-kernel/HttpKernel.php(163): App\Controller\HomeController->index(Object(Symfony\Component\HttpFoundation\Request))
#7 /srv/app/vendor/symfony/http-kernel/HttpKernel.php(74): Symfony\Component\HttpKernel\HttpKernel->handleRaw(Object(Symfony\Component\HttpFoundation\Request), 1)
#8 /srv/app/vendor/symfony/http-kernel/Kernel.php(184): Symfony\Component\HttpKernel\HttpKernel->handle(Object(Symfony\Component\HttpFoundation\Request), 1, true)
#9 /srv/app/vendor/symfony/runtime/Runner/Symfony/HttpKernelRunner.php(35): Symfony\Component\HttpKernel\Kernel->handle(Object(Symfony\Component\HttpFoundation\Request))
#10 /srv/app/vendor/autoload_runtime.php(29): Symfony\Component\Runtime\Runner\Symfony\HttpKernelRunner->run()
#11 /srv/app/public/index.php(5): require_once('/srv/app/vendor...')
#12 {main}"

Is that possible for you to generate an url with an invalid base tag, that i can use instead of those of Glasgow University, please ?

Regards,
Pierrick

@spekulatius
Copy link
Owner

Hello Pierrick,

Thank you for your PR and an interesting discovery!

For tests, I usually 'mock up' a small HTML page and add it to this test pages repo. This way the hosted examples remain in control and won't change unexpectedly. Could you prepare such a test page and tweak the test?

Let me know if any questions come up :)

Peter

@delamotte-pierrick
Copy link
Author

Hey Peter,

I have added the file and maded the pull request here.

I'm not sure if the domain https://test-pages.phpscraper.de is the good one for the test, so just tell me if I need to change them

Have a nice day !
Pierrick

@spekulatius
Copy link
Owner

Hello Pierrick,

Thank you for the PR!

The host should be okay, no?

According to Mozilla relative paths are valid. I was wondering if we should return an absolute URL. Should we rename the tests accordingly and make sure it returns the absolute PATH (even if relative is provided)? PHPScraper returns absolute URLs for anything, even if relative paths are provided.

Cheers,
Peter

tests/BaseHrefTest.php Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants