This repository has been archived by the owner on Apr 1, 2023. It is now read-only.

Goutte can't find elements that are out of view or still haven't loaded #423

Open
matveynikon opened this issue Aug 31, 2020 · 1 comment


@matveynikon

I am trying to make a simple YouTube SEO tool with Goutte. It is supposed to search for a keyword, find a certain video, and print the position at which that video ranks for the keyword. My problem is that my Goutte bot can't find videos below the top 10 results. I suppose that is either because those videos haven't loaded yet (for them to load, a person has to actually scroll down, which I can't do with Goutte) or because the video is simply outside the viewport.

Does anyone know a solution? Or, if anyone knows a way to scroll in Goutte, please tell me.

My code:

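```php
<?php
require 'vendor/autoload.php';

use Goutte\Client;

$client = new Client();
$crawler = $client->request('GET', 'https://www.youtube.com/results?search_query=php+web+scraping');
sleep(5); // no effect: Goutte does not run JavaScript
$link = $crawler->selectLink('php web scraping tutorial(simple)')->link(); // this video is in the top 30
?>
```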
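For the "print the position" step, note that `selectLink()->link()` throws an `InvalidArgumentException` when no matching link exists in the fetched HTML, so a missing video surfaces as an exception rather than a position. Below is a minimal sketch that returns `null` instead. It uses PHP's built-in DOM extension (`DOMDocument`/`DOMXPath`) rather than Goutte's crawler, and the helper name `findLinkPosition` and the default `//a` XPath are hypothetical. It assumes result titles appear as plain `<a>` text in the HTML the server returns, which may not hold for YouTube's actual markup.

```php
<?php
// Hypothetical helper: returns the 1-based position of the first link (matched by
// $linkXPath) whose visible text equals $title, or null if it is not in $html.
// Assumption: result titles appear as plain <a> text in the server-returned HTML.
function findLinkPosition(string $html, string $title, string $linkXPath = '//a'): ?int
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Silence libxml warnings about non-standard markup, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query($linkXPath);
    if ($nodes === false) {
        return null; // invalid XPath expression
    }

    $position = 0;
    foreach ($nodes as $anchor) {
        $position++;
        if (trim($anchor->textContent) === $title) {
            return $position;
        }
    }

    // Not in the fetched HTML, e.g. content that a browser would only load on scroll.
    return null;
}

$html = '<ul><li><a href="/1">intro</a></li><li><a href="/2">php web scraping tutorial(simple)</a></li></ul>';
var_dump(findLinkPosition($html, 'php web scraping tutorial(simple)')); // int(2)
```

Because this only sees the HTML that was actually downloaded, a `null` result tells you the video was not in the server response at all, which is consistent with results that only appear after client-side scrolling.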
@jeromegamez

I had the same issue with another site and, while debugging, came across a reference to an `HTML5` class in the `Crawler` class of Symfony's DomCrawler component:

```php
use Masterminds\HTML5;
// ...
$this->html5Parser = class_exists(HTML5::class) ? new HTML5(['disable_html_ns' => true]) : null;
```

A follow-up Google search then led me to https://github.com/Masterminds/html5-php and https://symfony.com/blog/new-in-symfony-4-3-better-html5-parser-for-domcrawler.

Long story short: running `composer require masterminds/html5` solved the issue for me 🥳
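To confirm the install took effect, you can repeat the same `class_exists()` check that DomCrawler's `Crawler` class performs. This is a minimal sketch, assuming a standard Composer layout with `vendor/autoload.php` next to the script; the helper name `html5ParserAvailable` is hypothetical.

```php
<?php
// Load Composer's autoloader if present so installed packages are visible to class_exists().
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require $autoload;
}

// Hypothetical helper mirroring DomCrawler's check: the HTML5 parser is only
// set up when the Masterminds\HTML5 class can be autoloaded.
function html5ParserAvailable(): bool
{
    return class_exists(\Masterminds\HTML5::class);
}

echo html5ParserAvailable()
    ? "masterminds/html5 found: DomCrawler can use the HTML5 parser\n"
    : "masterminds/html5 not found: DomCrawler uses the libxml-based DOMDocument parser\n";
```

`\Masterminds\HTML5::class` resolves to a plain string at compile time, so the check itself does not fail when the package is missing; it simply returns `false`.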
