
Take full page screenshots #143 #148

Merged (1 commit) on May 12, 2024

kamtschatka (Contributor)

Added the `fullPage` flag to take full-page screenshots, and updated the UI accordingly so the screenshots are shown properly instead of being scaled down.
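For context, here is a minimal sketch of what a full-page capture looks like using the `fullPage` option of Puppeteer's `page.screenshot`. This is not the PR's actual crawler code. So that the snippet runs without a browser, it types the page as a hypothetical `ScreenshotPage` interface that mirrors only the part of Puppeteer's `Page` used here. `captureScreenshot` is likewise an illustrative name.

```typescript
// Hypothetical minimal options shape, mirroring the Puppeteer options used here.
interface ScreenshotOptions {
  fullPage?: boolean; // capture the whole scrollable page, not just the viewport
  type?: "png" | "jpeg";
}

// Hypothetical stand-in for the subset of Puppeteer's Page used in this sketch.
interface ScreenshotPage {
  screenshot(options: ScreenshotOptions): Promise<Uint8Array>;
}

// Take a PNG screenshot. When fullPage is true, the image covers the full
// scrollable height of the page instead of only the visible viewport.
async function captureScreenshot(
  page: ScreenshotPage,
  fullPage: boolean,
): Promise<Uint8Array> {
  return page.screenshot({ fullPage, type: "png" });
}
```

Because a full-page image can be much taller than the viewport, the UI side of the change has to display it at its natural size rather than scaling it down to fit.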

vercel bot commented May 10, 2024

The latest updates on your projects:

hoarder-app-docs: ✅ Ready (updated May 10, 2024 9:21pm UTC)
hoarder-app-landing: ✅ Ready (updated May 10, 2024 9:21pm UTC)

MohamedBassem (Collaborator) commented May 11, 2024

First of all, thanks again for taking the time to implement the feature you requested! :) I tried it locally, but for some reason it sometimes results in the crawler getting stuck fetching the page.

For example, fetching the following link always times out, which doesn't happen when full-page screenshots are disabled:

2024-05-11T17:33:20.086Z info: [Crawler][279] Will crawl "https://www.amazon.co.uk/ASUS-GeForce-Graphics-DisplayPort-TUF-RTX4090-O24G-GAMING/dp/B0BGV6LQYR/ref=pd_ci_mcx_mh_mcx_views_0?pd_rd_w=HX8NY&content-id=amzn1.sym.c428a25c-eebb-4a80-982c-a8845b70a765:amzn1.symc.ca948091-a64d-450e-86d7-c161ca33337b&pf_rd_p=c428a25c-eebb-4a80-982c-a8845b70a765&pf_rd_r=KEX284MKDMNSA32NBMFC&pd_rd_wg=em7oq&pd_rd_r=da88c671-7bc3-488b-999d-e650f5af151e&pd_rd_i=B0BGV6LQYR" for link with id "hs653stzi2oqtleqsvjla4wb"
2024-05-11T17:33:25.181Z info: [Crawler][279] Successfully navigated to "https://www.amazon.co.uk/ASUS-GeForce-Graphics-DisplayPort-TUF-RTX4090-O24G-GAMING/dp/B0BGV6LQYR/ref=pd_ci_mcx_mh_mcx_views_0?pd_rd_w=HX8NY&content-id=amzn1.sym.c428a25c-eebb-4a80-982c-a8845b70a765:amzn1.symc.ca948091-a64d-450e-86d7-c161ca33337b&pf_rd_p=c428a25c-eebb-4a80-982c-a8845b70a765&pf_rd_r=KEX284MKDMNSA32NBMFC&pd_rd_wg=em7oq&pd_rd_r=da88c671-7bc3-488b-999d-e650f5af151e&pd_rd_i=B0BGV6LQYR". Waiting for the page to load ...
2024-05-11T17:33:29.020Z info: [Crawler][279] Finished waiting for the page to load.
2024-05-11T17:34:20.102Z error: [Crawler][279] Crawling job failed: Error: Timed-out after 60 secs
2024-05-11T17:34:22.139Z info: [Crawler][279] Will crawl "https://www.amazon.co.uk/ASUS-GeForce-Graphics-DisplayPort-TUF-RTX4090-O24G-GAMING/dp/B0BGV6LQYR/ref=pd_ci_mcx_mh_mcx_views_0?pd_rd_w=HX8NY&content-id=amzn1.sym.c428a25c-eebb-4a80-982c-a8845b70a765:amzn1.symc.ca948091-a64d-450e-86d7-c161ca33337b&pf_rd_p=c428a25c-eebb-4a80-982c-a8845b70a765&pf_rd_r=KEX284MKDMNSA32NBMFC&pd_rd_wg=em7oq&pd_rd_r=da88c671-7bc3-488b-999d-e650f5af151e&pd_rd_i=B0BGV6LQYR" for link with id "hs653stzi2oqtleqsvjla4wb"
2024-05-11T17:34:24.868Z info: [Crawler][279] Successfully navigated to "https://www.amazon.co.uk/ASUS-GeForce-Graphics-DisplayPort-TUF-RTX4090-O24G-GAMING/dp/B0BGV6LQYR/ref=pd_ci_mcx_mh_mcx_views_0?pd_rd_w=HX8NY&content-id=amzn1.sym.c428a25c-eebb-4a80-982c-a8845b70a765:amzn1.symc.ca948091-a64d-450e-86d7-c161ca33337b&pf_rd_p=c428a25c-eebb-4a80-982c-a8845b70a765&pf_rd_r=KEX284MKDMNSA32NBMFC&pd_rd_wg=em7oq&pd_rd_r=da88c671-7bc3-488b-999d-e650f5af151e&pd_rd_i=B0BGV6LQYR". Waiting for the page to load ...
2024-05-11T17:34:25.871Z info: [Crawler][279] Finished waiting for the page to load.
2024-05-11T17:35:22.152Z error: [Crawler][279] Crawling job failed: Error: Timed-out after 60 secs

I don't know how Puppeteer implements this, but I wonder whether it misbehaves when, for example, the page expands as you scroll.

kamtschatka (Contributor, Author)

Thanks, I'll check it out.

kamtschatka (Contributor, Author)

Are you running this with the Docker container for Chrome, or just with the worker? I've tried it many times now and never had any issues.

MohamedBassem merged commit d33be14 into hoarder-app:main on May 12, 2024 (6 checks passed)
MohamedBassem (Collaborator)

OK, no worries. Let's merge it, and I'll add some protection so that a screenshot timeout doesn't fail the entire crawl :)
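One way such a protection could look is sketched below. It races the screenshot against a timer and treats a timeout or an error as "no screenshot" instead of failing the crawl job. This is a minimal sketch under assumptions, not the fix that was actually merged. `withTimeout` and `captureOptionalScreenshot` are hypothetical helpers, and `takeScreenshot` stands in for whatever call performs the Puppeteer screenshot.

```typescript
// Hypothetical helper: resolve with the task's result, or with `undefined`
// if the task does not settle within `ms` milliseconds. The timer is always
// cleared so it cannot keep the process alive.
async function withTimeout<T>(
  task: Promise<T>,
  ms: number,
): Promise<T | undefined> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<undefined>((resolve) => {
    timer = setTimeout(() => resolve(undefined), ms);
  });
  try {
    return await Promise.race([task, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Hypothetical crawl step: a hanging or failing screenshot is logged and
// skipped, so the rest of the crawl can still complete.
async function captureOptionalScreenshot(
  takeScreenshot: () => Promise<Uint8Array>,
  timeoutMs: number,
): Promise<Uint8Array | undefined> {
  try {
    const shot = await withTimeout(takeScreenshot(), timeoutMs);
    if (shot === undefined) {
      console.warn(`Screenshot timed out after ${timeoutMs} ms; continuing without it.`);
    }
    return shot;
  } catch (err) {
    console.warn(`Screenshot failed; continuing without it: ${String(err)}`);
    return undefined;
  }
}
```

Note that `Promise.race` only stops *waiting*. It does not abort the in-flight screenshot, so a real crawler may also want to close the page or browser context once the timeout fires.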

kamtschatka deleted the fullscreen-screenshot branch on May 18, 2024 12:05