What did you expect to happen? What happened instead?
Missing ads on most used news sites.
Replay of news sites is missing most of the ads: some fail with "Archived Page Not Found", some are not displayed at all, and only a few are displayed. All ads can be seen in the Watch Crawl window.
This may be a result of switching to the Brave browser, which has more aggressive privacy settings by default. It should be possible to disable these on a per-browser-profile basis, but they should likely be off by default unless the user has enabled the "block ads" setting.
In the meantime, try creating a browser profile with some of Brave's "Shields" settings disabled.
I’m using the beta.browsertrix GUI v. 1.8* with no ad blocking, and I can’t change the crawling browser. To me, ad replay seems much better than a year ago. During the crawl I can see all the ads in the crawl window, so the crawler sees the ads.
Some of the ads are replayed fine, but not all.
I think it’s “only” a question of harvesting URLs and replay 😊.
Best regards
Tue
Browsertrix Cloud Version
v1.8.0-beta.4-7d985a9
Step-by-step reproduction instructions
e.g.
politiken.dk
crawl: "pol frontpage with all context"
https://beta.browsertrix.cloud/orgs/netarkivet-det-kgl-bibliotek/items/crawl/sched-bb9b135d-357-28341060?workflowId=bb9b135d-3573-4901-bdef-a80d35a15741#replay
Archived Page Not Found
Sorry, this page was not found in this archive:
https://0e9755db0ca066211b5983705fdb4922.safeframe.googlesyndication.com/safeframe/1-0-40/html/container.html?n=2
tv2.dk
crawl: tv2.dk frontpage complete context incl. ads
https://beta.browsertrix.cloud/orgs/netarkivet-det-kgl-bibliotek/items/crawl/manual-20231118064936-03e01f26-37d?workflowId=03e01f26-37dd-4fa6-880f-db7bd6dd6679
berlingske.dk frontpage with context
crawl: https://beta.browsertrix.cloud/orgs/netarkivet-det-kgl-bibliotek/items/crawl/manual-20231118095211-a4e6bc32-473?workflowId=a4e6bc32-4733-4a3f-8231-43b6df1c4031#replay
Additional details
No response