[Bug]: No ads in replay on some sites even though the ads are shown in the brave profile or online #1606

Open
tuehlarsen opened this issue Mar 17, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@tuehlarsen

tuehlarsen commented Mar 17, 2024

Browsertrix Version

v1.9.4-08ee857

What did you expect to happen? What happened instead?

After the last upgrade to 1.9.4, ads are no longer shown in replay for tv2.dk, even though they are visible in the browser profile.
Here is the replay of tv2.dk:
image
And here is the browser profile (tv2.dk with cookies accepted):
image

The same happens for berlingske.dk, but here the ads are not visible in the browser profile either, even though I have disabled all shields in the Brave settings.
Here is a snippet of the ads on berlingske.dk online:
image
And here is the browser profile:
image
And here are the Brave shields settings (brave://settings):
image
I can see all ads in Brave with shields disabled here:
image
image

But strangely enough, it still works for politiken.dk in replay:
image

Reproduction instructions

See above.

Screenshots / Video

No response

Environment

No response

Additional details

No response

@tuehlarsen tuehlarsen added the bug Something isn't working label Mar 17, 2024
@tuehlarsen tuehlarsen changed the title [Bug]: No ads in replay on some sites even though the ads are shown in the profile [Bug]: No ads in replay on some sites even though the ads are shown in the brave profile or online Mar 17, 2024
@tuehlarsen
Author

tuehlarsen commented Mar 18, 2024

I checked again this morning, and only the replay of berlingske.dk fails to show the ads; tv2.dk and politiken.dk replay some of the ads.
Any hints as to what could be wrong with the berlingske.dk setup regarding ads?

@tw4l
Contributor

tw4l commented Mar 18, 2024

Hi @tuehlarsen, in 1.9.4 we changed the default crawler version to the latest 1.0.0 beta, which may be responsible for the change. Could you try the crawl again with the "Previous" crawler channel (which is set to 0.12.4) to see if that works? You can find the crawler channel selector in Edit Workflow under Browser Settings:

Screen Shot 2024-03-18 at 1 28 05 PM

Here's the relevant section in the docs: https://docs.browsertrix.cloud/user-guide/workflow-setup/#crawler-release-channel

@tuehlarsen
Author

I tried berlingske.dk with the previous crawler version; it completely ignores the browser profile and the cookie acceptance.
With the default crawler, it crashes repeatedly with interrupt: 139.

@ikreymer
Member

> I tried with the previous crawler version with berlingske.dk - it just ignores the browser profile totally and the accept of cookies. With the default crawler it crashes again and again with interrupt: 139.

The crash in this case was due to sitemap parsing; we will have a fix for this shortly (webrecorder/browsertrix-crawler#496). In the meantime, disable 'Use Sitemap' for this crawl and try again.

@tuehlarsen
Author

tuehlarsen commented Mar 19, 2024

Now it runs, but berlingske.dk shows no ads or traces of ads in replay. I saw the ads during the crawl and no cookie consent popup, so it should be using the browser profile. It is almost the same with ekstrabladet.dk: in replay, there are a few ads in the middle column of the front page and only empty black columns to the left and right. The crawler saw all the ads in the left, right, and middle columns, but almost no ads are shown in replay.
Here are some snippets from the live site:
image
image

Here are some snippets from the crawl:
image
image

And a snippet from the replay:
image

@tuehlarsen
Author

I can see all ads in a Brave browser from a Danish IP with shields deactivated, so perhaps this is a Browsertrix replay issue?

@tuehlarsen
Author

The different news sites use different ad providers/frameworks, e.g. ones that display ads in iframes containing HTML.
information.dk does not use Google ads but https://www.adnami.io, and shows no ads in replay, only empty spaces, while tv2.dk uses a mix of Google ads and https://betterbannerscloud.com. berlingske.dk also uses a mix of Google ads and https://www.adnami.io/, but uses the Google framework in a way that replay cannot handle. https://jyllands-posten.dk/ uses a mix of https://www.adnami.io/ and Google ads.
The best examples of ad replay so far are front-page crawls of politiken.dk and tv2.dk, even though some ads are missing and we are also crawling from non-Danish IPs.
Supporting these ad frameworks seems like hard work, but I think it is important to support the most dominant ones in order to replay a news site's "look and feel", because the ads interact with and overrun the news content so heavily.

@tuehlarsen
Author

Re berlingske.dk: when I use the ArchiveWeb.page desktop version from Oct. 2023 [ArchiveWeb.page-0.11.3.exe], I can replay traces of the ads and play the videos in the audio/video list: https://beta.browsertrix.cloud/orgs/kb/items/upload/upload-55d89b6d-7561-43e1-a392-76c9ecd89a4f#replay

@tuehlarsen
Author

Progress: in version 1.9.7, information.dk shows Danish ads or traces of ads in offline replay (replay webpage desktop) instead of empty placeholders!
See:
image

Projects
Status: Triage
Development

No branches or pull requests

3 participants