Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added test to check the right scraped multimedia files #1821

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

pavel-karatsiuba
Copy link
Contributor

@pavel-karatsiuba pavel-karatsiuba commented Mar 29, 2023

The test is scraping the page with different types of multimedia files.

Fixes: #890

@codecov
Copy link

codecov bot commented Mar 29, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.21 🎉

Comparison is base (4ff19c6) 70.90% compared to head (4ea9d27) 71.11%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1821      +/-   ##
==========================================
+ Coverage   70.90%   71.11%   +0.21%     
==========================================
  Files          24       24              
  Lines        2622     2624       +2     
  Branches      594      596       +2     
==========================================
+ Hits         1859     1866       +7     
+ Misses        657      652       -5     
  Partials      106      106              
Impacted Files Coverage Δ
src/util/rewriteUrls.ts 84.66% <100.00%> (+0.20%) ⬆️

... and 1 file with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Collaborator

@kelson42 kelson42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • What about the different format testing?
  • Are you sure User:Kelson/MWoffliner_CI_reference is actually scraped properly?

@kelson42 kelson42 force-pushed the scraping-multimedia-content branch from bd06c73 to 1485249 Compare April 10, 2023 11:19
@kelson42
Copy link
Collaborator

No feedback here :(

Actually it's not. This is what I get on the first section of the upstream
image

Here is what I get with this branch
image

Here the remarks:

  • The two visible images seems OK. They are blurred, but I guess this is because of the quality loss the recompression brings with.
  • The first of the 3 links is OK as well as it points to a description page which is not existing in the ZIM
  • The two other/last links are wrong. They point to an external link, but they should point to the version embedded in the ZIM (and it should be embedded in the ZIM). If they would be an external link, then should be handled link an external link (with the appropriate CSS icon). But here we should actually include the file and point to it.

@pavel-karatsiuba
Copy link
Contributor Author

On the bottom of the zim page, I see also external links to the .webm file and the .mp3 file.
Do I need to save .webm and .mp3 into zim file or add an icon to show the 'external' file?

@kelson42
Copy link
Collaborator

kelson42 commented Apr 12, 2023

@pavel-karatsiuba none of these links are external. An external link has a precise meaning in Mediawiki, see https://www.mediawiki.org/wiki/Help:Links#External_links

They are direct links to internal resources.

@stale
Copy link

stale bot commented May 26, 2023

This pull request has been automatically marked as stale because it has not had recent activity. It will be now be reviewed manually. Thank you for your contributions.

@stale stale bot added the stale label May 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multiple problems in scraping of multimedia content
2 participants