Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix/ard #32688

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Bugfix/ard #32688

wants to merge 3 commits into from

Conversation

f-froehlich
Copy link

Please follow the guide below

  • You will be asked some questions, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your pull request (like that [x])
  • Use Preview tab to see how your pull request will actually look like

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

Fixing the ARD extractor because API is changed

Copy link
Contributor

@dirkf dirkf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work!

Is there a related issue?

I've made some suggestions that should also help to pass the CI tests.

You might like to review the yt-dlp extractor as updated in yt-dlp/yt-dlp#9037 and include any additional functionality from there,. or generally re-align the code.

Comment on lines +362 to +364

_VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(((?:[^/]+/)?(?:player|live|video|serie|sendung)/(?:[^/]+/)*(?P<id>Y3JpZDovL[a-zA-Z0-9]+))|(((?P<sender>[a-zA-Z0-9\-]+)([/]))?(?P<name>[a-zA-Z0-9\-]+)))'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(((?:[^/]+/)?(?:player|live|video|serie|sendung)/(?:[^/]+/)*(?P<id>Y3JpZDovL[a-zA-Z0-9]+))|(((?P<sender>[a-zA-Z0-9\-]+)([/]))?(?P<name>[a-zA-Z0-9\-]+)))'
_VALID_URL = r'''(?x)
https://(?:(?:beta|www)\.)?ardmediathek\.de/
(?:
(?:[^/#?]+/)?(?:player|live|video|serie|sendung)/(?:[^/#?]+/)*
(?P<id>Y3JpZDovL[a-zA-Z0-9]+)|
(?P<sender>[a-zA-Z0-9-]+)/)?(?P<name>[a-zA-Z0-9-]+)
)
'''

Comment on lines -451 to +454
def _match_id(cls, url):
def _match_id(cls, url, group_name = 'id'):
m = cls.__match_valid_url(url)
assert m
return compat_str(m.group('id'))
return compat_str(m.group(group_name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls revert these and use _match_valid_url() in the extractor, as suggested above.

Comment on lines 412 to +414
video_id = self._match_id(url)
video_name = self._match_id(url, group_name='name')
sender = self._match_id(url, group_name='sender')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
video_id = self._match_id(url)
video_name = self._match_id(url, group_name='name')
sender = self._match_id(url, group_name='sender')
video_id, video_name, sender = self._match_valid_url(url).group('id, 'name', 'sender')

Comment on lines +416 to +421
if '/serie/' in url or '/sendung/' in url:
return self._real_extract_serie(video_id)
elif 'none' != video_name.lower():
return self._real_extract_named_serie(video_name, sender if 'none' != sender.lower() else "ard")
else:
return self._real_extract_video(video_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The URL pattern matches either for video_id or for video_name, so that logic can be used. A non-matched group will be None rather than 'none'.

Suggested change
if '/serie/' in url or '/sendung/' in url:
return self._real_extract_serie(video_id)
elif 'none' != video_name.lower():
return self._real_extract_named_serie(video_name, sender if 'none' != sender.lower() else "ard")
else:
return self._real_extract_video(video_id)
if video_id is not None:
return self._real_extract_serie(video_id)
return self._real_extract_named_serie(video_name, sender if sender is not None else 'ard')

If video_name.lower() == 'none' is an actual possibility, add a test for it and raise ExtractorError( ..., expected=True) for that case. Or wrap the group in `(?!none...) in the pattern so that it doesn't match.

}).encode(), headers={
'Content-Type': 'application/json'
})['data']['playerPage']
f'https://api.ardmediathek.de/page-gateway/pages/ard/item/{video_id}',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Py2 (could use .format(video_id) but this style is used elsewhere):

Suggested change
f'https://api.ardmediathek.de/page-gateway/pages/ard/item/{video_id}',
'https://api.ardmediathek.de/page-gateway/pages/ard/item/' + video_id,

page_number = 0
page_size = 100

while True:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make the loop condition explicit:

Suggested change
while True:
total = traverse_obj(widgets, ('pagination', 'totalElements', T(int))) or 0
while page_number * page_size <= total:

Comment on lines +544 to +546
total = widgets['pagination']['totalElements']
if (page_number + 1) * page_size > total:
break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer needed:

Suggested change
total = widgets['pagination']['totalElements']
if (page_number + 1) * page_size > total:
break

Comment on lines +556 to +563
widgets = self._download_json(
f'https://api.ardmediathek.de/page-gateway/pages/{sender}/editorial/{video_id}',
video_id,
query={'pageSize': str(10), 'pageNumber': 0}
)['widgets']

for widget in widgets:
widget_id = widget['id']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
widgets = self._download_json(
f'https://api.ardmediathek.de/page-gateway/pages/{sender}/editorial/{video_id}',
video_id,
query={'pageSize': str(10), 'pageNumber': 0}
)['widgets']
for widget in widgets:
widget_id = widget['id']
widgets = self._download_json(
'https://api.ardmediathek.de/page-gateway/pages/{0}/editorial/{1}'.format(sender, video_id),
video_id, query={'pageSize': 10, 'pageNumber': 0}
)
for widget_id in traverse_obj(widgets, ('widgets', Ellipsis, 'id')):

Comment on lines +573 to +582
for teaser in page_data['teasers']:
if 'EPISODE' == teaser.get('coreAssetType', None) and teaser['type'] not in ['poster'] and ':' not in teaser['id']:

item = self._real_extract_video(teaser['id'])
item['webpage_url'] = f"https://www.ardmediathek.de/video/{teaser['id']}"
entries.append(item)

total = page_data['pagination']['totalElements']
if (page_number + 1) * page_size > total:
break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And as before:

Suggested change
for teaser in page_data['teasers']:
if 'EPISODE' == teaser.get('coreAssetType', None) and teaser['type'] not in ['poster'] and ':' not in teaser['id']:
item = self._real_extract_video(teaser['id'])
item['webpage_url'] = f"https://www.ardmediathek.de/video/{teaser['id']}"
entries.append(item)
total = page_data['pagination']['totalElements']
if (page_number + 1) * page_size > total:
break
entries.extend(traverse_obj(page_data, (
'teasers', lambda _, v: 'EPISODE' == ['coreAssetType'] and v.get('type') != 'poster' and ':' not in v['id'],
'id', T(self._mk_teaser))))
total = traverse_obj(page_data, (
'pagination', 'totalElements', T(int))) or 0

Comment on lines +566 to +570
while True:
page_data = self._download_json(
f'https://api.ardmediathek.de/page-gateway/widgets/{sender}/editorials/{widget_id}',
video_id,
query={'pageSize': page_size, 'pageNumber': page_number}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make loop condition explicit:

Suggested change
while True:
page_data = self._download_json(
f'https://api.ardmediathek.de/page-gateway/widgets/{sender}/editorials/{widget_id}',
video_id,
query={'pageSize': page_size, 'pageNumber': page_number}
total = 0
while page_number * page_size <= total:
page_data = self._download_json(
'https://api.ardmediathek.de/page-gateway/widgets/{0}/editorials/{1}'.format(sender, widget_id),
video_id, query={'pageSize': page_size, 'pageNumber': page_number}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants