Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change regular expression #32664

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

change regular expression #32664

wants to merge 1 commit into from

Conversation

eukub
Copy link

@eukub eukub commented Dec 13, 2023

Please follow the guide below

  • You will be asked some questions, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your pull request (like that [x])
  • Use Preview tab to see how your pull request will actually look like

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

fixed regular expression for support unicode digits too

@dirkf
Copy link
Contributor

dirkf commented Dec 15, 2023

What digits from Unicode that are not 0-9 do you expect to be found in a year value?

@eukub
Copy link
Author

eukub commented Dec 15, 2023

What digits from Unicode that are not 0-9 do you expect to be found in a year value?
For example if you get digits like 0123456789 or like 0123456789 instead of ASCII digits 0123456789

The numbers from the first two sequences can be processed using \d

This links can be helpful:
https://docs.python.org/3/library/re.html
https://en.wikipedia.org/wiki/Numerals_in_Unicode

@dirkf
Copy link
Contributor

dirkf commented Dec 15, 2023

Yes, but:

  • this code is run to update the pages of the GH website, and so is rarely used (compared to some code in the YouTube extractor, say) and only by project maintainers
  • the value being matched is the year value that was previously set in the same way, unless one of the said maintainers has weirdly typed a Unicode digit other than 0-9 in a manual update
  • that value is a py2 unicode or py3 str containing the result of the Python datetime library call a couple of lines up, which is going to match '[0-9]{4}' for as long as we care.

I'm not saying that \d is wrong and [0-9] is better: just that the existing code covers the likely cases and I wouldn't have looked there for something to improve.

@eukub
Copy link
Author

eukub commented Dec 15, 2023

Yes, but:

  • this code is run to update the pages of the GH website, and so is rarely used (compared to some code in the YouTube extractor, say) and only by project maintainers
  • the value being matched is the year value that was previously set in the same way, unless one of the said maintainers has weirdly typed a Unicode digit other than 0-9 in a manual update
  • that value is a py2 unicode or py3 str containing the result of the Python datetime library call a couple of lines up, which is going to match '[0-9]{4}' for as long as we care.

I'm not saying that \d is wrong and [0-9] is better: just that the existing code covers the likely cases and I wouldn't have looked there for something to improve.

If your code has less reasons to fail, that's better, isn't it?

@dirkf
Copy link
Contributor

dirkf commented Dec 15, 2023

In general, but arguably this code should fail since no weird Unicode digits should be in the scanned text, entirely under the control of the project. If that should happen, it should be investigated.

Whereas, extractor code has to deal with external web pages that might suddenly happen to contain whatever characters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants