Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the regex python 3.11 compatible #313

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

santhoshtr
Copy link

Python 3.11 throws the error: re.error: global flags not at the start of the expression at position

Ref: https://bugs.python.org/issue47066

@becaHEse
Copy link

becaHEse commented Jul 1, 2023

Thanks for your commits, it worked!

@xtexChooser
Copy link

Thanks, it worked!

@santhoshtr
Copy link
Author

@attardi As python 3.12 is the latest version, people might want to use 3.11 or 3.12. Gentle ping to merge and release new version. Thanks!

@ghost
Copy link

ghost commented Nov 1, 2023

Does the regular expression do what it was supposed to do? I think the proposed change does not change the behavior, so it would be correct, but I am not sure if the regular expression was originally supposed to do something different.

If I understand the documentation for (?aiLmsux) correctly, then (?i) changes the whole regular expression to case-insensitivity no matter where it is placed in the regular expression. Probably to remove this confusing behavior, it's only allowed at the beginning of the regex as of Python 3.11.

Since in this code's regular expression it is listed in a sub group, it looks to me like the original author might have (wrongly) tried to make only this sub group case insensitive. If that's the case I think (?aiLmsux-imsx:...) might be the correct version (available since version 3.6).

The letters set or remove the corresponding flags [...] for the part of the expression.

If indeed case-insensitivity for the whole regular expression is desired, I'd propose to rather use the compile flag re.I. Both regular expressions already use other compile flags, so adding the global case-insensitivity flag alongside them seems more readable. The documentation even states that (?i) is basically an alternative for re.I:

set the corresponding flags: [...] for the entire regular expression. This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function.

ExtLinkBracketedRegex = re.compile(
    '\[((' + '|'.join(wgUrlProtocols) + ')' + EXT_LINK_URL_CLASS + r'+)\s*([^\]\x00-\x08\x0a-\x1F]*?)\]',
    re.S | re.U | re.I)
EXT_IMAGE_REGEX = re.compile(
    r"""^(http://|https://)([^][<>"\x00-\x20\x7F\s]+)
    /([A-Za-z0-9_.,~%\-+&;#*?!=()@\x80-\xFF]+)\.(gif|png|jpg|jpeg)$""",
    re.X | re.S | re.U | re.I)

Update: Looking at both regular expression, I'd say the global version seems right. Even if the original code looks like the author intended to only make parts of the expression case-insensitive.

Let's assume the original author indeed only wanted to make parts of the expressions case-insensitive. Then in ExtLinkBracketedRegex the URL protocols (bitcoin://, ftp://, and so on) would be set to case-insensitive. But in EXT_IMAGE_REGEX only the file extension (gif, png, and so on) would be set to case-insensitive, but the URL protocol (http://, https://) would be left case-sensitive. That would be quite inconsistent.

@BeautiLu
Copy link

Thanks for your commits, it worked!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants