Regex update to avoid over-redaction of GitHub issues #325

Merged
merged 1 commit into from
May 29, 2024

Conversation

bismuthsalamander
Contributor

Hello - this PR adjusts the regexes in airflow/include/tasks/extract/github.py that remove boilerplate text from GitHub issue threads. Before this change, the regular expressions removed too much text because of greedy matching.

Consider the following example:

Discussion here.
<!--\r\nThank you. http://chris.beams.io/posts/git-commit/\r\n-->
More discussion here.
<!--\r\nThank you. http://chris.beams.io/posts/git-commit/\r\n-->
Even more discussion here

The two lines containing comments should be removed, but the greedy .* in the regular expression <!--\r\nThank you.*http://chris.beams.io/posts/git-commit/\r\n--> matches from the first <!-- through the last -->, so the line More discussion here. is removed as well.

To fix that behavior, this PR replaces each greedy .* sequence with a lazy .*? sequence so that only the shortest (intended) match is removed.
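The difference can be reproduced with a minimal sketch. It rests on several assumptions that are not confirmed by the PR text: the `\r\n` sequences in the example are real carriage-return/line-feed characters, the patterns are applied with `re.DOTALL` so that `.` can cross line breaks, and the names `TEXT`, `GREEDY`, and `LAZY` are hypothetical and do not come from github.py.

```python
import re

# Hypothetical sample modeled on the example above; "\r\n" is assumed to be
# a real CR/LF pair inside each HTML comment.
TEXT = (
    "Discussion here.\n"
    "<!--\r\nThank you. http://chris.beams.io/posts/git-commit/\r\n-->\n"
    "More discussion here.\n"
    "<!--\r\nThank you. http://chris.beams.io/posts/git-commit/\r\n-->\n"
    "Even more discussion here"
)

# Assumption: re.DOTALL is used so "." also matches newline characters.
# Greedy ".*" extends to the LAST occurrence of the URL and closing "-->".
GREEDY = re.compile(
    r"<!--\r\nThank you.*http://chris\.beams\.io/posts/git-commit/\r\n-->",
    re.DOTALL,
)

# Lazy ".*?" stops at the FIRST occurrence, so each comment matches separately.
LAZY = re.compile(
    r"<!--\r\nThank you.*?http://chris\.beams\.io/posts/git-commit/\r\n-->",
    re.DOTALL,
)

greedy_result = GREEDY.sub("", TEXT)
lazy_result = LAZY.sub("", TEXT)

# The greedy pattern swallows the text between the two comments;
# the lazy pattern removes only the comments themselves.
print(repr(greedy_result))
print(repr(lazy_result))
```

With the greedy pattern, a single match spans both comments and everything between them; with the lazy pattern, each comment is matched and removed on its own, so the discussion text in between survives.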

@jlaneve jlaneve merged commit 5a4fed4 into astronomer:main May 29, 2024
3 checks passed