Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R1707 (trailing-comma-tuple) checks perform excessive is_message_enabled calls #9608

Closed
correctmost opened this issue May 7, 2024 · 3 comments · Fixed by #9609 or #9620
Closed

R1707 (trailing-comma-tuple) checks perform excessive is_message_enabled calls #9608

correctmost opened this issue May 7, 2024 · 3 comments · Fixed by #9609 or #9620
Assignees
Labels
Enhancement ✨ Improvement to a component Needs PR This issue is accepted, sufficiently specified and now needs an implementation performance
Milestone

Comments

@correctmost
Copy link

correctmost commented May 7, 2024

Bug description

RefactoringChecker::process_tokens has a loop with an embedded self.linter.is_message_enabled("trailing-comma-tuple") call:

for index, token in enumerate(tokens):
token_string = token[1]
if token_string == "elif":
# AST exists by the time process_tokens is called, so
# it's safe to assume tokens[index+1] exists.
# tokens[index+1][2] is the elif's position as
# reported by CPython and PyPy,
# token[2] is the actual position and also is
# reported by IronPython.
self._elifs.extend([token[2], tokens[index + 1][2]])
elif self.linter.is_message_enabled(
"trailing-comma-tuple"
) and _is_trailing_comma(tokens, index):
self.add_message("trailing-comma-tuple", line=token.start[0])

This call gets executed ~1.4 million times when running pylint on the yt-dlp repo. Hoisting the is_message_enabled call outside of the loop can bring the number of executions down to ~1100.

Configuration

[MAIN]
jobs=1

[MESSAGES CONTROL]
disable=all
enable=R1707

[REPORTS]
reports=no
score=no

Command used

Steps to reproduce

git clone https://github.com/yt-dlp/yt-dlp.git
cd yt-dlp
git checkout 5904853ae5788509fdc4892cb7ecdfa9ae7f78e6

cat << EOF > ./profile_pylint.py
import cProfile
import pstats
import sys

sys.argv = ['pylint', '--recursive=y', '.']
cProfile.run('from pylint import __main__', filename='stats')

with open('profiler_stats', 'w', encoding='utf-8') as file:
    stats = pstats.Stats('stats', stream=file)
    stats.sort_stats('tottime')
    stats.print_stats()
EOF

cat << EOF > .pylintrc
[MAIN]
jobs=1

[MESSAGES CONTROL]
disable=all
enable=R1707

[REPORTS]
reports=no
score=no
EOF

python ./profile_pylint.py

Analysis

process_tokens calls is_message_enabled ~1.4 million times:

import pstats

stats = pstats.Stats('stats')
stats.print_callers('is_message_enabled')

 ncalls  tottime  cumtime
1448094    1.515    3.509  pylint/checkers/refactoring/refactoring_checker.py:650(process_tokens)

Pylint output

There are some R1707 errors, but the output is less important than the performance numbers.

Expected behavior

Improved performance via reduced is_message_enabled calls

Pylint version

astroid @ git+https://github.com/pylint-dev/astroid.git@0ccc2e29d4b9ce0f54f9e50fa6df85522083c5de
pylint @ git+https://github.com/pylint-dev/pylint.git@7521eb1dc6ac89fcf1763bee879d1207a87ddefa
Python 3.12.3

OS / Environment

Arch Linux

Additional dependencies

No response

@correctmost correctmost added the Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling label May 7, 2024
@Pierre-Sassoulas Pierre-Sassoulas added Enhancement ✨ Improvement to a component performance Needs PR This issue is accepted, sufficiently specified and now needs an implementation and removed Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling labels May 7, 2024
@Pierre-Sassoulas Pierre-Sassoulas self-assigned this May 7, 2024
@Pierre-Sassoulas Pierre-Sassoulas added this to the 3.1.1 milestone May 7, 2024
@Pierre-Sassoulas
Copy link
Member

Wow, this one is shocking, should have been a low hanging fruits during review. Thanks again, this is very valuable.

@Pierre-Sassoulas
Copy link
Member

And in a performance MR of all thing 😄 #8606

Pierre-Sassoulas added a commit to Pierre-Sassoulas/pylint that referenced this issue May 7, 2024
@Pierre-Sassoulas
Copy link
Member

Wait, no, we kinda need to do that on each line (not each token though, and we don't need to recheck everything). The enabling / disabling can be per line.

@Pierre-Sassoulas Pierre-Sassoulas modified the milestones: 3.1.1, 3.2.0 May 7, 2024
Pierre-Sassoulas added a commit to Pierre-Sassoulas/pylint that referenced this issue May 8, 2024
Pierre-Sassoulas added a commit to Pierre-Sassoulas/pylint that referenced this issue May 8, 2024
@jacobtylerwalls jacobtylerwalls modified the milestones: 3.2.0, 3.3.0 May 12, 2024
Pierre-Sassoulas added a commit to Pierre-Sassoulas/pylint that referenced this issue May 13, 2024
Pierre-Sassoulas added a commit to Pierre-Sassoulas/pylint that referenced this issue May 13, 2024
Pierre-Sassoulas added a commit to Pierre-Sassoulas/pylint that referenced this issue May 13, 2024
Performance analysis by correctmost.  Follow up to fix the known limitation will be in pylint-dev#9609

Refs pylint-dev#8606 (follow-up)
Closes pylint-dev#9608
Pierre-Sassoulas added a commit to Pierre-Sassoulas/pylint that referenced this issue May 13, 2024
Performance analysis by correctmost.  Follow up to fix the known limitation will be in pylint-dev#9609

Refs pylint-dev#8606 (follow-up)
Closes pylint-dev#9608
@jacobtylerwalls jacobtylerwalls modified the milestones: 3.3.0, 3.2.0, 3.2.1 May 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement ✨ Improvement to a component Needs PR This issue is accepted, sufficiently specified and now needs an implementation performance
Projects
None yet
3 participants