Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use os.listdir() instead of os.scandir() in FileSession during clean-up #2010

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jiangwen365
Copy link

@jiangwen365 jiangwen365 commented Dec 23, 2023

os.scandir() is more effcient than os.listdir() and is generator based.

What kind of change does this PR introduce?

  • bug fix
  • feature
  • docs update
  • tests/coverage improvement
  • refactoring
  • other

What is the related issue number (starting with #)

What is the current behavior? (You can also link to an open issue here)

What is the new behavior (if this is a feature change)?
when cleaning file-based session storage, os.listdir() returns a huge list of file names for very large session directories. so switching to os.scandir() function – a better and faster directory iterator should be beneficial.

Other information:

Checklist:

  • I think the code is well written
  • I wrote good commit messages
  • I have squashed related commits together after the changes have been approved
  • Unit tests for the changes exist
  • Integration tests for the changes exist (if applicable)
  • I used the same coding conventions as the rest of the project
  • The new code doesn't generate linter offenses
  • Documentation reflects the changes
  • The PR relates to only one subject with a clear title
    and description in grammatically correct, complete sentences

os.scandir() is more effcient than os.listdir() and is generator based.
@@ -578,15 +578,16 @@ def clean_up(self):
"""Clean up expired sessions."""
now = self.now()
# Iterate over all session files in self.storage_path
for fname in os.listdir(self.storage_path):
for entry in os.scandir(self.storage_path):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're changing this, let's go for the higher level pathlib interface already rather then working with the low-level os API.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprisingly, when I attempted later on to switch from using os.scandir() to pathlib's iterdir(), the performance significantly deteriorated. I am currently using a Windows server, and the calls to os.listdir() or path.iterdir() hardly ever return, whereas scandir() was able to return and start iterating within approximately 30 seconds. (I have about 500K session files) So I would suggest sticking to os.scandir() for better performance.

Copy link
Member

@webknjaz webknjaz Dec 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pathlib is the interface of choice for maintainability reasons. Also, using file-based sessions was never intended for high-performance use-cases specifically because file system scans are dependent on the number of files in directories and the lookups aren't exactly linear. Having a 30-second scan suggests that other file system operations during normal runtime aren't quite fast either.
So I'd say it's an architecture/deployment problem on the app side, not the framework. A typical solution is to use session storages that fit better for this purpose — like memcached or redis.

Could you post the timings/metrics showing the difference? And perhaps, report this issue on the upstream CPython tracker?

@webknjaz webknjaz changed the title Replace os.listdir() to os.scandir() Replace os.listdir() to os.scandir() Dec 23, 2023
@jiangwen365 jiangwen365 changed the title Replace os.listdir() to os.scandir() Replace os.listdir() to os.scandir() in FileSession clean up Dec 24, 2023
@webknjaz webknjaz changed the title Replace os.listdir() to os.scandir() in FileSession clean up Use os.listdir() instead of os.scandir() in FileSession during clean-up Dec 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants