Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

modernization of proselint #1361

Open
wants to merge 249 commits into
base: main
Choose a base branch
from
Open

modernization of proselint #1361

wants to merge 249 commits into from

Conversation

orgua
Copy link
Collaborator

@orgua orgua commented Jan 5, 2024

Hello,

I put some days into improving the codebase. A lot of open issues should be solved by this. BUT changes are far more substantial as planned - so I wanted to ask how to proceed.

This repo is not very active in the last years. I would offer to maintain the project for the time being and also be open for guiding input. Some of the open Pull Request would be beneficial to the project. I also got plans for new features, improving ease-of-use, more tests for correctness and other optimizations.

I'm open for a chat. Mail is in setup.cfg of one of my projects.

If there are other plans for the project I would probably fork and start fresh - this is not my preferred solution.

Biggest Changes

  • featureset of py38
  • cleaner and faster code
  • fixed bugs, undefined behavior, broken legacy code
  • some checks did not work as intended
  • multiprocessed parallelization and other optimizations

Benchmark-Comparison

  • custom set of files, same hardware

Windows

Proselint-main (uncached & cached)

  • Found 104 lint-warnings in 47.579 s
  • Found 104 lint-warnings in 0.878 s

Proselint-modernized

  • Found 108 lint-warnings in 39.930 s (12 files, 617.45 kiByte) -> serialized
  • Found 108 lint-warnings in 13.164 s (12 files, 617.45 kiByte) -> serial files, parallel checks
  • Found 108 lint-warnings in 9.931 s (12 files, 617.45 kiByte) -> global check-executor
  • Found 108 lint-warnings in 0.011 s (12 files, 617.45 kiByte) -> cached

Linux / WSL

Proselint-main (uncached & cached)

  • Found 104 lint-warnings in 37.041 s
  • Found 104 lint-warnings in 0.771 s

Proselint-modernized

  • Found 108 lint-warnings in 34.521 s (12 files, 617.45 kiByte)
  • Found 108 lint-warnings in 8.248 s (12 files, 617.45 kiByte)
  • Found 108 lint-warnings in 6.098 s (12 files, 617.45 kiByte)
  • Found 108 lint-warnings in 0.044 s (12 files, 617.45 kiByte)

Breaking changes

  • cli arg --time -> --benchmark, -b
  • cli arg --debug -> --verbose, -v
  • exit code now returns number of errors
  • cli arg --output-format controls old flags json, compact
  • py >= 3.8
  • api-changes -> web_scripts & plugins untested
  • config adjustments (1 test removed, 1 added)
    • "misc.metaconcepts": False, # TODO: remove, was duplicate of scare_quotes

Whats broken / untested

  • web_scripts
  • plugins
  • Github action

Detailed Changelog

  • featureset of py38
  • use pathlib instead of string based paths
  • add type-hinting
  • replaced memoizer, short-comings:
    • shelves have a good interface but are slow
    • cache-init and -closing was a mess
    • clearing cache resulted in broken states
    • memoize-wrapper was wasting resources (hash of text recalc for every check, ..)
    • age of cache-entries was not considered
    • every cached fn had its own file-cache
    • cache migration was done on every memoize-decoration
  • cache
    • hash test beforehand only once
    • consider age of items
    • memoize lint() instead of checks
  • fix bugs related to:
    • overshadowing builtin names (list, ...)
    • string based path-traversal
    • mix of relative and absolute paths (also relics that deleted file somewhere)
    • varius small bugs (ie. missing comma in element lists)
    • cache was not cleaned up by fn (as it should)
  • lots of modernization from old codebase
    • latest python versions had trouble with proselint
  • update deps
  • reformat using black-style, done by ruff
  • linted codebase with ruff
  • add logger and refactor print-output
    • linter informs about duration
    • results include link to file
    • inform about duration, files scanned and text-size
  • refactor unittest
    • remove classes / boilerplate
    • move tests for checks into separate dir
    • more helpful error-messages
    • remove duplicates
    • repair broken tests
    • lower complexity of tests (each test should only test for one thing)
  • add unittests
    • detect missing check-flags in default-config
    • find unavaible checks for default-config-flags
    • don't fail on wrong config-flags
    • check working cache, speed-test
    • check working cache, same results
  • minimal changes to outer api
  • config
    • correctness of default-config is tested
    • don't fail on faulty config-flags
  • checks
    • remove duplicates
    • fix padding for checks using regex
  • existence_check() - join and padding -> cleared up
    • join is always done
    • padding can be selected py Enum, defaults to separate in text
    • fix checks accordingly
  • put root-scripts into sep dir
  • exithandler for more graceful ctrl+c
  • ppm-wrapper is now more forgiving with small sample-sizes
  • add parallelization
    • checks() are run multiprocessed
    • one global executioner collects tasks for all files to analyze
    • break up the slowest checks to balance the load
  • optimizations were done with profiling and benchmarking
  • overhaul controling output-format by cli

arostkowycz15 and others added 30 commits April 17, 2023 18:45
Bumps [ruby/setup-ruby](https://github.com/ruby/setup-ruby) from 1.95.0 to 1.163.0.
- [Release notes](https://github.com/ruby/setup-ruby/releases)
- [Commits](ruby/setup-ruby@v1.95.0...v1.163.0)

---
updated-dependencies:
- dependency-name: ruby/setup-ruby
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 2.1.0 to 3.1.4.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@v2.1.0...v3.1.4)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 2 to 5.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v2...v5)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [actions/cache](https://github.com/actions/cache) from 2 to 3.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](actions/cache@v2...v3)

---
updated-dependencies:
- dependency-name: actions/cache
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [pytest](https://github.com/pytest-dev/pytest) from 6.2.5 to 7.4.3.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](pytest-dev/pytest@6.2.5...7.4.3)

---
updated-dependencies:
- dependency-name: pytest
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
…che-3

chore(deps): bump actions/cache from 2 to 3
…tup-python-5

chore(deps): bump actions/setup-python from 2 to 5
chore(deps-dev): bump pytest from 6.2.5 to 7.4.3
Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 4.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v2...v4)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
…eckout-4

chore(deps): bump actions/checkout from 2 to 4
…-ruby-1.163.0

chore(deps): bump ruby/setup-ruby from 1.95.0 to 1.163.0
…decov-action-3.1.4

chore(deps): bump codecov/codecov-action from 2.1.0 to 3.1.4
output was checked for errors
@Nytelife26
Copy link
Member

I have managed to rebase everything successfully, and get pytest to run. A new problem is that TypeAlias is not available on Python versions prior to 3.10. I am not sure how you would like to proceed with that.

.ruff.toml Outdated
@@ -1,5 +1,5 @@

#line-length = 90
line-length = 80
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we at least compromise on 88 chars, as it is widely used by black? 80 chars are a relic from the 80's a bit of from the current sweetspot.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

88 is a non-standard line length. Sure, 80 characters is an old requirement, but I find you can pretty much always achieve what you need to and it forces you to keep things clean.

It's personal preference, I suppose, but since I refactored proselint the first time it has used 80 character line lengths.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

black is the current standard (as it is widely used) in python world - even ruff adopted it and formats the same :)
but ok

Copy link
Contributor

@ferdnyc ferdnyc May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Black's completely arbitrary, sarcastically-justified 88-character line length is one of its more controversial "opinions", though.

Particularly as it violates PEP8 — an actual standard, not (at best) a "de facto standard" like Black. (Though PEP8 admittedly did itself no favors by specifying 79 characters instead of 80, something that was met with almost immediate, near-universal rejection.)

IIRC line length was also the very first configuration option Black gained, once its original vision of being completely non-configurable collided headfirst with reality.

proselint/checks/__init__.py Outdated Show resolved Hide resolved
proselint/score.py Show resolved Hide resolved
scripts_web/app.py Outdated Show resolved Hide resolved
tests/checks/test_misc_but.py Show resolved Hide resolved
if level == "sentence":
pass
elif level == "paragraph":
if level in {"sentence", "paragraph"}:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats really wrong, as these are placeholders for custom sub-functionality. not from me, but it had a reason to be like that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was a lint suggestion, and it checks out to me. I believe the plan was to add some eventual logic, but I see no reason not to tidy up until then.

.ruff.toml Outdated
@@ -1,5 +1,5 @@

#line-length = 90
line-length = 80
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

black is the current standard (as it is widely used) in python world - even ruff adopted it and formats the same :)
but ok

proselint/checks/__init__.py Outdated Show resolved Hide resolved
proselint/checks/misc/capitalization.py Outdated Show resolved Hide resolved
@orgua
Copy link
Collaborator Author

orgua commented Jan 25, 2024

I have managed to rebase everything successfully, and get pytest to run. A new problem is that TypeAlias is not available on Python versions prior to 3.10. I am not sure how you would like to proceed with that.

Already replaced one of the typeAlias with a named tuple, will also extend the second as it's cleaner.

That latest force-push really messes up a lot - I had to do a rebase on my current main (this PR) and hoped /dev would be smoother, but its a hot mess. I have to handle 80+ commits indiviually, deleted files are reappearing randomly.
Update: I stopped the rebase and did a merge commit. The PR seems to work - still WIP. Is it ok for you to review it there? I would propose to:

  1. bring the /dev-addition to a working state
  2. update proselint/main next as handling this gets more and more complicated
  3. after that we can talk about the next steps. A bunch of issues can be closed. Do you wanna do a release inbetween or change more core-structures and merge in PRs?

@Nytelife26
Copy link
Member

Already replaced one of the typeAlias with a named tuple, will also extend the second as it's cleaner.

That's good news, thank you.

That latest force-push really messes up a lot - I had to do a rebase on my current main (this PR) and hoped /dev would be smoother, but its a hot mess. I have to handle 80+ commits indiviually, deleted files are reappearing randomly.

I know it's not ideal, and I apologise sincerely for that. Thank you for sticking with me and continuing to work on it.

Update: I stopped the rebase and did a merge commit. The PR seems to work - still WIP. Is it ok for you to review it there?

Of course. I'll take a look shortly.

  1. update proselint/main next as handling this gets more and more complicated

To make this easier, it might be possible to split this into multiple more atomic PRs. I am not sure how you would like to group them, but that would be my suggestion at this stage. I am of course okay with proceeding to review these in whatever structure you like, that's just a comment from my experience.

  1. after that we can talk about the next steps. A bunch of issues can be closed. Do you wanna do a release inbetween or change more core-structures and merge in PRs?

I would like to at least make progress on the existing PRs around refactoring the configuration system and such, but it might be better to release most of these changes first. Having too many breaking changes in one release could prove hard for people to keep up with. The userbase is, to my knowledge, already dwindling from its former state, so driving more people out would not be ideal.

@orgua
Copy link
Collaborator Author

orgua commented Jan 25, 2024

Allright. Don't know what PR-splitting brings, but OK.
You are right - config needs a major overhaul, same for the check-structure. I already opened two issues to collect ideas and coordinate the effort to put it on a future-proof foundation.

reference:

@Nytelife26
Copy link
Member

I am working on the check categories. I have some basic ideas for now, but more is to follow later.

@Nytelife26 Nytelife26 mentioned this pull request Feb 9, 2024
Comment on lines +67 to +68
[lint.isort]
force-single-line = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[lint.isort]
force-single-line = true

I think this should be removed. It makes imports unwieldy.

Comment on lines +30 to +32
###############################################################################
# Check-Interface used by linter ##############################################
###############################################################################
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find large block comments like this to be ugly, so any other method of separation would be preferred.

@Nytelife26
Copy link
Member

As a follow-up, you were right. Splitting this PR up would be a nightmare. It would be better to proceed with bringing in your dev branch changes, tidy up the commits here, and then merge this before we work on the other proposals.

@Nytelife26 Nytelife26 added this to the 1.0.0 milestone Apr 23, 2024
@Nytelife26
Copy link
Member

note for myself: it would be useful to consider the discussion in #1369 as part of this, or maybe as a later change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cat: maintenance Issues and PRs related to the maintenance of a module. priority: high Issues and PRs that should be resolved as soon as possible. status: wip Issues and PRs that are still a work in progress. type: refactor Issues and PRs related to code cleanup. version: major Issues and PRs with breaking changes belonging to the next major release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants