refactor: clean up checks #1155

Nytelife26 · 2021-05-18T21:07:28Z

It has come to my attention that a lot of checks within proselint are dubious at best, or misguided. For instance:

"Christiana" being considered archaic - it is the name of a place, and of a riot, too.
The filter that quite literally just checks for text matching "the n-word" - what use is telling people not to use a way to refer to it, if we aren't telling people the harm done by the word itself?
The hundreds of phrases and words in the cursing.nfl check - some of them are just numbers, and others have many variations spelled out one by one, almost like a poorly designed censoring system, rather than using regex.
The categorization of various corporate types as different - why have airlinese, corporate speak, etc. in separate categories? They should logically be subcategories under corporate jargon.
The same for LGBTQ and sexism - why not put both, and more, under a single module for discriminatory or exclusionary language?

Et cetera. I feel it may be necessary to do a refactor of these checks and categorizations with a formal review to make maintainability easier in future and also to maintain a better linguistic ecosystem.
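
To illustrate the regex point above, here is a minimal sketch of how one pattern can cover variations that a flat list has to enumerate individually. The placeholder word and the names `ENUMERATED`, `PATTERN`, and `find_matches` are hypothetical and are not taken from proselint's code.

```python
import re

# Hypothetical placeholder word; a flat list must spell out every variant.
ENUMERATED = ["darn", "darns", "darned", "darning"]

# One pattern covering the same variants. The word boundaries keep it from
# firing inside longer, unrelated words such as "darnel".
PATTERN = re.compile(r"\bdarn(?:s|ed|ing)?\b", re.IGNORECASE)


def find_matches(text):
    """Return (start, end, matched_text) for every hit in text."""
    return [(m.start(), m.end(), m.group(0)) for m in PATTERN.finditer(text)]
```

A single pattern is easier to review than a long list, although it can also over-match if the suffix group is too permissive, so each pattern would still need its own test cases.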

suchow · 2021-05-19T02:27:10Z

Good ideas, and I share many of your concerns. However, there are several distinct issues that you've bundled here in #1155 and they should be broken down into smaller issues that can be discussed and completed independently. Here are some possible standalone issues:

Refactoring the checks so that they are organized either by the source of the advice (e.g., David Foster Wallace) OR by the domain of the advice (e.g., hyperbole). In general we've been good about this, organizing checks by the domain of the advice, but there may be some vestiges of an earlier organizational scheme.
Determining the right categorization scheme for checks and groups of related checks (e.g., should airlinese be a subcategory of corporate jargon, should sexism and ageism be subcategories of a discriminatory-language module?)
Improving the archaism check to distinguish archaic vs. modern senses of the same character string.
Improving or perhaps deleting the nword.py check.
Improving or perhaps deleting the cursing.nfl check.
Crafting a principled approach to determining what makes a check dubious or misguided and applying that approach consistently across all of proselint, both retrospectively and going forward, perhaps defining it in a policy document.
Making sure that all the messages are informative.
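
As a rough illustration of the archaism item above, one possible heuristic is to skip a match when it is capitalized in the middle of a sentence, since that usually signals a proper noun such as a place name. This is only a sketch under that assumption; the word list and the function `flag_archaisms` are hypothetical and do not reflect how proselint's archaism check is implemented.

```python
import re

# Hypothetical, shortened word list used only for this sketch.
ARCHAISMS = ["christiana", "perchance"]
PATTERN = re.compile(r"\b(" + "|".join(ARCHAISMS) + r")\b", re.IGNORECASE)


def flag_archaisms(text):
    """Return (offset, word) pairs, skipping likely proper nouns."""
    hits = []
    for m in PATTERN.finditer(text):
        word = m.group(0)
        before = text[:m.start()].rstrip()
        at_sentence_start = before == "" or before[-1] in ".!?"
        # Capitalized mid-sentence: treat as a proper noun and do not flag.
        if word[0].isupper() and not at_sentence_start:
            continue
        hits.append((m.start(), word))
    return hits
```

The heuristic is deliberately crude: a place name at the start of a sentence would still be flagged, so it narrows the false alarms rather than removing them.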

suchow · 2021-05-19T02:35:37Z

Also, note that cursing.nfl defaults to off, probably for exactly the reason that it produces far too many false alarms in its current state to be useful:

proselint/proselint/.proselintrc

Line 13 in 372ebf0

"cursing.nfl" : false,

As a more general point, we've been wary of any checks that attempt to categorically ban words. The only time that's seemed like a good idea so far is for needless variants, where the determination has already been made for us that the word has no need.
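
For the default-off setting quoted above, here is a minimal sketch of reading per-check defaults from a JSON config. It assumes the file is a JSON object with a `checks` mapping from check name to boolean; only the `cursing.nfl` entry comes from the quoted line, and the second entry and both function names are hypothetical.

```python
import json

# Assumed layout: {"checks": {name: bool, ...}}. Only "cursing.nfl" is taken
# from the quoted line; "example.enabled_check" is a placeholder.
EXAMPLE_RC = """
{
  "checks": {
    "cursing.nfl": false,
    "example.enabled_check": true
  }
}
"""


def enabled_checks(rc_text):
    """Return the sorted names of checks switched on in the config text."""
    checks = json.loads(rc_text).get("checks", {})
    return sorted(name for name, on in checks.items() if on)


def disabled_checks(rc_text):
    """Return the sorted names of checks switched off in the config text."""
    checks = json.loads(rc_text).get("checks", {})
    return sorted(name for name, on in checks.items() if not on)
```

Listing the disabled checks this way is one quick route to finding other checks that were turned off for producing too many false alarms.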

Nytelife26 · 2021-05-19T15:25:53Z

they should be broken down into smaller issues that can be discussed and completed independently.

Strong suggestion, actually. Ultimately I just wanted to put this down as an RFC to get people's thoughts prior to doing any real work.

but there may be some vestiges of an earlier organizational scheme.

I believe so, as that's what we saw with the split between dfw.uncomparables (which didn't exist) and uncomparables.misc. I'll check through them if a cleanup like this does occur.

Determining the right categorization scheme for checks and groups of related checks

That would definitely be the right thing to do going forward, I think. I foresee it making maintenance quite a lot easier, and it will help people to understand the general scope of these checks better.

Improving the archaism check to distinguish archaic vs. modern senses of the same character string.

That likely falls under the same problem we discussed relating to flag-based parsing honestly.

Improving or perhaps deleting [cursing.nword and cursing.nfl].

I would suggest improving them rather than deleting altogether. Principally speaking, many of these things are genuinely words that should be avoided in most contexts, and if we can tighten the error margin much more and make them more definitive, they may very well be suitable for our usage.

Crafting a principled approach to determining what makes a check dubious or misguided and applying that approach consistently across all of proselint, both retrospectively and going forward, perhaps defining it in a policy document.

I would be more than happy to do this. Ultimately it would be good to concretely define and lay out our process for making these decisions and the criteria required for linguistic constructs. For part of this, we could use something similar to my language suitability evaluation framework.

Making sure that all the messages are informative.

That would be quite an easy fix, too. Perhaps one best placed in the same restructure as a categorization evaluation.

cursing.nfl defaults to off

I wasn't aware of that, actually - thanks for the tip. I'll be sure to consider .proselintrc and our ability to set defaults in future.

we've been wary of any checks that attempt to categorically ban words

That's for the best; things like that can get authoritarian or out of hand quite quickly. It's nice to see these things taken as seriously as they should be. It'll be easier for us to make those decisions once a framework is in place.

suchow · 2021-05-19T16:45:11Z

@Nytelife26 Thanks for the response, we're on the same page on every point :)

Nytelife26 · 2024-02-18T15:10:15Z

This relates to #1362.

Nytelife26 added the type: refactor, priority: medium, and cat: rfc labels May 19, 2021

Nytelife26 mentioned this issue Apr 23, 2024

Ignored check names mismatched #1341

Open

Nytelife26 added this to the 1.0.0 milestone Apr 23, 2024
