[MAINT] Ensure consistent capitalisation when referencing functions named after people #2104

Open
ThomasHepworth opened this issue Mar 27, 2024 · 0 comments


Is your proposal related to a problem?

While reviewing spelling errors identified by the spellchecker, @zslade and I noticed that we're very inconsistent about capitalising people's names when they appear in function and class names.

For example, both ctl.name_comparison and cl.levenshtein_at_thresholds print the algorithm names in lowercase:

import splink.duckdb.duckdb_comparison_library as cl
import splink.duckdb.comparison_template_library as ctl

print(ctl.name_comparison("first_name", phonetic_col_name = "first_name_dm"))
print(cl.levenshtein_at_thresholds("a"))

producing the following outputs:

Comparison 'Exact match vs. Names with phonetic exact match vs. First_Name within levenshtein threshold 1 vs. First_Name within damerau-levenshtein threshold 1 vs. First_Name within jaro_winkler thresholds 0.9, 0.8 vs. anything else' of "first_name" and "first_name_dm".
Similarity is assessed using the following ComparisonLevels:
    - 'Null' with SQL rule: "first_name_l" IS NULL OR "first_name_r" IS NULL
    - 'Exact match first_name' with SQL rule: "first_name_l" = "first_name_r"
    - 'Exact match first_name_dm' with SQL rule: "first_name_dm_l" = "first_name_dm_r"
    - 'Damerau_levenshtein <= 1' with SQL rule: damerau_levenshtein("first_name_l", "first_name_r") <= 1
    - 'Jaro_winkler_similarity >= 0.9' with SQL rule: jaro_winkler_similarity("first_name_l", "first_name_r") >= 0.9
    - 'Jaro_winkler_similarity >= 0.8' with SQL rule: jaro_winkler_similarity("first_name_l", "first_name_r") >= 0.8
    - 'All other comparisons' with SQL rule: ELSE

<Comparison Exact match vs. A within levenshtein thresholds 1, 2 vs. anything else with 4 levels at 0x106d88c20>

Describe the solution you'd like

To achieve consistency and follow the convention used in our documentation and the wider literature, we propose capitalising the personal names (Levenshtein, Damerau-Levenshtein, Jaro-Winkler) wherever they appear in such output.

Additionally, we've noticed occasional partial capitalisation, notably with cases such as "jaro-winkler", due to the use of the capitalize() method. This uppercases only the first character of the string and lowercases the rest, so "jaro-winkler".capitalize() returns Jaro-winkler rather than Jaro-Winkler.
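The capitalize() behaviour is easy to reproduce in isolation. Below is a minimal sketch of one possible fix, not a description of how Splink currently builds its labels: capitalise_name is a hypothetical helper (it does not exist in Splink) that capitalises each alphabetic run and leaves separators such as "-" and "_" untouched.

```python
import re


def capitalise_name(name: str) -> str:
    """Hypothetical helper: capitalise every alphabetic run in `name`.

    Separators ("-", "_", spaces) are left exactly as they are.
    """
    return re.sub(r"[A-Za-z]+", lambda m: m.group(0).capitalize(), name)


# str.capitalize() only uppercases the first character of the whole string.
print("jaro-winkler".capitalize())             # Jaro-winkler

# The helper capitalises each part of a compound name.
print(capitalise_name("jaro-winkler"))         # Jaro-Winkler
print(capitalise_name("damerau_levenshtein"))  # Damerau_Levenshtein
```

Note that the helper should only be given the name itself: passed a longer label such as "jaro_winkler_similarity", it would also capitalise "similarity", which is probably not what we want.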

@ThomasHepworth ThomasHepworth changed the title [MAINT] Ensuring Consistent Capitalization in Names of Functions and Classes Named After People [MAINT] Ensuring Consistent Capitalisation in Names of Functions and Classes Named After People Mar 27, 2024
@ThomasHepworth ThomasHepworth changed the title [MAINT] Ensuring Consistent Capitalisation in Names of Functions and Classes Named After People [MAINT] Ensuring consistent capitalisation when referencing functions named after people Mar 27, 2024
@ThomasHepworth ThomasHepworth changed the title [MAINT] Ensuring consistent capitalisation when referencing functions named after people [MAINT] Ensure consistent capitalisation when referencing functions named after people Apr 16, 2024