Add more end-user tools and scripts #202

kristian-clausal · 2023-01-20T12:15:56Z

I've added a usertools/ folder with three starting scripts. Two of them sort our .json output files so that you get a consistent order between runs. The third is a word search that trawls through a wiktextract .json file, with a regex toggle, filtering by language(s), and a maximum output count.

If you have anything that you would like to add there, just put up a pull request with a new file in usertools/; if you have specific requests, post them here.

These are meant to be small command-line scripts for the most part, but even that's not a must as long as the script could be helpful to someone somewhere down the line. It would be very nice if they were simple and easy to understand even for people new to programming, so that they can be edited (and resubmitted as new variant scripts).
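To give a feel for the kind of small script meant here, below is a rough sketch of a word search with a regex toggle, a language filter, and a maximum output count. It is not the script in usertools/. The function name `search_words` and its parameters are made up for illustration, and it assumes the file holds one JSON object per line with `"word"` and `"lang"` fields.

```python
import json
import re
from typing import Iterable, Iterator, Optional, Set


def search_words(
    lines: Iterable[str],
    pattern: str,
    use_regex: bool = False,
    languages: Optional[Set[str]] = None,
    max_results: Optional[int] = None,
) -> Iterator[dict]:
    """Yield entries whose "word" field matches `pattern`.

    Hypothetical helper: assumes one JSON object per line, each with
    "word" and "lang" fields.
    """
    matcher = re.compile(pattern) if use_regex else None
    found = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        # Language filter: skip entries outside the requested set.
        if languages is not None and entry.get("lang") not in languages:
            continue
        word = entry.get("word", "")
        if matcher is not None:
            hit = matcher.search(word) is not None  # regex mode
        else:
            hit = pattern in word  # plain substring mode
        if not hit:
            continue
        yield entry
        found += 1
        # Stop early once the maximum output count is reached.
        if max_results is not None and found >= max_results:
            return
```

Since it accepts any iterable of lines, an open file object can be passed in directly, for example `search_words(open(path, encoding="utf-8"), "^cat", use_regex=True, languages={"English"}, max_results=20)`.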

kristian-clausal · 2023-01-26T12:52:40Z

The sorting scripts aren't nearly enough to get a working diff. I've committed json-compare-samples.py, which takes two files. It indexes the first one, trying to give each JSON object its own key (which doesn't work when there isn't enough distinguishing info and there are many "Noun" "Noun" "Noun" sections inside the same etymology...). It then steps through the second file, where each line has a one-in-N chance (--one-in-a) of being chosen as a sample. Each sample is run through the same process to build a key that should correspond to one in the index of the first file. If the two matching lines differ, they are compared using difflib, which outputs something like a diff for each pair of objects, comparing them as lines of strings.
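The approach described above could be sketched roughly as follows. This is not the code in json-compare-samples.py. The helper names (`make_key`, `build_index`, `compare_samples`) and the choice of key fields (`"word"`, `"lang"`, `"pos"`, `"etymology_number"`) are assumptions for illustration, as is the one-JSON-object-per-line layout.

```python
import difflib
import json
import random
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def make_key(entry: dict) -> Tuple:
    # Assumed key fields; entries that share all of them will collide,
    # which is the "not enough distinguishing info" problem.
    return (
        entry.get("word"),
        entry.get("lang"),
        entry.get("pos"),
        entry.get("etymology_number"),
    )


def build_index(lines: Iterable[str]) -> Dict[Tuple, dict]:
    index: Dict[Tuple, dict] = {}
    for line in lines:
        if line.strip():
            entry = json.loads(line)
            # On a key collision the first entry seen is kept.
            index.setdefault(make_key(entry), entry)
    return index


def pretty(entry: dict) -> List[str]:
    # Pretty-print with sorted keys so the diff works on stable lines.
    text = json.dumps(entry, indent=2, sort_keys=True, ensure_ascii=False)
    return text.splitlines()


def compare_samples(
    first_lines: Iterable[str],
    second_lines: Iterable[str],
    one_in: int = 10,
    rng: Optional[random.Random] = None,
) -> Iterator[List[str]]:
    """Yield a unified diff for each sampled entry that differs."""
    rng = rng or random.Random()
    index = build_index(first_lines)
    for line in second_lines:
        # Each non-blank line has a one-in-`one_in` chance of being sampled.
        if not line.strip() or rng.randrange(one_in) != 0:
            continue
        sample = json.loads(line)
        other = index.get(make_key(sample))
        if other is None or other == sample:
            continue  # no counterpart found, or nothing changed
        yield list(
            difflib.unified_diff(pretty(other), pretty(sample), lineterm="")
        )
```

Passing `one_in=1` samples every line, which is handy for checking the script on a small pair of files before running it on full output.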

kristian-clausal added the enhancement (New feature or request) and good first issue (Good for newcomers) labels Jan 20, 2023

kristian-clausal self-assigned this Jan 20, 2023
