Releases · ilri/csv-metadata-quality

Version 0.6.1 (Latest)

23 Feb 08:50

alanorth (Alan Orth)

Tag and commit signed with the committer’s verified signature. GPG key ID: 0FB860CC9C45B1B9

Fixed

Missing region check should ignore subregion field, if it exists

Changed

Use SPDX license data from SPDX themselves instead of spdx-license-list, which is deprecated and outdated
Require Python 3.9+
Don't run fix.separators() on title or abstract fields
Don't run whitespace or newline fixes on abstract fields
Ignore some common non-SPDX licenses
Ignore __description suffix in filenames meant for SAFBuilder when checking for uncommon file extensions
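The filename handling described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function name, the extension allow-list, and the exact shape of the SAFBuilder suffix (assumed here to be `__description:` followed by free text) are all assumptions.

```python
import os

# Hypothetical allow-list of "common" extensions; the tool's real list may differ.
COMMON_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".pptx", ".jpg", ".png"}


def has_uncommon_extension(filename: str) -> bool:
    """Return True if the file extension is not in COMMON_EXTENSIONS."""
    # Drop an optional SAFBuilder-style suffix such as "__description:Report"
    # before looking at the extension.
    name = filename.split("__description", 1)[0]
    extension = os.path.splitext(name)[1].lower()
    return extension not in COMMON_EXTENSIONS
```

Without the split, `os.path.splitext` would treat everything after the last dot, including the description text, as the extension.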

Updated

Python dependencies

Version 0.6.0

02 Sep 13:40

alanorth

Changed

Perform fix for "unnecessary" Unicode characters after we try to fix encoding issues with ftfy
Switch ftfy heuristics to use is_bad() instead of sequence_weirdness()
Configure ftfy fix_text() so that it does not change “smart quotes” to "ASCII quotes"

Updated

Python dependencies
Metadata field exclude logic

Added

Ability to drop invalid AGROVOC values with -d when checking AGROVOC values with -a <field.name>
Ability to add missing UN M.49 regions when both country and region columns are present. Enable with -u (unsafe fixes) for now.
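As a rough illustration of the missing-region fix, the sketch below derives regions from a country column and appends any that are absent from the region column. The mapping is a tiny hand-written subset of UN M.49 groupings, and the function name and the `||` separator default are assumptions made for this example rather than details taken from the project.

```python
# Illustrative subset only; a real check needs the full UN M.49 table.
COUNTRY_TO_REGION = {
    "Kenya": "Eastern Africa",
    "Uganda": "Eastern Africa",
    "Colombia": "South America",
}


def add_missing_regions(countries: str, regions: str, sep: str = "||") -> str:
    """Append the region of each known country if it is not already listed."""
    country_list = [c for c in countries.split(sep) if c]
    region_list = [r for r in regions.split(sep) if r]
    for country in country_list:
        region = COUNTRY_TO_REGION.get(country)
        # Countries missing from the mapping are skipped rather than guessed.
        if region and region not in region_list:
            region_list.append(region)
    return sep.join(region_list)
```

Because a fix like this writes new values into the metadata, gating it behind an explicit opt-in such as the unsafe-fixes flag mentioned above is a reasonable precaution.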

Removed

Support for reading Excel files (both .xls and .xlsx) as it was completely untested

Version 0.5.0

08 Dec 13:34

alanorth

Added

Ability to check for, and fix, "mojibake" characters using ftfy
Ability to check if the item's title exists in the citation
Ability to check if an item has countries, but no matching regions (only suggests missing regions if there is a region field in the CSV)
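For readers unfamiliar with the term, "mojibake" is what results when bytes in one encoding are decoded as another. The standard-library sketch below reproduces and reverses the single most common case (UTF-8 read as Latin-1). It is only an illustration of the problem: ftfy uses much broader heuristics, and the function names here are invented for the example.

```python
def make_mojibake(text: str) -> str:
    """Simulate the damage: UTF-8 bytes wrongly decoded as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair_mojibake(text: str) -> str:
    """Reverse that one specific mistake; return the input if it does not apply."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not produced by this particular mix-up, so leave it alone.
        return text
```

A naive round trip like this can misfire on text that merely looks damaged, which is one reason to rely on a dedicated library for real data.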

Updated

Python dependencies

Fixed

Regular expression to match all citation fields (dc.identifier.citation as well as dcterms.bibliographicCitation) in experimental.correct_language()
Regular expression to match dc.title and dcterms.title, but ignore dc.title.alternative in check.duplicate_items()
Missing field name in fix.newlines() output
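The two field-matching fixes above could look something like the patterns below. These are hypothetical reconstructions for illustration, not the expressions used in the project. In particular, the optional `[en_US]`-style language qualifier after the field name is an assumption about how column headers may appear in the CSV.

```python
import re

# Optional "[en_US]"-style language qualifier after the field name (an assumption).
QUALIFIER = r"(\[.*\])?"

# Matches dc.title and dcterms.title, but not dc.title.alternative.
TITLE_FIELD = re.compile(r"^(dc|dcterms)\.title" + QUALIFIER + "$")

# Matches both citation fields named in the changelog entry.
CITATION_FIELD = re.compile(
    r"^(dc\.identifier\.citation|dcterms\.bibliographicCitation)" + QUALIFIER + "$"
)


def is_title_field(column: str) -> bool:
    return bool(TITLE_FIELD.match(column))


def is_citation_field(column: str) -> bool:
    return bool(CITATION_FIELD.match(column))
```

Anchoring the pattern at both ends is what keeps dc.title.alternative from matching as a title.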

Version 0.4.7

17 Mar 08:03

alanorth

Changed

Fixing invalid multi-value separators like | and ||| is no longer classified as "unsafe" as I have yet to see a case where this was intentional
Not user visible, but now checks only print a warning to the screen instead of returning a value and re-writing the DataFrame, which should be faster and use less memory
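A separator fix of the kind described above might be sketched as follows, assuming `||` is the valid multi-value separator. The function name is illustrative and the actual implementation may handle more cases.

```python
import re


def fix_separators(value: str) -> str:
    """Normalize any run of pipe characters to the '||' separator."""
    # Split on one or more pipes, so "|", "||" and "|||" are all treated alike.
    parts = re.split(r"\|+", value)
    # Dropping empty parts also removes leading or trailing separators.
    return "||".join(part for part in parts if part)
```

This rewrites every single pipe, which is only safe under the changelog's stated assumption that a lone `|` is never intentional in these fields.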

Added

Configurable directory for AGROVOC requests cache (to allow running the web version from Google App Engine where we can only write to /tmp)
Ability to check for duplicate items in the data set (uses a combination of the title, type, and date issued to determine uniqueness)
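The uniqueness rule for duplicate items can be illustrated with a small sketch. It assumes each item is a dictionary with `title`, `type` and `date_issued` keys. Those key names are invented for the example, since the real tool works on CSV columns.

```python
def find_duplicates(items):
    """Return items whose (title, type, date issued) combination was already seen."""
    seen = set()
    duplicates = []
    for item in items:
        # The three fields together act as the identity of an item.
        key = (item["title"], item["type"], item["date_issued"])
        if key in seen:
            duplicates.append(item)
        else:
            seen.add(key)
    return duplicates
```

Using three fields rather than the title alone avoids flagging, for example, a report and a journal article that happen to share a title.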

Removed

Checks for invalid and unnecessary multi-value separators: I now fix them whenever I see them, so there is no need for separate checks

Updated

Run poetry update to update project dependencies

Version 0.4.6

11 Mar 10:20

alanorth

Added

Validation of dcterms.license field against SPDX license identifiers
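In outline, validating a license field means checking each value against the set of known SPDX identifiers. The sketch below uses a tiny hard-coded subset of real SPDX identifiers in place of the full list, and both the function name and the `||` separator default are assumptions.

```python
# Tiny illustrative subset; the full SPDX license list has hundreds of entries.
SPDX_IDS = {"CC-BY-4.0", "CC-BY-NC-4.0", "CC0-1.0", "MIT", "GPL-3.0-only"}


def invalid_licenses(value: str, sep: str = "||") -> list:
    """Return the values in a license field that are not SPDX identifiers."""
    return [
        license_id
        for license_id in value.split(sep)
        if license_id and license_id not in SPDX_IDS
    ]
```

The comparison here is exact and case-sensitive, so a value such as "cc-by-4.0" would be reported as invalid.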

Changed

Use DCTERMS fields where possible in data/test.csv

Updated

Run poetry update to update project dependencies

Fixed

Output for all fixes should be green, because it is good

Version 0.4.5

04 Mar 19:41

alanorth

Added

Check dates in dcterms.issued field as well, not just fields that have the word "date" in them
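The field selection described above amounts to a small predicate on column names. The version below is a guess at the logic for illustration only. The function name is invented, and the use of a prefix match (so that a header with a language qualifier would also match) is an assumption.

```python
def is_date_field(column: str) -> bool:
    """Decide whether a column should be run through the date check."""
    # Either the name contains the word "date" or it is the DCTERMS issued field.
    return "date" in column or column.startswith("dcterms.issued")
```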

Updated

Run poetry update to update project dependencies

Version 0.4.4

21 Feb 11:27

alanorth

Added

Accept dates formatted in ISO 8601 extended with combined date and time, for example: 2020-08-31T11:04:56Z
Colorized output: red for errors, yellow for warnings and information, green for changes
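The combined date-and-time format accepted in this release can be validated with the standard library alone. The sketch below tries a short list of formats in turn. The list itself is an assumption (only the last format comes from the changelog entry above), and the function name is illustrative.

```python
from datetime import datetime

# Assumed set of accepted formats; only the last one is taken from the changelog.
DATE_FORMATS = ("%Y", "%Y-%m", "%Y-%m-%d", "%Y-%m-%dT%H:%M:%SZ")


def is_valid_date(value: str) -> bool:
    """Return True if the value parses with any of the accepted formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            # Wrong format for this value; try the next one.
            continue
    return False
```

Note that the trailing `Z` is matched literally here, so timestamps with a numeric UTC offset would be rejected by this sketch.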

Updated

Run poetry update to update project dependencies

Version 0.4.3

26 Jan 13:28

alanorth

Changed

Reformat with black
Now requires Python 3.7+ for pandas 1.2.0

Updated

Run poetry update to update dependencies
Expand check/fix for multi-value separators to include metadata with invalid separators at the end, for example Kenya||Tanzania||

Version 0.4.2

06 Jul 11:08

alanorth

Changed

Add field name to the output for more fixes and checks to help identify where the error is
Minor optimizations to AGROVOC subject lookup
Use Poetry instead of Pipenv

Updated

Update Python dependencies to latest versions

Version 0.4.1

15 Jan 10:22

alanorth

Changed

Reduce minimum Python version to 3.6 by working around unicodedata.is_normalized(), which is only available in Python >= 3.8
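One common way to work around the missing function on older interpreters is to normalize the string and compare it with the original. Whether the project does exactly this is not stated in the changelog. The helper name below is invented, and the choice of normalization form is left to the caller.

```python
import unicodedata


def is_normalized_compat(form: str, text: str) -> bool:
    """Equivalent in result to unicodedata.is_normalized(), without needing 3.8."""
    # A string is normalized exactly when normalizing it changes nothing.
    return unicodedata.normalize(form, text) == text
```

The built-in can be faster because it may avoid building a new string, but for short metadata values the difference is unlikely to matter.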
