Some wrong ROR IDs #6

Open
psmukhopadhyay opened this issue Apr 25, 2021 · 3 comments

@psmukhopadhyay

We noticed some wrong ROR IDs while creating an India-specific subset of the OpenEditor dataset (editors1_ror_and_countries.csv and editors2_ror_and_countries.csv).

Two typical examples:

A) Indian Institute of Science, Bangalore: every record for this premier Indian institute carries the ROR ID https://ror.org/05j873a45, which is wrong. That ID actually belongs to the Indian Institute of Soil Science (IISS, भाकृअनुप-भारतीय मृदा विज्ञान संस्थान; website: http://www.iiss.nic.in/index.html).

B) Christian Medical College Vellore, Vellore, India: every record for this institute carries the ROR ID https://ror.org/01vj9qy35, which is wrong. That ID actually belongs to Christian Medical College, Ludhiana, a different CMC in another city and state of India (website: http://cmcludhiana.in/medical_college/).

Possible reasons:

An API call to the ROR database for Indian Institute of Science using the affiliation parameter, for example https://api.ror.org/organizations?filter=country.country_code:IN&affiliation=Indian+Institute+of+Science , returns a handful of results (around 14), with the following data in JSON format:
++++++++++
{"number_of_results":10,"items":[{"substring":"Indian Institute of Science","score":0.92,"matching_type":"COMMON TERMS","chosen":true,"organization":{"id":"https://ror.org/05j873a45","name":"Indian Institute of Soil Science","email_address":null,"ip_addresses":[],"established":1988,"types":["Facility"],"relationships":[{"label":"Indian Council of Agricultural Research","type":"Parent","id":"https://ror.org/04fw54a43"}],"addresses":[{"lat":23.309722,"lng":77.403056,"state":null,"state_code":null,"city":"Bhopal","geonames_city":{"id":1275841,"city":"Bhopal","geonames_admin1":{"name":"Madhya Pradesh","id":1264542,"ascii_name":"Madhya Pradesh","code":"IN.35"},"geonames_admin2":{"name":"Bhopāl","id":1275842,"ascii_name":"Bhopal","code":"IN.35.444"},"license":{"attribution":"Data from geonames.org under a CC-BY 3.0 license","license":"http://creativecommons.org/licenses/by/3.0/"},"nuts_level1":{"name":null,"code":null},"nuts_level2":{"name":null,"code":null},"nuts_level3":{"name":null,"code":null}},"postcode":null,"primary":false,"line":null,"country_geonames_id":1269750}],"links":["http://www.iiss.nic.in/index.html"],"aliases":[],"acronyms":["IISS"],"status":"active","wikipedia_url":"https://en.wikipedia.org/wiki/Indian_Institute_of_Soil_Science","labels":[{"label":"भाकृअनुप-भारतीय मृदा विज्ञान संस्थान","iso639":"hi"}],"country":{"country_name":"India","country_code":"IN"},"external_ids":{"ISNI":{"preferred":null,"all":["0000 0000 9288 3664"]},"Wikidata":{"preferred":null,"all":["Q18125957"]},"GRID":{"preferred":"grid.464869.1","all":"grid.464869.1"}}}},{"substring":"Indian Institute of Science","score":0.84,"matching_type":"PHRASE","chosen":false,"organization":{"id":"https://ror.org/04dese585","name":"Indian Institute of Science 
Bangalore","email_address":null,"ip_addresses":[],"established":1909,"types":["Education"],"relationships":[],"addresses":[{"lat":13.021275,"lng":77.565769,"state":null,"state_code":null,"city":"Bengaluru","geonames_city":{"id":1277333,"city":"Bengaluru","geonames_admin1":{"name":"Karnataka","id":1267701,"ascii_name":"Karnataka","code":"IN.19"},"geonames_admin2":{"name":"Bangalore Urban","id":1277331,"ascii_name":"Bangalore Urban","code":"IN.19.572"},"license":{"attribution":"Data from geonames.org under a CC-BY 3.0 license","license":"http://creativecommons.org/licenses/by/3.0/"},"nuts_level1":{"name":null,"code":null},"nuts_level2":{"name":null,"code":null},"nuts_level3":{"name":null,"code":null}},"postcode":null,"primary":false,"line":null,"country_geonames_id":1269750}],"links":["http://www.iisc.ernet.in/"],"aliases":[],"acronyms":["IISc"],"status":"active","wikipedia_url":"http://en.wikipedia.org/wiki/Indian_Institute_of_Science","labels":[{"label":"ఇండియన్ ఇన్ స్టిట్యూట్ ఆఫ్ సైన్స్","iso639":"te"},{"label":"இந்திய அறிவியல் கழகம்","iso639":"ta"},{"label":"ਭਾਰਤੀ ਵਿਗਿਆਨ ਅਦਾਰਾ","iso639":"pa"},{"label":"ഇന്ത്യൻ ഇൻസ്റ്റിറ്റ്യൂട്ട് ഓഫ് സയൻസ്","iso639":"ml"},{"label":"ಭಾರತೀಯ ವಿಜ್ಞಾನ ಸಂಸ್ಥೆ","iso639":"kn"},{"label":"भारतीय विज्ञान संस्थान","iso639":"hi"},{"label":"ભારતીય વિજ્ઞાન સંસ્થા","iso639":"gu"},{"label":"ভারতীয় বিজ্ঞান সংস্থা","iso639":"bn"}],"country":{"country_name":"India","country_code":"IN"},"external_ids":{"ISNI":{"preferred":null,"all":["0000 0001 0482 5067"]},"FundRef":{"preferred":"100007780","all":["100007780","100007871","100008044","100009935"]},"OrgRef":{"preferred":null,"all":["37533"]},"Wikidata":{"preferred":null,"all":["Q948720"]},"GRID":{"preferred":"grid.34980.36","all":"grid.34980.36"}}}},........
++++++++++++++++++++

This makes the cause of the wrong ROR ID clear. The process picked up the first item, Indian Institute of Soil Science, which is the one flagged "chosen":true, even though the intended institute appears as the second item with "chosen":false. In our experience, requiring score=1.0 is a safer condition than chosen==true when extracting ROR IDs through the API (though I am not sure whether you also obtained the ROR IDs through the API or by some other means).
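To illustrate the difference between the two conditions, here is a minimal sketch in Python. It works on an already parsed response and makes no network call. The `sample` dictionary is a trimmed copy of the two items shown above; the function names `pick_by_chosen` and `pick_by_score` are made up for this illustration and are not part of the ROR API or of the dataset's pipeline.

```python
from typing import Optional

# Trimmed copy of the first two items of the response quoted above.
sample = {
    "number_of_results": 10,
    "items": [
        {
            "substring": "Indian Institute of Science",
            "score": 0.92,
            "matching_type": "COMMON TERMS",
            "chosen": True,
            "organization": {
                "id": "https://ror.org/05j873a45",
                "name": "Indian Institute of Soil Science",
            },
        },
        {
            "substring": "Indian Institute of Science",
            "score": 0.84,
            "matching_type": "PHRASE",
            "chosen": False,
            "organization": {
                "id": "https://ror.org/04dese585",
                "name": "Indian Institute of Science Bangalore",
            },
        },
    ],
}


def pick_by_chosen(response: dict) -> Optional[str]:
    """Return the ROR ID of the first item flagged as chosen, if any."""
    for item in response.get("items", []):
        if item.get("chosen"):
            return item["organization"]["id"]
    return None


def pick_by_score(response: dict, min_score: float = 1.0) -> Optional[str]:
    """Return the ROR ID of the best item whose score reaches min_score, if any."""
    candidates = [
        item for item in response.get("items", [])
        if item.get("score", 0.0) >= min_score
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda item: item["score"])
    return best["organization"]["id"]


print(pick_by_chosen(sample))  # the Soil Science ID, i.e. the wrong match
print(pick_by_score(sample))   # None: no item reaches score 1.0
```

With the stricter condition this affiliation stays unmatched instead of receiving a wrong ID, so it can be reviewed by hand. The trade-off is lower coverage, since near-perfect matches are dropped as well.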

We initially found 455 India-specific records with wrong ROR IDs among the 8170 records that have ROR IDs, i.e. about 5.6% (out of 10316 records whose affiliated country is India).

I am attaching a CSV file containing these 455 records (the rorORI column is the ROR ID as given in the dataset, and rorOEM is the corrected ROR ID as fetched for our subset of data).

no-match-report.csv
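A rough sketch of how such a report can be produced or re-checked in Python, assuming a CSV with the rorORI and rorOEM columns described above. The `affiliation` column name and the two inline sample rows are assumptions for illustration only; the attached file may be laid out differently.

```python
import csv
import io

# Two illustrative rows; only the rorORI/rorOEM column names come from the report.
sample_csv = """affiliation,rorORI,rorOEM
"Indian Institute of Science, Bangalore",https://ror.org/05j873a45,https://ror.org/04dese585
Indian Institute of Soil Science,https://ror.org/05j873a45,https://ror.org/05j873a45
"""


def find_mismatches(rows):
    """Keep the rows whose dataset ROR ID differs from the re-fetched one."""
    return [
        row for row in rows
        if row["rorORI"].strip() != row["rorOEM"].strip()
    ]


rows = list(csv.DictReader(io.StringIO(sample_csv)))
mismatches = find_mismatches(rows)
share = len(mismatches) / len(rows)

for row in mismatches:
    print(row["affiliation"], row["rorORI"], "->", row["rorOEM"])
print(f"{len(mismatches)} of {len(rows)} records differ ({share:.1%})")
```

To run it on a real file, replace `io.StringIO(sample_csv)` with an opened file handle, for example `open("no-match-report.csv", newline="", encoding="utf-8")`.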

@psmukhopadhyay psmukhopadhyay changed the title Some wrong ROR ID Some wrong ROR IDs Apr 25, 2021
@ml4rrieu

ml4rrieu commented Apr 2, 2022

I also found some problems with the ROR matching here (Paris, France):
Université de Paris-XII, France, Europe associated to https://ror.org/05f82e368
University of Paris associated to https://ror.org/05f82e368

whereas these are two different universities (there are 13 universities in Paris whose names start with "univ of paris" but end differently).

(Kudos for the tools and data! Really appreciate it.)

@amandafrench

Hi @psmukhopadhyay and @ml4rrieu and @andreaspacher -- apologies for belated commenting and out of the blue tagging! I'm the new Technical Community Manager for ROR, and we're beta testing some improvements to our API's ?affiliation matching parameter that I think would help the issues listed here. Also figured it'd be wise to give users a heads up about the forthcoming changes in any case.

One of many changes is that we've removed a lot of false positives. See for instance the difference between the same search on the production server and the staging server, where we're beta testing the changes:

https://api.ror.org/organizations?affiliation=Indian%20Institute%20of%20Science%2C%20Bangalore

https://api.staging.ror.org/organizations?affiliation=Indian%20Institute%20of%20Science%2C%20Bangalore

The request for feedback and link to more documentation and examples is at https://github.com/ror-community/ror-roadmap/discussions/77 -- let us know what you think!

@psmukhopadhyay
Author

psmukhopadhyay commented Sep 8, 2022 via email
