Backslashes mixed with RI #68

Open
birderboone opened this issue Feb 19, 2019 · 5 comments

Comments

@birderboone
Contributor

This is a reminder to myself. In the example of 'IB/USP, Genetica e Biologia/G-1755-2017' in RI, the WOK for some reason stores the slash as part of the name and also uses it as a separator. This will break authors_parse because it doesn't know how to handle the output. At this point I don't know a way around this, so a skip will have to be introduced.

@embruna
Collaborator

embruna commented Feb 25, 2019

Can you make a specific rule for this case?

@birderboone
Contributor Author

birderboone commented Feb 28, 2019

In my workflow I fixed this instance by hand, since 'IB/USP' is the only group I've found so far that does this. Perhaps we can have it output an error with the specific line that's causing problems, and the user can fix it manually before rerunning.

It should be noted that the file I'm working from was not downloaded by me. It was handed to me in a different format by a colleague, and I had to convert it into an acceptable format for refsplitr. It's possible that 'IB/USP, Genetica e Biologia/G-1755-2017' is solely a creation of my colleague. The whole issue seems really sloppy from a programmatic perspective, so that might be more likely than assuming WoS programmers thought this was acceptable.
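
A minimal sketch of the error-reporting idea above, written in Python purely for illustration (refsplitr is an R package, and this is not its code). The function name `find_ambiguous_ri_entries` is hypothetical. It assumes that authors within one RI value are separated by ';' and that a well-formed entry contains exactly one '/' between the name and the identifier.

```python
def find_ambiguous_ri_entries(ri_values):
    """Return (record_number, entry) pairs for RI entries with more than one '/'.

    ri_values: list of RI field strings, one per record (assumed layout).
    Record numbers are 1-based so they can be reported back to the user.
    """
    problems = []
    for record_number, ri_field in enumerate(ri_values, start=1):
        # Assumption: ';' separates the authors inside one RI field.
        for entry in ri_field.split(";"):
            entry = entry.strip()
            # More than one '/' means the name itself probably contains a slash.
            if entry.count("/") > 1:
                problems.append((record_number, entry))
    return problems
```

The returned pairs could be printed as a warning so the user knows which records to fix by hand before rerunning.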

@embruna
Collaborator

embruna commented Feb 28, 2019

Woah...I think you just solved the mystery of why three of the records were throwing off the import of WOS records last week. Look at the institutions of the three:

RI
--
IB/USP, Genetica e Biologia/G-1755-2017
IB/USP, Genetica e Biologia/G-1755-2017
MNHN/CNRS/UPMC/IRD, UMR BOREA/B-2312-2012

That's it...it must be because they are using the slashes. So maybe the alternative is to look for slashes in this column at the time of import and convert slashes to dashes?

@birderboone
Contributor Author

birderboone commented Feb 28, 2019

It depends on where the slash is. If it's in the raw WOS file then we can't have the function do it, because the function uses the slash as a separator, so it would have to guess which slash is the separator and which ones are part of the name. That would be particularly confusing in the 3rd example you showed.

I don't have the raw data for my example, so I can't check what it looked like there. See if you can find those references in your original WoS file. If references_read added the slash then it's easily fixable; if it's in the raw data, quoted just like that, then we'll have to make it skip it or come up with a new solution.

For what it's worth, I don't believe references_read ever adds slashes, so I'm assuming it's in the raw data, and hence a weird quirk of WoS.

@birderboone
Contributor Author

Actually no, ';' is counted as the separator for each author and the '/' is assumed to separate the name from the OI. So I can always just have it assume the last entry is the OI, and convert the rest to dashes.
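
The approach described above could be sketched roughly as follows. This is Python for illustration only, not the refsplitr implementation (which is in R), and the function name `split_ri_field` is hypothetical. It assumes ';' separates authors and that everything after the last '/' in an entry is the identifier.

```python
def split_ri_field(ri_field):
    """Split one RI field into (name, identifier) pairs.

    Assumptions: ';' separates authors, and the text after the LAST '/'
    in each entry is the identifier. Any earlier '/' is treated as part
    of the name and converted to '-'.
    """
    pairs = []
    for entry in ri_field.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # skip empty pieces, e.g. from a trailing ';'
        if "/" not in entry:
            pairs.append((entry, None))  # no identifier present
            continue
        # rsplit with maxsplit=1 splits only on the last slash.
        name, identifier = entry.rsplit("/", 1)
        pairs.append((name.strip().replace("/", "-"), identifier.strip()))
    return pairs
```

With this sketch, 'MNHN/CNRS/UPMC/IRD, UMR BOREA/B-2312-2012' would yield the name 'MNHN-CNRS-UPMC-IRD, UMR BOREA' and the identifier 'B-2312-2012'. It would still misparse an entry whose name contains a slash but which has no identifier at all.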
