How many times does the term "github.com" appear in the EPMC article database #1

Open
KirstieJane opened this issue Nov 29, 2017 · 3 comments

@KirstieJane
Member

I think @Islast has a getpapers (by ContentMine) implementation going.

@KirstieJane KirstieJane added this to Backlog in Code cite via automation Nov 29, 2017
@Islast
Collaborator

Islast commented Nov 29, 2017

Hi! I have been using the getpapers tool:

getpapers --query 'github.com' -o githubdotcom
returned 8845 unique papers

getpapers --query 'GitHub' -x -o GitHub
returned 2872 unique papers and 2840 full-text XMLs

@KirstieJane KirstieJane moved this from Backlog to Doing in Code cite Nov 29, 2017
@Islast
Collaborator

Islast commented Nov 29, 2017

I'm having a little trouble finding complete documentation of the EuPMC query format, but it seems that '*github*' is the best search term, as in
getpapers --query '*github*' -o github
This returns 10971 results.
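For comparing search terms without downloading anything, a hit count can also be requested from the Europe PMC REST search service directly. The following is only a rough sketch, not what getpapers does internally: the helper names (`build_count_url`, `parse_hit_count`, `count_hits`) are made up for illustration, and it assumes the JSON response carries a top-level `hitCount` field. Counts from a live query today will not match the 2017 numbers in this thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Europe PMC RESTful search endpoint.
EPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_count_url(query: str) -> str:
    """Build a search URL that asks for a single ID, since only the count matters."""
    params = {
        "query": query,
        "format": "json",
        "resultType": "idlist",  # smallest result type; we ignore the results anyway
        "pageSize": 1,
    }
    return EPMC_SEARCH + "?" + urlencode(params)


def parse_hit_count(payload: str) -> int:
    """Extract the total number of matches from a JSON search response."""
    # Assumption: the response has a top-level "hitCount" field.
    return int(json.loads(payload)["hitCount"])


def count_hits(query: str, timeout: float = 30.0) -> int:
    """Query Europe PMC over the network and return the hit count for `query`."""
    with urlopen(build_count_url(query), timeout=timeout) as resp:
        return parse_hit_count(resp.read().decode("utf-8"))
```

Calling `count_hits` once each for `github.com`, `GitHub` and `*github*` would give a quick side-by-side of the three search terms tried above.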

@Islast
Collaborator

Islast commented Nov 29, 2017

The query
getpapers --query '' -o all
generates 1791432 results, although it would take 2 hours to download the metadata.
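To put the numbers reported in this thread side by side, here is a small back-of-the-envelope calculation. It assumes that the 1791432 figure is the right denominator for the 10971 '*github*' hits, which the thread does not confirm, and that the 2 hour estimate covers the full metadata download. The variable and function names are illustrative only.

```python
# Figures quoted in the comments above.
GITHUB_HITS = 10971             # results for '*github*'
ALL_HITS = 1791432              # results for the "all" query
DOWNLOAD_SECONDS = 2 * 60 * 60  # the estimated 2 hours, in seconds


def share(part: int, whole: int) -> float:
    """Fraction of `whole` accounted for by `part`."""
    return part / whole


def records_per_second(total: int, seconds: int) -> float:
    """Average download rate implied by a total record count and a duration."""
    return total / seconds


github_share = share(GITHUB_HITS, ALL_HITS)
rate = records_per_second(ALL_HITS, DOWNLOAD_SECONDS)

print(f"Share of results matching '*github*': {github_share:.2%}")
print(f"Implied metadata download rate: {rate:.0f} records/s")
```

Under those assumptions, well under 1% of the results mention GitHub, and the estimated download time corresponds to a few hundred records per second.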
