OpenAlex integration #404

Open
yhan818 opened this issue Nov 20, 2023 · 4 comments

yhan818 commented Nov 20, 2023

Summary

Add OpenAlex support to convert2df(file, dbsource, format), where file is the exported file, dbsource = "openalex", and format is JSON or CSV, so that OpenAlex outputs can be converted to a data frame for Bibliometrix.

Objectives
OpenAlex is the second-largest bibliographic database, with 245 M records, so it makes sense to integrate OpenAlex with data visualization.

The current library has a function, convert2df(file, dbsource, format), that converts export files downloaded from the SCOPUS, Clarivate Analytics WOS, Digital Science Dimensions, PubMed, or Cochrane CDSR websites. OpenAlex output files can be JSON, Excel, or CSV.

@massimoaria
Owner

Bibliometrix 4.1.4 (on GitHub) already supports OpenAlex collections through the official package openalexR.
openalexR downloads collections from OpenAlex through its API. The package includes a function, oa2bibliometrix(), that converts an OpenAlex collection into a bibliometrix data frame.

We were the first to believe in the value of OpenAlex, and in fact we invested many resources in creating the official openalexR library (https://docs.ropensci.org/openalexR/) that links R to the OpenAlex API (https://docs.openalex.org/how-to-use-the-api/api-overview#client-libraries).

Here is an example:

library(openalexR)
#> Thank you for using openalexR!
#> To acknowledge our work, please cite the package by calling `citation("openalexR")`.
#> To suppress this message, add `openalexR.message = suppressed` to your .Renviron file.

works_search <- oa_fetch(
  entity = "works",
  title.search = c("bibliometric analysis", "science mapping"),
  cited_by_count = ">50",
  from_publication_date = "2020-01-01",
  to_publication_date = "2021-12-31",
  options = list(sort = "cited_by_count:desc"),
  verbose = TRUE
)
#> Requesting url: https://api.openalex.org/works?filter=title.search%3Abibliometric%20analysis%7Cscience%20mapping%2Ccited_by_count%3A%3E50%2Cfrom_publication_date%3A2020-01-01%2Cto_publication_date%3A2021-12-31&sort=cited_by_count%3Adesc
#> Getting 1 page of results with a total of 144 records...

M <- oa2bibliometrix(works_search)

library(bibliometrix)
#> Please note that our software is open source and available for use, distributed under the MIT license.
#> When it is used in a publication, we ask that authors properly cite the following reference:
#> 
#> Aria, M. & Cuccurullo, C. (2017) bibliometrix: An R-tool for comprehensive science mapping analysis, 
#>                         Journal of Informetrics, 11(4), pp 959-975, Elsevier.
#> 
#> Failure to properly cite the software is considered a violation of the license.
#>                         
#> For information and bug reports:
#>                         - Take a look at https://www.bibliometrix.org
#>                         - Send an email to info@bibliometrix.org   
#>                         - Write a post on https://github.com/massimoaria/bibliometrix/issues
#>                         
#> Help us to keep Bibliometrix and Biblioshiny free to download and use by contributing with a small donation to support our research team (https://bibliometrix.org/donate.html)
#> 
#>                         
#> To start with the Biblioshiny app, please digit:
#> biblioshiny()

summary(biblioAnalysis(M))
#> 
#> 
#> MAIN INFORMATION ABOUT DATA
#> 
#>  Timespan                              2020 : 2021 
#>  Sources (Journals, Books, etc)        100 
#>  Documents                             144 
#>  Annual Growth Rate %                  -15.38 
#>  Document Average Age                  2.54 
#>  Average citations per doc             105.8 
#>  Average citations per year per doc    30.47 
#>  References                            10166 
#>  
#> DOCUMENT TYPES                     
#>  article      144 
#>  
#> DOCUMENT CONTENTS
#>  Keywords Plus (ID)                    508 
#>  Author's Keywords (DE)                0 
#>  
#> AUTHORS
#>  Authors                               614 
#>  Author Appearances                    674 
#>  Authors of single-authored docs       9 
#>  
#> AUTHORS COLLABORATION
#>  Single-authored docs                  11 
#>  Documents per Author                  0.235 
#>  Co-Authors per Doc                    4.68 
#>  International co-authorships %        50 
#>  
#> 
#> Annual Scientific Production
#> 
#>  Year    Articles
#>     2020       78
#>     2021       66
#> 
#> Annual Percentage Growth Rate -15.38 
#> 
#> 
#> Most Productive Authors
#> 
#>        Authors        Articles    Authors        Articles Fractionalized
#> 1  SATISH KUMAR              9 WALEED M. SWEILEH                    3.00
#> 2  NAVEEN DONTHU             5 SATISH KUMAR                         2.35
#> 3  NITESH PANDEY             5 LENNART ANTE                         1.33
#> 4  AMANDEEP DHIR             4 NAVEEN DONTHU                        1.15
#> 5  DEBIDUTTA PATTNAIK        3 NITESH PANDEY                        1.10
#> 6  WALEED M. SWEILEH         3 AMANDEEP DHIR                        1.08
#> 7  WENG MARC LIM             3 AYYOOB SHARIFI                       1.00
#> 8  XINXIN WANG               3 EMINE CAN‐GÜVEN                      1.00
#> 9  XU ZHANG                  3 FRANCISCO BENITA                     1.00
#> 10 ABSALOM E. EZUGWU         2 HAMID DERVIŞ                         1.00
#> 
#> 
#> Top manuscripts per citations
#> 
#>                                                  Paper                                    DOI   TC TCperYear   NTC
#> 1  NAVEEN DONTHU, 2021, JOURNAL OF BUSINESS RESEARCH            10.1016/j.jbusres.2021.04.070 2093     697.7 19.84
#> 2  SURABHI VERMA, 2020, JOURNAL OF BUSINESS RESEARCH            10.1016/j.jbusres.2020.06.057  503     125.8  4.74
#> 3  NAVEEN DONTHU, 2020, JOURNAL OF BUSINESS RESEARCH            10.1016/j.jbusres.2019.10.039  360      90.0  3.40
#> 4  KIRTI GOYAL, 2020, INTERNATIONAL JOURNAL OF CONSUMER STUDIES 10.1111/ijcs.12605             310      77.5  2.92
#> 5  JOSÉ A. MORAL-MUÑOZ, 2020, PROFESIONAL DE LA INFORMACION     10.3145/epi.2020.ene.03        308      77.0  2.90
#> 6  H. KENT BAKER, 2020, JOURNAL OF BUSINESS RESEARCH            10.1016/j.jbusres.2019.11.025  223      55.8  2.10
#> 7  YUETIAN YU, 2020, ANNALS OF TRANSLATIONAL MEDICINE           10.21037/atm-20-4235           220      55.0  2.07
#> 8  HUALIN XIE, 2020, LAND                                       10.3390/land9010028            203      50.8  1.91
#> 9  SONG XU, 2020, INTERNATIONAL JOURNAL OF PRODUCTION RESEARCH  10.1080/00207543.2020.1717011  192      48.0  1.81
#> 10 LIN ZHAO, 2020, PROCESS SAFETY AND ENVIRONMENTAL PROTECTION  10.1016/j.psep.2019.11.014     190      47.5  1.79
#> 
#> 
#> Corresponding Author's Countries
#> 
#>         Country Articles   Freq SCP MCP MCP_Ratio
#> 1  CHINA              41 0.3154  21  20     0.488
#> 2  SPAIN              14 0.1077  11   3     0.214
#> 3  INDIA              10 0.0769   5   5     0.500
#> 4  ITALY               6 0.0462   3   3     0.500
#> 5  FINLAND             5 0.0385   1   4     0.800
#> 6  USA                 5 0.0385   1   4     0.800
#> 7  GEORGIA             4 0.0308   0   4     1.000
#> 8  ECUADOR             3 0.0231   0   3     1.000
#> 9  SLOVENIA            3 0.0231   1   2     0.667
#> 10 SOUTH AFRICA        3 0.0231   1   2     0.667
#> 
#> 
#> SCP: Single Country Publications
#> 
#> MCP: Multiple Country Publications
#> 
#> 
#> Total Citations per Country
#> 
#>            Country      Total Citations Average Article Citations
#> 1  CHINA                           3581                      87.3
#> 2  GEORGIA                         2652                     663.0
#> 3  INDIA                           1017                     101.7
#> 4  SPAIN                            958                      68.4
#> 5  USA                              656                     131.2
#> 6  ITALY                            525                      87.5
#> 7  DENMARK                          503                     503.0
#> 8  FINLAND                          409                      81.8
#> 9  SLOVENIA                         282                      94.0
#> 10 ECUADOR                          266                      88.7
#> 
#> 
#> Most Relevant Sources
#> 
#>                                  Sources        Articles
#> 1  JOURNAL OF BUSINESS RESEARCH                       11
#> 2  SUSTAINABILITY                                     11
#> 3  JOURNAL OF CLEANER PRODUCTION                       8
#> 4  ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH        4
#> 5  TECHNOLOGICAL FORECASTING AND SOCIAL CHANGE         4
#> 6  GLOBALIZATION AND HEALTH                            3
#> 7  SAFETY SCIENCE                                      3
#> 8  ANNALS OF TRANSLATIONAL MEDICINE                    2
#> 9  CHEMOSPHERE                                         2
#> 10 ECOLOGICAL INDICATORS                               2

Created on 2023-11-21 with reprex v2.0.2

For the time being, we have chosen not to integrate OpenAlex formats into the convert2df() function because the explore.openalex.org website is still in beta, so the formats it exports are not yet final.

Also, a collection downloaded through the API with the openalexR library contains a more complete set of metadata than the files exported through the website.

We will certainly add OpenAlex support to convert2df() in the near future.

@polyrobin

I am trying to use Biblioshiny with OpenAlex data. In Biblioshiny we can import a file from openalexR, but there is no information on the format of this file. I have exported a JSON file from openalexR, but that file does not seem to be supported (it gives an error about a magic number or a possibly corrupted file). What format do I need to export, and how do I do this?

@massimoaria
Owner

massimoaria commented Jan 18, 2024

Hi,
Biblioshiny supports OpenAlex data saved as a data.frame in an .RData file.

Here is an example:

library(openalexR)
#> Thank you for using openalexR!
#> To acknowledge our work, please cite the package by calling `citation("openalexR")`.
#> To suppress this message, add `openalexR.message = suppressed` to your .Renviron file.

works_search <- oa_fetch(
  entity = "works",
  title.search = c("bibliometric analysis", "science mapping"),
  cited_by_count = ">50",
  from_publication_date = "2020-01-01",
  to_publication_date = "2021-12-31",
  options = list(sort = "cited_by_count:desc"),
  verbose = TRUE
)
#> Requesting url: https://api.openalex.org/works?filter=title.search%3Abibliometric%20analysis%7Cscience%20mapping%2Ccited_by_count%3A%3E50%2Cfrom_publication_date%3A2020-01-01%2Cto_publication_date%3A2021-12-31&sort=cited_by_count%3Adesc
#> Getting 1 page of results with a total of 144 records...

M <- oa2bibliometrix(works_search)

save(M, file="openalex_data.rdata")

Then load the file "openalex_data.rdata" in Biblioshiny using the "load bibliometrix file" option in the Data menu.
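
Before uploading, it may help to confirm that the saved file really is an .RData file containing a data frame. The following is only a minimal sketch in base R: the one-row M is a made-up stand-in for the output of oa2bibliometrix(), and the file names in tempdir() are illustrative. It also shows that load() rejects a JSON file, which is a plausible source of the "magic number" error reported above, although the thread does not confirm that.

```r
# Stand-in for the data frame returned by oa2bibliometrix();
# the AU, TI and PY values are made up for illustration.
M <- data.frame(AU = "DOE J", TI = "AN EXAMPLE TITLE", PY = 2021,
                stringsAsFactors = FALSE)

# Save the data frame the same way as in the example above.
rdata_path <- file.path(tempdir(), "openalex_data.rdata")
save(M, file = rdata_path)

# Load into a separate environment so the check does not
# overwrite objects in the current workspace.
check_env <- new.env()
loaded_names <- load(rdata_path, envir = check_env)
stopifnot(identical(loaded_names, "M"), is.data.frame(check_env$M))

# A JSON file is plain text, not an R data file, so load() fails on it.
json_path <- file.path(tempdir(), "openalex_data.json")
writeLines('{"id": "W0000000000"}', json_path)
json_result <- tryCatch(
  suppressWarnings(load(json_path, envir = new.env())),
  error = function(e) conditionMessage(e)
)
print(json_result)
```

If the first check passes, the .rdata file holds a single data frame named M, which is what the save() call above produces.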

massimoaria added a commit that referenced this issue Feb 23, 2024
@massimoaria
Owner

Hi,
We have added full support for the OpenAlex CSV file format.

The function convert2df() now accepts dbsource = "openalex" and format = "csv" to import and convert OpenAlex CSV files.
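
A call using these arguments might look like the sketch below. The file name "openalex_export.csv" and the wrapper import_openalex_csv() are hypothetical, introduced only for illustration; it also assumes an installed bibliometrix build that includes the commit referenced above, whose exact version number is not stated in this thread.

```r
# Hypothetical helper: checks its prerequisites, then calls convert2df()
# with the dbsource and format values given in the comment above.
import_openalex_csv <- function(path) {
  if (!requireNamespace("bibliometrix", quietly = TRUE)) {
    stop("Package 'bibliometrix' is required; install it first.")
  }
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  bibliometrix::convert2df(file = path, dbsource = "openalex", format = "csv")
}

# "openalex_export.csv" is a placeholder; point it at your own OpenAlex export.
csv_path <- "openalex_export.csv"
if (file.exists(csv_path)) {
  M <- import_openalex_csv(csv_path)
  summary(bibliometrix::biblioAnalysis(M))
}
```

The guard around the final call lets the script run without error when the placeholder file is absent.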
