Aggregate project citations from maintainer-curated documents #186

Open
Peter-Metz opened this issue Aug 5, 2020 · 10 comments
Labels
enhancement New feature or request

Comments

@Peter-Metz
Collaborator

This issue builds on the discussion started in #181 about aggregating PSL project citations.

A brief overview of what @rickecon and @jdebacker have suggested so far:

  • Each project curates a living document that lists places where the model is cited. @rickecon suggested that this doc could be a markdown file that pulls relevant citations from a separate, potentially more comprehensive, references.bib file. Alternatively, this document could be a citations.bib file with just the relevant citations.
  • @jdebacker suggested creating a GH action in the PSL-Infrastructure repo that periodically collects the separate documents, aggregates them into a single file, and publishes it on pslmodels.org
@chusloj
Contributor

chusloj commented Aug 7, 2020

@MattHJensen and I have curated a list of citations for (mostly) Tax Calculator in Zotero. Following suggestions from @rickecon and @jdebacker, I will take a first stab at creating a references.bib file for Tax Calculator that regularly exports citations from Zotero in .bib format using a GH action.

@chusloj
Contributor

chusloj commented Aug 19, 2020

@rickecon – Just to clarify, in #181, were you suggesting that the markdown file with each project's own citations (other people citing the project) is manually edited, or automatically updated every time the references.bib file was updated?

@jdebacker
Member

jdebacker commented Aug 19, 2020

FYI @chusloj I've put two .bib files into the OG-USA repo:

  1. references.bib -- this has references that the OG-USA docs cite
  2. citations.bib -- this has references of places where the OG-USA model was cited/referenced

You can see the References page here and the Citations page here (note that the citations are incomplete - I was just testing this way of doing things).

I see the following advantages of separate files for references and citations:

  1. Easily identifies the differences between the two types of references.
  2. PSL-Infrastructure can read the citations.bib files directly (rather than having to scrape markdown files) and then use BibTeX (as available in Jupyter-Book) to format the citations as the PSL-Infrastructure project deems fit (as opposed to scraping markdown where references may have a different format than what PSL-Infrastructure would like to present).
  3. If we place the citations.bib files in the top-level directory of each PSL repo (as we have done in OG-USA), that makes it easy to create a script to find these files. You could put a references.bib in the top directory, but those files contain lots of references that are not as relevant to others looking through the repo. My opinion is that I'd rather give prominence to the list of places where the model has been used than to the list of references upon which the model draws. The latter would be in the repo, but in subdirectories for documentation.

The drawback I see to separate references files is that there may be some duplication across files -- e.g., some citations of where OG-USA is used are to academic papers that we also reference as places to look for further detail on the theory underlying the model, so these references are repeated in references.bib and citations.bib.

I think this can work well, but will be interested in what you and @rickecon think about its utility and implementability.
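
The script mentioned in point 3 could look roughly like the following. This is only a minimal sketch: it assumes every repo keeps citations.bib at the top level of its master branch and that the raw file is served from raw.githubusercontent.com. The repo list and the function names are hypothetical, not part of any existing PSL tooling.

```python
from urllib.request import urlopen

# Hypothetical list of PSL repos to scan; adjust to the actual catalog.
REPOS = ["Tax-Calculator", "OG-USA", "PCI-China"]


def citations_url(repo, org="PSLmodels", branch="master"):
    """Build the raw-file URL for a repo's top-level citations.bib."""
    return f"https://raw.githubusercontent.com/{org}/{repo}/{branch}/citations.bib"


def fetch_citations(repo):
    """Download citations.bib for one repo; return None if it cannot be fetched."""
    try:
        with urlopen(citations_url(repo), timeout=30) as resp:
            return resp.read().decode("utf-8")
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return None
```

Calling `fetch_citations(repo)` for each name in `REPOS` would give the raw BibTeX text per project, with `None` marking repos that have not added the file yet.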

@chusloj
Contributor

chusloj commented Aug 19, 2020

@jdebacker Thank you for this comment. How would you auto-update the citations.bib file? I'm trying to pin down an automated way to do so.

@jdebacker
Member

jdebacker commented Aug 19, 2020

@chusloj asks:

How would you auto-update the citations.bib file? I'm trying to pin down an automated way to do so.

The citations.bib file in each repo would be the responsibility of the maintainers of that repo to keep up to date.

PSL-Infrastructure would have a file similar to citations.md in the OG-USA repo. It could look something like:

# Citations and use cases of PSL Models

## Tax-Calculator
```{bibliography} https://github.com/PSLmodels/Tax-Calculator/blob/master/citations.bib
```

## OG-USA
```{bibliography} https://github.com/PSLmodels/OG-USA/blob/master/citations.bib
```

## PCI-China
```{bibliography} https://github.com/PSLmodels/PCI-China/blob/master/citations.bib
```
.
.
.
.

Of course, you could also put the citations on separate pages or insert additional content between the lists of citations.

With a file or files like this, PSL-Infrastructure would have a GH Action that compiles citations.md (or each of several such files) every night, renders it as HTML, and pushes it to the PSL-Infrastructure host (e.g., GH-pages).

I think this should work, but I haven't tried it and may be missing something in these steps.
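
A file like the example above could itself be generated from a list of project names, so that adding a project is a one-line change. The sketch below only writes the text. Like the example, it has not been tested against Jupyter Book, and whether the bibliography directive accepts a remote URL is an open assumption. The project list and the function name are hypothetical.

```python
# Hypothetical project list; the names match the example above.
PROJECTS = ["Tax-Calculator", "OG-USA", "PCI-China"]

# Built at runtime so that this example contains no literal code fence.
FENCE = "`" * 3


def render_citations_md(projects, org="PSLmodels", branch="master"):
    """Return the text of a citations.md with one bibliography block per project."""
    lines = ["# Citations and use cases of PSL Models", ""]
    for name in projects:
        url = f"https://github.com/{org}/{name}/blob/{branch}/citations.bib"
        # Heading, opening directive line, closing fence, blank separator line.
        lines += [f"## {name}", f"{FENCE}{{bibliography}} {url}", FENCE, ""]
    return "\n".join(lines)
```

The nightly job would then write `render_citations_md(PROJECTS)` to citations.md before the build step.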

@chusloj
Contributor

chusloj commented Aug 20, 2020

Here's an idea. A new citations.html page could be created that lists each project as a hyperlink, where the hyperlink redirects a user to the citations page on each project's Jupyter Book documentation site.

To use Tax-Calculator as an example, I'm thinking the following:

  1. Scrape the whole .bib file of citations for Tax-Calculator from Zotero using something similar to the following:
     curl -H 'Zotero-API-Version: 2' -H 'Zotero-API-Key: <key>' 'https://api.zotero.org/users/6708260/items?format=bibtex'
  2. Use Launchd, the same program that @Peter-Metz uses to update the PSL_catalog.json file daily, to scrape this .bib file regularly (probably daily) and push it to the Tax-Calculator Jupyter Book docs. The .bib file renders automatically as a page with formatted citations, as @jdebacker shows in Aggregate project citations from maintainer-curated documents #186.
  3. The GH action which builds the JB docs daily will take care of the rest.
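
For step 1, the same request can be made from a plain Python script, which may be easier to call from a scheduled job than curl. This is a rough sketch using only the standard library: the URL and header names are taken from the curl command, while the function names and the output file name are hypothetical. The Zotero API may paginate results, so a full export could need several requests, which this sketch does not handle.

```python
from urllib.request import Request, urlopen

ZOTERO_ITEMS_URL = "https://api.zotero.org/users/{user_id}/items?format=bibtex"


def build_zotero_request(user_id, api_key):
    """Build the same request as the curl command: a BibTeX export of a user's items."""
    return Request(
        ZOTERO_ITEMS_URL.format(user_id=user_id),
        headers={"Zotero-API-Version": "2", "Zotero-API-Key": api_key},
    )


def download_bibtex(user_id, api_key, out_path="citations.bib"):
    """Fetch the export and write it to out_path (a hypothetical file name)."""
    with urlopen(build_zotero_request(user_id, api_key), timeout=30) as resp:
        text = resp.read().decode("utf-8")
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(text)
    return out_path
```

In a scheduled job the key would come from a secret or environment variable rather than being written into the script.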

Alternatively, a new "citations" link under each project on the catalog page could be created which re-directs a user to that project's citations on its Jupyter Books documentation site.

I've tried looking around available GitHub Actions for the ability to download files via curl, but nothing panned out. I'm not well-versed in GH Actions so I welcome any suggestions for GH actions that fit this use case.

@MattHJensen

@Peter-Metz
Collaborator Author

@chusloj that workflow for Tax-Calculator sounds promising -- it might be worth opening an issue in the tax-calc repo.

In my view, the downside of listing hyperlinks is that it would require projects to create docs websites, and that's not a requirement for PSL inclusion. Also, it would generally be useful to collect citations in a single document as @jdebacker suggested. I'd be very happy to participate in the development of a tool that does this.

@chusloj
Contributor

chusloj commented Aug 21, 2020

@Peter-Metz Thanks for your input. Aside from creating a Jupyter Books page that can automatically format citations, we can make a markdown file that cites each of the references in a .bib file and uses pandoc to auto-format the citations, but that markdown file would have to be manually updated each time a new reference is added. The new development tool you suggest could use something such as this to automatically write markdown files.
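
As a rough illustration of writing that markdown file automatically, the sketch below pulls the entry keys out of a .bib file and emits one pandoc-style `[@key]` citation per entry. The regular expression is a simplification that assumes conventionally formatted entries, and the function names are hypothetical.

```python
import re

# Captures the entry type and key from text such as "@article{smith2020,".
# A simplification: it takes whatever precedes the first comma as the key.
ENTRY_KEY = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")

# BibTeX block types that are not citable entries.
SKIP_TYPES = {"comment", "string", "preamble"}


def bib_keys(bib_text):
    """Return the citation keys found in bib_text, in file order."""
    return [
        key
        for kind, key in ENTRY_KEY.findall(bib_text)
        if kind.lower() not in SKIP_TYPES
    ]


def citations_markdown(bib_text, title="Citations"):
    """Write a markdown list that cites every entry using pandoc's [@key] syntax."""
    lines = [f"# {title}", ""]
    lines += [f"- [@{key}]" for key in bib_keys(bib_text)]
    return "\n".join(lines) + "\n"
```

The resulting file could then be passed to a recent pandoc with `--citeproc --bibliography=citations.bib` to format the citations.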

@chusloj
Contributor

chusloj commented Aug 21, 2020

The discussion for Tax-Calculator citations specifically has been continued at Tax-Calculator#2470.

@MattHJensen
Contributor

In #186 (comment), @jdebacker suggested listing citations by project.

That raises the question, how would works that rely on several projects be included? For example, most projects using OG-USA also rely on Tax-Calculator and TaxData. Would such a work appear three times in the PSL-infrastructure citations document?

Perhaps that's the best place to start while projects are putting together these citations docs initially, but over time we may want to adopt some SHOULD style guidelines so that we can easily identify common citations, list each citation once, and include tags or similar for the PSL projects they rely on. E.g., a prettified version of:

WORK CITING PSL BIB INFO [TaxData][Tax-Calculator][OG-USA]
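
One way to get from per-project lists to the tagged form is sketched below. It assumes that projects use the same BibTeX key for the same work, which is exactly the kind of thing a SHOULD-style guideline would have to establish. The function names and the sample keys in the usage note are hypothetical.

```python
from collections import defaultdict


def tag_by_project(keys_by_project):
    """Map each citation key to the projects that list it, in input order."""
    tagged = defaultdict(list)
    for project, keys in keys_by_project.items():
        for key in keys:
            if project not in tagged[key]:  # ignore repeats within one project
                tagged[key].append(project)
    return dict(tagged)


def format_line(key, projects):
    """Render one work followed by a bracketed tag for each project."""
    return f"{key} " + "".join(f"[{p}]" for p in projects)
```

For example, if TaxData, Tax-Calculator, and OG-USA all list a key `a2020`, then `format_line("a2020", tag_by_project(data)["a2020"])` yields a single line carrying all three tags instead of three separate entries.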

Peter-Metz added the enhancement label Feb 28, 2021
4 participants