Tracking papers using this code and the diversity statement #11

Open
dalejn opened this issue Jun 16, 2020 · 4 comments

Comments

@dalejn
Owner

dalejn commented Jun 16, 2020

Develop automated methods for collating papers that have used the tool and the diversity statement (Python with the Crossref API, using the Zenodo DOI https://zenodo.org/record/3672110).

We plan to include an analysis of the papers that use the code/diversity statement, comparing their citation balance to a random selection of similar papers that do not.

@koudyk
Contributor

koudyk commented Jun 16, 2020

I'm interested in collaborating on this issue!

@koudyk
Contributor

koudyk commented Jun 18, 2020

I thought I'd give you an update on what I've done so far, @jastiso and @dalejn.

Sorry this is so long... too tired to make it shorter 😴

exploring Crossref

I was looking on the Crossref website, and it says "Our public APIs include Cited-by counts but not the actual works." So I don't think Crossref will be the best tool for this task (unless you know of a feature I haven't found; let me know if so!)
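For reference, here is a minimal Python sketch of what the Crossref lookup gives us: the public REST API's `/works/{doi}` record includes an `is-referenced-by-count` field (the Cited-by count) but no list of citing works. The function names and the contact address in the `User-Agent` header are placeholders. This assumes the DOI is registered with Crossref; as far as I know Zenodo DOIs are minted through DataCite, so this would likely only work for the paper's DOI, not the code's.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"
# Placeholder contact address; Crossref asks API clients to identify themselves.
USER_AGENT = "citation-tracker-sketch/0.1 (mailto:you@example.org)"


def fetch_work(doi: str, timeout: float = 30.0) -> dict:
    """Fetch the Crossref metadata record for one DOI (makes a network call)."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi)  # default safe="/" keeps the DOI's slash
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def cited_by_count(record: dict) -> int:
    """Return the Cited-by count from a /works/{doi} response body.

    Crossref exposes only this number, not the citing works themselves.
    """
    return int(record["message"].get("is-referenced-by-count", 0))
```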

exploring other sources

To get an idea of what kinds of results to expect, I searched for the paper and code DOIs on Google Scholar and OpenCitations. Here are the numbers of results, each linked to the URL that I used to search:

          Google Scholar   opencitations.net
paper     13               4
code      4                0

manually getting the citing papers to start

Since there weren't many results, I went through them manually and copied their Diversity Statements (and other data) into this spreadsheet.

observations about the citing papers

  • 10/13 citing articles were preprints
    • 7 are on arXiv, and I'm pretty sure they don't have DOIs, so they might not be findable by a tool that relies on DOIs
  • 10/13 articles had Danielle Bassett as an author

visualizing whether papers have statements, and what they cite

I thought it might be useful to see whether articles cite the paper and/or the code, and whether they all have diversity statements, so I made this figure (code). I thought it might help you decide how to search for papers. E.g., one paper with a diversity statement cites the paper, but not the code.
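To tally the regions of a figure like this programmatically, one option is to count how many papers fall into each combination of flags. A small Python sketch, assuming each paper is recorded as a dict of booleans with the hypothetical keys `cites_paper`, `cites_code`, and `has_statement` (for example, one spreadsheet row per paper):

```python
from collections import Counter

# Hypothetical per-paper flags (e.g. columns of the spreadsheet).
FLAGS = ("cites_paper", "cites_code", "has_statement")


def region(flags: dict[str, bool]) -> str:
    """Name the Venn region for one paper, e.g. 'cites_paper+has_statement'."""
    present = [name for name in FLAGS if flags.get(name, False)]
    return "+".join(present) if present else "none"


def venn_counts(papers: dict[str, dict[str, bool]]) -> Counter:
    """Count papers per Venn region; the keys of `papers` are paper identifiers."""
    return Counter(region(flags) for flags in papers.values())
```

The example in the text (a paper with a diversity statement that cites the paper but not the code) would land in the `cites_paper+has_statement` region.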

visualizing the percentages listed in citing papers' diversity statements

And I figured I might as well do a visualization (code) of the percentages reported in the diversity statements, since I had manually copied and pasted all the diversity statements into a spreadsheet.

next steps

Next, I'll try to find a way to automate finding papers that cite the DOI. I know automation was supposed to be the goal, but I did this manual work first to figure out what results to expect.
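One possible way to automate this, sketched in Python under some assumptions: OpenCitations' COCI REST API has a `citations/{doi}` operation that returns a JSON list of records whose `citing` field holds the DOI of each citing work. The endpoint below is the v1 COCI URL as I understand it; OpenCitations has since introduced other index APIs with different URLs and formats, so both should be checked before use. Because COCI is built from Crossref's open reference data, preprints without DOIs (such as the arXiv papers noted above) would still be missed.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: OpenCitations COCI REST API, v1.
COCI_CITATIONS = "https://opencitations.net/index/coci/api/v1/citations/"


def fetch_citations(doi: str, timeout: float = 30.0) -> list[dict]:
    """Fetch the COCI citation records pointing at `doi` (makes a network call)."""
    url = COCI_CITATIONS + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def citing_dois(records: list[dict]) -> list[str]:
    """Unique citing DOIs, lowercased, in first-seen order (DOIs are case-insensitive)."""
    seen: dict[str, None] = {}
    for record in records:
        doi = record.get("citing", "").strip().lower()
        if doi:
            seen.setdefault(doi, None)
    return list(seen)
```

Usage would be `citing_dois(fetch_citations(doi))` for each DOI of interest (paper and code), with the two lists merged afterwards.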

@dalejn
Owner Author

dalejn commented Jun 18, 2020

Wow, thank you for working on this, @koudyk. This is awesome! That's a great point to think about in your Venn diagram. Nothing comes immediately to mind, but I wonder if there's a way to automatically search for the boilerplate text of the Diversity Statement itself. That might require a full-text search when the papers are deposited into something like PubMed. Or maybe it's possible to perform through the pre-print servers' PDFs?
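A rough Python sketch of the boilerplate search, assuming the full text has already been extracted from the PDF by some other tool. PDF extraction tends to split words across lines and to emit ligature characters, so both strings are normalized before a plain substring test. The phrase to search for is a placeholder here rather than the statement's actual wording. One caveat: the hyphen-rejoining step also merges genuine hyphenated compounds that break across a line, which could cause a miss if the phrase itself contains a hyphenated word.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Normalize extracted PDF text for loose phrase matching."""
    # NFKC maps ligatures such as "\ufb01" (fi) to plain letters.
    text = unicodedata.normalize("NFKC", text)
    # Rejoin words hyphenated across a line break: "Diver-\nsity" -> "Diversity".
    text = re.sub(r"(\w)-[ \t]*\n\s*(\w)", r"\1\2", text)
    # Collapse remaining whitespace (including newlines) and ignore case.
    text = re.sub(r"\s+", " ", text)
    return text.strip().lower()


def contains_phrase(full_text: str, phrase: str) -> bool:
    """True if the normalized phrase occurs in the normalized full text."""
    return normalize(phrase) in normalize(full_text)
```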

For the last visualization on percentages listed in citing papers' diversity statements, could you change the graph to depict the percent over/under-citation compared to expected benchmarks? The expected benchmarks reported by Dworkin et al. are 6.7% for woman (first)/woman (last), 9.4% for man/woman, 25.5% for woman/man, and 58.4% for man/man. Also, could the predicted gender categories be labeled man/woman rather than male/female? We make this distinction to note that our analysis does not consider sex, which the terms male/female imply.
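For the relative version of the figure, one way to express over/under-citation is the difference between observed and expected proportions, relative to the expected proportion: (observed - expected) / expected × 100. Then 0% means citing at the benchmark rate and negative values mean under-citation. I believe this matches how Dworkin et al. report it, but that is an assumption worth checking against the paper. A Python sketch using the benchmarks above (keys are first/last author categories):

```python
# Expected benchmarks from Dworkin et al., in percent (first/last author).
EXPECTED = {
    "woman/woman": 6.7,
    "man/woman": 9.4,
    "woman/man": 25.5,
    "man/man": 58.4,
}


def relative_citation(observed: dict[str, float],
                      expected: dict[str, float] = EXPECTED) -> dict[str, float]:
    """Percent over (+) or under (-) citation of each category vs. its benchmark."""
    return {
        category: 100.0 * (observed[category] - benchmark) / benchmark
        for category, benchmark in expected.items()
    }
```

For example, a statement reporting 13.4% woman/woman citations would come out at +100%, i.e. twice the 6.7% benchmark.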

Another interesting thing we could look at is how much impact using the statement and tools had on men compared to women. I wonder if we could port some of the code from cleanBib.ipynb to analyze the predicted gender of the first and last authors of the citing papers your code collects. Then, we could make a figure similar to Fig 3 of the Dworkin et al. paper.
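A simplified Python sketch of that tally, assuming the gender predictions already exist as hard labels; cleanBib.ipynb's own approach may differ (for example, by working with prediction probabilities), so this is not a port of that notebook. Each citing paper contributes one (first author, last author) pair, and pairs with a missing or unrecognized label are skipped. The resulting percentages could then be compared with the benchmarks.

```python
from collections import Counter
from typing import Iterable, Optional

# First/last author categories, in the order used for the benchmarks.
CATEGORIES = ("woman/woman", "man/woman", "woman/man", "man/man")


def category_percentages(
    pairs: Iterable[tuple[Optional[str], Optional[str]]]
) -> dict[str, float]:
    """Percent of papers in each first/last-author predicted-gender category.

    Labels are assumed to be "woman" or "man"; any other value (e.g. None when
    no prediction is available) causes that paper to be skipped.
    """
    counts = Counter()
    for first, last in pairs:
        category = f"{first}/{last}"
        if category in CATEGORIES:
            counts[category] += 1
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in CATEGORIES}
    return {category: 100.0 * counts[category] / total for category in CATEGORIES}
```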

@koudyk
Contributor

koudyk commented Jun 18, 2020

Thanks @dalejn!

"Or maybe it's possible to perform through the pre-print servers' PDFs?"
I've heard that it's hard to get text from PDFs, but I've never tried it! I wonder if it might be easier to wait until more citing papers are published, so that their text is available in a more machine-readable format.

Re the visualization, here's a figure with relative percentages. Thanks for pointing out the distinction between man/woman and male/female! I changed the figure labels accordingly.

I didn't get around to making separate figures according to the predicted gender of the authors of citing papers. It would be very interesting, and I'd be down to revisit it, maybe once more citing papers are published.
