This repository has been archived by the owner on May 29, 2018. It is now read-only.
cranmer edited this page Oct 9, 2014 · 4 revisions

What is this project?

This project aims to develop tools to answer a variety of questions about the usage of scientific software. A few motivating examples include:

  • What software is out there?
  • How healthy is any given software project?
  • Which projects are used frequently, possibly conditioned on a certain domain (e.g., astronomy or biology)?
  • What code should I use to solve problem X?
  • What is the dependency structure of a group of software projects?

Applications and use cases

Target applications include:

  • Health monitoring of software: how "alive" is a given project?
  • Trend analysis: how does a project's usage change over time?
  • Search and recommendation: what's out there? What should I use? What do other people use?
  • Citation help: a tool that produces a recommended citation for the software someone is using (possibly including its major dependencies)

Implementation, information sources

As a first pass, we can leverage existing sources of information:

  • Software distributions (Ubuntu, Anaconda, raw source code) to extract the dependency structure between projects
  • GitHub (and its API) to monitor analytics (e.g., downloads) and forking structure
  • arXiv papers, to monitor citation and publications
    • We may be able to go via DataCite or an equivalent for DOIs
    • Check what impactstory.org is doing
    • ORCID may also help
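As a rough illustration of the GitHub item above, the following sketch reduces the JSON returned by GitHub's repository endpoint (`GET /repos/{owner}/{repo}`) to a few activity indicators. The function names and the choice of indicators (forks, stars, days since the last push) are assumptions made for this example, not a decided design; they stand in as simple proxies for "analytics" and do not cover download counts.

```python
import json
from datetime import datetime, timezone
from urllib.request import urlopen

API_ROOT = "https://api.github.com"


def repo_url(owner, repo):
    """Build the REST URL for a repository's metadata."""
    return f"{API_ROOT}/repos/{owner}/{repo}"


def fetch_repo(owner, repo):
    """Fetch repository metadata (network call; unauthenticated requests are rate limited)."""
    with urlopen(repo_url(owner, repo)) as response:
        return json.load(response)


def summarize(payload, now=None):
    """Reduce a repository payload to a few hypothetical health indicators."""
    now = now or datetime.now(timezone.utc)
    # GitHub reports timestamps in ISO 8601 form, e.g. "2014-10-09T12:00:00Z".
    pushed = datetime.strptime(
        payload["pushed_at"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return {
        "forks": payload["forks_count"],
        "stars": payload["stargazers_count"],
        "days_since_push": (now - pushed).days,
    }
```

Calling `summarize(fetch_repo(owner, repo))` at regular intervals and storing the results would give a simple time series for the trend analysis mentioned earlier.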

Note: the dependency structure is likely to be crucial here, due to academics' tendency to incompletely cite individual packages. (Everyone uses BLAS, but almost nobody cites it directly.)
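To make the note above concrete, here is a minimal sketch of how a dependency graph could be used to credit packages that are used but rarely cited. It assumes the graph is available as a plain dictionary mapping each package to its direct dependencies; the function names, the toy graph, and the citation counts are all invented for illustration.

```python
from collections import deque


def transitive_dependencies(deps, package):
    """Return every package reachable from `package` via dependency edges."""
    seen = set()
    queue = deque(deps.get(package, ()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(deps.get(current, ()))
    return seen


def implied_usage(deps, direct_citations):
    """Credit each citation to the cited package and to everything it depends on."""
    counts = {}
    for package, n in direct_citations.items():
        reached = transitive_dependencies(deps, package) | {package}
        for name in reached:
            counts[name] = counts.get(name, 0) + n
    return counts


# Hypothetical toy data: package -> direct dependencies, and direct citation counts.
DEPS = {
    "astropy": ["numpy", "scipy"],
    "scipy": ["numpy", "blas"],
    "numpy": ["blas"],
}
CITATIONS = {"astropy": 10, "scipy": 5}

print(implied_usage(DEPS, CITATIONS))
```

In this toy example BLAS is never cited directly, yet it receives credit from every citation of a package that depends on it. Giving full credit to every dependency is only one possible weighting; a real tool might discount credit with distance in the graph.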

Points for discussion

  • Dead links: what happens when graduate students graduate? (could monitor this specifically)
  • Identifying citations and URLs that correspond to software
  • Entity disambiguation
  • Should we include data sets, or limit attention purely to software?
  • Beyond arXiv: how to deal with other domains?
  • How do we deal with poor citation quality? DOIs tend to be the first thing cut when space is tight. Can we motivate publication venues to encourage proper software citation practices?
  • Note the new "Software Discoverability Index", which offers some thoughtful critique: http://t.co/4NfXnYX6i5
  • We should offer an API for others as well
  • Do we let users claim contributions to a project?
  • Can we help users by pointing them to a good place to seek support (if it's not obvious)?