Tracking download statistics #27

Open
rouson opened this issue Nov 12, 2015 · 12 comments

Comments

@rouson

rouson commented Nov 12, 2015

One thing I think is very important when going after funding is having data on the usage and impact of the software. This could take several forms:

  1. Download statistics.
  2. A list of installations at major sites, e.g., ask CINECA if they would install it for users.
  3. Support letters for grant proposals (if you know users, take the time to get to know them and how FOODIE is helping them, so that you cultivate the relationships required to solicit such letters, and also of course for lots of other reasons).
  4. Citations via Google Scholar or a similar tool. This is where it will be very useful to get an initial conference paper in the pipeline early, if that gets the work into the literature sooner than a journal article. Let's keep an eye out for workshops Jeff hosts next year. He submits the workshops as proposals to larger conferences, so the schedule is only set when the conferences announce acceptance of the proposals. I think his calls for papers usually go out at least 6 months before the conference, with submission deadlines at least 3 months before the conference (these are very rough estimates from memory, so I'm not certain).

Item 1 above is the most challenging and is the subject of the remainder of this post.

One approach is to post our own tar ball with each release; e.g., see "opencoarrays-1.1.1.tar.gz" at https://github.com/sourceryinstitute/opencoarrays/releases/tag/1.1.1. We then use the following external tool to track downloads of that tar ball: http://www.somsubhra.com/github-release-stats/. Enter "sourceryinstitute" and "opencoarrays" into the two fields on that page. Sadly, even this provides only incomplete data: if someone downloads either of the source archives that GitHub automatically posts at the same URL, we get no information about those downloads. I've even thought about renaming our tar ball "download-this-one.tar.gz" or something similar. Very frustrating.
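The release-stats tool works off the counters GitHub keeps for uploaded release assets. A minimal Python sketch of the same idea follows, assuming the JSON layout of GitHub's REST endpoint `GET /repos/{owner}/{repo}/releases` (each release has a `tag_name` and an `assets` list whose entries carry `name` and `download_count`). The function names are hypothetical, and only the first page of up to 100 releases is fetched. The automatically generated source archives are not assets, which is why they never appear in these counts.

```python
"""Sum GitHub release-asset download counts (a sketch).

Assumes the JSON returned by GET /repos/{owner}/{repo}/releases: a list of
releases, each with "tag_name" and an "assets" list whose entries have
"name" and "download_count". Auto-generated source archives are not assets.
"""
import json
import urllib.request


def fetch_releases(owner, repo):
    """Download the release list (first page only, up to 100 releases)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/releases?per_page=100"
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def asset_downloads(releases):
    """Return {tag: {asset_name: download_count}} for every release."""
    return {
        rel["tag_name"]: {a["name"]: a["download_count"] for a in rel.get("assets", [])}
        for rel in releases
    }


def total_downloads(releases):
    """Total downloads summed over all uploaded assets of all releases."""
    return sum(
        count
        for assets in asset_downloads(releases).values()
        for count in assets.values()
    )
```

For example, `total_downloads(fetch_releases("sourceryinstitute", "opencoarrays"))` would return the summed count for every uploaded asset; note that GitHub rate-limits unauthenticated API requests.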

To help make installation easier for users, I also asked Alessandro to develop a Portfile that enables installation via the MacPorts package manager on OS X. That Portfile now exists, but it turns out that MacPorts only reports installations if the user also installs the mpstats port. Arrggghh. We ask people to do so, but we have no control over it, and the corresponding web page shows only one installation: http://stats.macports.neverpanic.de/categories/38/ports/25626#installs_over_time, which I'm almost certain is wrong. Our tar ball has been downloaded from GitHub over 275 times. I find it hard to believe that MacPorts installations would be more than 100 times fewer than tar ball downloads, now that we mention MacPorts prominently in our installation instructions.

I hope people can push the maintainers of GitHub to do a better job of tracking such data, but at least it's good that we have something. That's better than nothing, which is what BitBucket offers.

Also, I have a web developer investigating whether we can use Google Analytics to track clicks on the link to the OpenCoarrays release tar ball from the main page on www.opencoarrays.org, but I don't have an answer on that yet either. I'm really amazed at how difficult it can be to get data that I would expect to be easily accessible. Nonetheless, I believe strongly it's worth the effort.

@milancurcic
Contributor

Thank you for this info, Damian. There exists one (very limited) view into GitHub repo clones and page visitors:

https://github.com/Fortran-FOSS-Programmers/FOODIE/graphs/traffic

Unfortunately, it gives you the number of clones and unique cloners only for the past 2 weeks. I never found a way to see beyond the past 2 weeks, but I suspect that data may be stored for all visits and clones since the repository was created. How to access this information? I don't know.

There is an answer on this Stack Overflow post about how to incorporate Google Analytics into README.md to track page views, which is not quite what we are after, but better than nothing.

http://stackoverflow.com/questions/10056638/how-to-get-github-clone-stats
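Since the traffic page only ever shows a 14-day window, one workaround is to poll the data regularly and keep a local history. A sketch in Python, assuming GitHub's REST endpoint `GET /repos/{owner}/{repo}/traffic/clones` (it needs a token with push access to the repository and returns a `clones` list of per-day records with `timestamp`, `count` and `uniques`). The helper names and the history file are hypothetical.

```python
"""Accumulate GitHub clone statistics beyond the 14-day window (a sketch).

Assumes GET /repos/{owner}/{repo}/traffic/clones, which requires a token with
push access and returns only the last 14 days, shaped like
{"count": N, "uniques": M, "clones": [{"timestamp": ..., "count": ..., "uniques": ...}]}
"""
import json
import os
import urllib.request


def fetch_clones(owner, repo, token):
    """Fetch the current 14-day clone window from the GitHub API."""
    url = f"https://api.github.com/repos/{owner}/{repo}/traffic/clones"
    req = urllib.request.Request(url, headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def merge_clones(history, api_response):
    """Merge per-day records into history, keyed by timestamp.

    A later fetch overwrites an earlier record for the same day, since the
    most recent day in a window may have been incomplete when first seen.
    """
    merged = dict(history)
    for day in api_response.get("clones", []):
        merged[day["timestamp"]] = {"count": day["count"], "uniques": day["uniques"]}
    return merged


def update_history_file(path, api_response):
    """Load the saved history (if any), merge the new window, and save it back."""
    history = {}
    if os.path.exists(path):
        with open(path) as f:
            history = json.load(f)
    history = merge_clones(history, api_response)
    with open(path, "w") as f:
        json.dump(history, f, indent=2, sort_keys=True)
    return history
```

Run daily (e.g., from cron) as `update_history_file("clone_history.json", fetch_clones("Fortran-FOSS-Programmers", "FOODIE", token))`; overlapping windows are harmless because records are keyed by day. Summing the `count` fields gives total clones, but summing `uniques` across days overstates unique cloners, since the same person can clone on several days.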

@rouson
Author

rouson commented Nov 12, 2015

On Nov 12, 2015, at 11:12 AM, Milan Curcic wrote:

> Unfortunately, it gives you the number of clones and unique cloners only for past 2 weeks. I never found a way to see beyond past 2 weeks, but I suspect that data may be stored for all visits and clones since birth of a repository. Accessing this information? I don't know.

Unfortunately, that information is not stored beyond two weeks, or at least it is not accessible to users; I checked with GitHub support on this. The whole situation is very frustrating, and I'm really surprised by how bad it is. I can't imagine submitting a funding proposal that relates to a released software package without being able to cite usage statistics. To be more precise, I'm only using download statistics as a proxy for usage statistics, because the latter would be impossible to determine without spyware. Nonetheless, many reviewers of such a proposal would deem this information indispensable.

> There is an answer on this Stack Overflow post about how to incorporate Google Analytics into README.md to track page views - which is not quite what we are after, but better than nothing.

Thanks for that. I haven't checked the link yet, but I think it would be great to have traffic information accumulated over more than two weeks. Capturing README page views could be a good indicator of overall interest in the software.

@szaghi
Member

szaghi commented Nov 13, 2015

Hi @rouson @milancurcic @zbeekman and All,

thank you for the interesting talk yesterday! I hope to address your suggestions/critiques next week. Here are just some thoughts about funding research.

Unfortunately, I am not allowed to attend any of the interesting conferences on numerical/HPC/scientific-software or similar topics: my research institute (CNR-INSEAN) is in a severe economic crisis (as are Italy and Europe in general since 2008), so I can attend only 1 conference per year, strictly related to naval or maritime topics. However, it would be very welcome if you consider this work worth presenting at a conference. If so, I can help prepare presentation materials (slides, tests, examples, screencasts...) for you. For the funding research I think my help is very small. I can help with some low-level work...
I think that automating the gathering of download statistics is not so impossible; let me think about it for a few days. I am thinking of OpenCoarrays, for example. For @rouson it could be important to capture the number of OpenCoarrays installations made by Arch Linux users like me. This should be possible. Archers can install OpenCoarrays from the AUR, which does not track installations, but AUR packages download, compile and install from sources... in particular, the current AUR package of OpenCoarrays downloads release 1.1.2 directly from GitHub! Because GitHub downloads can be counted, Damian should be able to count the number of Archers installing OpenCoarrays from the AUR. Arch Linux is one of the most popular rolling-release distros. On the other side of the moon, Ubuntu is probably the most popular standard-release distro. I am not sure, but I think it is possible to package an OpenCoarrays deb file that starts by downloading the sources from GitHub (it has been a long time since I stopped using Ubuntu, but I can check).

So, assuming that we can encourage OpenCoarrays and FOODIE installations through distro repositories, I am thinking of something like the following for gathering statistics:

  • count the GitHub release downloads: GitHub seems to have disabled these stats due to performance problems on its servers, but the information is there (at least for 2 weeks) and we can collect it (see the links from @milancurcic and me in previous posts); in particular:
    • we can routinely (daily) gather the release download counts (by adding a scheduled daemon on our servers that downloads the day's stats) and save them locally;
    • use Travis CI to update global (month/year) stats (for example, another scheduled daemon pushes the daily stats file into the master branch, triggering Travis CI to run a script that updates the global stats);
    • put these stats into the project README.

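To make the aggregation step concrete, here is a small Python sketch. It assumes a hypothetical local file to which the scheduled job appends one line per day, `YYYY-MM-DD total`, where `total` is the cumulative release-asset `download_count` read from GitHub that day. Because that counter only grows, downloads in a period are differences between consecutive snapshots, and the script run by Travis CI could turn them into a Markdown table for the README. The function names are illustrative, not an existing tool.

```python
"""Monthly download totals from daily cumulative snapshots (a sketch).

The snapshot format "YYYY-MM-DD total" is a hypothetical choice: "total" is
the cumulative release-asset download_count recorded on that day.
"""


def parse_snapshots(lines):
    """Parse "YYYY-MM-DD total" lines into a date-sorted list of (date, total)."""
    snaps = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        date, total = line.split()
        snaps.append((date, int(total)))
    return sorted(snaps)


def monthly_downloads(snaps):
    """Attribute each increase between consecutive snapshots to the later snapshot's month.

    Downloads that happened before the first snapshot are not counted.
    """
    months = {}  # dicts keep insertion order, so months stay chronological
    for (_, prev_total), (date, total) in zip(snaps, snaps[1:]):
        month = date[:7]  # "YYYY-MM"
        months[month] = months.get(month, 0) + (total - prev_total)
    return months


def readme_table(months):
    """Render the monthly totals as a Markdown table for the README."""
    rows = ["| Month | Downloads |", "|---|---|"]
    rows += [f"| {month} | {count} |" for month, count in months.items()]
    return "\n".join(rows)
```

A CI step could then do `with open("download_stats.txt") as f: print(readme_table(monthly_downloads(parse_snapshots(f))))` and splice the output into the README.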
I won't be sure this is feasible until I try, but I am optimistic. What do you think?

See you soon.

@rouson
Author

rouson commented Nov 15, 2015

These all sound like good ideas. I look forward to hearing what you find out as you work out the solutions.

Damian

@szaghi
Member

szaghi commented Nov 15, 2015

@rouson and all,
I have done some experiments (very quickly, not thoroughly). I can do what I described before only with release assets, not with the release itself. In other words, GitHub lets me count the downloads of your assets (compiled binaries, for example), but not of the release itself. Very frustrating! For OpenCoarrays I see you provide an asset, so next week I will try to create a PR to count your downloads.

See you soon.

@rouson
Copy link
Author

rouson commented Nov 16, 2015

Thanks for investigating this.

D

@zbeekman
Member

I just want to add two resources I found:

  1. Homebrew (OS X) package manager will make download information public if you ask them to. For example here are the analytics for a2ps: https://bintray.com/homebrew/bottles/a2ps/view#statistics
  2. Bitdeli has some interesting analytics, and you can embed them in the readme, but it seems they may be having some technical problems at the moment...

@rouson
Author

rouson commented Nov 23, 2015

On Nov 23, 2015, at 10:31 AM, Izaak Beekman wrote:

> Homebrew (OS X) package manager will make download information public if you ask them to. For example here are the analytics for a2ps: https://bintray.com/homebrew/bottles/a2ps/view#statistics

Awesome! Please make the request on behalf of the OpenCoarrays team too, if you don't mind.

> Bitdeli (https://bitdeli.com/bitdeli-demo) has some interesting analytics, and you can embed them in the readme, but it seems they may be having some technical problems at the moment…

Same as above. If you'd like to add this to the OpenCoarrays README too, please do: clone the OpenCoarrays repository and submit a pull request. Either Alessandro or I will review and accept the request, and I'll add you to the contributors list on opencoarrays.org.

Damian

@szaghi
Member

szaghi commented Nov 24, 2015

@zbeekman

Thank you Zaak, your help is always valuable! I tried Bitdeli in the past (maybe 1 year ago), but it always had some problems... If I remember correctly, they were due to some GitHub API changes.

I had completely forgotten about Homebrew installs! Feel free to add a Homebrew formula; your help is very much appreciated. I am a poor Arch Linux user... never coming close to a shiny Mac :-)

Let me know how I can help you make FOODIE Homebrew-enabled.

P.S. to all: I am quite busy these days, sorry for my silence... I am trying to complete a disaster recovery: the data on a cluster and on my own backup were lost one after the other within a few weeks, sigh! I am trying ddrescue & Co.; any suggestions are welcome :-(

@zbeekman
Member

@szaghi I will try to write & submit a formula for FOODIE. Before I do that, however, I need to update the FoBiS.py and FORD formulae, and write an OpenCoarrays formula. I hope I'll get to FOODIE next week.

@szaghi
Member

szaghi commented Nov 27, 2015

@zbeekman Take your time, this does not matter. Thank you very much for your help!


4 participants