Measuring the Progress of AI Research

This repository contains a Jupyter Notebook, which you can see live at https://eff.org/ai/metrics. It collects problems and metrics/datasets from the artificial intelligence and machine learning research literature, and tracks progress on them. You can use it to see how things are progressing in specific subfields or in AI/ML as a whole, to report new results you've obtained, to look for problems that might benefit from having new datasets/metrics designed for them, or as a source of data to build on for data science projects.

At EFF we're ultimately interested in this data as a way to understand the likely implications of AI, but to begin with we're focused on gathering it.

Original authors: Peter Eckersley and Yomna Nasser at EFF

With contributions from: Gennie Gebhart and Owain Evans
Inspired by and merging data from:

How to contribute to this notebook

This notebook is an open source, community effort. You can help by adding new metrics, data, and problems to it! If you're feeling ambitious, you can also improve its semantics or build new analyses into it. Here are some high-level tips on how to do that.

1. If you're comfortable with git and Jupyter Notebooks, or are happy to learn

If you've already worked a lot with git and IPython/Jupyter Notebooks, here's a quick list of things you'll need to do:

  1. Install Jupyter Notebook and git.

    • On an Ubuntu or Debian system, you can do:
      sudo apt-get install git
      sudo apt-get install ipython-notebook || sudo apt-get install jupyter-notebook || sudo apt-get install python-notebook
    • Make sure you have IPython Notebook version 3 or higher. If your OS doesn't provide it, you might need to enable backports, or use pip to install it.
  2. Install this notebook's Python dependencies (a quick import check appears after this list):

    • On Ubuntu or Debian, do:
          sudo apt-get install python-{cssselect,lxml,matplotlib{,-venn},numpy,requests,seaborn}
    • On other systems, use your native OS packages, or use pip:
          pip install cssselect lxml matplotlib{,-venn} numpy requests seaborn
  3. Fork our repo on github: https://github.com/AI-metrics/AI-metrics#fork-destination-box

  4. Clone the repo on your machine, and cd into the directory it creates

  5. Configure your copy of git to use IPython Notebook merge filters to prevent conflicts when multiple people edit the Notebook simultaneously. You can do that with these two commands in the cloned repo:

    git config --file .gitconfig filter.clean_ipynb.clean $PWD/ipynb_drop_output
    git config --file .gitconfig filter.clean_ipynb.smudge cat
  6. Run Jupyter Notebook in the project directory (the command may be ipython notebook, jupyter notebook, jupyter-notebook, or python notebook depending on your system), then go to localhost:8888 and edit the Notebook AI-progress-metrics.ipynb to your heart's content

  7. Save and commit your work (git commit -a -m "DESCRIPTION OF WHAT YOU CHANGED")

  8. Push it to your remote repo

  9. Send us a pull request!
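
Once the dependencies from step 2 are installed, the small check below (run in a notebook cell, or with the same Python interpreter your kernel uses) confirms they can all be imported. The import names are the standard ones for these packages; this is just a convenience sketch, not part of the Notebook itself.

    # Sanity check: the Notebook's Python dependencies should all import cleanly.
    import cssselect
    import lxml
    import matplotlib
    import matplotlib_venn   # provided by the matplotlib-venn package
    import numpy
    import requests
    import seaborn

    print("All of the Notebook's Python dependencies are importable.")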

2. If you want something very simple

Microsoft Azure has an IPython / Jupyter service that will let you run and modify notebooks from their servers. You can clone this Notebook and work with it via their service: https://notebooks.azure.com/EFForg/libraries/ai-progress. Unfortunately there are a few issues with running the notebook on Azure:

  • arXiv seems to block requests from Azure's IP addresses, so it's impossible to automatically extract information about papers when running the Notebook there
  • The Azure Notebooks service seems to transform Unicode characters in strange ways, creating extra work when merging changes from that source

Notes on importing data

  • Each .measure() call is a data point of a specific algorithm on a specific metric/dataset, so one paper will often produce multiple measurements on multiple metrics (a sketch of a typical entry appears after this list). It's most important to enter results that were at or near the frontier of best performance on the date they were published. This isn't a strict requirement, though: it's also nice to have a sense of the field's overall performance, or of algorithms that are otherwise notable even if they weren't the frontier for a specific problem.
  • When multiple revisions of a paper (typically on arXiv) have the same results on some metric, use the date of the first version (the CBTest results in this paper are an example)
  • When subsequent revisions of a paper improve on the original results (example), use the date and scores of the first results, or, if each revision is interesting / on the frontier of best performance, include each revision
    • We didn't check this carefully for our first ~100 measurement data points :(. In order to denote when we've checked which revision of an arXiv preprint first published a result, cite the specific version (https://arxiv.org/abs/1606.01549v3 rather than https://arxiv.org/abs/1606.01549); that way we can see which previous entries should be double-checked for this form of inaccuracy.
  • Where possible, use a clear short name or acronym for each algorithm. The full paper name can go in the papername field (and is auto-populated for some papers). When matplotlib 2.1 ships we may be able to get nice rollovers with metadata like this. Or perhaps we can switch to D3 to get that type of interactivity.
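
As a rough illustration of the import format, here is a hypothetical .measure() entry. The object name some_metric and the exact argument order are assumptions for illustration only; the real signature is defined in the Notebook, so copy the pattern from an existing entry in AI-progress-metrics.ipynb rather than from this sketch.

    # Hypothetical sketch of recording one result on an existing metric object.
    # some_metric is a placeholder; check an existing .measure() entry in the
    # Notebook for the exact signature before copying this.
    from datetime import date

    some_metric.measure(
        date(2016, 6, 5),                           # date of the first arXiv version reporting this score
        75.4,                                       # the score on this metric/dataset
        "Short-Algo-Name",                          # clear short name or acronym for the algorithm
        url="https://arxiv.org/abs/1606.01549v3",   # cite the specific arXiv version once checked
        papername="Full Paper Title",               # optional: full paper name
    )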

What to work on

  • If you know of ML datasets/metrics that aren't included yet, add them
  • If there are papers with interesting results for metrics that aren't included, add them
  • If you know of important problems that humans can solve, but machine learning systems may or may not yet be able to, and that are missing from our taxonomy, you can propose them
  • Look at our GitHub issue list, perhaps starting with issues tagged as good volunteer tasks.
  • You can also add missing conferences / journals to the venue-to-date mapping table (unhide the source code and search for conference_dates); a sketch of what such an entry might look like follows below.
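
The sketch below is only a guess at what such an entry involves; the real conference_dates structure lives in the Notebook's source and may be organized differently, so follow the existing entries there.

    # Hypothetical sketch of a venue-to-date mapping; the real conference_dates
    # table in the Notebook may use a different structure or naming.
    from datetime import date

    conference_dates = {
        "ICML 2016": date(2016, 6, 19),
        "NIPS 2016": date(2016, 12, 5),
    }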

FAQ

Q: What's the point of this project? How does it tie in with the EFF's mission?

Given that machine learning tools and AI techniques are increasingly part of our everyday lives, it is critical that journalists, policy makers, and technology users understand the state of the field. When improperly designed or deployed, machine learning methods can violate privacy, threaten safety, and perpetuate inequality and injustice. Stakeholders must be able to anticipate such risks and policy questions before they arise, rather than playing catch-up with the technology. To this end, it's part of the responsibility of researchers, engineers, and developers in the field to help make information about their life-changing research widely available and understandable.

Q: Why haven't you included dataset X?

There are a tiny number of us and this is a large task! If you'd like to add more data, please send us a pull request.

Q: Do you track other things besides how well-solved particular tasks are? For instance, the speed and efficiency of training?

No, but we'd love to. If you are motivated to help organize that data, please dive in and improve the notebook!

Q: Have you thought about how to visualise this data and make it more accessible?

We've considered a variety of options, but decided that the IPython/Jupyter Notebook was ultimately the most accessible for now. We're very open to suggestions about visualizations and accessibility, so feel free to reach out if you have any ideas!

Also, if you'd like to build visualizations on top of this project, all the data we use is available in easily digestible JSON format in progress.json. If you do so, let us know and we'll try to link to it.
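
As a minimal starting point, the sketch below loads progress.json with the Python standard library and peeks at its top-level structure. The exact schema is whatever the Notebook currently exports, so inspect the file before relying on any particular field.

    # Minimal sketch: load the exported data and look at its top-level structure.
    import json

    with open("progress.json") as f:
        progress = json.load(f)

    print(type(progress))
    if isinstance(progress, dict):
        print(list(progress.keys())[:10])   # peek at the first few top-level keys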

Q: Is this an EFF project?

Yes, but we'd like it to grow into a self-sustaining community effort supported by a coalition of organizations. EFF did the initial work of making the Notebook, but we built on several excellent datasets collected by many other people, and had a number of productive collaborative discussions, most especially with people at OpenAI and the Future of Humanity Institute, in preparing the document. We will strive to keep the authorship section of the Notebook accurate as others continue to contribute.

Q: When will artificial general intelligence happen?

We don't know, and this project is not meant to answer that question. Instead, we're interested in compiling data to guide evidence-based conversations about the state of the art in various corners of AI and machine learning research.
