GSoC application template

Antonin Delpeuch edited this page Mar 8, 2020 · 2 revisions

This page explains the requirements for Google Summer of Code applications, and how we evaluate them.

Guidelines

Proposals for projects should be short documents which explain what you would like to implement. They can (but do not have to) be based on the project ideas we provide, but they should be more developed than those ideas. Show that you have thought about the feature and explored the code to see how it could tie in with the existing architecture. You don't need to work out all the details, since some hurdles might appear only when you actually start coding, but we like to see that you have done some research ahead of time. It also helps if you can state a few goals you have for this feature. Explain how users will experience it. If you plan to work on the UI, it could be useful to draw a few UI mockups, for instance.

Overall, it really helps if you have played with OpenRefine yourself. Would this new feature be useful to you? Which sort of data cleaning project would you use it on? Being able to answer these questions is a good sign that your implementation will be useful to other users.

Besides our Developer mailing list, we encourage students to also sign up for our Users mailing list (https://groups.google.com/forum/#!forum/openrefine) to see first-hand the issues other users run into with OpenRefine, which may inspire additional project ideas.

Below is a fictional proposal for a mysterious feature.

Example proposal

I would like to work on adding Foo support to the Bar extension, as described by issues #1234 and #3456.

User experience

Once my feature is implemented, users will discover a new panel in the preferences section, where they will be able to enable Foo integration with a checkbox. When the integration is enabled, this will reveal fields to configure it using the following settings:

  • the hyperproxy URI (localhost by default), as a text input;
  • the sampling rate (1 by default), as a slider from 0 to 1;
  • the cutting pattern (chosen from the following options: Checkerboard, Voronoi, Kitay-gorod, Watercolour).

When the hyperproxy URI is invalid, a red flag will appear next to the corresponding text input.

When Foo integration is enabled, project pages will use the supplied hyperproxy to spice up the rows view. Each row will sample itself via the cutting pattern, and the result will be rendered for the user as a shade of pink:

[Figure: proposed UI for Foo sampling]

(The logos for starred and flagged rows will need to be adapted to improve the rendering.)

The shade of pink represents the absolute value of the cutting level as determined by the pattern and the rate.

Moreover, users will be able to access the cutting level programmatically in GREL, using the formula row.cuttingLevel. This will return a complex number corresponding to the cutting level.
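To make the relationship between the complex cutting level and the displayed shade concrete, here is a minimal sketch in Python. It is illustrative only: the function name `pink_shade`, the clamping at `max_level`, and the choice of "hot pink" as the darkest shade are all assumptions, not part of the proposal.

```python
def pink_shade(cutting_level: complex, max_level: float = 1.0) -> str:
    """Map the absolute value of a cutting level to a hex colour between white and pink.

    Hypothetical helper: the clamping and the colour endpoints are assumptions.
    """
    # Intensity in [0, 1]: magnitude of the complex level, clamped at max_level.
    intensity = min(abs(cutting_level) / max_level, 1.0)
    white = (255, 255, 255)
    pink = (255, 105, 180)  # "hot pink", chosen arbitrarily for illustration
    # Linear interpolation between white and pink, channel by channel.
    channels = [round(w + (p - w) * intensity) for w, p in zip(white, pink)]
    return "#{:02x}{:02x}{:02x}".format(*channels)
```

With these assumptions, a cutting level of 0 renders as white, and any level whose magnitude reaches `max_level` renders as the full pink.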

Architecture

The hyperproxy will be queried both from the frontend (to display the shades of pink) and the backend (to compute row.cuttingLevel when required).

For the frontend, the grid rendering code will be altered to add calls to the hyperproxy, which will be reachable thanks to its CORS configuration. Calls to the hyperproxy will be asynchronous, so that the grid can be rendered quickly even if the computation of the cutting levels by the hyperproxy takes longer. When the hyperproxy responds, the shades of pink will be inserted in the corresponding rows. Because hyperproxies support computing cutting levels for up to 12 samples at a time, a single query to the hyperproxy will be sufficient when viewing 5 or 10 rows at a time, but in general more queries will be needed (for instance when using the records mode).
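The batching constraint can be sketched as follows. This is a Python illustration of the arithmetic only, not frontend code; the names `batch_rows` and `queries_needed` are hypothetical, and the only figure taken from the proposal is the limit of 12 samples per query.

```python
from typing import Iterator, List, Sequence

MAX_SAMPLES_PER_QUERY = 12  # limit stated in the proposal


def batch_rows(row_ids: Sequence[int],
               batch_size: int = MAX_SAMPLES_PER_QUERY) -> Iterator[List[int]]:
    """Split the visible row ids into groups small enough for one hyperproxy query."""
    for start in range(0, len(row_ids), batch_size):
        yield list(row_ids[start:start + batch_size])


def queries_needed(visible_rows: int,
                   batch_size: int = MAX_SAMPLES_PER_QUERY) -> int:
    """Number of hyperproxy queries required to cover the visible rows."""
    return -(-visible_rows // batch_size)  # ceiling division
```

Under this sketch, page sizes of 5 or 10 rows fit in a single query, while a page of 25 rows would need three.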

For the backend, we will add a GREL field to the Row class (just like #2363 does). When this field is accessed, this will trigger an HTTP request to the hyperproxy to compute the cutting level of the corresponding row. To avoid computing the cutting level twice for the same row, a cache will be added. Cached values will be invalidated whenever the hyperproxy settings change: at retrieval time, we will compare the current settings to the settings used to compute the cached value, and recompute if they differ. In case of error (if the hyperproxy is unreachable or too slow), we will return an error message in the form of a string: hyperproxy is not properly awake, check back later.
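The caching behaviour could look roughly like the sketch below. It is written in Python for brevity rather than in the language of the backend, and every name in it (`HyperproxySettings`, `CuttingLevelCache`, the `fetch` callable) is hypothetical. The HTTP request is abstracted as a callable, and treating any `OSError` as "unreachable or too slow" is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union

ERROR_MESSAGE = "hyperproxy is not properly awake, check back later"


@dataclass(frozen=True)
class HyperproxySettings:
    """Hypothetical container for the three settings listed in the proposal."""
    uri: str = "localhost"
    sampling_rate: float = 1.0
    cutting_pattern: str = "Checkerboard"


class CuttingLevelCache:
    """Caches cutting levels per row, checking the settings at retrieval time."""

    def __init__(self, fetch: Callable[[int, HyperproxySettings], complex]):
        self._fetch = fetch  # stands in for the HTTP request to the hyperproxy
        self._entries: Dict[int, Tuple[HyperproxySettings, complex]] = {}

    def get(self, row_index: int,
            settings: HyperproxySettings) -> Union[complex, str]:
        cached = self._entries.get(row_index)
        # Reuse the cached value only if it was computed with the same settings.
        if cached is not None and cached[0] == settings:
            return cached[1]
        try:
            level = self._fetch(row_index, settings)
        except OSError:  # covers connection failures and timeouts (assumption)
            return ERROR_MESSAGE
        self._entries[row_index] = (settings, level)
        return level
```

Storing the settings next to each cached value means no explicit invalidation step is needed when the preferences change: stale entries are simply detected and overwritten the next time the row is accessed.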

Potential pitfalls

TBC

Minimal viable contribution

TBC

Main design questions to settle

TBC
