GSoC Outreachy 2023 Ideas

Antonin Delpeuch edited this page Feb 9, 2023 · 7 revisions

Here is a list of projects which could be internship topics for OpenRefine's participation in Outreachy or Google Summer of Code in 2023.

Potential mentors are encouraged to add their project descriptions here, following the template below. For examples, you can check out the previous years: GSoC Outreachy 2022 Ideas and GSoC 2020 Ideas. We are coordinating our participation in those programmes in this thread.

Reconciliation improvements

  • Difficulty (rough estimate): medium
  • Description: This project would focus on improving the reconciliation UI and extending support for features from the Reconciliation Service API specification. Tasks range from small improvements to high-impact issues such as authentication support.
  • Expected outcomes: Various improvements to the reconciliation UI and at least one high-impact issue being resolved.
  • Skills required/preferred: JavaScript & Java
  • Possible mentors: @Abbe98
  • Relevant issues: #2015, #5558, #4877, #4722, #4922, #4239, #4224, #3829, #2916, #5605 (Under discussion: #5603, #5604)

Syntax highlighting for expressions

  • Difficulty (rough estimate): hard
  • Description: OpenRefine offers text fields where users can type expressions in various languages: GREL, Python and Clojure. Those text fields do not support syntax highlighting for any of those languages. Adding support for syntax highlighting would help users grasp the structure of the expressions they work with in a visual fashion. Implementing support for this raises the following challenges:
    • This syntax highlighting will likely need to be based on an existing library. We need to survey the available JavaScript libraries for this and evaluate them on various criteria (features, size, license, browser support, project health…) to find a fitting one.
    • GREL is OpenRefine's own expression language, so no existing JavaScript library will have native support for highlighting it. We need to build this support ourselves, based on GREL's fairly simple grammar. This will involve some visual design choices.
    • Expression languages supported by OpenRefine are extensible. For instance, refine-js adds support for JavaScript as an expression language. Such extensions should therefore be able to add syntax highlighting support for their languages as well.
  • Expected outcomes: syntax highlighting is supported for at least one expression language
  • Skills required/preferred: JavaScript, HTML & CSS (this should be exclusively front-end work)
  • Possible mentors: @wetneb
  • Relevant issues: #153
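
To give a feel for the GREL part of this work, here is a minimal sketch of the tokenizing step that a highlighter would build on. It is written in Python for brevity, whereas the actual work would be done in JavaScript, most likely through the chosen library's own grammar format. The token categories and regular expressions below are assumptions made for illustration; they are not taken from GREL's actual grammar.

```python
import re
from typing import List, Tuple

# Hypothetical token categories; a real highlighter would derive these
# from GREL's grammar rather than from this simplified list.
STRING = r'"(?:[^"\\]|\\.)*"' + "|" + r"'(?:[^'\\]|\\.)*'"
TOKEN_SPEC = [
    ("string", STRING),
    ("number", r"\d+(?:\.\d+)?"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("operator", r"==|!=|<=|>=|[+\-*/%<>]"),
    ("punctuation", r"[()\[\].,]"),
    ("whitespace", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(expression: str) -> List[Tuple[str, str]]:
    """Split an expression into (category, text) pairs.

    Whitespace is dropped and unrecognized characters are reported one
    at a time as "error" tokens, so that a half-typed expression can
    still be highlighted.
    """
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(expression):
        match = TOKEN_RE.match(expression, pos)
        if match is None:
            tokens.append(("error", expression[pos]))
            pos += 1
            continue
        if match.lastgroup != "whitespace":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for category, text in tokenize('value.trim() + "a b"'):
        print(f"{category:12} {text}")
```

Each category would then map to a CSS class, which is where the visual design choices come in. A real implementation would probably keep whitespace and character offsets so that the highlighted text lines up with what the user typed.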

User-defined clustering

  • Difficulty: hard
  • Description: Our binning clusterers let the user choose between various methods to generate the bins across which values are spread. Extensions can define new binning methods, but writing an extension is still quite some work. It would be even better if users could simply provide an expression (GREL, Jython, Clojure…) which computes the bin into which a given value falls. That would let users better adapt the binning strategy to their own use cases. User-defined distances could also be used for kNN-based clustering.
  • Expected outcomes: A new clustering method which accepts a user-defined expression, either as a binning or kNN clusterer, potentially both.
  • Skills required/preferred: both the backend (Java) and frontend (HTML/CSS/JS) will need adapting
  • Possible mentors: @wetneb
  • Relevant issues: #4301
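
The binning idea can be sketched in a few lines: the user supplies a function from a cell value to a key, and values sharing a key form a candidate cluster. The Python below is only an illustration under assumptions. `bin_values` and `simple_key` are hypothetical names, and `simple_key` is a simplified stand-in for a user expression, not one of OpenRefine's built-in keying methods.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set


def bin_values(values: Iterable[str],
               key_fn: Callable[[str], str]) -> List[List[str]]:
    """Group values by a user-defined key.

    Only bins holding at least two distinct values are returned, since
    a bin with a single value offers nothing to merge.
    """
    bins: Dict[str, Set[str]] = defaultdict(set)
    for value in values:
        bins[key_fn(value)].add(value)
    return [sorted(members) for members in bins.values() if len(members) > 1]


def simple_key(value: str) -> str:
    """Hypothetical user expression: lowercase, then sort unique tokens."""
    tokens = value.strip().lower().split()
    return " ".join(sorted(set(tokens)))


if __name__ == "__main__":
    names = ["Tom Cruise", "Cruise Tom", "tom cruise", "Jane Doe"]
    print(bin_values(names, simple_key))
```

In OpenRefine itself, the key function would be replaced by the evaluation of the user's expression on each cell value. A kNN variant would instead take a two-argument distance expression and group values whose distance falls below a chosen radius.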

Template

  • Difficulty (rough estimate):
  • Description:
  • Expected outcomes:
  • Skills required/preferred:
  • Possible mentors:
  • Relevant issues: