
GSOC 2019 Project Ideas

Jim Crist edited this page Mar 28, 2019 · 4 revisions

Thanks for your interest in applying for Google Summer of Code with Dask. Please see the student instructions page for full instructions.


Here are some ideas for GSoC students in 2019. This list is by no means exhaustive; if you have an idea that you think would be a good GSoC project, please create an issue and discuss it with the community before submitting your proposal.

Cythonize Parts of Dask's Core Algorithms

Potential Mentors: Jim Crist

Description: Currently, the core parts of Dask (graph algorithms, optimizations, the scheduler, etc.) are all written in pure Python. While generally efficient, these graph operations can become a bottleneck when working with large graphs, and Cython or other native code could speed them up. This would be experimental work: it would start by profiling the scheduler on a variety of workloads to find slow points, and then iterate on optimizations.
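Dask represents work as a task graph: a plain dict mapping keys to tasks, where a task is a tuple whose first element is a callable. As a rough illustration of the profiling step, here is a minimal sketch that assumes a simplified graph format (string keys, `(func, *args)` tuples). `make_chain_graph`, `toposort`, `execute` and `profile` are hypothetical stand-ins written for this example, not Dask's actual internals. The sketch times a pure-Python topological sort with the standard library's `cProfile`, the kind of measurement that would point at candidates for Cython.

```python
import cProfile
import io
import pstats


def add_one(v):
    return v + 1


def make_chain_graph(n):
    """Build a hypothetical Dask-style graph: x-0 -> x-1 -> ... -> x-(n-1)."""
    dsk = {"x-0": 1}
    for i in range(1, n):
        dsk[f"x-{i}"] = (add_one, f"x-{i - 1}")
    return dsk


def is_task(obj):
    """A task is a non-empty tuple whose first element is callable."""
    return isinstance(obj, tuple) and len(obj) > 0 and callable(obj[0])


def get_dependencies(dsk, key):
    """Keys that a task references directly among its top-level arguments."""
    task = dsk[key]
    if is_task(task):
        return {a for a in task[1:] if isinstance(a, str) and a in dsk}
    return set()


def toposort(dsk):
    """Kahn's algorithm: emit each key only after all of its dependencies."""
    deps = {k: get_dependencies(dsk, k) for k in dsk}
    dependents = {k: set() for k in dsk}
    for k, ds in deps.items():
        for d in ds:
            dependents[d].add(k)
    remaining = {k: len(ds) for k, ds in deps.items()}
    ready = [k for k, n in remaining.items() if n == 0]
    order = []
    while ready:
        k = ready.pop()
        order.append(k)
        for child in dependents[k]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    if len(order) != len(dsk):
        raise ValueError("graph contains a cycle")
    return order


def execute(dsk, order):
    """Run tasks sequentially in the given order, caching intermediate results."""
    cache = {}
    for key in order:
        task = dsk[key]
        if is_task(task):
            func, *args = task
            vals = [cache[a] if isinstance(a, str) and a in dsk else a for a in args]
            cache[key] = func(*vals)
        else:
            cache[key] = task  # literal value, e.g. the 1 stored under "x-0"
    return cache


def profile(func, *args, top=5):
    """Call func(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    graph = make_chain_graph(50_000)
    order, report = profile(toposort, graph)
    print(report)  # cumulative-time breakdown of the graph traversal
```

Profiling a toy algorithm like this only demonstrates the workflow. The real project would profile Dask's own scheduler on representative graphs before deciding what to move to native code.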

Expected Skills: A good candidate would be familiar with Python and have some experience reasoning about Python performance. Experience with Cython would be a plus but is not a requirement.

Parallelize GeoPandas with Dask

Potential Mentors: Joris Van den Bossche

Description: GeoPandas is a package that adds support for geospatial data to pandas' DataFrame and Series objects. Dask-GeoPandas (https://github.com/mrocklin/dask-geopandas/) is an attempt to bridge GeoPandas and Dask in order to parallelize and distribute GeoPandas operations. For now, this experimental package is little more than a proof of concept, and the goal of this project would be to develop the Dask-GeoPandas integration further.

Expected Skills: A good candidate would be familiar with Python and have some experience with geospatial data, ideally including reasoning about spatial algorithms (e.g. spatial indexing).
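As an example of the spatial reasoning involved, one possible approach to parallelizing geospatial operations is spatial partitioning. Under this approach the data is split so that each partition covers a compact region, and each partition's bounding box is recorded so that a query can skip partitions that cannot match. The sketch below illustrates that idea in plain Python, assuming a simple square-grid partitioning scheme. `BBox`, `grid_partition` and `query` are hypothetical names, not part of GeoPandas or Dask-GeoPandas. In a real implementation, each partition would presumably be a GeoDataFrame processed by its own Dask task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    minx: float
    miny: float
    maxx: float
    maxy: float

    def intersects(self, other):
        """True if the two axis-aligned boxes overlap (edges touching counts)."""
        return not (other.minx > self.maxx or other.maxx < self.minx
                    or other.miny > self.maxy or other.maxy < self.miny)

    def contains_point(self, x, y):
        return self.minx <= x <= self.maxx and self.miny <= y <= self.maxy


def bounds_of(points):
    """Tight bounding box of a non-empty list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return BBox(min(xs), min(ys), max(xs), max(ys))


def grid_partition(points, cell_size):
    """Group points into square grid cells; each cell becomes one partition."""
    cells = {}
    for x, y in points:
        key = (int(x // cell_size), int(y // cell_size))
        cells.setdefault(key, []).append((x, y))
    # Store each partition together with its bounds, a tiny "spatial index".
    return [(bounds_of(pts), pts) for pts in cells.values()]


def query(partitions, box):
    """Points inside `box`, scanning only partitions whose bounds intersect it."""
    hits = []
    scanned = 0
    for bounds, pts in partitions:
        if not bounds.intersects(box):
            continue  # pruned: no point in this partition can match
        scanned += 1
        hits.extend(p for p in pts if box.contains_point(*p))
    return hits, scanned


if __name__ == "__main__":
    pts = [(float(i), float(j)) for i in range(100) for j in range(100)]
    parts = grid_partition(pts, cell_size=10.0)
    hits, scanned = query(parts, BBox(5.0, 5.0, 14.0, 14.0))
    print(len(parts), len(hits), scanned)
```

With a 100 x 100 grid of points and 10-unit cells there are 100 partitions, and the example query scans only the 4 partitions whose bounds overlap the query box.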