These notebooks should be run sequentially, using the Docker containers listed below.
1. The first notebook fetches the data and builds the dataset.
2. The second notebook vectorizes the code sequences and description sequences and trains three seq2seq models:
   - function tokens -> docstring
   - API sequence -> docstring
   - method name -> docstring
3. The third notebook trains an AWD LSTM language model on docstrings, using fastai's implementation.
4. The fourth notebook trains the final joint embedder from code to docstring vectors.
5. The fifth notebook builds a search engine that uses the trained networks to return query results.
6. The sixth notebook evaluates the model.
To run these notebooks (1-6), we highly suggest using these Docker containers:
- `hamelsmu/ml-gpu`: use this container for any GPU-bound parts.
- `hamelsmu/ml-cpu`: use this container for any CPU-bound parts.
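A typical way to launch the containers might look like the following. This is a sketch, not part of the original instructions: the image tags, the `/ds` mount point, the Jupyter port, and the use of the NVIDIA container runtime are all assumptions you may need to adjust for your setup.

```shell
# CPU-bound steps (e.g., fetching and building the dataset in notebook 1).
# Mounts the current repo at /ds (assumed path) and exposes Jupyter's
# default port 8888.
docker run --rm -it -p 8888:8888 -v "$(pwd)":/ds hamelsmu/ml-cpu bash

# GPU-bound steps (e.g., training the seq2seq models, the AWD LSTM, and
# the joint embedder). Requires the NVIDIA container runtime on the host.
docker run --rm -it --runtime=nvidia -p 8888:8888 -v "$(pwd)":/ds hamelsmu/ml-gpu bash
```

Once inside the container, start Jupyter and open the notebooks in order.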