This repository has been archived by the owner on Aug 6, 2021. It is now read-only.

ChristianSch/numerflow


numerflow

Data workflows for the numer.ai machine learning competition

Tasks

Currently implemented:

  • fetch and extract the datasets
  • train and predict
  • automatic upload

Task Documentation

FetchAndExtractData

Fetches the dataset zipfile and extracts the contents to output-path.

Parameters

  • output-path: where the datasets should be saved eventually (defaults to ./data/)
  • dataset-path: URI of the remote dataset
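The fetch-and-extract step described above can be sketched with the standard library alone. This is a minimal illustration, not the repository's actual task code: the function name `fetch_and_extract` is hypothetical, and it assumes the archive is small enough to hold in memory.

```python
import io
import os
import urllib.request
import zipfile


def fetch_and_extract(dataset_path, output_path="./data/"):
    """Download the zip archive at dataset_path and unpack it into output_path.

    Hypothetical helper; mirrors the two documented parameters.
    """
    os.makedirs(output_path, exist_ok=True)
    # Read the whole archive into memory (assumes a reasonably small dataset).
    with urllib.request.urlopen(dataset_path) as response:
        payload = response.read()
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(output_path)
        # Return the names of the extracted members for inspection.
        return archive.namelist()
```

Because `urllib.request.urlopen` also accepts `file://` URIs, the same function can be tried against a local archive before pointing it at the remote dataset.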

TrainAndPredict

Trains a Bernoulli Naïve Bayes classifier and predicts the targets. Output file is saved at output-path with a custom, timestamped file name.

Parameters

  • output-path: where the datasets should be saved eventually (defaults to ./data/)
  • dataset-path: URI of the remote dataset
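To illustrate what training a Bernoulli Naïve Bayes classifier and writing a timestamped output involves, here is a dependency-free sketch. The repository presumably relies on a library implementation instead; the function names, the assumption of already-binarized 0/1 features, and the file name pattern are all hypothetical.

```python
import math
import os
from datetime import datetime


def train_bernoulli_nb(rows, labels, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on binary feature rows (lists of 0/1)."""
    n_features = len(rows[0])
    model = {}
    for cls in sorted(set(labels)):
        members = [row for row, label in zip(rows, labels) if label == cls]
        # Laplace-smoothed probability that each feature equals 1 in this class.
        feature_probs = [
            (sum(row[j] for row in members) + alpha) / (len(members) + 2 * alpha)
            for j in range(n_features)
        ]
        model[cls] = (math.log(len(members) / len(rows)), feature_probs)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for one binary row."""
    def log_posterior(cls):
        log_prior, feature_probs = model[cls]
        return log_prior + sum(
            math.log(p if x else 1.0 - p) for x, p in zip(row, feature_probs)
        )
    return max(model, key=log_posterior)


def timestamped_path(output_path="./data/", now=None):
    """Build a prediction file path with a timestamp (name format is an assumption)."""
    now = now or datetime.now()
    name = "predictions_{}.csv".format(now.strftime("%Y%m%d_%H%M%S"))
    return os.path.join(output_path, name)
```

A timestamp in the file name keeps successive runs from overwriting each other's predictions.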

UploadPredictions

Uploads the predictions if they have not already been uploaded.

Parameters

  • output-path: where the datasets should be saved eventually (defaults to ./data/)
  • dataset-path: URI of the remote dataset
  • usermail: user email
  • userpass: user password
  • filepath: path to the file that should be uploaded
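The "upload only if not already uploaded" behaviour can be sketched with a marker file. Luigi itself skips a task whose output targets already exist, so a marker like this is one common way to get that effect; how this repository actually tracks uploads is not stated here. The `upload_once` helper, the `.uploaded` suffix, and the `upload` callable are hypothetical.

```python
import os


def upload_once(filepath, upload, marker_suffix=".uploaded"):
    """Call upload(filepath) unless a marker file shows it was already uploaded.

    Returns True if an upload happened, False if it was skipped.
    """
    marker = filepath + marker_suffix
    if os.path.exists(marker):
        return False
    upload(filepath)
    # Write the marker only after upload() returned without raising.
    with open(marker, "w") as handle:
        handle.write("ok\n")
    return True
```

Passing the upload function in as an argument keeps the sketch independent of any particular API client.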

Usage

Prepare the project:

pip install -r requirements.txt --ignore-installed

If you have not already done so, create an API key here with at least the following permissions:

  • Upload submissions.
  • View historical submission info.
  • View user info (e.g. balance, withdrawal history).

To run the complete pipeline:

env PYTHONPATH='.' luigi --local-scheduler --module workflow Workflow --secret="YOURSECRET" --public-id="YOURPUBLICID"
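The same invocation can be launched from Python, for example from a scheduled script. This is a sketch that simply wraps the command shown above; the helper names `build_command` and `run_pipeline` are hypothetical, and it assumes the `luigi` executable is on the PATH and the script runs from the repository root.

```python
import os
import subprocess


def build_command(secret, public_id):
    """Assemble the luigi invocation shown above as an argument list."""
    return [
        "luigi",
        "--local-scheduler",
        "--module", "workflow",
        "Workflow",
        "--secret={}".format(secret),
        "--public-id={}".format(public_id),
    ]


def run_pipeline(secret, public_id):
    """Run the pipeline with PYTHONPATH set to '.', as in the shell command."""
    env = dict(os.environ, PYTHONPATH=".")
    # check=True raises CalledProcessError if luigi exits with a non-zero status.
    return subprocess.run(build_command(secret, public_id), env=env, check=True)
```

Passing the arguments as a list avoids shell quoting issues with the secret.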