
ohmeow/fsdl_2022_course_project


fsdl_2022_course_project

Our project is to create an ML-augmented approach that course creators can use to streamline the generation of lecture summaries and chapter markers from lesson videos.

Features of Course Co-Pilot

The basic workflow is:

  1. User provides a link to a YouTube lecture video in our application and asks Course Co-Pilot to process it.
  2. User can view the status of requests via the “Get Predictions” button.
  3. User can view predicted topic boundaries, headlines, and content summaries for processed videos.
  4. User can correct and save generated content (planned to be used later in a data flywheel).
  5. User will be able to export results as chapter markers for YouTube (planned).
  6. User will be able to export results in a Quarto-friendly format for posting to a web page or blog (planned).
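Item 5 is a planned feature. To illustrate what a chapter-marker export could look like, here is a minimal sketch. It assumes each predicted topic carries a start time in seconds and a generated headline. The `Topic` type and both functions are hypothetical and not part of course_copilot. The output follows YouTube's convention of one `timestamp title` line per chapter in ascending order, with the first chapter starting at 00:00.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    """Hypothetical shape of one predicted topic (not the project's actual type)."""
    start_seconds: float  # where the topic begins in the video
    headline: str         # generated headline, used as the chapter title


def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS, or H:MM:SS for positions past the one-hour mark."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def to_chapter_markers(topics: list[Topic]) -> str:
    """Render topics as YouTube-style chapter lines, e.g. "01:35 Title"."""
    ordered = sorted(topics, key=lambda t: t.start_seconds)
    lines = []
    for i, topic in enumerate(ordered):
        # YouTube expects the first chapter to start at 00:00,
        # so the earliest topic is pinned to zero.
        start = 0.0 if i == 0 else topic.start_seconds
        lines.append(f"{format_timestamp(start)} {topic.headline}")
    return "\n".join(lines)
```

The result can be pasted directly into a video description. For example, topics starting at 3.0 s, 95.4 s, and 3725 s would render as `00:00`, `01:35`, and `1:02:05`.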

Why

In our own experience, such content either doesn’t get created, is time-consuming to produce, and/or requires work from outside parties. We noticed this in particular in the following courses we have taken part in:

  1. Fast.ai course - During the course, students manually create YouTube chapter markers, lesson transcripts, and summaries on the forums.

  2. FSDL course - The chapter markers and lesson notes are created manually after the fact and shared on the FSDL website, usually one week after each lesson.

How is our application structured?

System Diagram

What have we done so far?

Let’s look at the dataset, ML library, API, and web application we created for our prototype system.

Dataset

Since we needed to train both summarization and topic segmentation models, we manually created our dataset from a range of YouTube videos, including fastai lessons, FSDL lessons, and assorted other instructional videos.

Dataset Link

Dataset Schema
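The actual schema is defined in the linked dataset. The sketch below is only an illustration of how one labeled example could be represented for both tasks. All field and class names are hypothetical. It assumes a transcript split into sentences, with each topic covering a contiguous, inclusive range of sentence indices and carrying a headline and summary. The `validate` method checks one property a topic segmentation dataset typically needs: the spans tile the transcript with no gaps or overlaps.

```python
from dataclasses import dataclass


@dataclass
class TopicSpan:
    """One labeled topic (hypothetical field names)."""
    start_idx: int  # index of the first transcript sentence in the topic (inclusive)
    end_idx: int    # index of the last transcript sentence in the topic (inclusive)
    headline: str
    summary: str


@dataclass
class LessonExample:
    """One labeled video (hypothetical structure, not the project's schema)."""
    video_url: str
    sentences: list[str]
    topics: list[TopicSpan]

    def validate(self) -> None:
        """Raise ValueError unless the spans cover every sentence exactly once, in order."""
        expected_start = 0
        for span in self.topics:
            if span.start_idx != expected_start or span.end_idx < span.start_idx:
                raise ValueError(f"{span!r} does not continue at sentence {expected_start}")
            expected_start = span.end_idx + 1
        if expected_start != len(self.sentences):
            raise ValueError("topic spans do not cover the whole transcript")
```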

ML library: course_copilot

We used the nbdev framework to create a Python package that serves as our framework for both model training and model serving. We integrated Weights & Biases (W&B) for experiment tracking and for fine-tuning models with sweeps, and we built model trainers for the topic segmentation and summarization tasks. Our project coincided with the release of OpenAI's Whisper, which we use to transcribe the YouTube video at the URL you submit. The resulting transcript provides the input data for topic segmentation and summarization.
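A common way to frame topic segmentation is as per-sentence classification. Each transcript sentence gets label 1 if it starts a new topic and 0 otherwise. The sketch below assumes that framing; it is not a description of how course_copilot's trainers encode their targets. It converts topic start indices into training labels and turns a model's per-sentence boundary probabilities back into inclusive `(start, end)` spans. The function names and the 0.5 threshold are illustrative.

```python
def spans_to_labels(span_starts: list[int], num_sentences: int) -> list[int]:
    """Label each sentence 1 if it begins a new topic, else 0."""
    if num_sentences == 0:
        return []
    labels = [0] * num_sentences
    for start in span_starts:
        labels[start] = 1
    labels[0] = 1  # the first sentence always opens a topic
    return labels


def probs_to_labels(probs: list[float], threshold: float = 0.5) -> list[int]:
    """Threshold per-sentence boundary probabilities into 0/1 labels."""
    return [int(p >= threshold) for p in probs]


def labels_to_spans(labels: list[int]) -> list[tuple[int, int]]:
    """Recover inclusive (start, end) sentence ranges from boundary labels."""
    # Sentence 0 starts a topic even if the model did not predict a boundary there.
    starts = [i for i, label in enumerate(labels) if label == 1 or i == 0]
    if not starts:
        return []
    ends = [s - 1 for s in starts[1:]] + [len(labels) - 1]
    return list(zip(starts, ends))
```

Keeping the encoding and decoding as a matched pair makes it easy to check that labels round-trip back to the original spans before training.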

fsdl_2022_course_project

nbdev-based model trainer for topic segmentation, with experiment tracking in W&B

Backend API

For the backend, we used FastAPI to build the APIs. The API uses Dagster as its workflow engine to orchestrate the inference jobs: transcribing the video with Whisper, running topic segmentation, and running the summarization models.
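The shape of that inference pipeline can be shown without Dagster. The following is a plain-Python stand-in, not Dagster's API and not the project's code. It runs the three stages in order, passes a shared context dict between them, and records a per-video status of the kind a "Get Predictions" view could report. The stage functions are hypothetical placeholders for the Whisper, topic segmentation, and summarization steps.

```python
from enum import Enum
from typing import Callable

# Each stage reads the shared context and returns new keys to merge into it.
Stage = Callable[[dict], dict]


class Status(str, Enum):
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


def run_pipeline(video_url: str, stages: list[tuple[str, Stage]],
                 statuses: dict[str, Status]) -> dict:
    """Run named stages in order, recording the video's status as it progresses."""
    statuses[video_url] = Status.RUNNING
    context: dict = {"video_url": video_url, "completed": []}
    try:
        for name, stage in stages:
            context.update(stage(context))
            context["completed"].append(name)
    except Exception:
        statuses[video_url] = Status.FAILED
        raise
    statuses[video_url] = Status.DONE
    return context


# Hypothetical placeholder stages; real ones would call the models.
def transcribe(ctx: dict) -> dict:
    return {"sentences": ["Welcome.", "Today we cover X.", "Now Y."]}


def segment(ctx: dict) -> dict:
    return {"spans": [(0, 1), (2, 2)]}


def summarize(ctx: dict) -> dict:
    sentences = ctx["sentences"]
    return {"summaries": [" ".join(sentences[s:e + 1]) for s, e in ctx["spans"]]}
```

A workflow engine such as Dagster adds what this sketch leaves out, including persisted run history, retries, and running jobs outside the API process.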

fsdl-2022-group-007-app

Course Copilot APIs

Web Application

We built our front-end web application with Vue 3 and Quasar. It is deployed to GitHub Pages from our repo.

fsdl-2022-group-007-web

Topic summaries and chapter summaries generated

Future Plans

  • Improve quality of training data
  • Allow users to save their corrected headlines and summaries
  • Add ability for users to update topic spans
  • Implement data flywheel
  • Implement chapter marker and Quarto export features
  • Add authentication/authorization
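As an illustration of the planned Quarto export, here is a minimal sketch. It assumes the export is a `.qmd` document with YAML front matter holding the title, followed by one level-2 section per topic containing the generated headline and summary. The function name and layout are hypothetical. The title is escaped for a double-quoted YAML string so that quotes or backslashes in it do not break the front matter.

```python
def to_quarto(title: str, video_url: str, topics: list[tuple[str, str]]) -> str:
    """Render (headline, summary) pairs as the text of a Quarto (.qmd) document."""
    # Escape backslashes first, then double quotes, for a YAML double-quoted scalar.
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "---",
        f'title: "{safe_title}"',
        "---",
        "",
        f"Video: <{video_url}>",
        "",
    ]
    for headline, summary in topics:
        lines += [f"## {headline}", "", summary, ""]
    return "\n".join(lines)
```

Writing the returned string to something like `lesson-01.qmd` would let Quarto render it as a page for a website or blog.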

Install

pip install course_copilot

Setting up your development environment

Please take some time to read up on nbdev (how it works, its directives, etc.) by checking out the walkthroughs and tutorials on the nbdev website.

Step 1: Create conda environment

After cloning the repo, create a conda environment. The command below uses mamba, a faster drop-in replacement for conda. This installs nbdev along with other libraries the project is likely to need.

mamba env create -f environment.yml

Step 2: Install Quarto

nbdev_install_quarto

Step 3: Install hooks

nbdev_install_hooks

Step 4: Add pre-commit hooks (optional)

If you use VSCode, you can install pre-commit hooks “to catch and fix uncleaned and unexported notebooks” before pushing to git. If you want to use this feature, see the instructions in the nbdev documentation: https://nbdev.fast.ai/tutorials/pre_commit.html

Step 5: Install our library

pip install -e '.[dev]'
