
uvic-course-scraper

UVic Course Scraper is a Node.js library that parses course calendar and course schedule information from University of Victoria (UVic) sources. It uses Cheerio under the hood to parse HTML.

As a developer, you would use this library to parse HTML and JSON from Kuali (course information) and BAN1P (schedule information) that you have retrieved by any method, such as fetch.

Install

npm install @vikelabs/uvic-course-scraper

API

The following table describes the methods available on UVicCourseScraper.

Method Description
getAllCourses() Returns KualiCourseCatalog[] with all active courses in the Kuali catalog
getCourseDetails(pid: string) Returns KualiCourseItem with details for the course with the given pid
getCourseSections(term: string, subject: string, code: string) Returns ClassScheduleListing[] with details for every section of the course in the given term
getSectionSeats(term: string, crn: string) Returns DetailedClassInformation with the seats and waitListSeats for the section with the given CRN

Example

import {
  UVicCourseScraper,
  KualiCourseCatalog,
  KualiCourseItem,
  ClassScheduleListing,
  DetailedClassInformation,
} from '@vikelabs/uvic-course-scraper'; // assumes the types are also exported from the package root

// top-level await is unavailable in CommonJS, so the calls are wrapped in an async function
async function main() {
  // get all courses from the Kuali course catalog
  const allCourses: KualiCourseCatalog[] = await UVicCourseScraper.getAllCourses();
  const courseTitle: string = allCourses[0].title;

  // get course details for the course with pid 'ByS23Pp7E' (in this case, ACAN 225)
  // getCourseDetails returns a single KualiCourseItem, so no array indexing is needed
  const courseDetails: KualiCourseItem = await UVicCourseScraper.getCourseDetails('ByS23Pp7E');
  const courseDescription: string = courseDetails.description;
  const courseLectureHours: string = courseDetails.hoursCatalogText.lecture;

  // get course sections for CSC 111 in spring 2021 (term 202101)
  const courseSections: ClassScheduleListing[] = await UVicCourseScraper.getCourseSections('202101', 'CSC', '111');
  const courseSectionCode: string = courseSections[0].sectionCode;

  // get seats for the section with CRN 10953 in spring 2021 (in this case, ECE 260 - A01)
  const sectionSeats: DetailedClassInformation = await UVicCourseScraper.getSectionSeats('202101', '10953');
  const sectionTotalSeats: number = sectionSeats.seats.capacity;
}

main().catch(console.error);
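Term arguments are six-digit codes; the example above uses 202101 for spring 2021. The sketch below converts between seasons and codes, assuming UVic's usual scheme in which the last two digits are 01 for spring, 05 for summer, and 09 for fall (only the spring mapping is confirmed in this README). termCode and parseTermCode are hypothetical helpers, not part of the library.

```typescript
// Assumed month digits for each season: 01 = spring, 05 = summer, 09 = fall.
type Season = 'spring' | 'summer' | 'fall';

const SEASON_MONTH: Record<Season, string> = {
  spring: '01',
  summer: '05',
  fall: '09',
};

// Hypothetical helper: builds a YYYYMM term code, e.g. (2021, 'spring') -> '202101'.
function termCode(year: number, season: Season): string {
  return `${year}${SEASON_MONTH[season]}`;
}

// Hypothetical helper: the reverse of termCode; throws on codes outside the assumed scheme.
function parseTermCode(code: string): { year: number; season: Season } {
  const match = /^(\d{4})(01|05|09)$/.exec(code);
  if (match === null) {
    throw new Error(`Unrecognized term code: ${code}`);
  }
  const month = match[2];
  const season = (Object.keys(SEASON_MONTH) as Season[]).find((s) => SEASON_MONTH[s] === month)!;
  return { year: Number(match[1]), season };
}

console.log(termCode(2021, 'spring')); // 202101
console.log(parseTermCode('202009')); // { year: 2020, season: 'fall' }
```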

Developing

  1. Clone the repo:
    git clone https://github.com/VikeLabs/uvic-course-scraper.git
    
  2. Run npm install
  3. Optionally, experiment with example.ts by running npx ts-node-dev src/example/example.ts to get a feel for how Cheerio and regular expressions work on the kinds of sites this project scrapes.
  4. Find an unassigned task on ZenHub to work on.
  5. Create a new branch using git checkout -b <branch-name> (make sure it's up to date with master)
  6. Commit your changes, push the branch to GitHub, and open a Pull Request.

Testing

This project uses the Jest testing framework. You can run the tests with npm test.

Jest runs the test files whose names include .test (for example, *.test.ts).

Running npx jest --watch puts Jest into watch mode, which re-runs tests as files change.

Developer Tools

This repository contains a CLI to make development-related tasks easier.

npm run dump -- --term 202009 --type courses
  • Dumps the course details for the 202009 term.
  • Outputs to a courses.json file.
npm run dump -- --term 202009 --type schedules
  • Dumps the schedule details for all 202009 term classes.
  • These schedule details correspond to the Class Schedule Listing page on BAN1P.
  • This command can only be run after dumping courses data.
npm run dump -- --term 202009 --type class --crn 10953
  • Dumps the HTML of a "Detailed Class Information" page for a given term and CRN.
npm run dump -- --term 202009 --type sections
  • Dumps the section details for all 202009 term classes by CRN.
  • This command can only be run after dumping schedules data.
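Because schedules depend on courses and sections depend on schedules, the dumps have to run in order. A minimal sketch of that ordering follows; dumpCommands is a hypothetical helper that only builds the command strings listed above and does not run them. The class dump is treated as having no prerequisite, since none is listed for it.

```typescript
// Dump types accepted by `npm run dump -- --type <type>`.
type DumpType = 'courses' | 'schedules' | 'sections' | 'class';

// Each dump's prerequisite: schedules need courses, sections need schedules,
// and the class dump has none listed.
const PREREQUISITE: Record<DumpType, DumpType | null> = {
  courses: null,
  schedules: 'courses',
  sections: 'schedules',
  class: null,
};

// Hypothetical helper: returns the commands needed to produce `type`, in the
// order they must run. It only builds strings; it does not execute anything.
function dumpCommands(type: DumpType, term: string, crn?: string): string[] {
  // Walk back through the prerequisites, then reverse so the earliest dump comes first.
  const chain: DumpType[] = [];
  let current: DumpType | null = type;
  while (current !== null) {
    chain.push(current);
    current = PREREQUISITE[current];
  }
  return chain.reverse().map((t) => {
    const command = `npm run dump -- --term ${term} --type ${t}`;
    if (t !== 'class') {
      return command;
    }
    if (crn === undefined) {
      throw new Error('The class dump requires a CRN');
    }
    return `${command} --crn ${crn}`;
  });
}

console.log(dumpCommands('sections', '202009').join('\n'));
```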

Target Pages

The following are some of the pages we are currently parsing.

Schedule Information (BAN1P)

Class Schedule Listing

Class Schedule Listing - ECE 260 - 202009

This page is parsed for information about a specific class, such as its term, location, and CRN. You can change the query string parameters term_in, subj_in, and crse_in to view other class listings; for example, you could use 202101, CHEM, and 101.
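The query string can also be built programmatically. This sketch uses the standard URL API; classScheduleListingUrl is a hypothetical helper, and because the page's full address is not reproduced here, the base URL is a placeholder to replace with the real BAN1P listing page.

```typescript
// Hypothetical helper: sets the query string parameters that select which
// class listing the Class Schedule Listing page shows. set() replaces any
// existing value, so a base URL that already has parameters is handled too.
function classScheduleListingUrl(baseUrl: string, term: string, subject: string, code: string): string {
  const url = new URL(baseUrl);
  url.searchParams.set('term_in', term);
  url.searchParams.set('subj_in', subject);
  url.searchParams.set('crse_in', code);
  return url.toString();
}

// Placeholder base URL; substitute the real BAN1P listing page.
const LISTING_BASE = 'https://example.com/listing';
console.log(classScheduleListingUrl(LISTING_BASE, '202101', 'CHEM', '101'));
// https://example.com/listing?term_in=202101&subj_in=CHEM&crse_in=101
```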

Detailed Class Information

Detailed Class Information

This page is parsed for information about a specific section of a class, such as its seat and waitlist capacity. You can change the query string parameters term_in and crn_in to view other sections; for example, you could use 202101 and 12345.
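The seat and waitlist numbers appear as rows of a table on this page. The library parses HTML with Cheerio; the dependency-free sketch below uses a regular expression instead and assumes a simplified table in which each row has a label followed by Capacity, Actual, and Remaining cells. That markup, the SeatCounts shape, and parseSeatRow are assumptions for illustration, not the library's implementation.

```typescript
// Hypothetical shape for one row of the seating table.
interface SeatCounts {
  capacity: number;
  actual: number;
  remaining: number;
}

// Escapes characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Finds the row labelled `label` (for example "Seats" or "Waitlist Seats")
// and reads the three numeric cells that follow the label.
function parseSeatRow(html: string, label: string): SeatCounts | null {
  const cell = '\\s*<td[^>]*>\\s*(-?\\d+)\\s*</td>';
  const pattern = new RegExp(`<th[^>]*>\\s*${escapeRegExp(label)}\\s*</th>${cell}${cell}${cell}`);
  const match = pattern.exec(html);
  if (match === null) {
    return null;
  }
  return { capacity: Number(match[1]), actual: Number(match[2]), remaining: Number(match[3]) };
}

// Simplified sample markup; the real page's markup will differ.
const sample = `
<table>
  <tr><th></th><th>Capacity</th><th>Actual</th><th>Remaining</th></tr>
  <tr><th>Seats</th><td>30</td><td>28</td><td>2</td></tr>
  <tr><th>Waitlist Seats</th><td>10</td><td>0</td><td>10</td></tr>
</table>`;

console.log(parseSeatRow(sample, 'Seats')); // { capacity: 30, actual: 28, remaining: 2 }
```

A regex like this breaks as soon as the markup changes shape, which is why a DOM-style parser such as Cheerio is the sturdier choice for real pages.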

Course Information (Kuali)

The course information from this source is already mostly JSON, so this library does little with it; it is mainly used to build a list of courses for other processes. Some parsing is still done, and because the preAndCorequisites field contains HTML, we intend to parse it as well.
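As a possible starting point for parsing preAndCorequisites, here is a hypothetical sketch that strips tags and collects course codes from an HTML fragment. It assumes codes look like two to four uppercase letters followed by three digits and an optional letter (for example CSC 110 or MATH100); the real field's markup and the library's eventual approach may differ.

```typescript
// Collects unique course codes such as "CSC 110" or "MATH100" from an HTML
// fragment, normalized to "SUBJECT NUMBER" form.
// Assumed pattern: 2-4 uppercase letters, an optional space, three digits,
// and an optional trailing letter.
function extractCourseCodes(html: string): string[] {
  // Replace tags with spaces so markup cannot glue a code to adjacent text.
  const text = html.replace(/<[^>]*>/g, ' ');
  const pattern = /\b([A-Z]{2,4}) ?(\d{3}[A-Z]?)\b/g;
  const codes = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    codes.add(`${match[1]} ${match[2]}`);
  }
  // A Set preserves insertion order, so codes appear in document order.
  return Array.from(codes);
}

// Hypothetical fragment; the real field's markup may differ.
const fragment =
  '<ul><li>Complete all of: <a href="#">CSC110</a>, <a href="#">MATH 100</a></li>' +
  '<li>Recommended: <a href="#">CSC110</a></li></ul>';

console.log(extractCourseCodes(fragment)); // [ 'CSC 110', 'MATH 100' ]
```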

Kuali Courses Catalog Info

This is the JSON file which contains basic information about every course being offered and some courses that were offered recently.

To get more detailed information about a course, you must make another request using the course's pid value from the JSON above.
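The two-step flow (catalog first, then one request per pid) can be sketched as follows. CatalogEntry keeps only the fields this README mentions, and fetchDetails is injected so any client can be plugged in, such as a wrapper around UVicCourseScraper.getCourseDetails; detailsForCatalog is a hypothetical helper.

```typescript
// Only the catalog fields this README mentions; real entries likely carry more.
interface CatalogEntry {
  pid: string;
  title: string;
}

// Hypothetical helper: for each catalog entry, makes a second request by pid
// and collects the results keyed by pid. Requests run one at a time.
async function detailsForCatalog<T>(
  catalog: CatalogEntry[],
  fetchDetails: (pid: string) => Promise<T>,
): Promise<Map<string, T>> {
  const details = new Map<string, T>();
  for (const entry of catalog) {
    details.set(entry.pid, await fetchDetails(entry.pid));
  }
  return details;
}

// Usage with a stub in place of a real request.
const catalog: CatalogEntry[] = [{ pid: 'ByS23Pp7E', title: 'placeholder title' }];
detailsForCatalog(catalog, async (pid) => `details for ${pid}`).then((details) => {
  console.log(details.get('ByS23Pp7E')); // details for ByS23Pp7E
});
```

Sequential requests are the simplest way to avoid flooding the source server; a concurrency limit would be faster for the full catalog.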

Kuali Course Info

This contains detailed information about a class, such as:

  • Description
  • Requirements
  • Pre and co-requisites