ainzzorl/soccer-team-adjectives

Overview

Parse Reddit's /r/soccer to associate adjectives with soccer teams. Given an archive of comments, find out what adjectives best describe teams.

Usage

  • Install Ruby.
  • Install Bundler: http://bundler.io/
  • From the root directory, run ./scripts/run.rb --input-file INPUT-FILE --config-file CONFIG-FILE [--phases PHASES] [--debug]. The input file must be a .csv file with the comment body in the first column and the comment id in the second.

Example: ./scripts/run.rb --input-file input/sample.csv --config-file config/teams.yaml. Note that the output for this example is likely to be empty: the sample input contains too few adjectives, and those it does contain are likely to be excluded by the popularity filter.

Real data can be downloaded from https://bigquery.cloud.google.com/dataset/fh-bigquery:reddit_comments

Query tables with SELECT body, id FROM <table-name> WHERE subreddit = 'soccer'.
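
To make the expected input layout concrete, here is a minimal sketch of reading such a file with Ruby's standard CSV library. The helper name read_comments is hypothetical and not taken from the project, and the sketch assumes the file has no header row; an export that includes one would need its first row skipped.

```ruby
require 'csv'

# Parses CSV text whose rows are [comment body, comment id] and returns
# an array of { id:, body: } hashes.
# Assumption: there is no header row in the input.
def read_comments(csv_text)
  CSV.parse(csv_text).map { |body, id| { id: id, body: body } }
end
```

Quoted fields are handled by the CSV library, so comment bodies that contain commas are read as a single column.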

Algorithm

Phase 1

Count team name/adjective pairs used in the same sentence.
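
The counting step can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the team list, the adjective word list, the naive sentence splitter, and the name count_pairs are all hypothetical. In particular, a fixed word list stands in for whatever adjective detection the project really uses.

```ruby
require 'set'

# Hypothetical inputs: the real project reads team names from its config file
# and needs some way to recognize adjectives; a fixed word list stands in here.
TEAMS = ['arsenal', 'chelsea'].freeze
ADJECTIVES = Set['boring', 'lucky', 'great'].freeze

# Returns a hash mapping [team, adjective] to the number of sentences
# in which both appear.
def count_pairs(comments)
  counts = Hash.new(0)
  comments.each do |body|
    # Naive sentence split on '.', '!' and '?'.
    body.downcase.split(/[.!?]+/).each do |sentence|
      words = sentence.scan(/[a-z']+/)
      teams = TEAMS.select { |team| words.include?(team) }
      adjectives = words.select { |word| ADJECTIVES.include?(word) }.uniq
      # Every team/adjective combination in the sentence counts once.
      teams.product(adjectives).each { |pair| counts[pair] += 1 }
    end
  end
  counts
end
```

For example, count_pairs(["Arsenal were boring. Chelsea got lucky!"]) counts one occurrence each for ['arsenal', 'boring'] and ['chelsea', 'lucky'], and nothing for ['arsenal', 'lucky'], because those two words sit in different sentences.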

Phase 2

  • Filter out blacklisted adjectives (nationalities, colors, ...).
  • Exclude the N most popular adjectives: they are too generic.
  • Score adjectives. Promote somewhat unusual words.
  • Keep only M adjectives per team.
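
The four steps above can be sketched as a single function. Everything specific here is an assumption made for illustration: the blacklist contents, the values of N and M, the name select_adjectives, and especially the scoring formula. The README only says that unusual words are promoted, so dividing a pair's count by the square root of the adjective's overall count is one possible way to do that, not necessarily the project's.

```ruby
# Hypothetical parameters; the README does not give the real values.
BLACKLIST = %w[english red blue].freeze
TOP_N_EXCLUDED = 1 # N: how many of the most popular adjectives to drop
KEEP_PER_TEAM = 2  # M: how many adjectives to keep per team

# pair_counts maps [team, adjective] to a co-occurrence count.
# Returns a hash mapping each team to its selected adjectives, best first.
def select_adjectives(pair_counts)
  counts = pair_counts.reject { |(_team, adj), _n| BLACKLIST.include?(adj) }

  # Total usage of each adjective across all teams.
  totals = Hash.new(0)
  counts.each { |(_team, adj), n| totals[adj] += n }

  # Drop the N most popular adjectives as too generic.
  generic = totals.sort_by { |adj, n| [-n, adj] }.first(TOP_N_EXCLUDED).map(&:first)
  counts = counts.reject { |(_team, adj), _n| generic.include?(adj) }

  # Assumed scoring: dividing by the square root of the overall total favors
  # adjectives that are comparatively rare for other teams.
  scored = counts.map { |(team, adj), n| [team, adj, n / Math.sqrt(totals[adj])] }

  # Keep the M highest-scoring adjectives per team (ties broken alphabetically).
  scored.group_by(&:first).transform_values do |rows|
    rows.sort_by { |_team, adj, score| [-score, adj] }.first(KEEP_PER_TEAM).map { |row| row[1] }
  end
end
```

The input is the kind of hash a pair-counting pass would produce, keyed by [team, adjective].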

Phase 3

Export results to .csv files per league.
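
A per-league export could look roughly like the sketch below, which uses Ruby's standard CSV library. The team-to-league mapping, the output file naming, the row layout (team name followed by its adjectives), and the name export_by_league are assumptions for illustration; the README does not describe the actual output format.

```ruby
require 'csv'
require 'fileutils'

# Hypothetical mapping; the real project presumably derives it from its config file.
LEAGUES = {
  'arsenal' => 'premier-league',
  'chelsea' => 'premier-league',
  'barcelona' => 'la-liga'
}.freeze

# results maps a team name to an array of adjectives.
# Writes one <league>.csv per league into output_dir and returns the paths.
def export_by_league(results, output_dir)
  FileUtils.mkdir_p(output_dir)
  results.group_by { |team, _adjectives| LEAGUES.fetch(team, 'unknown') }.map do |league, rows|
    path = File.join(output_dir, "#{league}.csv")
    CSV.open(path, 'w') do |csv|
      # One row per team: the team name, then its adjectives.
      rows.each { |team, adjectives| csv << [team, *adjectives] }
    end
    path
  end
end
```

Teams missing from the mapping end up in unknown.csv rather than raising an error, which keeps the sketch forgiving of incomplete configuration.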

Results and Publications

Original post on Reddit

BBC Article

Mirror Article
