Auto Mashup

Music Mashups Automated with Node.js
Featuring Songs from Billboard Charts

Website · YouTube Channel · Buy Me a Coffee · Instagram · Report Issue

Background

A mashup, according to Merriam-Webster's Dictionary, is "a piece of music created by digitally overlaying an instrumental track with a vocal track from a different recording." The idea of mashups has been around since the late 1960s - the first such creation, arguably, was found on Harry Nilsson's 1967 album Pandemonium Shadow Show, which features a cover of The Beatles' "You Can't Do That" with his own vocal recreations of more than a dozen other Beatles songs on the same instrumental track.

Ideally, the vocal track of one song is superimposed seamlessly onto the instrumental track of a separate song, modifying the musical keys and tempos where necessary to achieve a perfect mix. Often, when selecting songs for a mashup, mashup creators search for songs that have similar musical keys, tempos, and modes. This not only allows for a better-sounding mix, but increases the audience's ability to recognize distinct elements of both selected songs, while avoiding something like a "chipmunk"-sounding effect on the audio.
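
The arithmetic behind this can be sketched in a few lines. The helpers below are hypothetical (they are not taken from this project's code) and assume standard equal temperament, with keys given as pitch classes from 0 (C) to 11 (B). The larger the tempo ratio or semitone shift, the more audible the distortion, which is why songs with similar keys and tempos make better candidates.

```javascript
// Hypothetical helpers for illustration; not part of the Auto Mashup codebase.

// Playback-rate factor needed to bring a track at `sourceBpm` to `targetBpm`.
function tempoRatio(sourceBpm, targetBpm) {
  return targetBpm / sourceBpm;
}

// Smallest signed shift, in semitones, from one pitch class to another.
// Pitch classes run from 0 (C) to 11 (B); the result lies between -5 and +6.
function semitoneShift(sourceKey, targetKey) {
  const up = (((targetKey - sourceKey) % 12) + 12) % 12; // upward distance, 0..11
  return up > 6 ? up - 12 : up; // prefer shifting down when that is shorter
}

// Frequency multiplier for a shift of `semitones` in equal temperament.
function pitchRatio(semitones) {
  return Math.pow(2, semitones / 12);
}
```

For example, fitting vocals recorded at 100 BPM in A (pitch class 9) onto an instrumental at 120 BPM in C (pitch class 0) needs `tempoRatio(100, 120)` and `semitoneShift(9, 0)`, that is, a 1.2x speed-up and a shift of 3 semitones upward.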

Because there are only so many chords, keys, tempos, song structures, time signatures, and modes, popular songs on the Billboard charts can, and often do, sound alike. The weekly Billboard magazine tracks the most popular songs across various genres of music and publishes ranked charts on its website. Charts include the Hot 100 and the Billboard Global 200, as well as greatest-of-all-time (GOAT) charts such as GOAT Hot 100 Songs, GOAT Songs of the '90s, and GOAT Songs of the Summer.

In 2021, American singer-songwriter Olivia Rodrigo added two members of American rock band Paramore as co-writers of her single “Good 4 U” due to the similarities between her own song and Paramore’s 2007 song “Misery Business.” Multiple mashups of the two songs can be found online. This is not an isolated incident - many songs across multiple genres and decades sound similar or can be manipulated to sound similar - although perhaps not due to directly lifting elements from other musical works.

Notable mashup artists including Neil Cicierega and Girl Talk have released entire mashup albums such as Mouth Sounds and Feed the Animals, respectively. Platforms such as TikTok and YouTube have no dearth of mashup material - such creations have clearly become exceedingly popular. The question is - given the formulaic concepts and techniques involved in creating a musical mashup, can mashup creation be automated?

Functionality

The Auto Mashup project spans two repositories.

The first repository (the one you are visiting right now) contains the Node.js and Express server that handles weekly scraping of the regular Billboard charts, monthly scraping of the GOAT Billboard charts, acquisition of individual song data and audio stems, and writing to a Contentful content management system (CMS).

The second repository (https://github.com/amamenko/auto-mashup-mix) contains both the client-side website logic of auto-mashup.vercel.app (built with React) and the Node.js/Express server that uses the song data stored in Contentful to find and create automated song mashups with FFmpeg. That server also creates a weekly video of mashups, uploads it to the Auto Mashup YouTube channel, and publishes a post to the @automaticmashup Instagram page.

The basic functionality of this repository's code logic is:

Billboard Chart Scraping:

  • Use cheerio to scrape song position data from charts on billboard.com using logic modified from billboard-top-100.
  • Remove any old entries and their associated vocal and accompaniment audio assets if the song is no longer present on any Billboard chart.
  • Scrape regular Billboard charts every Wednesday and greatest-of-all-time Billboard charts on the first Sunday of every month, using cron jobs scheduled with node-cron. The selected Billboard charts include:

| Regular Billboard Charts   | GOAT Billboard Charts       |
|----------------------------|-----------------------------|
| The Hot 100                | GOAT Hot 100 Songs          |
| Billboard Global 200       | GOAT Hot 100 Songs by Women |
| Radio Songs                | GOAT Songs of the Summer    |
| Hot Dance/Electronic Songs | GOAT Songs of the '80s      |
| Hot Rap Songs              | GOAT Songs of the '90s      |
| Hot R&B/Hip-Hop Songs      | GOAT Hot R&B/Hip-Hop Songs  |
| Hot R&B Songs              | GOAT Adult Alternative Songs |
| Hot Alternative Songs      | GOAT Alternative Songs      |
| Hot Country Songs          | GOAT Hot Country Songs      |
| Hot Mainstream Rock Songs  | GOAT Mainstream Rock Songs  |
| Mexico Airplay             | GOAT Pop Songs              |
| Hot Latin Songs            | GOAT Adult Pop Songs        |
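
The two schedules described above could be expressed roughly as follows. This is a minimal sketch, not the project's actual scheduling code: the cron times are assumptions (this README does not state them), and the scrape function names in the comments are hypothetical. Because cron implementations differ in how they combine day-of-month and day-of-week fields, the sketch runs the monthly job every Sunday and checks in code whether that Sunday is the first of the month.

```javascript
// Assumed cron expressions (minute hour day-of-month month day-of-week).
// The real run times used by the project are not given in this README.
const WEEKLY_WEDNESDAY = "0 0 * * 3"; // every Wednesday at 00:00
const EVERY_SUNDAY = "0 0 * * 0"; // every Sunday at 00:00

// True only for the first Sunday of a month:
// a Sunday whose day-of-month falls between 1 and 7.
function isFirstSundayOfMonth(date) {
  return date.getDay() === 0 && date.getDate() <= 7;
}

// With node-cron installed, the wiring might look like this
// (scrapeRegularCharts and scrapeGoatCharts are hypothetical names):
//
//   const cron = require("node-cron");
//   cron.schedule(WEEKLY_WEDNESDAY, () => scrapeRegularCharts());
//   cron.schedule(EVERY_SUNDAY, () => {
//     if (isFirstSundayOfMonth(new Date())) scrapeGoatCharts();
//   });
```

Keeping the date check in a plain function makes the "first Sunday" rule easy to test without waiting for the scheduler to fire.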


Song Data Acquisition:

  • Use the Spotify Web API to get an audio analysis for every track on the Billboard chart that has a Spotify song ID, noting the song's tempo, key, and mode in addition to the track name and artist name. Songs with a time signature other than 4/4 (common time) are excluded, both because 4/4 is by far the most common time signature and because this keeps mashup creation simpler.
  • Search YouTube using logic modified from usetube for applicable videos that meet certain minimum standards (e.g. videos do not contain blacklisted terms in their video title, video description, channel name, or channel description) and contain closed captions. The timestamped closed captions available on these videos are then compared to lyrics found on Genius (acquired via the Genius Lyrics API) with a string and character comparison function that attributes timestamps to the various sections of the song.
  • If a video with an adequate number of successfully timestamped song sections is found, its MP3 audio is downloaded using yt-dlp. The audio is then trimmed to a maximum length of 3 minutes with fluent-ffmpeg.
  • The MP3 audio is then split into instrumental and vocal stem MP3 files by a custom Dockerized AWS Lambda function that relies on Deezer's Spleeter, and the stems are uploaded to an output AWS S3 bucket. Spleeter is a source separation library with pretrained models, written in Python and TensorFlow. Note that a local Node.js implementation of Spleeter is possible, as noted in my comment on this issue. However, Spleeter requires a substantial amount of RAM that quickly overwhelms an AWS EC2 t2.micro instance. Even with modular implementations of Spleeter, the splitting process (even with a base 2-stem model) is memory-intensive.
  • Every beat position of the instrumental stem MP3 file is then determined for beatmatching purposes using the essentia.js library.
  • Vocal and accompaniment audio assets are uploaded to a Contentful CMS using Contentful's Content Management API. These assets are then associated with the song's entry. The entry is subsequently populated with all of the acquired data.
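
The caption-to-lyrics comparison step in the list above could be approached along these lines. This is a sketch under stated assumptions, not the project's actual comparison function: it uses Sørensen-Dice similarity over character bigrams as the string metric, and the input shapes (`sections` with `name` and `firstLine`, `captions` with `start` and `text`) and the 0.6 threshold are hypothetical.

```javascript
// Normalize text for comparison: lowercase, drop punctuation, collapse whitespace.
function normalize(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Sørensen-Dice similarity over character bigrams: 0 (nothing shared) to 1 (identical).
function similarity(a, b) {
  const x = normalize(a);
  const y = normalize(b);
  if (x === y) return 1;
  if (x.length < 2 || y.length < 2) return 0;
  const counts = new Map();
  for (let i = 0; i < x.length - 1; i++) {
    const gram = x.slice(i, i + 2);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  let shared = 0;
  for (let i = 0; i < y.length - 1; i++) {
    const gram = y.slice(i, i + 2);
    const remaining = counts.get(gram) || 0;
    if (remaining > 0) {
      counts.set(gram, remaining - 1);
      shared++;
    }
  }
  return (2 * shared) / (x.length - 1 + (y.length - 1));
}

// For each lyric section, find the caption most similar to the section's first
// line and use its start time. Sections without a good enough match get null.
function timestampSections(sections, captions, threshold = 0.6) {
  return sections.map((section) => {
    let best = { start: null, score: 0 };
    for (const caption of captions) {
      const score = similarity(section.firstLine, caption.text);
      if (score > best.score) best = { start: caption.start, score };
    }
    return {
      name: section.name,
      start: best.score >= threshold ? best.start : null,
    };
  });
}
```

A video would then count as usable only if enough sections come back with a non-null `start`; the cutoff for "enough" is left to the caller.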

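The time-signature filter from the first step of the Song Data Acquisition list can be illustrated with a small function. This is a sketch rather than the project's code: the field names `tempo`, `key`, `mode`, and `time_signature` follow what Spotify reports for a track (key as a pitch class with -1 for "not detected", mode 1 for major and 0 for minor), while the merged track objects with `name` and `artist` and the function name are assumptions.

```javascript
// Pitch-class names for Spotify's integer `key` field (0 = C ... 11 = B).
const PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"];

// Keep only tracks in 4/4 time and reduce them to the fields used for matching.
// `tracks` is an assumed shape: Spotify's analysis fields merged with name and artist.
function selectMashupCandidates(tracks) {
  return tracks
    .filter((track) => track.time_signature === 4)
    .map((track) => ({
      name: track.name,
      artist: track.artist,
      tempo: track.tempo,
      // Spotify reports key -1 when no key was detected.
      key: track.key >= 0 ? PITCH_CLASSES[track.key] : null,
      mode: track.mode === 1 ? "major" : "minor",
    }));
}
```
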
Deployment

The server is deployed on an AWS EC2 instance. The client-side website is deployed with Vercel.

License

Distributed under the MIT License. See LICENSE.txt for more information.

Contact

Auto Mashup - automaticmashup@gmail.com

Avraham (Avi) Mamenko - avimamenko@gmail.com

Project Link: https://github.com/amamenko/auto-mashup

Acknowledgements