Procedural Song Generator

Maybe it music!™ (...It's not GOOD music, but maybe you can define it as such, as music - that's up to you. That's up to you - that's your mission, if you choose to accept it: a quest to categorize a cacophony as candidly as you can as: music.)

What it does

  • Accepts search terms in the UI (e.g. "a cappella", "coin collect", "grass sound effect")

  • Collects audio by searching SoundCloud, OpenGameArt, and/or the filesystem (TODO: more sources - maybe Napster, maybe Spotify, maybe general web search engines filtered to audio files (Bing and DuckDuckGo have filetype:mp3 support but still need filtering to actual mp3 files).)

  • Gets a MIDI file from BitMidi (using search terms, falling back to a random MIDI if it doesn't find anything)

  • Takes random samples from the audio it collected (TODO: Detect beats for percussion, detect pitch/harmonics for melody)

  • Uses a MIDI file for structure and plays the samples pitch-shifted for the melody and left as-is for the percussion, with some reverb (a rough sketch of the pitch shifting follows this list). (TODO: crazy random effects like randomly changing knobs on this, maybe experiment with generating or modifying melody / rhythm, phasing in and out parts, ...)

  • Starts playing while generating (TODO: fix horrible performance problems)

  • Lets you download audio .ogg files and attribution .html files (with related filenames so you can keep them together easily) (TODO: ideally, embed attribution information in audio file metadata. TODO: track number of samples per source and don't show sources where no samples were used (as can happen))
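
For the melody, pitch shifting can be done cheaply by changing each sample's playback rate. Here's a minimal sketch of that technique (the names are illustrative, not from this codebase), assuming a decoded AudioBuffer and a reference root note for the sample:

    # Each semitone is a factor of 2^(1/12); shift from the sample's
    # assumed root note to the note from the MIDI file.
    playSampleAtNote = (audioCtx, buffer, midiNote, time, rootNote = 60) ->
      source = audioCtx.createBufferSource()
      source.buffer = buffer
      source.playbackRate.value = Math.pow(2, (midiNote - rootNote) / 12)
      source.connect(audioCtx.destination)
      source.start(time)
      source

Note that changing the playback rate changes duration along with pitch; true time-stretching would need a phase vocoder or similar.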

Project Story

I started this project with the goal of making an infinite, ever-changing generative audio station, inspired by versificator. (I didn't find the repo for it at the time; if I had, I might have just worked on / messed around with that instead.)

I managed to make something, but before getting to making it sound good, I ran into some problems:

  1. I couldn't deploy to Heroku (or many other free services) and just have it stream an audio file like a radio station - Heroku would cut off the response at 30s. I would have to use WebSockets or something to stream, or rearchitect it so it does the audio generation on the client. (It's a bit of a moot point now, since Heroku is no longer free, and I'm not planning to put any money into this project. You'll have to run it locally.)
  2. Streaming audio is not too difficult to get working, but harder to get scalable. The way I implemented it, it could stream to multiple clients (cool), but if a client paused the stream, the server would indefinitely buffer audio for that client. You could literally DoS it by pausing the stream and waiting. SHOUTcast/Icecast would be better (but they would work better with discrete songs...)
  3. Installing native dependencies is hard. I had gone with Node.js so I could use the somewhat-familiar-to-me Web Audio API (and also so I could possibly transition to doing it on the client).
  4. While listening to the stream, if I heard something I liked, I couldn't easily save it. I would have to have already been recording the system audio, and then pause and cut that.
  5. I wanted to do visualization on the client, but this would mean meticulously marshalling data between the server and client, and keeping it SYNCED with the audio stream. This would be a RIDICULOUSLY complicated way to approach this problem.
  6. I wanted to accept search terms on the client, but if they just fed into a public audio stream, trolling/moderation would be a concern. (Also, if you wanted to try generating something you'd like to keep private, there wouldn't be a space for that; you'd have to set up the whole dev environment.)

So I decided to rewrite it to generate individual songs.

I did that, and I made a reasonably nice-looking UI, made it so you can enter search terms for creative control over the output, and made it show attribution per song, instead of one long attribution list for a pool of sounds it might be using, which didn't correspond much to what you were actually hearing.
I made it use MIDI files as a basis for song structure - perhaps it will only create "covers" of songs, but I'm fine with that.

But just as I thought I had a good basis to work from and was ready to start doing the creative work of trying to make it sound good/interesting a significant amount of the time, I ran into performance problems.
With more complex MIDI files, it would lag and stutter the audio, which made it very unpleasant to listen to.
You could pause it and wait for it to finish, and playing it back it would sound fine; but if you downloaded it and played it in VLC, it would actually include the stuttering. So not only is it painful to listen to, it's inconsistent, and can give you a false sense of sonic security. You might save something, come back to it months later, and be like, "I thought this one sounded really cool! This sounds like shit...", or, going through a collection, maybe just "this is a bad one, delete."

There are several potential solutions to this:

  • Limit the number of MIDI tracks; I've tried this and it kinda helps but not always, because there could be a lot of notes in one track
  • Use Firefox? Firefox seemed better, but I forget how much
  • Order the notes before scheduling them? I don't know if this matters
  • Use a scheduler loop instead of scheduling all the AudioBufferSourceNodes at once; this is the most likely useful solution (a sketch follows this list)
  • Use OfflineAudioContext and just ditch the playback-while-generating feature
    • Maybe could render a short preview with normal or offline AudioContext, but for the full song and download use OfflineAudioContext
  • Rewrite in some other language that has a decent audio processing ecosystem, where I would have more control over the performance
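
For reference, a scheduler loop would look something like this - a minimal sketch of the well-known look-ahead pattern, not code from this repo (playSampleAtNote is the function sketched earlier, and noteQueue is a hypothetical array of notes sorted by start time):

    LOOKAHEAD_MS = 25        # how often the timer wakes up
    SCHEDULE_AHEAD_S = 0.1   # how far ahead of currentTime to schedule

    startScheduler = (audioCtx, noteQueue) ->
      tick = ->
        # Only create AudioBufferSourceNodes for notes starting soon,
        # instead of scheduling thousands of them up front.
        while noteQueue.length and noteQueue[0].time < audioCtx.currentTime + SCHEDULE_AHEAD_S
          note = noteQueue.shift()
          playSampleAtNote(audioCtx, note.buffer, note.midiNote, note.time)
      timer = setInterval(tick, LOOKAHEAD_MS)
      -> clearInterval(timer)   # returns a stop function

The idea is that a cheap timer wakes up often and only instantiates nodes for the next ~100ms of music, so thousands of notes never exist as live nodes at once.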

It's a bummer tho, and since this is a project for fun, well, I might just not work on this again. Who knows.

I do have ideas for a further rewrite tho.

I think it would be a lot cooler to use video for the sources. Check out the video branch for a prototype of this.

  • Don't need as many audio providers. All the major search engines have Video as a search category.
    • So just crawl the web for videos, right? Might actually need lots of code for scraping videos from various websites, idk.
      • Well, there's youtube-dl, which actually works with various sites; that would probably be a huge boon!
  • I've had this idea of pairing audio effects with corresponding video effects (a rough sketch of one such pairing follows this list), like:
    • a muffling sound with a blur
    • volume = opacity
      • fading in when fading in
      • fading out when fading out
    • a grunge effect = increased contrast maybe
    • pitch could be indicated with Y position, or with scaling (lower = larger, higher = smaller)
    • delay = delay
    • merging audio sources = simply averaging pixels, or perhaps something more additive
    • reverb convolution is applied to pixels over time, to produce a visual echo
    • bit crush = reduce bitrate of video
    • other things could shift color channels, skew the video, etc.
  • Use closed caption data and speech recognition to find where words are uttered, and combine them together into sentences/lyrics
    • This way you could search for a famous line in a movie, and get it to sample that actual line and not just that scene
    • If there aren't (good) results for the whole phrase the user enters, search for sub-phrases and words and try to find where those are uttered in videos, to splice them together in a Frankenstein's monstrosity
    • Just generally sampling from whole words probably makes things more interesting/pleasant, if we can do at least word boundary detection
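
As an example of these pairings, the volume = opacity idea above is nearly a one-liner. A hypothetical sketch (none of this exists in the codebase yet), driving a video element's CSS opacity from a Web Audio GainNode:

    pairGainWithOpacity = (gainNode, videoElement) ->
      update = ->
        # Map gain onto CSS opacity, so fades in the audio
        # show up as fades in the video.
        videoElement.style.opacity = String(gainNode.gain.value)
        requestAnimationFrame(update)
      requestAnimationFrame(update)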

Also, it would be good to make song generation reproducible, and maybe export to a known DAW / video editor format for human editing.

Externalities for reproducibility:

  • Search is completely unreliable - new content appears, search algorithms change, etc. - but the results can simply be cached
  • Fetching URLs is unreliable, but audio/video content is usually static
    • Internet access can be unavailable
    • Videos can be taken down
    • Sites can go down
    • Videos can be edited, which is likely to ruin timecodes
    • Content-addressable storage would be better, but might not be practical
    • Caching media can work but it's a lot heavier than a list of search results; video can take up a lot of space
  • (Random decisions can easily be made pseudorandom with a seeded PRNG; a sketch follows below)
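
For the last point, a seeded PRNG is all it takes. Here's a sketch using mulberry32, a tiny public-domain generator (not currently part of this codebase); recording the seed alongside the generated song would let the same "random" decisions be replayed:

    mulberry32 = (seed) ->
      ->
        seed = (seed + 0x6D2B79F5) | 0
        t = Math.imul(seed ^ (seed >>> 15), seed | 1)
        t = (t + Math.imul(t ^ (t >>> 7), t | 61)) ^ t
        ((t ^ (t >>> 14)) >>> 0) / 4294967296

    random = mulberry32(12345)
    # Use random() everywhere Math.random() would otherwise be used.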

Project Structure

Written in CoffeeScript with Socket.IO.

The code is split into server/ and client/.

server/server.coffee is the main entry point for the server.

server/gather-audio.coffee collects audio sources, using several audio providers located in server/audio-providers/.

client/app.coffee is the main entry point for the client app.

client/song.coffee generates the songs.

Fontello is currently used for some icons representing the audio providers, but this could change to using favicons and be simpler. Workflow: drag config.json to Fontello, update the font, download the zip, and replace fontello/.

License

MIT licensed, see LICENSE.

Development Setup

  1. Clone the repository
  2. Install Node.js if you don't have it already
  3. Run npm i in a terminal/command prompt to install dependencies
  4. Run npm start to start the server
  5. Wait for it to say "Listening on http://localhost:3901" and open that URL

Configuration

Copy template.env to a new file called simply .env.

SoundCloud

To enable the SoundCloud audio provider, add a SOUNDCLOUD_CLIENT_ID key to the .env file. You'll need to get a client ID somehow.

FileSystem

To enable the filesystem audio provider, set FS_enabled to true in gather-audio.coffee, and add an AUDIO_SOURCE_FILES_GLOB key to the .env file, e.g.

AUDIO_SOURCE_FILES_GLOB=C:\Users\Isaiah\Music\**\*.mp3

It must be a full path. % or $ variables are NOT supported. Uses glob syntax.
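
For illustration, the setting would be consumed roughly like this - a sketch assuming dotenv and the classic glob.sync API (see server/gather-audio.coffee and server/audio-providers/ for the real code):

    require("dotenv").config()
    glob = require("glob")

    audioFilePaths = glob.sync(process.env.AUDIO_SOURCE_FILES_GLOB)
    console.log("Found #{audioFilePaths.length} audio files")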

Deployment Notes

No longer deployed, as Heroku has removed their free tier.

Problems and Suggestions

Open an issue!