
I want to import a part of music data. #1386

Open
tmbops1991 opened this issue Aug 13, 2022 · 6 comments

Comments

@tmbops1991

Hello, I have a question. I am an enthusiastic user of this library.
Is there any way to specify a duration for the audio file to be loaded, like the Python library's librosa.load?
If not, could this functionality be added?
We would like to use such an option to reduce the amount of data processed when decoding a song.

@hughrawlinson
Member

Hi! Meyda operates on the buffers you give it, so if you load the audio from disk in your code and pass the appropriate buffers, that’s what Meyda will calculate the features on. Or are you looking for this feature in the CLI?
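A minimal sketch of that idea, assuming the audio has already been decoded to a Float32Array of samples (for example with a wav-decoding package); `takeDuration` is a hypothetical helper that mimics librosa.load's `duration` option by trimming the signal before any analysis:

```javascript
// Hypothetical helper: keep only the first `seconds` worth of samples,
// similar to librosa.load(path, duration=seconds).
// `samples` is assumed to be a Float32Array of decoded audio.
function takeDuration(samples, sampleRate, seconds) {
  const n = Math.min(samples.length, Math.floor(seconds * sampleRate));
  // subarray is a cheap view onto the same underlying memory
  return samples.subarray(0, n);
}

// Example with a synthetic 3-second buffer at 44.1 kHz:
const sampleRate = 44100;
const samples = new Float32Array(3 * sampleRate);
const firstSecond = takeDuration(samples, sampleRate, 1);
// You would then pass fixed-size windows of `firstSecond` to
// Meyda.extract(['mfcc'], window).
```

This only trims an already-decoded buffer; it does not avoid decoding the whole file the way librosa's `duration` does.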

@tmbops1991
Author

tmbops1991 commented Aug 13, 2022

Thank you for your reply. It is very encouraging.

I am not using the CLI; my execution environment is a bit unusual. I am running inside Max 8's node.js in Ableton Live, via Max for Live.

My goal is to do deep learning while DJing. To that end I am building an environment for music analysis, which is how I came across this library. For the deep-learning input, I want to convert the music into an image: compute the MFCCs of the music and render them as an image.

While building this environment, I am running into some problems:

  1. I am calling Meyda.extract(['mfcc'], buffer) using the node wav library you recommend, but music longer than 45 seconds takes too long to decode and the process never finishes.
  2. Decoding with AudioContext.decodeAudioData does not work; I get an ERROR: not power of two.
  3. Why is there no error when using node wav, but an error with AudioContext.decodeAudioData that prevents me from proceeding?

For 1, I think I can solve the problem by specifying certain time ranges in the music and decoding only those. As for 2 and 3, I have spent hours searching these GitHub issues trying to solve them, without success. I have changed Meyda.bufferSize and tried many times, but to no avail.
I think the discussion at this link applies to my current problem.

I would really appreciate any good advice you can give me.

I'm sorry if my English is hard to read.
(Translated with DeepL.)

@hughrawlinson
Copy link
Member

hughrawlinson commented Aug 14, 2022

Sorry that the error you encountered wasn't clear enough! The issue is that the length of your buffer always needs to be a power of 2. So, if you can cut (or pad) your audio to a length of that form, you will be able to use this method.

An important piece of context is that for the purposes of audio analysis, you can add silence to the start and end of your buffer and you will end up with the same result - so if you have 45 seconds of audio at a CD quality sample rate, you have 45*44100 samples (buffer length 1,984,500). The next power of 2 is 2^21, 2,097,152. This means that you can add 2^21-45*44100 zeros to the end of your signal and then you will be able to extract audio features for this audio.
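The zero-padding described above can be sketched as follows (a minimal example, assuming the signal is a Float32Array of samples; `padToPowerOfTwo` is a hypothetical helper, not part of Meyda):

```javascript
// Grow the buffer to the next power of 2 by appending silence (zeros).
function padToPowerOfTwo(samples) {
  // Smallest power of 2 that is >= samples.length
  const target = Math.pow(2, Math.ceil(Math.log2(samples.length)));
  const padded = new Float32Array(target); // zero-initialized, i.e. silence
  padded.set(samples, 0);                  // copy the original signal to the front
  return padded;
}

// 45 s at 44.1 kHz: 1,984,500 samples -> padded to 2^21 = 2,097,152
const padded = padToPowerOfTwo(new Float32Array(45 * 44100));
// padded.length === 2097152
```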

Another approach would be to take your 45 seconds of audio and cut it into chunks, then extract the features from these chunks and take the average result as representative of the whole sound. For example, splitting the signal into chunks that are of length 2^16 would result in chunks that are roughly 1.5 seconds long. An additional benefit of this approach is that you will have information about how the audio features change over the course of the sound, which you can use in machine learning applications as a feature in and of itself.

I hope that helps, let me know if I can clarify further!

@tmbops1991
Copy link
Author

tmbops1991 commented Aug 15, 2022

I would love to know how to split audio into chunks, decode only a specified chunk, and pass it to Meyda. How do I chunk the data after reading it in with fs?

@hughrawlinson
Copy link
Member

If you had an array that was 11 elements long (like an audio buffer with 11 samples), and you wanted to chunk that buffer up into arrays that are 3 elements long, you could do something like:

const myArray = [0,0,0,0,0,0,0,0,0,0,0];


// array is the buffer, and n is the size of the chunks
function chunk(array, n) {
  // Copy the array to avoid modifying the original
  let myArrayCopy = [...array];

  // Create a second array to store each of the chunks
  let chunks = [];
  // While there are still enough elements in the array to create a chunk of the right size...
  while (myArrayCopy.length >= n) {
    // Take the first n elements, remove them from the array, and push them as an array onto the chunks we'll return
    chunks.push(myArrayCopy.splice(0,n));
  }

  // Here you may be left with some remaining samples. You can decide whether to discard them, to add zeros
  // to the end of the array as I described in my previous comment, or to return an incomplete chunk - whatever is
  // appropriate for your code. Just make sure to only pass buffers to meyda of the correct length.

  return chunks;
}

Calling that function with a chunk size of 3 would return an array as follows (the 2 leftover elements are not included, per the comment at the end of the function):

const chunkedArray = chunk(myArray, 3);
console.log(chunkedArray);
[
  [0,0,0],
  [0,0,0],
  [0,0,0],
]

So now you end up with multiple buffers, each representing a shorter part of the signal. To go back to my example above, if you have an audio recording that you loaded from disk that is 1,984,500 samples long, you can chunk that array of samples by doing chunk(buffer, Math.pow(2, 16)). This will leave you with an array of 30 new buffers, each of which is 65536 samples long (about 1.5 seconds of audio each). There's a remainder of 18420 samples at the end of the original recording. You can choose to either discard these samples or to add 47116 zeroes (approximately a second of silence) to the end of your signal to pad out the buffer to contain the same number of elements as the other buffers. You can then run each of these through meyda, because the buffer size of each is a power of two. You can then take the average of the resulting audio features, and take that to represent your original recording.
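The averaging step at the end can be sketched like this (a minimal example; each input is assumed to be one MFCC vector per chunk, as Meyda.extract(['mfcc'], chunk) would produce, and `averageFeatureVectors` is a hypothetical helper):

```javascript
// Element-wise mean of several equal-length feature vectors, so one vector
// can represent the whole recording.
function averageFeatureVectors(vectors) {
  const mean = new Array(vectors[0].length).fill(0);
  for (const v of vectors) {
    for (let i = 0; i < v.length; i++) {
      // Accumulate each coefficient's share of the mean
      mean[i] += v[i] / vectors.length;
    }
  }
  return mean;
}

// Two hypothetical 3-coefficient MFCC vectors:
const avg = averageFeatureVectors([[1, 2, 3], [3, 4, 5]]);
// avg is [2, 3, 4]
```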

@SUDDSDUDDS

Okay that's okay I'll let you know if I want it lol
