Convert pixel data to a wav file
./img2wav 96000.0 2.0 in.jpg out.wav
mkdir build
cd build
cmake ..
A single sine wave of a given frequency over the interval t can be generated as follows:
where A is the amplitude of the wave,
For instance, an A440 note sampled at 96000 Hz is:
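A minimal sketch of this in Python, assuming the sample-indexed form A * sin(2 * pi * (f / sample_rate) * t) that the conversion loop further down uses. The helper name `sine_wave`, the one-second duration and the amplitude of 0.1 are illustrative choices, not part of the tool:

```python
import math

def sine_wave(freq_hz, sample_rate, duration_s, amplitude=0.1):
    """Return a list of samples for a single sine wave (hypothetical helper)."""
    n_samples = int(sample_rate * duration_s)
    return [
        amplitude * math.sin(2 * math.pi * (freq_hz / sample_rate) * t)
        for t in range(n_samples)  # t counts samples, not seconds
    ]

# A440 sampled at 96000 Hz for one second
a440 = sine_wave(440.0, 96000, 1.0)
```

Here `t` is a sample index, so dividing the frequency by the sample rate converts it to cycles per sample.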
This generates the amplitude over time graph:
And the corresponding spectrogram (frequency over time):
Notice that only one frequency is present, at 440 Hz.
We can put multiple frequencies in the spectrogram easily by summing one sine wave per frequency:
So to generate a wave containing 440 Hz, 1200 Hz and 765 Hz, sampled at 96000 Hz:
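As a rough sketch, assuming each frequency gets the same amplitude and the waves are simply added sample by sample (`mix_sines`, the amplitude of 0.1 and the half-second duration are illustrative assumptions):

```python
import math

def mix_sines(freqs_hz, sample_rate, duration_s, amplitude=0.1):
    """Sum one sine wave per frequency, sample by sample (hypothetical helper)."""
    n_samples = int(sample_rate * duration_s)
    return [
        sum(
            amplitude * math.sin(2 * math.pi * (f / sample_rate) * t)
            for f in freqs_hz
        )
        for t in range(n_samples)
    ]

# 440 Hz, 1200 Hz and 765 Hz together, sampled at 96000 Hz for half a second
mixed = mix_sines([440.0, 1200.0, 765.0], 96000, 0.5)
```

The summed signal can reach up to three times the single-wave amplitude, which is why the output gets normalized later on.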
Now that we know how to generate frequencies, we need to determine which frequencies to draw and how strong each one should be. To do this we create a heatmap by grayscaling an image and mapping each [0, 255] gray value to an amplitude (A) in [0.01, 0.1]. Images usually contain 3 color values per pixel, so to build the heatmap we convert these 3 values to a single gray value using the luma grayscale formula.
pixels[y][x] = r * 0.299 + g * 0.587 + b * 0.114;
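To make the two steps concrete, here is a small sketch that applies the luma weights from the line above and then maps the gray value to an amplitude. The helper names `luma` and `map_range` and the tiny hand-written image are assumptions for illustration; a real image would come from an image loader:

```python
def luma(r, g, b):
    """Gray value in [0, 255] from 8-bit RGB using the luma weights."""
    return r * 0.299 + g * 0.587 + b * 0.114

def map_range(value, in_min, in_max, out_min, out_max):
    """Linearly map value from [in_min, in_max] to [out_min, out_max]."""
    return out_min + (value - in_min) * (out_max - out_min) / (in_max - in_min)

# A tiny 1x3 RGB "image": black, mid gray, white
rgb_image = [[(0, 0, 0), (128, 128, 128), (255, 255, 255)]]

# Gray value per pixel, indexed as pixels[y][x]
pixels = [[luma(r, g, b) for (r, g, b) in row] for row in rgb_image]

# Amplitude per pixel: [0, 255] -> [0.01, 0.1]
heat = [[map_range(p, 0, 255, 0.01, 0.1) for p in row] for row in pixels]
```

Black pixels end up at the quietest amplitude (0.01) rather than silence, and white pixels at the loudest (0.1).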
Once we have our pixel data prepared we need to determine how large the image should be.
The height of our image is constrained to the range of frequencies displayed in our spectrogram, and the width is spread evenly over however many seconds we want to transmit the data for. Here's some code to explain this:
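from math import pi, sin

# pixels, width and height come from the grayscale image prepared above
time_s = 2 # length of image in seconds
max_height = 96000 # spectrogram height in Hz
sample_rate = 2 * max_height # assumed here: at least twice the highest frequency, so nothing aliases
scale = max_height / height # linearly scale our frequencies
target = (sample_rate * time_s) // width # width in samples for each pixel
output = [0.0] * (target * width) # one audio sample per entry
for x in range(width):
    for y in range(height):
        heat = 0.01 + (pixels[y][x] / 255) * (0.1 - 0.01) # [0, 255] -> [0.01, 0.1]
        freq = y * scale # or (height - y) * scale to flip the image upside down
        for t in range(target):
            output[x * target + t] += heat * sin(2 * pi * (freq / sample_rate) * t)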
Audio data is meant to be within the range [-1, 1], and our process of summing frequencies may put us out of this range. A quick and dirty way of normalizing the output is to divide the audio data by its absolute maximum value.
import numpy as np

def normalize(data):
    return data / np.max(np.abs(data)) # divide by the largest absolute sample
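Once the samples are within [-1, 1] they can be written to disk. The sketch below assumes 16-bit mono PCM and uses Python's standard `wave` module; the format the real tool writes is not stated here, and `write_wav` and the file name `example.wav` are hypothetical:

```python
import struct
import wave

def write_wav(path, samples, sample_rate):
    """Write samples in [-1, 1] as 16-bit mono PCM (assumed format)."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clamp anything that slipped out of range
        frames += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))

# Three already-normalized samples, written at 96000 Hz
write_wav("example.wav", [0.2, -1.0, 0.5], 96000)
```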
A simple Python script can generate a spectrogram from our audio files:
import matplotlib.pyplot as plt
from scipy import signal
from scipy.io import wavfile
sample_rate, samples = wavfile.read("out.wav")
frequencies, times, spectrogram = signal.spectrogram(samples, sample_rate)
plt.pcolormesh(times, frequencies, spectrogram) # axes are in seconds and Hz
plt.ylabel("Frequency [Hz]")
plt.xlabel("Time [sec]")
plt.show()